From patchwork Tue Oct 2 12:25:23 2018
X-Patchwork-Submitter: David Oberhollenzer
X-Patchwork-Id: 977816
From: David Oberhollenzer
To: linux-mtd@lists.infradead.org
Cc: richard@nod.at, David Oberhollenzer
Subject: [PATCH mtd-www] Remove references to the "unstable bits" issue
Date: Tue, 2 Oct 2018 14:25:23 +0200
Message-Id: <20181002122523.12422-1-david.oberhollenzer@sigma-star.at>

In 2011, several sections about an issue with SLC and MLC NAND flashes of that time were added. The problem was referred to as "unstable bits" issue throughout the documentation and the FAQ.
As of writing, the issue has not been reproduced with modern NAND flashes for several years, and the section on the web site actually caused some confusion, with some users of MTD/UBI/UBIFS attributing actual bugs to the unstable bits issue instead of reporting them on the mailing list.

To avoid future confusion, this patch removes the section about the issue and all references to it.

Signed-off-by: David Oberhollenzer
Acked-by: Miquel Raynal
---
 doc/ubifs.xml | 141 --------------------------------------------------
 faq/ubi.xml   |  12 -----
 2 files changed, 153 deletions(-)

diff --git a/doc/ubifs.xml b/doc/ubifs.xml
index c1f54b5..6520d79 100644
--- a/doc/ubifs.xml
+++ b/doc/ubifs.xml
@@ -16,7 +16,6 @@
  • Overview
  • Power-cuts tolerance
  • UBIFS and MLC NAND flash
- • The unstable bits issue
  • Source code
  • Mailing list
  • User-space tools
@@ -168,13 +167,6 @@ subsystems involved:

    Both UBI (see here) and UBIFS are tolerant to power-cuts, and they were designed with this property in mind.

-Year 2011 note: however, there is an unsolved
-unstable bits issue which makes
-UBI/UBIFS fail to recover after a power cut on modern SLC and MLC flashes. This
-issue has not been observed on older SLC NANDs back at the time UBI/UBIFS was
-being developed. Note, the below text is quite old and has been written before
-the unstable bits issue has been first discovered.
-

    UBIFS has internal debugging infrastructure to emulate power failures and the authors used it for extensive testing. It was tested for long time with power-fail emulation. The advantage of the emulation is that it emulates power
@@ -311,141 +303,8 @@ some specific aspects of MLC NAND flashes:

    emulation, then use the integck test for testing. After all the issues are fixed, real power-cut tests could be carried out.

-
-  • [NEED WORK] The "unstable bits issue", which is not
-    MLC-specific, described
-    here.
-
-
-The unstable bits issue

-
-In the MTD community the "unstable bits" term is used to describe data
-instabilities caused by power cuts while writing or erasing. The unstable bits
-issue is still not resolved in UBI and UBIFS, and it was reported several times
-in the MTD mailing list. In theory, this issue should be visible in any flash,
-but for some reason back at the times when we developed UBI/UBIFS and
-extensively tested them on a robust SLC NAND, we did not observe it. No one
-reported about this issue for NOR flash yet. However, on modern SLC and MLC
-flashes this problem is reproducible.
-
-The unstable bits are the result of a power cut during a program or erase
-operation. Depending on when the power cut has happened, they can corrupt the
-data or the free space. Consider the following 4 situations:

-
-
-1. The power cut happens just before the NAND page program operation
-   finishes. After reboot the page may be read correctly and without
-   a single bit-flip say, 2 times, and the 3rd time you may get an ECC
-   error. This happens because the page contains a number of unstable bits
-   which are sometimes read correctly and sometimes not.
-
-2. The power cut happens just after the NAND page program operation
-   starts. After reboot, the page may be read correctly (return all
-   0xFFs) most of the time, but sometimes you may get some bits set to
-   zero. Moreover, if you then program this page, it also may be sometimes
-   read correctly, but sometimes return an ECC error. The reason is again
-   the unstable bits in the NAND page.
-
-3. The power cut happens just before the eraseblock erase operation
-   finishes. After reboot, the eraseblock may contain unstable bits and
-   data in this eraseblock may suddenly become corrupted.
-
-4. The power cut happens just after the eraseblock erase operation
-   starts. After reboot, the eraseblock may contain unstable bits and
-   sometimes return zero bits on read, or corrupted data if you program
-   it.
-
-
-The number of unstable bits resulting from a power-cut may be greater than
-what the ECC algorithm is able to correct. This is why a previously readable
-page may suddenly become unreadable, or conversely a previously unreadable page
-may suddenly become readable.
-
-Here is an example scenario how UBIFS may fail. UBIFS writes data node A to
-the journal LEB, and a power cut of type 1 happens. After the reboot, UBIFS
-recovery code reads that LEB, no bit-flips are reported by MTD, all the CRCs
-match, everything looks fine. UBIFS just assume that this LEB is all-right and
-the free space at the end of this LEB can be used for writing more data. UBIFS
-performs the commit operations, writes more user data, and everything works
-fine until the user reads node A by reading the corresponding file: an ECC
-error happens and the user gets the EIO error.
-
-The EIO may be what the user gets instead of his/her data also
-if a type 2 power cut happens, and UBIFS re-uses the corrupted free space for
-writing new nodes, and then these nodes are read.

-
-The solution is to teach UBIFS to erase-cycle any LEB which could potentially
-be written to when the power cut happened. This is not only about the
-journal LEBs, but also LPT, log, master and orphan LEBs. This means that the
-valid data from this LEB has to be read (and only once!) and then it should be
-written back to this LEB using the
-atomic LEB change UBI operation.
-This has to be done even if the LEB looks all-right - no corruptions, all 0xFFs
-at the end.
-
-Similarly, UBI has to erase-cycle every eraseblock which could potentially be
-erased when the power cut happened.

-
-The other requirement is that during the recovery UBI/UBIFS should read data
-from the media only once. This is easy to demonstrate on the delayed recovery
-example. The delayed recovery happens when after a power cut the file-system is
-mounted R/O, in which case UBIFS must not write anything to the flash, and the
-real recovery is delayed until the FS is re-mounted R/W. Currently UBIFS just
-scans the journal during mounting R/O, drops (or "remembers") corrupted nodes,
-and "does not let" users read them. But there is no guarantee that UBIFS
-spots all the corrupted nodes during the first scanning, so users may get
-EIO while reading data from the R/O-mounted FS.
-
-When UBIFS is then remounted R/W, it actually drops the corrupted nodes from
-the flash media by erase-cycling the corresponding LEBs. And UBIFS re-reads
-all the LEB data again. And there is no guarantee that UBIFS will get the same
-corruptions again.
-
-So it is important to make sure that the corrupted LEBs are read only once.
-E.g., we can cache the results of the first scanning, and then use that data
-when running the delayed recovery, instead of re-reading the data. Probably we
-may remember only the last NAND page containing valid nodes, not whole LEB,
-since for the journal only unstable bits of type 1 and 2 are relevant.

-
-There are similar double-read issues in UBI scanning - when it finds 2 PEBs
-belonging to the same LEB and it has to find out which one is newer. The volume
-table has to be erase-cycled as well in UBI.
-
-There are more issues related to unstable bits of type 2 and 3 in UBI, I
-think. This all needs a very careful look, and this is not trivial to fix
-because of the complexity: UBIFS as any file-system has many interfaces and a
-lot of states. The best strategy to attack this problem would be:

-
-
-1. Improve the existing power cut emulation infrastructure in UBIFS
-   and start emulating unstable bits. Start with emulating only one type
-   of unstable bits, e.g., type 1.
-
-2. Use the integck test to stress the file-system with
-   power cut emulation enabled - the test can re-start when an emulated
-   power cut happens. This will allow you to very quickly emulate hundreds
-   of power cuts in interesting places. Fix all the bugs. Make sure it is
-   rock solid. Of course, if you have various independent issues, you may
-   temporary hack the power cut emulation code to emulate unstable bits
-   only at certain places, to temporarily limit the amount of problems you
-   have to simultaneously deal with.
-
-3. Start emulating other types of unstable bits, and fix all the
-   issues one-by-one.
-
-4. Go down to UBI and add a similar power cut emulation
-   infrastructure. But emulate unstable bits only in UBI-specific on-flash
-   data structures - the EC/VID headers and the volume table. Improve the
-   integck test to support that infrastructure and fix all the
-   issues.
-
-5. Run real power cut tests on real hardware.
-
-
-
-

    Source code

    The UBIFS git tree is

diff --git a/faq/ubi.xml b/faq/ubi.xml
index fdd7abb..f9a76b3 100644
--- a/faq/ubi.xml
+++ b/faq/ubi.xml
@@ -449,12 +449,6 @@ probably do this.

    Yes, UBI is designed to be tolerant of power failures and unclean reboots.

-Year 2011 note: however, there is an unsolved
-unstable bits issue which may make
-UBI fail to recover after a power cut on modern SLC and MLC flashes. This has
-never been reported yet for UBI, but has been reported for UBIFS and we believe
-must be an issue for UBI as well.
-

    What happens when the PEBs reserved for bad block handling run out?

    @@ -480,12 +474,6 @@ life-cycle (about 1000-10000, unlike 100000-1000000 for SLC NAND and NOR flashes), the threshold has to be set to a lower value (e.g., 256). This may be done via the Linux kernel configuration menu.

-Year 2011 note: however, there is an unsolved
-unstable bits issue which may make
-UBI fail to recover after a power cut on modern SLC and MLC flashes. This has
-never been reported yet for UBI, but has been reported for UBIFS and we believe
-must be an issue for UBI as well.
-

    Note, unlike UBI, JFFS2 uses random wear-leveling algorithm, which is in fact not completely random, because JFFS2 makes it more probable to garbage collect eraseblocks with more dirty data. This means that JFFS2 is not