From patchwork Mon May 18 17:30:40 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Holmberg
X-Patchwork-Id: 27367
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from bombadil.infradead.org (bombadil.infradead.org [18.85.46.34])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by bilbo.ozlabs.org (Postfix) with ESMTPS id 5CF89B7043
 for ; Tue, 19 May 2009 03:33:43 +1000 (EST)
Received: from localhost ([::1] helo=bombadil.infradead.org)
 by bombadil.infradead.org with esmtp (Exim 4.69 #1 (Red Hat Linux))
 id 1M66g7-0006Et-VI; Mon, 18 May 2009 17:31:08 +0000
Received: from west-smtp1.trimble.com ([155.63.128.51])
 by bombadil.infradead.org with esmtps (Exim 4.69 #1 (Red Hat Linux))
 id 1M66fn-0006As-5o
 for linux-mtd@lists.infradead.org; Mon, 18 May 2009 17:31:01 +0000
Received: (qmail 26715 invoked by uid 501); 18 May 2009 11:30:43 -0600
Received: from 10.1.80.133 by west-smtp1.trimble.com (envelope-from , uid 108)
 with qmail-scanner-2.05
 (clamdscan: 0.95.1/9366. trophie: 8.310-1002/135/398971.
 spamassassin: 3.2.5. Clear:RC:1(10.1.80.133):.
 Processed in 0.029879 secs); 18 May 2009 17:30:43 -0000
Received: from unknown (HELO usw-am-xch-02.am.trimblecorp.net) (10.1.80.133)
 by west-smtp1.trimble.com with SMTP; 18 May 2009 11:30:43 -0600
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Subject: RE: UBIFS Corrupt during power failure
Date: Mon, 18 May 2009 11:30:40 -0600
Message-ID:
In-Reply-To: <200905150916.54091.sr@denx.de>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: UBIFS Corrupt during power failure
Thread-Index: AcnVLUcHzilfj9b7TM6AjNAHIsQ6dwCpptpA
References: <1239979018.3390.298.camel@localhost.localdomain>
 <200905150916.54091.sr@denx.de>
From: "Eric Holmberg"
To: "Stefan Roese"
X-Spam-Score: 0.0 (/)
Cc: Jamie Lokier , Adrian Hunter , linux-mtd@lists.infradead.org, Urs Muff
X-BeenThere: linux-mtd@lists.infradead.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: Linux MTD discussion mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: linux-mtd-bounces@lists.infradead.org
Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org

Hi Stefan,

I am still seeing corruption even with the write buffer size limited to
8 bytes, but it's greatly limited.

Unfortunately our schedule doesn't allow me to work on this full-time
for the immediate future, so I'm limited to small chunks of time for
now. Let me know if there is anything I can do to assist/share since it
looks like we both are in need of fixing this.

At this point, I believe I have characterized the interrupted erase and
interrupted write patterns that are causing the problems, so the next
step I may take is to add the failure conditions into the NOR MTD device
simulator mtdram and see if I can get the same failures. Let me know if
you have any other ideas of approaches.

Here's the patch to change the maximum write buffer size to 8 bytes
(2^3).

Best Regards,

Eric Holmberg
Senior Firmware Engineer
Trimble Construction Services
Westminster, Colorado

> -----Original Message-----
> From: Stefan Roese [mailto:sr@denx.de]
> Sent: Friday, May 15, 2009 1:17 AM
> To: Eric Holmberg
> Cc: linux-mtd@lists.infradead.org; dedekind@infradead.org;
> Jamie Lokier; Urs Muff; Adrian Hunter
> Subject: Re: UBIFS Corrupt during power failure
>
> Hi Eric,
>
> On Saturday 18 April 2009 01:49:52 Eric Holmberg wrote:
> > > Yeah, let's wait for Eric's results and then will work on
> > > extending MTD device model with this parameter.
> >
> > As suggested, I patched my 2.6.27 kernel with the latest from
> > http://git.infradead.org/users/dedekind/ubifs-v2.6.27.git (includes all
> > updates up to and including the fix-recovery bug,
> > http://git.infradead.org/users/dedekind/ubifs-v2.6.27.git?a=commit;h=e144c1c037f1c6f7c687de5a2cd375cb40dfe71e).
> >
> > I have the unit running with a maximum write buffer of 8 bytes (the NOR
> > flash chip is capable of 64 bytes).
>
> How exactly did you do this? In cfi_cmdset_0002.c?
>
> > I was seeing 4 different failure scenarios with the base 2.6.27 code,
> > but now I am only seeing one remaining failure after 30+ hours of power
> > cycling. I added a stack dump this afternoon that will let me pinpoint
> > exactly what is happening, but haven't seen the failure, yet.
> >
> > The failure happens when I get two corrupt empty LEB's. I believe the
> > scenario is that an erase is interrupted and on the next boot, while the
> > file system is being recovered, another power failure occurs.
> >
> > I can erase one of the LEB's manually in U-Boot and the file system
> > recovers properly.
> >
> > I'm going to leave the units running over the weekend and see what is
> > waiting for me Monday morning.
>
> Do you have an update for this? What's the current status on your
> system now? Which patches did you apply to work reliably with the
> Spansion FLASH?
>
> I'm asking since we are seeing a similar issue on one of our boards
> equipped with the S29GL512P. This simple script triggers problems upon
> the next mount:
>
> ---
> mount -t ubifs ubi0:testvolume /mnt
> sync
> reboot -n -f
> ---
>
> The next mount will result most of the time in this:
>
> UBIFS: recovery needed
> UBIFS error (pid 406): ubifs_scan: corrupt empty space at LEB 3:130320
> UBIFS error (pid 406): ubifs_scanned_corruption: corrupted data at LEB 3:130320
> UBIFS error (pid 406): ubifs_scan: LEB 3 scanning failed
> UBIFS error (pid 406): ubifs_recover_leb: corrupt empty space at LEB 3:32
> UBIFS error (pid 406): ubifs_scanned_corruption: corrupted data at LEB 3:32
> UBIFS error (pid 406): ubifs_recover_leb: LEB 3 scanning failed
> mount: Structure needs cleaning
>
> This is without the patch from this thread included (in recovery.c).
> With this patch included the recovery is successful all the time, as
> far as we can see right now. But I'm wondering if we really need to
> disable the write buffer in the CFI driver or reduce the write buffer
> to 8.
>
> Thanks.
>
> Best regards,
> Stefan
>
> =====================================================================
> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: office@denx.de
> =====================================================================
>

Index: drivers/mtd/chips/cfi_probe.c
===================================================================
--- drivers/mtd/chips/cfi_probe.c	(revision 4477)
+++ drivers/mtd/chips/cfi_probe.c	(working copy)
@@ -18,7 +18,7 @@
 #include
 #include
 
-//#define DEBUG_CFI
+#define DEBUG_CFI
 
 #ifdef DEBUG_CFI
 static void print_cfi_ident(struct cfi_ident *);
@@ -251,6 +251,18 @@
 	cfi->cfiq->InterfaceDesc = le16_to_cpu(cfi->cfiq->InterfaceDesc);
 	cfi->cfiq->MaxBufWriteSize = le16_to_cpu(cfi->cfiq->MaxBufWriteSize);
 
+	//DEBUG - BEGIN - force max write size to 8 bytes (2^3)
+	if (cfi->cfiq->MaxBufWriteSize)
+	{
+		printk("Warning: Overriding MaxBufWriteSize from 2^%d to 2^%d\n",
+			cfi->cfiq->MaxBufWriteSize,
+			3
+			);
+		cfi->cfiq->MaxBufWriteSize = 3;
+	}
+	//DEBUG - END
+
+
 #ifdef DEBUG_CFI
 	/* Dump the information therein */
 	print_cfi_ident(cfi->cfiq);
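
For anyone reading along, here is a minimal standalone sketch of the arithmetic behind the override, assuming (as the printk in the hunk implies) that MaxBufWriteSize holds an exponent n meaning 2^n bytes, with 0 meaning no buffered writes. The helper names cfi_exp_to_bytes() and cap_buf_write_exp() are hypothetical and are not part of the kernel tree. Note that the sketch caps the exponent rather than forcing it, which differs from the debug hunk: the hunk would also raise a chip that reports less than 2^3.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the kernel: convert a CFI-style
 * buffer-size exponent into a byte count. An exponent of 0 is
 * treated as "buffered writes not supported".
 */
static uint32_t cfi_exp_to_bytes(uint16_t exp)
{
	assert(exp < 32);	/* keep the shift well defined */
	return exp ? (UINT32_C(1) << exp) : 0;
}

/*
 * Hypothetical helper: limit the exponent to 'cap' without ever
 * raising it, and leave 0 (no buffered writes) untouched.
 */
static uint16_t cap_buf_write_exp(uint16_t exp, uint16_t cap)
{
	if (exp == 0)
		return 0;
	return exp > cap ? cap : exp;
}
```

With these definitions, an exponent of 6 (the 64 bytes the NOR chip in this thread is capable of) capped at 3 yields 3, and cfi_exp_to_bytes(3) gives the 8 bytes used in the test runs.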