From patchwork Mon May 18 17:30:40 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: UBIFS Corrupt during power failure
Date: Mon, 18 May 2009 07:30:40 -0000
From: Eric Holmberg
X-Patchwork-Id: 27367
Message-Id:
To: "Stefan Roese"
Cc: Jamie Lokier, Adrian Hunter, linux-mtd@lists.infradead.org, Urs Muff

Hi Stefan,

I am still seeing corruption even with the write buffer size limited to 8
bytes, but it's greatly limited. Unfortunately our schedule doesn't allow me
to work on this full-time for the immediate future, so I'm limited to small
chunks of time for now. Let me know if there is anything I can do to
assist/share since it looks like we both are in need of fixing this.

At this point, I believe I have characterized the interrupted erase and
interrupted write patterns that are causing the problems, so the next step I
may take is to add the failure conditions into the NOR MTD device simulator
mtdram and see if I can get the same failures.

Let me know if you have any other ideas of approaches.

Here's the patch to change the maximum write buffer size to 8 bytes (2^3).

 #ifdef DEBUG_CFI
 	/* Dump the information therein */
 	print_cfi_ident(cfi->cfiq);

Best Regards,

Eric Holmberg
Senior Firmware Engineer
Trimble Construction Services
Westminster, Colorado

> -----Original Message-----
> From: Stefan Roese [mailto:sr@denx.de]
> Sent: Friday, May 15, 2009 1:17 AM
> To: Eric Holmberg
> Cc: linux-mtd@lists.infradead.org; dedekind@infradead.org;
> Jamie Lokier; Urs Muff; Adrian Hunter
> Subject: Re: UBIFS Corrupt during power failure
>
> Hi Eric,
>
> On Saturday 18 April 2009 01:49:52 Eric Holmberg wrote:
> > > Yeah, let's wait for Eric's results and then will work on
> > > extending MTD device model with this parameter.
> >
> > As suggested, I patched my 2.6.27 kernel with the latest from
> > http://git.infradead.org/users/dedekind/ubifs-v2.6.27.git (includes all
> > updates up to and including the fix-recovery bug,
> > http://git.infradead.org/users/dedekind/ubifs-v2.6.27.git?a=commit;h=e144c1c037f1c6f7c687de5a2cd375cb40dfe71e).
> >
> > I have the unit running with a maximum write buffer of 8 bytes (the NOR
> > flash chip is capable of 64 bytes).
>
> How exactly did you do this? In cfi_cmdset_0002.c?
>
> > I was seeing 4 different failure scenarios with the base 2.6.27 code,
> > but now I am only seeing one remaining failure after 30+ hours of power
> > cycling. I added a stack dump this afternoon that will let me pinpoint
> > exactly what is happening, but haven't seen the failure, yet.
> >
> > The failure happens when I get two corrupt empty LEB's. I believe the
> > scenario is that an erase is interrupted and on the next boot, while the
> > file system is being recovered, another power failure occurs.
> >
> > I can erase one of the LEB's manually in U-Boot and the file system
> > recovers properly.
> >
> > I'm going to leave the units running over the weekend and see what is
> > waiting for me Monday morning.
>
> Do you have an update for this? What's the current status on your system now?
> Which patches did you apply to work reliably with the Spansion FLASH?
>
> I'm asking since we are seeing a similar issue on one of our boards equipped
> with the S29GL512P.
> This simple script triggers problems upon the next mount:
>
> ---
> mount -t ubifs ubi0:testvolume /mnt
> sync
> reboot -n -f
> ---
>
> The next mount will result most of the time in this:
>
> UBIFS: recovery needed
> UBIFS error (pid 406): ubifs_scan: corrupt empty space at LEB 3:130320
> UBIFS error (pid 406): ubifs_scanned_corruption: corrupted data at LEB 3:130320
> UBIFS error (pid 406): ubifs_scan: LEB 3 scanning failed
> UBIFS error (pid 406): ubifs_recover_leb: corrupt empty space at LEB 3:32
> UBIFS error (pid 406): ubifs_scanned_corruption: corrupted data at LEB 3:32
> UBIFS error (pid 406): ubifs_recover_leb: LEB 3 scanning failed
> mount: Structure needs cleaning
>
> This is without the patch from this thread included (in recovery.c). With this
> patch included the recovery is successful all the time, as far as we can see
> right now. But I'm wondering if we really need to disable the write buffer in
> the CFI driver or reduce the write buffer to 8.
>
> Thanks.
>
> Best regards,
> Stefan
>
> =====================================================================
> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: office@denx.de
> =====================================================================
>

Index: drivers/mtd/chips/cfi_probe.c
===================================================================
--- drivers/mtd/chips/cfi_probe.c	(revision 4477)
+++ drivers/mtd/chips/cfi_probe.c	(working copy)
@@ -18,7 +18,7 @@
 #include
 #include
 
-//#define DEBUG_CFI
+#define DEBUG_CFI
 
 #ifdef DEBUG_CFI
 static void print_cfi_ident(struct cfi_ident *);
@@ -251,6 +251,18 @@
 	cfi->cfiq->InterfaceDesc = le16_to_cpu(cfi->cfiq->InterfaceDesc);
 	cfi->cfiq->MaxBufWriteSize = le16_to_cpu(cfi->cfiq->MaxBufWriteSize);
 
+	//DEBUG - BEGIN - force max write size to 8 bytes (2^3)
+	if (cfi->cfiq->MaxBufWriteSize)
+	{
+		printk("Warning: Overriding MaxBufWriteSize from 2^%d to 2^%d\n",
+				cfi->cfiq->MaxBufWriteSize,
+				3
+				);
+		cfi->cfiq->MaxBufWriteSize = 3;
+	}
+	//DEBUG - END
+
+
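
For reference, the override in the patch works on an exponent, not a byte
count: its printk reports MaxBufWriteSize as 2^%d, and the thread quotes
8 bytes (2^3) against the chip's native 64 bytes (2^6). Below is a minimal
sketch of that arithmetic, not kernel code. The helper names
cfi_forced_exponent() and cfi_wbuf_bytes() are hypothetical, and chip
interleave is assumed to be 1.

```c
#include <stdint.h>

/* Hypothetical helper (not in the kernel): mirrors the patch's guard.
 * A non-zero MaxBufWriteSize is forced to 3; zero is left untouched. */
static uint16_t cfi_forced_exponent(uint16_t max_buf_write_size)
{
	if (max_buf_write_size)
		return 3;
	return max_buf_write_size;
}

/* Hypothetical helper: converts the log2 exponent into a byte count,
 * assuming an interleave of 1. Returns 0 for exponents too large to
 * represent in 32 bits, so the shift is always well defined. */
static uint32_t cfi_wbuf_bytes(uint16_t exponent)
{
	if (exponent >= 32)
		return 0;
	return (uint32_t)1 << exponent;
}
```

With these, cfi_wbuf_bytes(6) gives the 64-byte native buffer and
cfi_wbuf_bytes(cfi_forced_exponent(6)) gives the 8-byte buffer that the
patch forces.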