Patchwork UBIFS Corrupt during power failure

Submitter Eric Holmberg
Date May 18, 2009, 5:30 p.m.
Message ID <C77C279BA71FD14985DC8E75FB265AB70344D104@usw-am-xch-02.am.trimblecorp.net>
Permalink /patch/27367/
State New

Comments

Eric Holmberg - May 18, 2009, 5:30 p.m.
Hi Stefan,

I am still seeing corruption even with the write buffer size limited to
8 bytes, but it's greatly limited.  Unfortunately our schedule doesn't
allow me to work on this full-time for the immediate future, so I'm
limited to small chunks of time for now.  Let me know if there is
anything I can do to assist/share since it looks like we both are in
need of fixing this.

At this point, I believe I have characterized the interrupted erase and
interrupted write patterns that are causing the problems, so the next
step I may take is to add the failure conditions into the NOR MTD device
simulator mtdram and see if I can get the same failures.

Let me know if you have any other ideas for approaches.

Here's the patch to change the maximum write buffer size to 8 bytes
(2^3).

 #ifdef DEBUG_CFI
 	/* Dump the information therein */
 	print_cfi_ident(cfi->cfiq);
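
The effect of a smaller write buffer on a single write request can be
sketched roughly as follows. This is an illustration only, not the driver
code: the function name is hypothetical, and it assumes that a buffered
write may not cross a write-buffer boundary.

```python
def split_into_buffer_writes(offset, length, max_buf_write_size_log2):
    """Split one write into chunks that each fit in the chip's write buffer.

    Hypothetical helper; assumes a chunk may not cross a buffer boundary.
    """
    buf_size = 1 << max_buf_write_size_log2  # the size is given as 2^n bytes
    chunks = []
    end = offset + length
    while offset < end:
        # Bytes left before the next buffer-aligned boundary.
        room = buf_size - (offset % buf_size)
        n = min(room, end - offset)
        chunks.append((offset, n))
        offset += n
    return chunks
```

With 2^6 a 64-byte aligned write is a single buffer program; with 2^3 the
same write becomes eight separate 8-byte programs, so an interrupted
program operation can affect far fewer bytes.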

Best Regards,

Eric Holmberg
Senior Firmware Engineer
Trimble Construction Services
Westminster, Colorado

> -----Original Message-----
> From: Stefan Roese [mailto:sr@denx.de] 
> Sent: Friday, May 15, 2009 1:17 AM
> To: Eric Holmberg
> Cc: linux-mtd@lists.infradead.org; dedekind@infradead.org; 
> Jamie Lokier; Urs Muff; Adrian Hunter
> Subject: Re: UBIFS Corrupt during power failure
> 
> Hi Eric,
> 
> On Saturday 18 April 2009 01:49:52 Eric Holmberg wrote:
> > > Yeah, let's wait for Eric's results and then will work on
> > > extending MTD device model with this parameter.
> >
> > As suggested, I patched my 2.6.27 kernel with the latest from
> > http://git.infradead.org/users/dedekind/ubifs-v2.6.27.git (includes all
> > updates up to and including the fix-recovery bug,
> > http://git.infradead.org/users/dedekind/ubifs-v2.6.27.git?a=commit;h=e144c1c037f1c6f7c687de5a2cd375cb40dfe71e).
> >
> > I have the unit running with a maximum write buffer of 8 bytes (the NOR
> > flash chip is capable of 64 bytes).
> 
> How exactly did you do this? In cfi_cmdset_0002.c?
> 
> > I was seeing 4 different failure scenarios with the base 2.6.27 code,
> > but now I am only seeing one remaining failure after 30+ hours of power
> > cycling.  I added a stack dump this afternoon that will let me pinpoint
> > exactly what is happening, but haven't seen the failure, yet.
> >
> > The failure happens when I get two corrupt empty LEB's.  I believe the
> > scenario is that an erase is interrupted and on the next boot, while the
> > file system is being recovered, another power failure occurs.
> >
> > I can erase one of the LEB's manually in U-Boot and the file system
> > recovers properly.
> >
> > I'm going to leave the units running over the weekend and see what is
> > waiting for me Monday morning.
> 
> Do you have an update for this? What's the current status on your system now?
> Which patches did you apply to work reliably with the Spansion FLASH?
> 
> I'm asking since we are seeing a similar issue on one of our boards equipped
> with the S29GL512P. This simple script triggers problems upon the next mount:
> 
> ---
> mount -t ubifs ubi0:testvolume /mnt
> sync
> reboot -n -f
> ---
> 
> The next mount will result most of the time in this:
> 
> UBIFS: recovery needed
> UBIFS error (pid 406): ubifs_scan: corrupt empty space at LEB 3:130320
> UBIFS error (pid 406): ubifs_scanned_corruption: corrupted data at LEB 3:130320
> UBIFS error (pid 406): ubifs_scan: LEB 3 scanning failed
> UBIFS error (pid 406): ubifs_recover_leb: corrupt empty space at LEB 3:32
> UBIFS error (pid 406): ubifs_scanned_corruption: corrupted data at LEB 3:32
> UBIFS error (pid 406): ubifs_recover_leb: LEB 3 scanning failed
> mount: Structure needs cleaning
> 
> This is without the patch from this thread included (in recovery.c). With this
> patch included the recovery is successful all the time, as far as we can see
> right now. But I'm wondering if we really need to disable the write buffer in
> the CFI driver or reduce the write buffer to 8.
> 
> Thanks.
> 
> Best regards,
> Stefan
> 
> =====================================================================
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-0 Fax: +49-8142-66989-80  Email: office@denx.de
> =====================================================================
>
Artem Bityutskiy - May 19, 2009, 8:18 a.m.
On Mon, 2009-05-18 at 11:30 -0600, Eric Holmberg wrote:
> Hi Stefan,
> 
> I am still seeing corruption even with the write buffer size limited to
> 8 bytes, but it's greatly limited.

Do you mean, UBIFS still dies when buf. size is 8?
Eric Holmberg - May 19, 2009, 10:16 p.m.
> On Mon, 2009-05-18 at 11:30 -0600, Eric Holmberg wrote:
> > Hi Stefan,
> > 
> > I am still seeing corruption even with the write buffer size limited to
> > 8 bytes, but it's greatly limited.
> 
> Do you mean, UBIFS still dies when buf. size is 8?
> 
> -- 
> Best regards,
> Artem Bityutskiy (Битюцкий Артём)



Yes, I'm still seeing two failures.  One is where I get 2 corrupt empty blocks when an LEB erase operation is interrupted by a power failure.  Erasing one of them manually in U-Boot allows the system to boot.  I believe this happens when an LEB erase operation is interrupted and then during the deferred recovery, another erase operation is interrupted.  The system never expects to have more than one erase operation interrupted and panics.

The other failure is a corruption issue, even with the write buffer size limited to 8 bytes.  Scroll down to the end of the kernel messages for the failure.

I unfortunately didn't get a chance to get an image of the flash to see what happened to the data block before the board was reprogrammed.  I'm trying to reproduce it so I can get more details on what is happening.

[42949374.110000] physmap platform flash device: 02000000 at 30000000
[42949374.110000] Number of erase regions: 1
[42949374.120000] Warning:  Overriding MaxBufWriteSize from 2^6 to 2^3
[42949374.120000] Primary Vendor Command Set: 0002 (AMD/Fujitsu Standard)
[42949374.130000] Primary Algorithm Table at 0040
[42949374.140000] Alternative Vendor Command Set: 0000 (None)
[42949374.140000] No Alternate Algorithm Table
[42949374.140000] Vcc Minimum:  2.7 V
[42949374.150000] Vcc Maximum:  3.6 V
[42949374.150000] No Vpp line
[42949374.150000] Typical byte/word write timeout: 64 µs
[42949374.160000] Maximum byte/word write timeout: 512 µs
[42949374.160000] Typical full buffer write timeout: 64 µs
[42949374.170000] Maximum full buffer write timeout: 2048 µs
[42949374.180000] Typical block erase timeout: 512 ms
[42949374.180000] Maximum block erase timeout: 4096 ms
[42949374.180000] Typical chip erase timeout: 524288 ms
[42949374.190000] Maximum chip erase timeout: 2097152 ms
[42949374.190000] Device size: 0x2000000 bytes (32 MiB)
[42949374.200000] Flash Device Interface description: 0x0002
[42949374.210000]   - supports x8 and x16 via BYTE# with asynchronous interface
[42949374.210000] Max. bytes in buffer write: 0x8
[42949374.220000] Number of Erase Block Regions: 1
[42949374.220000]   Erase Region #0: BlockSize 0x20000 bytes, 256 blocks
[42949374.230000] physmap-flash.1: Found 1 x16 devices at 0x0 in 16-bit bank
[42949374.230000]  Amd/Fujitsu Extended Query Table at 0x0040
[42949374.240000]   Silicon revision: 10
[42949374.240000]   Address sensitive unlock: Required
[42949374.250000]   Erase Suspend: Read/write
[42949374.250000]   Block protection: 1 sectors per group
[42949374.260000]   Temporary block unprotect: Not supported
[42949374.260000]   Block protect/unprotect scheme: 8
[42949374.270000]   Number of simultaneous operations: 0
[42949374.270000]   Burst mode: Not supported
[42949374.280000]   Page mode: 8 word page
[42949374.280000]   Vpp Supply Minimum Program/Erase Voltage: 11.5 V
[42949374.290000]   Vpp Supply Maximum Program/Erase Voltage: 12.5 V
[42949374.290000]   Top/Bottom Boot Block: Uniform, Bottom WP
[42949374.300000]   Write buffers enabled
[42949374.300000] physmap-flash.1: CFI does not contain boot bank location. Assuming top.
[42949374.310000] number of CFI chips: 1
[42949374.310000] cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.
[42949374.320000] RedBoot partition parsing not available
[42949374.330000] Using physmap partition information
[42949374.330000] Creating 3 MTD partitions on "physmap-flash.1":
[42949374.340000] 0x00000000-0x00200000 : "kernel"
[42949374.350000] 0x00200000-0x00400000 : "kernel-failsafe"
[42949374.360000] 0x00400000-0x02000000 : "root"
[42949374.370000] UBI: attaching mtd7 to ubi0
[42949374.370000] UBI: physical eraseblock size:   131072 bytes (128 KiB)
[42949374.380000] UBI: logical eraseblock size:    130944 bytes
[42949374.380000] UBI: smallest flash I/O unit:    1
[42949374.390000] UBI: VID header offset:          64 (aligned 64)
[42949374.390000] UBI: data offset:                128
[42949375.090000] UBI: attached mtd7 to ubi0
[42949375.090000] UBI: MTD device name:            "root"
[42949375.100000] UBI: MTD device size:            28 MiB
[42949375.110000] UBI: number of good PEBs:        224
[42949375.110000] UBI: number of bad PEBs:         0
[42949375.110000] UBI: max. allowed volumes:       128
[42949375.120000] UBI: wear-leveling threshold:    4096
[42949375.120000] UBI: number of internal volumes: 1
[42949375.130000] UBI: number of user volumes:     1
[42949375.130000] UBI: available PEBs:             0
[42949375.140000] UBI: total number of reserved PEBs: 224
[42949375.140000] UBI: number of PEBs reserved for bad PEB handling: 0
[42949375.150000] UBI: max/mean erase counter: 85/21
...
[42949375.620000] UBIFS: recovery needed
[42949375.630000] UBIFS: recovery needed - but mounted in read-only mode
[42949375.770000] UBIFS error (pid 1): ubifs_check_node: bad CRC: calculated 0xa2ef18b9, read 0x5ebf03c1
[42949375.780000] UBIFS error (pid 1): ubifs_check_node: bad node at LEB 120:0
[42949375.790000] UBIFS error (pid 1): ubifs_scanned_corruption: corrupted data at LEB 120:0
[42949375.810000] UBIFS error (pid 1): ubifs_recover_leb: LEB 120 scanning failed
[42949375.820000] VFS: Cannot open root device "ubi0:rootfs" or unknown-block(0,0)
[42949375.830000] Please append a correct "root=" boot option; here are the available partitions:
[42949375.840000] 1f00         16 mtdblock0 (driver?)
[42949375.840000] 1f01          8 mtdblock1 (driver?)
[42949375.850000] 1f02          8 mtdblock2 (driver?)
[42949375.850000] 1f03         32 mtdblock3 (driver?)
[42949375.860000] 1f04        960 mtdblock4 (driver?)
[42949375.860000] 1f05       2048 mtdblock5 (driver?)
[42949375.870000] 1f06       2048 mtdblock6 (driver?)
[42949375.870000] 1f07      28672 mtdblock7 (driver?)
[42949375.880000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
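
The UBI geometry printed in the log above can be cross-checked with a few
lines of arithmetic. All numbers are copied from the log; the only
assumption is that the logical eraseblock size equals the physical size
minus the data offset, which the logged values bear out.

```python
PEB_SIZE = 131072                 # "physical eraseblock size"
DATA_OFFSET = 128                 # "data offset"
ROOT_START = 0x00400000           # "root" partition boundaries
ROOT_END = 0x02000000

leb_size = PEB_SIZE - DATA_OFFSET           # usable bytes per eraseblock
root_bytes = ROOT_END - ROOT_START          # size of the "root" partition
root_mib = root_bytes // (1024 * 1024)
peb_count = root_bytes // PEB_SIZE

print(leb_size, root_mib, peb_count)
```

The results match the "logical eraseblock size", "MTD device size" and
"number of good PEBs" lines, so the attach itself looks sane and the
failure is confined to the contents of LEB 120.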


Getting the failures to occur using physical hardware takes 7 or 8 hours which is why I would like to modify either the drivers/mtd/devices/block2mtd.c NOR simulator or the RAM simulator and put in the interrupted flash patterns that I've already characterized.  Any ideas on how to simulate a power failure in either module and then do a UBIFS remount?

-Eric
Artem Bityutskiy - May 25, 2009, 8:38 a.m.
[Long lines in your e-mail make it difficult to read]

On Tue, 2009-05-19 at 16:16 -0600, Eric Holmberg wrote:
> Yes, I'm still seeing two failures.  One is where I get 2 corrupt
> empty blocks when an LEB erase operation is interrupted by a power
> failure.

You mean you have 2 LEBs containing corrupted nodes?
Just to make it clear - this is the second problem. The first one
was about the NOR write buffering. And this one is separate, right?

>   Erasing one of them manually in U-Boot allows the system 
> to boot.  I believe this happens when an LEB erase operation is
> interrupted and then during the deferred recovery, another erase
> operation is interrupted.  The system never expects to have more
> than one erase operation interrupted and panics.

Hmm, if this is true, it should not be too difficult to fix this.


> I unfortunately didn't get a chance to get an image of the flash to
> see what happened to the data block before the board was reprogrammed.
> I'm trying to reproduce it so I can get more details on what is happening.

Please, provide all messages. UBIFS prints much more of them when
debugging is enabled. It prints them with KERN_DEBUG level, which
means they do not go to your console by default. You should use
'ignore_loglevel' boot option to make kernel print everything to the
serial console, see here:

http://www.linux-mtd.infradead.org/doc/ubifs.html#L_how_send_bugreport

Please, use that option - it will give us much more information
about the error, including stackdump and node dumps.


> [42949374.300000] physmap-flash.1: CFI does not contain boot bank location. Assuming top.
> [42949374.310000] number of CFI chips: 1
> [42949374.310000] cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.
> [42949374.320000] RedBoot partition parsing not available
> [42949374.330000] Using physmap partition information
> [42949374.330000] Creating 3 MTD partitions on "physmap-flash.1":
> [42949374.340000] 0x00000000-0x00200000 : "kernel"
> [42949374.350000] 0x00200000-0x00400000 : "kernel-failsafe"
> [42949374.360000] 0x00400000-0x02000000 : "root"
> [42949374.370000] UBI: attaching mtd7 to ubi0
> [42949374.370000] UBI: physical eraseblock size:   131072 bytes (128 KiB)
> [42949374.380000] UBI: logical eraseblock size:    130944 bytes
> [42949374.380000] UBI: smallest flash I/O unit:    1
> [42949374.390000] UBI: VID header offset:          64 (aligned 64)
> [42949374.390000] UBI: data offset:                128
> [42949375.090000] UBI: attached mtd7 to ubi0
> [42949375.090000] UBI: MTD device name:            "root"
> [42949375.100000] UBI: MTD device size:            28 MiB
> [42949375.110000] UBI: number of good PEBs:        224
> [42949375.110000] UBI: number of bad PEBs:         0
> [42949375.110000] UBI: max. allowed volumes:       128
> [42949375.120000] UBI: wear-leveling threshold:    4096
> [42949375.120000] UBI: number of internal volumes: 1
> [42949375.130000] UBI: number of user volumes:     1
> [42949375.130000] UBI: available PEBs:             0
> [42949375.140000] UBI: total number of reserved PEBs: 224
> [42949375.140000] UBI: number of PEBs reserved for bad PEB handling: 0
> [42949375.150000] UBI: max/mean erase counter: 85/21
> ...
> [42949375.620000] UBIFS: recovery needed
> [42949375.630000] UBIFS: recovery needed - but mounted in read-only mode
> [42949375.770000] UBIFS error (pid 1): ubifs_check_node: bad CRC: calculated 0xa2ef18b9, read 0x5ebf03c1
> [42949375.780000] UBIFS error (pid 1): ubifs_check_node: bad node at LEB 120:0
> [42949375.790000] UBIFS error (pid 1): ubifs_scanned_corruption: corrupted data at LEB 120:0
> [42949375.810000] UBIFS error (pid 1): ubifs_recover_leb: LEB 120 scanning failed
> [42949375.820000] VFS: Cannot open root device "ubi0:rootfs" or unknown-block(0,0)
> [42949375.830000] Please append a correct "root=" boot option; here are the available partitions:

Presumably what happens is: UBIFS scans LEB 120. It checks the first
node and finds a CRC mismatch. The UBIFS logic is then as follows. If this
corrupted node is the last one, then there was a write interrupt,
which is harmless. But if some other data follows this node,
this is some serious corruption. So the 'is_last_write()' function
is called, which is supposed to check that.

In 'is_last_write()' I see it has different logic depending on whether
c->min_io_size == 1 or not. The former case is NOR case, the latter
is NAND. Well, since I know we never tested UBIFS well for NOR,
I conclude the NOR case may have a bug.
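
The check described above can be illustrated with a simplified sketch.
This is not the kernel's 'is_last_write()'; the function name, the default
min. I/O size of 8 and the rounding rule are assumptions made for
illustration.

```python
def looks_like_interrupted_last_write(leb, corrupt_offs, min_io_size=8):
    """Return True if only erased (0xFF) bytes follow the corrupt node.

    Simplified illustration only, not the kernel implementation.
    """
    # Skip to the next min. I/O unit boundary: an interrupted write may have
    # left garbage anywhere inside the unit it was programming.
    next_unit = -(-(corrupt_offs + 1) // min_io_size) * min_io_size
    return all(b == 0xFF for b in leb[next_unit:])
```

If this returns True, the corruption is consistent with a harmless
interrupted write at the end of the LEB; any byte other than 0xFF further
on means something else damaged the eraseblock.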

I'll look at this function closer a bit later and let you know.
But please, if you reproduce this, do not fix this in u-boot.
We may come up with a patch for you and you would test it.

Thanks.

> Getting the failures to occur using physical hardware takes 7 or 8
> hours which is why I would like to modify either the
> drivers/mtd/devices/block2mtd.c NOR simulator or the RAM simulator
> and put in the interrupted flash patterns that I've already
> characterized.  Any ideas on how to simulate a power failure in
> either module and then do a UBIFS remount?

But testing on real HW is better anyway. You see real issues in
this case.

But we have mtdram. You could simulate various patterns by
creating various images on your host FS. Then you may do:

dd if=my_simulated_file of=/dev/mtd0

Probably it makes sense to create an UBIFS FS first. Then
dump /dev/mtd0 to a file, and start abusing this file differently.
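
One possible way to "abuse" such a dump on the host is sketched below. The
helper name is hypothetical and the 128 KiB eraseblock size is taken from
the log earlier in the thread; which byte ranges and fill values to use
depends on the failure pattern being simulated.

```python
def overwrite_region(image, peb_index, start, end, fill, peb_size=131072):
    """Return a copy of `image` with bytes [start, end) of one PEB set to `fill`."""
    if not (0 <= start <= end <= peb_size):
        raise ValueError("region must lie inside one eraseblock")
    base = peb_index * peb_size
    if base + peb_size > len(image):
        raise ValueError("eraseblock index beyond end of image")
    out = bytearray(image)
    out[base + start:base + end] = bytes([fill]) * (end - start)
    return bytes(out)
```

The dump would be read into memory, passed through the helper, written
back to a file, and that file copied to the mtdram device with dd as shown
above before mounting UBIFS again.
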
Artem Bityutskiy - May 25, 2009, 12:54 p.m.
On Mon, 2009-05-25 at 11:38 +0300, Artem Bityutskiy wrote:
> Presumably what happens is: UBIFS scans LEB 120. It checks the first
> node, and finds CRC mismatch. Then UBIFS logic is as follows. If this
> corrupted node is the last one, then there was a write interrupt,
> which is harmless. But if after this node some other data follows,
> this is some serious corruption. So the 'is_last_write()' function
> is called, it is supposed to check that.
> 
> In 'is_last_write()' I see it has different logic depending on whether
> c->min_io_size == 1 or not. The former case is NOR case, the latter
> is NAND. Well, since I know we never tested UBIFS well for NOR,
> I conclude the NOR case may have a bug.

Oh, this 'c->min_io_size == 1' case is just dead code, we never have
c->min_io_size < 8 in UBIFS. So I just remove that (patch at the end
of the e-mail).

Eric, please, reproduce this problem again. Then please, do not
"fix" it from u-boot. But instead, please do:

1. Enable UBIFS debugging
2. Enable recovery and mount messages, by booting with
   "ubifs.debug_msgs=6144" kernel parameter
3. also add the "ignore_loglevel" boot parameter
4. capture _all_ messages in minicom
5. If possible, make a full dump of your flash to play with it
   later.

Share the messages with us. I hope we can fix these problems.
Just provide us the info.
Artem Bityutskiy - July 3, 2009, 1:26 p.m.
On Tue, 2009-05-19 at 16:16 -0600, Eric Holmberg wrote:
> > On Mon, 2009-05-18 at 11:30 -0600, Eric Holmberg wrote:
> > > Hi Stefan,
> > > 
> > > I am still seeing corruption even with the write buffer 
> > size limited to
> > > 8 bytes, but it's greatly limited.
> > 
> > Do you mean, UBIFS still dies when buf. size is 8?
> > 
> > -- 
> > Best regards,
> > Artem Bityutskiy (Битюцкий Артём)
> 
> 
> Yes, I'm still seeing two failures.  One is where I get 2 corrupt
> empty blocks when an LEB erase operation is interrupted by a power
> failure.  Erasing one of them manually in U-Boot allows the system
> to boot.  I believe this happens when an LEB erase operation is
> interrupted and then during the deferred recovery, another erase
> operation is interrupted.  The system never expects to have more
> than one erase operation interrupted and panics.
> 
> The other failure is a corruption issue, even with the write buffer
> size limited to 8 bytes.  Scroll down to the end of the kernel messages
> for the failure.
> 
> I unfortunately didn't get a chance to get an image of the flash to
> see what happened to the data block before the board was reprogrammed. 
> I'm trying to reproduce it so I can get more details on what is happening.

Stefan sent me a board which should have similar flash: S29GL512N
http://www.spansion.com/datasheets/s29gl-n_00_b8_e.pdf

And the board is Kilauea:
http://www.appliedmicro.com/Embedded/Downloads/download.html?item=537

Here is what MTD thinks about it:

fc000000.nor_flash: Found 1 x16 devices at 0x0 in 16-bit bank
 Amd/Fujitsu Extended Query Table at 0x0040
fc000000.nor_flash: CFI does not contain boot bank location. Assuming top.
number of CFI chips: 1
cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness.

The kernel is 2.6.30.

I've done power cut tests and UBIFS dies pretty quickly and every time
complains that there are unexpected errors in the LEB, something similar
to what you describe.

You discovered 2 problems:
1. Write-buffering, which you disabled by 8-byte limit
2. Unexpected zeroes, which you reported but never had time to work on.

It seems I hit problem 2, but I could not see problem 1. Anyway, I'm
putting problem 1 aside so far.

I've hacked UBI a little, and made it save PEBs (physical erase blocks)
before erasing them. This means, that before erasing a PEB A, I read it,
and save its contents to another PEB B (at the end of the flash). Then I
erase PEB A.

I found that interrupted erases introduce zeroes at the end of the PEB.

What I observe is: UBIFS has LEB (logical erase block) 3 mapped
to PEB 282. UBIFS unmaps LEB 3, which means UBI erases PEB 282.
Before erasing PEB 282 my hack reads it and copies its contents to
PEB 472. The erasure of PEB 282 then starts, but is interrupted by
a power cut.

What I observe then is that PEB 282 contains all zeroes at the end,
but the beginning is intact.

Here is what PEB 472 contains (which means PEB 282 contained this before
the erasure):

offset 0-64       - valid erase counter header
offset 64-128     - valid Volume ID header
offset 128-544    - several small UBIFS reference nodes
offset 544-131072 - 0xFF bytes

After the power cut PEB 282 contains:

offset 0-64         - valid erase counter header
offset 64-128       - valid Volume ID header
offset 128-544      - several small UBIFS reference nodes
offset 544-29584    - 0xFF bytes
offset 29584-131072 - zeroes
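
A comparison like the one above can be automated with a small script. The
sketch below works on two in-memory dumps of equal size; the function names
are hypothetical and nothing in it is specific to UBI.

```python
def first_difference(before, after):
    """Return the first offset where two equally sized dumps differ, or None."""
    if len(before) != len(after):
        raise ValueError("dumps must have the same size")
    for offs, (a, b) in enumerate(zip(before, after)):
        if a != b:
            return offs
    return None


def tail_is_all(dump, offs, value):
    """Return True if every byte from `offs` to the end equals `value`."""
    return all(b == value for b in dump[offs:])
```

For the two layouts listed above, the first difference would be at offset
29584, and everything from there to the end of the interrupted PEB would be
0x00.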

I've also attached 2 files which contain full dump of PEB 282 and PEB
472.

This stuff confuses UBI. When UBI scans the media, it reads the EC
header and the VID header, checks their CRCs, finds them fine, and treats
PEB 282 as mapped to LEB 3. Then UBIFS panics because it sees
LEB 3 containing unexpected zeroes.

This is the first time I work with NOR, and on NAND we have not seen
such an effect. But this looks weird to me.
Artem Bityutskiy - July 3, 2009, 1:29 p.m.
On Fri, 2009-07-03 at 16:26 +0300, Artem Bityutskiy wrote:
> I've also attached 2 files which contain full dump of PEB 282 and PEB
> 472.

Now attached for real.
Urs Muff - July 3, 2009, 1:33 p.m.
Trimble is closed until 7/13 and Eric is already gone.  I'm not a flash
expert at all, and only know things in passing, but from what I have
heard, NOR flash does physical erasing quite differently from NAND flash,
by toggling everything to 00, and then to FF.  So what you are seeing is
entirely explainable by what I understand.

URS C. MUFF
FIRMWARE ENGINEER
CONSTRUCTION SERVICES, TRIMBLE, WESTMINSTER, CO
OFFICE: 720-587-4683

> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind@infradead.org]
> Sent: Friday, July 03, 2009 7:30 AM
> To: Eric Holmberg
> Cc: Stefan Roese; Adrian Hunter; linux-mtd@lists.infradead.org; Urs Muff
> Subject: RE: UBIFS Corrupt during power failure
> 
> On Fri, 2009-07-03 at 16:26 +0300, Artem Bityutskiy wrote:
> > I've also attached 2 files which contain full dump of PEB 282 and PEB
> > 472.
> 
> Now attached for real.
> 
> --
> Best regards,
> Artem Bityutskiy (Битюцкий Артём)
Artem Bityutskiy - July 3, 2009, 2:05 p.m.
On Fri, 2009-07-03 at 07:33 -0600, Urs Muff wrote:
> Trimble is closed until 7/13 and Eric is already gone. 
> I'm not a flash expert at all, and only know things from passing,
> but from what I have heard, NOR flash does physical erasing quite
> different from NAND flash, by toggling everything to 00, and then
> to FF.  So what you are seeing is absolutely explainable with what
> I understand.

Hmm, I'll try to google. But if this is true, I do not understand
how JFFS2 may work on NOR...

I wonder if it is possible to ask NOR to erase from the beginning,
not from the end, which should help.
Urs Muff - July 3, 2009, 2:47 p.m.
I can't find the data-sheet reference to that, but I would not characterize it as 'erasing from the end'.

As far as I understand it is a 2 phase process, and you can lose power at any point so you can see a partial phase I or a partial phase II state.
- phase I: set all data to 00: a partial picture would be to see 00 at the beginning and random at the end of the block
- phase II: set all data to FF: a partial picture would be to see FF at the beginning and 00 at the end of the block
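
The two-phase model described above can be written down as a small
simulation. It only encodes the hypothesis as stated, not confirmed chip
behaviour; the function name is hypothetical, and "random" in phase I is
taken to mean that the not-yet-zeroed bytes still hold their old data.

```python
def interrupted_erase(block, progress):
    """Model a block erase cut off after `progress` of 2 * len(block) byte steps.

    Phase I programs bytes to 0x00 front to back, phase II then erases bytes
    to 0xFF front to back (ordering as in the description above).
    """
    n = len(block)
    if not 0 <= progress <= 2 * n:
        raise ValueError("progress out of range")
    out = bytearray(block)
    if progress <= n:
        # Partial phase I: zeroes at the start, old data at the end.
        out[:progress] = b"\x00" * progress
    else:
        # Partial phase II: 0xFF at the start, zeroes at the end.
        done = progress - n
        out[:] = b"\x00" * n
        out[:done] = b"\xff" * done
    return bytes(out)
```

Under this model the start of the block is always the first part to change,
so intact data at the start combined with zeroes at the end would not be
produced by either phase.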

From looking at your dumps, it appears that you got interrupted during phase II.  I'm not sure why you still have the header, but that might be b/c the block has been reclaimed / reinitialized (I'm not familiar with the actual code, but it appears that way from the dump).

URS C. MUFF
FIRMWARE ENGINEER
CONSTRUCTION SERVICES, TRIMBLE, WESTMINSTER, CO
OFFICE: 720-587-4683


> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind@infradead.org]
> Sent: Friday, July 03, 2009 8:06 AM
> To: Urs Muff
> Cc: Eric Holmberg; Stefan Roese; Adrian Hunter; linux-mtd@lists.infradead.org
> Subject: RE: UBIFS Corrupt during power failure
> 
> On Fri, 2009-07-03 at 07:33 -0600, Urs Muff wrote:
> > Trimble is closed until 7/13 and Eric is already gone.
> > I'm not a flash expert at all, and only know things from passing,
> > but from what I have heard, NOR flash does physical erasing quite
> > different from NAND flash, by toggling everything to 00, and then
> > to FF.  So what you are seeing is absolutely explainable with what
> > I understand.
> 
> Hmm, I'll try to google. But if this is true, I do not understand
> how JFFS2 may work on NOR...
> 
> I wonder if it is possible to ask NOR to erase from the beginning,
> not from the end, which should help.
> 
> --
> Best regards,
> Artem Bityutskiy (Битюцкий Артём)
Artem Bityutskiy - July 3, 2009, 2:58 p.m.
On Fri, 2009-07-03 at 08:47 -0600, Urs Muff wrote:
> I can't find the data-sheet reference to that, but I would not characterize it as 'erasing from the end'.
> 
> As far as I understand it is a 2 phase process, and you can lose power at any point so you can see a partial phase I or a partial phase II state.
> - phase I: set all data to 00: a partial picture would be to see 00 at the beginning and random at the end of the block
> - phase II: set all data to FF: a partial picture would be to see FF at the beginning and 00 at the end of the block

That would be nice, because then UBI would work fine, since it would
notice corrupted EC header at the beginning, and would drop the PEB.

> From looking at your dumps, it appears that you got interrupted during
> phase II.  I'm not sure why you still have the header, but that might
> be b/c the block has been reclaimed / reinitialized (I'm not familiar
> with the actual code, but it appears that way from the dump).

No, I believe it was not reinitialized.

But I should write a small test program which writes a known pattern to
PEBs, then erases them, and sees what happened to them after an unclean
reboot.
Artem Bityutskiy - July 6, 2009, 4:30 a.m.
On Fri, 2009-07-03 at 17:58 +0300, Artem Bityutskiy wrote:
> On Fri, 2009-07-03 at 08:47 -0600, Urs Muff wrote:
> > I can't find the data-sheet reference to that, but I would not characterize it as 'erasing from the end'.
> > 
> > As far as I understand it is a 2 phase process, and you can lose power at any point so you can see a partial phase I or a partial phase II state.
> > - phase I: set all data to 00: a partial picture would be to see 00 at the beginning and random at the end of the block
> > - phase II: set all data to FF: a partial picture would be to see FF at the beginning and 00 at the end of the block
> 
> That would be nice, because then UBI would work fine, since it would
> notice corrupted EC header at the beginning, and would drop the PEB.
> 
> > From looking at your dumps, it appears that you got interrupted during
> > phase II.  I'm not sure why you still have the header, but that might
> > be b/c the block has been reclaimed / reinitialized (I'm not familiar
> > with the actual code, but it appears that way from the dump).
> 
> No, I believe it was not reinitialized.
> 
> But I should write a small test program which writes a known pattern to
> PEBs, then erases them, and sees what happened to them after an unclean
> reboot.

[CCed Nicolas Pitre]

OK, I've written a small user-space program which first fills the NOR
flash with an '0x89ABCDEF' pattern, then starts erasing it, and then
I cut the power at random point.

And unfortunately the power cut results in eraseblocks which have
0x89ABCDEF at the beginning, and all zeroes at the end. I've attached
one example.

So it indeed looks like NOR erasure includes writing zeroes from the
end. Unfortunately UBI/UBIFS cannot handle this correctly ATM.
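
The kind of check such a test program can run on each eraseblock after the
unclean reboot is sketched below. This is not the program mentioned above:
the names are hypothetical and the byte order of the pattern on flash is an
assumption.

```python
import struct

PATTERN = struct.pack(">I", 0x89ABCDEF)  # byte order is an assumption


def make_pattern_block(size):
    """Return `size` bytes filled with the repeating 4-byte pattern."""
    return PATTERN * (size // 4)


def classify(dump):
    """Return (bytes of intact pattern at the start, offset where an all-zero tail begins)."""
    intact = 0
    while dump[intact:intact + 4] == PATTERN:
        intact += 4
    zero_start = len(dump)
    while zero_start > 0 and dump[zero_start - 1] == 0:
        zero_start -= 1
    return intact, zero_start
```

An untouched block gives (size, size), a fully erased one gives (0, size),
and a block with the pattern at the beginning and zeroes at the end gives
the same offset twice.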

I wonder if it is possible to ask NOR to fill eraseblocks with
zeroes from the beginning rather than from the end?

Nicolas, do you have a suggestion? The NOR flash is Spansion S29GL512N:
http://www.spansion.com/datasheets/s29gl-n_00_b8_e.pdf
Artem Bityutskiy - July 6, 2009, 4:51 a.m.
On Mon, 2009-07-06 at 07:30 +0300, Artem Bityutskiy wrote:
> On Fri, 2009-07-03 at 17:58 +0300, Artem Bityutskiy wrote:
> > On Fri, 2009-07-03 at 08:47 -0600, Urs Muff wrote:
> > > I can't find the data-sheet reference to that, but I would not characterize it as 'erasing from the end'.
> > > 
> > > As far as I understand it is a 2 phase process, and you can lose power at any point so you can see a partial phase I or a partial phase II state.
> > > - phase I: set all data to 00: a partial picture would be to see 00 at the beginning and random at the end of the block
> > > - phase II: set all data to FF: a partial picture would be to see FF at the beginning and 00 at the end of the block
> > 
> > That would be nice, because then UBI would work fine, since it would
> > notice corrupted EC header at the beginning, and would drop the PEB.
> > 
> > > From looking at your dumps, it appears that you got interrupted during
> > > phase II.  I'm not sure why you still have the header, but that might
> > > be b/c the block has been reclaimed / reinitialized (I'm not familiar
> > > with the actual code, but it appears that way from the dump).
> > 
> > No, I believe it was not reinitialized.
> > 
> > But I should write a small test program which writes a known pattern to
> > PEBs, then erases them, and sees what happened to them after an unclean
> > reboot.
> 
> [CCed Nicolas Pitre]
> 
> OK, I've written a small user-space program which first fills the NOR
> flash with an '0x89ABCDEF' pattern, then starts erasing it, and then
> I cut the power at random point.
> 
> And unfortunately the power cut results in eraseblocks which have
> 0x89ABCDEF at the beginning, and all zeroes at the end. I've attached
> one example.
> 
> So it indeed looks like NOR erasure includes writing zeroes from the
> end. Unfortunately UBI/UBIFS cannot handle this correctly ATM.

Although I can easily fix this by writing a few zeroes at the beginning of
the eraseblock _before_ erasing it, so that UBI will be happy. But it is
still interesting whether I may just ask NOR to amend its embedded
erase algorithm.
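
The idea can be modelled on a byte array to see why it helps. This is a
sketch under stated assumptions, not UBI code: the names are hypothetical,
the 64-byte header length follows the EC header range given earlier in the
thread, and an interrupted erase is modelled as leaving the start untouched
and zeroes from some offset to the end, as observed on this chip.

```python
def nor_program(block, offs, data):
    """Program bytes on NOR: bits can only go from 1 to 0, so AND into place."""
    for i, b in enumerate(data):
        block[offs + i] &= b


def erase(block, hdr_len, interrupted_at=None, invalidate_first=True):
    """Erase a bytearray in place, optionally zeroing the header first.

    `interrupted_at` models a power cut during the erase (see lead-in).
    """
    if invalidate_first:
        nor_program(block, 0, b"\x00" * hdr_len)
    if interrupted_at is not None:
        block[interrupted_at:] = b"\x00" * (len(block) - interrupted_at)
        return
    block[:] = b"\xff" * len(block)
```

Without the extra step an interrupted erase leaves the old header readable;
with it the header is already all zeroes, so a scan would no longer see a
header that looks valid.
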
Jamie Lokier - July 15, 2009, 8:55 p.m.
Artem Bityutskiy wrote:
> So it indeed looks like NOR erasure includes writing zeroes from the
> end. Unfortunately UBI/UBIFS cannot handle this correctly ATM.

For that chip.  I wouldn't like to assume all NOR chips use the same
erase algorithm.

Also, remember that little problem with the 8-byte write buffer?

I guess it's possible that its pre-erase-to-zero step might write
zeros in 8-byte blocks too, or in some other size depending on how the
hardware works.  And when it erases bytes in parallel, there's no
guarantee about the order you'll see the bits change if it's
interrupted by a power cycle.

So I guess the right thing is to assume nothing, just that the whole
block may have bits flipped from 1 to 0 in an indeterminate order, and
then all bits flipped from 0 to 1 in an indeterminate order.

Or maybe the weaker assumption, that the whole block is indeterminate
during erase.

-- Jamie
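
The two-phase model above can be made concrete with a small check. This
is a sketch under that model only, and reachable_in_phase1 is a
hypothetical name: during the pre-erase-to-zero phase bits may only go
from 1 to 0, in any order, so a partially processed region must be a
bitwise subset of what was there before.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Returns 1 if "now" could be what is left of "before" after an
 * interrupted phase 1 (bits flipped 1 -> 0 in indeterminate order),
 * i.e. no bit is set in "now" that was clear in "before".
 */
static int reachable_in_phase1(const uint8_t *before, const uint8_t *now,
			       size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (now[i] & ~before[i])
			return 0;
	return 1;
}
```

No such restriction exists for phase 2: starting from all zeroes and
flipping bits from 0 to 1 in indeterminate order can pass through any
bit pattern, which is why an interrupted erase could in principle leave
a valid-looking value behind.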
Eric Holmberg - July 15, 2009, 9:36 p.m.
> Artem Bityutskiy wrote:
> > So it indeed looks like NOR erasure includes writing zeroes from the
> > end. Unfortunately UBI/UBIFS cannot handle this correctly ATM.
> 
> For that chip.  I wouldn't like to assume all NOR chips use the same
> erase algorithm.
> 
> Also, remember that little problem with the 8-byte write buffer?

Yes, the configurable buffer size is still a to-do item.

> 
> I guess it's possible that it's pre-erase-to-zero step might write
> zeros in 8-byte blocks too, or in some other size depending on how the
> hardware works.  And when it erases bytes in parallel, there's no
> guarantee about the order you'll see the bits change if it's
> interrupted by a power cycle.
> 
> So I guess the right thing is to assume nothing, just that the whole
> block may have bits flipped from 1 to 0 in an indeterminate order, and
> then all bits flipped from 0 to 1 in an indeterminate order.
> 
> Or maybe the weaker assumption, that the whole block is indeterminate
> during erase.

From the beginning of the erase to the end is definitely an
indeterminate state for the entire PEB.  Writing all zeros to the
header as in Artem's fix should work in all cases except two extremely
rare ones: a write of 0's is interrupted and leaves the header with a
valid value, or an erase (0-to-1) transition is interrupted and results
in a valid header.  The odds against that are huge, so I would expect
the flash to wear out before it ever happens in real life.

-Eric
Jamie Lokier - July 15, 2009, 10:09 p.m.
Eric Holmberg wrote:
> > So I guess the right thing is to assume nothing, just that the whole
> > block may have bits flipped from 1 to 0 in an indeterminate order, and
> > then all bits flipped from 0 to 1 in an indeterminate order.
> > 
> > Or maybe the weaker assumption, that the whole block is indeterminate
> > during erase.
> 
> From the beginning of the erase to the end is definitely an
> indeterminate state for the entire PEB.  Writing all zero's to the
> header as in Artem's fix should work in all cases excluding the
> extremely rare cases where a write of 0's is interrupted and the header
> has been changed to a valid value and in the case where an erase
> (0-to-1) transition is interrupted which results in a valid header.  The
> odds against that are huge, so I would expect the flash to wear out
> before it ever happens in real life.

I agree, with a nice strong checksum that should be rare.  With 100
million devices and the full lifetime of each device, though, I don't
know if such failures are so rare with the checksum actually used that
they'll never happen, or if it matters.

Anyway, the checksums have to be strong for other reasons.

It could be made virtually impossible by writing to a record on a
different PEB which says which PEB is undergoing erase and therefore
indeterminate.  Is that required for NAND in principle, since you
can't overwrite the header to zero it?

If there are NANDs which would require that, it could be a generic
part of UBI/UBIFS and strengthen the behaviour on NOR slightly,
otherwise I'm sure the header-zeroing is enough for NOR.

-- Jamie
Artem Bityutskiy - July 16, 2009, 7:14 a.m.
On Wed, 2009-07-15 at 21:55 +0100, Jamie Lokier wrote:
> So I guess the right thing is to assume nothing, just that the whole
> block may have bits flipped from 1 to 0 in an indeterminate order, and
> then all bits flipped from 0 to 1 in an indeterminate order.

Yes, agree. This should be fine if we have invalidated the magic
numbers in the headers.

> Or maybe the weaker assumption, that the whole block is indeterminate
> during erase.

If we assume this, then we have to introduce a kind of "journal", where
we write "erase start"/"erase end" markers. This is doable, but I
wouldn't go for this unless there is a real case.
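
For illustration only, the "erase start"/"erase end" marker idea could
look roughly like the sketch below. Nothing like this exists in UBI
according to this thread; struct erase_rec, the marker names and
erase_was_interrupted are all hypothetical, and the log is modelled as a
plain in-memory array rather than records on a separate PEB.

```c
#include <stdint.h>
#include <stddef.h>

enum erase_mark { ERASE_START, ERASE_END };

/* One hypothetical journal record: which PEB, and which marker. */
struct erase_rec {
	uint32_t pnum;
	enum erase_mark mark;
};

/*
 * Returns 1 if the last record for pnum is an "erase start" with no
 * matching "erase end", meaning the PEB contents are indeterminate.
 */
static int erase_was_interrupted(const struct erase_rec *log, size_t n,
				 uint32_t pnum)
{
	int open = 0;

	for (size_t i = 0; i < n; i++) {
		if (log[i].pnum != pnum)
			continue;
		open = (log[i].mark == ERASE_START);
	}
	return open;
}
```

On attach, any PEB reported as interrupted would simply be erased
again, regardless of what its headers look like.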
Artem Bityutskiy - July 16, 2009, 7:16 a.m.
On Wed, 2009-07-15 at 15:36 -0600, Eric Holmberg wrote:
> indeterminate state for the entire PEB.  Writing all zero's to the
> header as in Artem's fix should work in all cases excluding the
> extremely rare cases where a write of 0's is interrupted and the header
> has been changed to a valid value and in the case where an erase
> (0-to-1) transition is interrupted which results in a valid header.  The
> odds against that are huge, so I would expect the flash to wear out
> before it ever happens in real life.

Hmm, we can zero out both headers completely by writing 128 bytes,
even.
Artem Bityutskiy - July 16, 2009, 7:22 a.m.
On Wed, 2009-07-15 at 23:09 +0100, Jamie Lokier wrote:
> Eric Holmberg wrote:
> > > So I guess the right thing is to assume nothing, just that the whole
> > > block may have bits flipped from 1 to 0 in an indeterminate order, and
> > > then all bits flipped from 0 to 1 in an indeterminate order.
> > > 
> > > Or maybe the weaker assumption, that the whole block is indeterminate
> > > during erase.
> > 
> > From the beginning of the erase to the end is definitely an
> > indeterminate state for the entire PEB.  Writing all zero's to the
> > header as in Artem's fix should work in all cases excluding the
> > extremely rare cases where a write of 0's is interrupted and the header
> > has been changed to a valid value and in the case where an erase
> > (0-to-1) transition is interrupted which results in a valid header.  The
> > odds against that are huge, so I would expect the flash to wear out
> > before it ever happens in real life.
> 
> I agree, with a nice strong checksum that should be rare.  With 100
> millions of devices and full lifetime of each device, I don't know if
> they are so rare with the checksum actually used that they'll never
> happen though, or if it matters.

Well, I invalidate the 32-bit magic words of the EC and VID headers, so
this is not even about the checksum. Unless these words somehow
resurrect from all-zero to a valid number, we are safe.

The magic numbers are the first 32-bit words of both headers:

/* Erase counter header magic number (ASCII "UBI#") */
#define UBI_EC_HDR_MAGIC  0x55424923
/* Volume identifier header magic number (ASCII "UBI!") */
#define UBI_VID_HDR_MAGIC 0x55424921

> It could be made virtually impossible by writing to a record on a
> different PEB which says which PEB is undergoing erase and therefore
> indeterminate.  Is that required for NAND in principle, since you
> can't overwrite the header to zero it?

For MLC, yes. In case of SLC we have free OOB bytes.

> If there are NANDs which would require that, it could be a generic
> part of UBI/UBIFS and strengthen the behaviour on NOR slightly,
> otherwise I'm sure the header-zeroing is enough for NOR.

Let's wait and see if someone comes up with such a requirement. Anyway,
the user base of UBIFS is small, and it is not clear if it will grow
in future, because the industry is moving away from raw NANDs.
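
As a minimal sketch of why zeroing the magic words is enough, the check
below reads the first 32-bit word of a header and compares it with the
magic numbers quoted above. UBI stores header fields big-endian on
flash; the helper names read_be32 and has_magic are assumptions made for
this example and are not the UBI functions.

```c
#include <stdint.h>

/* Erase counter header magic number (ASCII "UBI#") */
#define UBI_EC_HDR_MAGIC  0x55424923
/* Volume identifier header magic number (ASCII "UBI!") */
#define UBI_VID_HDR_MAGIC 0x55424921

/* Read a big-endian 32-bit word from raw flash bytes. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 1 if the header starting at hdr begins with the given magic. */
static int has_magic(const uint8_t *hdr, uint32_t magic)
{
	return read_be32(hdr) == magic;
}
```

A header whose first word has been programmed to zero fails this check
before any CRC is looked at, and since NOR programming can only clear
bits, only an erase could set those bits again.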
Gilles Casse - July 16, 2009, 8:54 p.m.
Artem Bityutskiy wrote:
> On Wed, 2009-07-15 at 15:36 -0600, Eric Holmberg wrote:
>   
>> indeterminate state for the entire PEB.  Writing all zero's to the
>> header as in Artem's fix should work in all cases excluding the
>> extremely rare cases where a write of 0's is interrupted and the header
>> has been changed to a valid value and in the case where an erase
>> (0-to-1) transition is interrupted which results in a valid header.  The
>> odds against that are huge, so I would expect the flash to wear out
>> before it ever happens in real life.
>>     
>
> Hmm, we can zero out both headers completely by writing 128 bytes,
> even.
>
>   
According to a fellow electronics engineer, Marc (off-list), it would
not be safe to force to 0 a bit that is already 0 in flash.
To zero a byte, he recommends writing its complement (e.g. if 0x85 is
read, then write 0x7A).

Best regards,
Gilles
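
The arithmetic behind the complement suggestion can be sketched as
follows, assuming the usual NOR model in which programming can only
clear bits (the cell ends up as old AND written). The helper names are
hypothetical, and whether re-programming an already-0 bit is actually
harmful is chip-specific and not settled in this thread.

```c
#include <stdint.h>

/* Model of programming one NOR byte: bits can only go 1 -> 0. */
static uint8_t nor_program_byte(uint8_t cell, uint8_t value)
{
	return cell & value;
}

/*
 * Value to write so that only the bits currently at 1 are programmed:
 * a 0 in every position that reads 1, a 1 in every position that
 * already reads 0.
 */
static uint8_t zeroing_value(uint8_t current)
{
	return (uint8_t)~current;
}
```

For the byte in the example, 0x85 is 1000 0101 in binary, its
complement is 0111 1010 (0x7A), and 0x85 AND 0x7A is 0x00, so the byte
ends up zero without a 0 ever being written over a bit that is already
0.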
Carl-Daniel Hailfinger - July 17, 2009, 12:29 a.m.
On 16.07.2009 22:54, Gilles Casse wrote:
> Artem Bityutskiy wrote:
>   
>> On Wed, 2009-07-15 at 15:36 -0600, Eric Holmberg wrote:
>>   
>>     
>>> Writing all zero's to the
>>> header as in Artem's fix should work in all cases excluding the
>>> extremely rare cases where a write of 0's is interrupted and the header
>>> has been changed to a valid value and in the case where an erase
>>> (0-to-1) transition is interrupted which results in a valid header.  The
>>> odds against that are huge, so I would expect the flash to wear out
>>> before it ever happens in real life.
>>>     
>>>       
>> Hmm, we can zero out both headers completely by writing 128 bytes,
>> even.
>>   
>>     
> According to a fellow electronician, Marc, offlist, it would not be safe
> to force at 0 a bit already at 0 in flash.
> For zeroing a byte, he recommends to write its complementary value (e.g.
> if 0x85 is read then write 0x7A).
>   

I've seen flash where the data sheet mentions implicit erase for each
byte write, so writing a complementary value there might not set all
bits to 0. That might have been NOR flash, though.

If you have a data sheet or similar publication where writing the
complementary value is recommended or mentioned, I'd appreciate a
pointer to it. It does sound logical, but sometimes hardware is a bit odd.


Regards,
Carl-Daniel
Jamie Lokier - July 24, 2009, 2:08 p.m.
Carl-Daniel Hailfinger wrote:
> > According to a fellow electronician, Marc, offlist, it would not be safe
> > to force at 0 a bit already at 0 in flash.
> > For zeroing a byte, he recommends to write its complementary value (e.g.
> > if 0x85 is read then write 0x7A).
> 
> I've seen flash where the data sheet mentions implicit erase for each
> byte write, so writing a complementary value there might not set all
> bits to 0. That might have been NOR flash, though.

Those little serial flashes where you can write bytes individually do
that, of course.

I thought it was a standard, well-known feature of NOR-type flashes
that you could overwrite bytes to zero more of the bits, but I've not
read a standard which says so.

> If you have a data sheet or similar publication where writing the
> complementary value is recommended or mentioned, I'd appreciate a
> pointer to it. It does sound logical, but sometimes hardware is a bit odd.

I haven't heard of the complementary value thing before, but I agree it
sounds logical.

-- Jamie

Patch

Index: drivers/mtd/chips/cfi_probe.c
===================================================================
--- drivers/mtd/chips/cfi_probe.c	(revision 4477)
+++ drivers/mtd/chips/cfi_probe.c	(working copy)
@@ -18,7 +18,7 @@ 
 #include <linux/mtd/cfi.h>
 #include <linux/mtd/gen_probe.h>
 
-//#define DEBUG_CFI
+#define DEBUG_CFI
 
 #ifdef DEBUG_CFI
 static void print_cfi_ident(struct cfi_ident *);
@@ -251,6 +251,18 @@ 
 	cfi->cfiq->InterfaceDesc = le16_to_cpu(cfi->cfiq->InterfaceDesc);
 	cfi->cfiq->MaxBufWriteSize = le16_to_cpu(cfi->cfiq->MaxBufWriteSize);
 
+	//DEBUG - BEGIN - force max write size to 8 bytes (2^3)
+	if (cfi->cfiq->MaxBufWriteSize)
+	{
+		printk("Warning:  Overriding MaxBufWriteSize from 2^%d to 2^%d\n",
+				cfi->cfiq->MaxBufWriteSize,
+				3
+				);
+		cfi->cfiq->MaxBufWriteSize = 3;
+	}
+	//DEBUG - END
+
+