Message ID | 1326717654.83444.YahooMailNeo@web121506.mail.ne1.yahoo.com |
---|---|
State | New, archived |
Headers | show |
On Mon, 2012-01-16 at 04:40 -0800, Karsten Jeppesen wrote: > Hi Artem, > > OO something bad happened here. We must have lost track of each other. > > What I sent you was ***NOT*** produced by our standard 2.6.32.8 kernel. > I spent quite some efforts in adapting the new 3.2.0 kernel. Well, ok, does not matter anyway. > I wrote that in my original email, but you may have drifted off to the emails from November. > To make life easier for you I assumed you would rather look at data including the newest stuff rather than backported stuff. There are few fixes for 3.2 in ubifs-v3.2 tree. Namely, they make UBI/UBIFS print some debugging message which are not printed otherwise, due to a bug I introduced at 3.0 times. Please, pick them. Please, just substitute s/2.6.32/3.2/ in my reply.
To recap the most important things - but refer to my todays reply anyway. 1. Pick printing fixes from ubifs-v3.2 2. Find out what UBIFS tells about your max_write_size 3. Check your _real_ buffer size from your flash docs and the driver and verify that it is correct. 4. My theory that you should have 256 ore larger NOR flash write buffer size. 5. With the hack I sent I can mount successfully. To verify my theory you can test with this hack and see if you can reproduce the problem. If you can, then either write-buffer is larger than 256 or my theory is incorrect. BTW, here is what I used to test your image in mtdram, just for reference: $ sudo dmesg -c > /dev/null && sudo ./unload_all.sh && sudo modprobe mtdram total_size=29440 && sudo dd if=~/tmp/Karsten/corrupt_mtd4_20120116.img of=/dev/mtd0 && sudo modprobe ubi mtd=0 && sudo modprobe ubifs && sudo sh -c 'echo "format \"UBIFS DBG\" +p" > /sys/kernel/debug/dynamic_debug/control' && sudo mount -t ubifs /dev/ubi0_0 /mnt/ubifs Yeah, I know it is long and ugly, I wrote it on-the-spot to quickly debug your issue. If you need, you can decompose it.
Hi Artem,
Sorry for the delay, but Im down with the flu and thus I have snot for brains.
This took *a lot* of kernel compiles to sample this data.
Err - that doesn't mean I don't appreciate your efforts - I really do.
Artem: Would it help you in any
way if I get a some of these units sent to you? They are like 15 x 15 cm
Single board. I would send it with all needed (battery included :-) )
and never to be returned (You can keep everything).
I
am asking because it seems to me that you have only limited hardware to
test on and maybe thats the problem? The 64/256 seems to be real and a
problem to be taken serious
1. Where are the patches for 3.2? git://git.infradead.org/~dedekind/ubifs-v3.2.git ?? To get the max_write output I changed the dbg_msg to ubi_msg.
2. UBI: max_write_size 64
3. Confirmed 64 from data sheets
4 Theory unfortunately bust.
5. See below
Now for the weird part: Setting the write buffer INCORRECTLY to 256 does mount the system - but is that healthy???: And what are the implications of setting it to a 4 times wrong value?
UBI: attaching mtd4 to ubi0
UBI: max_write_size 256 <<<<< forced via kernel patch
UBI: physical eraseblock size: 131072 bytes (128 KiB)
UBI: logical eraseblock size: 130944 bytes
UBI: smallest flash I/O unit: 1
UBI: VID header offset: 64 (aligned 64)
UBI: data offset: 128
UBI: max. sequence number: 17653
UBI: attached mtd4 to ubi0
UBI: MTD device name: "User"
UBI: MTD device size: 28 MiB
UBI: number of good PEBs: 230
UBI: number of bad PEBs: 0
UBI: number of corrupted PEBs: 0
UBI: max. allowed volumes: 128
UBI: wear-leveling threshold: 4096
UBI: number of internal volumes: 1
UBI: number of user volumes: 1
UBI: available PEBs: 0
UBI: total number of reserved PEBs: 230
UBI: number of PEBs reserved for bad PEB handling: 0
UBI: max/mean erase counter: 31/5
UBI: image sequence number: 1704669600
UBI: background thread "ubi_bgt0d" started, PID 1734
UBI device number 0, total 230 LEBs (30117120 bytes, 28.7 MiB), available 0 LEBs (0 bytes), LEB size 130944 bytes (127.9 KiB)
# mkdir -p /skov/mnt/rootfs
# Starting dropbear
mount -t ubifs ubi0:rootfs /skov/mnt/rootfs
UBIFS: recovery needed
UBIFS: recovery completed
UBIFS: mounted UBI device 0, volume 0, name "rootfs"
UBIFS: file system size: 28414848 bytes (27748 KiB, 27 MiB, 217 LEBs)
UBIFS: journal size: 1440384 bytes (1406 KiB, 1 MiB, 11 LEBs)
UBIFS: media format: w4/r0 (latest is w4/r0)
UBIFS: default compressor: lzo
UBIFS: reserved for root: 1342103 bytes (1310 KiB)
dmesg gives no additional info.
----- Original Message -----
From: Artem Bityutskiy <dedekind1@gmail.com>
To: Karsten Jeppesen <arm9263@yahoo.com>
Cc: ubifs <linux-mtd@lists.infradead.org>
Sent: Monday, January 16, 2012 1:50 PM
Subject: Re: Corrupted UBIFS, bad CRC
To recap the most important things - but refer to my todays reply
anyway.
1. Pick printing fixes from ubifs-v3.2
2. Find out what UBIFS tells about your max_write_size
3. Check your _real_ buffer size from your flash docs and the driver and
verify that it is correct.
4. My theory that you should have 256 ore larger NOR flash write buffer
size.
5. With the hack I sent I can mount successfully. To verify my theory
you can test with this hack and see if you can reproduce the problem. If
you can, then either write-buffer is larger than 256 or my theory is
incorrect.
BTW, here is what I used to test your image in mtdram, just for
reference:
$ sudo dmesg -c > /dev/null && sudo ./unload_all.sh && sudo modprobe mtdram total_size=29440 && sudo dd if=~/tmp/Karsten/corrupt_mtd4_20120116.img of=/dev/mtd0 && sudo modprobe ubi mtd=0 && sudo modprobe ubifs && sudo sh -c 'echo "format \"UBIFS DBG\" +p" > /sys/kernel/debug/dynamic_debug/control' && sudo mount -t ubifs /dev/ubi0_0 /mnt/ubifs
Yeah, I know it is long and ugly, I wrote it on-the-spot to quickly
debug your issue. If you need, you can decompose it.
On Tue, 2012-01-17 at 04:23 -0800, Karsten Jeppesen wrote: > Artem: Would it help you in any > way if I get a some of these units sent to you? They are like 15 x 15 cm > Single board. I would send it with all needed (battery included :-) ) > and never to be returned (You can keep everything). I would be happy to help, but I really have no time to do more than suggesting and giving some advises - my employer and the baby take all my time, sorry. Besides, I am not a flash HW expert, and the issue you observe look like it is very related to your HW and how it behaves when it loses power when a write operation is ongoing. Or may be erase operation, but it looks like that was a write operation. It does not look at all like UBI/UBIFS issue. > 1. Where are the patches for 3.2? git://git.infradead.org/~dedekind/ubifs-v3.2.git Yes. > ?? To get the max_write output I changed the dbg_msg to ubi_msg. > > 2. UBI: max_write_size 64 > 3. Confirmed 64 from data sheets OK. > 4 Theory unfortunately bust. Not necessarily. You need to dig deeper - what if your driver is doing something you are not aware about or the controller? Better to ask the vendor how the flash behaves on a power cut while writing. > 5. See below > > Now for the weird part: Setting the write buffer INCORRECTLY to 256 > does mount the system - but is that healthy???: And what are the > implications of setting it to a 4 times wrong value? You need to really dig deeper into this. Let me elaborate the concept of 'max_write_size', and also you can find it from git log and by googling - we discussed this in the mailing list. So, UBI has a notion of min_io_size - this is minimum amount of bytes you can write. For NAND this is often 2048. For NOR it is 1 byte. NORs have optimization called "write-buffer", which means that NOR can write many bytes at a time. This "write buffer" size is called 'max_write_size' in UBIFS, to be consistent with 'min_io_size', and also because UBIFS has its own write-buffers, so this term has already been occupied when we added 'max_write_size'. Note, we added 'max_write_size' to fix NOR issues after power cuts, I think last year. So what happens when you write data and have a power cut? On the driver level, you write write-buffer after write-buffer - 'max_write_size' bytes at a time. The experiments with NOR showed that the after the power cut the 'max_write_size' area which you have been writing to during the power cut will contain garbage, or unstable bits, or zeroes, or few zeroes, or any other anomaly. When UBIFS recovers after a power cut, in has to scan the journal and find the last node. The last node is the one which follows with all 0xFF bytes. In your case you have one good node from offset 0 to offset 112 (AFAIR), then 32 bytes of 0xFFs, and then 32 bytes of zeroes, and then the rest is all 0xFFs. So my theory is that your write-buffer size is 256, or you have some kind of striping, or something, so that when UBIFS submits a 112 bytes write request, on the driver level a 256-byte write buffer is used, and actually the area (0, 256) is being programmed, but area 113-226 is programmed with all 0xFFs. Or something like that, I do not know NOR well enough. Anyway, what happens is that due to power cut you end up with random corruption within that 256 bytes area, so you end up with those zeroes. UBIFS is aware of this effect and it knows that it should only check for all 0xFFs starting from the next 'max_write_size'-align offset after the last node. But because in your case 'max_write_size' is 64, it hits those zeroes and refuses mounting, because it is unexpected. I do not think there is a big downside of having 'max_write_size' to be 256 from the performance POW at least. The downside is that in case of power cut you may lose a bit more data, because UBIFS has its own write-buffers, but this is really minor. You can also experiment by forcing your flash to not use write-buffer at all to verify if the corruptions are related.
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c index 6c3fb5a..58a49e7 100644 --- a/drivers/mtd/ubi/build.c +++ b/drivers/mtd/ubi/build.c @@ -691,6 +691,8 @@ static int io_init(struct ubi_device *ubi) ubi_assert(ubi->min_io_size % ubi->hdrs_min_io_size == 0); ubi->max_write_size = ubi->mtd->writebufsize; + ubi->max_write_size = 256; + /* * Maximum write size has to be greater or equivalent to min. I/O * size, and be multiple of min. I/O size.