diff mbox

MTD : Kernel oops when remounting ubifs as read/write

Message ID 1363252425.11441.94.camel@sauron.fi.intel.com
State New, archived
Headers show

Commit Message

Artem Bityutskiy March 14, 2013, 9:13 a.m. UTC
On Wed, 2013-03-13 at 11:12 +0000, Mark Jackson wrote:
> Sorry ... this just locks up the unit.

OK, I've reproduced the issue with 3.9-rc2 in nandsim, see the details
below. The patch I proposed did not get the error path correctly, but it
does fix the issue.

I think what you treat as "lockup" is the fixup process. UBIFS basically
reads the entire UBI volume and writes it back. And it uses the atomic
change UBI service, which means it also calculates CRC of everything it
writes. And this all just takes a lot of time. This has to be done only
once on the first mount.

I've attached the following:

1. The patch which fixes the issue when I use nandsim. It is also
   inlined in the end. Please, give it a try and be more patient -
   wait longer. Please, do report your results back.
2. 'reproduce.sh' - a quick and dirty shell script which reproduces the
    problem
3. ubinize.cfg - is needed for 'reproduce.sh'.


Thanks!

From a173f8e9296562e5ece3dd2936d799001897d6c6 Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Date: Thu, 14 Mar 2013 10:49:23 +0200
Subject: [PATCH] UBIFS: make space fixup work in the remount case

The UBIFS space fixup is a useful feature which allows to fixup the "broken"
flash space at the time of the first mount. The "broken" space is usually the
result of using a "dumb" industrial flasher which is not able to skip empty
NAND pages and just writes all 0xFFs to the empty space, which has grave
side-effects for UBIFS when UBIFS trise to write useful data to those empty
pages.

The fix-up feature works roughly like this:
1. mkfs.ubifs sets the fixup flag in UBIFS superblock when creating the image
   (see -F option)
2. when the file-system is mounted for the first time, UBIFS notices the fixup
   flag and re-writes the entire media atomically, which may take really a lot
   of time.
3. UBIFS clears the fixup flag in the superblock.

This works fine when the file system is mounted R/W for the very first time.
But it did not really work in the case when we first mount the file-system R/O,
and then re-mount R/W. The reason was that we started the fixup procedure too
late, which we cannot really do because we have to fixup the space before it
starts being used.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reported-by: Mark Jackson <mpfj-list@mimc.co.uk>
Cc: stable@vger.kernel.org # 3.0+
---
 fs/ubifs/super.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

Comments

Mark Jackson March 14, 2013, 9:54 a.m. UTC | #1
On 14/03/13 09:13, Artem Bityutskiy wrote:
> On Wed, 2013-03-13 at 11:12 +0000, Mark Jackson wrote:
>> Sorry ... this just locks up the unit.
> 
> OK, I've reproduced the issue with 3.9-rc2 in nandsim, see the details
> below. The patch I proposed did not get the error path correctly, but it
> does fix the issue.
> 
> I think what you treat as "lockup" is the fixup process. UBIFS basically
> reads the entire UBI volume and writes it back. And it uses the atomic
> change UBI service, which means it also calculates CRC of everything it
> writes. And this all just takes a lot of time. This has to be done only
> once on the first mount.

Okay ... I've retried, but how long is "a lot of time" ?

I've waited 15 minutes and still nothing.

And I can see that there's no activity on the NAND chip select !?!

I'll put some debug info into the fixup routines to see if I can trace what's going on.

Regards
Mark J.
Artem Bityutskiy March 14, 2013, 10:30 a.m. UTC | #2
On Thu, 2013-03-14 at 09:54 +0000, Mark Jackson wrote:
> On 14/03/13 09:13, Artem Bityutskiy wrote:
> > On Wed, 2013-03-13 at 11:12 +0000, Mark Jackson wrote:
> >> Sorry ... this just locks up the unit.
> > 
> > OK, I've reproduced the issue with 3.9-rc2 in nandsim, see the details
> > below. The patch I proposed did not get the error path correctly, but it
> > does fix the issue.
> > 
> > I think what you treat as "lockup" is the fixup process. UBIFS basically
> > reads the entire UBI volume and writes it back. And it uses the atomic
> > change UBI service, which means it also calculates CRC of everything it
> > writes. And this all just takes a lot of time. This has to be done only
> > once on the first mount.
> 
> Okay ... I've retried, but how long is "a lot of time" ?
> 
> I've waited 15 minutes and still nothing.
> 
> And I can see that there's no activity on the NAND chip select !?!
> 
> I'll put some debug info into the fixup routines to see if I can trace what's going on.

Just to make sure - try this last patch I sent. I did test it with
nandsim at least, and I am sure it works. I did not test at all the
first one.

And yes, debug messages would be useful, just do not forget to add the
'ignore_loglevel' kernel boot option, otherwise you won't see the
messages on your console, since they are of KERN_DEBUG level.

You may have other issues which cause lockup, e.g., in driver level. It
makes sense to validate your flash with MTD test modules.
Mark Jackson March 14, 2013, 11:15 a.m. UTC | #3
On 14/03/13 10:30, Artem Bityutskiy wrote:
> On Thu, 2013-03-14 at 09:54 +0000, Mark Jackson wrote:
>> On 14/03/13 09:13, Artem Bityutskiy wrote:
>>> On Wed, 2013-03-13 at 11:12 +0000, Mark Jackson wrote:
>>>> Sorry ... this just locks up the unit.
>>>
>>> OK, I've reproduced the issue with 3.9-rc2 in nandsim, see the details
>>> below. The patch I proposed did not get the error path correctly, but it
>>> does fix the issue.
>>>
>>> I think what you treat as "lockup" is the fixup process. UBIFS basically
>>> reads the entire UBI volume and writes it back. And it uses the atomic
>>> change UBI service, which means it also calculates CRC of everything it
>>> writes. And this all just takes a lot of time. This has to be done only
>>> once on the first mount.
>>
>> Okay ... I've retried, but how long is "a lot of time" ?
>>
>> I've waited 15 minutes and still nothing.
>>
>> And I can see that there's no activity on the NAND chip select !?!
>>
>> I'll put some debug info into the fixup routines to see if I can trace what's going on.
> 
> Just to make sure - try this last patch I sent. I did test it with
> nandsim at least, and I am sure it works. I did not test at all the
> first one.
> 
> And yes, debug messages would be useful, just do not forget to add the
> 'ignore_loglevel' kernel boot option, otherwise you won't see the
> messages on your console, since they are of KERN_DEBUG level.
> 
> You may have other issues which cause lockup, e.g., in driver level. It
> makes sense to validate your flash with MTD test modules.

My first initial findings ...

I added some simple printk(KERN_INFO) lines to fixup_free_space(), and when
I remount, it gets as far as:-

printk(KERN_INFO "ffs 7\n");
	/* Fixup LEBs in the main area */
	for (lnum = c->main_first; lnum < c->leb_cnt; lnum++) {
		lprops = ubifs_lpt_lookup(c, lnum);
		if (IS_ERR(lprops)) {
			err = PTR_ERR(lprops);
			goto out;
		}

		if (lprops->free > 0) {
			err = fixup_leb(c, lnum, c->leb_size - lprops->free);
			if (err)
				goto out;
		}
	}

out:
printk(KERN_INFO "ffs x\n");
	ubifs_release_lprops(c);
	return err;
}

... before we get an oops with crc32_le().  See the full log below.

I'll keep digging ...

Regards
Mark J.
---
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 3.8.0-next-20130225-00002-g678576f-dirty (mpfj@mpfj-nanobone) (gcc version 4.5.4 (Buildroot 2012.11) ) #40 Thu Mar 14 10:58:28 GMT 2013
[    0.000000] CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c53c7d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] Machine: Generic AM33XX (Flattened Device Tree), model: TI AM335x BeagleBone
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] On node 0 totalpages: 65280
[    0.000000] free_area_init_node: node 0, pgdat c059c7b0, node_mem_map c0ac6000
[    0.000000]   Normal zone: 512 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 65280 pages, LIFO batch:15
[    0.000000] CPU: All CPU(s) started in SVC mode.
[    0.000000] AM335X ES1.0 (neon )
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 64768
[    0.000000] Kernel command line: console=ttyO0,115200n8 noinitrd ip=off mem=256M rootwait=1 ubi.mtd=6,2048 rootfstype=ubifs root=ubi0:rootfs ignore_loglevel
[    0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] __ex_table already sorted, skipping sort
[    0.000000] Memory: 255MB = 255MB total
[    0.000000] Memory: 247768k/247768k available, 14376k reserved, 0K highmem
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xd0800000 - 0xff000000   ( 744 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xd0000000   ( 256 MB)
[    0.000000]     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
[    0.000000]       .text : 0xc0008000 - 0xc05194ac   (5190 kB)
[    0.000000]       .init : 0xc051a000 - 0xc054bfbc   ( 200 kB)
[    0.000000]       .data : 0xc054c000 - 0xc059d360   ( 325 kB)
[    0.000000]        .bss : 0xc059d360 - 0xc0ac21a0   (5268 kB)
[    0.000000] NR_IRQS:16 nr_irqs:16 16
[    0.000000] IRQ: Found an INTC at 0xfa200000 (revision 5.0) with 128 interrupts
[    0.000000] Total of 128 interrupts on 1 active controller
[    0.000000] OMAP clockevent source: GPTIMER1 at 26000000 Hz
[    0.000000] sched_clock: 32 bits at 26MHz, resolution 38ns, wraps every 165191ms
[    0.000000] OMAP clocksource: GPTIMER2 at 26000000 Hz
[    0.000000] Console: colour dummy device 80x30
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
[    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
[    0.000000] ... CHAINHASH_SIZE:          16384
[    0.000000]  memory used by lock dependency info: 3695 kB
[    0.000000]  per task-struct memory footprint: 1152 bytes
[    0.000866] Calibrating delay loop... 476.77 BogoMIPS (lpj=2383872)
[    0.119747] pid_max: default: 32768 minimum: 301
[    0.120031] Security Framework initialized
[    0.120132] Mount-cache hash table entries: 512
[    0.130113] CPU: Testing write buffer coherency: ok
[    0.131391] Setting up static identity map for 0xc0402cf8 - 0xc0402d50
[    0.134796] devtmpfs: initialized
[    0.196435] pinctrl core: initialized pinctrl subsystem
[    0.201669] regulator-dummy: no parameters
[    0.203862] NET: Registered protocol family 16
[    0.204608] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.222839] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
[    0.223232] OMAP GPIO hardware version 0.1
[    0.225994] gpiochip_add: registered GPIOs 32 to 63 on device: gpio
[    0.228624] gpiochip_add: registered GPIOs 64 to 95 on device: gpio
[    0.231214] gpiochip_add: registered GPIOs 96 to 127 on device: gpio
[    0.245983] omap-gpmc 50000000.gpmc: could not find pctldev for node /pinmux@44e10800/gpmc_pins, deferring probe
[    0.246035] platform 50000000.gpmc: Driver omap-gpmc requests probe deferral
[    0.246674] No ATAGs?
[    0.246691] hw-breakpoint: debug architecture 0x4 unsupported.
[    0.284604] bio: create slab <bio-0> at 0
[    0.350695] omap-dma-engine omap-dma-engine: OMAP DMA engine driver
[    0.357130] usbcore: registered new interface driver usbfs
[    0.357526] usbcore: registered new interface driver hub
[    0.358167] usbcore: registered new device driver usb
[    0.359106] omap_i2c 44e0b000.i2c: could not find pctldev for node /pinmux@44e10800/i2c1_pins, deferring probe
[    0.359152] platform 44e0b000.i2c: Driver omap_i2c requests probe deferral
[    0.366391] cfg80211: Calling CRDA to update world regulatory domain
[    0.368068] Switching to clocksource gp_timer
[    0.416727] NET: Registered protocol family 2
[    0.418471] TCP established hash table entries: 2048 (order: 2, 16384 bytes)
[    0.418683] TCP bind hash table entries: 2048 (order: 4, 73728 bytes)
[    0.419630] TCP: Hash tables configured (established 2048 bind 2048)
[    0.419844] TCP: reno registered
[    0.419874] UDP hash table entries: 256 (order: 2, 20480 bytes)
[    0.420133] UDP-Lite hash table entries: 256 (order: 2, 20480 bytes)
[    0.420922] NET: Registered protocol family 1
[    0.422131] RPC: Registered named UNIX socket transport module.
[    0.422155] RPC: Registered udp transport module.
[    0.422165] RPC: Registered tcp transport module.
[    0.422175] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.423061] NetWinder Floating Point Emulator V0.97 (double precision)
[    0.423514] CPU PMU: probing PMU on CPU 0
[    0.423544] hw perfevents: enabled with ARMv7 Cortex-A8 PMU driver, 5 counters available
[    0.435819] msgmni has been set to 483
[    0.440155] io scheduler noop registered
[    0.440181] io scheduler deadline registered
[    0.440250] io scheduler cfq registered (default)
[    0.441286] pinctrl-single 44e10800.pinmux: 142 pins at pa f9e10800 size 568
[    0.442944] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.450243] 44e09000.serial: ttyO0 at MMIO 0x44e09000 (irq = 88) is a OMAP UART0
[    1.062979] console [ttyO0] enabled
[    1.069318] 48022000.serial: ttyO1 at MMIO 0x48022000 (irq = 89) is a OMAP UART1
[    1.078922] 48024000.serial: ttyO2 at MMIO 0x48024000 (irq = 90) is a OMAP UART2
[    1.088374] 481a6000.serial: ttyO3 at MMIO 0x481a6000 (irq = 60) is a OMAP UART3
[    1.097747] 481a8000.serial: ttyO4 at MMIO 0x481a8000 (irq = 61) is a OMAP UART4
[    1.107140] 481aa000.serial: ttyO5 at MMIO 0x481aa000 (irq = 62) is a OMAP UART5
[    1.141639] brd: module loaded
[    1.160604] loop: module loaded
[    1.167555] Generic platform RAM MTD, (c) 2004 Simtec Electronics
[    1.179155] usbcore: registered new interface driver rt2800usb
[    1.185699] usbcore: registered new interface driver asix
[    1.191823] usbcore: registered new interface driver cdc_ether
[    1.198389] usbcore: registered new interface driver smsc95xx
[    1.204825] usbcore: registered new interface driver net1080
[    1.211201] usbcore: registered new interface driver cdc_subset
[    1.217812] usbcore: registered new interface driver zaurus
[    1.224164] usbcore: registered new interface driver cdc_ncm
[    1.232048] usbcore: registered new interface driver cdc_wdm
[    1.238521] usbcore: registered new interface driver usbtest
[    1.244879] i2c /dev entries driver
[    1.252025] omap_wdt: OMAP Watchdog Timer Rev 0x01: initial timeout 60 sec
[    1.260616] omap_cpufreq_init: unable to get MPU regulator
[    1.269707] usbcore: registered new interface driver usbhid
[    1.275600] usbhid: USB HID core driver
[    1.280825] oprofile: using arm/armv7
[    1.285411] TCP: cubic registered
[    1.288994] Initializing XFRM netlink socket
[    1.293630] NET: Registered protocol family 17
[    1.298456] NET: Registered protocol family 15
[    1.303575] Key type dns_resolver registered
[    1.308250] VFP support v0.3: implementor 41 architecture 3 part 30 variant c rev 3
[    1.316401] ThumbEE CPU extension supported.
[    1.324959] omap-gpmc 50000000.gpmc: GPMC revision 6.0
[    1.332583] enabling NAND BCH ecc with 8-bit correction
[    1.338520] ONFI param page 0 valid
[    1.342206] ONFI flash detected
[    1.345527] NAND device: Manufacturer ID: 0x2c, Chip ID: 0xda (Micron MT29F2G08ABAEAWP), 256MiB, page size: 2048, OOB size: 64
[    1.357810] 8 ofpart partitions found on MTD device omap2-nand.0
[    1.364195] Creating 8 MTD partitions on "omap2-nand.0":
[    1.369857] 0x000000000000-0x000000020000 : "spl1"
[    1.377262] 0x000000020000-0x000000040000 : "spl2"
[    1.383363] 0x000000040000-0x000000060000 : "spl3"
[    1.389263] 0x000000060000-0x000000080000 : "spl4"
[    1.395107] 0x000000080000-0x0000001e0000 : "boot"
[    1.402034] 0x0000001e0000-0x000000200000 : "env"
[    1.407758] 0x000000200000-0x000004200000 : "rootfs"
[    1.468624] 0x000004200000-0x000010000000 : "data"
[    1.641028] omap_i2c 44e0b000.i2c: bus 0 rev0.11 at 400 kHz
[    1.647706] mcp230xx 0-0020: invalid or missing platform data
[    1.653901] mcp230xx: probe of 0-0020 failed with error -22
[    1.664675] DCDC1: at 1800 mV
[    1.670057] vdd_mpu: 925 <--> 1325 mV at 3300 mV
[    1.677053] vdd_core: 925 <--> 1150 mV at 1100 mV
[    1.684039] LDO1: at 1800 mV
[    1.689131] LDO2: at 3300 mV
[    1.694112] LDO3: at 1800 mV
[    1.699383] LDO4: at 2800 mV
[    1.704094] tps65217 0-0024: TPS65217 ID 0x7 version 1.1
[    1.710665] at24 0-0053: 256 byte 24c02 EEPROM, writable, 8 bytes/write
[    1.721368] rtc-ds1307 0-0068: rtc core: registered ds1307 as rtc0
[    1.727934] rtc-ds1307 0-0068: 56 bytes nvram
[    1.734589] UBI: attaching mtd6 to ubi0
[    1.944782] UBI: scanning is finished
[    1.958753] UBI warning: print_rsvd_warning: cannot reserve enough PEBs for bad PEB handling, reserved 4, need 40
[    1.982129] UBI: volume 0 ("rootfs") re-sized from 504 to 504 LEBs
[    1.991995] UBI: attached mtd6 (name "rootfs", size 64 MiB) to ubi0
[    1.998710] UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[    2.005875] UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 512
[    2.012935] UBI: VID header offset: 2048 (aligned 2048), data offset: 4096
[    2.020220] UBI: good PEBs: 512, bad PEBs: 0, corrupted PEBs: 0
[    2.026469] UBI: user volume: 1, internal volumes: 1, max. volumes count: 128
[    2.034028] UBI: max/mean erase counter: 1/0, WL threshold: 4096, image sequence number: 104567561
[    2.043506] UBI: available PEBs: 0, total reserved PEBs: 512, PEBs reserved for bad PEB handling: 4
[    2.054154] UBI: background thread "ubi_bgt0d" started, PID 591
[    2.148196] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[    2.154650] davinci_mdio 4a101000.mdio: detected phy mask fffffffc
[    2.179932] libphy: 4a101000.mdio: probed
[    2.184195] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
[    2.193904] davinci_mdio 4a101000.mdio: phy[1]: device 4a101000.mdio:01, driver SMSC LAN8710/LAN8720
[    2.203969] Random MACID = 36:f9:8d:06:57:7f
[    2.240893] rtc-ds1307 0-0068: setting system clock to 2013-03-14 11:07:34 UTC (1363259254)
[    2.419408] UBIFS: mounted UBI device 0, volume 0, name "rootfs", R/O mode
[    2.426699] UBIFS: LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[    2.436415] UBIFS: FS size: 62599168 bytes (59 MiB, 493 LEBs), journal size 8253440 bytes (7 MiB, 65 LEBs)
[    2.446637] UBIFS: reserved for root: 0 bytes (0 KiB)
[    2.452006] UBIFS: media format: w4/r0 (latest is w4/r0), UUID 7A53349A-BCDB-4026-B45F-CCE435589ADA, small LPT model
[    2.520859] VFS: Mounted root (ubifs filesystem) readonly on device 0:12.
[    2.530533] devtmpfs: mounted
[    2.534117] Freeing init memory: 196K
Starting logging: OK
Initializing random number generator... read-only file system detected...done
Starting network...
[    3.060289] net eth0: initializing cpsw version 1.12 (0)
[    3.068476] net eth0: phy found : id is : 0x7c0f1
[    3.074025] net eth0: phy found : id is : 0x7c0f1
udhcpc (v1.20.2) started
Sending discover...
Sending discover...
[    6.149073] libphy: 4a101000.mdio:00 - Link is Up - 100/Full
Sending discover...
Sending select for 10.0.0.127...
Lease of 10.0.0.127 obtained, lease time 259200
deleting routers
route: SIOCDELRT: No such process
adding dns 4.2.2.5
adding dns 208.67.220.220
ip: SIOCGIFFLAGS: No such device
Starting dropbear sshd: mkdir: can't create directory '/etc/dropbear': Read-only file system
generating rsa key... generating dsa key... OK
Starting lighttpd: OK

Welcome to Buildroot
nanobone login: root
Password:
# mount
rootfs on / type rootfs (rw)
ubi0:rootfs on / type ubifs (ro,relatime)
devtmpfs on /dev type devtmpfs (rw,relatime,size=123884k,nr_inodes=30971,mode=755)
proc on /proc type proc (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,relatime,mode=777)
tmpfs on /tmp type tmpfs (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
# mount -o remount,rw /
[   28.482971] UBIFS: start fixing up free space
[   28.487601] uffs 1
[   28.489849] ffs 1
[   28.491901] ffs 2
[   28.498617] ffs 3
[   28.500932] ffs 4
[   28.504148] ffs 5
[   28.507352] ffs 6
[   28.509468] ffs 7
[   28.527936] Unable to handle kernel paging request at virtual address d08ea004
[   28.535639] pgd = cf3b8000
[   28.538525] [d08ea004] *pgd=8f045811, *pte=00000000, *ppte=00000000
[   28.545173] Internal error: Oops: 7 [#1] ARM
[   28.549685] CPU: 0    Not tainted  (3.8.0-next-20130225-00002-g678576f-dirty #40)
[   28.557595] PC is at crc32_le+0x50/0x168
[   28.561735] LR is at ubi_eba_atomic_leb_change+0x1b4/0x410
[   28.567523] pc : [<c01e7244>]    lr : [<c026dd9c>]    psr: 20000013
[   28.567523] sp : cf385de0  ip : 180a985a  fp : c054f840
[   28.579632] r10: c054f040  r9 : c054fc40  r8 : 158a1232
[   28.585141] r7 : 4d506705  r6 : 9324fd72  r5 : 4dc8a10b  r4 : 4c162691
[   28.592025] r3 : c054e040  r2 : c054f440  r1 : d08ea000  r0 : 4c8ee09f
[   28.598912] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   28.606439] Control: 10c5387d  Table: 8f3b8019  DAC: 00000015
[   28.612501] Process mount (pid: 659, stack limit = 0xcf384238)
[   28.618652] Stack: (0xcf385de0 to 0xcf386000)
[   28.623254] 5de0: cf2f8554 00000000 d08e6e8c 180a9e88 5a257c4f 58399ee9 c8d98a08 bb0ee864
[   28.631886] 5e00: eae10678 c054fc40 c054f040 c054f840 cf2f8000 00000000 d08caffc 00003c00
[   28.640517] 5e20: cf2f8000 cf357c00 00000000 0000000c cf2ec000 00000000 0000000c cf2f8554
[   28.649148] 5e40: d08cb000 0001e000 00000000 d08cb000 00008000 00000000 00000000 0001e000
[   28.657779] 5e60: 00000000 0000000c d08cb000 00000080 0000000c 0000000c 00000000 00000020
[   28.666409] 5e80: 00008000 c026c41c 0001e000 cf330000 cf330000 d08cb000 0001e000 c0179b14
[   28.675042] 5ea0: 0000000d c0177a68 0001e000 cf330000 00000000 cf330b20 0000000d c0179698
[   28.683672] 5ec0: cf330000 00000000 cf330a9c 00000000 cf385f48 c0175170 00000001 60000013
[   28.692303] 5ee0: cf32c800 00000000 00000000 00000000 cf385f48 00000000 00000020 c00c9e24
[   28.700934] 5f00: 00100100 00200200 cf3a6c80 00008000 cf384000 00208020 00000000 cf01a200
[   28.709564] 5f20: cf32c800 c00e3d6c 00000000 0000000c cf32c840 00000000 c0013968 cf31fb80
[   28.718195] 5f40: 0000000c 00000000 cf01a210 ce828858 0000000c cf3ac000 000a18b4 00000000
[   28.726827] 5f60: 00208020 c0013968 cf384000 00000000 00000003 c00e3e40 00000000 c0071e24
[   28.735459] 5f80: 00000000 00000000 cf31fb80 cf31fbc0 a0000010 00000000 bed87b68 b6ff148c
[   28.744091] 5fa0: 00000015 c00137c0 00000000 bed87b68 000a18b4 000a18c0 000a18c2 00208020
[   28.752720] 5fc0: 00000000 bed87b68 b6ff148c 00000015 00000000 00000000 00000000 00000003
[   28.761350] 5fe0: b6fa3f48 bed87a64 00042994 b6fa3f58 a0000010 000a18b4 00000000 00000000
[   28.769989] [<c01e7244>] (crc32_le+0x50/0x168) from [<cf2f8000>] (0xcf2f8000)
[   28.777522] Code: e58d8008 0a000026 e1a0c005 e58d2004 (e5916004)
[   28.784029] ---[ end trace f50f53afffe647f1 ]---
Segmentation fault
Artem Bityutskiy March 14, 2013, 11:23 a.m. UTC | #4
On Thu, 2013-03-14 at 11:15 +0000, Mark Jackson wrote:
> [   28.538525] [d08ea004] *pgd=8f045811, *pte=00000000, *ppte=00000000
> [   28.545173] Internal error: Oops: 7 [#1] ARM
> [   28.549685] CPU: 0    Not tainted  (3.8.0-next-20130225-00002-g678576f-dirty #40)
> [   28.557595] PC is at crc32_le+0x50/0x168
> [   28.561735] LR is at ubi_eba_atomic_leb_change+0x1b4/0x410
> [   28.567523] pc : [<c01e7244>]    lr : [<c026dd9c>]    psr: 20000013
> [   28.567523] sp : cf385de0  ip : 180a985a  fp : c054f840
> [   28.579632] r10: c054f040  r9 : c054fc40  r8 : 158a1232
> [   28.585141] r7 : 4d506705  r6 : 9324fd72  r5 : 4dc8a10b  r4 : 4c162691
> [   28.592025] r3 : c054e040  r2 : c054f440  r1 : d08ea000  r0 : 4c8ee09f
> [   28.598912] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [   28.606439] Control: 10c5387d  Table: 8f3b8019  DAC: 00000015
> [   28.612501] Process mount (pid: 659, stack limit = 0xcf384238)
> [   28.618652] Stack: (0xcf385de0 to 0xcf386000)
> [   28.623254] 5de0: cf2f8554 00000000 d08e6e8c 180a9e88 5a257c4f 58399ee9 c8d98a08 bb0ee864
> [   28.631886] 5e00: eae10678 c054fc40 c054f040 c054f840 cf2f8000 00000000 d08caffc 00003c00
> [   28.640517] 5e20: cf2f8000 cf357c00 00000000 0000000c cf2ec000 00000000 0000000c cf2f8554
> [   28.649148] 5e40: d08cb000 0001e000 00000000 d08cb000 00008000 00000000 00000000 0001e000
> [   28.657779] 5e60: 00000000 0000000c d08cb000 00000080 0000000c 0000000c 00000000 00000020
> [   28.666409] 5e80: 00008000 c026c41c 0001e000 cf330000 cf330000 d08cb000 0001e000 c0179b14
> [   28.675042] 5ea0: 0000000d c0177a68 0001e000 cf330000 00000000 cf330b20 0000000d c0179698
> [   28.683672] 5ec0: cf330000 00000000 cf330a9c 00000000 cf385f48 c0175170 00000001 60000013
> [   28.692303] 5ee0: cf32c800 00000000 00000000 00000000 cf385f48 00000000 00000020 c00c9e24
> [   28.700934] 5f00: 00100100 00200200 cf3a6c80 00008000 cf384000 00208020 00000000 cf01a200
> [   28.709564] 5f20: cf32c800 c00e3d6c 00000000 0000000c cf32c840 00000000 c0013968 cf31fb80
> [   28.718195] 5f40: 0000000c 00000000 cf01a210 ce828858 0000000c cf3ac000 000a18b4 00000000
> [   28.726827] 5f60: 00208020 c0013968 cf384000 00000000 00000003 c00e3e40 00000000 c0071e24
> [   28.735459] 5f80: 00000000 00000000 cf31fb80 cf31fbc0 a0000010 00000000 bed87b68 b6ff148c
> [   28.744091] 5fa0: 00000015 c00137c0 00000000 bed87b68 000a18b4 000a18c0 000a18c2 00208020
> [   28.752720] 5fc0: 00000000 bed87b68 b6ff148c 00000015 00000000 00000000 00000000 00000003
> [   28.761350] 5fe0: b6fa3f48 bed87a64 00042994 b6fa3f58 a0000010 000a18b4 00000000 00000000
> [   28.769989] [<c01e7244>] (crc32_le+0x50/0x168) from [<cf2f8000>] (0xcf2f8000)
> [   28.777522] Code: e58d8008 0a000026 e1a0c005 e58d2004 (e5916004)
> [   28.784029] ---[ end trace f50f53afffe647f1 ]---

OK, this is an independent problem which, as I think, has nothing to do
with the first one.

I do not know why crc32 oopses on your system. You need to investigate
this. I believe this is not UBI/UBIFS's fault.

One theory could be that UBI uses vmalloc'ed buffers for the atomic
update operation, and submits the buffer to the MTD layer for the I/O.
If your NAND driver is trying to DMA this memory, you may be in trouble,
because vmalloced memory is often not DMA-able on many systems,
especially ARM systems which do not have coherent cache support.
Mark Jackson March 14, 2013, 12:02 p.m. UTC | #5
On 14/03/13 11:23, Artem Bityutskiy wrote:
> On Thu, 2013-03-14 at 11:15 +0000, Mark Jackson wrote:
>> [   28.538525] [d08ea004] *pgd=8f045811, *pte=00000000, *ppte=00000000
>> [   28.545173] Internal error: Oops: 7 [#1] ARM
>> [   28.549685] CPU: 0    Not tainted  (3.8.0-next-20130225-00002-g678576f-dirty #40)
>> [   28.557595] PC is at crc32_le+0x50/0x168
>> [   28.561735] LR is at ubi_eba_atomic_leb_change+0x1b4/0x410
>> [   28.567523] pc : [<c01e7244>]    lr : [<c026dd9c>]    psr: 20000013
>> [   28.567523] sp : cf385de0  ip : 180a985a  fp : c054f840
>> [   28.579632] r10: c054f040  r9 : c054fc40  r8 : 158a1232
>> [   28.585141] r7 : 4d506705  r6 : 9324fd72  r5 : 4dc8a10b  r4 : 4c162691
>> [   28.592025] r3 : c054e040  r2 : c054f440  r1 : d08ea000  r0 : 4c8ee09f
>> [   28.598912] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
>> [   28.606439] Control: 10c5387d  Table: 8f3b8019  DAC: 00000015
>> [   28.612501] Process mount (pid: 659, stack limit = 0xcf384238)
>> [   28.618652] Stack: (0xcf385de0 to 0xcf386000)
>> [   28.623254] 5de0: cf2f8554 00000000 d08e6e8c 180a9e88 5a257c4f 58399ee9 c8d98a08 bb0ee864
>> [   28.631886] 5e00: eae10678 c054fc40 c054f040 c054f840 cf2f8000 00000000 d08caffc 00003c00
>> [   28.640517] 5e20: cf2f8000 cf357c00 00000000 0000000c cf2ec000 00000000 0000000c cf2f8554
>> [   28.649148] 5e40: d08cb000 0001e000 00000000 d08cb000 00008000 00000000 00000000 0001e000
>> [   28.657779] 5e60: 00000000 0000000c d08cb000 00000080 0000000c 0000000c 00000000 00000020
>> [   28.666409] 5e80: 00008000 c026c41c 0001e000 cf330000 cf330000 d08cb000 0001e000 c0179b14
>> [   28.675042] 5ea0: 0000000d c0177a68 0001e000 cf330000 00000000 cf330b20 0000000d c0179698
>> [   28.683672] 5ec0: cf330000 00000000 cf330a9c 00000000 cf385f48 c0175170 00000001 60000013
>> [   28.692303] 5ee0: cf32c800 00000000 00000000 00000000 cf385f48 00000000 00000020 c00c9e24
>> [   28.700934] 5f00: 00100100 00200200 cf3a6c80 00008000 cf384000 00208020 00000000 cf01a200
>> [   28.709564] 5f20: cf32c800 c00e3d6c 00000000 0000000c cf32c840 00000000 c0013968 cf31fb80
>> [   28.718195] 5f40: 0000000c 00000000 cf01a210 ce828858 0000000c cf3ac000 000a18b4 00000000
>> [   28.726827] 5f60: 00208020 c0013968 cf384000 00000000 00000003 c00e3e40 00000000 c0071e24
>> [   28.735459] 5f80: 00000000 00000000 cf31fb80 cf31fbc0 a0000010 00000000 bed87b68 b6ff148c
>> [   28.744091] 5fa0: 00000015 c00137c0 00000000 bed87b68 000a18b4 000a18c0 000a18c2 00208020
>> [   28.752720] 5fc0: 00000000 bed87b68 b6ff148c 00000015 00000000 00000000 00000000 00000003
>> [   28.761350] 5fe0: b6fa3f48 bed87a64 00042994 b6fa3f58 a0000010 000a18b4 00000000 00000000
>> [   28.769989] [<c01e7244>] (crc32_le+0x50/0x168) from [<cf2f8000>] (0xcf2f8000)
>> [   28.777522] Code: e58d8008 0a000026 e1a0c005 e58d2004 (e5916004)
>> [   28.784029] ---[ end trace f50f53afffe647f1 ]---
> 
> OK, this is an independent problem which, as I think, has nothing to do
> with the first one.
> 
> I do not know why crc32 oopses on your system. You need to investigate
> this. I believe this is not UBI/UBIFS's fault.
> 
> One theory could be that UBI uses vmalloc'ed buffers for the atomic
> update operation, and submits the buffer to the MTD layer for the I/O.
> If your NAND driver is trying to DMA this memory, you may be in trouble,
> because vmalloced memory is often not DMA-able on many systems,
> especially ARM systems which do not have coherent cache support.

Hmmm ... bit stuck then !?!

I've added some debug in ubi_eba_atomic_leb_change(), to simply show the values being passed to
crc32(), and the crc is being called several times with the same buff memory pointer and a size
of 2048 bytes.

But there's also a call to crc with a size of 122880 bytes, and that's when the oops occurs.

Is this size larger than the allocated buffer ?

Regards
Mark J.
---
# mount -o remount,rw /
[   24.609350] UBIFS: start fixing up free space
[   24.613979] uffs 1
[   24.616098] ffs 1
[   24.618138] ffs 2
[   24.620250] fl 1 : lnum = 1, len = 2048
[   24.624740] fl 2
[   24.627010] uealc crc32 : d08cb000 2048
[   24.631153] uealc crc_x
[   24.636597] fl 1 : lnum = 2, len = 2048
[   24.641048] fl 2
[   24.643019] uealc crc32 : d08cb000 2048
[   24.647088] uealc crc_x
[   24.650789] ffs 3
[   24.652881] ffs 4
[   24.654911] fl 1 : lnum = 3, len = 2048
[   24.659316] fl 2
[   24.661278] uealc crc32 : d08cb000 2048
[   24.665336] uealc crc_x
[   24.672101] ffs 5
[   24.674146] fl 1 : lnum = 7, len = 2048
[   24.678543] fl 2
[   24.680505] uealc crc32 : d08cb000 2048
[   24.684574] uealc crc_x
[   24.688744] ffs 6
[   24.690801] ffs 7
[   24.692831] ffs 7a : lnum = 10
[   24.696386] ffs 7b
[   24.698560] ffs 7c
[   24.700682] ffs 7a : lnum = 11
[   24.703901] ffs 7b
[   24.706019] ffs 7c
[   24.708137] ffs 7a : lnum = 12
[   24.711384] ffs 7b
[   24.713503] ffs 7c
[   24.715622] ffs 7d : c->leb_size = 126976, lprops->free = 4096
[   24.721802] fl 1 : lnum = 12, len = 122880
[   24.741216] fl 2
[   24.743176] uealc crc32 : d08cb000 122880
[   24.747581] Unable to handle kernel paging request at virtual address e7938204
[   24.755199] pgd = cf408000
[   24.758052] [e7938204] *pgd=00000000
[   24.761833] Internal error: Oops: 5 [#1] ARM
[   24.766342] CPU: 0    Not tainted  (3.8.0-next-20130225-00002-g678576f-dirty #45)
[   24.774248] PC is at crc32_le+0xf8/0x168
[   24.778389] LR is at ubi_eba_atomic_leb_change+0x1d8/0x460
[   24.784177] pc : [<c01e734c>]    lr : [<c026de20>]    psr: 20000013
[   24.784177] sp : cf359e10  ip : 00003145  fp : c054f840
[   24.796285] r10: e7938104  r9 : c054fc40  r8 : af5e2a9e
[   24.801796] r7 : e59f3038  r6 : e59f0040  r5 : 00000040  r4 : 000000e5
[   24.808682] r3 : c054e040  r2 : 00000000  r1 : d08d05d0  r0 : 3e5ed77d
[   24.815570] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   24.823097] Control: 10c5387d  Table: 8f408019  DAC: 00000015
[   24.829160] Process mount (pid: 659, stack limit = 0xcf358238)
[   24.835313] Stack: (0xcf359e10 to 0xcf35a000)
[   24.839912] 9e00:                                     d08cb000 00000000 d08caffc 00003c00
[   24.848543] 9e20: cf2f8000 00000000 cf2ec000 cf32da00 cf2f8554 00000000 0000000c d08cb000
[   24.857173] 9e40: d08cb000 c059f1f6 cf32da00 00000000 00000000 00000000 00000000 0001e000
[   24.865803] 9e60: cf32e000 0000000c d08cb000 00000080 0000000c cf3c8f88 00000000 00000020
[   24.874435] 9e80: 00008000 c026c47c 0001e000 cf359e9c cf32e000 d08cb000 0001e000 c0179b80
[   24.883066] 9ea0: cf390c80 00000001 0001e000 cf32e000 00000000 cf32eb20 0000000c c01796f0
[   24.891698] 9ec0: cf32e000 00000000 cf32ea9c 00000000 cf359f48 c0175170 00000001 60000013
[   24.900329] 9ee0: cf326800 00000000 00000000 00000000 cf359f48 00000000 00000020 c00c9e24
[   24.908963] 9f00: 00100100 00200200 cf390c80 00008000 cf358000 00208020 00000000 cf01a200
[   24.917595] 9f20: cf326800 c00e3d6c 00000000 0000000c cf326840 00000000 c0013968 cf3c4680
[   24.926227] 9f40: 0000000c 00000000 cf01a210 ce828858 0000000c cf3a4000 000a18b4 00000000
[   24.934859] 9f60: 00208020 c0013968 cf358000 00000000 00000003 c00e3e40 00000000 c0071e24
[   24.943491] 9f80: 00000000 00000000 cf3c4680 cf314540 a0000010 00000000 be984b68 b6fbc48c
[   24.952124] 9fa0: 00000015 c00137c0 00000000 be984b68 000a18b4 000a18c0 000a18c2 00208020
[   24.960757] 9fc0: 00000000 be984b68 b6fbc48c 00000015 00000000 00000000 00000000 00000003
[   24.969391] 9fe0: b6f6ef48 be984a64 00042994 b6f6ef58 a0000010 000a18b4 ebfecd47 00095348
[   24.978033] [<c01e734c>] (crc32_le+0xf8/0x168) from [<d08cb000>] (0xd08cb000)
[   24.985570] Code: 0a000008 e59da008 e28a1003 e5f1c001 (e2522001)
[   24.992006] ---[ end trace 1496ae984fb21f1a ]---
Segmentation fault
Artem Bityutskiy March 14, 2013, 12:18 p.m. UTC | #6
On Thu, 2013-03-14 at 12:02 +0000, Mark Jackson wrote:
> But there's also a call to crc with a size of 122880 bytes, and that's
> when the oops occurs.
> 
This is when we do the atomic LEB change.

> Is this size larger than the allocated buffer ?

I believe so.
Artem Bityutskiy March 14, 2013, 12:23 p.m. UTC | #7
On Thu, 2013-03-14 at 14:18 +0200, Artem Bityutskiy wrote:
> > Is this size larger than the allocated buffer ?
> 
> I believe so.

Err, I mean, the buffer is large enough. I do not believe there is a
stupid bug like too small buffer. This code has worked for years and I
do not think it was changes much.

The oops may be cause by memory corruption - some of your drivers may
corrupt memory. You need to spend more time debugging this carefully.

Do you have fastmap UBI feature enabled?
Mark Jackson March 14, 2013, 1:40 p.m. UTC | #8
On 14/03/13 12:23, Artem Bityutskiy wrote:
> On Thu, 2013-03-14 at 14:18 +0200, Artem Bityutskiy wrote:
>>> Is this size larger than the allocated buffer ?
>>
>> I believe so.
> 
> Err, I mean, the buffer is large enough. I do not believe there is a
> stupid bug like too small buffer. This code has worked for years and I
> do not think it was changes much.
> 
> The oops may be cause by memory corruption - some of your drivers may
> corrupt memory. You need to spend more time debugging this carefully.

It can handle 64k, but not 122880 bytes ...

# mount -o remount,rw /
[   19.208302] UBIFS: start fixing up free space
[   19.235881] uealc crc32 : d08cb000 2048
[   19.240015] uealc crc32 : d08cb000 4096
[   19.244091] uealc crc32 : d08cb000 8192
[   19.248184] uealc crc32 : d08cb000 16384
[   19.252448] uealc crc32 : d08cb000 32768
[   19.256772] uealc crc32 : d08cb000 65536
[   19.260133] uealc crc32 : d08cb000 122880
[   19.261117] Unable to handle kernel paging request at virtual address e79381bc
[   19.268741] pgd = cf40c000
[   19.271598] [e79381bc] *pgd=00000000
[   19.275387] Internal error: Oops: 5 [#1] ARM
[   19.279902] CPU: 0    Not tainted  (3.8.0-next-20130225-00002-g678576f-dirty #47)
[   19.287819] PC is at crc32_le+0xf8/0x168
[   19.291965] LR is at ubi_eba_atomic_leb_change+0x3ac/0x4f8
[   19.297760] pc : [<c01e724c>]    lr : [<c026def4>]    psr: 20000013
[   19.297760] sp : cf3bbe08  ip : 00000e4e  fp : c054f840
[   19.309882] r10: e7938104  r9 : c054fc40  r8 : 65e95c1c
[   19.315396] r7 : 322e315f  r6 : 352e332e  r5 : 0000002e  r4 : 00000035
[   19.322288] r3 : c054e040  r2 : 00000033  r1 : d08d3d90  r0 : 63c3884e
[   19.329180] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   19.336713] Control: 10c5387d  Table: 8f40c019  DAC: 00000015
[   19.342781] Process mount (pid: 659, stack limit = 0xcf3ba238)
[   19.348939] Stack: (0xcf3bbe08 to 0xcf3bc000)
[   19.353542] be00:                   cf2f8554 00000000 d08caffc 00002000 cf2f8000 cf357a00
[   19.362183] be20: 00000000 0000000c cf2ec000 00000000 0000000c cf2f8554 00000000 00000000
[   19.370823] be40: d08cb000 d08cb000 00000000 07000000 00008000 c026c168 00000000 0001e000
[   19.379463] be60: 00000000 0000000c d08cb000 00000080 0000000c cf3bbf48 00000000 00000020
[   19.388101] be80: 00008000 c026c37c 0001e000 cf330000 cf330000 d08cb000 0001e000 c0179a78
[   19.396738] bea0: 0000000d c0177a68 0001e000 cf330000 00000000 cf330b20 0000000d c01794b4
[   19.405376] bec0: 00000000 cf330000 00000000 cf330a9c 00000000 c0175170 00000001 60000013
[   19.414012] bee0: cf32c800 00000000 00000000 00000000 cf3bbf48 00000000 00000020 c00c9e24
[   19.422648] bf00: 00100100 00200200 cf390300 00008000 cf3ba000 00208020 00000000 cf01a200
[   19.431284] bf20: cf32c800 c00e3d6c 00000000 0000000c cf32c840 00000000 c0013968 cf325800
[   19.439921] bf40: 0000000c 00000000 cf01a210 ce828858 0000000c cf053000 000a18b4 00000000
[   19.448559] bf60: 00208020 c0013968 cf3ba000 00000000 00000003 c00e3e40 00000000 c0071e24
[   19.457197] bf80: 00000000 00000000 cf325800 cf328380 a0000010 00000000 beb83b68 b6f8348c
[   19.465838] bfa0: 00000015 c00137c0 00000000 beb83b68 000a18b4 000a18c0 000a18c2 00208020
[   19.474475] bfc0: 00000000 beb83b68 b6f8348c 00000015 00000000 00000000 00000000 00000003
[   19.483108] bfe0: b6f35f48 beb83a64 00042994 b6f35f58 a0000010 000a18b4 00000000 00000000
[   19.491758] [<c01e724c>] (crc32_le+0xf8/0x168) from [<00000000>] (  (null))
[   19.499115] Code: 0a000008 e59da008 e28a1003 e5f1c001 (e2522001)
[   19.505555] ---[ end trace 84a04423f0bc8388 ]---

> Do you have fastmap UBI feature enabled?

No ...

#
# LPDDR flash memory drivers
#
# CONFIG_MTD_LPDDR is not set
CONFIG_MTD_UBI=y
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_BEB_LIMIT=20
# CONFIG_MTD_UBI_FASTMAP is not set
# CONFIG_MTD_UBI_GLUEBI is not set
CONFIG_DTC=y
CONFIG_OF=y
Mark Jackson March 14, 2013, 1:55 p.m. UTC | #9
On 14/03/13 13:40, Mark Jackson wrote:
> On 14/03/13 12:23, Artem Bityutskiy wrote:
>> On Thu, 2013-03-14 at 14:18 +0200, Artem Bityutskiy wrote:
>>>> Is this size larger than the allocated buffer ?
>>>
>>> I believe so.
>>
>> Err, I mean, the buffer is large enough. I do not believe there is a
>> stupid bug like too small buffer. This code has worked for years and I
>> do not think it was changes much.
>>
>> The oops may be cause by memory corruption - some of your drivers may
>> corrupt memory. You need to spend more time debugging this carefully.
> 
> It can handle 64k, but not 122880 bytes ...
> 
> # mount -o remount,rw /
> [   19.208302] UBIFS: start fixing up free space
> [   19.235881] uealc crc32 : d08cb000 2048
> [   19.240015] uealc crc32 : d08cb000 4096
> [   19.244091] uealc crc32 : d08cb000 8192
> [   19.248184] uealc crc32 : d08cb000 16384
> [   19.252448] uealc crc32 : d08cb000 32768
> [   19.256772] uealc crc32 : d08cb000 65536
> [   19.260133] uealc crc32 : d08cb000 122880
> [   19.261117] Unable to handle kernel paging request at virtual address e79381bc

Unless I test in finer increments, in which case it seems to work !?!

# mount -o remount,rw /
[   25.161875] UBIFS: start fixing up free space
[   25.189572] uealc crc32 : d08cb000 1024
[   25.193653] uealc crc32 : d08cb000 2048
[   25.197709] uealc crc32 : d08cb000 3072
[   25.201819] uealc crc32 : d08cb000 4096
[   25.205886] uealc crc32 : d08cb000 5120
[   25.209982] uealc crc32 : d08cb000 6144
[   25.214054] uealc crc32 : d08cb000 7168
[   25.218129] uealc crc32 : d08cb000 8192
[   25.222238] uealc crc32 : d08cb000 9216
[   25.226323] uealc crc32 : d08cb000 10240
[   25.230528] uealc crc32 : d08cb000 11264
[   25.234711] uealc crc32 : d08cb000 12288
[   25.238928] uealc crc32 : d08cb000 13312
[   25.243119] uealc crc32 : d08cb000 14336
[   25.247313] uealc crc32 : d08cb000 15360
[   25.251541] uealc crc32 : d08cb000 16384
[   25.255744] uealc crc32 : d08cb000 17408
[   25.259982] uealc crc32 : d08cb000 18432
[   25.264193] uealc crc32 : d08cb000 19456
[   25.268406] uealc crc32 : d08cb000 20480
[   25.272657] uealc crc32 : d08cb000 21504
[   25.276881] uealc crc32 : d08cb000 22528
[   25.281136] uealc crc32 : d08cb000 23552
[   25.285367] uealc crc32 : d08cb000 24576
[   25.289633] uealc crc32 : d08cb000 25600
[   25.293873] uealc crc32 : d08cb000 26624
[   25.298115] uealc crc32 : d08cb000 27648
[   25.302394] uealc crc32 : d08cb000 28672
[   25.306644] uealc crc32 : d08cb000 29696
[   25.310928] uealc crc32 : d08cb000 30720
[   25.315188] uealc crc32 : d08cb000 31744
[   25.319485] uealc crc32 : d08cb000 32768
[   25.323753] uealc crc32 : d08cb000 33792
[   25.328025] uealc crc32 : d08cb000 34816
[   25.332331] uealc crc32 : d08cb000 35840
[   25.336612] uealc crc32 : d08cb000 36864
[   25.340924] uealc crc32 : d08cb000 37888
[   25.345212] uealc crc32 : d08cb000 38912
[   25.349538] uealc crc32 : d08cb000 39936
[   25.353837] uealc crc32 : d08cb000 40960
[   25.358138] uealc crc32 : d08cb000 41984
[   25.362481] uealc crc32 : d08cb000 43008
[   25.366794] uealc crc32 : d08cb000 44032
[   25.371140] uealc crc32 : d08cb000 45056
[   25.375462] uealc crc32 : d08cb000 46080
[   25.379823] uealc crc32 : d08cb000 47104
[   25.384155] uealc crc32 : d08cb000 48128
[   25.388490] uealc crc32 : d08cb000 49152
[   25.392862] uealc crc32 : d08cb000 50176
[   25.397207] uealc crc32 : d08cb000 51200
[   25.401583] uealc crc32 : d08cb000 52224
[   25.405936] uealc crc32 : d08cb000 53248
[   25.410326] uealc crc32 : d08cb000 54272
[   25.414688] uealc crc32 : d08cb000 55296
[   25.419084] uealc crc32 : d08cb000 56320
[   25.423455] uealc crc32 : d08cb000 57344
[   25.427827] uealc crc32 : d08cb000 58368
[   25.432238] uealc crc32 : d08cb000 59392
[   25.436621] uealc crc32 : d08cb000 60416
[   25.441033] uealc crc32 : d08cb000 61440
[   25.445423] uealc crc32 : d08cb000 62464
[   25.449852] uealc crc32 : d08cb000 63488
[   25.454254] uealc crc32 : d08cb000 64512
[   25.458658] uealc crc32 : d08cb000 65536
[   25.463097] uealc crc32 : d08cb000 66560
[   25.467508] uealc crc32 : d08cb000 67584
[   25.471953] uealc crc32 : d08cb000 68608
[   25.476372] uealc crc32 : d08cb000 69632
[   25.480823] uealc crc32 : d08cb000 70656
[   25.485251] uealc crc32 : d08cb000 71680
[   25.489712] uealc crc32 : d08cb000 72704
[   25.494147] uealc crc32 : d08cb000 73728
[   25.498584] uealc crc32 : d08cb000 74752
[   25.503058] uealc crc32 : d08cb000 75776
[   25.507505] uealc crc32 : d08cb000 76800
[   25.511988] uealc crc32 : d08cb000 77824
[   25.516443] uealc crc32 : d08cb000 78848
[   25.520935] uealc crc32 : d08cb000 79872
[   25.525399] uealc crc32 : d08cb000 80896
[   25.529899] uealc crc32 : d08cb000 81920
[   25.534369] uealc crc32 : d08cb000 82944
[   25.538876] uealc crc32 : d08cb000 83968
[   25.543359] uealc crc32 : d08cb000 84992
[   25.547844] uealc crc32 : d08cb000 86016
[   25.552366] uealc crc32 : d08cb000 87040
[   25.556859] uealc crc32 : d08cb000 88064
[   25.561383] uealc crc32 : d08cb000 89088
[   25.565885] uealc crc32 : d08cb000 90112
[   25.570423] uealc crc32 : d08cb000 91136
[   25.574935] uealc crc32 : d08cb000 92160
[   25.579479] uealc crc32 : d08cb000 93184
[   25.583997] uealc crc32 : d08cb000 94208
[   25.588517] uealc crc32 : d08cb000 95232
[   25.593115] uealc crc32 : d08cb000 96256
[   25.597648] uealc crc32 : d08cb000 97280
[   25.602226] uealc crc32 : d08cb000 98304
[   25.606766] uealc crc32 : d08cb000 99328
[   25.611340] uealc crc32 : d08cb000 100352
[   25.615978] uealc crc32 : d08cb000 101376
[   25.620652] uealc crc32 : d08cb000 102400
[   25.625300] uealc crc32 : d08cb000 103424
[   25.629984] uealc crc32 : d08cb000 104448
[   25.634637] uealc crc32 : d08cb000 105472
[   25.639325] uealc crc32 : d08cb000 106496
[   25.643988] uealc crc32 : d08cb000 107520
[   25.648650] uealc crc32 : d08cb000 108544
[   25.653370] uealc crc32 : d08cb000 109568
[   25.658047] uealc crc32 : d08cb000 110592
[   25.662764] uealc crc32 : d08cb000 111616
[   25.667449] uealc crc32 : d08cb000 112640
[   25.672168] uealc crc32 : d08cb000 113664
[   25.676858] uealc crc32 : d08cb000 114688
[   25.681581] uealc crc32 : d08cb000 115712
[   25.686276] uealc crc32 : d08cb000 116736
[   25.691010] uealc crc32 : d08cb000 117760
[   25.695717] uealc crc32 : d08cb000 118784
[   25.700462] uealc crc32 : d08cb000 119808
[   25.705177] uealc crc32 : d08cb000 120832
[   25.709929] uealc crc32 : d08cb000 121856
[   25.714654] uealc crc32 : d08cb000 122880
[   25.719414] uealc crc x
[   27.527375] UBIFS: free space fixup complete
[   27.550760] UBIFS: background thread "ubifs_bgt0_0" started, PID 660
Artem Bityutskiy March 15, 2013, 8:40 a.m. UTC | #10
On Thu, 2013-03-14 at 13:40 +0000, Mark Jackson wrote:
> On 14/03/13 12:23, Artem Bityutskiy wrote:
> > On Thu, 2013-03-14 at 14:18 +0200, Artem Bityutskiy wrote:
> >>> Is this size larger than the allocated buffer ?
> >>
> >> I believe so.
> > 
> > Err, I mean, the buffer is large enough. I do not believe there is a
> > stupid bug like too small buffer. This code has worked for years and I
> > do not think it was changes much.
> > 
> > The oops may be cause by memory corruption - some of your drivers may
> > corrupt memory. You need to spend more time debugging this carefully.
> 
> It can handle 64k, but not 122880 bytes ...

No idea what is wrong with the CRC funtions, you may try older kernels
and if you fine a working one, you may try to bisect.
Artem Bityutskiy March 15, 2013, 8:42 a.m. UTC | #11
On Thu, 2013-03-14 at 11:13 +0200, Artem Bityutskiy wrote:
> On Wed, 2013-03-13 at 11:12 +0000, Mark Jackson wrote:
> > Sorry ... this just locks up the unit.
> 
> OK, I've reproduced the issue with 3.9-rc2 in nandsim, see the details
> below. The patch I proposed did not get the error path correctly, but it
> does fix the issue.
> 
> I think what you treat as "lockup" is the fixup process. UBIFS basically
> reads the entire UBI volume and writes it back. And it uses the atomic
> change UBI service, which means it also calculates CRC of everything it
> writes. And this all just takes a lot of time. This has to be done only
> once on the first mount.
> 
> I've attached the following:
> 
> 1. The patch which fixes the issue when I use nandsim. It is also
>    inlined in the end. Please, give it a try and be more patient -
>    wait longer. Please, do report your results back.

From your last e-mail I see that the patch solves the issue for you if
the CRC32 problem is worked-around. I am pushing this fix to the ubifs
git tree. Thanks for the report.
diff mbox

Patch

From a173f8e9296562e5ece3dd2936d799001897d6c6 Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Date: Thu, 14 Mar 2013 10:49:23 +0200
Subject: [PATCH] UBIFS: make space fixup work in the remount case

The UBIFS space fixup is a useful feature which allows to fixup the "broken"
flash space at the time of the first mount. The "broken" space is usually the
result of using a "dumb" industrial flasher which is not able to skip empty
NAND pages and just writes all 0xFFs to the empty space, which has grave
side-effects for UBIFS when UBIFS trise to write useful data to those empty
pages.

The fix-up feature works roughly like this:
1. mkfs.ubifs sets the fixup flag in UBIFS superblock when creating the image
   (see -F option)
2. when the file-system is mounted for the first time, UBIFS notices the fixup
   flag and re-writes the entire media atomically, which may take really a lot
   of time.
3. UBIFS clears the fixup flag in the superblock.

This works fine when the file system is mounted R/W for the very first time.
But it did not really work in the case when we first mount the file-system R/O,
and then re-mount R/W. The reason was that we started the fixup procedure too
late, which we cannot really do because we have to fixup the space before it
starts being used.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org # 3.0+
---
 fs/ubifs/super.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index ac838b8..fa4aec6 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1568,10 +1568,17 @@  static int ubifs_remount_rw(struct ubifs_info *c)
 	c->remounting_rw = 1;
 	c->ro_mount = 0;
 
+	if (c->space_fixup) {
+		err = ubifs_fixup_free_space(c);
+		if (err)
+			return err;
+	}
+
 	err = check_free_space(c);
 	if (err)
 		goto out;
 
+
 	if (c->old_leb_cnt != c->leb_cnt) {
 		struct ubifs_sb_node *sup;
 
@@ -1684,12 +1691,6 @@  static int ubifs_remount_rw(struct ubifs_info *c)
 		err = dbg_check_space_info(c);
 	}
 
-	if (c->space_fixup) {
-		err = ubifs_fixup_free_space(c);
-		if (err)
-			goto out;
-	}
-
 	mutex_unlock(&c->umount_mutex);
 	return err;
 
-- 
1.7.7.6