Message ID | 1274617419.22999.24.camel@localhost |
---|---|
State | New, archived |
Headers | show |
Artem, thank you very much, now applying your patches and making new stress-tests :-) Also last holidays have caught another issue. We've modified our program to primary mount ubi2:dbfs in RO, then sleep 3 secs, then remount in RW. Today have found half (5/10) of devices with the error : No space left on device If mount by hands this stalled fs primary in RW mode, then everything goes OK. If drop down mount "mount ubi2:dbfs in RO, then sleep 3 secs", leave only mount in RW, everything also goes OK. [ 6.576168] UBIFS: recovery needed [ 6.665035] UBIFS: recovery deferred [ 6.668941] UBIFS: mounted UBI device 2, volume 0, name "dbfs" [ 6.674801] UBIFS: mounted read-only [ 6.678707] UBIFS: file system size: 189407232 bytes (184968 KiB, 180 MiB, 1468 LEBs) [ 6.687496] UBIFS: journal size: 9418752 bytes (9198 KiB, 8 MiB, 73 LEBs) [ 6.694332] UBIFS: media format: w4/r0 (latest is w4/r0) [ 6.700191] UBIFS: default compressor: lzo [ 6.705074] UBIFS: reserved for root: 4952683 bytes (4836 KiB) [ 9.787104] UBIFS: completing deferred recovery Remounting ubi2:dbfs on /usr/local/ecom/db failed: No space left on device # df Filesystem Size Used Available Use% Mounted on ubi0:rootfs 13.2M 7.2M 6.0M 55% / tmpfs 4.0K 0 4.0K 0% /dev shm 1.0M 0 1.0M 0% /dev/shm tmpfs 1.0M 12.0K 1012.0K 1% /tmp tmpfs 1.0M 4.0K 1020.0K 0% /root ubi1:logs 11.5M 2.4M 8.4M 22% /var/log ubi1:tmp 13.4M 16.0K 12.7M 0% /usr/local/ecom/tmp ubi0:local 2.1M 44.0K 1.9M 2% /var/local ubi:config 2.0M 68.0K 1.9M 3% /usr/local/ecom/conf ubi2:dbfs 169.2M 18.7M 145.8M 11% /usr/local/ecom/db
On Mon, 2010-05-24 at 12:49 +0600, Alexander Pazdnikov wrote: > Artem, thank you very much, now applying your patches and making new stress-tests :-) > > Also last holidays have caught another issue. > > We've modified our program to primary mount ubi2:dbfs in RO, then sleep 3 secs, then remount in RW. > Today have found half (5/10) of devices with the error : No space left on device > If mount by hands this stalled fs primary in RW mode, then everything goes OK. > If drop down mount "mount ubi2:dbfs in RO, then sleep 3 secs", leave only mount in RW, everything also goes OK. > > [ 6.576168] UBIFS: recovery needed > [ 6.665035] UBIFS: recovery deferred > [ 6.668941] UBIFS: mounted UBI device 2, volume 0, name "dbfs" > [ 6.674801] UBIFS: mounted read-only > [ 6.678707] UBIFS: file system size: 189407232 bytes (184968 KiB, 180 MiB, 1468 LEBs) > [ 6.687496] UBIFS: journal size: 9418752 bytes (9198 KiB, 8 MiB, 73 LEBs) > [ 6.694332] UBIFS: media format: w4/r0 (latest is w4/r0) > [ 6.700191] UBIFS: default compressor: lzo > [ 6.705074] UBIFS: reserved for root: 4952683 bytes (4836 KiB) > [ 9.787104] UBIFS: completing deferred recovery > Remounting ubi2:dbfs on /usr/local/ecom/db failed: No space left on device > > # df > Filesystem Size Used Available Use% Mounted on > ubi0:rootfs 13.2M 7.2M 6.0M 55% / > tmpfs 4.0K 0 4.0K 0% /dev > shm 1.0M 0 1.0M 0% /dev/shm > tmpfs 1.0M 12.0K 1012.0K 1% /tmp > tmpfs 1.0M 4.0K 1020.0K 0% /root > ubi1:logs 11.5M 2.4M 8.4M 22% /var/log > ubi1:tmp 13.4M 16.0K 12.7M 0% /usr/local/ecom/tmp > ubi0:local 2.1M 44.0K 1.9M 2% /var/local > ubi:config 2.0M 68.0K 1.9M 3% /usr/local/ecom/conf > ubi2:dbfs 169.2M 18.7M 145.8M 11% /usr/local/ecom/db Could you enable debugging messages when mounging it after power cuts? The messages generate a lot of noise, so you need to disable them before starting the real test. The idea is to get UBIFS debugging messages when errors occur. You can read about how to enable/disable the messages here: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_how_send_bugreport and in the Documentation/filesystems/ubifs.txt The problems you see are probably budgeting bugs. Please, read this section to get some idea what is the UBIFS budgeting subsystem about: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_spaceacc I have a test which reproduces budgeting bugs here: git://git.infradead.org/users/dedekind/ubifs-userspace.git see the "freespace" test and the 'freespace.sh' script which runs the freespace test on various NAND/NOR size and geometry flashes. I never had time to really dig this and fix. Would be nice if you could do this because I will unlikely have time in the nearest future, but I can help you. One think you can do is to make sure you do not fill UBIFS completely, then you will avoid hitting these budgeting bugs, also known as ENOSPC issues (I think btrfs uses this terminology). E.g., if you work under a normal user, you can reserve more space for the root (see -R mkfs.ubifs option). But of course, ideally we should just fix the issues.
> Could you enable debugging messages when mounging it after power cuts? > The messages generate a lot of noise, so you need to disable them before > starting the real test. The idea is to get UBIFS debugging messages when > errors occur. Sorry, I've hurried up, and after applying your pathes reflash all of the devices with new firmware. But I'll setup some devices with all firmware to catch this issue again. > I never had time to really dig this and fix. Would be nice if you could > do this because I will unlikely have time in the nearest future, but I > can help you. Thank you, I'll continue testings and can supply with debug information. > One think you can do is to make sure you do not fill UBIFS completely, > then you will avoid hitting these budgeting bugs, also known as ENOSPC > issues (I think btrfs uses this terminology). In the issue above there is 11% used space and 89% free space. I've got this issue only when mounting fs in RO and then remounting it in RW.
On Mon, 2010-05-24 at 15:50 +0600, Alexander Pazdnikov wrote: > > Could you enable debugging messages when mounging it after power cuts? > > The messages generate a lot of noise, so you need to disable them before > > starting the real test. The idea is to get UBIFS debugging messages when > > errors occur. > Sorry, I've hurried up, and after applying your pathes reflash all of the > devices with new firmware. > But I'll setup some devices with all firmware to catch this issue again. > > > I never had time to really dig this and fix. Would be nice if you could > > do this because I will unlikely have time in the nearest future, but I > > can help you. > Thank you, I'll continue testings and can supply with debug information. > > > One think you can do is to make sure you do not fill UBIFS completely, > > then you will avoid hitting these budgeting bugs, also known as ENOSPC > > issues (I think btrfs uses this terminology). > In the issue above there is 11% used space and 89% free space. > I've got this issue only when mounting fs in RO and then remounting it in RW. OK. Try my patch. Also, it is worth running your tests with UBIFS extra self checks enabled.
On Mon, 2010-05-24 at 12:56 +0300, Artem Bityutskiy wrote: > On Mon, 2010-05-24 at 15:50 +0600, Alexander Pazdnikov wrote: > > > Could you enable debugging messages when mounging it after power cuts? > > > The messages generate a lot of noise, so you need to disable them before > > > starting the real test. The idea is to get UBIFS debugging messages when > > > errors occur. > > Sorry, I've hurried up, and after applying your pathes reflash all of the > > devices with new firmware. > > But I'll setup some devices with all firmware to catch this issue again. > > > > > I never had time to really dig this and fix. Would be nice if you could > > > do this because I will unlikely have time in the nearest future, but I > > > can help you. > > Thank you, I'll continue testings and can supply with debug information. > > > > > One think you can do is to make sure you do not fill UBIFS completely, > > > then you will avoid hitting these budgeting bugs, also known as ENOSPC > > > issues (I think btrfs uses this terminology). > > In the issue above there is 11% used space and 89% free space. > > I've got this issue only when mounting fs in RO and then remounting it in RW. > > OK. Try my patch. Also, it is worth running your tests with UBIFS extra > self checks enabled. Did it help?
В сообщении от Суббота 12 июня 2010 20:22:55 автор Artem Bityutskiy написал: > On Mon, 2010-05-24 at 12:56 +0300, Artem Bityutskiy wrote: > > On Mon, 2010-05-24 at 15:50 +0600, Alexander Pazdnikov wrote: > > > > Could you enable debugging messages when mounging it after power > > > > cuts? The messages generate a lot of noise, so you need to disable > > > > them before starting the real test. The idea is to get UBIFS > > > > debugging messages when errors occur. > > > > > > Sorry, I've hurried up, and after applying your pathes reflash all of > > > the devices with new firmware. > > > But I'll setup some devices with all firmware to catch this issue > > > again. > > > > > > > I never had time to really dig this and fix. Would be nice if you > > > > could do this because I will unlikely have time in the nearest > > > > future, but I can help you. > > > > > > Thank you, I'll continue testings and can supply with debug > > > information. > > > > > > > One think you can do is to make sure you do not fill UBIFS > > > > completely, then you will avoid hitting these budgeting bugs, also > > > > known as ENOSPC issues (I think btrfs uses this terminology). > > > > > > In the issue above there is 11% used space and 89% free space. > > > I've got this issue only when mounting fs in RO and then remounting it > > > in RW. > > > > OK. Try my patch. Also, it is worth running your tests with UBIFS extra > > self checks enabled. > > Did it help? > Artem, thank you very much, yes - it helped, all of 10 testing devices survived with repeatable powerfails (1,2,3,4,5,6,7,8,9,10 minutes in this sequence) for a week. Now we leave 3 devices for this test. Appologize me, please, I've not tested again for the RO-RW recovery on remount with this patch to supply you with debug info.
On Tue, 2010-06-15 at 11:02 +0600, Alexander Pazdnikov wrote: > Artem, thank you very much, yes - it helped, all of 10 testing devices > survived with repeatable powerfails (1,2,3,4,5,6,7,8,9,10 minutes in this > sequence) for a week. > Now we leave 3 devices for this test. OK, I finally pushed the fix to the ubifs-2.6.git tree and will merge it next merge window: http://git.infradead.org/ubifs-2.6.git/commit/6fb4374f6b1b3932f3acfe9d353568d3d8599cad
diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c index 109c6ea..5e2a438 100644 --- a/fs/ubifs/recovery.c +++ b/fs/ubifs/recovery.c @@ -1063,8 +1063,10 @@ int ubifs_rcvry_gc_commit(struct ubifs_info *c) } err = ubifs_find_dirty_leb(c, &lp, wbuf->offs, 2); if (err) { - if (err == -ENOSPC) - dbg_err("could not find a dirty LEB"); + if (err == -ENOSPC) { + dbg_rcvry("could not find a dirty LEB"); + goto find_free; + } return err; } ubifs_assert(!(lp.flags & LPROPS_INDEX));