Patchwork [MTD] Fix JFFS2 sync silent failure

login
register
mail settings
Submitter Jörn Engel
Date April 17, 2010, 6:40 p.m.
Message ID <20100417184016.GA17345@logfs.org>
Download mbox | patch
Permalink /patch/50385/
State New
Headers show

Comments

Jörn Engel - April 17, 2010, 6:40 p.m.
Moin David,

if I read the code correctly, JFFS2 will happily perform a NOP on
sys_sync() again.  And this time it appears to be Jens' fault.

JFFS2 does not appear to set s_bdi anywhere.  And as of 32a88aa1,
__sync_filesystem() will return 0 if s_bdi is not set.  As a result,
sync_fs() is never called for jffs2 and whatever remains in the wbuf
will not make it to the device.

The patch also adds a BUG_ON to catch this problem in any remaining or
future offenders.  I am not sure about network filesystems, but at
least bdev- and mtd-based ones should be caught.

Opinions?

Jörn
Jens Axboe - April 19, 2010, 7:38 a.m.
On Sat, Apr 17 2010, Jörn Engel wrote:
> Moin David,
> 
> if I read the code correctly, JFFS2 will happily perform a NOP on
> sys_sync() again.  And this time it appears to be Jens' fault.
> 
> JFFS2 does not appear to set s_bdi anywhere.  And as of 32a88aa1,
> __sync_filesystem() will return 0 if s_bdi is not set.  As a result,
> sync_fs() is never called for jffs2 and whatever remains in the wbuf
> will not make it to the device.
> 
> The patch also adds a BUG_ON to catch this problem in any remaining or
> future offenders.  I am not sure about network filesystems, but at
> least bdev- and mtd-based ones should be caught.
> 
> Opinions?

I think that BUG_ON() would be a lot better as a printk() and fail mount
instead. There's really little point in killing the kernel for something
you could easily warn about and recover nicely.
Jörn Engel - April 19, 2010, 10:15 a.m.
On Mon, 19 April 2010 09:38:44 +0200, Jens Axboe wrote:
> On Sat, Apr 17 2010, Jörn Engel wrote:
> > Moin David,
> > 
> > if I read the code correctly, JFFS2 will happily perform a NOP on
> > sys_sync() again.  And this time it appears to be Jens' fault.
> > 
> > JFFS2 does not appear to set s_bdi anywhere.  And as of 32a88aa1,
> > __sync_filesystem() will return 0 if s_bdi is not set.  As a result,
> > sync_fs() is never called for jffs2 and whatever remains in the wbuf
> > will not make it to the device.
> > 
> > The patch also adds a BUG_ON to catch this problem in any remaining or
> > future offenders.  I am not sure about network filesystems, but at
> > least bdev- and mtd-based ones should be caught.
> > 
> > Opinions?
> 
> I think that BUG_ON() would be a lot better as a printk() and fail mount
> instead. There's really little point in killing the kernel for something
> you could easily warn about and recover nicely.

*shrug*
The BUG_ON directly above is not qualitatively different.  In both cases
the only solution is to find and fix the bug in some other file,
recompile and try again.  But ultimately I don't care, as long as we
catch the bug before people lose their data.

Feel free to respin this patch.  You caused the problem in the first
place. ;)

For the record, while I consider the two-liner that causes this mess a
real turd, the rest of your patch was brilliant.  A shame I didn't catch
this earlier.

Jörn
Jens Axboe - April 19, 2010, 10:20 a.m.
On Mon, Apr 19 2010, Jörn Engel wrote:
> On Mon, 19 April 2010 09:38:44 +0200, Jens Axboe wrote:
> > On Sat, Apr 17 2010, Jörn Engel wrote:
> > > Moin David,
> > > 
> > > if I read the code correctly, JFFS2 will happily perform a NOP on
> > > sys_sync() again.  And this time it appears to be Jens' fault.
> > > 
> > > JFFS2 does not appear to set s_bdi anywhere.  And as of 32a88aa1,
> > > __sync_filesystem() will return 0 if s_bdi is not set.  As a result,
> > > sync_fs() is never called for jffs2 and whatever remains in the wbuf
> > > will not make it to the device.
> > > 
> > > The patch also adds a BUG_ON to catch this problem in any remaining or
> > > future offenders.  I am not sure about network filesystems, but at
> > > least bdev- and mtd-based ones should be caught.
> > > 
> > > Opinions?
> > 
> > I think that BUG_ON() would be a lot better as a printk() and fail mount
> > instead. There's really little point in killing the kernel for something
> > you could easily warn about and recover nicely.
> 
> *shrug*
> The BUG_ON directly above is not qualitatively different.  In both cases
> the only solution is to find and fix the bug in some other file,
> recompile and try again.  But ultimately I don't care, as long as we
> catch the bug before people lose their data.
> 
> Feel free to respin this patch.  You caused the problem in the first
> place. ;)
> 
> For the record, while I consider the two-liner that causes this mess a
> real turd, the rest of your patch was brilliant.  A shame I didn't catch
> this earlier.

Thanks, we definitely should have put a debug statement to catch this in
from day 1, good debugging should be an important part of any new
infrastructure.

Care to send your jffs2 patch separately to David? Then I'll commit a
modified variant for complaining about missing ->s_bdi on mount.
Jörn Engel - April 22, 2010, 5:54 a.m.
On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
> 
> Thanks, we definitely should have put a debug statement to catch this in
> from day 1, good debugging should be an important part of any new
> infrastructure.

Woke up early and had another look at this.  Looks like a much more
widespread problem.  Based on a quick grep an uncaffeinated brain:

9p	no s_bdi
afs	no s_bdi
ceph	creates its own s_bdi
cifs	no s_bdi
coda	no s_bdi
ecryptfs no s_bdi
exofs	no s_bdi
fuse	creates its own s_bdi?
gfs2	creates its own s_bdi?
jffs2	patch exists
logfs	fixed now
ncpfs	no s_bdi
nfs	creates its own s_bdi
ocfs2	no s_bdi
smbfs	no s_bdi
ubifs	creates its own s_bdi

I excluded all filesystems that appear to be read-only, block device
based or lack any sort of backing store.  So there is a chance I have
missed some as well.

Jörn
Jens Axboe - April 22, 2010, 9:03 a.m.
On Thu, Apr 22 2010, Jörn Engel wrote:
> On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
> > 
> > Thanks, we definitely should have put a debug statement to catch this in
> > from day 1, good debugging should be an important part of any new
> > infrastructure.
> 
> Woke up early and had another look at this.  Looks like a much more
> widespread problem.  Based on a quick grep an uncaffeinated brain:
> 
> 9p	no s_bdi
> afs	no s_bdi
> ceph	creates its own s_bdi
> cifs	no s_bdi
> coda	no s_bdi
> ecryptfs no s_bdi
> exofs	no s_bdi
> fuse	creates its own s_bdi?
> gfs2	creates its own s_bdi?
> jffs2	patch exists
> logfs	fixed now
> ncpfs	no s_bdi
> nfs	creates its own s_bdi
> ocfs2	no s_bdi
> smbfs	no s_bdi
> ubifs	creates its own s_bdi
> 
> I excluded all filesystems that appear to be read-only, block device
> based or lack any sort of backing store.  So there is a chance I have
> missed some as well.

It's funky, I was pretty sure there was/is code to set a default bdi for
non-bdev file systems. It appears to be missing, that's not good. So
options include:

- Add the appropriate per-sb bdi for these file systems (right fix), or
- Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least
  ensure that data gets flushed (quick fix)

I'll slap together a set of fixes for this.
Jens Axboe - April 22, 2010, 10:39 a.m.
On Thu, Apr 22 2010, Jens Axboe wrote:
> On Thu, Apr 22 2010, Jörn Engel wrote:
> > On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
> > > 
> > > Thanks, we definitely should have put a debug statement to catch this in
> > > from day 1, good debugging should be an important part of any new
> > > infrastructure.
> > 
> > Woke up early and had another look at this.  Looks like a much more
> > widespread problem.  Based on a quick grep an uncaffeinated brain:
> > 
> > 9p	no s_bdi
> > afs	no s_bdi
> > ceph	creates its own s_bdi
> > cifs	no s_bdi
> > coda	no s_bdi
> > ecryptfs no s_bdi
> > exofs	no s_bdi
> > fuse	creates its own s_bdi?
> > gfs2	creates its own s_bdi?
> > jffs2	patch exists
> > logfs	fixed now
> > ncpfs	no s_bdi
> > nfs	creates its own s_bdi
> > ocfs2	no s_bdi
> > smbfs	no s_bdi
> > ubifs	creates its own s_bdi
> > 
> > I excluded all filesystems that appear to be read-only, block device
> > based or lack any sort of backing store.  So there is a chance I have
> > missed some as well.
> 
> It's funky, I was pretty sure there was/is code to set a default bdi for
> non-bdev file systems. It appears to be missing, that's not good. So
> options include:
> 
> - Add the appropriate per-sb bdi for these file systems (right fix), or
> - Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least
>   ensure that data gets flushed (quick fix)
> 
> I'll slap together a set of fixes for this.

Here's a series for fixing these. At this point they are totally
untested except that I did compile them. Note that your analysis
appeared correct for all cases but ocfs2, which does use get_sb_bdev()
and hence gets ->s_bdi assigned.

You can see them here, I'll post the series soon:

http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus

The first patch is a helper addition, the rest are per-fs fixups.
David Woodhouse - April 22, 2010, 10:58 a.m.
On Thu, 2010-04-22 at 12:39 +0200, Jens Axboe wrote:
> 
> Here's a series for fixing these. At this point they are totally
> untested except that I did compile them. Note that your analysis
> appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> and hence gets ->s_bdi assigned.
> 
> You can see them here, I'll post the series soon:
> 
> http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
> 
> The first patch is a helper addition, the rest are per-fs fixups.

Do you want to include Jörn's addition of same to get_sb_mtd_set(), with
my Acked-By: David Woodhouse <David.Woodhouse@intel.com> ?
Jens Axboe - April 22, 2010, 11:05 a.m.
On Thu, Apr 22 2010, David Woodhouse wrote:
> On Thu, 2010-04-22 at 12:39 +0200, Jens Axboe wrote:
> > 
> > Here's a series for fixing these. At this point they are totally
> > untested except that I did compile them. Note that your analysis
> > appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> > and hence gets ->s_bdi assigned.
> > 
> > You can see them here, I'll post the series soon:
> > 
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
> > 
> > The first patch is a helper addition, the rest are per-fs fixups.
> 
> Do you want to include Jörn's addition of same to get_sb_mtd_set(), with
> my Acked-By: David Woodhouse <David.Woodhouse@intel.com> ? 

Yeah will do. I also really wanted to provide a WARN and mount fail if
we get it wrong, but I don't see an easy way to do that. Basically I'd
want to check whether the storage backing is volatile or not.
Jörn Engel - April 22, 2010, 11:55 a.m.
On Thu, 22 April 2010 12:39:53 +0200, Jens Axboe wrote:
> 
> Here's a series for fixing these. At this point they are totally
> untested except that I did compile them. Note that your analysis
> appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> and hence gets ->s_bdi assigned.
> 
> You can see them here, I'll post the series soon:
> 
> http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
> 
> The first patch is a helper addition, the rest are per-fs fixups.

Looks good at a cursory glance.  What's still missing is some sort of
assertion.  You are a smart person and missed this problem, twice even.
Even if you hadn't, a not so smart person can add a new filesystem and
miss s_bdi, like I did.  We want some automatism to catch this.

Jörn
Jens Axboe - April 22, 2010, 12:08 p.m.
On Thu, Apr 22 2010, Jörn Engel wrote:
> On Thu, 22 April 2010 12:39:53 +0200, Jens Axboe wrote:
> > 
> > Here's a series for fixing these. At this point they are totally
> > untested except that I did compile them. Note that your analysis
> > appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> > and hence gets ->s_bdi assigned.
> > 
> > You can see them here, I'll post the series soon:
> > 
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
> > 
> > The first patch is a helper addition, the rest are per-fs fixups.
> 
> Looks good at a cursory glance.  What's still missing is some sort of
> assertion.  You are a smart person and missed this problem, twice even.
> Even if you hadn't, a not so smart person can add a new filesystem and
> miss s_bdi, like I did.  We want some automatism to catch this.

I totally agree, we want some way to catch this problem in the future.
Really the check needs to be something ala:

        if (!sb->s_bdi && mnt_has_storage_backing() && rw_mount)
                yell_and_fail;

but I'm not sure how best to accomplish that. We can check for ->s_bdev
and mtd like you did, but that does not catch network file systems for
instance.
Jörn Engel - April 22, 2010, 12:17 p.m.
On Thu, 22 April 2010 14:08:37 +0200, Jens Axboe wrote:
> 
> I totally agree, we want some way to catch this problem in the future.
> Really the check needs to be something ala:
> 
>         if (!sb->s_bdi && mnt_has_storage_backing() && rw_mount)
>                 yell_and_fail;
> 
> but I'm not sure how best to accomplish that. We can check for ->s_bdev
> and mtd like you did, but that does not catch network file systems for
> instance.

One way would be to either add a flag to all safe filesystems or create
a noop_bdi for them.  It adds a line of code and some bytes[*] per
instance to most filesystems, but that's the only catchall I can think
of.

I guess if noone comes up with a better plan I'll look into that.

[*] Maybe we can steal a bit somewhere to make it less expensive.

Jörn

Patch

diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index af8b42e..7c00319 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -13,6 +13,7 @@ 
 #include <linux/mtd/super.h>
 #include <linux/namei.h>
 #include <linux/ctype.h>
+#include <linux/slab.h>
 
 /*
  * compare superblocks to see if they're equivalent
@@ -44,6 +45,7 @@  static int get_sb_mtd_set(struct super_block *sb, void *_mtd)
 
 	sb->s_mtd = mtd;
 	sb->s_dev = MKDEV(MTD_BLOCK_MAJOR, mtd->index);
+	sb->s_bdi = mtd->backing_dev_info;
 	return 0;
 }
 
diff --git a/fs/super.c b/fs/super.c
index f35ac60..e8af253 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -954,10 +954,12 @@  vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (error < 0)
 		goto out_free_secdata;
 	BUG_ON(!mnt->mnt_sb);
+	BUG_ON(!mnt->mnt_sb->s_bdi &&
+			(mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd));
 
- 	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
- 	if (error)
- 		goto out_sb;
+	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
+	if (error)
+		goto out_sb;
 
 	/*
 	 * filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE