Message ID: 20100417184016.GA17345@logfs.org
State: New, archived
On Sat, Apr 17 2010, Jörn Engel wrote:
> Moin David,
>
> if I read the code correctly, JFFS2 will happily perform a NOP on
> sys_sync() again. And this time it appears to be Jens' fault.
>
> JFFS2 does not appear to set s_bdi anywhere. And as of 32a88aa1,
> __sync_filesystem() will return 0 if s_bdi is not set. As a result,
> sync_fs() is never called for jffs2 and whatever remains in the wbuf
> will not make it to the device.
>
> The patch also adds a BUG_ON to catch this problem in any remaining or
> future offenders. I am not sure about network filesystems, but at
> least bdev- and mtd-based ones should be caught.
>
> Opinions?

I think that BUG_ON() would be a lot better as a printk() and fail mount
instead. There's really little point in killing the kernel for something
you could easily warn about and recover nicely.
On Mon, 19 April 2010 09:38:44 +0200, Jens Axboe wrote:
> On Sat, Apr 17 2010, Jörn Engel wrote:
> > Moin David,
> >
> > if I read the code correctly, JFFS2 will happily perform a NOP on
> > sys_sync() again. And this time it appears to be Jens' fault.
> >
> > JFFS2 does not appear to set s_bdi anywhere. And as of 32a88aa1,
> > __sync_filesystem() will return 0 if s_bdi is not set. As a result,
> > sync_fs() is never called for jffs2 and whatever remains in the wbuf
> > will not make it to the device.
> >
> > The patch also adds a BUG_ON to catch this problem in any remaining or
> > future offenders. I am not sure about network filesystems, but at
> > least bdev- and mtd-based ones should be caught.
> >
> > Opinions?
>
> I think that BUG_ON() would be a lot better as a printk() and fail mount
> instead. There's really little point in killing the kernel for something
> you could easily warn about and recover nicely.

*shrug*
The BUG_ON directly above is not qualitatively different. In both cases
the only solution is to find and fix the bug in some other file,
recompile and try again. But ultimately I don't care, as long as we
catch the bug before people lose their data.

Feel free to respin this patch. You caused the problem in the first
place. ;)

For the record, while I consider the two-liner that causes this mess a
real turd, the rest of your patch was brilliant. A shame I didn't catch
this earlier.

Jörn
On Mon, Apr 19 2010, Jörn Engel wrote:
> On Mon, 19 April 2010 09:38:44 +0200, Jens Axboe wrote:
> > On Sat, Apr 17 2010, Jörn Engel wrote:
> > > Moin David,
> > >
> > > if I read the code correctly, JFFS2 will happily perform a NOP on
> > > sys_sync() again. And this time it appears to be Jens' fault.
> > >
> > > JFFS2 does not appear to set s_bdi anywhere. And as of 32a88aa1,
> > > __sync_filesystem() will return 0 if s_bdi is not set. As a result,
> > > sync_fs() is never called for jffs2 and whatever remains in the wbuf
> > > will not make it to the device.
> > >
> > > The patch also adds a BUG_ON to catch this problem in any remaining or
> > > future offenders. I am not sure about network filesystems, but at
> > > least bdev- and mtd-based ones should be caught.
> > >
> > > Opinions?
> >
> > I think that BUG_ON() would be a lot better as a printk() and fail mount
> > instead. There's really little point in killing the kernel for something
> > you could easily warn about and recover nicely.
>
> *shrug*
> The BUG_ON directly above is not qualitatively different. In both cases
> the only solution is to find and fix the bug in some other file,
> recompile and try again. But ultimately I don't care, as long as we
> catch the bug before people lose their data.
>
> Feel free to respin this patch. You caused the problem in the first
> place. ;)
>
> For the record, while I consider the two-liner that causes this mess a
> real turd, the rest of your patch was brilliant. A shame I didn't catch
> this earlier.

Thanks. We definitely should have put in a debug statement to catch this
from day 1; good debugging should be an important part of any new
infrastructure.

Care to send your jffs2 patch separately to David? Then I'll commit a
modified variant for complaining about missing ->s_bdi on mount.
On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
>
> Thanks. We definitely should have put in a debug statement to catch this
> from day 1; good debugging should be an important part of any new
> infrastructure.

Woke up early and had another look at this. Looks like a much more
widespread problem. Based on a quick grep and an uncaffeinated brain:

9p        no s_bdi
afs       no s_bdi
ceph      creates its own s_bdi
cifs      no s_bdi
coda      no s_bdi
ecryptfs  no s_bdi
exofs     no s_bdi
fuse      creates its own s_bdi?
gfs2      creates its own s_bdi?
jffs2     patch exists
logfs     fixed now
ncpfs     no s_bdi
nfs       creates its own s_bdi
ocfs2     no s_bdi
smbfs     no s_bdi
ubifs     creates its own s_bdi

I excluded all filesystems that appear to be read-only, block device
based or lack any sort of backing store. So there is a chance I have
missed some as well.

Jörn
On Thu, Apr 22 2010, Jörn Engel wrote:
> On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
> >
> > Thanks. We definitely should have put in a debug statement to catch this
> > from day 1; good debugging should be an important part of any new
> > infrastructure.
>
> Woke up early and had another look at this. Looks like a much more
> widespread problem. Based on a quick grep and an uncaffeinated brain:
>
> 9p        no s_bdi
> afs       no s_bdi
> ceph      creates its own s_bdi
> cifs      no s_bdi
> coda      no s_bdi
> ecryptfs  no s_bdi
> exofs     no s_bdi
> fuse      creates its own s_bdi?
> gfs2      creates its own s_bdi?
> jffs2     patch exists
> logfs     fixed now
> ncpfs     no s_bdi
> nfs       creates its own s_bdi
> ocfs2     no s_bdi
> smbfs     no s_bdi
> ubifs     creates its own s_bdi
>
> I excluded all filesystems that appear to be read-only, block device
> based or lack any sort of backing store. So there is a chance I have
> missed some as well.

It's funky, I was pretty sure there was/is code to set a default bdi for
non-bdev file systems. It appears to be missing, and that's not good. So
options include:

- Add the appropriate per-sb bdi for these file systems (right fix), or
- Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least
  ensure that data gets flushed (quick fix)

I'll slap together a set of fixes for this.
On Thu, Apr 22 2010, Jens Axboe wrote:
> On Thu, Apr 22 2010, Jörn Engel wrote:
> > On Mon, 19 April 2010 12:20:56 +0200, Jens Axboe wrote:
> > >
> > > Thanks. We definitely should have put in a debug statement to catch
> > > this from day 1; good debugging should be an important part of any
> > > new infrastructure.
> >
> > Woke up early and had another look at this. Looks like a much more
> > widespread problem. Based on a quick grep and an uncaffeinated brain:
> >
> > 9p        no s_bdi
> > afs       no s_bdi
> > ceph      creates its own s_bdi
> > cifs      no s_bdi
> > coda      no s_bdi
> > ecryptfs  no s_bdi
> > exofs     no s_bdi
> > fuse      creates its own s_bdi?
> > gfs2      creates its own s_bdi?
> > jffs2     patch exists
> > logfs     fixed now
> > ncpfs     no s_bdi
> > nfs       creates its own s_bdi
> > ocfs2     no s_bdi
> > smbfs     no s_bdi
> > ubifs     creates its own s_bdi
> >
> > I excluded all filesystems that appear to be read-only, block device
> > based or lack any sort of backing store. So there is a chance I have
> > missed some as well.
>
> It's funky, I was pretty sure there was/is code to set a default bdi for
> non-bdev file systems. It appears to be missing, and that's not good. So
> options include:
>
> - Add the appropriate per-sb bdi for these file systems (right fix), or
> - Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least
>   ensure that data gets flushed (quick fix)
>
> I'll slap together a set of fixes for this.

Here's a series for fixing these. At this point they are totally
untested except that I did compile them. Note that your analysis
appeared correct for all cases but ocfs2, which does use get_sb_bdev()
and hence gets ->s_bdi assigned.

You can see them here, I'll post the series soon:

http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus

The first patch is a helper addition, the rest are per-fs fixups.
On Thu, 2010-04-22 at 12:39 +0200, Jens Axboe wrote:
>
> Here's a series for fixing these. At this point they are totally
> untested except that I did compile them. Note that your analysis
> appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> and hence gets ->s_bdi assigned.
>
> You can see them here, I'll post the series soon:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
>
> The first patch is a helper addition, the rest are per-fs fixups.

Do you want to include Jörn's addition of same to get_sb_mtd_set(), with
my Acked-By: David Woodhouse <David.Woodhouse@intel.com> ?
On Thu, Apr 22 2010, David Woodhouse wrote:
> On Thu, 2010-04-22 at 12:39 +0200, Jens Axboe wrote:
> >
> > Here's a series for fixing these. At this point they are totally
> > untested except that I did compile them. Note that your analysis
> > appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> > and hence gets ->s_bdi assigned.
> >
> > You can see them here, I'll post the series soon:
> >
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
> >
> > The first patch is a helper addition, the rest are per-fs fixups.
>
> Do you want to include Jörn's addition of same to get_sb_mtd_set(), with
> my Acked-By: David Woodhouse <David.Woodhouse@intel.com> ?

Yeah, will do. I also really wanted to provide a WARN and mount fail if
we get it wrong, but I don't see an easy way to do that. Basically I'd
want to check whether the storage backing is volatile or not.
On Thu, 22 April 2010 12:39:53 +0200, Jens Axboe wrote:
>
> Here's a series for fixing these. At this point they are totally
> untested except that I did compile them. Note that your analysis
> appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> and hence gets ->s_bdi assigned.
>
> You can see them here, I'll post the series soon:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
>
> The first patch is a helper addition, the rest are per-fs fixups.

Looks good at a cursory glance. What's still missing is some sort of
assertion. You are a smart person and missed this problem, twice even.
Even if you hadn't, a not so smart person can add a new filesystem and
miss s_bdi, like I did. We want some automatism to catch this.

Jörn
On Thu, Apr 22 2010, Jörn Engel wrote:
> On Thu, 22 April 2010 12:39:53 +0200, Jens Axboe wrote:
> >
> > Here's a series for fixing these. At this point they are totally
> > untested except that I did compile them. Note that your analysis
> > appeared correct for all cases but ocfs2, which does use get_sb_bdev()
> > and hence gets ->s_bdi assigned.
> >
> > You can see them here, I'll post the series soon:
> >
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus
> >
> > The first patch is a helper addition, the rest are per-fs fixups.
>
> Looks good at a cursory glance. What's still missing is some sort of
> assertion. You are a smart person and missed this problem, twice even.
> Even if you hadn't, a not so smart person can add a new filesystem and
> miss s_bdi, like I did. We want some automatism to catch this.

I totally agree, we want some way to catch this problem in the future.
Really the check needs to be something ala:

	if (!sb->s_bdi && mnt_has_storage_backing() && rw_mount)
		yell_and_fail;

but I'm not sure how best to accomplish that. We can check for ->s_bdev
and mtd like you did, but that does not catch network file systems for
instance.
On Thu, 22 April 2010 14:08:37 +0200, Jens Axboe wrote:
>
> I totally agree, we want some way to catch this problem in the future.
> Really the check needs to be something ala:
>
> 	if (!sb->s_bdi && mnt_has_storage_backing() && rw_mount)
> 		yell_and_fail;
>
> but I'm not sure how best to accomplish that. We can check for ->s_bdev
> and mtd like you did, but that does not catch network file systems for
> instance.

One way would be to either add a flag to all safe filesystems or create
a noop_bdi for them. It adds a line of code and some bytes[*] per
instance to most filesystems, but that's the only catchall I can think
of. I guess if no one comes up with a better plan I'll look into that.

[*] Maybe we can steal a bit somewhere to make it less expensive.

Jörn
diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index af8b42e..7c00319 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -13,6 +13,7 @@
 #include <linux/mtd/super.h>
 #include <linux/namei.h>
 #include <linux/ctype.h>
+#include <linux/slab.h>
 
 /*
  * compare superblocks to see if they're equivalent
@@ -44,6 +45,7 @@ static int get_sb_mtd_set(struct super_block *sb, void *_mtd)
 
 	sb->s_mtd = mtd;
 	sb->s_dev = MKDEV(MTD_BLOCK_MAJOR, mtd->index);
+	sb->s_bdi = mtd->backing_dev_info;
 	return 0;
 }
diff --git a/fs/super.c b/fs/super.c
index f35ac60..e8af253 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -954,10 +954,12 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (error < 0)
 		goto out_free_secdata;
 	BUG_ON(!mnt->mnt_sb);
+	BUG_ON(!mnt->mnt_sb->s_bdi &&
+	       (mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd));
 
-	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
-	if (error)
-		goto out_sb;
+	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
+	if (error)
+		goto out_sb;
 
 	/*
 	 * filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE