Catch filesystems lacking s_bdi

Submitter Jörn Engel
Date April 22, 2010, 8:33 p.m.
Message ID <20100422203358.GB30749@logfs.org>
Permalink /patch/50757/
State New

Comments

Jörn Engel - April 22, 2010, 8:33 p.m.
On Thu, 22 April 2010 18:27:10 +0200, Jens Axboe wrote:
> 
> > Jens - please help fix this up.
> 
> Of course, I already posted a series of patches to fix this up. I want
> to test them a bit, and I'll send them in tomorrow.

How about something like this to catch future cases?  It compiles and
survived a test boot, so it does seem to work for the common cases like
tmpfs, procfs, etc.

Jens, you know the bdi code 10x better than me, would this work?

Jörn
Jens Axboe - April 23, 2010, 10:05 a.m.
On Thu, Apr 22 2010, Jörn Engel wrote:
> On Thu, 22 April 2010 18:27:10 +0200, Jens Axboe wrote:
> > 
> > > Jens - please help fix this up.
> > 
> > Of course, I already posted a series of patches to fix this up. I want
> > to test them a bit, and I'll send them in tomorrow.
> 
> How about something like this to catch future cases?  It compiles and
> survived a test boot, so it does seem to work for the common cases like
> tmpfs, procfs, etc.
> 
> Jens, you know the bdi code 10x better than me, would this work?

Looks sane, it's a good start. I think we should augment that with a
check to ensure that we don't ever add dirty inodes to this bdi, since
it's not going to be flushed.

Something like a:

        WARN_ON(bdi == &noop_backing_dev_info);

to __mark_inode_dirty(). Looking at the code it should already trigger a
warning, since it'll check for BDI_CAP_NO_WRITEBACK (which isn't set
for noop_backing_dev_info) and the fact that noop-bdi isn't registered
to begin with.

So it's probably safe and good enough as-is, I'll add it. Thanks!
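
The reasoning above can be sketched in userspace C. This is a minimal model, not kernel code: struct bdi_model, CAP_NO_WRITEBACK_MODEL, dirty_would_warn_model() and the three example bdis are hypothetical names invented for illustration. It assumes the existing warning fires when a bdi is allowed to write back dirty data but has never been registered, which is how the message above describes it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the BDI_CAP_NO_WRITEBACK capability bit. */
#define CAP_NO_WRITEBACK_MODEL 0x1u

/* Hypothetical, heavily simplified model of struct backing_dev_info. */
struct bdi_model {
	const char *name;
	unsigned int capabilities;
	bool registered;	/* models "initialized and registered" */
};

/* Static placeholder: no capability flags set, never registered. */
static struct bdi_model noop_bdi_model = { .name = "noop" };

/* A bdi that has gone through init and registration. */
static struct bdi_model disk_bdi_model = { .name = "disk", .registered = true };

/* An in-memory filesystem that opts out of writeback entirely. */
static struct bdi_model ram_bdi_model = {
	.name = "ram",
	.capabilities = CAP_NO_WRITEBACK_MODEL,
};

/*
 * Models the condition described for __mark_inode_dirty(): warn when
 * the bdi may write back dirty data but was never registered.
 */
static bool dirty_would_warn_model(const struct bdi_model *bdi)
{
	bool can_write_back = !(bdi->capabilities & CAP_NO_WRITEBACK_MODEL);

	return can_write_back && !bdi->registered;
}
```

In this model the noop placeholder trips the check without an explicit comparison against its address, because it neither sets the no-writeback flag nor is registered. The registered disk bdi and the no-writeback ram bdi both pass.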
Jörn Engel - April 23, 2010, 8:55 p.m.
On Fri, 23 April 2010 12:05:32 +0200, Jens Axboe wrote:
> 
> So it's probably safe and good enough as-is, I'll add it. Thanks!

I cannot see this patch in your tree yet.  Could be the weekend or a
deliberate decision not to send this for 2.6.34-rc anymore.

In case it was a deliberate decision, can we please make it explicit?  I
don't like the idea of adding a BUG_ON() that potentially triggers for
thousands of people this late in the stabilization process - but it is
better than having people lose data.  Even if we already ran two stable
kernels that way.

Damned if you do, damned if you don't. :(

Jörn
Jens Axboe - April 26, 2010, 9:48 a.m.
On Fri, Apr 23 2010, Jörn Engel wrote:
> On Fri, 23 April 2010 12:05:32 +0200, Jens Axboe wrote:
> > 
> > So it's probably safe and good enough as-is, I'll add it. Thanks!
> 
> I cannot see this patch in your tree yet.  Could be the weekend or a
> deliberate decision not to send this for 2.6.34-rc anymore.

It's there, I put it in yesterday. It's definitely 2.6.34-rc material, I
hope to submit it tonight.

> In case it was a deliberate decision, can we please make it explicit?  I
> don't like the idea of adding a BUG_ON() that potentially triggers for
> thousands of people this late in the stabilization process - but it is
> better than having people lose data.  Even if we already ran two stable
> kernels that way.
> 
> Damned if you do, damned if you don't. :(

Yeah, it's a bad situation to be in. I changed that BUG_ON() to a
WARN_ON(). I'm not too worried about that part, I'm more worried about
the file system changes. OTOH, they do lack proper flushing now, so it's
likely not a huge risk from that perspective.
Jörn Engel - April 26, 2010, 2:32 p.m.
On Mon, 26 April 2010 11:48:11 +0200, Jens Axboe wrote:
> On Fri, Apr 23 2010, Jörn Engel wrote:
> > On Fri, 23 April 2010 12:05:32 +0200, Jens Axboe wrote:
> > > 
> > > So it's probably safe and good enough as-is, I'll add it. Thanks!
> > 
> > I cannot see this patch in your tree yet.  Could be the weekend or a
> > deliberate decision not to send this for 2.6.34-rc anymore.
> 
> It's there, I put it in yesterday. It's definitely 2.6.34-rc material, I
> hope to submit it tonight.

Ok, thanks.

> > In case it was a deliberate decision, can we please make it explicit?  I
> > don't like the idea of adding a BUG_ON() that potentially triggers for
> > thousands of people this late in the stabilization process - but it is
> > better than having people lose data.  Even if we already ran two stable
> > kernels that way.
> > 
> > Damned if you do, damned if you don't. :(
> 
> Yeah, it's a bad situation to be in. I changed that BUG_ON() to a
> WARN_ON(). I'm not too worried about that part, I'm more worried about
> the file system changes. OTOH, they do lack proper flushing now, so it's
> likely not a huge risk from that perspective.

It is worse still.  Using mtd->backing_dev_info results in 
kernel BUG at fs/fs-writeback.c:157

which is BUG_ON(!work->seen); in bdi_queue_work().  Logfs is affected
and I bet jffs2 is as well.  So much for dwmw2 or me actually testing
the fix. :(

I did a hexdump to see what sb->s_bdi actually contained and the result
was this:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00                          ........

Which should be mtd_bdi_unmappable.  And at this point I have to admit
being clueless.  What exactly should a struct backing_dev_info contain
and for what purposes?  And where is this documented?

Jörn
Jens Axboe - April 26, 2010, 2:38 p.m.
On Mon, Apr 26 2010, Jörn Engel wrote:
> > Yeah, it's a bad situation to be in. I changed that BUG_ON() to a
> > WARN_ON(). I'm not too worried about that part, I'm more worried about
> > the file system changes. OTOH, they do lack proper flushing now, so it's
> > likely not a huge risk from that perspective.
> 
> It is worse still.  Using mtd->backing_dev_info results in 
> kernel BUG at fs/fs-writeback.c:157
> 
> which is BUG_ON(!work->seen); in bdi_queue_work().  Logfs is affected
> and I bet jffs2 is as well.  So much for dwmw2 or me actually testing
> the fix. :(
> 
> I did a hexdump to see what sb->s_bdi actually contained and the result
> was this:
> [hexdump snipped]
> 
> Which should be mtd_bdi_unmappable.  And at this point I have to admit
> being clueless.  What exactly should a struct backing_dev_info contain
> and for what purposes?  And where is this documented?

The important bit is that each bdi must be initialized and registered if
it's going to be handling dirty data, it can't just be a static
placeholder. See the bdi_setup_and_register() helper I added.
Jens Axboe - April 26, 2010, 2:45 p.m.
On Mon, Apr 26 2010, Jens Axboe wrote:
> On Mon, Apr 26 2010, Jörn Engel wrote:
> > > Yeah, it's a bad situation to be in. I changed that BUG_ON() to a
> > > WARN_ON(). I'm not too worried about that part, I'm more worried about
> > > the file system changes. OTOH, they do lack proper flushing now, so it's
> > > likely not a huge risk from that perspective.
> > 
> > It is worse still.  Using mtd->backing_dev_info results in 
> > kernel BUG at fs/fs-writeback.c:157
> > 
> > which is BUG_ON(!work->seen); in bdi_queue_work().  Logfs is affected
> > and I bet jffs2 is as well.  So much for dwmw2 or me actually testing
> > the fix. :(
> > 
> > I did a hexdump to see what sb->s_bdi actually contained and the result
> > was this:
> > [hexdump snipped]
> > 
> > Which should be mtd_bdi_unmappable.  And at this point I have to admit
> > being clueless.  What exactly should a struct backing_dev_info contain
> > and for what purposes?  And where is this documented?
> 
> The important bit is that each bdi must be initialized and registered if
> it's going to be handling dirty data, it can't just be a static
> placeholder. See the bdi_setup_and_register() helper I added.

Took a quick look, and you want bdi_setup_and_register() for the three
bdis listed in mtdbdi.c in mtdcore.c:init_mtd(). Or manual bdi_init()
and bdi_register(). I'll take a look post-dinner. Either is workable,
but since the flags are already set up, the latter may be cleaner.
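
The "manual init plus register" option for statically declared bdis might look roughly like the userspace sketch below. Everything here is a hypothetical model: struct bdi_model, bdi_init_model(), bdi_register_model(), init_mtd_bdis_model(), the capability values and the registration names are all invented for illustration, and none of it reflects the real signatures of bdi_init() or bdi_register(). The point is only the ordering: the capability flags stay as declared, and each bdi is initialized and then registered once at subsystem init time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct backing_dev_info. */
struct bdi_model {
	const char *name;
	unsigned int capabilities;	/* preset in the static declaration */
	bool initialized;
	bool registered;
};

/* Models bdi_init(): set up internal state. */
static int bdi_init_model(struct bdi_model *bdi)
{
	bdi->initialized = true;
	return 0;
}

/* Models bdi_register(): refuse a bdi that was never initialized. */
static int bdi_register_model(struct bdi_model *bdi, const char *name)
{
	if (!bdi->initialized)
		return -1;
	bdi->name = name;
	bdi->registered = true;
	return 0;
}

/* Three static bdis whose flags are already set (values are made up). */
static struct bdi_model mtd_bdis_model[3] = {
	{ .capabilities = 0x1 },
	{ .capabilities = 0x2 },
	{ .capabilities = 0x4 },
};

/* Models what a subsystem init function could do; returns 0 on success. */
static int init_mtd_bdis_model(void)
{
	/* Registration names are placeholders chosen for this sketch. */
	static const char *const names[3] = {
		"mtd-unmap", "mtd-romap", "mtd-rwmap",
	};

	for (size_t i = 0; i < 3; i++) {
		int ret = bdi_init_model(&mtd_bdis_model[i]);

		if (!ret)
			ret = bdi_register_model(&mtd_bdis_model[i], names[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

Because the flags live in the static initializers, the init path in this model never touches them. A combined setup helper that also assigns the capabilities would have to be handed the same values a second time.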

Patch

diff --git a/fs/super.c b/fs/super.c
index f35ac60..dc72491 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -693,6 +693,7 @@  int set_anon_super(struct super_block *s, void *data)
 		return -EMFILE;
 	}
 	s->s_dev = MKDEV(0, dev & MINORMASK);
+	s->s_bdi = &noop_backing_dev_info;
 	return 0;
 }
 
@@ -954,10 +955,11 @@  vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (error < 0)
 		goto out_free_secdata;
 	BUG_ON(!mnt->mnt_sb);
+	BUG_ON(!mnt->mnt_sb->s_bdi);
 
- 	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
- 	if (error)
- 		goto out_sb;
+	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
+	if (error)
+		goto out_sb;
 
 	/*
 	 * filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE
diff --git a/fs/sync.c b/fs/sync.c
index fc5c3d7..92b2281 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -14,6 +14,7 @@ 
 #include <linux/pagemap.h>
 #include <linux/quotaops.h>
 #include <linux/buffer_head.h>
+#include <linux/backing-dev.h>
 #include "internal.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
@@ -32,7 +33,7 @@  static int __sync_filesystem(struct super_block *sb, int wait)
 	 * This should be safe, as we require bdi backing to actually
 	 * write out data in the first place
 	 */
-	if (!sb->s_bdi)
+	if (!sb->s_bdi || sb->s_bdi == &noop_backing_dev_info)
 		return 0;
 
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index fcbc26a..f4a1436 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -246,6 +246,7 @@  int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
 #endif
 
 extern struct backing_dev_info default_backing_dev_info;
+extern struct backing_dev_info noop_backing_dev_info;
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page);
 
 int writeback_in_progress(struct backing_dev_info *bdi);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index f13e067..4aba836 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -25,6 +25,11 @@  struct backing_dev_info default_backing_dev_info = {
 };
 EXPORT_SYMBOL_GPL(default_backing_dev_info);
 
+struct backing_dev_info noop_backing_dev_info = {
+	.name		= "noop",
+};
+EXPORT_SYMBOL_GPL(noop_backing_dev_info);
+
 static struct class *bdi_class;
 
 /*