
[HACK] fs/super.c: sync ro remount after blocking writers

Message ID 1391095614-21554-1-git-send-email-andrew.ruder@elecsyscorp.com
State RFC

Commit Message

Andrew Ruder Jan. 30, 2014, 3:26 p.m. UTC
Move sync_filesystem() after sb_prepare_remount_readonly().  If writers
sneak in anywhere between sync_filesystem() and sb_prepare_remount_readonly(),
they can dirty inodes and cause writeback to occur well after
sys_mount() has completed successfully.

This was spotted through corrupted ubifs filesystems on reboot, but it appears
that it can cause issues with any filesystem that uses writeback.
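The window can be illustrated with a deterministic, single-threaded toy model. This is a user-space sketch, not kernel code: struct toy_sb and every toy_* function are hypothetical stand-ins for the superblock, sync_filesystem(), sb_prepare_remount_readonly() and the final s_flags update, and the "racing" writer is simply scheduled at a fixed point instead of running concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a superblock. */
struct toy_sb {
	bool readonly_remount;	/* models sb->s_readonly_remount */
	bool rdonly;		/* models MS_RDONLY in sb->s_flags */
	int dirty;		/* dirty items awaiting writeback */
};

/* A writer dirties data unless writers are already being refused. */
static bool toy_write(struct toy_sb *sb)
{
	if (sb->readonly_remount || sb->rdonly)
		return false;
	sb->dirty++;
	return true;
}

/* Models sync_filesystem(): write back everything dirty right now. */
static void toy_sync(struct toy_sb *sb)
{
	sb->dirty = 0;
}

/* Models sb_prepare_remount_readonly() succeeding (no open writers). */
static void toy_prepare_ro(struct toy_sb *sb)
{
	sb->readonly_remount = true;
}

/* Models the end of do_remount_sb(): set read-only, drop the guard. */
static void toy_finish_ro(struct toy_sb *sb)
{
	assert(sb->readonly_remount);	/* prepare step must have run */
	sb->rdonly = true;
	sb->readonly_remount = false;
}

/*
 * Remount read-only with one writer arriving before writers are blocked.
 * Returns how many dirty items remain on the now read-only filesystem.
 */
static int toy_remount_ro(bool sync_after_block)
{
	struct toy_sb sb = { .dirty = 1 };

	if (!sync_after_block)
		toy_sync(&sb);		/* old order: sync before blocking writers */
	toy_write(&sb);			/* a writer sneaks in here */
	toy_prepare_ro(&sb);		/* from now on toy_write() is refused */
	if (sync_after_block)
		toy_sync(&sb);		/* patched order: sync after blocking writers */
	toy_finish_ro(&sb);
	return sb.dirty;
}
```

With the old order, toy_remount_ro(false) leaves one dirty item on a filesystem that is already marked read-only, which is the late writeback described above; with the patched order, toy_remount_ro(true) leaves none.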

Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Richard Weinberger <richard@nod.at>
Co-authored-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Ruder <andrew.ruder@elecsyscorp.com>
---

I marked this as a hack because there is still a race condition in the
force == 1 case.  mark_files_ro() never actually blocks new writers, not
even while the filesystem-specific remount code runs, because nothing
ever sets sb->s_readonly_remount on that path.
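The remaining force == 1 race can be sketched with a similar toy model. Again this is hypothetical user-space code: struct toy_sb_files and the toy_* helpers only mimic mark_files_ro() stripping write access from files that are already open, while a new open for write is refused only when s_readonly_remount or MS_RDONLY is set. The set_flag parameter merely shows what setting the guard would change in this toy; it is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_FILES 8

/* Hypothetical toy superblock with a small table of open files. */
struct toy_sb_files {
	bool readonly_remount;		/* models sb->s_readonly_remount */
	bool rdonly;			/* models MS_RDONLY in sb->s_flags */
	bool open_rw[TOY_MAX_FILES];	/* models FMODE_WRITE per open file */
	int nfiles;
};

/* A new open for write is refused only if one of the two flags is set. */
static bool toy_open_rw(struct toy_sb_files *sb)
{
	if (sb->readonly_remount || sb->rdonly)
		return false;
	assert(sb->nfiles < TOY_MAX_FILES);
	sb->open_rw[sb->nfiles++] = true;
	return true;
}

/* Models mark_files_ro(): only files open right now lose write access. */
static void toy_mark_files_ro(struct toy_sb_files *sb)
{
	for (int i = 0; i < sb->nfiles; i++)
		sb->open_rw[i] = false;
}

/* Counts files that can still write. */
static int toy_writable_files(const struct toy_sb_files *sb)
{
	int n = 0;

	for (int i = 0; i < sb->nfiles; i++)
		n += sb->open_rw[i];
	return n;
}

/* Forced remount-ro with a writer arriving before MS_RDONLY is set. */
static int toy_forced_remount(bool set_flag)
{
	struct toy_sb_files sb = { 0 };

	toy_open_rw(&sb);		/* a file already open for write */
	toy_mark_files_ro(&sb);
	if (set_flag)
		sb.readonly_remount = true;
	toy_open_rw(&sb);		/* new writer during ->remount_fs() */
	return toy_writable_files(&sb);
}
```

toy_forced_remount(false) ends with one writable file even though the remount was forced, matching the description above; with the guard set, the late open is refused.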

 fs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Richard Weinberger Jan. 31, 2014, 8:20 a.m. UTC | #1
On 30.01.2014 16:26, Andrew Ruder wrote:
> Move sync_filesystem() after sb_prepare_remount_readonly().  If writers
> sneak in anywhere from sync_filesystem() to sb_prepare_remount_readonly()
> it can cause inodes to be dirtied and writeback to occur well after
> sys_mount() has completed successfully.
> 
> This was spotted by corrupted ubifs filesystems on reboot, but it appears
> that it can cause issues with any filesystem using writeback.

Link to original report:
http://lists.infradead.org/pipermail/linux-mtd/2014-January/051651.html

What we see is that writeback still happens after mounting the fs ro.

Thanks,
//richard
Christoph Hellwig Feb. 3, 2014, 10:23 a.m. UTC | #2
On Thu, Jan 30, 2014 at 09:26:54AM -0600, Andrew Ruder wrote:
> Move sync_filesystem() after sb_prepare_remount_readonly().  If writers
> sneak in anywhere from sync_filesystem() to sb_prepare_remount_readonly()
> it can cause inodes to be dirtied and writeback to occur well after
> sys_mount() has completed successfully.
> 
> This was spotted by corrupted ubifs filesystems on reboot, but it appears
> that it can cause issues with any filesystem using writeback.

From the link that Richard posted it seems like you have a testcase.
Can you please integrate it into xfstests so that we can properly
regression test for this issue from now on?
Richard Weinberger Feb. 28, 2014, 9:25 a.m. UTC | #3
Andrew,

On 03.02.2014 11:23, Christoph Hellwig wrote:
> On Thu, Jan 30, 2014 at 09:26:54AM -0600, Andrew Ruder wrote:
>> Move sync_filesystem() after sb_prepare_remount_readonly().  If writers
>> sneak in anywhere from sync_filesystem() to sb_prepare_remount_readonly()
>> it can cause inodes to be dirtied and writeback to occur well after
>> sys_mount() has completed successfully.
>>
>> This was spotted by corrupted ubifs filesystems on reboot, but it appears
>> that it can cause issues with any filesystem using writeback.
> 
> From the link that Richard posted it seems like you have a testcase.
> Can you please integrate it into xfstests so that we can properly
> regression test for this issue from now on?

Is it possible to create such a test case?
I don't know whether it is possible to trigger the issue on
a regular filesystem.
But as hch noted, it would be nice to have. :)

Thanks,
//richard
Andrew Ruder March 3, 2014, 9:12 p.m. UTC | #4
On Fri, Feb 28, 2014 at 10:25:24AM +0100, Richard Weinberger wrote:
> Is it possible to create such a test case?  I don't know whether it is
> possible to trigger the issue on a regular filesystem.  But as hch
> noted, it would be nice to have. :)

I've given this some thought since receiving the email, and the issue
would not be easy to recreate.  As best as I can tell, the test case
would be something along the lines of:

1. # Start up a filesystem writer
2. mount -o remount,ro /testfilesystem
3. # Immediately mark block device as read only
4. # Reboot, show that the filesystem is dirty next time around.

I'm not sure how to do step 3, and step 4 makes for the world's most
painful test case.  The scenario from Leon Pollak's thread (remount,ro
followed by enabling hardware read-only) is probably the easiest way to
recreate these issues, and I'm just not sure it belongs in a test case.

- Andy
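Steps 2 and 3 of the outline above could be driven from C roughly as follows. This is a sketch under assumptions: remount_readonly() and set_bdev_readonly() are hypothetical helper names, both calls need CAP_SYS_ADMIN, and using the BLKROSET ioctl (the one blockdev --setro issues) as a software stand-in for hardware write protection is an assumption that has not been checked against the behaviour in Leon Pollak's report. Steps 1 and 4 are not covered.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <unistd.h>

#ifndef BLKROSET
#define BLKROSET _IO(0x12, 93)	/* same value as in <linux/fs.h> */
#endif

/* Step 2: equivalent of "mount -o remount,ro <mountpoint>". */
static int remount_readonly(const char *mountpoint)
{
	/* For MS_REMOUNT the source and fstype arguments are ignored. */
	return mount(NULL, mountpoint, NULL, MS_REMOUNT | MS_RDONLY, NULL);
}

/*
 * Step 3 (assumption): mark the block device read-only in software.
 * This only approximates a hardware write-protect switch.
 */
static int set_bdev_readonly(const char *devpath)
{
	int ro = 1;
	int fd = open(devpath, O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;
	ret = ioctl(fd, BLKROSET, &ro);
	close(fd);
	return ret;
}
```

Both helpers return -1 and leave errno set on failure, so a test harness would call remount_readonly() right after starting the writer and set_bdev_readonly() immediately afterwards.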

Patch

diff --git a/fs/super.c b/fs/super.c
index 0225c20..1912090d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -735,59 +735,59 @@  rescan:
 int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 {
 	int retval;
 	int remount_ro;
 
 	if (sb->s_writers.frozen != SB_UNFROZEN)
 		return -EBUSY;
 
 #ifdef CONFIG_BLOCK
 	if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev))
 		return -EACCES;
 #endif
 
 	if (flags & MS_RDONLY)
 		acct_auto_close(sb);
 	shrink_dcache_sb(sb);
-	sync_filesystem(sb);
 
 	remount_ro = (flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY);
 
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
 	if (remount_ro) {
 		if (force) {
 			mark_files_ro(sb);
 		} else {
 			retval = sb_prepare_remount_readonly(sb);
 			if (retval)
 				return retval;
 		}
 	}
+	sync_filesystem(sb);
 
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		if (retval) {
 			if (!force)
 				goto cancel_readonly;
 			/* If forced remount, go ahead despite any errors */
 			WARN(1, "forced remount of a %s fs returned %i\n",
 			     sb->s_type->name, retval);
 		}
 	}
 	sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK);
 	/* Needs to be ordered wrt mnt_is_readonly() */
 	smp_wmb();
 	sb->s_readonly_remount = 0;
 
 	/*
 	 * Some filesystems modify their metadata via some other path than the
 	 * bdev buffer cache (eg. use a private mapping, or directories in
 	 * pagecache, etc). Also file data modifications go via their own
 	 * mappings. So If we try to mount readonly then copy the filesystem
 	 * from bdev, we could get stale data, so invalidate it to give a best
 	 * effort at coherency.
 	 */
 	if (remount_ro && sb->s_bdev)
 		invalidate_bdev(sb->s_bdev);
 	return 0;
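The smp_wmb() in this function orders the s_flags update before s_readonly_remount is cleared. Assuming the reader side, mnt_is_readonly(), tests s_readonly_remount first, issues smp_rmb(), and only then checks the superblock's read-only flag, the pairing can be sketched in user space with C11 fences. struct toy_sb_flags, TOY_MS_RDONLY and both functions are hypothetical; the release and acquire fences stand in for smp_wmb() and smp_rmb() and are somewhat stronger than those barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_MS_RDONLY 1UL	/* stand-in for MS_RDONLY */

/* Hypothetical toy superblock holding the two fields the barrier orders. */
struct toy_sb_flags {
	atomic_ulong s_flags;
	atomic_int s_readonly_remount;
};

/* Writer side, as in do_remount_sb(): publish flags, then drop the guard. */
static void toy_finish_remount(struct toy_sb_flags *sb, unsigned long flags)
{
	atomic_store_explicit(&sb->s_flags, flags, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* plays smp_wmb() */
	atomic_store_explicit(&sb->s_readonly_remount, 0, memory_order_relaxed);
}

/* Reader side: check the guard first, then the flags. */
static bool toy_mnt_is_readonly(struct toy_sb_flags *sb)
{
	if (atomic_load_explicit(&sb->s_readonly_remount, memory_order_relaxed))
		return true;
	atomic_thread_fence(memory_order_acquire);	/* plays smp_rmb() */
	return atomic_load_explicit(&sb->s_flags, memory_order_relaxed) &
	       TOY_MS_RDONLY;
}
```

A reader that observes the 0 stored by toy_finish_remount() is guaranteed by the fence pairing to also observe the new s_flags, so it never sees a window in which neither the guard nor the read-only flag is visible.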