
[RFC] fs: only call sync_filesystem() when remounting read-only

Message ID 20140308160818.GC11633@thunk.org
State Superseded, archived

Commit Message

Theodore Ts'o March 8, 2014, 4:08 p.m. UTC
On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> TL;DR: we experience long temporary hangs when doing multiple mount -o
> remount at the same time as other I/O on an ext4 filesystem.
> 
> When starting hundreds of LXC containers simultaneously on a system, the
> boot of some containers was hanging. We tracked this down to an
> initscript's use of mount -o remount, which was hanging in D state.
> 
> We reproduced the problem outside of LXC, with the script available at
> [0]. That script initiates 1000 mount -o remount, and performs some
> writes using a big cp to the same filesystem during the remounts....

+linux-fsdevel since the patch modifies fs/super.c

Lucas, can you try this patch?  I'm pretty sure this is what's going
on.  It turns out each "mount -o remount" is implying an fsync(), so
your test case is identical to copying a large file while having
thousands of processes calling syncfs() on the file system, with the
predictable results.

Folks on linux-fsdevel, any objections if I carry this patch in the
ext4 tree?  I don't think it should cause problems for other file
systems, since any file system that tries to rely on the implied
syncfs() is going to be subject to races, but it might make such a
race condition bug much more visible...

					- Ted

commit 8862c3c69acc205b59b00baed67e50446e2fd093
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sat Mar 8 11:05:35 2014 -0500

    fs: only call sync_filesystem() when remounting read-only
    
    Currently "mount -o remount" always implies a syncfs() on the file
    system.  This can cause a problem if a workload calls "mount -o
    remount" many, many times while concurrent I/O is happening:
    
       http://article.gmane.org/gmane.comp.file-systems.ext4/42876
    
    Whether or not it would ever be sane for a workload to call "mount -o
    remount" gazillions of times when the remounts are effectively no-ops,
    it seems stupid for a remount to imply an fsync().
    
    It's possible that there is some file system which is relying on the
    implied fsync(), but that's arguably broken, since aside from the
    remount read-only case, there's nothing that will prevent other writes
    from sneaking in between the sync_filesystem() and the call to
    sb->s_op->remount_fs().
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Comments

Lucas Nussbaum March 10, 2014, 11:45 a.m. UTC | #1
On 08/03/14 at 11:08 -0500, Theodore Ts'o wrote:
> On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > remount at the same time as other I/O on an ext4 filesystem.
> > 
> > When starting hundreds of LXC containers simultaneously on a system, the
> > boot of some containers was hanging. We tracked this down to an
> > initscript's use of mount -o remount, which was hanging in D state.
> > 
> > We reproduced the problem outside of LXC, with the script available at
> > [0]. That script initiates 1000 mount -o remount, and performs some
> > writes using a big cp to the same filesystem during the remounts....
> 
> +linux-fsdevel since the patch modifies fs/super.c
> 
> Lucas, can you try this patch?  I'm pretty sure this is what's going
> on.  It turns out each "mount -o remount" is implying an fsync(), so
> your test case is identical to copying a large file while having
> thousands of processes calling syncfs() on the file system, with the
> predictable results.

Hi Ted,

I can confirm that:
1) the patch solves my problem
2) issuing 'sync' instead of 'mount -o remount' indeed exhibits the
   problem again

However, I'm curious: why would such a workload (multiple syncfs()
calls initiated during a write) block for several minutes on an ext4
filesystem? I've just tried again on ext3, and it's not a problem in
that case.

Lucas
Lucas Nussbaum March 10, 2014, 12:15 p.m. UTC | #2
Hi Ted,

That Cc: line:
Cc: linux-ext4@vger.kernel.org, "linux-fsdevel@vger.kernel.org Emmanuel Jeanvoine" <emmanuel.jeanvoine@inria.fr>
sounds wrong. You might want to re-send to linux-fsdevel@.

Thanks

Lucas

On 08/03/14 at 11:08 -0500, Theodore Ts'o wrote:
> On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > remount at the same time as other I/O on an ext4 filesystem.
> > 
> > When starting hundreds of LXC containers simultaneously on a system, the
> > boot of some containers was hanging. We tracked this down to an
> > initscript's use of mount -o remount, which was hanging in D state.
> > 
> > We reproduced the problem outside of LXC, with the script available at
> > [0]. That script initiates 1000 mount -o remount, and performs some
> > writes using a big cp to the same filesystem during the remounts....
> 
> +linux-fsdevel since the patch modifies fs/super.c
> 
> Lucas, can you try this patch?  I'm pretty sure this is what's going
> on.  It turns out each "mount -o remount" is implying an fsync(), so
> your test case is identical to copying a large file while having
> thousands of processes calling syncfs() on the file system, with the
> predictable results.
> 
> Folks on linux-fsdevel, any objections if I carry this patch in the
> ext4 tree?  I don't think it should cause problems for other file
> systems, since any file system that tries to rely on the implied
> syncfs() is going to be subject to races, but it might make such a
> race condition bug much more visible...
> 
> 					- Ted
> 
> commit 8862c3c69acc205b59b00baed67e50446e2fd093
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Sat Mar 8 11:05:35 2014 -0500
> 
>     fs: only call sync_filesystem() when remounting read-only
>     
>     Currently "mount -o remount" always implies a syncfs() on the file
>     system.  This can cause a problem if a workload calls "mount -o
>     remount" many, many times while concurrent I/O is happening:
>     
>        http://article.gmane.org/gmane.comp.file-systems.ext4/42876
>     
>     Whether or not it would ever be sane for a workload to call "mount -o
>     remount" gazillions of times when the remounts are effectively no-ops,
>     it seems stupid for a remount to imply an fsync().
>     
>     It's possible that there is some file system which is relying on the
>     implied fsync(), but that's arguably broken, since aside from the
>     remount read-only case, there's nothing that will prevent other writes
>     from sneaking in between the sync_filesystem() and the call to
>     sb->s_op->remount_fs().
>     
>     Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> 
> diff --git a/fs/super.c b/fs/super.c
> index 80d5cf2..0fc87ac 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -717,10 +717,9 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
>  			if (retval)
>  				return retval;
>  		}
> +		sync_filesystem(sb);
>  	}
>  
> -	sync_filesystem(sb);
> -
>  	if (sb->s_op->remount_fs) {
>  		retval = sb->s_op->remount_fs(sb, &flags, data);
>  		if (retval) {
>
Theodore Ts'o March 10, 2014, 2:41 p.m. UTC | #3
On Mon, Mar 10, 2014 at 12:45:08PM +0100, Lucas Nussbaum wrote:
> > Lucas, can you try this patch?  I'm pretty sure this is what's going
> > on.  It turns out each "mount -o remount" is implying an fsync(), so
> > your test case is identical to copying a large file while having
> > thousands of processes calling syncfs() on the file system, with the
> > predictable results.
> 
> Hi Ted,
> 
> I can confirm that:
> 1) the patch solves my problem
> 2) issuing 'sync' instead of 'mount -o remount' indeed exhibits the
>    problem again
> 
> However, I'm curious: why would such a workload (multiple syncfs()
> calls initiated during a write) block for several minutes on an ext4
> filesystem? I've just tried again on ext3, and it's not a problem in
> that case.

The reason is that ext3 is less careful than ext4.
ext3_sync_fs() simply tries to start a commit, and if a commit has
already been started, it does nothing.  So if you issue a gazillion
syncfs() calls with ext3, they are effectively no-ops.
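
Roughly, ext3_sync_fs() looks like this (paraphrased from memory rather
than copied from fs/ext3/super.c, so the details may differ slightly):

static int ext3_sync_fs(struct super_block *sb, int wait)
{
	tid_t target;

	/*
	 * Ask jbd to commit the running transaction, if any.  Back-to-back
	 * callers end up waiting on the same commit instead of each forcing
	 * its own flush.
	 */
	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
		if (wait)
			log_wait_commit(EXT3_SB(sb)->s_journal, target);
	}
	return 0;
}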

For ext4, each syncfs() call will result in a SYNC_CACHE flush being
sent to the disk:

	/*
	 * Data writeback is possible w/o journal transaction, so barrier must
	 * being sent at the end of the function. But we can skip it if
	 * transaction_commit will do it for us.
	 */
	target = jbd2_get_latest_transaction(sbi->s_journal);
	if (wait && sbi->s_journal->j_flags & JBD2_BARRIER &&
	    !jbd2_trans_will_send_data_barrier(sbi->s_journal, target))
		needs_barrier = true;
		.
		.
		.
	if (needs_barrier) {
		int err;
		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
		if (!ret)
			ret = err;
	}

We can debate whether or not this care is necessary, and since
syncfs() isn't terribly reliable, we could add hacks so that if a
syncfs() had been issued in the last 100ms, we could make it a
no-op, or some other horrible hack.
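
Just to illustrate the kind of hack I mean (completely hypothetical; the
s_last_syncfs field below does not exist), it would be something like:

static int ext4_sync_fs(struct super_block *sb, int wait)
{
	struct ext4_sb_info *sbi = EXT4_SB(sb);

	/* Pretend a syncfs() issued within the last 100ms did the work. */
	if (time_before(jiffies, sbi->s_last_syncfs + msecs_to_jiffies(100)))
		return 0;
	sbi->s_last_syncfs = jiffies;

	/* ... the normal sync path, including the cache flush ... */
	return 0;
}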

But given that these hacks are horrible, it's not clear that it's
worth it to do all of this just to accommodate a case where userspace
is doing something really stupid, whether it is issuing thousands of
syncfs() or "mount -o remount" requests per second.

Cheers,

						- Ted
Dave Chinner March 13, 2014, 12:36 a.m. UTC | #4
On Sat, Mar 08, 2014 at 11:08:18AM -0500, Theodore Ts'o wrote:
> On Wed, Mar 05, 2014 at 03:13:43PM +0100, Lucas Nussbaum wrote:
> > TL;DR: we experience long temporary hangs when doing multiple mount -o
> > remount at the same time as other I/O on an ext4 filesystem.
> > 
> > When starting hundreds of LXC containers simultaneously on a system, the
> > boot of some containers was hanging. We tracked this down to an
> > initscript's use of mount -o remount, which was hanging in D state.
> > 
> > We reproduced the problem outside of LXC, with the script available at
> > [0]. That script initiates 1000 mount -o remount, and performs some
> > writes using a big cp to the same filesystem during the remounts....
> 
> +linux-fsdevel since the patch modifies fs/super.c
> 
> Lucas, can you try this patch?  I'm pretty sure this is what's going
> on.  It turns out each "mount -o remount" is implying an fsync(), so
> your test case is identical to copying a large file while having
> thousands of processes calling syncfs() on the file system, with the
> predictable results.
> 
> Folks on linux-fsdevel, any objections if I carry this patch in the
> ext4 tree?  I don't think it should cause problems for other file
> systems, since any file system that tries to rely on the implied
> syncfs() is going to be subject to races, but it might make such a
> race condition bug much more visible...

IMO, I think that you should be looking to fix ext4 syncfs issues,
not changing the VFS behaviour that might cause subtle and unnoticed
problems for other filesystems. We should not be moving data
integrity operations without first auditing all of the filesystem
remount operations for issues.

Cheers,

Dave.
Theodore Ts'o March 13, 2014, 1:16 a.m. UTC | #5
On Thu, Mar 13, 2014 at 11:36:35AM +1100, Dave Chinner wrote:
> > Folks on linux-fsdevel, any objections if I carry this patch in the
> > ext4 tree?  I don't think it should cause problems for other file
> > systems, since any file system that tries to rely on the implied
> > syncfs() is going to be subject to races, but it might make such a
> > race condition bug much more visible...
> 
> IMO, I think that you should be looking to fix ext4 syncfs issues,
> not changing the VFS behaviour that might cause subtle and unnoticed
> problems for other filesystems. We should not be moving data
> integrity operations without first auditing all of the filesystem
> remount operations for issues.

The issue is that it's forcing a CACHE FLUSH even if we don't need to
force a journal commit, since it's possible that data writes could have
been sent to the disk without modifying fs metadata that would require a
commit.  So arguably what we're doing with ext4 is _correct_, whereas
with ext3 we would simply not call blkdev_issue_flush() in that
situation.

The issue is that if userspace executes a no-op remount, there
shouldn't be a reason to call sync_filesystem() at all.  But I'm also
not so sure that I should be that solicitous of a workload where
someone is calling thousands and thousands of no-op remounts.....

	   	   	     	 	      	    - Ted
Theodore Ts'o March 13, 2014, 3:14 a.m. UTC | #6
On Wed, Mar 12, 2014 at 09:16:29PM -0400, Theodore Ts'o wrote:
> > IMO, I think that you should be looking to fix ext4 syncfs issues,
> > not changing the VFS behaviour that might cause subtle and unnoticed
> > problems for other filesystems. We should not be moving data
> > integrity operations without first auditing all of the filesystem
> > remount operations for issues.
> 
> > The issue is that it's forcing a CACHE FLUSH even if we don't need to
> > force a journal commit, since it's possible that data writes could have
> > been sent to the disk without modifying fs metadata that would require a
> > commit.  So arguably what we're doing with ext4 is _correct_, whereas
> > with ext3 we would simply not call blkdev_issue_flush() in that
> > situation.

Doing some more digging, ext4 is currently interpreting syncfs() as
requiring a data integrity sync.  So we go through some extra work to
guarantee that we call blkdev_issue_flush(), even if a journal
commit is not required.

This change was made by Dmitry last June:

commit 06a407f13daf9e48f0ef7189c7e54082b53940c7
Author: Dmitry Monakhov <dmonakhov@openvz.org>
Date:   Wed Jun 12 22:25:07 2013 -0400

    ext4: fix data integrity for ext4_sync_fs
    
    Inode's data or non journaled quota may be written w/o journal so we
    _must_ send a barrier at the end of ext4_sync_fs. But it can be
    skipped if journal commit will do it for us.
    
    Also fix data integrity for nojournal mode.
    
    Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Both ext3 and xfs do *not* do this.

Looking more closely at the syncfs(2) manpage, it's not clear it
requires this:

       sync() causes all buffered modifications to file metadata and
       data to be written to the underlying filesystems.

       syncfs() is like sync(), but synchronizes just the filesystem
       containing file referred to by the open file descriptor fd.

Unlike the fsync(2) system call, it does *not* state that the data
flushed to the disk is guaranteed to be there after a crash, which I
suppose justifies ext3 and xfs's current behavior.

So the way I see it, we have three choices.

1)  Nowhere in the remount system call is it stated that it has
    ***any*** data integrity implications.   If you are making the rw->ro
    transition, sure, you'll need to flush out any pending changes.  But there
    doesn't seem to be any justification for requiring this if the
    remount is a no-op.   So I think changing the remount code path as I
    suggested is a valid option.

2) We could revert Dmitry's change from last June.  This would make
   ext4 work the same way as ext3 and xfs.  Which I think is also
   valid, since the syncfs(2) system call says nothing about
   guaranteeing data being preserved after a crash, unlike fsync(2).

3) We could say that a workload that calls thousands of no-op remounts
   is stupid/busted/silly, and not do anything at all.


#1 requires core VFS changes, and Dave seems unhappy with it.

#2 requires rolling back an ext4 change, and I wonder if Dmitry had a
situation where he really needed syncfs(2) to have data integrity
guarantees.

#3 is the default if we can't come to consensus over what else to do.

Cheers,

						- Ted
Dave Chinner March 13, 2014, 6:04 a.m. UTC | #7
On Wed, Mar 12, 2014 at 11:14:40PM -0400, Theodore Ts'o wrote:
> On Wed, Mar 12, 2014 at 09:16:29PM -0400, Theodore Ts'o wrote:
> > > IMO, I think that you should be looking to fix ext4 syncfs issues,
> > > not changing the VFS behaviour that might cause subtle and unnoticed
> > > problems for other filesystems. We should not be moving data
> > > integrity operations without first auditing all of the filesystem
> > > remount operations for issues.
> > 
> > The issue is that it's forcing a CACHE FLUSH even if we don't need to
> > force a journal commit, since it's possible that data writes could have
> > been sent to the disk without modifying fs metadata that would require a
> > commit.  So arguably what we're doing with ext4 is _correct_, whereas
> > with ext3 we would simply not call blkdev_issue_flush() in that
> > situation.
> 
> Doing some more digging, ext4 is currently interpreting syncfs() as
> requiring a data integrity sync.  So we go through some extra work to
> guarantee that we call blkdev_issue_flush(), even if a journal
> commit is not required.
> 
> This change was made by Dmitry last June:
> 
> commit 06a407f13daf9e48f0ef7189c7e54082b53940c7
> Author: Dmitry Monakhov <dmonakhov@openvz.org>
> Date:   Wed Jun 12 22:25:07 2013 -0400
> 
>     ext4: fix data integrity for ext4_sync_fs
>     
>     Inode's data or non journaled quota may be written w/o journal so we
>     _must_ send a barrier at the end of ext4_sync_fs. But it can be
>     skipped if journal commit will do it for us.
>     
>     Also fix data integrity for nojournal mode.
>     
>     Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
>     Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> 
> Both ext3 and xfs do *not* do this.

XFS most definitely considered sync_filesystem(sb) to be a data
integrity operation. xfs_fs_sync_fs() calls this:

	xfs_log_force(mp, XFS_LOG_SYNC);

Which will issue a blocking journal commit which uses
REQ_FLUSH|REQ_FUA for the journal writes. Hence if there was
anything dirty in the filesystem that sync_filesystem wrote to disk,
it will issue a cache flush just like ext4 does.

> Looking more closely at the syncfs(2) manpage, it's not clear it
> requires this:
> 
>        sync() causes all buffered modifications to file metadata and
>        data to be written to the underlying filesystems.
> 
>        syncfs() is like sync(), but synchronizes just the filesystem
>        containing file referred to by the open file descriptor fd.
> 
> Unlike the fsync(2) system call, it does *not* state that the data
> flushed to the disk is guaranteed to be there after a crash, which I
> suppose justifies ext3 and xfs's current behavior.

sync() data integrity is provided by sync_inodes_sb() followed by
->sync_fs().

syncfs() data integrity is provided by sync_inodes_sb() followed by
->sync_fs().

They are functionally identical and so provide the same guarantees.
That is the intent of the syncfs syscall, and there are lots of
people out there using it for data integrity purposes.
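
For reference, syncfs() ends up in sync_filesystem(), which looks
roughly like this (paraphrased from fs/sync.c, details elided; sync()
does the equivalent pair of passes across every superblock):

static int __sync_filesystem(struct super_block *sb, int wait)
{
	if (wait)
		sync_inodes_sb(sb);		/* data integrity pass */
	else
		writeback_inodes_sb(sb, WB_REASON_SYNC);

	if (sb->s_op->sync_fs)
		sb->s_op->sync_fs(sb, wait);
	return __sync_blockdev(sb->s_bdev, wait);
}

int sync_filesystem(struct super_block *sb)
{
	int ret;

	ret = __sync_filesystem(sb, 0);		/* async pass first */
	if (ret < 0)
		return ret;
	return __sync_filesystem(sb, 1);	/* then the waiting pass */
}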

> 1)  Nowhere in the remount system call is it stated that it has
>     ***any*** data integrity implications.   If you are making the rw->ro
>     transition, sure, you'll need to flush out any pending changes.  But there
>     doesn't seem to be any justification for requiring this if the
>     remount is a no-op.   So I think changing the remount code path as I
>     suggested is a valid option.

What the man page says doesn't change the fact we need to audit all
the existing filesystems before such a change is made.

> 2) We could revert Dmitry's change from last June.  This would make
>    ext4 work the same way as ext3 and xfs.  Which I think is also
>    valid, since the syncfs(2) system call says nothing about
>    guaranteeing data being preserved after a crash, unlike fsync(2).

It would make ext4 the same as ext3. XFS definitely guarantees
data integrity through syncfs()....

> 3) We could say that a workload that calls thousands of no-op remounts
>    is stupid/busted/silly, and not do anything at all.

Sure, but the reporter indicated that if he replaced the remount
with sync then the problem still existed. IOWs, you haven't fixed the
root cause of the problem, merely papered over a symptom.

> #1 requires core VFS changes, and Dave seems unhappy with it.

I'm unhappy with your approach to the change (i.e. no validation of
the assertions made about other filesystems), not with the actual
change.

So:

4) fix ext4 not to issue unnecessary cache flushes, or find and fix
whatever is actually causing sync to be slow (maybe some of these
issues: https://lwn.net/Articles/561569/).

Cheers,

Dave.
Christoph Hellwig March 13, 2014, 7:39 a.m. UTC | #8
On Thu, Mar 13, 2014 at 11:36:35AM +1100, Dave Chinner wrote:
> IMO, I think that you should be looking to fix ext4 syncfs issues,
> not changing the VFS behaviour that might cause subtle and unnoticed
> problems for other filesystems. We should not be moving data
> integrity operations without first auditing all of the filesystem
> remount operations for issues.

Requiring a sync on every remount that doesn't go read-only seems odd
to me, so removing it doesn't sound bad.  However, I agree that a proper
audit of filesystems should be done, e.g.:

  patch 1:
  	move calls into the filesystems, explaining why filesystems not
  	implementing ->remount_fs should be safe
  patch 2:
  	remove call from ext4, safe because of $FOO
  patch N:
  	remove call from $fs, safe because of $BAR

Theodore Ts'o March 13, 2014, 12:55 p.m. UTC | #9
On Thu, Mar 13, 2014 at 05:04:44PM +1100, Dave Chinner wrote:
> XFS most definitely considered sync_filesystem(sb) to be a data
> integrity operation. xfs_fs_sync_fs() calls this:
> 
> 	xfs_log_force(mp, XFS_LOG_SYNC);
> 
> Which will issue a blocking journal commit which uses
> REQ_FLUSH|REQ_FUA for the journal writes. Hence if there was
> anything dirty in the filesystem that sync_filesystem wrote to disk,
> it will issue a cache flush just like ext4 does.

Sorry, I didn't quite do my experiments right, apparently.

So I did the following test:

dd if=/etc/motd of=/mnt/test conv=notrunc ; mount -o remount /mnt ; mount -o remount /mnt ; mount -o remount /mnt

This is what XFS does:

252,2    0        1     0.000000000 15453  Q   W 88 + 8 [kworker/u16:1]
252,2    0        2     0.000230348 15453  C   W 88 + 8 [0]
252,2    7        1     0.000334892 15544  Q FWFSM 5243112 + 8 [mount]
252,2    0        3     0.251131828     0  C WFSM 5243112 + 8 [0]
252,2    0        4     0.251495890 15087  Q  WM 96 + 16 [xfsaild/dm-2]
252,2    0        5     0.251729470     0  C  WM 96 + 16 [0]
252,2    4        1     0.263317936 11070  Q FWFSM 5243120 + 8 [kworker/4:2]
252,2    0        6     0.273394400     0  C WFSM 5243120 + 8 [0]
252,2    0        7     0.273678692 15087  Q  WM 0 + 8 [xfsaild/dm-2]
252,2    0        8     0.273902550     0  C  WM 0 + 8 [0]
252,2    0        9     0.295673237     0  C WFSM 5243128 + 8 [0]
252,2    0       10     0.296035803 15087  Q  WM 0 + 8 [xfsaild/dm-2]
252,2    0       11     0.296266732     0  C  WM 0 + 8 [0]
252,2    7        2     0.286271844 24012  Q FWFSM 5243128 + 8 [kworker/7:0]

... and this is what ext4 does:

252,4    7        8    10.973326622 15512  Q  WM 78 + 2 [mount]
252,4    7        9    10.973576941 15512  Q FWS [mount]
252,4    2        1    10.973141488 15271  Q   W 108 + 2 [kworker/u16:3]
252,4    0       24    10.973390538     0  C   W 108 + 2 [0]
252,4    0       25    10.973548736     0  C  WM 78 + 2 [0]
252,4    7       10    11.244052462 15513  Q FWS [mount]
252,4    0       26    11.231292967     0  C FWS 0 [0]
252,4    0       27    11.231452643 15512  Q  WS 2 + 2 [mount]
252,4    0       28    11.231686794     0  C  WS 2 + 2 [0]
252,4    7       11    11.266337812 15514  Q FWS [mount]
252,4    0       29    11.253022650     0  C FWS 0 [0]
252,4    0       30    11.253135113 15513  Q  WS 2 + 2 [mount]
252,4    0       31    11.253376707     0  C  WS 2 + 2 [0]
252,4    0       32    11.266640135     0  C FWS 0 [0]
252,4    0       33    11.266727461 15514  Q  WS 2 + 2 [mount]
252,4    0       34    11.266954710     0  C  WS 2 + 2 [0]

> > 1)  Nowhere in the remount system call is it stated that it has
> >     ***any*** data integrity implications.   If you are making the rw->ro
> >     transition, sure, you'll need to flush out any pending changes.  But there
>     doesn't seem to be any justification for requiring this if the
> >     remount is a no-op.   So I think changing the remount code path as I
> >     suggested is a valid option.
> 
> What the man page says doesn't change the fact we need to audit all
> the existing filesystems before such a change is made.

Fair enough, I'll do what Christoph suggested and move the
sync_filesystem() call into all of the existing file systems.
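
That is, each ->remount_fs() implementation would start by syncing
itself, something like this for ext4 (a sketch only, not the actual
patch):

static int ext4_remount(struct super_block *sb, int *flags, char *data)
{
	/* Preserve the old VFS-level "remount implies sync" behaviour. */
	sync_filesystem(sb);

	/* ... existing ext4 remount logic, unchanged ... */
}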

		   	     	    - Ted

Patch

diff --git a/fs/super.c b/fs/super.c
index 80d5cf2..0fc87ac 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -717,10 +717,9 @@  int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 			if (retval)
 				return retval;
 		}
+		sync_filesystem(sb);
 	}
 
-	sync_filesystem(sb);
-
 	if (sb->s_op->remount_fs) {
 		retval = sb->s_op->remount_fs(sb, &flags, data);
 		if (retval) {
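
For context, with this hunk applied the surrounding logic in
do_remount_sb() ends up looking roughly like this (the surrounding code
is paraphrased from fs/super.c of that era and is not part of the patch):

	remount_ro = (flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY);

	if (remount_ro) {
		if (force) {
			sb->s_readonly_remount = 1;
			smp_wmb();
		} else {
			retval = sb_prepare_remount_readonly(sb);
			if (retval)
				return retval;
		}
		/* only a rw->ro transition still forces the sync */
		sync_filesystem(sb);
	}

	if (sb->s_op->remount_fs) {
		retval = sb->s_op->remount_fs(sb, &flags, data);
		/* ... error handling and the rest of the remount path ... */
	}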