Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"

Message ID 20190201044219.12802-1-tytso@mit.edu
State Awaiting Upstream

Commit Message

Theodore Ts'o Feb. 1, 2019, 4:42 a.m.
This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.

As Jan Kara pointed out, this change was unsafe since it means we lose
the call to sync_mapping_buffers() in the nojournal case.  The
original point of the commit was to avoid taking the inode mutex (since
it causes a lockdep warning in generic/113); but we need the mutex in
order to call sync_mapping_buffers().

The real fix to this problem was discussed here:

https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org

The proposed patch was to fix a syzbot complaint, but the problem can
also be demonstrated via "kvm-xfstests -c nojournal generic/113".
Multiple solutions were discussed in the e-mail thread, but none have
landed in the kernel as of this writing.  Anyway, commit
ad211f3e94b314 is absolutely the wrong way to suppress the lockdep
warning, so revert it.

Fixes: ad211f3e94b314a910d4af03178a0b52a7d1ee0a ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/fsync.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

Comments

Jan Kara Feb. 1, 2019, 9:21 p.m. | #1
On Thu 31-01-19 23:42:19, Theodore Ts'o wrote:
> This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.
> 
> As Jan Kara pointed out, this change was unsafe since it means we lose
> the call to sync_mapping_buffers() in the nojournal case.  The
> original point of the commit was to avoid taking the inode mutex (since
> it causes a lockdep warning in generic/113); but we need the mutex in
> order to call sync_mapping_buffers().

Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
these days). It uses blkdev_mapping->private_lock for synchronization of
operations on the list of buffers and fsync_buffers_list() seems to be
pretty careful about races with mark_buffer_dirty_inode(). So why do you
think we need i_rwsem?

> The real fix to this problem was discussed here:
> 
> https://lore.kernel.org/lkml/20181025150540.259281-4-bvanassche@acm.org
> 
> The proposed patch was to fix a syzbot complaint, but the problem can
> also be demonstrated via "kvm-xfstests -c nojournal generic/113".
> Multiple solutions were discussed in the e-mail thread, but none have
> landed in the kernel as of this writing.  Anyway, commit
> ad211f3e94b314 is absolutely the wrong way to suppress the lockdep
> warning, so revert it.
> 
> Fixes: ad211f3e94b314a910d4af03178a0b52a7d1ee0a ("ext4: use ext4_write_inode() when fsyncing w/o a journal")
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Reported-by: Jan Kara <jack@suse.cz>

So if you decide to take the safe route of reverting the change, I'm fine
with that, so feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/fsync.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 712f00995390..5508baa11bb6 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -116,16 +116,8 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  		goto out;
>  	}
>  
> -	ret = file_write_and_wait_range(file, start, end);
> -	if (ret)
> -		return ret;
> -
>  	if (!journal) {
> -		struct writeback_control wbc = {
> -			.sync_mode = WB_SYNC_ALL
> -		};
> -
> -		ret = ext4_write_inode(inode, &wbc);
> +		ret = __generic_file_fsync(file, start, end, datasync);
>  		if (!ret)
>  			ret = ext4_sync_parent(inode);
>  		if (test_opt(inode->i_sb, BARRIER))
> @@ -133,6 +125,9 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  		goto out;
>  	}
>  
> +	ret = file_write_and_wait_range(file, start, end);
> +	if (ret)
> +		return ret;
>  	/*
>  	 * data=writeback,ordered:
>  	 *  The caller's filemap_fdatawrite()/wait will sync the data.
> -- 
> 2.19.1
>

Theodore Ts'o Feb. 2, 2019, 4:08 a.m. | #2
On Fri, Feb 01, 2019 at 10:21:20PM +0100, Jan Kara wrote:
> On Thu 31-01-19 23:42:19, Theodore Ts'o wrote:
> > This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.
> > 
> > As Jan Kara pointed out, this change was unsafe since it means we lose
> > the call to sync_mapping_buffers() in the nojournal case.  The
> > original point of the commit was to avoid taking the inode mutex (since
> > it causes a lockdep warning in generic/113); but we need the mutex in
> > order to call sync_mapping_buffers().
> 
> Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
> these days). It uses blkdev_mapping->private_lock for synchronization of
> operations on the list of buffers and fsync_buffers_list() seems to be
> pretty careful about races with mark_buffer_dirty_inode(). So why do you
> think we need i_rwsem?

Hmm, I think you're right.  I wonder if we can therefore remove the
inode_lock() in __generic_file_fsync() then...   What do you think?

     			       		      	 - Ted

Jan Kara Feb. 4, 2019, 9:45 a.m. | #3
On Fri 01-02-19 23:08:11, Theodore Y. Ts'o wrote:
> On Fri, Feb 01, 2019 at 10:21:20PM +0100, Jan Kara wrote:
> > On Thu 31-01-19 23:42:19, Theodore Ts'o wrote:
> > > This reverts commit ad211f3e94b314a910d4af03178a0b52a7d1ee0a.
> > > 
> > > As Jan Kara pointed out, this change was unsafe since it means we lose
> > > the call to sync_mapping_buffers() in the nojournal case.  The
> > > original point of the commit was to avoid taking the inode mutex (since
> > > it causes a lockdep warning in generic/113); but we need the mutex in
> > > order to call sync_mapping_buffers().
> > 
> > Actually, I don't think sync_mapping_buffers() needs inode mutex (i_rwsem
> > these days). It uses blkdev_mapping->private_lock for synchronization of
> > operations on the list of buffers and fsync_buffers_list() seems to be
> > pretty careful about races with mark_buffer_dirty_inode(). So why do you
> > think we need i_rwsem?
> 
> Hmm, I think you're right.  I wonder if we can therefore remove the
> inode_lock() in __generic_file_fsync() then...   What do you think?

That's actually a good question. I was thinking about why we have
inode_lock() in __generic_file_fsync().  The only reason I could come up
with is that when fsync(2) races with write(2) or truncate(2), with
inode_lock() in __generic_file_fsync() you will get either the old or the
new metadata state on disk. Without inode_lock() you could get some
intermediate metadata state and thus, after a crash, may not be able to
see even the old data. We are on thin ice here regarding how much data
consistency we provide after a crash for non-journalling filesystems. It
is never going to be perfect, but this change would seem like a noticeable
regression to me. What do you think?

								Honza

Patch

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 712f00995390..5508baa11bb6 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -116,16 +116,8 @@  int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}
 
-	ret = file_write_and_wait_range(file, start, end);
-	if (ret)
-		return ret;
-
 	if (!journal) {
-		struct writeback_control wbc = {
-			.sync_mode = WB_SYNC_ALL
-		};
-
-		ret = ext4_write_inode(inode, &wbc);
+		ret = __generic_file_fsync(file, start, end, datasync);
 		if (!ret)
 			ret = ext4_sync_parent(inode);
 		if (test_opt(inode->i_sb, BARRIER))
@@ -133,6 +125,9 @@  int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}
 
+	ret = file_write_and_wait_range(file, start, end);
+	if (ret)
+		return ret;
 	/*
 	 * data=writeback,ordered:
 	 *  The caller's filemap_fdatawrite()/wait will sync the data.