diff mbox series

ext4: remove checks for data=ordered and journal_async_commit options

Message ID 20240320105808.1184826-1-ks77sj@gmail.com
State New
Headers show
Series ext4: remove checks for data=ordered and journal_async_commit options | expand

Commit Message

Jongseok Kim March 20, 2024, 10:58 a.m. UTC
On Tue, Nov 25, 2014 at 04:56:15PM +0100, Jan Kara wrote:
> Option journal_async_commit breaks gurantees of data=ordered mode as it
> sends only a single cache flush after writing a transaction commit
> block. Thus even though the transaction including the commit block is
> fully stored on persistent storage, file data may still linger in drives
> caches and will be lost on power failure. Since all checksums match on
> journal recovery, we replay the transaction thus possibly exposing stale
> user data.
>
> To fix this data exposure issue, remove the possibility to use
> journal_async_commit in data=ordered mode.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

In employing the journal_async_commit feature, the approach involves
slightly early submission of a request for a transaction commit block
without the inclusion of PREFLUSH and FUA. Instead, once all the blocks of
the transaction have been written, the journal device is flushed once.
This methodology, even under data=ordered mode, due to the procedure of
flushing the j_fs_dev, guarantees that the file data is permanently stored
by the time the commit block request is initiated.

This discussion pertains to scenarios where the file data device and
the journal device are distinct. If the devices are the same,
then all file data is written before the flush of the journal device,
making a single flush sufficient in this context.

Consequently, it remains entirely permissible to activate
the journal_async_commit option, even when operating in ordered mode.

Should my interpretation deviate in any manner,
I earnestly request your guidance and correction.

Signed-off-by: Jongseok Kim <ks77sj@gmail.com>
---
 fs/ext4/super.c | 14 --------------
 1 file changed, 14 deletions(-)

Comments

Jan Kara March 20, 2024, 12:03 p.m. UTC | #1
On Wed 20-03-24 19:58:08, Jongseok Kim wrote:
> On Tue, Nov 25, 2014 at 04:56:15PM +0100, Jan Kara wrote:
> > Option journal_async_commit breaks gurantees of data=ordered mode as it
> > sends only a single cache flush after writing a transaction commit
> > block. Thus even though the transaction including the commit block is
> > fully stored on persistent storage, file data may still linger in drives
> > caches and will be lost on power failure. Since all checksums match on
> > journal recovery, we replay the transaction thus possibly exposing stale
> > user data.
> >
> > To fix this data exposure issue, remove the possibility to use
> > journal_async_commit in data=ordered mode.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> In employing the journal_async_commit feature, the approach involves
> slightly early submission of a request for a transaction commit block
> without the inclusion of PREFLUSH and FUA. Instead, once all the blocks of
> the transaction have been written, the journal device is flushed once.
> This methodology, even under data=ordered mode, due to the procedure of
> flushing the j_fs_dev, guarantees that the file data is permanently stored
> by the time the commit block request is initiated.
> 
> This discussion pertains to scenarios where the file data device and
> the journal device are distinct. If the devices are the same,
> then all file data is written before the flush of the journal device,
> making a single flush sufficient in this context.
> 
> Consequently, it remains entirely permissible to activate
> the journal_async_commit option, even when operating in ordered mode.

So I agree that when journal is on a different device jbd2 will flush the
data device before writing commit record so we are safe in data=ordered
mode even with async commit. Good idea. But:

1) For dealing with this complication we need to have a sensible usecase.
Do you have performance numbers from this combination that show async
commit brings noticeable performance benefit? Please include them in the
changelog.

2) The code should make sure we are really using external journal device
before allowing async commit in data=ordered mode.

Thanks!

								Honza


> Should my interpretation deviate in any manner,
> I earnestly request your guidance and correction.
> 
> Signed-off-by: Jongseok Kim <ks77sj@gmail.com>
> ---
>  fs/ext4/super.c | 14 --------------
>  1 file changed, 14 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index cfb8449c731f..2141c2eb4bf0 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4954,13 +4954,6 @@ static int ext4_load_and_init_journal(struct super_block *sb,
>  		break;
>  	}
>  
> -	if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA &&
> -	    test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
> -		ext4_msg(sb, KERN_ERR, "can't mount with "
> -			"journal_async_commit in data=ordered mode");
> -		goto out;
> -	}
> -
>  	set_task_ioprio(sbi->s_journal->j_task, ctx->journal_ioprio);
>  
>  	sbi->s_journal->j_submit_inode_data_buffers =
> @@ -6510,13 +6503,6 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
>  			err = -EINVAL;
>  			goto restore_opts;
>  		}
> -	} else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA) {
> -		if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
> -			ext4_msg(sb, KERN_ERR, "can't mount with "
> -				"journal_async_commit in data=ordered mode");
> -			err = -EINVAL;
> -			goto restore_opts;
> -		}
>  	}
>  
>  	if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_NO_MBCACHE) {
> -- 
> 2.34.1
>
diff mbox series

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index cfb8449c731f..2141c2eb4bf0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4954,13 +4954,6 @@  static int ext4_load_and_init_journal(struct super_block *sb,
 		break;
 	}
 
-	if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA &&
-	    test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
-		ext4_msg(sb, KERN_ERR, "can't mount with "
-			"journal_async_commit in data=ordered mode");
-		goto out;
-	}
-
 	set_task_ioprio(sbi->s_journal->j_task, ctx->journal_ioprio);
 
 	sbi->s_journal->j_submit_inode_data_buffers =
@@ -6510,13 +6503,6 @@  static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
 			err = -EINVAL;
 			goto restore_opts;
 		}
-	} else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA) {
-		if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
-			ext4_msg(sb, KERN_ERR, "can't mount with "
-				"journal_async_commit in data=ordered mode");
-			err = -EINVAL;
-			goto restore_opts;
-		}
 	}
 
 	if ((sbi->s_mount_opt ^ old_opts.s_mount_opt) & EXT4_MOUNT_NO_MBCACHE) {