diff mbox series

[2/2] jbd2: increase the journal IO's priority

Message ID 20231125121740.1035816-2-yi.zhang@huaweicloud.com
State Superseded
Headers show
Series [1/2] jbd2: correct the printing of write_flags in jbd2_write_superblock() | expand

Commit Message

Zhang Yi Nov. 25, 2023, 12:17 p.m. UTC
From: Zhang Yi <yi.zhang@huawei.com>

Current jbd2 only add REQ_SYNC for descriptor block, metadata log
buffer, commit buffer and superblock buffer, the submitted IO could be
throttled by writeback throttle in block layer, that could lead to
priority inversion in some cases. The log IO looks like a kind of high
priority metadata IO, so it should not be throttled by WBT like QOS
policies in block layer, let's add REQ_SYNC | REQ_IDLE to exempt from
writeback throttle, and also add REQ_META together indicates it's a
metadata IO.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/commit.c     |  8 ++++----
 fs/jbd2/journal.c    | 20 +++++++++++---------
 include/linux/jbd2.h |  3 +++
 3 files changed, 18 insertions(+), 13 deletions(-)

Comments

Jan Kara Nov. 27, 2023, 4:11 p.m. UTC | #1
Hello!

On Sat 25-11-23 20:17:39, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Current jbd2 only add REQ_SYNC for descriptor block, metadata log
> buffer, commit buffer and superblock buffer, the submitted IO could be
> throttled by writeback throttle in block layer, that could lead to
> priority inversion in some cases. The log IO looks like a kind of high
> priority metadata IO, so it should not be throttled by WBT like QOS
> policies in block layer, let's add REQ_SYNC | REQ_IDLE to exempt from
> writeback throttle, and also add REQ_META together indicates it's a
> metadata IO.

So I agree about REQ_META but with REQ_IDLE I'm not so sure. __REQ_IDLE is
documented as "anticipate more IO after this one" so that is a good fit for
normal transaction writes but not so much for journal superblock writes.
OTOH as far as I'm checking XFS uses REQ_IDLE as well for its log IO and
the only places where REQ_IDLE is really checked is in blk-wbt... So I
guess this makes sense.

> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 52772c826c86..f7e8274b46ae 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1374,6 +1374,9 @@ JBD2_FEATURE_INCOMPAT_FUNCS(csum2,		CSUM_V2)
>  JBD2_FEATURE_INCOMPAT_FUNCS(csum3,		CSUM_V3)
>  JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit,	FAST_COMMIT)
>  
> +/* Journal high priority write IO operation flags */
> +#define JBD2_REQ_HIPRIO		(REQ_META | REQ_SYNC | REQ_IDLE)

Since all journal IO is submitted with these flags I'd maybe name this
JBD2_JOURNAL_REQ_FLAGS? Otherwise the patch looks good to me so feel free
to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
Zhang Yi Nov. 28, 2023, 3:32 a.m. UTC | #2
Hello!

On 2023/11/28 0:11, Jan Kara wrote:
> Hello!
> 
> On Sat 25-11-23 20:17:39, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Current jbd2 only add REQ_SYNC for descriptor block, metadata log
>> buffer, commit buffer and superblock buffer, the submitted IO could be
>> throttled by writeback throttle in block layer, that could lead to
>> priority inversion in some cases. The log IO looks like a kind of high
>> priority metadata IO, so it should not be throttled by WBT like QOS
>> policies in block layer, let's add REQ_SYNC | REQ_IDLE to exempt from
>> writeback throttle, and also add REQ_META together indicates it's a
>> metadata IO.
> 
> So I agree about REQ_META but with REQ_IDLE I'm not so sure. __REQ_IDLE is
> documented as "anticipate more IO after this one" so that is a good fit for
> normal transaction writes but not so much for journal superblock writes.
> OTOH as far as I'm checking XFS uses REQ_IDLE as well for its log IO and
> the only places where REQ_IDLE is really checked is in blk-wbt... So I
> guess this makes sense.
> 

Thanks a lot for your review. Yeah, We've checked the block cgroup related
qos policies and blk-wbt, block cgroup based policies does not throttle IO
with REQ_META, and blk-wbt exempt IO with both REQ_IDLE and REQ_SYNC. But
submit_bh() can make sure the journal IO is always bind to the root cgroup,
so only blk-wbt is left for us to deal with, so I add REQ_IDLE like xfs does.

>> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
>> index 52772c826c86..f7e8274b46ae 100644
>> --- a/include/linux/jbd2.h
>> +++ b/include/linux/jbd2.h
>> @@ -1374,6 +1374,9 @@ JBD2_FEATURE_INCOMPAT_FUNCS(csum2,		CSUM_V2)
>>  JBD2_FEATURE_INCOMPAT_FUNCS(csum3,		CSUM_V3)
>>  JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit,	FAST_COMMIT)
>>  
>> +/* Journal high priority write IO operation flags */
>> +#define JBD2_REQ_HIPRIO		(REQ_META | REQ_SYNC | REQ_IDLE)
> 
> Since all journal IO is submitted with these flags I'd maybe name this
> JBD2_JOURNAL_REQ_FLAGS? Otherwise the patch looks good to me so feel free
> to add:
> 

Sure, JBD2_JOURNAL_REQ_FLAGS looks fine, I will use it in v2.

Thanks,
Yi.
diff mbox series

Patch

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 8d6f934c3d95..58e4d4cbdf88 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -119,7 +119,7 @@  static int journal_submit_commit_record(journal_t *journal,
 	struct commit_header *tmp;
 	struct buffer_head *bh;
 	struct timespec64 now;
-	blk_opf_t write_flags = REQ_OP_WRITE | REQ_SYNC;
+	blk_opf_t write_flags = REQ_OP_WRITE | JBD2_REQ_HIPRIO;
 
 	*cbh = NULL;
 
@@ -395,8 +395,7 @@  void jbd2_journal_commit_transaction(journal_t *journal)
 		 */
 		jbd2_journal_update_sb_log_tail(journal,
 						journal->j_tail_sequence,
-						journal->j_tail,
-						REQ_SYNC);
+						journal->j_tail, 0);
 		mutex_unlock(&journal->j_checkpoint_mutex);
 	} else {
 		jbd2_debug(3, "superblock not updated\n");
@@ -715,6 +714,7 @@  void jbd2_journal_commit_transaction(journal_t *journal)
 
 			for (i = 0; i < bufs; i++) {
 				struct buffer_head *bh = wbuf[i];
+
 				/*
 				 * Compute checksum.
 				 */
@@ -727,7 +727,7 @@  void jbd2_journal_commit_transaction(journal_t *journal)
 				clear_buffer_dirty(bh);
 				set_buffer_uptodate(bh);
 				bh->b_end_io = journal_end_buffer_io_sync;
-				submit_bh(REQ_OP_WRITE | REQ_SYNC, bh);
+				submit_bh(REQ_OP_WRITE | JBD2_REQ_HIPRIO, bh);
 			}
 			cond_resched();
 
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e7aa47a02d4d..f2921d728dcc 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1100,8 +1100,7 @@  int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block)
 	 * space and if we lose sb update during power failure we'd replay
 	 * old transaction with possibly newly overwritten data.
 	 */
-	ret = jbd2_journal_update_sb_log_tail(journal, tid, block,
-					      REQ_SYNC | REQ_FUA);
+	ret = jbd2_journal_update_sb_log_tail(journal, tid, block, REQ_FUA);
 	if (ret)
 		goto out;
 
@@ -1768,8 +1767,7 @@  static int journal_reset(journal_t *journal)
 		 */
 		jbd2_journal_update_sb_log_tail(journal,
 						journal->j_tail_sequence,
-						journal->j_tail,
-						REQ_SYNC | REQ_FUA);
+						journal->j_tail, REQ_FUA);
 		mutex_unlock(&journal->j_checkpoint_mutex);
 	}
 	return jbd2_journal_start_thread(journal);
@@ -1791,6 +1789,11 @@  static int jbd2_write_superblock(journal_t *journal, blk_opf_t write_flags)
 		return -EIO;
 	}
 
+	/*
+	 * Always set high priority flags to exempt from block layer's
+	 * QOS policies, e.g. writeback throttle.
+	 */
+	write_flags |= JBD2_REQ_HIPRIO;
 	if (!(journal->j_flags & JBD2_BARRIER))
 		write_flags &= ~(REQ_FUA | REQ_PREFLUSH);
 
@@ -2045,7 +2048,7 @@  void jbd2_journal_update_sb_errno(journal_t *journal)
 	jbd2_debug(1, "JBD2: updating superblock error (errno %d)\n", errcode);
 	sb->s_errno    = cpu_to_be32(errcode);
 
-	jbd2_write_superblock(journal, REQ_SYNC | REQ_FUA);
+	jbd2_write_superblock(journal, REQ_FUA);
 }
 EXPORT_SYMBOL(jbd2_journal_update_sb_errno);
 
@@ -2166,8 +2169,7 @@  int jbd2_journal_destroy(journal_t *journal)
 				++journal->j_transaction_sequence;
 			write_unlock(&journal->j_state_lock);
 
-			jbd2_mark_journal_empty(journal,
-					REQ_SYNC | REQ_PREFLUSH | REQ_FUA);
+			jbd2_mark_journal_empty(journal, REQ_PREFLUSH | REQ_FUA);
 			mutex_unlock(&journal->j_checkpoint_mutex);
 		} else
 			err = -EIO;
@@ -2468,7 +2470,7 @@  int jbd2_journal_flush(journal_t *journal, unsigned int flags)
 	 * the magic code for a fully-recovered superblock.  Any future
 	 * commits of data to the journal will restore the current
 	 * s_start value. */
-	jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA);
+	jbd2_mark_journal_empty(journal, REQ_FUA);
 
 	if (flags)
 		err = __jbd2_journal_erase(journal, flags);
@@ -2514,7 +2516,7 @@  int jbd2_journal_wipe(journal_t *journal, int write)
 	if (write) {
 		/* Lock to make assertions happy... */
 		mutex_lock_io(&journal->j_checkpoint_mutex);
-		jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA);
+		jbd2_mark_journal_empty(journal, REQ_FUA);
 		mutex_unlock(&journal->j_checkpoint_mutex);
 	}
 
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 52772c826c86..f7e8274b46ae 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1374,6 +1374,9 @@  JBD2_FEATURE_INCOMPAT_FUNCS(csum2,		CSUM_V2)
 JBD2_FEATURE_INCOMPAT_FUNCS(csum3,		CSUM_V3)
 JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit,	FAST_COMMIT)
 
+/* Journal high priority write IO operation flags */
+#define JBD2_REQ_HIPRIO		(REQ_META | REQ_SYNC | REQ_IDLE)
+
 /*
  * Journal flag definitions
  */