Message ID | 20231125121740.1035816-2-yi.zhang@huaweicloud.com |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] jbd2: correct the printing of write_flags in jbd2_write_superblock() | expand |
Hello! On Sat 25-11-23 20:17:39, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@huawei.com> > > Current jbd2 only add REQ_SYNC for descriptor block, metadata log > buffer, commit buffer and superblock buffer, the submitted IO could be > throttled by writeback throttle in block layer, that could lead to > priority inversion in some cases. The log IO looks like a kind of high > priority metadata IO, so it should not be throttled by WBT like QOS > policies in block layer, let's add REQ_SYNC | REQ_IDLE to exempt from > writeback throttle, and also add REQ_META together indicates it's a > metadata IO. So I agree about REQ_META but with REQ_IDLE I'm not so sure. __REQ_IDLE is documented as "anticipate more IO after this one" so that is a good fit for normal transaction writes but not so much for journal superblock writes. OTOH as far as I'm checking XFS uses REQ_IDLE as well for its log IO and the only places where REQ_IDLE is really checked is in blk-wbt... So I guess this makes sense. > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h > index 52772c826c86..f7e8274b46ae 100644 > --- a/include/linux/jbd2.h > +++ b/include/linux/jbd2.h > @@ -1374,6 +1374,9 @@ JBD2_FEATURE_INCOMPAT_FUNCS(csum2, CSUM_V2) > JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3) > JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT) > > +/* Journal high priority write IO operation flags */ > +#define JBD2_REQ_HIPRIO (REQ_META | REQ_SYNC | REQ_IDLE) Since all journal IO is submitted with these flags I'd maybe name this JBD2_JOURNAL_REQ_FLAGS? Otherwise the patch looks good to me so feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza
Hello! On 2023/11/28 0:11, Jan Kara wrote: > Hello! > > On Sat 25-11-23 20:17:39, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@huawei.com> >> >> Current jbd2 only add REQ_SYNC for descriptor block, metadata log >> buffer, commit buffer and superblock buffer, the submitted IO could be >> throttled by writeback throttle in block layer, that could lead to >> priority inversion in some cases. The log IO looks like a kind of high >> priority metadata IO, so it should not be throttled by WBT like QOS >> policies in block layer, let's add REQ_SYNC | REQ_IDLE to exempt from >> writeback throttle, and also add REQ_META together indicates it's a >> metadata IO. > > So I agree about REQ_META but with REQ_IDLE I'm not so sure. __REQ_IDLE is > documented as "anticipate more IO after this one" so that is a good fit for > normal transaction writes but not so much for journal superblock writes. > OTOH as far as I'm checking XFS uses REQ_IDLE as well for its log IO and > the only places where REQ_IDLE is really checked is in blk-wbt... So I > guess this makes sense. > Thanks a lot for your review. Yeah, We've checked the block cgroup related qos policies and blk-wbt, block cgroup based policies does not throttle IO with REQ_META, and blk-wbt exempt IO with both REQ_IDLE and REQ_SYNC. But submit_bh() can make sure the journal IO is always bind to the root cgroup, so only blk-wbt is left for us to deal with, so I add REQ_IDLE like xfs does. >> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h >> index 52772c826c86..f7e8274b46ae 100644 >> --- a/include/linux/jbd2.h >> +++ b/include/linux/jbd2.h >> @@ -1374,6 +1374,9 @@ JBD2_FEATURE_INCOMPAT_FUNCS(csum2, CSUM_V2) >> JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3) >> JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT) >> >> +/* Journal high priority write IO operation flags */ >> +#define JBD2_REQ_HIPRIO (REQ_META | REQ_SYNC | REQ_IDLE) > > Since all journal IO is submitted with these flags I'd maybe name this > JBD2_JOURNAL_REQ_FLAGS? Otherwise the patch looks good to me so feel free > to add: > Sure, JBD2_JOURNAL_REQ_FLAGS looks fine, I will use it in v2. Thanks, Yi.
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 8d6f934c3d95..58e4d4cbdf88 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -119,7 +119,7 @@ static int journal_submit_commit_record(journal_t *journal, struct commit_header *tmp; struct buffer_head *bh; struct timespec64 now; - blk_opf_t write_flags = REQ_OP_WRITE | REQ_SYNC; + blk_opf_t write_flags = REQ_OP_WRITE | JBD2_REQ_HIPRIO; *cbh = NULL; @@ -395,8 +395,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) */ jbd2_journal_update_sb_log_tail(journal, journal->j_tail_sequence, - journal->j_tail, - REQ_SYNC); + journal->j_tail, 0); mutex_unlock(&journal->j_checkpoint_mutex); } else { jbd2_debug(3, "superblock not updated\n"); @@ -715,6 +714,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) for (i = 0; i < bufs; i++) { struct buffer_head *bh = wbuf[i]; + /* * Compute checksum. */ @@ -727,7 +727,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) clear_buffer_dirty(bh); set_buffer_uptodate(bh); bh->b_end_io = journal_end_buffer_io_sync; - submit_bh(REQ_OP_WRITE | REQ_SYNC, bh); + submit_bh(REQ_OP_WRITE | JBD2_REQ_HIPRIO, bh); } cond_resched(); diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index e7aa47a02d4d..f2921d728dcc 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1100,8 +1100,7 @@ int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block) * space and if we lose sb update during power failure we'd replay * old transaction with possibly newly overwritten data. */ - ret = jbd2_journal_update_sb_log_tail(journal, tid, block, - REQ_SYNC | REQ_FUA); + ret = jbd2_journal_update_sb_log_tail(journal, tid, block, REQ_FUA); if (ret) goto out; @@ -1768,8 +1767,7 @@ static int journal_reset(journal_t *journal) */ jbd2_journal_update_sb_log_tail(journal, journal->j_tail_sequence, - journal->j_tail, - REQ_SYNC | REQ_FUA); + journal->j_tail, REQ_FUA); mutex_unlock(&journal->j_checkpoint_mutex); } return jbd2_journal_start_thread(journal); @@ -1791,6 +1789,11 @@ static int jbd2_write_superblock(journal_t *journal, blk_opf_t write_flags) return -EIO; } + /* + * Always set high priority flags to exempt from block layer's + * QOS policies, e.g. writeback throttle. + */ + write_flags |= JBD2_REQ_HIPRIO; if (!(journal->j_flags & JBD2_BARRIER)) write_flags &= ~(REQ_FUA | REQ_PREFLUSH); @@ -2045,7 +2048,7 @@ void jbd2_journal_update_sb_errno(journal_t *journal) jbd2_debug(1, "JBD2: updating superblock error (errno %d)\n", errcode); sb->s_errno = cpu_to_be32(errcode); - jbd2_write_superblock(journal, REQ_SYNC | REQ_FUA); + jbd2_write_superblock(journal, REQ_FUA); } EXPORT_SYMBOL(jbd2_journal_update_sb_errno); @@ -2166,8 +2169,7 @@ int jbd2_journal_destroy(journal_t *journal) ++journal->j_transaction_sequence; write_unlock(&journal->j_state_lock); - jbd2_mark_journal_empty(journal, - REQ_SYNC | REQ_PREFLUSH | REQ_FUA); + jbd2_mark_journal_empty(journal, REQ_PREFLUSH | REQ_FUA); mutex_unlock(&journal->j_checkpoint_mutex); } else err = -EIO; @@ -2468,7 +2470,7 @@ int jbd2_journal_flush(journal_t *journal, unsigned int flags) * the magic code for a fully-recovered superblock. Any future * commits of data to the journal will restore the current * s_start value. */ - jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA); + jbd2_mark_journal_empty(journal, REQ_FUA); if (flags) err = __jbd2_journal_erase(journal, flags); @@ -2514,7 +2516,7 @@ int jbd2_journal_wipe(journal_t *journal, int write) if (write) { /* Lock to make assertions happy... */ mutex_lock_io(&journal->j_checkpoint_mutex); - jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA); + jbd2_mark_journal_empty(journal, REQ_FUA); mutex_unlock(&journal->j_checkpoint_mutex); } diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 52772c826c86..f7e8274b46ae 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1374,6 +1374,9 @@ JBD2_FEATURE_INCOMPAT_FUNCS(csum2, CSUM_V2) JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3) JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT) +/* Journal high priority write IO operation flags */ +#define JBD2_REQ_HIPRIO (REQ_META | REQ_SYNC | REQ_IDLE) + /* * Journal flag definitions */