| Message ID | 20251107144249.435029-22-libaokun@huaweicloud.com |
|---|---|
| State | Superseded |
| Series | ext4: enable block size larger than page size |
On Fri 07-11-25 22:42:46, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
>
> Currently, ext4_set_inode_mapping_order() does not set max folio order
> for files with the data journalling flag. For files that already have
> large folios enabled, ext4_inode_journal_mode() ignores the data
> journalling flag once max folio order is set.
>
> This is not because data journalling cannot work with large folios, but
> because credit estimates will go through the roof if there are too many
> blocks per folio.
>
> Since the real constraint is blocks-per-folio, to support data=journal
> under LBS, we now set max folio order to be equal to min folio order for
> files with the journalling flag. When LBS is disabled, the max folio order
> remains unset as before.
>
> Additionally, the max_order check in ext4_inode_journal_mode() is removed,
> and mapping order is reset in ext4_change_inode_journal_flag().
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

...

> @@ -6585,6 +6590,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>  		ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
>  	}
>  	ext4_set_aops(inode);
> +	ext4_set_inode_mapping_order(inode);
>
>  	jbd2_journal_unlock_updates(journal);
>  	ext4_writepages_up_write(inode->i_sb, alloc_ctx);

I think more needs to be done here because this way we could leave folios
in the page cache that would be now larger than max order. To simplify the
logic I'd make filemap_write_and_wait() call in
ext4_change_inode_journal_flag() unconditional and add there
truncate_pagecache() call to evict all the page cache before we switch the
inode journalling mode.

								Honza
On 2025-11-10 17:48, Jan Kara wrote:
> On Fri 07-11-25 22:42:46, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> Currently, ext4_set_inode_mapping_order() does not set max folio order
>> for files with the data journalling flag. For files that already have
>> large folios enabled, ext4_inode_journal_mode() ignores the data
>> journalling flag once max folio order is set.
>>
>> This is not because data journalling cannot work with large folios, but
>> because credit estimates will go through the roof if there are too many
>> blocks per folio.
>>
>> Since the real constraint is blocks-per-folio, to support data=journal
>> under LBS, we now set max folio order to be equal to min folio order for
>> files with the journalling flag. When LBS is disabled, the max folio order
>> remains unset as before.
>>
>> Additionally, the max_order check in ext4_inode_journal_mode() is removed,
>> and mapping order is reset in ext4_change_inode_journal_flag().
>>
>> Suggested-by: Jan Kara <jack@suse.cz>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ...
>
>> @@ -6585,6 +6590,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>>  		ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
>>  	}
>>  	ext4_set_aops(inode);
>> +	ext4_set_inode_mapping_order(inode);
>>
>>  	jbd2_journal_unlock_updates(journal);
>>  	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
> I think more needs to be done here because this way we could leave folios
> in the page cache that would be now larger than max order. To simplify the
> logic I'd make filemap_write_and_wait() call in
> ext4_change_inode_journal_flag() unconditional and add there
> truncate_pagecache() call to evict all the page cache before we switch the
> inode journalling mode.
>
> 								Honza

That makes sense. I forgot to truncate the old page cache here.

I will make the changes according to your suggestion in the next version.

Thank you for your advice!

Cheers,
Baokun
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index a0e66bc10093..05e5946ed9b3 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -16,8 +16,7 @@ int ext4_inode_journal_mode(struct inode *inode)
 	    ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
 	    test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
 	    (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
-	    !test_opt(inode->i_sb, DELALLOC) &&
-	    !mapping_large_folio_support(inode->i_mapping))) {
+	    !test_opt(inode->i_sb, DELALLOC))) {
 		/* We do not support data journalling for encrypted data */
 		if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode))
 			return EXT4_INODE_ORDERED_DATA_MODE;  /* ordered */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 22d215f90c64..517701024d18 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5152,9 +5152,6 @@ static bool ext4_should_enable_large_folio(struct inode *inode)

 	if (!S_ISREG(inode->i_mode))
 		return false;
-	if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
-	    ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
-		return false;
 	if (ext4_has_feature_verity(sb))
 		return false;
 	if (ext4_has_feature_encrypt(sb))
@@ -5172,12 +5169,20 @@ static bool ext4_should_enable_large_folio(struct inode *inode)
 	umin(MAX_PAGECACHE_ORDER, (11 + (i)->i_blkbits - PAGE_SHIFT))
 void ext4_set_inode_mapping_order(struct inode *inode)
 {
+	u32 max_order;
+
 	if (!ext4_should_enable_large_folio(inode))
 		return;

+	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
+	    ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA))
+		max_order = EXT4_SB(inode->i_sb)->s_min_folio_order;
+	else
+		max_order = EXT4_MAX_PAGECACHE_ORDER(inode);
+
 	mapping_set_folio_order_range(inode->i_mapping,
 				      EXT4_SB(inode->i_sb)->s_min_folio_order,
-				      EXT4_MAX_PAGECACHE_ORDER(inode));
+				      max_order);
 }

 struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
@@ -6585,6 +6590,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 		ext4_clear_inode_flag(inode, EXT4_INODE_JOURNAL_DATA);
 	}
 	ext4_set_aops(inode);
+	ext4_set_inode_mapping_order(inode);

 	jbd2_journal_unlock_updates(journal);
 	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
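
The blocks-per-folio argument in the commit message can be checked with a little arithmetic. The following is a userspace sketch only, not kernel code: all `sketch_*` names are hypothetical, the 4 KiB page size and the page-cache order cap of 9 are assumed values (the real MAX_PAGECACHE_ORDER depends on architecture and config), and the min folio order is assumed to be the smallest order that holds one block. Only the `umin(..., 11 + blkbits - PAGE_SHIFT)` expression and the journalled/non-journalled choice are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, page-cache order cap of 9. */
#define SKETCH_PAGE_SHIFT          12u
#define SKETCH_MAX_PAGECACHE_ORDER  9u

static uint32_t sketch_umin(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Assumed definition: smallest folio order that can hold one fs block. */
static uint32_t sketch_min_folio_order(uint32_t blkbits)
{
	return blkbits > SKETCH_PAGE_SHIFT ? blkbits - SKETCH_PAGE_SHIFT : 0;
}

/* Same expression as EXT4_MAX_PAGECACHE_ORDER() in the diff. */
static uint32_t sketch_max_folio_order(uint32_t blkbits)
{
	return sketch_umin(SKETCH_MAX_PAGECACHE_ORDER,
			   11 + blkbits - SKETCH_PAGE_SHIFT);
}

/* Mirrors the choice the patch adds: journalled data pins max to min. */
static uint32_t sketch_pick_max_order(uint32_t blkbits, int journal_data)
{
	return journal_data ? sketch_min_folio_order(blkbits)
			    : sketch_max_folio_order(blkbits);
}

/* Number of filesystem blocks covered by one folio of the given order. */
static uint32_t sketch_blocks_per_folio(uint32_t order, uint32_t blkbits)
{
	/* A folio below the min order cannot hold a whole block. */
	assert(order + SKETCH_PAGE_SHIFT >= blkbits);
	return 1u << (order + SKETCH_PAGE_SHIFT - blkbits);
}
```

Under these assumptions, a 64 KiB block size (blkbits = 16) gives a journalled max order of 4 and exactly one block per folio, while the non-journalled limit of order 9 allows 32 blocks per folio. With 1 KiB blocks the non-journalled limit reaches 2048 blocks per folio, which is the kind of growth the credit estimates would have to cover if data journalling used the full range.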