Message ID | 20230616015547.3155195-1-yi.zhang@huaweicloud.com |
---|---|
State | Accepted |
Headers | show |
Series | [v2] jbd2: skip reading super block if it has been verified | expand |
On Fri 16-06-23 09:55:47, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@huawei.com> > > We got a NULL pointer dereference issue below while running generic/475 > I/O failure pressure test. > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 0 P4D 0 > Oops: 0002 [#1] PREEMPT SMP PTI > CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190 > RIP: 0010:jbd2_journal_set_features+0x13d/0x430 > ... > Call Trace: > <TASK> > ? __die+0x23/0x60 > ? page_fault_oops+0xa4/0x170 > ? exc_page_fault+0x67/0x170 > ? asm_exc_page_fault+0x26/0x30 > ? jbd2_journal_set_features+0x13d/0x430 > jbd2_journal_revoke+0x47/0x1e0 > __ext4_forget+0xc3/0x1b0 > ext4_free_blocks+0x214/0x2f0 > ext4_free_branches+0xeb/0x270 > ext4_ind_truncate+0x2bf/0x320 > ext4_truncate+0x1e4/0x490 > ext4_handle_inode_extension+0x1bd/0x2a0 > ? iomap_dio_complete+0xaf/0x1d0 > > The root cause is the journal super block had been failed to write out > due to I/O fault injection, it's uptodate bit was cleared by > end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock(). > And it raced by journal_get_superblock()->bh_read(), unfortunately, the > read IO is also failed, so the error handling in > journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer, > finally lead to above NULL pointer dereference issue. > > If the journal super block had been read and verified, there is no need > to call bh_read() read it again even if it has been failed to written > out. So the fix could be simply move buffer_verified(bh) in front of > bh_read(). Also remove a stale comment left in > jbd2_journal_check_used_features(). > > Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()") > Reported-by: Theodore Ts'o <tytso@mit.edu> > Signed-off-by: Zhang Yi <yi.zhang@huawei.com> This works as a workaround. It is a bit kludgy but for now I guess it is good enough. Thanks for the fix and feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > v1->v2: > - Remove a stale comment left in jbd2_journal_check_used_features(). > > fs/jbd2/journal.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c > index b5e57735ab3f..559242df0f9a 100644 > --- a/fs/jbd2/journal.c > +++ b/fs/jbd2/journal.c > @@ -1919,6 +1919,9 @@ static int journal_get_superblock(journal_t *journal) > bh = journal->j_sb_buffer; > > J_ASSERT(bh != NULL); > + if (buffer_verified(bh)) > + return 0; > + > err = bh_read(bh, 0); > if (err < 0) { > printk(KERN_ERR > @@ -1926,9 +1929,6 @@ static int journal_get_superblock(journal_t *journal) > goto out; > } > > - if (buffer_verified(bh)) > - return 0; > - > sb = journal->j_superblock; > > err = -EINVAL; > @@ -2229,7 +2229,6 @@ int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat, > > if (!compat && !ro && !incompat) > return 1; > - /* Load journal superblock if it is not loaded yet. */ > if (journal_get_superblock(journal)) > return 0; > if (!jbd2_format_support_feature(journal)) > -- > 2.39.2 >
On 2023/6/16 21:27, Jan Kara wrote: > On Fri 16-06-23 09:55:47, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@huawei.com> >> >> We got a NULL pointer dereference issue below while running generic/475 >> I/O failure pressure test. >> >> BUG: kernel NULL pointer dereference, address: 0000000000000000 >> #PF: supervisor write access in kernel mode >> #PF: error_code(0x0002) - not-present page >> PGD 0 P4D 0 >> Oops: 0002 [#1] PREEMPT SMP PTI >> CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190 >> RIP: 0010:jbd2_journal_set_features+0x13d/0x430 >> ... >> Call Trace: >> <TASK> >> ? __die+0x23/0x60 >> ? page_fault_oops+0xa4/0x170 >> ? exc_page_fault+0x67/0x170 >> ? asm_exc_page_fault+0x26/0x30 >> ? jbd2_journal_set_features+0x13d/0x430 >> jbd2_journal_revoke+0x47/0x1e0 >> __ext4_forget+0xc3/0x1b0 >> ext4_free_blocks+0x214/0x2f0 >> ext4_free_branches+0xeb/0x270 >> ext4_ind_truncate+0x2bf/0x320 >> ext4_truncate+0x1e4/0x490 >> ext4_handle_inode_extension+0x1bd/0x2a0 >> ? iomap_dio_complete+0xaf/0x1d0 >> >> The root cause is the journal super block had been failed to write out >> due to I/O fault injection, it's uptodate bit was cleared by >> end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock(). >> And it raced by journal_get_superblock()->bh_read(), unfortunately, the >> read IO is also failed, so the error handling in >> journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer, >> finally lead to above NULL pointer dereference issue. >> >> If the journal super block had been read and verified, there is no need >> to call bh_read() read it again even if it has been failed to written >> out. So the fix could be simply move buffer_verified(bh) in front of >> bh_read(). Also remove a stale comment left in >> jbd2_journal_check_used_features(). >> >> Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()") >> Reported-by: Theodore Ts'o <tytso@mit.edu> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> > > This works as a workaround. It is a bit kludgy but for now I guess it is > good enough. Thanks for the fix and feel free to add: > Thanks for the review. Yes, I suppose it's better to find a way to adjust the sequence of journal load and feature checking in ocfs2_check_volume(), so that we could completely remove the journal_get_superblock() in jbd2_journal_check_used_features(). Thanks, Yi. > > >> --- >> v1->v2: >> - Remove a stale comment left in jbd2_journal_check_used_features(). >> >> fs/jbd2/journal.c | 7 +++---- >> 1 file changed, 3 insertions(+), 4 deletions(-) >> >> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c >> index b5e57735ab3f..559242df0f9a 100644 >> --- a/fs/jbd2/journal.c >> +++ b/fs/jbd2/journal.c >> @@ -1919,6 +1919,9 @@ static int journal_get_superblock(journal_t *journal) >> bh = journal->j_sb_buffer; >> >> J_ASSERT(bh != NULL); >> + if (buffer_verified(bh)) >> + return 0; >> + >> err = bh_read(bh, 0); >> if (err < 0) { >> printk(KERN_ERR >> @@ -1926,9 +1929,6 @@ static int journal_get_superblock(journal_t *journal) >> goto out; >> } >> >> - if (buffer_verified(bh)) >> - return 0; >> - >> sb = journal->j_superblock; >> >> err = -EINVAL; >> @@ -2229,7 +2229,6 @@ int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat, >> >> if (!compat && !ro && !incompat) >> return 1; >> - /* Load journal superblock if it is not loaded yet. */ >> if (journal_get_superblock(journal)) >> return 0; >> if (!jbd2_format_support_feature(journal)) >> -- >> 2.39.2 >>
On Sat, Jun 17, 2023 at 10:42:59AM +0800, Zhang Yi wrote: > > This works as a workaround. It is a bit kludgy but for now I guess it is > > good enough. Thanks for the fix and feel free to add: > > Thanks for the review. Yes, I suppose it's better to find a way to adjust > the sequence of journal load and feature checking in ocfs2_check_volume(), > so that we could completely remove the journal_get_superblock() in > jbd2_journal_check_used_features(). Indeed, thanks for the fix. This is would be for after the merge window, but I think we can clean this up in the jbd2 layer by simply moving the call to load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() to journal_init_common(). This change would mean the journal superblock gets read as part of the call to jbd2_journal_init_{dev,inode}. That way, once the file system has a journal_t object, it's guaranteed that the j_sb_buffer contains valid data, and so we can drop the call to journal_get_superblock() from jbd2_journal_check_used_features(). And after we do that, we should be able to inline the code in load_superblock() and journal_get_superblock() into journal_init_common(), which would simplify things in jfs/jbd2/journal.c Finally, so we can provide better error handling, we could change Jbd2_journal_init_{dev,inode} to return an ERR_PTR instead of a NULL if there is a failure. And since it's a good idea to change the function name when changing the function signature, we could rename those functions to something like jbd2_open_{dev,inode} at the same time. - Ted P.S. The only reason why we don't load the superblock in jbd2_journal_init_{dev,common} was that back in 2001, it was possible to create the journal by creating a zero length file in the file system, noting the inode number of the file system, unmounting the file system from ext2, and then remounting it with "mount -t ext3 -o journal=NNN ...". In order to do this, the ext3 file system code called journal_init_inode() with the inode, and then follow it up with a call to journal_create(), which would actually write out the journal superblock. For that reason, journal_init_inode() had to avoid reading the journal superblock, since it might not be initialized yet. We removed jbd2_journal_create() from fs/jbd2 back in 2009, and it hadn't been in use for quite a while before that --- in fact, I'm not sure ext4 ever supported this ext3-style "let's create a journal without e2fsprogs support because Stephen Tweedie was implementing the ext3 journal kernel code without wanting to make changes to e2fsprogs first" feature. :-)
On 2023/6/18 2:50, Theodore Ts'o wrote: > On Sat, Jun 17, 2023 at 10:42:59AM +0800, Zhang Yi wrote: >>> This works as a workaround. It is a bit kludgy but for now I guess it is >>> good enough. Thanks for the fix and feel free to add: >> >> Thanks for the review. Yes, I suppose it's better to find a way to adjust >> the sequence of journal load and feature checking in ocfs2_check_volume(), >> so that we could completely remove the journal_get_superblock() in >> jbd2_journal_check_used_features(). > > Indeed, thanks for the fix. > > This is would be for after the merge window, but I think we can clean > this up in the jbd2 layer by simply moving the call to > load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() to > journal_init_common(). This change would mean the journal superblock > gets read as part of the call to jbd2_journal_init_{dev,inode}. > > That way, once the file system has a journal_t object, it's guaranteed > that the j_sb_buffer contains valid data, and so we can drop the call > to journal_get_superblock() from jbd2_journal_check_used_features(). > > And after we do that, we should be able to inline the code in > load_superblock() and journal_get_superblock() into > journal_init_common(), which would simplify things in > jfs/jbd2/journal.c > > Finally, so we can provide better error handling, we could change > Jbd2_journal_init_{dev,inode} to return an ERR_PTR instead of a NULL > if there is a failure. And since it's a good idea to change the > function name when changing the function signature, we could rename > those functions to something like jbd2_open_{dev,inode} at the same > time. > > - Ted > > P.S. The only reason why we don't load the superblock in > jbd2_journal_init_{dev,common} was that back in 2001, it was possible > to create the journal by creating a zero length file in the file > system, noting the inode number of the file system, unmounting the > file system from ext2, and then remounting it with "mount -t ext3 -o > journal=NNN ...". In order to do this, the ext3 file system code > called journal_init_inode() with the inode, and then follow it up with > a call to journal_create(), which would actually write out the journal > superblock. For that reason, journal_init_inode() had to avoid > reading the journal superblock, since it might not be initialized yet. > > We removed jbd2_journal_create() from fs/jbd2 back in 2009, and it > hadn't been in use for quite a while before that --- in fact, I'm not > sure ext4 ever supported this ext3-style "let's create a journal > without e2fsprogs support because Stephen Tweedie was implementing the > ext3 journal kernel code without wanting to make changes to e2fsprogs > first" feature. :-) > Thanks for the suggestion and historical details, we could do the cleanup in jbd2 layer after the merge window. Thanks, Yi.
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index b5e57735ab3f..559242df0f9a 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1919,6 +1919,9 @@ static int journal_get_superblock(journal_t *journal) bh = journal->j_sb_buffer; J_ASSERT(bh != NULL); + if (buffer_verified(bh)) + return 0; + err = bh_read(bh, 0); if (err < 0) { printk(KERN_ERR @@ -1926,9 +1929,6 @@ static int journal_get_superblock(journal_t *journal) goto out; } - if (buffer_verified(bh)) - return 0; - sb = journal->j_superblock; err = -EINVAL; @@ -2229,7 +2229,6 @@ int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat, if (!compat && !ro && !incompat) return 1; - /* Load journal superblock if it is not loaded yet. */ if (journal_get_superblock(journal)) return 0; if (!jbd2_format_support_feature(journal))