diff mbox series

[v2] jbd2: skip reading super block if it has been verified

Message ID 20230616015547.3155195-1-yi.zhang@huaweicloud.com
State Accepted
Headers show
Series [v2] jbd2: skip reading super block if it has been verified | expand

Commit Message

Zhang Yi June 16, 2023, 1:55 a.m. UTC
From: Zhang Yi <yi.zhang@huawei.com>

We got a NULL pointer dereference issue below while running generic/475
I/O failure pressure test.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 0 P4D 0
 Oops: 0002 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
 RIP: 0010:jbd2_journal_set_features+0x13d/0x430
 ...
 Call Trace:
  <TASK>
  ? __die+0x23/0x60
  ? page_fault_oops+0xa4/0x170
  ? exc_page_fault+0x67/0x170
  ? asm_exc_page_fault+0x26/0x30
  ? jbd2_journal_set_features+0x13d/0x430
  jbd2_journal_revoke+0x47/0x1e0
  __ext4_forget+0xc3/0x1b0
  ext4_free_blocks+0x214/0x2f0
  ext4_free_branches+0xeb/0x270
  ext4_ind_truncate+0x2bf/0x320
  ext4_truncate+0x1e4/0x490
  ext4_handle_inode_extension+0x1bd/0x2a0
  ? iomap_dio_complete+0xaf/0x1d0

The root cause is the journal super block had been failed to write out
due to I/O fault injection, it's uptodate bit was cleared by
end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock().
And it raced by journal_get_superblock()->bh_read(), unfortunately, the
read IO is also failed, so the error handling in
journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer,
finally lead to above NULL pointer dereference issue.

If the journal super block had been read and verified, there is no need
to call bh_read() read it again even if it has been failed to written
out. So the fix could be simply move buffer_verified(bh) in front of
bh_read(). Also remove a stale comment left in
jbd2_journal_check_used_features().

Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()")
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
v1->v2:
 - Remove a stale comment left in jbd2_journal_check_used_features().

 fs/jbd2/journal.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Jan Kara June 16, 2023, 1:27 p.m. UTC | #1
On Fri 16-06-23 09:55:47, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> We got a NULL pointer dereference issue below while running generic/475
> I/O failure pressure test.
> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000000
>  #PF: supervisor write access in kernel mode
>  #PF: error_code(0x0002) - not-present page
>  PGD 0 P4D 0
>  Oops: 0002 [#1] PREEMPT SMP PTI
>  CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
>  RIP: 0010:jbd2_journal_set_features+0x13d/0x430
>  ...
>  Call Trace:
>   <TASK>
>   ? __die+0x23/0x60
>   ? page_fault_oops+0xa4/0x170
>   ? exc_page_fault+0x67/0x170
>   ? asm_exc_page_fault+0x26/0x30
>   ? jbd2_journal_set_features+0x13d/0x430
>   jbd2_journal_revoke+0x47/0x1e0
>   __ext4_forget+0xc3/0x1b0
>   ext4_free_blocks+0x214/0x2f0
>   ext4_free_branches+0xeb/0x270
>   ext4_ind_truncate+0x2bf/0x320
>   ext4_truncate+0x1e4/0x490
>   ext4_handle_inode_extension+0x1bd/0x2a0
>   ? iomap_dio_complete+0xaf/0x1d0
> 
> The root cause is the journal super block had been failed to write out
> due to I/O fault injection, it's uptodate bit was cleared by
> end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock().
> And it raced by journal_get_superblock()->bh_read(), unfortunately, the
> read IO is also failed, so the error handling in
> journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer,
> finally lead to above NULL pointer dereference issue.
> 
> If the journal super block had been read and verified, there is no need
> to call bh_read() read it again even if it has been failed to written
> out. So the fix could be simply move buffer_verified(bh) in front of
> bh_read(). Also remove a stale comment left in
> jbd2_journal_check_used_features().
> 
> Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()")
> Reported-by: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

This works as a workaround. It is a bit kludgy but for now I guess it is
good enough. Thanks for the fix and feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
> v1->v2:
>  - Remove a stale comment left in jbd2_journal_check_used_features().
> 
>  fs/jbd2/journal.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index b5e57735ab3f..559242df0f9a 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -1919,6 +1919,9 @@ static int journal_get_superblock(journal_t *journal)
>  	bh = journal->j_sb_buffer;
>  
>  	J_ASSERT(bh != NULL);
> +	if (buffer_verified(bh))
> +		return 0;
> +
>  	err = bh_read(bh, 0);
>  	if (err < 0) {
>  		printk(KERN_ERR
> @@ -1926,9 +1929,6 @@ static int journal_get_superblock(journal_t *journal)
>  		goto out;
>  	}
>  
> -	if (buffer_verified(bh))
> -		return 0;
> -
>  	sb = journal->j_superblock;
>  
>  	err = -EINVAL;
> @@ -2229,7 +2229,6 @@ int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat,
>  
>  	if (!compat && !ro && !incompat)
>  		return 1;
> -	/* Load journal superblock if it is not loaded yet. */
>  	if (journal_get_superblock(journal))
>  		return 0;
>  	if (!jbd2_format_support_feature(journal))
> -- 
> 2.39.2
>
Zhang Yi June 17, 2023, 2:42 a.m. UTC | #2
On 2023/6/16 21:27, Jan Kara wrote:
> On Fri 16-06-23 09:55:47, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> We got a NULL pointer dereference issue below while running generic/475
>> I/O failure pressure test.
>>
>>  BUG: kernel NULL pointer dereference, address: 0000000000000000
>>  #PF: supervisor write access in kernel mode
>>  #PF: error_code(0x0002) - not-present page
>>  PGD 0 P4D 0
>>  Oops: 0002 [#1] PREEMPT SMP PTI
>>  CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
>>  RIP: 0010:jbd2_journal_set_features+0x13d/0x430
>>  ...
>>  Call Trace:
>>   <TASK>
>>   ? __die+0x23/0x60
>>   ? page_fault_oops+0xa4/0x170
>>   ? exc_page_fault+0x67/0x170
>>   ? asm_exc_page_fault+0x26/0x30
>>   ? jbd2_journal_set_features+0x13d/0x430
>>   jbd2_journal_revoke+0x47/0x1e0
>>   __ext4_forget+0xc3/0x1b0
>>   ext4_free_blocks+0x214/0x2f0
>>   ext4_free_branches+0xeb/0x270
>>   ext4_ind_truncate+0x2bf/0x320
>>   ext4_truncate+0x1e4/0x490
>>   ext4_handle_inode_extension+0x1bd/0x2a0
>>   ? iomap_dio_complete+0xaf/0x1d0
>>
>> The root cause is the journal super block had been failed to write out
>> due to I/O fault injection, it's uptodate bit was cleared by
>> end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock().
>> And it raced by journal_get_superblock()->bh_read(), unfortunately, the
>> read IO is also failed, so the error handling in
>> journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer,
>> finally lead to above NULL pointer dereference issue.
>>
>> If the journal super block had been read and verified, there is no need
>> to call bh_read() read it again even if it has been failed to written
>> out. So the fix could be simply move buffer_verified(bh) in front of
>> bh_read(). Also remove a stale comment left in
>> jbd2_journal_check_used_features().
>>
>> Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()")
>> Reported-by: Theodore Ts'o <tytso@mit.edu>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> This works as a workaround. It is a bit kludgy but for now I guess it is
> good enough. Thanks for the fix and feel free to add:
> 

Thanks for the review. Yes, I suppose it's better to find a way to adjust
the sequence of journal load and feature checking in ocfs2_check_volume(),
so that we could completely remove the journal_get_superblock() in
jbd2_journal_check_used_features().

Thanks,
Yi.

> 
> 
>> ---
>> v1->v2:
>>  - Remove a stale comment left in jbd2_journal_check_used_features().
>>
>>  fs/jbd2/journal.c | 7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index b5e57735ab3f..559242df0f9a 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -1919,6 +1919,9 @@ static int journal_get_superblock(journal_t *journal)
>>  	bh = journal->j_sb_buffer;
>>  
>>  	J_ASSERT(bh != NULL);
>> +	if (buffer_verified(bh))
>> +		return 0;
>> +
>>  	err = bh_read(bh, 0);
>>  	if (err < 0) {
>>  		printk(KERN_ERR
>> @@ -1926,9 +1929,6 @@ static int journal_get_superblock(journal_t *journal)
>>  		goto out;
>>  	}
>>  
>> -	if (buffer_verified(bh))
>> -		return 0;
>> -
>>  	sb = journal->j_superblock;
>>  
>>  	err = -EINVAL;
>> @@ -2229,7 +2229,6 @@ int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat,
>>  
>>  	if (!compat && !ro && !incompat)
>>  		return 1;
>> -	/* Load journal superblock if it is not loaded yet. */
>>  	if (journal_get_superblock(journal))
>>  		return 0;
>>  	if (!jbd2_format_support_feature(journal))
>> -- 
>> 2.39.2
>>
Theodore Ts'o June 17, 2023, 6:50 p.m. UTC | #3
On Sat, Jun 17, 2023 at 10:42:59AM +0800, Zhang Yi wrote:
> > This works as a workaround. It is a bit kludgy but for now I guess it is
> > good enough. Thanks for the fix and feel free to add:
> 
> Thanks for the review. Yes, I suppose it's better to find a way to adjust
> the sequence of journal load and feature checking in ocfs2_check_volume(),
> so that we could completely remove the journal_get_superblock() in
> jbd2_journal_check_used_features().

Indeed, thanks for the fix.

This is would be for after the merge window, but I think we can clean
this up in the jbd2 layer by simply moving the call to
load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() to
journal_init_common().  This change would mean the journal superblock
gets read as part of the call to jbd2_journal_init_{dev,inode}.

That way, once the file system has a journal_t object, it's guaranteed
that the j_sb_buffer contains valid data, and so we can drop the call
to journal_get_superblock() from jbd2_journal_check_used_features().

And after we do that, we should be able to inline the code in
load_superblock() and journal_get_superblock() into
journal_init_common(), which would simplify things in
jfs/jbd2/journal.c

Finally, so we can provide better error handling, we could change
Jbd2_journal_init_{dev,inode} to return an ERR_PTR instead of a NULL
if there is a failure.  And since it's a good idea to change the
function name when changing the function signature, we could rename
those functions to something like jbd2_open_{dev,inode} at the same
time.

						- Ted

P.S.  The only reason why we don't load the superblock in
jbd2_journal_init_{dev,common} was that back in 2001, it was possible
to create the journal by creating a zero length file in the file
system, noting the inode number of the file system, unmounting the
file system from ext2, and then remounting it with "mount -t ext3 -o
journal=NNN ...".  In order to do this, the ext3 file system code
called journal_init_inode() with the inode, and then follow it up with
a call to journal_create(), which would actually write out the journal
superblock.  For that reason, journal_init_inode() had to avoid
reading the journal superblock, since it might not be initialized yet.

We removed jbd2_journal_create() from fs/jbd2 back in 2009, and it
hadn't been in use for quite a while before that --- in fact, I'm not
sure ext4 ever supported this ext3-style "let's create a journal
without e2fsprogs support because Stephen Tweedie was implementing the
ext3 journal kernel code without wanting to make changes to e2fsprogs
first" feature.  :-)
Zhang Yi June 20, 2023, 1:12 a.m. UTC | #4
On 2023/6/18 2:50, Theodore Ts'o wrote:
> On Sat, Jun 17, 2023 at 10:42:59AM +0800, Zhang Yi wrote:
>>> This works as a workaround. It is a bit kludgy but for now I guess it is
>>> good enough. Thanks for the fix and feel free to add:
>>
>> Thanks for the review. Yes, I suppose it's better to find a way to adjust
>> the sequence of journal load and feature checking in ocfs2_check_volume(),
>> so that we could completely remove the journal_get_superblock() in
>> jbd2_journal_check_used_features().
> 
> Indeed, thanks for the fix.
> 
> This is would be for after the merge window, but I think we can clean
> this up in the jbd2 layer by simply moving the call to
> load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() to
> journal_init_common().  This change would mean the journal superblock
> gets read as part of the call to jbd2_journal_init_{dev,inode}.
> 
> That way, once the file system has a journal_t object, it's guaranteed
> that the j_sb_buffer contains valid data, and so we can drop the call
> to journal_get_superblock() from jbd2_journal_check_used_features().
> 
> And after we do that, we should be able to inline the code in
> load_superblock() and journal_get_superblock() into
> journal_init_common(), which would simplify things in
> jfs/jbd2/journal.c
> 
> Finally, so we can provide better error handling, we could change
> Jbd2_journal_init_{dev,inode} to return an ERR_PTR instead of a NULL
> if there is a failure.  And since it's a good idea to change the
> function name when changing the function signature, we could rename
> those functions to something like jbd2_open_{dev,inode} at the same
> time.
> 
> 						- Ted
> 
> P.S.  The only reason why we don't load the superblock in
> jbd2_journal_init_{dev,common} was that back in 2001, it was possible
> to create the journal by creating a zero length file in the file
> system, noting the inode number of the file system, unmounting the
> file system from ext2, and then remounting it with "mount -t ext3 -o
> journal=NNN ...".  In order to do this, the ext3 file system code
> called journal_init_inode() with the inode, and then follow it up with
> a call to journal_create(), which would actually write out the journal
> superblock.  For that reason, journal_init_inode() had to avoid
> reading the journal superblock, since it might not be initialized yet.
> 
> We removed jbd2_journal_create() from fs/jbd2 back in 2009, and it
> hadn't been in use for quite a while before that --- in fact, I'm not
> sure ext4 ever supported this ext3-style "let's create a journal
> without e2fsprogs support because Stephen Tweedie was implementing the
> ext3 journal kernel code without wanting to make changes to e2fsprogs
> first" feature.  :-)
> 

Thanks for the suggestion and historical details, we could do the cleanup
in jbd2 layer after the merge window.

Thanks,
Yi.
diff mbox series

Patch

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b5e57735ab3f..559242df0f9a 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1919,6 +1919,9 @@  static int journal_get_superblock(journal_t *journal)
 	bh = journal->j_sb_buffer;
 
 	J_ASSERT(bh != NULL);
+	if (buffer_verified(bh))
+		return 0;
+
 	err = bh_read(bh, 0);
 	if (err < 0) {
 		printk(KERN_ERR
@@ -1926,9 +1929,6 @@  static int journal_get_superblock(journal_t *journal)
 		goto out;
 	}
 
-	if (buffer_verified(bh))
-		return 0;
-
 	sb = journal->j_superblock;
 
 	err = -EINVAL;
@@ -2229,7 +2229,6 @@  int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat,
 
 	if (!compat && !ro && !incompat)
 		return 1;
-	/* Load journal superblock if it is not loaded yet. */
 	if (journal_get_superblock(journal))
 		return 0;
 	if (!jbd2_format_support_feature(journal))