[2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer

Message ID 20200203140458.37397-3-yi.zhang@huawei.com
State Superseded

Commit Message

Zhang Yi Feb. 3, 2020, 2:04 p.m. UTC
Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") sets the BH_Freed flag when forgetting a metadata
buffer which belongs to the committing transaction. The flag tells the
committing process to clear the dirty bits when it is done with the buffer.
But the commit path also clears the BH_Mapped flag at the same time, which
may trigger the NULL pointer oops below when block_size < PAGE_SIZE.

rmdir 1             kjournald2                 mkdir 2
                    jbd2_journal_commit_transaction
		    commit transaction N
jbd2_journal_forget
set_buffer_freed(bh1)
                    jbd2_journal_commit_transaction
                     commit transaction N+1
                     ...
                     clear_buffer_mapped(bh1)
                                               ext4_getblk(bh2 unmapped)
                                               ...
                                               grow_dev_page
                                                init_page_buffers
                                                 bh1->b_private=NULL
                                                 bh2->b_private=NULL
                     jbd2_journal_put_journal_head(jh1)
                      __journal_remove_journal_head(bh1)
		       jh1 is NULL and triggers oops

*) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has
   already been unmapped.

For the metadata buffer we are forgetting, clearing the dirty flags is
enough, so this patch adds a BH_Unmap flag for the journal_unmap_buffer()
case and keeps the mapped flag for the metadata buffer.

Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
 fs/jbd2/commit.c      | 11 +++++++----
 fs/jbd2/transaction.c |  1 +
 include/linux/jbd2.h  |  2 ++
 3 files changed, 10 insertions(+), 4 deletions(-)

Comments

Jan Kara Feb. 6, 2020, 11:46 a.m. UTC | #1
On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
> an older transaction") set the BH_Freed flag when forgetting a metadata
> buffer which belongs to the committing transaction, it indicate the
> committing process clear dirty bits when it is done with the buffer. But
> it also clear the BH_Mapped flag at the same time, which may trigger
> below NULL pointer oops when block_size < PAGE_SIZE.
> 
> rmdir 1             kjournald2                 mkdir 2
>                     jbd2_journal_commit_transaction
> 		    commit transaction N
> jbd2_journal_forget
> set_buffer_freed(bh1)
>                     jbd2_journal_commit_transaction
>                      commit transaction N+1
>                      ...
>                      clear_buffer_mapped(bh1)
>                                                ext4_getblk(bh2 ummapped)
>                                                ...
>                                                grow_dev_page
>                                                 init_page_buffers
>                                                  bh1->b_private=NULL
>                                                  bh2->b_private=NULL
>                      jbd2_journal_put_journal_head(jh1)
>                       __journal_remove_journal_head(hb1)
> 		       jh1 is NULL and trigger oops
> 
> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
>    already been unmapped.
> 
> For the metadata buffer we forgetting, clear the dirty flags is enough,
> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
> keep the mapped flag for the metadata buffer.
> 
> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

Good spotting! Thanks for the patch. Some comments below:

> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 6396fe70085b..a649cdd1c5e5 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
>  			clear_buffer_freed(bh);
>  			clear_buffer_jbddirty(bh);
> -			clear_buffer_mapped(bh);
> -			clear_buffer_new(bh);
> -			clear_buffer_req(bh);
> -			bh->b_bdev = NULL;
> +			if (buffer_unmap(bh)) {
> +				clear_buffer_unmap(bh);
> +				clear_buffer_mapped(bh);
> +				clear_buffer_new(bh);
> +				clear_buffer_req(bh);
> +				bh->b_bdev = NULL;
> +			}

Any reason why you don't want to clear buffer_req and buffer_new flags for
all buffers as well? I agree that b_bdev setting and buffer_mapped need
special treatment.

Also rather than introducing this new buffer_unmap bit, I'd use the fact
this special treatment is needed only for buffers coming from the block device
mapping. And we can check for that like:

		/*
		 * We can (and need to) unmap buffer only for normal mappings.
		 * Block device buffers need to stay mapped all the time.
		 * We need to be careful about the check because the page
		 * mapping can get cleared under our hands.
		 */
		mapping = READ_ONCE(bh->b_page->mapping);
		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
			...
		}

Longer term, we might want to rework how the handling of truncated buffers
works with JBD2. There's lots of duplication between jbd2_journal_forget()
and jbd2_journal_unmap_buffer(), the dirtiness is tracked in jh->b_modified
as well as buffer_jbddirty() and it is further redundant with the journal
list the buffer is currently on. So I suspect it could all be simplified if
we took a fresh look at things.

								Honza
Zhang Yi Feb. 6, 2020, 3:28 p.m. UTC | #2
Thanks for the comments.

On 2020/2/6 19:46, Jan Kara wrote:
> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
[..]
>> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
>> index 6396fe70085b..a649cdd1c5e5 100644
>> --- a/fs/jbd2/commit.c
>> +++ b/fs/jbd2/commit.c
>> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
>>  			clear_buffer_freed(bh);
>>  			clear_buffer_jbddirty(bh);
>> -			clear_buffer_mapped(bh);
>> -			clear_buffer_new(bh);
>> -			clear_buffer_req(bh);
>> -			bh->b_bdev = NULL;
>> +			if (buffer_unmap(bh)) {
>> +				clear_buffer_unmap(bh);
>> +				clear_buffer_mapped(bh);
>> +				clear_buffer_new(bh);
>> +				clear_buffer_req(bh);
>> +				bh->b_bdev = NULL;
>> +			}
> 
> Any reason why you don't want to clear buffer_req and buffer_new flags for
> all buffers as well? I agree that b_bdev setting and buffer_mapped need
> special treatment.
> 
IIUC, a buffer coming from jbd2_journal_forget() is always a 'block
device backed' metadata buffer (I'm not entirely sure), and for these
metadata buffers the buffer_new flag will not be set. At the same time,
since the buffer is always mapped, it's fine to keep the buffer_req flag
even though the filesystem has now freed the buffer: the flag only means
the block device has committed this buffer, and it does not seem to affect
reusing the buffer. Am I missing something?

> Also rather than introducing this new buffer_unmap bit, I'd use the fact
> this special treatment is needed only for buffers coming from the block device
> mapping. And we can check for that like:
> 
> 		/*
> 		 * We can (and need to) unmap buffer only for normal mappings.
> 		 * Block device buffers need to stay mapped all the time.
> 		 * We need to be careful about the check because the page
> 		 * mapping can get cleared under our hands.
> 		 */
> 		mapping = READ_ONCE(bh->b_page->mapping);
> 		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> 			...
> 		}
> 
It looks better, I will use this check in the next iteration.

> Longer term, we might want to rework how the handling of truncated buffers
> works with JDB2. There's lots of duplication between jbd2_journal_forget()
> and jbd2_journal_unmap_buffer(), the dirtiness is tracked in jh->b_modified
> as well as buffer_jbddirty() and it is further redundant with the journal
> list the buffer is currently on. So I suspect it could all be simplified if
> we took a fresh look at things.
> 
Indeed, it is tricky and not easy to understand now. Refactoring this
in the future would be great.

Thanks,
Yi.
Zhang Yi Feb. 11, 2020, 6:51 a.m. UTC | #3
On 2020/2/6 19:46, Jan Kara wrote:
> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
[..]
> 
> Also rather than introducing this new buffer_unmap bit, I'd use the fact
> this special treatment is needed only for buffers coming from the block device
> mapping. And we can check for that like:
> 
> 		/*
> 		 * We can (and need to) unmap buffer only for normal mappings.
> 		 * Block device buffers need to stay mapped all the time.
> 		 * We need to be careful about the check because the page
> 		 * mapping can get cleared under our hands.
> 		 */
> 		mapping = READ_ONCE(bh->b_page->mapping);
> 		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> 			...
> 		}

Thinking about it again, this may miss clearing the mapped flag if the
'mapping' of a journalled data page was cleared, and finally trigger an
exception if we reuse the buffer again. So I think it should be:

		if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
			...
		}

Thanks,
Yi.
Jan Kara Feb. 12, 2020, 8:45 a.m. UTC | #4
On Thu 06-02-20 23:28:01, zhangyi (F) wrote:
> Thanks for the comments.
> 
> On 2020/2/6 19:46, Jan Kara wrote:
> > On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> [..]
> >> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> >> index 6396fe70085b..a649cdd1c5e5 100644
> >> --- a/fs/jbd2/commit.c
> >> +++ b/fs/jbd2/commit.c
> >> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> >>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
> >>  			clear_buffer_freed(bh);
> >>  			clear_buffer_jbddirty(bh);
> >> -			clear_buffer_mapped(bh);
> >> -			clear_buffer_new(bh);
> >> -			clear_buffer_req(bh);
> >> -			bh->b_bdev = NULL;
> >> +			if (buffer_unmap(bh)) {
> >> +				clear_buffer_unmap(bh);
> >> +				clear_buffer_mapped(bh);
> >> +				clear_buffer_new(bh);
> >> +				clear_buffer_req(bh);
> >> +				bh->b_bdev = NULL;
> >> +			}
> > 
> > Any reason why you don't want to clear buffer_req and buffer_new flags for
> > all buffers as well? I agree that b_bdev setting and buffer_mapped need
> > special treatment.
> > 
> IIUC, for the buffer coming from jbd2_journal_forget() is always 'block
> device backed' metadata buffer (not pretty sure), and for these metadata
  Yes, it is.

> buffer, buffer_new flag will not be set. At the same time, since it's
> always mapped, so it's fine to keep the buffer_req flag even it's freed
> by the filesystem now, because it means the block device has committed
> this buffer, and it seems that it does not affect we reuse this buffer.
> Am I missing something ?

OK, you're right that buffer_new shouldn't be ever set for block backed
buffers and we don't care about buffer_req. So let's keep the split of bits
to clear as you did and just add a comment that for block device buffers it
is enough to clear buffer_jbddirty and buffer_freed, for file mapping
buffers (i.e., journalled data) we have to be more careful and clear more
bits.
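
The split of bits described above can be sketched in userspace as
follows. This is only a model under stated assumptions: the M_* bit
names, struct model_bh and model_finish_freed() are hypothetical
stand-ins for the kernel's buffer state bits and commit path, the
!jh->b_next_transaction part of the real condition is left out, and the
caller says explicitly whether the buffer belongs to a file mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the buffer state bits named in the thread. */
enum model_bits {
	M_MAPPED   = 1u << 0,
	M_NEW      = 1u << 1,
	M_REQ      = 1u << 2,
	M_FREED    = 1u << 3,
	M_JBDDIRTY = 1u << 4,
};

/* Hypothetical model of the two buffer_head fields the hunk touches. */
struct model_bh {
	unsigned int state;	/* models bh->b_state */
	const void *bdev;	/* models bh->b_bdev */
};

/*
 * Model of what the commit path does with a freed buffer.
 * file_mapping is true for journalled data buffers and false for
 * block device (metadata) buffers.
 */
static void model_finish_freed(struct model_bh *bh, bool file_mapping)
{
	assert(bh != NULL);

	if (!(bh->state & M_FREED))
		return;

	/* Enough for block device buffers: drop freed and jbddirty. */
	bh->state &= ~(M_FREED | M_JBDDIRTY);

	/* Journalled data needs more care: unmap and detach as well. */
	if (file_mapping) {
		bh->state &= ~(M_MAPPED | M_NEW | M_REQ);
		bh->bdev = NULL;
	}
}
```

In this model a metadata buffer keeps M_MAPPED and its bdev pointer after
the call, while a journalled data buffer ends up with no bits set and a
NULL bdev.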

								Honza
Jan Kara Feb. 12, 2020, 8:47 a.m. UTC | #5
On Tue 11-02-20 14:51:10, zhangyi (F) wrote:
> On 2020/2/6 19:46, Jan Kara wrote:
> > On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> [..]
> > 
> > Also rather than introducing this new buffer_unmap bit, I'd use the fact
> > this special treatment is needed only for buffers coming from the block device
> > mapping. And we can check for that like:
> > 
> > 		/*
> > 		 * We can (and need to) unmap buffer only for normal mappings.
> > 		 * Block device buffers need to stay mapped all the time.
> > 		 * We need to be careful about the check because the page
> > 		 * mapping can get cleared under our hands.
> > 		 */
> > 		mapping = READ_ONCE(bh->b_page->mapping);
> > 		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> > 			...
> > 		}
> 
> Think about it again, it may missing clearing of mapped flag if 'mapping'
> of journalled data page was cleared, and finally trigger exception if
> we reuse the buffer again. So I think it should be:
> 
> 		if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
> 			...
> 		}

Well, if b_page->mapping got cleared, it means the page got fully truncated
and in such case buffers can never be reused - the page and buffers will be
freed once we are done with them. So what you are concerned about cannot
happen. But you're right it is good to explain this in the comment.
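
The two conditions discussed above differ in exactly one case: a page
whose mapping has already been cleared. A minimal userspace sketch of
that comparison, assuming plain booleans in place of the mapping pointer
and of sb_is_blkdev_sb(mapping->host->i_sb); the function names are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * mapping_present models "mapping != NULL"; on_blkdev models the result
 * of sb_is_blkdev_sb(), which can only be evaluated when a mapping is
 * present, hence the precondition asserted in both helpers.
 */

/* Condition as first suggested: mapping && !sb_is_blkdev_sb(...) */
static bool unmap_cond_suggested(bool mapping_present, bool on_blkdev)
{
	assert(mapping_present || !on_blkdev);
	return mapping_present && !on_blkdev;
}

/* Alternative raised in the reply: !(mapping && sb_is_blkdev_sb(...)) */
static bool unmap_cond_alternative(bool mapping_present, bool on_blkdev)
{
	assert(mapping_present || !on_blkdev);
	return !(mapping_present && on_blkdev);
}
```

Both helpers agree for a file mapping (unmap) and for a block device
mapping (keep mapped). Only with a cleared mapping does the suggested
form skip the unmap, which is the case argued above to be harmless
because the page and its buffers are freed rather than reused.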

								Honza
Zhang Yi Feb. 12, 2020, 1:14 p.m. UTC | #6
Hi,

On 2020/2/12 16:47, Jan Kara wrote:
> On Tue 11-02-20 14:51:10, zhangyi (F) wrote:
>> On 2020/2/6 19:46, Jan Kara wrote:
>>> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
>> [..]
>>>
>>> Also rather than introducing this new buffer_unmap bit, I'd use the fact
>>> this special treatment is needed only for buffers coming from the block device
>>> mapping. And we can check for that like:
>>>
>>> 		/*
>>> 		 * We can (and need to) unmap buffer only for normal mappings.
>>> 		 * Block device buffers need to stay mapped all the time.
>>> 		 * We need to be careful about the check because the page
>>> 		 * mapping can get cleared under our hands.
>>> 		 */
>>> 		mapping = READ_ONCE(bh->b_page->mapping);
>>> 		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
>>> 			...
>>> 		}
>>
>> Think about it again, it may missing clearing of mapped flag if 'mapping'
>> of journalled data page was cleared, and finally trigger exception if
>> we reuse the buffer again. So I think it should be:
>>
>> 		if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
>> 			...
>> 		}
> 
> Well, if b_page->mapping got cleared, it means the page got fully truncated
> and in such case buffers can never be reused - the page and buffers will be
> freed once we are done with them. So what you are concerned about cannot
> happen. But you're right it is good to explain this in the comment.
> 
Yes, you are right, the page and buffer will be freed in release_buffer_page()
and it seems there is no exception. After testing, I will send V3, which goes
back to the condition you suggested and adds comments.

Thanks,
Yi.

Patch

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 6396fe70085b..a649cdd1c5e5 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -987,10 +987,13 @@  void jbd2_journal_commit_transaction(journal_t *journal)
 		if (buffer_freed(bh) && !jh->b_next_transaction) {
 			clear_buffer_freed(bh);
 			clear_buffer_jbddirty(bh);
-			clear_buffer_mapped(bh);
-			clear_buffer_new(bh);
-			clear_buffer_req(bh);
-			bh->b_bdev = NULL;
+			if (buffer_unmap(bh)) {
+				clear_buffer_unmap(bh);
+				clear_buffer_mapped(bh);
+				clear_buffer_new(bh);
+				clear_buffer_req(bh);
+				bh->b_bdev = NULL;
+			}
 		}
 
 		if (buffer_jbddirty(bh)) {
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a479cbf8ae54..717964eec9d3 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2335,6 +2335,7 @@  static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 		 * should clear dirty bits when it is done with the buffer.
 		 */
 		set_buffer_freed(bh);
+		set_buffer_unmap(bh);
 		if (journal->j_running_transaction && buffer_jbddirty(bh))
 			jh->b_next_transaction = journal->j_running_transaction;
 		may_free = 0;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f613d8529863..f74906ebc73a 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -310,6 +310,7 @@  enum jbd_state_bits {
 	  = BH_PrivateStart,
 	BH_JWrite,		/* Being written to log (@@@ DEBUGGING) */
 	BH_Freed,		/* Has been freed (truncated) */
+	BH_Unmap,		/* Has been freed and need to unmap */
 	BH_Revoked,		/* Has been revoked from the log */
 	BH_RevokeValid,		/* Revoked flag is valid */
 	BH_JBDDirty,		/* Is dirty but journaled */
@@ -328,6 +329,7 @@  TAS_BUFFER_FNS(Revoked, revoked)
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(Unmap, unmap)
 BUFFER_FNS(Shadow, shadow)
 BUFFER_FNS(Verified, verified)
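
For readers unfamiliar with the helpers used in the hunks above, the
following is a rough userspace model of what a BUFFER_FNS(Unmap, unmap)
line provides: a setter, a clearer and a test function for one state
bit. It is a sketch only. The kernel macro operates on struct
buffer_head with atomic bit operations, whereas struct sketch_bh, the
SKETCH_* names and sketch_mark_for_unmap() here are hypothetical and
use plain non-atomic bit arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head; only b_state is modelled. */
struct sketch_bh {
	unsigned long b_state;
};

/* Bit numbers in the spirit of enum jbd_state_bits (values are arbitrary). */
enum sketch_state_bits {
	SKETCH_Freed,
	SKETCH_Unmap,
};

/*
 * Non-atomic model of BUFFER_FNS(bit, name): generates
 * set_buffer_<name>(), clear_buffer_<name>() and buffer_<name>().
 */
#define SKETCH_BUFFER_FNS(bit, name) \
static inline void set_buffer_##name(struct sketch_bh *bh) \
{ \
	bh->b_state |= 1UL << SKETCH_##bit; \
} \
static inline void clear_buffer_##name(struct sketch_bh *bh) \
{ \
	bh->b_state &= ~(1UL << SKETCH_##bit); \
} \
static inline bool buffer_##name(const struct sketch_bh *bh) \
{ \
	return (bh->b_state >> SKETCH_##bit) & 1UL; \
}

SKETCH_BUFFER_FNS(Freed, freed)
SKETCH_BUFFER_FNS(Unmap, unmap)

/* Models the two set_buffer_*() calls in the journal_unmap_buffer() hunk. */
static void sketch_mark_for_unmap(struct sketch_bh *bh)
{
	assert(bh != NULL);
	set_buffer_freed(bh);
	set_buffer_unmap(bh);
}
```

With these helpers the commit-side test reads the same way as in the
patch: check buffer_unmap(bh), then clear_buffer_unmap(bh) before
touching the remaining bits.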