diff mbox series

[v3,2/4] jbd2: discard dirty data when forgetting an un-journalled buffer

Message ID 1548419456-4331-3-git-send-email-yi.zhang@huawei.com
State Superseded
Series ext4: fix a data corruption problem

Commit Message

Zhang Yi Jan. 25, 2019, 12:30 p.m. UTC
When forgetting a buffer that is not journalled, or that is journalled
but does not belong to any transaction, we neither unmap it nor clear
its dirty flag, so the stale dirty data may still be written to disk
later. This is fine if the corresponding block is never used before
the next mount, and it is also fine as long as we invoke
clean_bdev_aliases() and related functions to unmap the block device
mapping when such a freed block is re-allocated as a data block. But
this logic is fragile: forgetting to clean the bdev aliases may lead
to data corruption. So it is better to discard the dirty data at
forget time.

The other cases of forgetting a journalled buffer are already handled;
this patch deals with the remaining two:

- the buffer is not journalled yet,
- the buffer is journalled but does not belong to any transaction.

We invoke __bforget() instead of __brelse() when forgetting an
un-journalled buffer in jbd2_journal_forget(). After this patch we can
remove all clean_bdev_aliases() related calls in ext4.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
 fs/jbd2/transaction.c | 43 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)
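
The cases described in the commit message can be summarised as a small
decision table. The following is only a user-space sketch, not kernel
code: struct buf_state, enum forget_action and classify_forget() are
hypothetical names invented for illustration, and the model covers only
buffers that belong to no transaction, as handled by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum forget_action {
	FORGET_DISCARD,           /* models the not_jbd path: __bforget() */
	FORGET_REMOVE_CHECKPOINT, /* remove checkpoint, then __bforget() */
	FORGET_REFILE,            /* clear dirty, file on the forget list
				   * of the running transaction, __brelse() */
};

struct buf_state {
	bool journalled;    /* models buffer_jbd(bh) */
	bool on_checkpoint; /* models jh->b_cp_transaction != NULL */
	bool dirty;         /* models buffer_dirty(bh) */
};

/* Decide what to do with a buffer that belongs to no transaction. */
static enum forget_action classify_forget(const struct buf_state *b)
{
	assert(b != NULL);

	if (!b->journalled)
		return FORGET_DISCARD;
	if (!b->on_checkpoint)
		return FORGET_DISCARD;
	if (!b->dirty)
		return FORGET_REMOVE_CHECKPOINT;
	return FORGET_REFILE;
}
```

Only the last case keeps the buffer attached to the journal; in the
other three the dirty data is discarded immediately.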

Comments

Jan Kara Jan. 28, 2019, 3:26 p.m. UTC | #1
On Fri 25-01-19 20:30:54, zhangyi (F) wrote:
> When forgetting a buffer that is not journalled, or that is journalled
> but does not belong to any transaction, we neither unmap it nor clear
> its dirty flag, so the stale dirty data may still be written to disk
> later. This is fine if the corresponding block is never used before
> the next mount, and it is also fine as long as we invoke
> clean_bdev_aliases() and related functions to unmap the block device
> mapping when such a freed block is re-allocated as a data block. But
> this logic is fragile: forgetting to clean the bdev aliases may lead
> to data corruption. So it is better to discard the dirty data at
> forget time.
>
> The other cases of forgetting a journalled buffer are already handled;
> this patch deals with the remaining two:
>
> - the buffer is not journalled yet,
> - the buffer is journalled but does not belong to any transaction.
>
> We invoke __bforget() instead of __brelse() when forgetting an
> un-journalled buffer in jbd2_journal_forget(). After this patch we can
> remove all clean_bdev_aliases() related calls in ext4.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

Thanks for the patch! Just one small comment below:

> +		/*
> +		 * The buffer has not yet been written to disk, so we
> +		 * should attach it to the current transaction to avoid
> +		 * missing the writeback if a checkpoint runs before
> +		 * the current transaction completes submission.
> +		 */
> +		__jbd2_journal_temp_unlink_buffer(jh);

Calling __jbd2_journal_temp_unlink_buffer() is not needed when you know the
buffer does not belong to any transaction. Otherwise the patch looks good
to me so feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

after fixing this.

								Honza

> +		clear_buffer_dirty(bh);
> +		__jbd2_journal_file_buffer(jh, transaction, BJ_Forget);
> +		spin_unlock(&journal->j_list_lock);
>  	}
>  
> -not_jbd:
>  	jbd_unlock_bh_state(bh);
>  	__brelse(bh);
>  drop:
> @@ -1643,6 +1673,11 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
>  		handle->h_buffer_credits++;
>  	}
>  	return err;
> +
> +not_jbd:
> +	jbd_unlock_bh_state(bh);
> +	__bforget(bh);
> +	goto drop;
>  }
>  
>  /**
> -- 
> 2.7.4
>
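
The review comment about __jbd2_journal_temp_unlink_buffer() can be
pictured with a toy model. This is a sketch under an assumption, not
kernel code: it assumes that a buffer attached to no transaction sits
on no journal list, so that unlinking it has nothing to do. The names
struct model_jh, struct model_txn, model_temp_unlink() and
model_file_buffer() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of journal lists; not the kernel's BJ_* values. */
enum model_list { ML_NONE, ML_METADATA, ML_FORGET };

struct model_txn {
	int nr_buffers; /* buffers currently filed on this transaction */
};

struct model_jh {
	struct model_txn *transaction; /* NULL: belongs to no transaction */
	enum model_list list;
};

/* Unlink from the current list; returns 1 if something was unlinked. */
static int model_temp_unlink(struct model_jh *jh)
{
	if (jh->list == ML_NONE)
		return 0; /* nothing to do: the case the review points at */

	assert(jh->transaction != NULL);
	jh->transaction->nr_buffers--;
	jh->list = ML_NONE;
	return 1;
}

/* File an unlinked buffer on one of the lists of a transaction. */
static void model_file_buffer(struct model_jh *jh, struct model_txn *t,
			      enum model_list list)
{
	assert(jh->list == ML_NONE);
	jh->transaction = t;
	jh->list = list;
	t->nr_buffers++;
}
```

In this model, a buffer with no transaction goes straight to
model_file_buffer(); calling model_temp_unlink() first returns 0 and
changes nothing.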

Patch

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 0c0cbda..8825d45 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1597,9 +1597,7 @@  int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 			__jbd2_journal_unfile_buffer(jh);
 			if (!buffer_jbd(bh)) {
 				spin_unlock(&journal->j_list_lock);
-				jbd_unlock_bh_state(bh);
-				__bforget(bh);
-				goto drop;
+				goto not_jbd;
 			}
 		}
 		spin_unlock(&journal->j_list_lock);
@@ -1632,9 +1630,41 @@  int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 			if (was_modified)
 				drop_reserve = 1;
 		}
+	} else {
+		/*
+		 * Finally, if the buffer does not belong to any
+		 * transaction, we can just drop it now if it has no
+		 * checkpoint.
+		 */
+		spin_lock(&journal->j_list_lock);
+		if (!jh->b_cp_transaction) {
+			JBUFFER_TRACE(jh, "belongs to no transaction");
+			spin_unlock(&journal->j_list_lock);
+			goto not_jbd;
+		}
+
+		/*
+		 * Otherwise, if the buffer has been written to disk,
+		 * it is safe to remove the checkpoint and drop it.
+		 */
+		if (!buffer_dirty(bh)) {
+			__jbd2_journal_remove_checkpoint(jh);
+			spin_unlock(&journal->j_list_lock);
+			goto not_jbd;
+		}
+
+		/*
+		 * The buffer has not yet been written to disk, so we
+		 * should attach it to the current transaction to avoid
+		 * missing the writeback if a checkpoint runs before
+		 * the current transaction completes submission.
+		 */
+		__jbd2_journal_temp_unlink_buffer(jh);
+		clear_buffer_dirty(bh);
+		__jbd2_journal_file_buffer(jh, transaction, BJ_Forget);
+		spin_unlock(&journal->j_list_lock);
 	}
 
-not_jbd:
 	jbd_unlock_bh_state(bh);
 	__brelse(bh);
 drop:
@@ -1643,6 +1673,11 @@  int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 		handle->h_buffer_credits++;
 	}
 	return err;
+
+not_jbd:
+	jbd_unlock_bh_state(bh);
+	__bforget(bh);
+	goto drop;
 }
 
 /**
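
The practical difference between the two exit paths in the patch
(__brelse() on the normal path, __bforget() under the relocated
not_jbd label) can be sketched in user space. This is a simplified
model, not the kernel implementation: struct model_bh and the model_*
functions are hypothetical, and the model keeps only the dirty flag
and the reference count, ignoring everything else the real helpers do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer head. */
struct model_bh {
	int refcount; /* references held on the buffer */
	bool dirty;   /* models the buffer's dirty flag */
};

/* Models __brelse(): drop one reference, leave the data alone. */
static void model_brelse(struct model_bh *bh)
{
	assert(bh->refcount > 0);
	bh->refcount--;
}

/* Models __bforget(): discard dirty state, then drop the reference. */
static void model_bforget(struct model_bh *bh)
{
	bh->dirty = false;
	model_brelse(bh);
}

/* Would a later writeback pass still write this buffer to disk? */
static bool model_would_write_back(const struct model_bh *bh)
{
	return bh->dirty;
}
```

A dirty buffer released with model_brelse() is still a writeback
candidate, which is the stale-data risk the commit message describes;
one released with model_bforget() is not.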