From patchwork Wed Jan 30 06:49:37 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 1033302
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=huawei.com
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43qDQy43s3z9s9h for ; Wed, 30 Jan 2019 17:46:02 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728356AbfA3GqB (ORCPT ); Wed, 30 Jan 2019 01:46:01 -0500
Received: from szxga05-in.huawei.com ([45.249.212.191]:3241 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727617AbfA3GqB (ORCPT ); Wed, 30 Jan 2019 01:46:01 -0500
Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id B73AF2FF821BF9C7D08B; Wed, 30 Jan 2019 14:45:58 +0800 (CST)
Received: from huawei.com (10.90.53.225) by DGGEMS402-HUB.china.huawei.com (10.3.19.202) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 14:45:52 +0800
From: "zhangyi (F)"
To:
CC: , , , ,
Subject: [PATCH v4 1/4] jbd2: make sure dirty flag is cleared while revoking a buffer which belongs to older transaction
Date: Wed, 30 Jan 2019 14:49:37 +0800
Message-ID: <1548830980-29482-2-git-send-email-yi.zhang@huawei.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1548830980-29482-1-git-send-email-yi.zhang@huawei.com>
References: <1548830980-29482-1-git-send-email-yi.zhang@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.90.53.225]
X-CFilter-Loop: Reflected
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

We have caught a data corruption problem on ext4 that occurs while
truncating an extent index block. If we revoke a buffer which has been
journaled by the committing transaction, the buffer's jbddirty flag is
not cleared in jbd2_journal_forget(), so the commit code sets the buffer
dirty flag again after refiling the buffer.

fsx                               kjournald2
                                  jbd2_journal_commit_transaction
jbd2_journal_revoke                commit phase 1~5...
 jbd2_journal_forget
   belongs to older transaction    commit phase 6
   jbddirty not clear               __jbd2_journal_refile_buffer
                                     __jbd2_journal_unfile_buffer
                                      test_clear_buffer_jbddirty
                                       mark_buffer_dirty

Finally, if the freed extent index block is allocated again as a data
block by some other file, the file data may be corrupted when the cached
pages are written back later, for example at unmount time. (In general,
the clean_bdev_aliases() related helpers should be invoked after
re-allocation to prevent the above corruption, but unfortunately we
missed that when zeroing out the head of extra extent blocks in
ext4_ext_handle_unwritten_extents().)

This patch marks the buffer as freed and sets b_next_transaction to the
new transaction when the buffer already belongs to the committing
transaction in jbd2_journal_forget(), so that the commit code knows it
should clear the dirty bits when it is done with the buffer.

This problem can be reproduced easily by xfstests generic/455 with
seeds (3246 3247 3248 3249).

Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Cc: stable@vger.kernel.org
---
 fs/jbd2/transaction.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index f07f006..f0d8dab 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1609,14 +1609,21 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 		/* However, if the buffer is still owned by a prior
 		 * (committing) transaction, we can't drop it yet... */
 		JBUFFER_TRACE(jh, "belongs to older transaction");
-		/* ... but we CAN drop it from the new transaction if we
-		 * have also modified it since the original commit. */
+		/* ... but we CAN drop it from the new transaction through
+		 * marking the buffer as freed and set j_next_transaction to
+		 * the new transaction, so that not only the commit code
+		 * knows it should clear dirty bits when it is done with the
+		 * buffer, but also the buffer can be checkpointed only
+		 * after the new transaction commits. */
 
-		if (jh->b_next_transaction) {
-			J_ASSERT(jh->b_next_transaction == transaction);
+		set_buffer_freed(bh);
+
+		if (!jh->b_next_transaction) {
 			spin_lock(&journal->j_list_lock);
-			jh->b_next_transaction = NULL;
+			jh->b_next_transaction = transaction;
 			spin_unlock(&journal->j_list_lock);
+		} else {
+			J_ASSERT(jh->b_next_transaction == transaction);
 
 			/*
 			 * only drop a reference if this transaction modified

From patchwork Wed Jan 30 06:49:38 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 1033306
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=huawei.com
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43qDR225H8z9s9h for ; Wed, 30 Jan 2019 17:46:06 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729189AbfA3GqD (ORCPT ); Wed, 30 Jan 2019 01:46:03 -0500
Received: from szxga05-in.huawei.com ([45.249.212.191]:3243 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727620AbfA3GqC (ORCPT ); Wed, 30 Jan 2019 01:46:02 -0500
Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.59]) by
	Forcepoint Email with ESMTP id B2E56DD49A80F76119C7; Wed, 30 Jan 2019 14:45:58 +0800 (CST)
Received: from huawei.com (10.90.53.225) by DGGEMS402-HUB.china.huawei.com (10.3.19.202) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 14:45:52 +0800
From: "zhangyi (F)"
To:
CC: , , , ,
Subject: [PATCH v4 2/4] jbd2: discard dirty data when forgetting an un-journalled buffer
Date: Wed, 30 Jan 2019 14:49:38 +0800
Message-ID: <1548830980-29482-3-git-send-email-yi.zhang@huawei.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1548830980-29482-1-git-send-email-yi.zhang@huawei.com>
References: <1548830980-29482-1-git-send-email-yi.zhang@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.90.53.225]
X-CFilter-Loop: Reflected
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

We do not unmap the buffer or clear its dirty flag when forgetting a
buffer that is not journalled or does not belong to any transaction, so
the stale dirty data may still be written to disk later. That is fine if
the corresponding block is never used before the next mount, and it is
also fine as long as we invoke the clean_bdev_aliases() related
functions to unmap the block device mapping when re-allocating such a
freed block as a data block. But this logic is fragile and may lead to
data corruption if we forget to clean the bdev aliases, so it is better
to discard the dirty data at forget time.

We have already handled all the cases of forgetting a journalled buffer;
this patch deals with the remaining two cases:

- the buffer is not journalled yet,
- the buffer is journalled but does not belong to any transaction.

We invoke __bforget() instead of __brelse() when forgetting an
un-journalled buffer in jbd2_journal_forget(). After this patch we can
remove all clean_bdev_aliases() related calls in ext4.
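
For illustration only, the choices made for these cases can be modelled
in userspace C roughly as follows. This is a sketch under stated
assumptions: struct model_buf, enum forget_action and forget_decision()
are hypothetical names and not kernel API, the flags merely stand in for
buffer_jbd(), jh->b_transaction, jh->b_cp_transaction and
buffer_dirty(), and locking and reference counting are left out.

```c
#include <stdbool.h>

/* Hypothetical outcome labels; none of these exist in the kernel. */
enum forget_action {
	FORGET_BFORGET,           /* discard dirty data via __bforget() */
	FORGET_DROP_CHECKPOINT,   /* remove the checkpoint, then __bforget() */
	FORGET_FILE_ON_BJ_FORGET, /* clear dirty, file on BJ_Forget, __brelse() */
	FORGET_HAS_TRANSACTION    /* owned by a transaction: existing paths apply */
};

/* Hypothetical stand-in for the state jbd2_journal_forget() inspects. */
struct model_buf {
	bool journalled;      /* models buffer_jbd(bh) */
	bool in_transaction;  /* models jh->b_transaction != NULL */
	bool has_checkpoint;  /* models jh->b_cp_transaction != NULL */
	bool dirty;           /* models buffer_dirty(bh) */
};

static enum forget_action forget_decision(const struct model_buf *b)
{
	/* Case 1: the buffer is not journalled yet. */
	if (!b->journalled)
		return FORGET_BFORGET;

	/* Buffers owned by a transaction are not modelled here. */
	if (b->in_transaction)
		return FORGET_HAS_TRANSACTION;

	/* Case 2: journalled, but not part of any transaction. */
	if (!b->has_checkpoint)
		return FORGET_BFORGET;
	if (!b->dirty)
		return FORGET_DROP_CHECKPOINT;

	/* Still dirty: it may be checkpointed only after this commit. */
	return FORGET_FILE_ON_BJ_FORGET;
}
```

In this model only a dirty buffer that still sits on a checkpoint list
is kept and attached to the running transaction; every other remaining
case ends in the __bforget() path.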

Suggested-by: Jan Kara
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
---
 fs/jbd2/transaction.c | 42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index f0d8dab..a43b630 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1597,9 +1597,7 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 			__jbd2_journal_unfile_buffer(jh);
 			if (!buffer_jbd(bh)) {
 				spin_unlock(&journal->j_list_lock);
-				jbd_unlock_bh_state(bh);
-				__bforget(bh);
-				goto drop;
+				goto not_jbd;
 			}
 		}
 		spin_unlock(&journal->j_list_lock);
@@ -1632,9 +1630,40 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 			if (was_modified)
 				drop_reserve = 1;
 		}
+	} else {
+		/*
+		 * Finally, if the buffer is not belongs to any
+		 * transaction, we can just drop it now if it has no
+		 * checkpoint.
+		 */
+		spin_lock(&journal->j_list_lock);
+		if (!jh->b_cp_transaction) {
+			JBUFFER_TRACE(jh, "belongs to none transaction");
+			spin_unlock(&journal->j_list_lock);
+			goto not_jbd;
+		}
+
+		/*
+		 * Otherwise, if the buffer has been written to disk,
+		 * it is safe to remove the checkpoint and drop it.
+		 */
+		if (!buffer_dirty(bh)) {
+			__jbd2_journal_remove_checkpoint(jh);
+			spin_unlock(&journal->j_list_lock);
+			goto not_jbd;
+		}
+
+		/*
+		 * The buffer is still not written to disk, we should
+		 * attach this buffer to current transaction so that the
+		 * buffer can be checkpointed only after the current
+		 * transaction commits.
+		 */
+		clear_buffer_dirty(bh);
+		__jbd2_journal_file_buffer(jh, transaction, BJ_Forget);
+		spin_unlock(&journal->j_list_lock);
 	}
 
-not_jbd:
 	jbd_unlock_bh_state(bh);
 	__brelse(bh);
 drop:
@@ -1643,6 +1672,11 @@ int jbd2_journal_forget (handle_t *handle, struct buffer_head *bh)
 		handle->h_buffer_credits++;
 	}
 	return err;
+
+not_jbd:
+	jbd_unlock_bh_state(bh);
+	__bforget(bh);
+	goto drop;
 }
 
 /**

From patchwork Wed Jan 30 06:49:39 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 1033303
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=huawei.com
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43qDR01027z9sC7 for ; Wed, 30 Jan 2019 17:46:04 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728680AbfA3GqB (ORCPT ); Wed, 30 Jan 2019 01:46:01 -0500
Received: from szxga05-in.huawei.com ([45.249.212.191]:3242 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725850AbfA3GqB (ORCPT ); Wed, 30 Jan 2019 01:46:01 -0500
Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id C2EF6D54C1BE4CC96D59; Wed, 30 Jan 2019 14:45:58 +0800 (CST)
Received: from huawei.com (10.90.53.225) by DGGEMS402-HUB.china.huawei.com (10.3.19.202) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 14:45:52 +0800
From: "zhangyi (F)"
To:
CC: , , , ,
Subject: [PATCH v4 3/4] ext4: cleanup clean_bdev_aliases() calls
Date: Wed, 30 Jan 2019 14:49:39 +0800
Message-ID: <1548830980-29482-4-git-send-email-yi.zhang@huawei.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1548830980-29482-1-git-send-email-yi.zhang@huawei.com>
References: <1548830980-29482-1-git-send-email-yi.zhang@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.90.53.225]
X-CFilter-Loop: Reflected
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

Now that all the cases of forgetting a buffer are handled in
jbd2_journal_forget(), the buffer should no longer be mapped to the
block device when its block is reallocated. So this patch removes all
the clean_bdev_aliases() and clean_bdev_bh_alias() calls that ext4
invoked explicitly.

Suggested-by: Jan Kara
Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
---
 fs/ext4/extents.c | 12 +-----------
 fs/ext4/inode.c   |  7 -------
 fs/ext4/page-io.c |  4 +---
 3 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a054f51..ffb72d8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4068,18 +4068,8 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
 	} else
 		allocated = ret;
 	map->m_flags |= EXT4_MAP_NEW;
-	/*
-	 * if we allocated more blocks than requested
-	 * we need to make sure we unmap the extra block
-	 * allocated. The actual needed block will get
-	 * unmapped later when we find the buffer_head marked
-	 * new.
-	 */
-	if (allocated > map->m_len) {
-		clean_bdev_aliases(inode->i_sb->s_bdev, newblock + map->m_len,
-				   allocated - map->m_len);
+	if (allocated > map->m_len)
 		allocated = map->m_len;
-	}
 	map->m_len = allocated;
 
 map_out:
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e7adf87..3068c83 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -678,8 +678,6 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		if (flags & EXT4_GET_BLOCKS_ZERO &&
 		    map->m_flags & EXT4_MAP_MAPPED &&
 		    map->m_flags & EXT4_MAP_NEW) {
-			clean_bdev_aliases(inode->i_sb->s_bdev, map->m_pblk,
-					   map->m_len);
 			ret = ext4_issue_zeroout(inode, map->m_lblk,
 						 map->m_pblk, map->m_len);
 			if (ret) {
@@ -1194,7 +1192,6 @@ static int ext4_block_write_begin(struct page *page, loff_t pos, unsigned len,
 			if (err)
 				break;
 			if (buffer_new(bh)) {
-				clean_bdev_bh_alias(bh);
 				if (PageUptodate(page)) {
 					clear_buffer_new(bh);
 					set_buffer_uptodate(bh);
@@ -2490,10 +2487,6 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
 	}
 
 	BUG_ON(map->m_len == 0);
-	if (map->m_flags & EXT4_MAP_NEW) {
-		clean_bdev_aliases(inode->i_sb->s_bdev, map->m_pblk,
-				   map->m_len);
-	}
 	return 0;
 }
 
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 2aa62d5..1559946 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -467,10 +467,8 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 				ext4_io_submit(io);
 			continue;
 		}
-		if (buffer_new(bh)) {
+		if (buffer_new(bh))
 			clear_buffer_new(bh);
-			clean_bdev_bh_alias(bh);
-		}
 		set_buffer_async_write(bh);
 		nr_to_submit++;
 	} while ((bh = bh->b_this_page) != head);

From patchwork Wed Jan 30 06:49:40 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhang Yi
X-Patchwork-Id: 1033304
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=huawei.com
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43qDR05mt6z9s9h for ; Wed, 30 Jan 2019 17:46:04 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728701AbfA3GqB (ORCPT ); Wed, 30 Jan 2019 01:46:01 -0500
Received: from szxga05-in.huawei.com ([45.249.212.191]:3239 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726368AbfA3GqB (ORCPT ); Wed, 30 Jan 2019 01:46:01 -0500
Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id BF3498F4C68672F06F1B; Wed, 30 Jan 2019 14:45:58 +0800 (CST)
Received: from huawei.com (10.90.53.225) by DGGEMS402-HUB.china.huawei.com (10.3.19.202) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 14:45:52 +0800
From: "zhangyi (F)"
To:
CC: , , , ,
Subject: [PATCH v4 4/4] ext4: convert ext4_split_extent() to return requested length
Date: Wed, 30 Jan 2019 14:49:40 +0800
Message-ID: <1548830980-29482-5-git-send-email-yi.zhang@huawei.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1548830980-29482-1-git-send-email-yi.zhang@huawei.com>
References: <1548830980-29482-1-git-send-email-yi.zhang@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.90.53.225]
X-CFilter-Loop: Reflected
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

Now that the clean_bdev_aliases() calls which used to unmap the extra
blocks in ext4_ext_handle_unwritten_extents() have been removed,
returning the extra initialized region from
ext4_ext_convert_to_initialized() is no longer needed. To simplify the
logic, this patch makes it return no more than the requested length
instead.
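
The new post-condition amounts to clamping the initialized length to the
requested length before returning. A minimal sketch of that contract,
assuming plain unsigned block counts; clamp_to_requested() is a
hypothetical helper name used only for illustration and does not exist
in ext4:

```c
/*
 * Hypothetical model of the new return value: never report more blocks
 * than the caller asked for (map->m_len in the real code).
 */
static unsigned int clamp_to_requested(unsigned int allocated,
				       unsigned int requested)
{
	if (allocated > requested)
		allocated = requested;
	return allocated;
}
```

With this contract the caller can assign the return value to map->m_len
directly, without clamping it a second time.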

Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
---
 fs/ext4/extents.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ffb72d8..ffe9671 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3456,9 +3456,8 @@ static int ext4_split_extent(handle_t *handle,
  *    of the logical span [map->m_lblk, map->m_lblk + map->m_len).
  *
  * Post-conditions on success:
- *  - the returned value is the number of blocks beyond map->l_lblk
- *    that are allocated and initialized.
- *    It is guaranteed to be >= map->m_len.
+ *  - The returned value is the minimum number of requested blocks or
+ *    initialized blocks. It is guaranteed to be <= map->m_len.
  */
 static int ext4_ext_convert_to_initialized(handle_t *handle,
 					   struct inode *inode,
@@ -3700,7 +3699,6 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 
 			split_map.m_len += split_map.m_lblk - ee_block;
 			split_map.m_lblk = ee_block;
-			allocated = map->m_len;
 		}
 	}
 
@@ -3709,6 +3707,9 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
 	if (err > 0)
 		err = 0;
 out:
+	if (allocated > map->m_len)
+		allocated = map->m_len;
+
 	/* If we have gotten a failure, don't zero out status tree */
 	if (!err) {
 		err = ext4_zeroout_es(inode, &zero_ex1);
@@ -4065,11 +4066,10 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
 	if (ret <= 0) {
 		err = ret;
 		goto out2;
-	} else
-		allocated = ret;
+	}
+
+	allocated = ret;
 	map->m_flags |= EXT4_MAP_NEW;
-	if (allocated > map->m_len)
-		allocated = map->m_len;
 	map->m_len = allocated;
 
 map_out: