From patchwork Thu Sep 9 20:22:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1526288
From: Mauricio Faria de Oliveira
To: kernel-team@lists.ubuntu.com
Subject: [F][PATCH 4/5] ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
Date: Thu, 9 Sep 2021 17:22:25 -0300
Message-Id: <20210909202230.886329-5-mfo@canonical.com>
In-Reply-To: <20210909202230.886329-1-mfo@canonical.com>
References: <20210909202230.886329-1-mfo@canonical.com>
List-Id: Kernel team discussions

BugLink: https://bugs.launchpad.net/bugs/1847340

This implements journal callbacks j_submit|finish_inode_data_buffers()
with different behavior for data=journal: to write-protect pages under
commit, preventing changes to buffers writeably mapped to userspace.

If a buffer's content changes between commit's checksum calculation
and write-out to disk, it can cause journal recovery/mount failures
upon a kernel crash or power loss.

    [ 27.334874] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!
    [ 27.339492] JBD2: Invalid checksum recovering data block 8705 in log
    [ 27.342716] JBD2: recovery failed
    [ 27.343316] EXT4-fs (loop0): error loading journal
    mount: /ext4: can't read superblock on /dev/loop0.

In j_submit_inode_data_buffers() we write-protect the inode's pages
with write_cache_pages() and redirty w/ writepage callback if needed.

In j_finish_inode_data_buffers() there is nothing to do.

And in order to use the callbacks, inodes are added to the inode list
in transaction in __ext4_journalled_writepage() and ext4_page_mkwrite().

In ext4_page_mkwrite() we must make sure that the buffers are attached
to the transaction as jbddirty with write_end_fn(), as already done in
__ext4_journalled_writepage().

Signed-off-by: Mauricio Faria de Oliveira
Reported-by: Dann Frazier
Reported-by: kernel test robot # wbc.nr_to_write
Suggested-by: Jan Kara
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20201006004841.600488-5-mfo@canonical.com
Signed-off-by: Theodore Ts'o
(cherry picked from commit afb585a97f81899e39c14658789f02259d8c306a)
Signed-off-by: Mauricio Faria de Oliveira
---
 fs/ext4/inode.c | 25 +++++++++-----
 fs/ext4/super.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 101 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fa283bcdd762..0fd8b033bbbe 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2069,6 +2069,9 @@ static int __ext4_journalled_writepage(struct page *page,
 		err = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL,
 					     write_end_fn);
 	}
+	if (ret == 0)
+		ret = err;
+	err = ext4_jbd2_inode_add_write(handle, inode, 0, len);
 	if (ret == 0)
 		ret = err;
 	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
@@ -6351,10 +6354,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 		size = i_size_read(inode);
 		/* Page got truncated from under us? */
 		if (page->mapping != mapping || page_offset(page) > size) {
-			unlock_page(page);
 			ret = VM_FAULT_NOPAGE;
-			ext4_journal_stop(handle);
-			goto out;
+			goto out_error;
 		}
 
 		if (page->index == size >> PAGE_SHIFT)
@@ -6364,13 +6365,15 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 
 		err = __block_write_begin(page, 0, len, ext4_get_block);
 		if (!err) {
+			ret = VM_FAULT_SIGBUS;
 			if (ext4_walk_page_buffers(handle, page_buffers(page),
-					0, len, NULL, do_journal_get_write_access)) {
-				unlock_page(page);
-				ret = VM_FAULT_SIGBUS;
-				ext4_journal_stop(handle);
-				goto out;
-			}
+					0, len, NULL, do_journal_get_write_access))
+				goto out_error;
+			if (ext4_walk_page_buffers(handle, page_buffers(page),
+					0, len, NULL, write_end_fn))
+				goto out_error;
+			if (ext4_jbd2_inode_add_write(handle, inode, 0, len))
+				goto out_error;
 			ext4_set_inode_state(inode, EXT4_STATE_JDATA);
 		} else {
 			unlock_page(page);
@@ -6385,6 +6388,10 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 	up_read(&EXT4_I(inode)->i_mmap_sem);
 	sb_end_pagefault(inode->i_sb);
 	return ret;
+out_error:
+	unlock_page(page);
+	ext4_journal_stop(handle);
+	goto out;
 }
 
 vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dda9df956505..a29ec0fa3d71 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -432,6 +432,89 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
 	spin_unlock(&sbi->s_md_lock);
 }
 
+/*
+ * This writepage callback for write_cache_pages()
+ * takes care of a few cases after page cleaning.
+ *
+ * write_cache_pages() already checks for dirty pages
+ * and calls clear_page_dirty_for_io(), which we want,
+ * to write protect the pages.
+ *
+ * However, we may have to redirty a page (see below.)
+ */
+static int ext4_journalled_writepage_callback(struct page *page,
+					      struct writeback_control *wbc,
+					      void *data)
+{
+	transaction_t *transaction = (transaction_t *) data;
+	struct buffer_head *bh, *head;
+	struct journal_head *jh;
+
+	bh = head = page_buffers(page);
+	do {
+		/*
+		 * We have to redirty a page in these cases:
+		 * 1) If buffer is dirty, it means the page was dirty because it
+		 * contains a buffer that needs checkpointing. So the dirty bit
+		 * needs to be preserved so that checkpointing writes the buffer
+		 * properly.
+		 * 2) If buffer is not part of the committing transaction
+		 * (we may have just accidentally come across this buffer because
+		 * inode range tracking is not exact) or if the currently running
+		 * transaction already contains this buffer as well, dirty bit
+		 * needs to be preserved so that the buffer gets writeprotected
+		 * properly on running transaction's commit.
+		 */
+		jh = bh2jh(bh);
+		if (buffer_dirty(bh) ||
+		    (jh && (jh->b_transaction != transaction ||
+			    jh->b_next_transaction))) {
+			redirty_page_for_writepage(wbc, page);
+			goto out;
+		}
+	} while ((bh = bh->b_this_page) != head);
+
+out:
+	return AOP_WRITEPAGE_ACTIVATE;
+}
+
+static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+	struct writeback_control wbc = {
+		.sync_mode =  WB_SYNC_ALL,
+		.nr_to_write = LONG_MAX,
+		.range_start = jinode->i_dirty_start,
+		.range_end = jinode->i_dirty_end,
+	};
+
+	return write_cache_pages(mapping, &wbc,
+				 ext4_journalled_writepage_callback,
+				 jinode->i_transaction);
+}
+
+static int ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	int ret;
+
+	if (ext4_should_journal_data(jinode->i_vfs_inode))
+		ret = ext4_journalled_submit_inode_data_buffers(jinode);
+	else
+		ret = jbd2_journal_submit_inode_data_buffers(jinode);
+
+	return ret;
+}
+
+static int ext4_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	int ret = 0;
+
+	if (!ext4_should_journal_data(jinode->i_vfs_inode))
+		ret = jbd2_journal_finish_inode_data_buffers(jinode);
+
+	return ret;
+}
+
 static bool system_going_down(void)
 {
 	return system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF
@@ -4466,9 +4549,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;
 
 	sbi->s_journal->j_submit_inode_data_buffers =
-		jbd2_journal_submit_inode_data_buffers;
+		ext4_journal_submit_inode_data_buffers;
 	sbi->s_journal->j_finish_inode_data_buffers =
-		jbd2_journal_finish_inode_data_buffers;
+		ext4_journal_finish_inode_data_buffers;
 
 no_journal:
 	if (!test_opt(sb, NO_MBCACHE)) {
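
For anyone reviewing the redirty condition in
ext4_journalled_writepage_callback() above, the following is a minimal
userspace sketch of just that predicate. It is not kernel code and not part
of the patch: the model_* structs and functions are hypothetical stand-ins
that keep only the fields the check reads (the buffer dirty bit, the journal
head's b_transaction and b_next_transaction, and the circular b_this_page
list), and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for transaction_t; only its identity matters here. */
struct model_transaction { int tid; };

/* Hypothetical stand-in for struct journal_head (two fields only). */
struct model_journal_head {
	const struct model_transaction *b_transaction;
	const struct model_transaction *b_next_transaction;
};

/* Hypothetical stand-in for struct buffer_head. */
struct model_buffer {
	bool dirty;                          /* models buffer_dirty(bh) */
	const struct model_journal_head *jh; /* models bh2jh(bh); NULL if none */
	struct model_buffer *b_this_page;    /* circular list of page buffers */
};

/* Same boolean expression as the patch, applied to one buffer. */
static bool model_buffer_needs_redirty(const struct model_buffer *bh,
				       const struct model_transaction *committing)
{
	const struct model_journal_head *jh = bh->jh;

	return bh->dirty ||
	       (jh && (jh->b_transaction != committing ||
		       jh->b_next_transaction));
}

/* Walk the circular buffer list; one matching buffer redirties the page. */
static bool model_page_needs_redirty(const struct model_buffer *head,
				     const struct model_transaction *committing)
{
	const struct model_buffer *bh = head;

	do {
		if (model_buffer_needs_redirty(bh, committing))
			return true;
	} while ((bh = bh->b_this_page) != head);

	return false;
}

/* Exercises the cases listed in the patch's comment. */
static void model_selfcheck(void)
{
	struct model_transaction committing = { 1 }, running = { 2 };
	struct model_journal_head in_commit = { &committing, NULL };
	struct model_journal_head in_other = { &running, NULL };
	struct model_journal_head in_both = { &committing, &running };
	struct model_buffer a = { false, &in_commit, NULL };
	struct model_buffer b = { false, &in_commit, NULL };

	a.b_this_page = &b;
	b.b_this_page = &a;

	/* Clean buffers owned only by the committing transaction. */
	assert(!model_page_needs_redirty(&a, &committing));

	/* Case 1: a dirty buffer (needs checkpointing). */
	b.dirty = true;
	assert(model_page_needs_redirty(&a, &committing));
	b.dirty = false;

	/* Case 2a: buffer belongs to a different transaction. */
	b.jh = &in_other;
	assert(model_page_needs_redirty(&a, &committing));

	/* Case 2b: buffer is also in the running transaction. */
	b.jh = &in_both;
	assert(model_page_needs_redirty(&a, &committing));

	/* A clean buffer with no journal head does not force a redirty. */
	b.jh = NULL;
	assert(!model_page_needs_redirty(&a, &committing));
}
```

Calling model_selfcheck() from a test harness walks through the two cases in
the comment. The sketch stops at the first matching buffer, as the callback
does with its goto out, since redirtying applies to the whole page.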