From patchwork Mon Aug 10 01:02:04 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1342684
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org;
        spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org
        (client-ip=23.128.96.18; helo=vger.kernel.org;
        envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org;
        dmarc=fail (p=none dis=none) header.from=canonical.com
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by ozlabs.org (Postfix) with ESMTP id 4BPyNv4yr7z9sTb
        for ; Mon, 10 Aug 2020 11:02:23 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726382AbgHJBCW (ORCPT ); Sun, 9 Aug 2020 21:02:22 -0400
Received: from youngberry.canonical.com ([91.189.89.112]:36769
        "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1726219AbgHJBCW (ORCPT );
        Sun, 9 Aug 2020 21:02:22 -0400
Received: from mail-qv1-f71.google.com ([209.85.219.71])
        by youngberry.canonical.com with esmtps
        (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2)
        (envelope-from ) id 1k4wCh-000707-Cn
        for linux-ext4@vger.kernel.org; Mon, 10 Aug 2020 01:02:19 +0000
Received: by mail-qv1-f71.google.com with SMTP id z10so6332184qvm.0
        for ; Sun, 09 Aug 2020 18:02:19 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
        :references:mime-version:content-transfer-encoding;
        bh=d3yDKo9vJf9G43nEc3BI2IxvP5t3stpY+dybWJvQ24k=;
        b=LlwovZEfcWmO3ef1vsjhCJblFid0I+7DKNMMuQ0N+oAiENgYQdVKXz7WUGxrWNK65a
        ZpCWsfFPhmeEhVxnYG5DgmrQjXlrMw25NXG6nS6LNIEazxHEgBX4vF/B5Fb9FfwsoPrG
        bUWQ+7AvYJ50tY017HIzFVQWUOV7lyOtuWiDZZWoML2WnLyqbEFKPuWSLf8FS2HSk49K
        24jycubmMFCF88e4sUZ/7gGSDO6jAI/JZWluNu2w03K8iKLtmHVFReBf4iRy4C7IUjEJ
        NXlFrkt1dzRIbhgOmkrdGj0hiQ/4+B0CILoU+CyGVPd11m3pZDjJjClRYX1RiVBLTAq5
        gUog==
X-Gm-Message-State: AOAM531ZqtYSJaDtQXy2U3qUs7fisrx4nw2OGiH+X/hF66SCGrHnr3Bj
        xRGdH4VYyPdAsIq2OvXHQA0OWALsbxG+TQU3LXPHlR0i6qUA/Y3MUkWP+N1SzHY4XcuvVQvgwup
        13JrHdwNYscFRJNK7At/roMMeK0QHpUNs2TqxLmw=
X-Received: by 2002:a05:6214:13b0:: with SMTP id h16mr25203745qvz.207.1597021338241;
        Sun, 09 Aug 2020 18:02:18 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJzWW307K6Dhef/KbsJp9Ec5pys3C5hXSa4G+GHhj0G5wBEVREBKK3KSMKtxVkckfKONdGnlpQ==
X-Received: by 2002:a05:6214:13b0:: with SMTP id h16mr25203721qvz.207.1597021338000;
        Sun, 09 Aug 2020 18:02:18 -0700 (PDT)
Received: from localhost.localdomain ([201.82.49.101])
        by smtp.gmail.com with ESMTPSA id 95sm44815qtc.29.2020.08.09.18.02.14
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Sun, 09 Aug 2020 18:02:17 -0700 (PDT)
From: Mauricio Faria de Oliveira
To: Jan Kara
Cc: linux-ext4@vger.kernel.org, dann frazier, Mauricio Faria de Oliveira, Jan Kara
Subject: [RFC PATCH v2 1/5] jbd2: test case for ext4 data=journal/mmap() journal corruption
Date: Sun, 9 Aug 2020 22:02:04 -0300
Message-Id: <20200810010210.3305322-2-mfo@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200810010210.3305322-1-mfo@canonical.com>
References: <20200810010210.3305322-1-mfo@canonical.com>
MIME-Version: 1.0
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

This checks during journal commit, right after calculating the
checksum of a buffer head, whether its contents match the 'BUG'
string (the cookie string in the test case userspace part.)

If so, it sleeps 5 seconds for such contents to change (i.e.,
so that the actual checksum changes from what was calculated.)

And if it changed, set a flag to panic after committing to disk.

Then, on filesystem remount/journal recovery there is an invalid
checksum error, and recovery fails:

    $ sudo mount -o data=journal,journal_checksum $DEV $MNT
    [ 100.832223] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!
    [ 100.837488] JBD2: Invalid checksum recovering data block 8706 in log
    [ 100.842010] JBD2: recovery failed
    [ 100.843045] EXT4-fs (loop0): error loading journal
    mount: /ext4: can't read superblock on /dev/loop0.
---
 fs/jbd2/commit.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 6d2da8ad0e6f..51f713089e35 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -26,6 +26,11 @@
 #include
 #include
 
+#include
+#include
+
+static journal_t *force_panic;
+
 /*
  * IO end handler for temporary buffer_heads handling writes to the journal.
  */
@@ -331,14 +336,35 @@ static void jbd2_block_tag_csum_set(journal_t *j, journal_block_tag_t *tag,
 	__u32 csum32;
 	__be32 seq;
 
+	// For the testcase
+	__u32 csum32_later;
+	__u8 *bh_data;
+
 	if (!jbd2_journal_has_csum_v2or3(j))
 		return;
 
 	seq = cpu_to_be32(sequence);
 	addr = kmap_atomic(page);
 	csum32 = jbd2_chksum(j, j->j_csum_seed, (__u8 *)&seq, sizeof(seq));
+	csum32_later = csum32; // Copy csum32 to check again later
 	csum32 = jbd2_chksum(j, csum32, addr + offset_in_page(bh->b_data),
 			     bh->b_size);
+
+	// Check for testcase cookie 'BUG' in the buffer_head data.
+	bh_data = addr + offset_in_page(bh->b_data);
+	if (bh_data[0] == 'B' &&
+	    bh_data[1] == 'U' &&
+	    bh_data[2] == 'G') {
+		pr_info("TESTCASE: Cookie found. Waiting 5 seconds for changes.\n");
+		msleep(5000);
+		pr_info("TESTCASE: Cookie eaten. Resumed.\n");
+	}
+
+	// Check the checksum again for changes/panic after commit.
+	csum32_later = jbd2_chksum(j, csum32_later, addr + offset_in_page(bh->b_data), bh->b_size);
+	if (csum32 != csum32_later)
+		force_panic = j;
+
 	kunmap_atomic(addr);
 
 	if (jbd2_has_feature_csum3(j))
@@ -885,6 +911,9 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		blkdev_issue_flush(journal->j_dev, GFP_NOFS);
 	}
 
+	if (force_panic == journal)
+		panic("TESTCASE: checksum changed; commit record done; panic!\n");
+
 	if (err)
 		jbd2_journal_abort(journal, err);
 

From patchwork Mon Aug 10 01:02:05 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1342686
From: Mauricio Faria de Oliveira
To: Jan Kara
Cc: linux-ext4@vger.kernel.org, dann frazier, Mauricio Faria de Oliveira, Jan Kara
Subject: [RFC PATCH v2 2/5] jbd2: introduce journal callbacks j_submit|finish_inode_data_buffers
Date: Sun, 9 Aug 2020 22:02:05 -0300
Message-Id: <20200810010210.3305322-3-mfo@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200810010210.3305322-1-mfo@canonical.com>
References: <20200810010210.3305322-1-mfo@canonical.com>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

Add the callbacks as opt-in to override the default behavior for the
transaction's inode list, instead of moving that code around.

This is important because ext4 is not the only user of the inode
list: ocfs2 uses it too, via jbd2_journal_inode_ranged_write(), and
out-of-tree code may as well.

To opt out of the default behavior (i.e., to do nothing), one has
to opt in with a no-op function.
---
 fs/jbd2/commit.c     | 21 ++++++++++++++++-----
 include/linux/jbd2.h | 21 ++++++++++++++++++++-
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 51f713089e35..b98d227b50d8 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -237,10 +237,14 @@ static int journal_submit_data_buffers(journal_t *journal,
 		 * instead of writepages. Because writepages can do
 		 * block allocation with delalloc. We need to write
 		 * only allocated blocks here.
+		 * This can be overridden with a custom callback.
 		 */
 		trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
-		err = journal_submit_inode_data_buffers(mapping, dirty_start,
-				dirty_end);
+		if (journal->j_submit_inode_data_buffers)
+			err = journal->j_submit_inode_data_buffers(jinode);
+		else
+			err = journal_submit_inode_data_buffers(mapping,
+					dirty_start, dirty_end);
 		if (!ret)
 			ret = err;
 		spin_lock(&journal->j_list_lock);
@@ -274,9 +278,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 			continue;
 		jinode->i_flags |= JI_COMMIT_RUNNING;
 		spin_unlock(&journal->j_list_lock);
-		err = filemap_fdatawait_range_keep_errors(
-				jinode->i_vfs_inode->i_mapping, dirty_start,
-				dirty_end);
+		/*
+		 * Wait for the inode data buffers writeout.
+		 * This can be overridden with a custom callback.
+		 */
+		if (journal->j_finish_inode_data_buffers)
+			err = journal->j_finish_inode_data_buffers(jinode);
+		else
+			err = filemap_fdatawait_range_keep_errors(
+					jinode->i_vfs_inode->i_mapping,
+					dirty_start, dirty_end);
 		if (!ret)
 			ret = err;
 		spin_lock(&journal->j_list_lock);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index d56128df2aff..24efe88eda1b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -628,7 +628,8 @@ struct transaction_s
 	struct journal_head	*t_shadow_list;
 
 	/*
-	 * List of inodes whose data we've modified in data=ordered mode.
+	 * List of inodes whose data we've modified in data=ordered mode
+	 * or whose pages we should write-protect in data=journaled mode.
 	 * [j_list_lock]
 	 */
 	struct list_head	t_inode_list;
@@ -1110,6 +1111,24 @@ struct journal_s
 	void			(*j_commit_callback)(journal_t *,
 						     transaction_t *);
 
+	/**
+	 * @j_submit_inode_data_buffers:
+	 *
+	 * This function is called before flushing metadata buffers.
+	 * This overrides the default behavior (writeout data buffers.)
+	 */
+	int			(*j_submit_inode_data_buffers)
+					(struct jbd2_inode *);
+
+	/**
+	 * @j_finish_inode_data_buffers:
+	 *
+	 * This function is called after flushing metadata buffers.
+	 * This overrides the default behavior (wait writeout.)
+	 */
+	int			(*j_finish_inode_data_buffers)
+					(struct jbd2_inode *);
+
 	/*
 	 * Journal statistics
 	 */

From patchwork Mon Aug 10 01:02:06 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1342687
From: Mauricio Faria de Oliveira
To: Jan Kara
Cc: linux-ext4@vger.kernel.org, dann frazier, Mauricio Faria de Oliveira, Jan Kara
Subject: [RFC PATCH v2 3/5] ext4: data=journal: write-protect pages on submit inode data buffers callback
Date: Sun, 9 Aug 2020 22:02:06 -0300
Message-Id: <20200810010210.3305322-4-mfo@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200810010210.3305322-1-mfo@canonical.com>
References: <20200810010210.3305322-1-mfo@canonical.com>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

This implements the journal's j_submit_inode_data_buffers() callback
to write-protect the inode's pages with write_cache_pages(), and use
a writepage callback to redirty pages with buffers that are not part
of the committing transaction or the next transaction.

And set a no-op function as j_finish_inode_data_buffers() callback
(nothing needed other than the write-protect above.)

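
The opt-in scheme relied on above (a NULL callback keeps the default
behavior, a custom callback replaces it, and a no-op callback disables
it) can be modelled in userspace C. This is only an illustrative
sketch: the toy_* types and every function name below are hypothetical
and are not jbd2 or ext4 API.

```c
/* Userspace model of opt-in callbacks; every name here is hypothetical. */
#include <assert.h>
#include <stddef.h>

struct toy_inode {
	int written;	/* set by the default "submit" step */
	int waited;	/* set by the default "finish" step */
	int wprotected;	/* set by the custom "submit" step */
};

struct toy_journal {
	/* NULL means "use the default behavior". */
	int (*submit)(struct toy_inode *);
	int (*finish)(struct toy_inode *);
};

/* Default submit: pretend to start data writeout. */
int default_submit(struct toy_inode *ino)
{
	ino->written = 1;
	return 0;
}

/* Default finish: pretend to wait for the writeout. */
int default_finish(struct toy_inode *ino)
{
	ino->waited = 1;
	return 0;
}

/* Custom submit: write-protect instead of writing out. */
int protect_submit(struct toy_inode *ino)
{
	ino->wprotected = 1;
	return 0;
}

/* Opting out of the default wait means opting in with a no-op. */
int noop_finish(struct toy_inode *ino)
{
	(void)ino;
	return 0;
}

/* Call the custom callback when one is set, else the default. */
int toy_commit(struct toy_journal *j, struct toy_inode *ino)
{
	int err;

	err = j->submit ? j->submit(ino) : default_submit(ino);
	if (err)
		return err;
	return j->finish ? j->finish(ino) : default_finish(ino);
}
```

With both pointers left NULL, toy_commit() runs the two default steps;
with protect_submit() and noop_finish() installed, neither default
step runs, which mirrors what this patch sets up for data=journal.
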
Currently, the inode is added to the transaction's inode list
in the __ext4_journalled_writepage() function.
---
 fs/ext4/inode.c |  4 +++
 fs/ext4/super.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 10dd470876b3..978ccde8454f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1911,6 +1911,10 @@ static int __ext4_journalled_writepage(struct page *page,
 		err = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL,
 					     write_end_fn);
 	}
+	if (ret == 0)
+		ret = err;
+	// XXX: is this correct for inline data inodes?
+	err = ext4_jbd2_inode_add_write(handle, inode, 0, len);
 	if (ret == 0)
 		ret = err;
 	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 330957ed1f05..38aaac6572ea 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -472,6 +472,66 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
 	spin_unlock(&sbi->s_md_lock);
 }
 
+/*
+ * This writepage callback for write_cache_pages()
+ * takes care of a few cases after page cleaning.
+ *
+ * write_cache_pages() already checks for dirty pages
+ * and calls clear_page_dirty_for_io(), which we want,
+ * to write protect the pages.
+ *
+ * However, we have to redirty a page in two cases:
+ * 1) some buffer is not part of the committing transaction
+ * 2) some buffer already has b_next_transaction set
+ */
+
+static int ext4_journalled_writepage_callback(struct page *page,
+					      struct writeback_control *wbc,
+					      void *data)
+{
+	transaction_t *transaction = (transaction_t *) data;
+	struct buffer_head *bh, *head;
+	struct journal_head *jh;
+
+	// XXX: any chance of !bh here?
+	bh = head = page_buffers(page);
+	do {
+		jh = bh2jh(bh);
+		if (!jh || jh->b_transaction != transaction ||
+		    jh->b_next_transaction) {
+			redirty_page_for_writepage(wbc, page);
+			goto out;
+		}
+	} while ((bh = bh->b_this_page) != head);
+
+out:
+	return AOP_WRITEPAGE_ACTIVATE;
+}
+
+static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
+	transaction_t *transaction = jinode->i_transaction;
+	loff_t dirty_start = jinode->i_dirty_start;
+	loff_t dirty_end = jinode->i_dirty_end;
+
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = mapping->nrpages * 2,
+		.range_start = dirty_start,
+		.range_end = dirty_end,
+	};
+
+	return write_cache_pages(mapping, &wbc,
+				 ext4_journalled_writepage_callback,
+				 transaction);
+}
+
+static int ext4_journalled_finish_inode_data_buffers(struct jbd2_inode *jinode)
+{
+	return 0;
+}
+
 static bool system_going_down(void)
 {
 	return system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF
@@ -4599,6 +4659,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		ext4_msg(sb, KERN_ERR, "can't mount with "
 			"journal_async_commit in data=ordered mode");
 		goto failed_mount_wq;
+	} else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) {
+		sbi->s_journal->j_submit_inode_data_buffers =
+			ext4_journalled_submit_inode_data_buffers;
+		sbi->s_journal->j_finish_inode_data_buffers =
+			ext4_journalled_finish_inode_data_buffers;
 	}
 
 	set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);

From patchwork Mon Aug 10 01:02:07 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1342688
From: Mauricio Faria de Oliveira
To: Jan Kara
Cc: linux-ext4@vger.kernel.org, dann frazier, Mauricio Faria de Oliveira, Jan Kara
Subject: [RFC PATCH v2 4/5] ext4: data=journal: add inode to transaction inode list in ext4_page_mkwrite()
Date: Sun, 9 Aug 2020 22:02:07 -0300
Message-Id: <20200810010210.3305322-5-mfo@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200810010210.3305322-1-mfo@canonical.com>
References: <20200810010210.3305322-1-mfo@canonical.com>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

Since we only add the inode to the transaction's inode list in
__ext4_journalled_writepage(), we depend on msync() or writeback work
(which call it) for the write-protect mechanism to work.

This test snippet shows that: pwrite() gets the inode into a
transaction (which is not the same as getting it into the
transaction's inode list), and the addr[] write access gets the page
writably mapped.

    fd = open("file");
    addr = mmap(fd);
    pwrite(fd, "a", 1, 0); // journals inode via ext4_write_begin()
    addr[0] = 'a'; // page is writably mapped to user space.

    // periodic journal commit / jbd2 thread runs now.
    // __ext4_journalled_writepage() was not called yet.

Now it's possible for a subsequent addr[] write access to race with
the commit function, and possibly hit the window to cause invalid
checksums.
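
For reference, the access pattern in the snippet above can be written
out as compilable userspace C. This is a sketch under assumptions, not
the test case used for this series: the helper name
pwrite_then_store() and its return convention are made up here, and
the code only performs the pwrite()-then-store sequence; it does not
by itself time the stores against a journal commit.

```c
/* Sketch of the access pattern only; the helper name is an assumption. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Run the pwrite()-then-store sequence on 'path'.
 * Returns 0 on success, -1 on any error.
 */
int pwrite_then_store(const char *path)
{
	long page_size = sysconf(_SC_PAGESIZE);
	char *addr;
	int fd, ret = -1;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -1;
	/* Make the file one page long so the mapping is backed by data. */
	if (page_size <= 0 || ftruncate(fd, page_size) < 0)
		goto out_close;

	addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		goto out_close;

	/* Per the text above: journals the inode via ext4_write_begin(). */
	if (pwrite(fd, "a", 1, 0) != 1)
		goto out_unmap;

	/*
	 * The first store takes a write fault; later stores go straight
	 * to the already-writable mapping unless the page has been
	 * write-protected again in between.
	 */
	addr[0] = 'a';
	addr[0] = 'b';

	ret = 0;
out_unmap:
	munmap(addr, page_size);
out_close:
	close(fd);
	return ret;
}
```

To get anywhere near the race described here, the file would have to
live on an ext4 filesystem mounted with data=journal, and the stores
through addr[] would have to continue across a journal commit; both
are left out of the sketch.
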
---
 fs/ext4/inode.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 978ccde8454f..ce5464f92a7e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6008,9 +6008,10 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 		len = PAGE_SIZE;
 	/*
 	 * Return if we have all the buffers mapped. This avoids the need to do
-	 * journal_start/journal_stop which can block and take a long time
+	 * journal_start/journal_stop which can block and take a long time. But
+	 * not on data journalling, as we have to add the inode to the txn list.
 	 */
-	if (page_has_buffers(page)) {
+	if (page_has_buffers(page) && !ext4_should_journal_data(inode)) {
 		if (!ext4_walk_page_buffers(NULL, page_buffers(page),
 					    0, len, NULL,
 					    ext4_bh_unmapped)) {
@@ -6043,6 +6044,12 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 			goto out;
 		}
 		ext4_set_inode_state(inode, EXT4_STATE_JDATA);
+		if (ext4_jbd2_inode_add_write(handle, inode, 0, PAGE_SIZE)) {
+			unlock_page(page);
+			ret = VM_FAULT_SIGBUS;
+			ext4_journal_stop(handle);
+			goto out;
+		}
 	}
 	ext4_journal_stop(handle);
 	if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))

From patchwork Mon Aug 10 01:02:08 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1342689
From: Mauricio Faria de Oliveira
To: Jan Kara
Cc: linux-ext4@vger.kernel.org, dann frazier, Mauricio Faria de Oliveira, Jan Kara
Subject: [RFC PATCH v2 5/5] ext4/jbd2: debugging messages
Date: Sun, 9 Aug 2020 22:02:08 -0300
Message-Id: <20200810010210.3305322-6-mfo@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200810010210.3305322-1-mfo@canonical.com>
References: <20200810010210.3305322-1-mfo@canonical.com>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

For code tracking and debugging purposes; some used in cover letter.
---
 fs/ext4/inode.c       | 27 +++++++++++++++++++++++++++
 fs/ext4/super.c       | 10 ++++++++++
 fs/jbd2/commit.c      |  5 +++++
 fs/jbd2/journal.c     |  5 +++++
 fs/jbd2/transaction.c |  4 ++++
 5 files changed, 51 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ce5464f92a7e..cd01aec87303 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -48,6 +48,17 @@
 
 #include
 
+#include "ext4_jbd2_dbg.h"
+
+static int ext4_bh_tdbg(handle_t *handle, struct buffer_head *bh)
+{
+	struct super_block *sb = bh->b_page->mapping->host->i_sb;
+	tdbg_ext4(sb, "bh: %px, data offset: %04llx, page: %px, inode: %px\n",
+		  bh, (u64) bh->b_data & ((u64)PAGE_SIZE - 1),
+		  bh->b_page, bh->b_page->mapping->host);
+	return 0;
+}
+
 static __u32 ext4_inode_csum(struct inode *inode, struct ext4_inode *raw,
 			      struct ext4_inode_info *ei)
 {
@@ -1193,6 +1204,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 		ret = __block_write_begin(page, pos, len, ext4_get_block);
 #endif
 	if (!ret && ext4_should_journal_data(inode)) {
+		tdbg_ext4(inode->i_sb, "journal started: inode %px, txn %px", inode, handle->h_transaction);
 		ret = ext4_walk_page_buffers(handle, page_buffers(page),
 					     from, to, NULL,
 					     do_journal_get_write_access);
@@ -1446,6 +1458,7 @@ static int ext4_journalled_write_end(struct file *file,
 			ext4_orphan_del(NULL, inode);
 	}
 
+	tdbg_ext4(inode->i_sb, "journal stopped: inode %px", inode);
 	return ret ? ret : copied;
 }
 
@@ -1917,6 +1930,10 @@ static int __ext4_journalled_writepage(struct page *page,
 	err = ext4_jbd2_inode_add_write(handle, inode, 0, len);
 	if (ret == 0)
 		ret = err;
+	{
+		struct super_block *sb = handle->h_transaction->t_journal->j_private;
+		tdbg_ext4(sb, "Added inode to txn list: inode %px, txn = %px, err = %d", inode, handle->h_transaction, err);
+	}
 	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
 	err = ext4_journal_stop(handle);
 	if (!ret)
@@ -2035,6 +2052,7 @@ static int ext4_writepage(struct page *page,
 		keep_towrite = true;
 	}
 
+	tdbg_ext4(inode->i_sb, "called for inode %px by comm %s", inode, current->comm);
 	if (PageChecked(page) && ext4_should_journal_data(inode))
 		/*
 		 * It's mmapped pagecache. Add buffers and journal it. There
@@ -5969,6 +5987,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 	get_block_t *get_block;
 	int retries = 0;
 
+	tdbg_ext4(inode->i_sb, "entry for inode %px", inode);
+
 	if (unlikely(IS_IMMUTABLE(inode)))
 		return VM_FAULT_SIGBUS;
 
@@ -6006,6 +6026,9 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 		len = size & ~PAGE_MASK;
 	else
 		len = PAGE_SIZE;
+
+	ext4_walk_page_buffers(NULL, page_buffers(page), 0, len, NULL, ext4_bh_tdbg);
+
 	/*
 	 * Return if we have all the buffers mapped. This avoids the need to do
 	 * journal_start/journal_stop which can block and take a long time. But
@@ -6018,6 +6041,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 			/* Wait so that we don't change page under IO */
 			wait_for_stable_page(page);
 			ret = VM_FAULT_LOCKED;
+			tdbg_ext4(inode->i_sb, "returning; all buffers mapped for inode %px", inode);
 			goto out;
 		}
 	}
@@ -6036,6 +6060,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 	}
 	err = block_page_mkwrite(vma, vmf, get_block);
 	if (!err && ext4_should_journal_data(inode)) {
+		tdbg_ext4(inode->i_sb, "before djgwa(), for inode %px", inode);
 		if (ext4_walk_page_buffers(handle, page_buffers(page), 0,
 			  PAGE_SIZE, NULL, do_journal_get_write_access)) {
 			unlock_page(page);
@@ -6043,6 +6068,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 			ext4_journal_stop(handle);
 			goto out;
 		}
+		tdbg_ext4(inode->i_sb, "after djgwa(), for inode %px", inode);
 		ext4_set_inode_state(inode, EXT4_STATE_JDATA);
 		if (ext4_jbd2_inode_add_write(handle, inode, 0, PAGE_SIZE)) {
 			unlock_page(page);
@@ -6050,6 +6076,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 			ext4_journal_stop(handle);
 			goto out;
 		}
+		tdbg_ext4(inode->i_sb, "Added inode to txn list: inode %px, txn = %px, err = 0", inode, handle->h_transaction);
 	}
 	ext4_journal_stop(handle);
 	if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 38aaac6572ea..7167fcf60b5c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -58,6 +58,8 @@
 #define CREATE_TRACE_POINTS
 #include
 
+#include "ext4_jbd2_dbg.h"
+
 static struct ext4_lazy_init *ext4_li_info;
 static struct mutex ext4_li_mtx;
 static struct ratelimit_state ext4_mount_msg_ratelimit;
@@ -492,14 +494,20 @@ static int ext4_journalled_writepage_callback(struct page *page,
 	transaction_t *transaction = (transaction_t *) data;
 	struct buffer_head *bh, *head;
 	struct journal_head *jh;
+	struct super_block *sb = page->mapping->host->i_sb;
 
 	// XXX: any chance of !bh here?
 	bh = head = page_buffers(page);
+	tdbg_ext4(sb, "entry for bh %px, page %px, inode: %px", bh, page, page->mapping->host);
 	do {
 		jh = bh2jh(bh);
 		if (!jh || jh->b_transaction != transaction ||
 		    jh->b_next_transaction) {
 			redirty_page_for_writepage(wbc, page);
+			tdbg_ext4(sb, "redirty for bh %px, jh, %px, txn %px, next_txn %px",
+					bh, jh,
+					jh ? jh->b_transaction : NULL,
+					jh ? jh->b_next_transaction : NULL);
 			goto out;
 		}
 	} while ((bh = bh->b_this_page) != head);
@@ -522,6 +530,7 @@ static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
 		.range_end = dirty_end,
 	};
 
+	tdbg_ext4(jinode->i_vfs_inode->i_sb, "entry for inode: %px", jinode->i_vfs_inode);
 	return write_cache_pages(mapping, &wbc,
 				 ext4_journalled_writepage_callback,
 				 transaction);
@@ -529,6 +538,7 @@ static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
 
 static int ext4_journalled_finish_inode_data_buffers(struct jbd2_inode *jinode)
 {
+	tdbg_ext4(jinode->i_vfs_inode->i_sb, "entry for inode: %px", jinode->i_vfs_inode);
 	return 0;
 }
 
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index b98d227b50d8..96f0d81eadf9 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -29,6 +29,8 @@
 #include
 #include
 
+#include "../ext4/ext4_jbd2_dbg.h"
+
 static journal_t *force_panic;
 
 /*
@@ -222,11 +224,14 @@ static int journal_submit_data_buffers(journal_t *journal,
 	int err, ret = 0;
 	struct address_space *mapping;
 
+	tdbg_jbd2(journal, "entry for transaction: 0x%px\n", commit_transaction);
 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
 		loff_t dirty_start = jinode->i_dirty_start;
 		loff_t dirty_end = jinode->i_dirty_end;
 
+		tdbg_jbd2(journal, "txn list has inode %px (write data flag: 0x%lx)\n", jinode->i_vfs_inode, (jinode->i_flags & JI_WRITE_DATA));
+
 		if (!(jinode->i_flags & JI_WRITE_DATA))
 			continue;
 		mapping = jinode->i_vfs_inode->i_mapping;
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e4944436e733..b86b871ee823 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -48,6 +48,8 @@
 #include
 #include
 
+#include "../ext4/ext4_jbd2_dbg.h"
+
 #ifdef CONFIG_JBD2_DEBUG
 ushort jbd2_journal_enable_debug __read_mostly;
 EXPORT_SYMBOL(jbd2_journal_enable_debug);
@@ -453,6 +455,9 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
 
 	*bh_out = new_bh;
 
+	tdbg_jbd2(transaction->t_journal, "copy out: done/need %d/%d, bh: %px, offset: %04x, page: %px, inode: %px\n",
+		  done_copy_out, need_copy_out, jh2bh(jh_in), new_offset, new_page, new_page->mapping ? new_page->mapping->host : NULL);
+
 	/*
 	 * The to-be-written buffer needs to get moved to the io queue,
 	 * and the original buffer whose contents we are shadowing or
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e91aad3637a2..93a55a228e08 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -30,6 +30,8 @@
 
 #include
 
+#include "../ext4/ext4_jbd2_dbg.h"
+
 static void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh);
 static void __jbd2_journal_unfile_buffer(struct journal_head *jh);
 
@@ -952,6 +954,8 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 repeat:
 	bh = jh2bh(jh);
+	tdbg_jbd2(journal, "entry for bh: %px, offset: %04lx, page %px, inode: %px",
+		  bh, offset_in_page(bh->b_data), bh->b_page, bh->b_page->mapping->host);
 	/* @@@ Need to check for errors here at some point. */
 	start_lock = jiffies;