From patchwork Thu Sep 10 19:31:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1361903 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BnTb83VgTz9sV5 for ; Fri, 11 Sep 2020 05:33:56 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728014AbgIJTdy (ORCPT ); Thu, 10 Sep 2020 15:33:54 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:43006 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727945AbgIJTbh (ORCPT ); Thu, 10 Sep 2020 15:31:37 -0400 Received: from mail-qk1-f200.google.com ([209.85.222.200]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kGSIB-0000ib-37 for linux-ext4@vger.kernel.org; Thu, 10 Sep 2020 19:31:35 +0000 Received: by mail-qk1-f200.google.com with SMTP id s141so4321837qka.13 for ; Thu, 10 Sep 2020 12:31:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=3SZen2y6K+eX+Acff0PC37lH0dwsf+MyxigV9F/BcKg=; b=eh7iZNizR3qu8eqoU96Vbp96ZCY9kWRKrPvtUTyl6hmb2+P2WkWmpWEDX3zcjXEgQ+ 1wMb6DAWF0E5wcV2OqxCqcclCvw3dK43WSzsqLQt+NGxBKg7JOM/TCTeyV4378NbDvdh hg7JD31zq8XIPZDbxnPBEu/HvNuMACZxD1JMNcic7O7rQDb2+olfMstiU31t6IzEFHhY 
EY3U7EUNkVxUmhmwT5lIhRiWICc8jHUkkDAZp4479Gl78uyzey0OpXuWirZvP4zbn5X0 lpHeG9vTltjeLQ/trYLFoMu30cL+VwQBN3tEkH1dW4NPyL79GJFqxGEjLRCiO+JGgihG ZGIw== X-Gm-Message-State: AOAM533oW2YPmhLZp6d7cWYlMZ53EmVObD5GJqOYTcj/+fNKg2COjF8V 4KIqbHGl9f2DSYdITqQQLXJpN/9e3yeH/tlvK1S+Lj7ETMp6AXzYzIWh2cq+FW1NOc0jfgUxbd2 FOh6dUYXto8mZF3XkmDj0tdG/IKWjTRFqbTylm6Y= X-Received: by 2002:ac8:743:: with SMTP id k3mr9878649qth.182.1599766294104; Thu, 10 Sep 2020 12:31:34 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy/VErnBmL5lJo1R9q0rx7mfoORd6dxd7jVrDbB6VHvn5oCmJXu4teKr8Zo77K/YeDe+/3Saw== X-Received: by 2002:ac8:743:: with SMTP id k3mr9878630qth.182.1599766293837; Thu, 10 Sep 2020 12:31:33 -0700 (PDT) Received: from localhost.localdomain ([201.82.49.101]) by smtp.gmail.com with ESMTPSA id u4sm6410391qkk.68.2020.09.10.12.31.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Sep 2020 12:31:33 -0700 (PDT) From: Mauricio Faria de Oliveira To: Jan Kara Cc: linux-ext4@vger.kernel.org, dann frazier , Mauricio Faria de Oliveira Subject: [RFC PATCH v3 1/3] jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers() Date: Thu, 10 Sep 2020 16:31:25 -0300 Message-Id: <20200910193127.276214-2-mfo@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200910193127.276214-1-mfo@canonical.com> References: <20200910193127.276214-1-mfo@canonical.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Export functions that implement the current behavior done for an inode in journal_submit|finish_inode_data_buffers(). No functional change. 
Signed-off-by: Mauricio Faria de Oliveira Suggested-by: Jan Kara Reviewed-by: Jan Kara --- fs/jbd2/commit.c | 32 +++++++++++++++++--------------- fs/jbd2/journal.c | 2 ++ include/linux/jbd2.h | 4 ++++ 3 files changed, 23 insertions(+), 15 deletions(-) diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 6d2da8ad0e6f..c17cda96926e 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -187,9 +187,11 @@ static int journal_wait_on_commit_record(journal_t *journal, * use writepages() because with delayed allocation we may be doing * block allocation in writepages(). */ -static int journal_submit_inode_data_buffers(struct address_space *mapping, - loff_t dirty_start, loff_t dirty_end) +int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) { + struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + loff_t dirty_start = jinode->i_dirty_start; + loff_t dirty_end = jinode->i_dirty_end; int ret; struct writeback_control wbc = { .sync_mode = WB_SYNC_ALL, @@ -215,16 +217,11 @@ static int journal_submit_data_buffers(journal_t *journal, { struct jbd2_inode *jinode; int err, ret = 0; - struct address_space *mapping; spin_lock(&journal->j_list_lock); list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) { - loff_t dirty_start = jinode->i_dirty_start; - loff_t dirty_end = jinode->i_dirty_end; - if (!(jinode->i_flags & JI_WRITE_DATA)) continue; - mapping = jinode->i_vfs_inode->i_mapping; jinode->i_flags |= JI_COMMIT_RUNNING; spin_unlock(&journal->j_list_lock); /* @@ -234,8 +231,7 @@ static int journal_submit_data_buffers(journal_t *journal, * only allocated blocks here. 
*/ trace_jbd2_submit_inode_data(jinode->i_vfs_inode); - err = journal_submit_inode_data_buffers(mapping, dirty_start, - dirty_end); + err = jbd2_journal_submit_inode_data_buffers(jinode); if (!ret) ret = err; spin_lock(&journal->j_list_lock); @@ -248,6 +244,17 @@ static int journal_submit_data_buffers(journal_t *journal, return ret; } +int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode) +{ + struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + loff_t dirty_start = jinode->i_dirty_start; + loff_t dirty_end = jinode->i_dirty_end; + int ret; + + ret = filemap_fdatawait_range_keep_errors(mapping, dirty_start, dirty_end); + return ret; +} + /* * Wait for data submitted for writeout, refile inodes to proper * transaction if needed. @@ -262,16 +269,11 @@ static int journal_finish_inode_data_buffers(journal_t *journal, /* For locking, see the comment in journal_submit_data_buffers() */ spin_lock(&journal->j_list_lock); list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) { - loff_t dirty_start = jinode->i_dirty_start; - loff_t dirty_end = jinode->i_dirty_end; - if (!(jinode->i_flags & JI_WAIT_DATA)) continue; jinode->i_flags |= JI_COMMIT_RUNNING; spin_unlock(&journal->j_list_lock); - err = filemap_fdatawait_range_keep_errors( - jinode->i_vfs_inode->i_mapping, dirty_start, - dirty_end); + err = jbd2_journal_finish_inode_data_buffers(jinode); if (!ret) ret = err; spin_lock(&journal->j_list_lock); diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 17fdc482f554..c0600405e7a2 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -91,6 +91,8 @@ EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers); EXPORT_SYMBOL(jbd2_journal_force_commit); EXPORT_SYMBOL(jbd2_journal_inode_ranged_write); EXPORT_SYMBOL(jbd2_journal_inode_ranged_wait); +EXPORT_SYMBOL(jbd2_journal_submit_inode_data_buffers); +EXPORT_SYMBOL(jbd2_journal_finish_inode_data_buffers); EXPORT_SYMBOL(jbd2_journal_init_jbd_inode); 
EXPORT_SYMBOL(jbd2_journal_release_jbd_inode); EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 08f904943ab2..2865a5475888 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1421,6 +1421,10 @@ extern int jbd2_journal_inode_ranged_write(handle_t *handle, extern int jbd2_journal_inode_ranged_wait(handle_t *handle, struct jbd2_inode *inode, loff_t start_byte, loff_t length); +extern int jbd2_journal_submit_inode_data_buffers( + struct jbd2_inode *jinode); +extern int jbd2_journal_finish_inode_data_buffers( + struct jbd2_inode *jinode); extern int jbd2_journal_begin_ordered_truncate(journal_t *journal, struct jbd2_inode *inode, loff_t new_size); extern void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode); From patchwork Thu Sep 10 19:31:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1361902 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BnTb41w04z9sR4 for ; Fri, 11 Sep 2020 05:33:52 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727992AbgIJTdn (ORCPT ); Thu, 10 Sep 2020 15:33:43 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:43010 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727977AbgIJTbk (ORCPT ); Thu, 10 Sep 2020 15:31:40 -0400 Received: from mail-qv1-f70.google.com ([209.85.219.70]) by 
youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kGSID-0000jI-Li for linux-ext4@vger.kernel.org; Thu, 10 Sep 2020 19:31:37 +0000 Received: by mail-qv1-f70.google.com with SMTP id j5so3916386qvb.16 for ; Thu, 10 Sep 2020 12:31:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=qAGZ3zwaYkfZjWHevep26DeFrC83Znglqf4joWJDHuE=; b=bZpXzbF0KaBUJy4KJakKVFneGCXkkSYrQnbdmkGo6Av8GzJrL9H1BEtL2rHM33b2YY OFl35L3CMmfQNhqxdTxL8uKMIPvjL3dY+WqxJ7udqOWaVBjhAOrkAyU3cxhZ4GLVoRt1 xeIrbo09ZDoqhAMB3Fv9PbNE6U5MqfoPRIDMoW4rc8y+nH1YFE2yMhWH0hsDBgzqZH07 9msYLVvE3a1o07iG30aECjnUIfYf/Xlnk9NsfXnOXJnNUcOoRxxaveGg5kEB1ncwm5AA CGKm/kZv5cbAauB/KT7J2KFu7SWUusaABjDDRe++Ukd69Ynfc4+6cU5DVZvgqq/m90FU ERbA== X-Gm-Message-State: AOAM532bMu12XgzrscqckNdNKiQTy3yUx3JW/FUB+SMksbyGtFzfYzM1 R6oxX20D//O3u5F47A6TA5ztw9JNJUxAVpFIzmP/xn2ZLFsL7HgNql3pMCqcPxkQas9qv67M0VO 9yIpfNtJVI5dhaS3LAvaiWJ3rq3EllAW/fPiDx0A= X-Received: by 2002:a0c:c58d:: with SMTP id a13mr3744217qvj.113.1599766296458; Thu, 10 Sep 2020 12:31:36 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyNQu5/K0I6VC6pQFpz0YonYa7QFmOdaeE+XTXxNmgy3gGh1cPn345LfvS9vbX5hCMwkYzFXw== X-Received: by 2002:a0c:c58d:: with SMTP id a13mr3744196qvj.113.1599766296194; Thu, 10 Sep 2020 12:31:36 -0700 (PDT) Received: from localhost.localdomain ([201.82.49.101]) by smtp.gmail.com with ESMTPSA id u4sm6410391qkk.68.2020.09.10.12.31.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Sep 2020 12:31:35 -0700 (PDT) From: Mauricio Faria de Oliveira To: Jan Kara Cc: linux-ext4@vger.kernel.org, dann frazier , Mauricio Faria de Oliveira Subject: [RFC PATCH v3 2/3] jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers() Date: Thu, 10 Sep 2020 16:31:26 -0300 Message-Id: 
<20200910193127.276214-3-mfo@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200910193127.276214-1-mfo@canonical.com> References: <20200910193127.276214-1-mfo@canonical.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Introduce journal callbacks to allow different behaviors for an inode in journal_submit|finish_inode_data_buffers(). The existing users of the current behavior (ext4, ocfs2) are adapted to use the previously exported functions that implement the current behavior. Users are callers of jbd2_journal_inode_ranged_write|wait(), which adds the inode to the transaction's inode list with the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 in-tree. Signed-off-by: Mauricio Faria de Oliveira Suggested-by: Jan Kara Reviewed-by: Jan Kara --- fs/ext4/super.c | 14 ++++++++++++++ fs/jbd2/commit.c | 30 ++++++++++++++++++------------ fs/ocfs2/super.c | 15 +++++++++++++++ include/linux/jbd2.h | 25 ++++++++++++++++++++++++- 4 files changed, 71 insertions(+), 13 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index ea425b49b345..7303839d7ad9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -472,6 +472,16 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn) spin_unlock(&sbi->s_md_lock); } +static int ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) +{ + return jbd2_journal_submit_inode_data_buffers(jinode); +} + +static int ext4_journal_finish_inode_data_buffers(struct jbd2_inode *jinode) +{ + return jbd2_journal_finish_inode_data_buffers(jinode); +} + static bool system_going_down(void) { return system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF @@ -4646,6 +4656,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) set_task_ioprio(sbi->s_journal->j_task, journal_ioprio); sbi->s_journal->j_commit_callback = ext4_journal_commit_callback; + 
sbi->s_journal->j_submit_inode_data_buffers = + ext4_journal_submit_inode_data_buffers; + sbi->s_journal->j_finish_inode_data_buffers = + ext4_journal_finish_inode_data_buffers; no_journal: if (!test_opt(sb, NO_MBCACHE)) { diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index c17cda96926e..23d3fcc11b97 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -200,6 +200,12 @@ int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) .range_end = dirty_end, }; + /* + * submit the inode data buffers. We use writepage + * instead of writepages. Because writepages can do + * block allocation with delalloc. We need to write + * only allocated blocks here. + */ ret = generic_writepages(mapping, &wbc); return ret; } @@ -224,16 +230,13 @@ static int journal_submit_data_buffers(journal_t *journal, continue; jinode->i_flags |= JI_COMMIT_RUNNING; spin_unlock(&journal->j_list_lock); - /* - * submit the inode data buffers. We use writepage - * instead of writepages. Because writepages can do - * block allocation with delalloc. We need to write - * only allocated blocks here. - */ + /* submit the inode data buffers. */ trace_jbd2_submit_inode_data(jinode->i_vfs_inode); - err = jbd2_journal_submit_inode_data_buffers(jinode); - if (!ret) - ret = err; + if (journal->j_submit_inode_data_buffers) { + err = journal->j_submit_inode_data_buffers(jinode); + if (!ret) + ret = err; + } spin_lock(&journal->j_list_lock); J_ASSERT(jinode->i_transaction == commit_transaction); jinode->i_flags &= ~JI_COMMIT_RUNNING; @@ -273,9 +276,12 @@ static int journal_finish_inode_data_buffers(journal_t *journal, continue; jinode->i_flags |= JI_COMMIT_RUNNING; spin_unlock(&journal->j_list_lock); - err = jbd2_journal_finish_inode_data_buffers(jinode); - if (!ret) - ret = err; + /* wait for the inode data buffers writeout. 
*/ + if (journal->j_finish_inode_data_buffers) { + err = journal->j_finish_inode_data_buffers(jinode); + if (!ret) + ret = err; + } spin_lock(&journal->j_list_lock); jinode->i_flags &= ~JI_COMMIT_RUNNING; smp_mb(); diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 1d91dd1e8711..f4e62aafc89c 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -2010,6 +2010,16 @@ static int ocfs2_journal_addressable(struct ocfs2_super *osb) return status; } +static int ocfs2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) +{ + return jbd2_journal_submit_inode_data_buffers(jinode); +} + +static int ocfs2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode) +{ + return jbd2_journal_finish_inode_data_buffers(jinode); +} + static int ocfs2_initialize_super(struct super_block *sb, struct buffer_head *bh, int sector_size, @@ -2211,6 +2221,11 @@ static int ocfs2_initialize_super(struct super_block *sb, } osb->journal = journal; journal->j_osb = osb; + journal->j_journal->j_submit_inode_data_buffers = + ocfs2_journal_submit_inode_data_buffers; + journal->j_journal->j_finish_inode_data_buffers = + ocfs2_journal_finish_inode_data_buffers; + atomic_set(&journal->j_num_trans, 0); init_rwsem(&journal->j_trans_barrier); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 2865a5475888..4aaa408c0ca7 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -629,7 +629,9 @@ struct transaction_s struct journal_head *t_shadow_list; /* - * List of inodes whose data we've modified in data=ordered mode. + * List of inodes associated with the transaction; e.g., ext4 uses + * this to track inodes in data=ordered and data=journal mode that + * need special handling on transaction commit; also used by ocfs2. 
* [j_list_lock] */ struct list_head t_inode_list; @@ -1111,6 +1113,27 @@ struct journal_s void (*j_commit_callback)(journal_t *, transaction_t *); + /** + * @j_submit_inode_data_buffers: + * + * This function is called for all inodes associated with the + * committing transaction marked with JI_WRITE_DATA flag + * before we start to write out the transaction to the journal. + */ + int (*j_submit_inode_data_buffers) + (struct jbd2_inode *); + + /** + * @j_finish_inode_data_buffers: + * + * This function is called for all inodes associated with the + * committing transaction marked with JI_WAIT_DATA flag + * after we have written the transaction to the journal + * but before we write out the commit block. + */ + int (*j_finish_inode_data_buffers) + (struct jbd2_inode *); + /* * Journal statistics */ From patchwork Thu Sep 10 19:31:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1361901 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BnTZv5VR3z9sTK for ; Fri, 11 Sep 2020 05:33:43 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727815AbgIJTdl (ORCPT ); Thu, 10 Sep 2020 15:33:41 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:43016 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727984AbgIJTbn (ORCPT ); Thu, 10 Sep 2020 15:31:43 -0400 Received: from mail-qv1-f71.google.com ([209.85.219.71]) by 
youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kGSIF-0000jj-VZ for linux-ext4@vger.kernel.org; Thu, 10 Sep 2020 19:31:40 +0000 Received: by mail-qv1-f71.google.com with SMTP id p20so3952061qvl.4 for ; Thu, 10 Sep 2020 12:31:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=G0YwzWk/54GR5D0aP15u02c0dC9pXHYaZ/gVMt6Bajs=; b=qAiDA6qzZh2k2GZsoGYC5o0rdu5odR5mgOS8FV0dj44giuE0oFugvrYi2nCmGHzWoT BkAg19pj8JHDTm20usjxpFOG++iCCpT4q13wKZJklqP7U5m5Ur5XEGr9/ety1lxDw4xa 71z2Kslstw3jilAdvzOPd2r0+G3g1Zod+ATPg68PY7mmNVzY2/VHRDQZWJuKreyumNEo v/dN8/KK8ND9o40HDphbXlnmVMK4CFSxFyn5lVG9J8b2CYnk39Yyn8VcLTc+Vfeox4lt dsZGNzGqgDIAXqTdeOWAeDSsbFsI4yMDRj0HWlyIUY+YsgEgqp+iZlNzdWt6heHF8b6u +t3Q== X-Gm-Message-State: AOAM5328BO2s6LRvyZRGR9ccgMAZTpjCNEVsWD7he5fYNR6orAU6WZaU gU9j/TldmFBR7Cf0vSvRNyLqKPm3omiTPOw1WOw7Bt/mH9iXzqtgYKJ895YeVkvvzLPB8W89TK1 fPJA8nc6ZWGHCKJgubYGDVXGqz/cV99YFZfXgAdI= X-Received: by 2002:ad4:53a8:: with SMTP id j8mr10118812qvv.26.1599766299000; Thu, 10 Sep 2020 12:31:39 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxHgPpvGaGtGfZRsHpdxexUIT7C7s17E87NK/R+hdim871YxH7FgycxfLMP7o77rhCv9WNZ6Q== X-Received: by 2002:ad4:53a8:: with SMTP id j8mr10118793qvv.26.1599766298756; Thu, 10 Sep 2020 12:31:38 -0700 (PDT) Received: from localhost.localdomain ([201.82.49.101]) by smtp.gmail.com with ESMTPSA id u4sm6410391qkk.68.2020.09.10.12.31.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Sep 2020 12:31:38 -0700 (PDT) From: Mauricio Faria de Oliveira To: Jan Kara Cc: linux-ext4@vger.kernel.org, dann frazier , Mauricio Faria de Oliveira Subject: [RFC PATCH v3 3/3] ext4: data=journal: write-protect pages on j_submit_inode_data_buffers() Date: Thu, 10 Sep 2020 16:31:27 -0300 Message-Id: <20200910193127.276214-4-mfo@canonical.com> X-Mailer: 
git-send-email 2.25.1 In-Reply-To: <20200910193127.276214-1-mfo@canonical.com> References: <20200910193127.276214-1-mfo@canonical.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org

This implements journal callbacks j_submit|finish_inode_data_buffers()
with different behavior for data=journal: to write-protect pages under
commit, preventing changes to buffers writeably mapped to userspace.

If a buffer's content changes between commit's checksum calculation
and write-out to disk, it can cause journal recovery/mount failures
upon a kernel crash or power loss.

    [ 27.334874] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!
    [ 27.339492] JBD2: Invalid checksum recovering data block 8705 in log
    [ 27.342716] JBD2: recovery failed
    [ 27.343316] EXT4-fs (loop0): error loading journal
    mount: /ext4: can't read superblock on /dev/loop0.

In j_submit_inode_data_buffers() we write-protect the inode's pages
with write_cache_pages() and redirty them with the writepage callback
if needed.

In j_finish_inode_data_buffers() there is nothing to do.

And in order to use the callbacks, inodes are added to the
transaction's inode list in __ext4_journalled_writepage() and
ext4_page_mkwrite().

In ext4_page_mkwrite() we must make sure that:

1) the inode is always added to the list;
   thus we skip the 'all buffers mapped' optimization on data=journal;

2) the buffers are attached to the transaction as dirty;
   as already done in __ext4_journalled_writepage().
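For illustration only, the per-mode behavior of the two callbacks and the
way the commit loop invokes them can be modeled in user space roughly as
below. This is a sketch, not kernel code: every mock_* type, field and
function is hypothetical, the journal_data flag stands in for
ext4_should_journal_data(), and the submit_err/wait_err fields stand in
for whatever the default jbd2 helpers would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct jbd2_inode. */
struct mock_jinode {
	bool journal_data;    /* models ext4_should_journal_data() */
	int submit_err;       /* result of the default writeout path */
	int wait_err;         /* result of the default wait path */
	bool write_protected; /* set by the data=journal submit path */
};

/* Hypothetical stand-in for journal_t: both callbacks are optional. */
struct mock_journal {
	int (*submit)(struct mock_jinode *);
	int (*finish)(struct mock_jinode *);
};

/* data=journal: only write-protect; otherwise use the default writeout. */
static int mock_submit(struct mock_jinode *ji)
{
	if (ji->journal_data) {
		ji->write_protected = true;
		return 0;
	}
	return ji->submit_err;
}

/* data=journal: nothing to wait for; otherwise use the default wait. */
static int mock_finish(struct mock_jinode *ji)
{
	if (ji->journal_data)
		return 0;
	return ji->wait_err;
}

/* Commit-side loop: skip a missing callback, remember the first error. */
static int mock_commit_submit(struct mock_journal *j,
			      struct mock_jinode *inodes, size_t n)
{
	int err, ret = 0;
	size_t i;

	assert(j != NULL);
	for (i = 0; i < n; i++) {
		if (j->submit) {
			err = j->submit(&inodes[i]);
			if (!ret)
				ret = err;
		}
	}
	return ret;
}

static int mock_commit_finish(struct mock_journal *j,
			      struct mock_jinode *inodes, size_t n)
{
	int err, ret = 0;
	size_t i;

	assert(j != NULL);
	for (i = 0; i < n; i++) {
		if (j->finish) {
			err = j->finish(&inodes[i]);
			if (!ret)
				ret = err;
		}
	}
	return ret;
}
```

The model keeps the property that a journal without callbacks simply
skips the per-inode step, and that only the first error is reported.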
Signed-off-by: Mauricio Faria de Oliveira Suggested-by: Jan Kara Reported-by: Dann Frazier --- fs/ext4/inode.c | 29 ++++++++++++++------ fs/ext4/super.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 91 insertions(+), 10 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index bf596467c234..fa4109da056c 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1910,6 +1910,9 @@ static int __ext4_journalled_writepage(struct page *page, err = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL, write_end_fn); } + if (ret == 0) + ret = err; + err = ext4_jbd2_inode_add_write(handle, inode, 0, len); if (ret == 0) ret = err; EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid; @@ -6004,9 +6007,12 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) len = PAGE_SIZE; /* * Return if we have all the buffers mapped. This avoids the need to do - * journal_start/journal_stop which can block and take a long time + * journal_start/journal_stop which can block and take a long time. + * + * This cannot be done for data journalling, as we have to add the + * inode to the transaction's list to writeprotect pages on commit. 
*/ - if (page_has_buffers(page)) { + if (page_has_buffers(page) && !ext4_should_journal_data(inode)) { if (!ext4_walk_page_buffers(NULL, page_buffers(page), 0, len, NULL, ext4_bh_unmapped)) { @@ -6032,12 +6038,14 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) err = block_page_mkwrite(vma, vmf, get_block); if (!err && ext4_should_journal_data(inode)) { if (ext4_walk_page_buffers(handle, page_buffers(page), 0, - PAGE_SIZE, NULL, do_journal_get_write_access)) { - unlock_page(page); - ret = VM_FAULT_SIGBUS; - ext4_journal_stop(handle); - goto out; - } + PAGE_SIZE, NULL, do_journal_get_write_access)) + goto out_err; + /* Make sure buffers are attached to the transaction as dirty */ + if (ext4_walk_page_buffers(handle, page_buffers(page), 0, + PAGE_SIZE, NULL, write_end_fn)) + goto out_err; + if (ext4_jbd2_inode_add_write(handle, inode, 0, PAGE_SIZE)) + goto out_err; ext4_set_inode_state(inode, EXT4_STATE_JDATA); } ext4_journal_stop(handle); @@ -6049,6 +6057,11 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) up_read(&EXT4_I(inode)->i_mmap_sem); sb_end_pagefault(inode->i_sb); return ret; +out_err: + unlock_page(page); + ret = VM_FAULT_SIGBUS; + ext4_journal_stop(handle); + goto out; } vm_fault_t ext4_filemap_fault(struct vm_fault *vmf) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 7303839d7ad9..528b5e20b71c 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -472,14 +472,82 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn) spin_unlock(&sbi->s_md_lock); } +/* + * This writepage callback for write_cache_pages() + * takes care of a few cases after page cleaning. + * + * write_cache_pages() already checks for dirty pages + * and calls clear_page_dirty_for_io(), which we want, + * to write protect the pages. 
+ * + * However, we have to redirty a page in these cases: + * 1) some buffer is dirty (needs checkpointing) + * 2) some buffer is not part of the committing transaction + * 3) some buffer already has b_next_transaction set + */ + +static int ext4_journalled_writepage_callback(struct page *page, + struct writeback_control *wbc, + void *data) +{ + transaction_t *transaction = (transaction_t *) data; + struct buffer_head *bh, *head; + struct journal_head *jh; + + bh = head = page_buffers(page); + do { + jh = bh2jh(bh); + if (buffer_dirty(bh) || + (jh && (jh->b_transaction != transaction || + jh->b_next_transaction))) { + redirty_page_for_writepage(wbc, page); + goto out; + } + } while ((bh = bh->b_this_page) != head); + +out: + return AOP_WRITEPAGE_ACTIVATE; +} + +static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode) +{ + struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + transaction_t *transaction = jinode->i_transaction; + loff_t dirty_start = jinode->i_dirty_start; + loff_t dirty_end = jinode->i_dirty_end; + + struct writeback_control wbc = { + .sync_mode = WB_SYNC_ALL, + .nr_to_write = ~0ULL, + .range_start = dirty_start, + .range_end = dirty_end, + }; + + return write_cache_pages(mapping, &wbc, + ext4_journalled_writepage_callback, + transaction); +} + static int ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) { - return jbd2_journal_submit_inode_data_buffers(jinode); + int ret; + + if (ext4_should_journal_data(jinode->i_vfs_inode)) + ret = ext4_journalled_submit_inode_data_buffers(jinode); + else + ret = jbd2_journal_submit_inode_data_buffers(jinode); + + return ret; } static int ext4_journal_finish_inode_data_buffers(struct jbd2_inode *jinode) { - return jbd2_journal_finish_inode_data_buffers(jinode); + int ret = 0; + + if (!ext4_should_journal_data(jinode->i_vfs_inode)) + ret = jbd2_journal_finish_inode_data_buffers(jinode); + + return ret; } static bool system_going_down(void)
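
Note (illustration, not part of the patch): the redirty decision made by
ext4_journalled_writepage_callback() above can be modeled in user space
roughly as follows. This is a sketch under stated assumptions: the mock_*
structures are hypothetical stand-ins for buffer_head, journal_head and
transaction_t, and only the fields the check reads are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for transaction_t. */
struct mock_txn {
	int tid;
};

/* Hypothetical stand-in for struct journal_head. */
struct mock_jh {
	struct mock_txn *b_transaction;
	struct mock_txn *b_next_transaction;
};

/* Hypothetical stand-in for struct buffer_head. */
struct mock_bh {
	bool dirty;                  /* models buffer_dirty() */
	struct mock_jh *jh;          /* NULL if no journal head is attached */
	struct mock_bh *b_this_page; /* circular list of the page's buffers */
};

/*
 * Return true if the page must stay dirty, i.e. if any buffer:
 * 1) is dirty (needs checkpointing),
 * 2) is not part of the committing transaction, or
 * 3) already has b_next_transaction set.
 */
static bool mock_page_needs_redirty(struct mock_bh *head,
				    const struct mock_txn *committing)
{
	struct mock_bh *bh = head;

	assert(head != NULL);
	do {
		struct mock_jh *jh = bh->jh;

		if (bh->dirty ||
		    (jh && (jh->b_transaction != committing ||
			    jh->b_next_transaction)))
			return true;
	} while ((bh = bh->b_this_page) != head);

	return false;
}
```

A buffer without a journal head only triggers a redirty through its
dirty bit, which mirrors the "jh &&" guard in the callback.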