From patchwork Mon Sep 28 19:41:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1372869 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4C0XvH2jdLz9sSf for ; Tue, 29 Sep 2020 05:41:15 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726566AbgI1TlP (ORCPT ); Mon, 28 Sep 2020 15:41:15 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:38907 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726465AbgI1TlO (ORCPT ); Mon, 28 Sep 2020 15:41:14 -0400 Received: from mail-qk1-f200.google.com ([209.85.222.200]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kMz1L-00029M-G3 for linux-ext4@vger.kernel.org; Mon, 28 Sep 2020 19:41:11 +0000 Received: by mail-qk1-f200.google.com with SMTP id w64so1278709qkc.14 for ; Mon, 28 Sep 2020 12:41:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=VRC6uFJa1v8Fej+10IywOo/zW3EV6qwHE1HatZ5a/9o=; b=D4OkumUD1/8lTFVcDLqHa4kCa44iz2RfZqubfARXaTRv9Fr10RGY3TvGybNg0FIYO+ c+ovOQKl79puu+7GlFx1KYtyT0zv6PYy4F6y3D5b9bpxNkQ/hw0GtYwwbdA41HPFE7LB S4zuBndufqEYJtsvlBHSGTivDXoH++NuNpq1L9qZpy5pDOSDlvCmax9FQ4PejpEl4Sk1 
vm61ozt/pCCjTiDNGW1tKSA0SejUOzHe8y1Z0qFsg24us9ywPdO7Z+NTmfqFQXSmKfns opCyCDjrmrwJw+MldyhH0ZbVrLKsTzGJdC9w+JvpD5sA++u18ezTEop4UCG96ujnmtXn 8CDA== X-Gm-Message-State: AOAM531Rl3zxvIMLDwh+6QIuHc7RNhK9RmvaUjqrenpw75AgPaR+4yhZ 8r2zpjrEMWXEoZPkO8WJBM5L24/5piz8yp8FG0W8gzjpnX0rrT6c+eHkKi1HNctQduSlPNdX7gB EuqTj4hDwIWhxW7dY9z3b/mfdNnaYFQjURIbXljw= X-Received: by 2002:a05:620a:15f6:: with SMTP id p22mr1107349qkm.198.1601322070377; Mon, 28 Sep 2020 12:41:10 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyUkO8HAXQPZJp4JldR/iFSvoh9ZAMJmIHkknsv/GFMjziGcuK/gfbz13OZoS/gjSefOxqKfw== X-Received: by 2002:a05:620a:15f6:: with SMTP id p22mr1107317qkm.198.1601322070108; Mon, 28 Sep 2020 12:41:10 -0700 (PDT) Received: from localhost.localdomain ([201.82.49.101]) by smtp.gmail.com with ESMTPSA id u15sm2360222qtj.3.2020.09.28.12.41.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Sep 2020 12:41:09 -0700 (PDT) From: Mauricio Faria de Oliveira To: Jan Kara Cc: linux-ext4@vger.kernel.org, dann frazier Subject: [RFC PATCH v4 1/4] jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers() Date: Mon, 28 Sep 2020 16:41:00 -0300 Message-Id: <20200928194103.244692-2-mfo@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200928194103.244692-1-mfo@canonical.com> References: <20200928194103.244692-1-mfo@canonical.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Export functions that implement the current behavior done for an inode in journal_submit|finish_inode_data_buffers(). No functional change. 
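
For readers outside the jbd2 code, the shape of this refactoring can be
sketched in userspace C. Everything below (struct jinode_model,
submit_one(), submit_all()) is a hypothetical stand-in, not kernel code.
It only assumes what the diff shows: the per-inode helper takes the inode
and derives the dirty range itself, and the caller loop keeps the first
error it sees (the "if (!ret) ret = err;" pattern).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct jbd2_inode, reduced to what the sketch needs. */
struct jinode_model {
	long long dirty_start;	/* models i_dirty_start */
	long long dirty_end;	/* models i_dirty_end */
	int io_error;		/* error this model inode reports, 0 for success */
	int submitted;		/* set once the helper has run for this inode */
};

/*
 * Per-inode helper: takes only the inode and reads the range from it,
 * instead of having the caller pass (mapping, dirty_start, dirty_end).
 */
int submit_one(struct jinode_model *jinode)
{
	assert(jinode != NULL);
	assert(jinode->dirty_start <= jinode->dirty_end);
	jinode->submitted = 1;
	return jinode->io_error;
}

/* Caller loop: every inode is visited, and only the first error is kept. */
int submit_all(struct jinode_model *inodes, size_t count)
{
	int err, ret = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		err = submit_one(&inodes[i]);
		if (!ret)
			ret = err;
	}
	return ret;
}
```

With three model inodes reporting 0, -5 and -28, submit_all() still visits
all three and returns -5, the first error.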
Signed-off-by: Mauricio Faria de Oliveira Suggested-by: Jan Kara Reviewed-by: Jan Kara Reviewed-by: Andreas Dilger --- fs/jbd2/commit.c | 32 +++++++++++++++++--------------- fs/jbd2/journal.c | 2 ++ include/linux/jbd2.h | 4 ++++ 3 files changed, 23 insertions(+), 15 deletions(-) diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 6d2da8ad0e6f..c17cda96926e 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -187,9 +187,11 @@ static int journal_wait_on_commit_record(journal_t *journal, * use writepages() because with delayed allocation we may be doing * block allocation in writepages(). */ -static int journal_submit_inode_data_buffers(struct address_space *mapping, - loff_t dirty_start, loff_t dirty_end) +int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) { + struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + loff_t dirty_start = jinode->i_dirty_start; + loff_t dirty_end = jinode->i_dirty_end; int ret; struct writeback_control wbc = { .sync_mode = WB_SYNC_ALL, @@ -215,16 +217,11 @@ static int journal_submit_data_buffers(journal_t *journal, { struct jbd2_inode *jinode; int err, ret = 0; - struct address_space *mapping; spin_lock(&journal->j_list_lock); list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) { - loff_t dirty_start = jinode->i_dirty_start; - loff_t dirty_end = jinode->i_dirty_end; - if (!(jinode->i_flags & JI_WRITE_DATA)) continue; - mapping = jinode->i_vfs_inode->i_mapping; jinode->i_flags |= JI_COMMIT_RUNNING; spin_unlock(&journal->j_list_lock); /* @@ -234,8 +231,7 @@ static int journal_submit_data_buffers(journal_t *journal, * only allocated blocks here. 
*/ trace_jbd2_submit_inode_data(jinode->i_vfs_inode); - err = journal_submit_inode_data_buffers(mapping, dirty_start, - dirty_end); + err = jbd2_journal_submit_inode_data_buffers(jinode); if (!ret) ret = err; spin_lock(&journal->j_list_lock); @@ -248,6 +244,17 @@ static int journal_submit_data_buffers(journal_t *journal, return ret; } +int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode) +{ + struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + loff_t dirty_start = jinode->i_dirty_start; + loff_t dirty_end = jinode->i_dirty_end; + int ret; + + ret = filemap_fdatawait_range_keep_errors(mapping, dirty_start, dirty_end); + return ret; +} + /* * Wait for data submitted for writeout, refile inodes to proper * transaction if needed. @@ -262,16 +269,11 @@ static int journal_finish_inode_data_buffers(journal_t *journal, /* For locking, see the comment in journal_submit_data_buffers() */ spin_lock(&journal->j_list_lock); list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) { - loff_t dirty_start = jinode->i_dirty_start; - loff_t dirty_end = jinode->i_dirty_end; - if (!(jinode->i_flags & JI_WAIT_DATA)) continue; jinode->i_flags |= JI_COMMIT_RUNNING; spin_unlock(&journal->j_list_lock); - err = filemap_fdatawait_range_keep_errors( - jinode->i_vfs_inode->i_mapping, dirty_start, - dirty_end); + err = jbd2_journal_finish_inode_data_buffers(jinode); if (!ret) ret = err; spin_lock(&journal->j_list_lock); diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 17fdc482f554..c0600405e7a2 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -91,6 +91,8 @@ EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers); EXPORT_SYMBOL(jbd2_journal_force_commit); EXPORT_SYMBOL(jbd2_journal_inode_ranged_write); EXPORT_SYMBOL(jbd2_journal_inode_ranged_wait); +EXPORT_SYMBOL(jbd2_journal_submit_inode_data_buffers); +EXPORT_SYMBOL(jbd2_journal_finish_inode_data_buffers); EXPORT_SYMBOL(jbd2_journal_init_jbd_inode); 
EXPORT_SYMBOL(jbd2_journal_release_jbd_inode); EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 08f904943ab2..2865a5475888 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1421,6 +1421,10 @@ extern int jbd2_journal_inode_ranged_write(handle_t *handle, extern int jbd2_journal_inode_ranged_wait(handle_t *handle, struct jbd2_inode *inode, loff_t start_byte, loff_t length); +extern int jbd2_journal_submit_inode_data_buffers( + struct jbd2_inode *jinode); +extern int jbd2_journal_finish_inode_data_buffers( + struct jbd2_inode *jinode); extern int jbd2_journal_begin_ordered_truncate(journal_t *journal, struct jbd2_inode *inode, loff_t new_size); extern void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode); From patchwork Mon Sep 28 19:41:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1372870 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4C0XvK1v7Zz9s1t for ; Tue, 29 Sep 2020 05:41:17 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726607AbgI1TlR (ORCPT ); Mon, 28 Sep 2020 15:41:17 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:38911 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726328AbgI1TlQ (ORCPT ); Mon, 28 Sep 2020 15:41:16 -0400 Received: from mail-qk1-f197.google.com ([209.85.222.197]) by 
youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kMz1N-00029r-JS for linux-ext4@vger.kernel.org; Mon, 28 Sep 2020 19:41:13 +0000 Received: by mail-qk1-f197.google.com with SMTP id 125so1291222qkh.4 for ; Mon, 28 Sep 2020 12:41:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=dsydCJbpZJgqvBtcGppNpwTpopS5m5EZ8lb0YworUAo=; b=ktUFQPKK4IYzKMXbzH8JyM+p6nwE+EV6H0wVINzoRaiEepyzWdh2NFmeNnB7t3r8V7 t2zNaFnDfcAAlWlK9bTl/rVxMmG2Xb9XI9xh3NFxpBqwQ4yzqs4i5G7vcNYjMnXuPGex tQRjK6qtzr87SEe6Ew7RfRJQSmSwtF+IpZ4KRcfR0cVTkKKHMtjDawRwhqV0n7mkAvCa K7zQVoaAhlbwYkQpZUnAce0H0R8GDzIj+m/8zvLBWO26QaQnQoR6Dvhx5vvLeR9AFEYu 1d2q1wkypZgAgNzfZdS3S7KR3BL/Ot+gGXrxP9hGaEfz9vDuv78TeCQtKsZCdqsk6yv9 ybcQ== X-Gm-Message-State: AOAM530cFrGfo1h0okLrj5+8BNFHDdT8wF8HXBeWz4GVmQ7YDSaqYUFN eOPRnIPEgps9mmIm730TcNjnAA+p2lTWROj1fDfK8Gk0GTIYslBHtZ+bixwdwKIzDhPHCziA1lI NFa9aW2SWsCxgUFXxmR1jo/Pl06381b1EgFgEC2Q= X-Received: by 2002:ae9:ed91:: with SMTP id c139mr1081006qkg.7.1601322072484; Mon, 28 Sep 2020 12:41:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxcfLi6jhUPURp8r0Tp4lb41+Zeuc3kAsh73J///xcd1sHBns132iNahHxfkZr+jgR1XQGumw== X-Received: by 2002:ae9:ed91:: with SMTP id c139mr1080980qkg.7.1601322072140; Mon, 28 Sep 2020 12:41:12 -0700 (PDT) Received: from localhost.localdomain ([201.82.49.101]) by smtp.gmail.com with ESMTPSA id u15sm2360222qtj.3.2020.09.28.12.41.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Sep 2020 12:41:11 -0700 (PDT) From: Mauricio Faria de Oliveira To: Jan Kara Cc: linux-ext4@vger.kernel.org, dann frazier Subject: [RFC PATCH v4 2/4] jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers() Date: Mon, 28 Sep 2020 16:41:01 -0300 Message-Id: <20200928194103.244692-3-mfo@canonical.com> X-Mailer: git-send-email 
2.25.1
In-Reply-To: <20200928194103.244692-1-mfo@canonical.com>
References: <20200928194103.244692-1-mfo@canonical.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

Introduce journal callbacks to allow different behaviors
for an inode in journal_submit|finish_inode_data_buffers().

The existing users of the current behavior (ext4, ocfs2)
are adapted to use the previously exported functions
that implement the current behavior.

Users are callers of jbd2_journal_inode_ranged_write|wait(),
which adds the inode to the transaction's inode list with
the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 do so in-tree.

Both CONFIG_EXT4_FS and CONFIG_OCFS2_FS select CONFIG_JBD2,
which builds fs/jbd2/commit.c and journal.c that define and
export the functions, so we can call them directly in ext4/ocfs2.

Signed-off-by: Mauricio Faria de Oliveira
Suggested-by: Jan Kara
Reviewed-by: Jan Kara
Reviewed-by: Andreas Dilger
---
 fs/ext4/super.c | 4 ++++
 fs/jbd2/commit.c | 30 ++++++++++++++++++------------
 fs/ocfs2/super.c | 5 +++++
 include/linux/jbd2.h | 25 ++++++++++++++++++++++++-
 4 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c index ea425b49b345..a14c1ed39aa3 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4646,6 +4646,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) set_task_ioprio(sbi->s_journal->j_task, journal_ioprio); sbi->s_journal->j_commit_callback = ext4_journal_commit_callback; + sbi->s_journal->j_submit_inode_data_buffers = + jbd2_journal_submit_inode_data_buffers; + sbi->s_journal->j_finish_inode_data_buffers = + jbd2_journal_finish_inode_data_buffers; no_journal: if (!test_opt(sb, NO_MBCACHE)) { diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index c17cda96926e..23d3fcc11b97 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -200,6 +200,12 @@ int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) .range_end = dirty_end, }; + /* + * 
submit the inode data buffers. We use writepage + * instead of writepages. Because writepages can do + * block allocation with delalloc. We need to write + * only allocated blocks here. + */ ret = generic_writepages(mapping, &wbc); return ret; } @@ -224,16 +230,13 @@ static int journal_submit_data_buffers(journal_t *journal, continue; jinode->i_flags |= JI_COMMIT_RUNNING; spin_unlock(&journal->j_list_lock); - /* - * submit the inode data buffers. We use writepage - * instead of writepages. Because writepages can do - * block allocation with delalloc. We need to write - * only allocated blocks here. - */ + /* submit the inode data buffers. */ trace_jbd2_submit_inode_data(jinode->i_vfs_inode); - err = jbd2_journal_submit_inode_data_buffers(jinode); - if (!ret) - ret = err; + if (journal->j_submit_inode_data_buffers) { + err = journal->j_submit_inode_data_buffers(jinode); + if (!ret) + ret = err; + } spin_lock(&journal->j_list_lock); J_ASSERT(jinode->i_transaction == commit_transaction); jinode->i_flags &= ~JI_COMMIT_RUNNING; @@ -273,9 +276,12 @@ static int journal_finish_inode_data_buffers(journal_t *journal, continue; jinode->i_flags |= JI_COMMIT_RUNNING; spin_unlock(&journal->j_list_lock); - err = jbd2_journal_finish_inode_data_buffers(jinode); - if (!ret) - ret = err; + /* wait for the inode data buffers writeout. 
*/ + if (journal->j_finish_inode_data_buffers) { + err = journal->j_finish_inode_data_buffers(jinode); + if (!ret) + ret = err; + } spin_lock(&journal->j_list_lock); jinode->i_flags &= ~JI_COMMIT_RUNNING; smp_mb(); diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 1d91dd1e8711..560f13d4e2aa 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -2211,6 +2211,11 @@ static int ocfs2_initialize_super(struct super_block *sb, } osb->journal = journal; journal->j_osb = osb; + journal->j_journal->j_submit_inode_data_buffers = + jbd2_journal_submit_inode_data_buffers; + journal->j_journal->j_finish_inode_data_buffers = + jbd2_journal_finish_inode_data_buffers; + atomic_set(&journal->j_num_trans, 0); init_rwsem(&journal->j_trans_barrier); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 2865a5475888..4aaa408c0ca7 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -629,7 +629,9 @@ struct transaction_s struct journal_head *t_shadow_list; /* - * List of inodes whose data we've modified in data=ordered mode. + * List of inodes associated with the transaction; e.g., ext4 uses + * this to track inodes in data=ordered and data=journal mode that + * need special handling on transaction commit; also used by ocfs2. * [j_list_lock] */ struct list_head t_inode_list; @@ -1111,6 +1113,27 @@ struct journal_s void (*j_commit_callback)(journal_t *, transaction_t *); + /** + * @j_submit_inode_data_buffers: + * + * This function is called for all inodes associated with the + * committing transaction marked with JI_WRITE_DATA flag + * before we start to write out the transaction to the journal. + */ + int (*j_submit_inode_data_buffers) + (struct jbd2_inode *); + + /** + * @j_finish_inode_data_buffers: + * + * This function is called for all inodes associated with the + * committing transaction marked with JI_WAIT_DATA flag + * after we have written the transaction to the journal + * but before we write out the commit block. 
+ */ + int (*j_finish_inode_data_buffers) + (struct jbd2_inode *); + /* * Journal statistics */ From patchwork Mon Sep 28 19:41:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1372871 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4C0XvN05NXz9s1t for ; Tue, 29 Sep 2020 05:41:20 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726656AbgI1TlT (ORCPT ); Mon, 28 Sep 2020 15:41:19 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:38913 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726465AbgI1TlT (ORCPT ); Mon, 28 Sep 2020 15:41:19 -0400 Received: from mail-qt1-f198.google.com ([209.85.160.198]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kMz1Q-0002AS-2i for linux-ext4@vger.kernel.org; Mon, 28 Sep 2020 19:41:16 +0000 Received: by mail-qt1-f198.google.com with SMTP id h31so1406675qtd.14 for ; Mon, 28 Sep 2020 12:41:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WCau5di699grODyFgnJe+SR4ba0hdB+A8OoiSVqP36Y=; b=bPU863MScTZzL7bF+ZkZbAB6wyYY8zdfKZwKs+58EiNREVBPPSbuL0nOe+9CGgmFae ZR/A/X6XwyRugZuDRoppJ15PZeFgNgEv8Ivxou5+lcrpQkxy9mm+FVZr2/Wm+c6eZQf3 
kvo0EDw8/KFKBbRpj3WaXc17eagMS2iCtFWpFiyFGSwGa/YItDXXcCuIFgqAV9kYxqJ3 41NFMfGr6DrxlOuBiyZMx6MjbKArepsgBHuhkw/zOvHjMmUiu+PPE6hR/+zAPKFcWEMB krsH+9N+yaoIYOThPzEIKFfUGu6fX6no4v3pABcdv/queyFU396V30jc5G+C50HyfNuQ O1uA== X-Gm-Message-State: AOAM5313sDe4M5J+eW7WcelySKMjhebh+fX73z64lRldBcoecSaeVvIU Te7mewH5N5Jztyk2HsTAA/v9MCVuak2OC6I7U4X0JJAVonmTpHWg7F3ZiuPBfEnNNpuQ/vHdDW8 NXDqUy2nLzBDUhzdcdMJngq+OcA5C6sJMSDZtYvc= X-Received: by 2002:ac8:4e49:: with SMTP id e9mr3318991qtw.167.1601322075052; Mon, 28 Sep 2020 12:41:15 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzjm1+NxWMR7TE19FtY2Iwy5UDlIFICf81ckydbVpoTs03//KHJ9ugCmfSS5MThiLIC37UCbg== X-Received: by 2002:ac8:4e49:: with SMTP id e9mr3318973qtw.167.1601322074779; Mon, 28 Sep 2020 12:41:14 -0700 (PDT) Received: from localhost.localdomain ([201.82.49.101]) by smtp.gmail.com with ESMTPSA id u15sm2360222qtj.3.2020.09.28.12.41.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Sep 2020 12:41:13 -0700 (PDT) From: Mauricio Faria de Oliveira To: Jan Kara Cc: linux-ext4@vger.kernel.org, dann frazier Subject: [RFC PATCH v4 3/4] ext4: data=journal: fixes for ext4_page_mkwrite() Date: Mon, 28 Sep 2020 16:41:02 -0300 Message-Id: <20200928194103.244692-4-mfo@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200928194103.244692-1-mfo@canonical.com> References: <20200928194103.244692-1-mfo@canonical.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org These are two fixes for data journalling required by the next patch, discovered while testing it. First, the optimization to return early if all buffers are mapped is not appropriate for the next patch: The inode _must_ be added to the transaction's list in data=journal mode (so to write-protect pages on commit) thus we cannot return early there. 
Second, once that optimization to reduce transactions was disabled
for data=journal mode, more transactions happened, and occasionally
hit this warning message: 'JBD2: Spotted dirty metadata buffer'.

The reason is that block_page_mkwrite() will set_buffer_dirty() before
do_journal_get_write_access(), which is there to prevent it. This
issue was masked by the optimization.

So, on data=journal use __block_write_begin() instead.
This also requires page locking and len recalculation
(see block_page_mkwrite() for implementation details).

Finally, as Jan noted, there is little sharing between data=journal
and other modes in ext4_page_mkwrite().

However, a prototype of ext4_journalled_page_mkwrite() showed there
would still be lots of duplicated lines (tens of them), so a separate
function didn't seem worth it.

Thus this patch ends up with an ugly goto in the beginning to skip all
non-data journalling code (to avoid long indentations, but that can be
changed), and just a conditional in the transaction section.

This does skip one part that is common to data journalling, the page
truncated check, but we do it again after ext4_journal_start() when we
re-acquire the page lock (so as not to acquire the page lock twice
needlessly for data journalling).

Signed-off-by: Mauricio Faria de Oliveira
Suggested-by: Jan Kara
Reviewed-by: Andreas Dilger
Reviewed-by: Jan Kara
---
 fs/ext4/inode.c | 51 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index bf596467c234..ac153e340a6f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5977,9 +5977,17 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) if (err) goto out_ret; + /* + * On data journalling we skip straight to the transaction handle: + * there's no delalloc; page truncated will be checked later; the + * early return w/ all buffers mapped (calculates size/len) can't + * be used; and there's no dioread_nolock, so only ext4_get_block. 
+ */ + if (ext4_should_journal_data(inode)) + goto retry_alloc; + /* Delalloc case is easy... */ if (test_opt(inode->i_sb, DELALLOC) && - !ext4_should_journal_data(inode) && !ext4_nonda_switch(inode->i_sb)) { do { err = block_page_mkwrite(vma, vmf, @@ -6005,6 +6013,9 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) /* * Return if we have all the buffers mapped. This avoids the need to do * journal_start/journal_stop which can block and take a long time + * + * This cannot be done for data journalling, as we have to add the + * inode to the transaction's list to writeprotect pages on commit. */ if (page_has_buffers(page)) { if (!ext4_walk_page_buffers(NULL, page_buffers(page), @@ -6029,16 +6040,42 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) ret = VM_FAULT_SIGBUS; goto out; } - err = block_page_mkwrite(vma, vmf, get_block); - if (!err && ext4_should_journal_data(inode)) { - if (ext4_walk_page_buffers(handle, page_buffers(page), 0, - PAGE_SIZE, NULL, do_journal_get_write_access)) { + /* + * Data journalling can't use block_page_mkwrite() because it + * will set_buffer_dirty() before do_journal_get_write_access() + * thus might hit warning messages for dirty metadata buffers. + */ + if (!ext4_should_journal_data(inode)) { + err = block_page_mkwrite(vma, vmf, get_block); + } else { + lock_page(page); + size = i_size_read(inode); + /* Page got truncated from under us? 
*/ + if (page->mapping != mapping || page_offset(page) > size) { unlock_page(page); - ret = VM_FAULT_SIGBUS; + ret = VM_FAULT_NOPAGE; ext4_journal_stop(handle); goto out; } - ext4_set_inode_state(inode, EXT4_STATE_JDATA); + + if (page->index == size >> PAGE_SHIFT) + len = size & ~PAGE_MASK; + else + len = PAGE_SIZE; + + err = __block_write_begin(page, 0, len, ext4_get_block); + if (!err) { + if (ext4_walk_page_buffers(handle, page_buffers(page), + 0, len, NULL, do_journal_get_write_access)) { + unlock_page(page); + ret = VM_FAULT_SIGBUS; + ext4_journal_stop(handle); + goto out; + } + ext4_set_inode_state(inode, EXT4_STATE_JDATA); + } else { + unlock_page(page); + } } ext4_journal_stop(handle); if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) From patchwork Mon Sep 28 19:41:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1372872 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4C0XvP6B1Sz9sSf for ; Tue, 29 Sep 2020 05:41:21 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726465AbgI1TlV (ORCPT ); Mon, 28 Sep 2020 15:41:21 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:38918 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726328AbgI1TlV (ORCPT ); Mon, 28 Sep 2020 15:41:21 -0400 Received: from mail-qt1-f198.google.com ([209.85.160.198]) by youngberry.canonical.com with esmtps 
(TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kMz1S-0002Ar-6Q for linux-ext4@vger.kernel.org; Mon, 28 Sep 2020 19:41:18 +0000 Received: by mail-qt1-f198.google.com with SMTP id g10so1420467qto.1 for ; Mon, 28 Sep 2020 12:41:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=sD1Mzs+Ed6ovaSd7Z54agbSCqdaGZf7p4sJFJdVpKtk=; b=CJ0Js6Zfi6VHlmsR6u9kaFiRcMwTV5nIPWsKp0IbvkbGNYdAOet7jAIjIOlwme9vvw eo/aLUuhI6qrkTiv7sCaWe1neGjqQBBjkkjlcPw6QkDI9mm6kB/K1pLrUz4mk6BB3qnc TCsO0Is2PjNEjM2USf1TvUYMtVnkFDkNWFOnJQv8w04yP2gJniPwh26ZwoIcvgFEF0ND 6MP6PgParQ72AQsB3R15hc6ol+xFmmjFK6/Ft/f/0io0obr45//+Rgj/7E1HD719DRhC /+TwaYGfsBvMV7BuJ8cgquByw6viC982E6oUjSCMRzXXQWoIQXTDOQw1MwC2hA0Mgxa/ fFxg== X-Gm-Message-State: AOAM533Spqb7PUz7SwRHSGSOJamKnIUUD1NkZ6r/JyaIIJspJL8U99sp 9LbVJD5A8KabrC+LC0S75vZ9qGny2uz3WpiMbJoX7p2LBV5GAuIvwJuI0QiotYb/D/MerhB95f7 Ddhhl5un1qX4uqrfxbX4D/SeIJ2MFaBDakvDo+T8= X-Received: by 2002:ac8:17af:: with SMTP id o44mr3265034qtj.343.1601322077182; Mon, 28 Sep 2020 12:41:17 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyoJBFbYmnghe+XHCDa7wTgckRtn+6JT51FBPC5LIf1PX9YFgHk1nugoWmnkGLf65VH/Ram4A== X-Received: by 2002:ac8:17af:: with SMTP id o44mr3265011qtj.343.1601322076904; Mon, 28 Sep 2020 12:41:16 -0700 (PDT) Received: from localhost.localdomain ([201.82.49.101]) by smtp.gmail.com with ESMTPSA id u15sm2360222qtj.3.2020.09.28.12.41.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Sep 2020 12:41:16 -0700 (PDT) From: Mauricio Faria de Oliveira To: Jan Kara Cc: linux-ext4@vger.kernel.org, dann frazier Subject: [RFC PATCH v4 4/4] ext4: data=journal: write-protect pages on j_submit_inode_data_buffers() Date: Mon, 28 Sep 2020 16:41:03 -0300 Message-Id: <20200928194103.244692-5-mfo@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: 
<20200928194103.244692-1-mfo@canonical.com>
References: <20200928194103.244692-1-mfo@canonical.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

This implements journal callbacks j_submit|finish_inode_data_buffers()
with different behavior for data=journal: to write-protect pages under
commit, preventing changes to buffers writeably mapped to userspace.

If a buffer's content changes between commit's checksum calculation
and write-out to disk, it can cause journal recovery/mount failures
upon a kernel crash or power loss.

    [ 27.334874] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support!
    [ 27.339492] JBD2: Invalid checksum recovering data block 8705 in log
    [ 27.342716] JBD2: recovery failed
    [ 27.343316] EXT4-fs (loop0): error loading journal
    mount: /ext4: can't read superblock on /dev/loop0.

In j_submit_inode_data_buffers() we write-protect the inode's pages
with write_cache_pages() and redirty them with the writepage callback
if needed.

In j_finish_inode_data_buffers() there is nothing to do.

And in order to use the callbacks, inodes are added to the transaction's
inode list in __ext4_journalled_writepage() and ext4_page_mkwrite().

In ext4_page_mkwrite() we must make sure that the buffers are attached
to the transaction as jbddirty with write_end_fn(), as already done in
__ext4_journalled_writepage().
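
The failure mode described above can be modelled in userspace. The sketch
below is not jbd2 code and makes several assumptions: the checksum is a toy
multiplicative sum (jbd2 uses crc32c), and journal_block_model,
commit_record_checksum(), commit_write_out(), recovery_accepts() and
commit_with_racing_store() are hypothetical names. Write protection is
modelled simply as "no store lands inside the window". It shows that a
store between the checksum step and the write-out step leaves a recorded
checksum that no longer matches the data recovery reads back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Toy checksum, for illustration only (jbd2 uses crc32c). */
uint32_t toy_checksum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = sum * 31u + buf[i];
	return sum;
}

/* Hypothetical model of one data block as it sits in the journal. */
struct journal_block_model {
	uint32_t recorded_csum;
	unsigned char data[BLOCK_SIZE];
};

/* Step 1 of the model commit: checksum the page as it looks right now. */
void commit_record_checksum(struct journal_block_model *jb,
			    const unsigned char *page)
{
	jb->recorded_csum = toy_checksum(page, BLOCK_SIZE);
}

/* Step 2 of the model commit: copy the page content out to the journal. */
void commit_write_out(struct journal_block_model *jb,
		      const unsigned char *page)
{
	memcpy(jb->data, page, BLOCK_SIZE);
}

/* Model recovery: accept the block only if data and checksum agree. */
int recovery_accepts(const struct journal_block_model *jb)
{
	return toy_checksum(jb->data, BLOCK_SIZE) == jb->recorded_csum;
}

/*
 * Run one model commit. Without write protection, a store through a
 * writable mapping lands between the two commit steps.
 */
int commit_with_racing_store(int write_protected)
{
	unsigned char page[BLOCK_SIZE] = "journalled data";
	struct journal_block_model jb;

	commit_record_checksum(&jb, page);
	if (!write_protected)
		page[0] ^= 0xff;	/* the racing userspace store */
	commit_write_out(&jb, page);
	return recovery_accepts(&jb);
}
```

In this model, commit_with_racing_store(1) is accepted by recovery, while
commit_with_racing_store(0) is rejected, which corresponds to the
"Invalid checksum recovering data block" message in the log above.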
Signed-off-by: Mauricio Faria de Oliveira Reported-by: Dann Frazier Reported-by: kernel test robot # wbc.nr_to_write Suggested-by: Jan Kara Reviewed-by: Jan Kara --- fs/ext4/inode.c | 25 +++++++++------ fs/ext4/super.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 96 insertions(+), 11 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index ac153e340a6f..af5de62c1214 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1910,6 +1910,9 @@ static int __ext4_journalled_writepage(struct page *page, err = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL, write_end_fn); } + if (ret == 0) + ret = err; + err = ext4_jbd2_inode_add_write(handle, inode, 0, len); if (ret == 0) ret = err; EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid; @@ -6052,10 +6055,8 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) size = i_size_read(inode); /* Page got truncated from under us? */ if (page->mapping != mapping || page_offset(page) > size) { - unlock_page(page); ret = VM_FAULT_NOPAGE; - ext4_journal_stop(handle); - goto out; + goto out_error; } if (page->index == size >> PAGE_SHIFT) @@ -6065,13 +6066,15 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) err = __block_write_begin(page, 0, len, ext4_get_block); if (!err) { + ret = VM_FAULT_SIGBUS; if (ext4_walk_page_buffers(handle, page_buffers(page), - 0, len, NULL, do_journal_get_write_access)) { - unlock_page(page); - ret = VM_FAULT_SIGBUS; - ext4_journal_stop(handle); - goto out; - } + 0, len, NULL, do_journal_get_write_access)) + goto out_error; + if (ext4_walk_page_buffers(handle, page_buffers(page), + 0, len, NULL, write_end_fn)) + goto out_error; + if (ext4_jbd2_inode_add_write(handle, inode, 0, len)) + goto out_error; ext4_set_inode_state(inode, EXT4_STATE_JDATA); } else { unlock_page(page); @@ -6086,6 +6089,10 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) up_read(&EXT4_I(inode)->i_mmap_sem); sb_end_pagefault(inode->i_sb); return ret; +out_error: + 
unlock_page(page); + ext4_journal_stop(handle); + goto out; } vm_fault_t ext4_filemap_fault(struct vm_fault *vmf) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a14c1ed39aa3..ac9558080fc7 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -472,6 +472,84 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn) spin_unlock(&sbi->s_md_lock); } +/* + * This writepage callback for write_cache_pages() + * takes care of a few cases after page cleaning. + * + * write_cache_pages() already checks for dirty pages + * and calls clear_page_dirty_for_io(), which we want, + * to write protect the pages. + * + * However, we have to redirty a page in these cases: + * 1) some buffer is dirty (needs checkpointing) + * 2) some buffer is not part of the committing transaction + * 3) some buffer already has b_next_transaction set + */ + +static int ext4_journalled_writepage_callback(struct page *page, + struct writeback_control *wbc, + void *data) +{ + transaction_t *transaction = (transaction_t *) data; + struct buffer_head *bh, *head; + struct journal_head *jh; + + bh = head = page_buffers(page); + do { + jh = bh2jh(bh); + if (buffer_dirty(bh) || + (jh && (jh->b_transaction != transaction || + jh->b_next_transaction))) { + redirty_page_for_writepage(wbc, page); + goto out; + } + } while ((bh = bh->b_this_page) != head); + +out: + return AOP_WRITEPAGE_ACTIVATE; +} + +static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode) +{ + struct address_space *mapping = jinode->i_vfs_inode->i_mapping; + transaction_t *transaction = jinode->i_transaction; + loff_t dirty_start = jinode->i_dirty_start; + loff_t dirty_end = jinode->i_dirty_end; + + struct writeback_control wbc = { + .sync_mode = WB_SYNC_ALL, + .nr_to_write = LONG_MAX, + .range_start = dirty_start, + .range_end = dirty_end, + }; + + return write_cache_pages(mapping, &wbc, + ext4_journalled_writepage_callback, + transaction); +} + +static int 
ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) +{ + int ret; + + if (ext4_should_journal_data(jinode->i_vfs_inode)) + ret = ext4_journalled_submit_inode_data_buffers(jinode); + else + ret = jbd2_journal_submit_inode_data_buffers(jinode); + + return ret; +} + +static int ext4_journal_finish_inode_data_buffers(struct jbd2_inode *jinode) +{ + int ret = 0; + + if (!ext4_should_journal_data(jinode->i_vfs_inode)) + ret = jbd2_journal_finish_inode_data_buffers(jinode); + + return ret; +} + static bool system_going_down(void) { return system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF @@ -4647,9 +4725,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) sbi->s_journal->j_commit_callback = ext4_journal_commit_callback; sbi->s_journal->j_submit_inode_data_buffers = - jbd2_journal_submit_inode_data_buffers; + ext4_journal_submit_inode_data_buffers; sbi->s_journal->j_finish_inode_data_buffers = - jbd2_journal_finish_inode_data_buffers; + ext4_journal_finish_inode_data_buffers; no_journal: if (!test_opt(sb, NO_MBCACHE)) {