From patchwork Wed Apr 8 21:55:11 2020
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1268344
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, Harshad Shirwadkar
Subject: [PATCH v6 01/20] ext4: update docs for fast commit feature
Date: Wed, 8 Apr 2020 14:55:11 -0700
Message-Id: <20200408215530.25649-1-harshads@google.com>

From: Harshad Shirwadkar

This patch series adds support for fast commits, a simplified version
of the scheme proposed by Park and Shin in their paper "iJournaling:
Fine-Grained Journaling for Improving the Latency of Fsync System
Call"[1]. The basic idea of fast commits is to make JBD2 give the client
file system an opportunity to perform a faster commit.
Only if the file system cannot perform such a commit should JBD2 fall
back to a traditional commit.

Because JBD2 operates at block granularity, every file system metadata
update that it commits causes all of the changed blocks to be written to
the journal at commit time. This is inefficient because updates to some
of the blocks that JBD2 commits can be derived from other blocks. For
example, if a new extent is added to an inode, then the corresponding
updates to the inode table, the block bitmap, the group descriptor and
the superblock can be derived from just the extent information and the
corresponding inode information. So, if we take this relationship
between blocks into account and replay the journalled blocks smartly,
we could improve the performance of file system commits significantly.

The fast commits introduced in this patch series make two main
contributions:

(1) Making JBD2 fast commit aware, so that clients of JBD2 can
    implement fast commits.
(2) Adding support in ext4 for using JBD2's new interfaces to implement
    fast commits.

Ext4 supports two modes of fast commits:

1) fast commits with hard consistency guarantees
2) fast commits with soft consistency guarantees

When hard consistency is enabled, fast commit guarantees that all the
updates will be committed. After a successful replay of fast commit
blocks in hard consistency mode, the entire file system is in the same
state it was in when fsync() returned before the crash. This guarantee
is similar to what jbd2 gives with full commits.

With soft consistency, the file system guarantees consistency only for
the inode in question. In this mode, the file system tries to write as
little data to the backend as possible at commit time. To be precise,
the file system records all the data updates for the inode in question,
plus the directory updates that are required to guarantee the
consistency of that inode.

In our evaluations, fast commits with hard consistency performed better
than fast commits with soft consistency.
That is because with hard consistency, a fast commit often ends up
committing other inodes along with it, while with soft consistency the
commits get serialized. Future work could look at a hybrid approach
between the two extremes implemented in this patch series.

Testing
-------

e2fsprogs was updated to set the fast commit feature flag and to ignore
fast commit blocks during e2fsck:

https://github.com/harshadjs/e2fsprogs.git

After applying all the patches in this series, the following xfstests
runs were performed:

- kvm-xfstests.sh -g log -c 4k
- kvm-xfstests.sh smoke

All the log tests passed, and the smoke tests did not introduce any
additional failures.

Performance Evaluation
----------------------

Ext4 file system performance was tested with full commits, with fast
commits with soft consistency and with fast commits with hard
consistency. The fs_mark benchmark showed performance improvements of
up to 50%, depending on the file size. Soft fast commits performed
slightly worse than hard fast commits, but they wrote slightly fewer
blocks to disk.

Changes since V5:
- Rebased on top of v5.6

Harshad Shirwadkar (20):
  ext4: add debug mount option to test fast commit replay
  ext4: add fast commit replay path
  ext4: disable certain features in replay path
  ext4: add idempotent helpers to manipulate bitmaps
  ext4: fast commit recovery path preparation
  jbd2: add fast commit recovery path support
  ext4: main commit routine for fast commits
  jbd2: add new APIs for commit path of fast commits
  ext4: add fast commit on-disk format structs
  ext4: add fast commit track points
  ext4: break ext4_unlink() and ext4_link()
  ext4: add inode tracking and ineligible marking routines
  ext4: add directory entry tracking routines
  ext4: add generic diff tracking routines and range tracking
  jbd2: fast commit main commit path changes
  jbd2: disable fast commits if journal is empty
  jbd2: add fast commit block tracker variables
  ext4, jbd2: add fast commit initialization routines
  ext4: add handling for extended mount options
  ext4: update docs for fast commit feature

 Documentation/filesystems/ext4/journal.rst |  127 ++-
 Documentation/filesystems/journalling.rst  |   18 +
 fs/ext4/acl.c                              |    1 +
 fs/ext4/balloc.c                           |   10 +-
 fs/ext4/ext4.h                             |  126 +++
 fs/ext4/ext4_jbd2.c                        | 1484 +++++++++++++++++++++++++++-
 fs/ext4/ext4_jbd2.h                        |   71 ++
 fs/ext4/extents.c                          |    5 +
 fs/ext4/extents_status.c                   |   24 +
 fs/ext4/fsync.c                            |    2 +-
 fs/ext4/ialloc.c                           |  165 +++-
 fs/ext4/inline.c                           |    3 +
 fs/ext4/inode.c                            |   76 +-
 fs/ext4/ioctl.c                            |   11 +-
 fs/ext4/mballoc.c                          |  158 ++-
 fs/ext4/mballoc.h                          |    2 +
 fs/ext4/migrate.c                          |    1 +
 fs/ext4/namei.c                            |  182 ++--
 fs/ext4/super.c                            |   72 +-
 fs/ext4/xattr.c                            |    6 +
 fs/jbd2/commit.c                           |   61 ++
 fs/jbd2/journal.c                          |  217 +++-
 fs/jbd2/recovery.c                         |   67 +-
 include/linux/jbd2.h                       |   83 +-
 include/trace/events/ext4.h                |  208 +++-
 25 files changed, 3046 insertions(+), 134 deletions(-)

Signed-off-by: Harshad Shirwadkar
Reviewed-by: Andreas Dilger
---
 Documentation/filesystems/ext4/journal.rst | 127 ++++++++++++++++++++-
 Documentation/filesystems/journalling.rst  |  18 +++
 2 files changed, 139 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/ext4/journal.rst b/Documentation/filesystems/ext4/journal.rst
index ea613ee701f5..f94e66f2f8c4 100644
--- a/Documentation/filesystems/ext4/journal.rst
+++ b/Documentation/filesystems/ext4/journal.rst
@@ -29,10 +29,10 @@ safest. If ``data=writeback``, dirty data blocks are not flushed to the
 disk before the metadata are written to disk through the journal.
 
 The journal inode is typically inode 8. The first 68 bytes of the
-journal inode are replicated in the ext4 superblock. The journal itself
-is normal (but hidden) file within the filesystem. The file usually
-consumes an entire block group, though mke2fs tries to put it in the
-middle of the disk.
+journal inode are replicated in the ext4 superblock. The journal
+itself is a normal (but hidden) file within the filesystem. The file
+usually consumes an entire block group, though mke2fs tries to put it
+in the middle of the disk.
 
 All fields in jbd2 are written to disk in big-endian order. This is the
 opposite of ext4.
@@ -42,22 +42,74 @@ NOTE: Both ext4 and ocfs2 use jbd2.
 The maximum size of a journal embedded in an ext4 filesystem is 2^32
 blocks. jbd2 itself does not seem to care.
 
+Fast Commits
+~~~~~~~~~~~~
+
+Ext4 also implements fast commits and integrates them with JBD2
+journalling. Fast commits store metadata changes made to the file system
+as inode-level diffs. In other words, each fast commit block identifies
+updates made to a particular inode, and collectively they represent all
+the changes made to the file system.
+
+A fast commit is valid only if there is no full commit after that
+particular fast commit. Because of this property, fast commit blocks can
+be reused by the following transactions.
+
+Each fast commit block stores updates to one particular inode. Updates
+in each fast commit block are of one of two types:
+
+- Data updates (add range / delete range)
+- Directory entry updates (add / remove links)
+
+Fast commit blocks must be replayed in the order in which they appear on
+disk, because directory entry updates are written to fast commit blocks
+in the order in which they were applied to the file system before the
+crash. Changing the replay order of directory entry updates may result
+in an inconsistent file system. Only directory entry updates need
+ordering; data updates apply to only one inode and so do not require
+ordered replay. Fast commits also guarantee that the file system is in a
+consistent state after the replay of each fast commit block, as long as
+the replay order has been followed.
+
+Note that directory inode updates are never directly recorded in fast commits.
+Just like other file system level metadata, updates to directories are always
+implied by the directory entry updates stored in fast commit blocks.
+
+Based on which directory entry updates are committed with an inode, fast
+commits have two modes of operation:
+
+- Hard Consistency (default)
+- Soft Consistency (enabled with the "fc_soft_consistency" mount option)
+
+When hard consistency is enabled, fast commit guarantees that all the
+updates will be committed. After a successful replay of fast commit blocks
+in hard consistency mode, the entire file system is in the same state as
+when fsync() returned before the crash, similar to what jbd2 guarantees.
+
+With soft consistency, the file system guarantees consistency only for
+the inode in question. In this mode, the file system tries to write as
+little data to the backend as possible at commit time. To be precise, the
+file system records all the data updates for the inode in question and
+the directory updates required to guarantee that inode's consistency.
+
 Layout
 ~~~~~~
 
 Generally speaking, the journal has this format:
 
 .. list-table::
-   :widths: 16 48 16
+   :widths: 16 48 16 18
    :header-rows: 1
 
    * - Superblock
      - descriptor\_block (data\_blocks or revocation\_block) [more data or
        revocations] commmit\_block
      - [more transactions...]
+     - [Fast commits...]
    * -
      - One transaction
      -
+     -
 
 Notice that a transaction begins with either a descriptor and some data,
 or a block revocation list. A finished transaction always ends with a
@@ -76,7 +128,7 @@ The journal superblock will be in the next full block after the
 superblock.
 
 .. list-table::
-   :widths: 12 12 12 32 12
+   :widths: 12 12 12 32 12 12
    :header-rows: 1
 
    * - 1024 bytes of padding
@@ -85,11 +137,13 @@ superblock.
      - descriptor\_block (data\_blocks or revocation\_block) [more data or
        revocations] commmit\_block
      - [more transactions...]
+     - [Fast commits...]
    * -
      -
      -
      - One transaction
      -
+     -
 
 Block Header
 ~~~~~~~~~~~~
@@ -609,3 +663,64 @@ bytes long (but uses a full block):
      - h\_commit\_nsec
      - Nanoseconds component of the above timestamp.
 
+Fast Commit Block
+~~~~~~~~~~~~~~~~~
+
+The fast commit block indicates an append to the last commit block
+that was written to the journal. One fast commit block records updates
+to one inode, so you would typically find as many fast commit blocks
+as there are inodes that changed since the last commit. A fast
+commit block is valid only if there is no commit block present with a
+transaction ID greater than that of the fast commit block. If such a
+block is present, then there is no need to replay the fast commit
+block.
+
+.. list-table::
+   :widths: 8 8 24 40
+   :header-rows: 1
+
+   * - Offset
+     - Type
+     - Name
+     - Description
+   * - 0x0
+     - journal\_header\_s
+     - (open coded)
+     - Common block header.
+   * - 0xC
+     - \_\_le32
+     - fc\_magic
+     - Magic value which should be set to 0xE2540090. This identifies
+       that this block is a fast commit block.
+   * - 0x10
+     - \_\_u8
+     - fc\_features
+     - Features used by this fast commit block.
+   * - 0x11
+     - \_\_le16
+     - fc\_num\_tlvs
+     - Number of TLVs contained in this fast commit block.
+   * - 0x13
+     - \_\_le32
+     - \_\_fc\_len
+     - Length of the fast commit block, in blocks.
+   * - 0x17
+     - \_\_le32
+     - fc\_ino
+     - Inode number of the inode that will be recovered using this fast commit.
+   * - 0x2B
+     - struct ext4\_inode
+     - inode
+     - On-disk copy of the inode at commit time.
+   * -
+     - struct ext4\_fc\_tl
+     - Array of struct ext4\_fc\_tl
+     - The actual delta with the last commit. Starting at this offset,
+       there is an array of TLVs that indicates which extents
+       should be present in the corresponding inode. Currently, the
+       following tags are supported: EXT4\_FC\_TAG\_EXT (extent that
+       should be present in the inode), EXT4\_FC\_TAG\_HOLE (extent
+       that should be removed from the inode), EXT4\_FC\_TAG\_ADD\_DENTRY
+       (dentry that should be linked), EXT4\_FC\_TAG\_DEL\_DENTRY
+       (dentry that should be unlinked), EXT4\_FC\_TAG\_CREATE\_DENTRY
+       (dentry for a file that is being created for the first time).
diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst
index 58ce6b395206..1cb116ab27ab 100644
--- a/Documentation/filesystems/journalling.rst
+++ b/Documentation/filesystems/journalling.rst
@@ -115,6 +115,24 @@ called after each transaction commit. You can also use
 ``transaction->t_private_list`` for attaching entries to a transaction
 that need processing when the transaction commits.
 
+JBD2 also allows client file systems to implement file system specific
+commits, called ``fast commits``. Fast commits are asynchronous in
+nature, i.e. file systems can call their own commit functions at any
+time. In order to avoid racing with the kjournald thread and with
+other fast commits that may be happening in parallel, file systems
+should first call :c:func:`jbd2_start_async_fc()`. A file system
+can call :c:func:`jbd2_map_fc_buf()` to get buffers reserved for
+fast commits. Once a fast commit is complete, the file system
+should call :c:func:`jbd2_stop_async_fc()` to signal completion
+and unblock other committers and the kjournald thread. After
+performing either a fast or a full commit, JBD2 calls
+``journal->j_fc_cleanup_cb`` to allow file systems to clean up
+their internal fast commit related data structures. At replay
+time, JBD2 passes each and every fast commit block to the file
+system via ``journal->j_fc_replay_cb``. Ext4 uses this fast
+commit mechanism to improve journal commit
+performance.
+
 JBD2 also provides a way to block all transaction updates via
 :c:func:`jbd2_journal_lock_updates()` /
 :c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a

From patchwork Wed Apr 8 21:55:12 2020
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1268343
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, Harshad Shirwadkar
Subject: [PATCH v6 02/20] ext4: add handling for extended mount options
Date: Wed, 8 Apr 2020 14:55:12 -0700
Message-Id: <20200408215530.25649-2-harshads@google.com>
In-Reply-To: <20200408215530.25649-1-harshads@google.com>
References: <20200408215530.25649-1-harshads@google.com>

From: Harshad Shirwadkar

We are running out of mount option bits, so add handling for options
stored in s_mount_opt2. Add the ability to turn the fast commit feature
on or off, and to turn the fast commit soft consistency option on or
off.

Signed-off-by: Harshad Shirwadkar
Reviewed-by: Andreas Dilger
---
 fs/ext4/ext4.h  |  7 +++++++
 fs/ext4/super.c | 23 +++++++++++++++++++----
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 91eb4381cae5..7c3d89007eca 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1172,6 +1172,13 @@ struct ext4_inode_info {
 #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM	0x00000008 /* User explicitly
 						specified journal checksum */
 
+#define EXT4_MOUNT2_JOURNAL_FAST_COMMIT	0x00000010 /* Journal fast commit */
+
+#define EXT4_MOUNT2_JOURNAL_FC_SOFT_CONSISTENCY	0x00000020 /* Soft consistency
+							 * mode for fast
+							 * commits
+							 */
+
 #define clear_opt(sb, opt)		EXT4_SB(sb)->s_mount_opt &= \
 						~EXT4_MOUNT_##opt
 #define set_opt(sb, opt)		EXT4_SB(sb)->s_mount_opt |= \
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9728e7b0e84f..70aaea283a63 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1523,6 +1523,7 @@ enum {
 	Opt_dioread_nolock, Opt_dioread_lock,
 	Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
 	Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
+	Opt_no_fc, Opt_fc_soft_consistency
 };
 
 static const match_table_t tokens = {
@@ -1606,6 +1607,8 @@ static const match_table_t tokens = {
 	{Opt_init_itable, "init_itable=%u"},
 	{Opt_init_itable, "init_itable"},
 	{Opt_noinit_itable, "noinit_itable"},
+	{Opt_no_fc, "no_fc"},
+	{Opt_fc_soft_consistency, "fc_soft_consistency"},
 	{Opt_max_dir_size_kb, "max_dir_size_kb=%u"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption"},
 	{Opt_nombcache, "nombcache"},
@@ -1728,6 +1731,7 @@ static int clear_qf_name(struct super_block *sb, int qtype)
 #define MOPT_NO_EXT3	0x0200
 #define MOPT_EXT4_ONLY	(MOPT_NO_EXT2 | MOPT_NO_EXT3)
 #define MOPT_STRING	0x0400
+#define MOPT_2		0x0800
 
 static const struct mount_opts {
 	int	token;
@@ -1820,6 +1824,10 @@ static const struct mount_opts {
 	{Opt_max_dir_size_kb, 0, MOPT_GTE0},
 	{Opt_test_dummy_encryption, 0, MOPT_GTE0},
 	{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
+	{Opt_no_fc, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
+	 MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY},
+	{Opt_fc_soft_consistency, EXT4_MOUNT2_JOURNAL_FC_SOFT_CONSISTENCY,
+	 MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
 	{Opt_err, 0, 0}
 };
 
@@ -2110,10 +2118,17 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 			WARN_ON(1);
 			return -1;
 		}
-		if (arg != 0)
-			sbi->s_mount_opt |= m->mount_opt;
-		else
-			sbi->s_mount_opt &= ~m->mount_opt;
+		if (m->flags & MOPT_2) {
+			if (arg != 0)
+				sbi->s_mount_opt2 |= m->mount_opt;
+			else
+				sbi->s_mount_opt2 &= ~m->mount_opt;
+		} else {
+			if (arg != 0)
+				sbi->s_mount_opt |= m->mount_opt;
+			else
+				sbi->s_mount_opt &= ~m->mount_opt;
+		}
 	}
 	return 1;
 }

From patchwork Wed Apr 8 21:55:13 2020
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1268345
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, Harshad Shirwadkar
Subject: [PATCH v6 03/20] ext4, jbd2: add fast commit initialization routines
Date: Wed, 8 Apr 2020 14:55:13 -0700
Message-Id: <20200408215530.25649-3-harshads@google.com>
In-Reply-To: <20200408215530.25649-1-harshads@google.com>
References: <20200408215530.25649-1-harshads@google.com>

From: Harshad Shirwadkar

Define feature flags for fast commits and add routines to allow ext4 to
initialize fast commits. Note that we allow 128 blocks
(EXT4_NUM_FC_BLKS) to be used for fast commits. For now, that is a
fixed default value.

Signed-off-by: Harshad Shirwadkar
---
 fs/ext4/ext4.h       |  2 ++
 fs/ext4/ext4_jbd2.c  |  6 ++++++
 fs/ext4/ext4_jbd2.h  | 13 +++++++++++++
 fs/ext4/super.c      |  5 +++++
 fs/jbd2/journal.c    | 11 +++++++++++
 include/linux/jbd2.h | 19 ++++++++++++++++++-
 6 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 7c3d89007eca..57f8fd4fe6ad 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1770,6 +1770,7 @@ static inline bool ext4_verity_in_progress(struct inode *inode)
 #define EXT4_FEATURE_COMPAT_RESIZE_INODE	0x0010
 #define EXT4_FEATURE_COMPAT_DIR_INDEX		0x0020
 #define EXT4_FEATURE_COMPAT_SPARSE_SUPER2	0x0200
+#define EXT4_FEATURE_COMPAT_FAST_COMMIT		0x0400
 #define EXT4_FEATURE_COMPAT_STABLE_INODES	0x0800
 
 #define EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER	0x0001
@@ -1872,6 +1873,7 @@ EXT4_FEATURE_COMPAT_FUNCS(xattr,		EXT_ATTR)
 EXT4_FEATURE_COMPAT_FUNCS(resize_inode,	RESIZE_INODE)
 EXT4_FEATURE_COMPAT_FUNCS(dir_index,		DIR_INDEX)
 EXT4_FEATURE_COMPAT_FUNCS(sparse_super2,	SPARSE_SUPER2)
+EXT4_FEATURE_COMPAT_FUNCS(fast_commit,		FAST_COMMIT)
 EXT4_FEATURE_COMPAT_FUNCS(stable_inodes,	STABLE_INODES)
 
 EXT4_FEATURE_RO_COMPAT_FUNCS(sparse_super,	SPARSE_SUPER)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 7f16e1af8d5c..91d6437bc9b3 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -367,3 +367,9 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
 		mark_buffer_dirty(bh);
 	return err;
 }
+void ext4_init_fast_commit(struct super_block *sb, journal_t *journal)
+{
+	if (!ext4_should_fast_commit(sb))
+		return;
+	jbd2_init_fast_commit(journal, EXT4_NUM_FC_BLKS);
+}
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 4b9002f0e84c..b15cfa89cf1d 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -440,6 +440,17 @@ static inline int ext4_jbd2_inode_add_wait(handle_t *handle,
 	return 0;
 }
 
+static inline int ext4_should_fast_commit(struct super_block *sb)
+{
+	if (!ext4_has_feature_fast_commit(sb))
+		return 0;
+	if (!test_opt2(sb, JOURNAL_FAST_COMMIT))
+		return 0;
+	if (test_opt(sb, QUOTA))
+		return 0;
+	return 1;
+}
+
 static inline void ext4_update_inode_fsync_trans(handle_t *handle,
 						 struct inode *inode,
 						 int datasync)
@@ -518,4 +529,6 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
 	return 1;
 }
 
+#define EXT4_NUM_FC_BLKS		128
+void ext4_init_fast_commit(struct super_block *sb, journal_t *journal);
 #endif	/* _EXT4_JBD2_H */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 70aaea283a63..0bfaf76200d2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3809,6 +3809,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 #ifdef CONFIG_EXT4_FS_POSIX_ACL
 	set_opt(sb, POSIX_ACL);
 #endif
+	if (ext4_has_feature_fast_commit(sb))
+		set_opt2(sb, JOURNAL_FAST_COMMIT);
+
 	/* don't forget to enable journal_csum when metadata_csum is enabled. */
 	if (ext4_has_metadata_csum(sb))
 		set_opt(sb, JOURNAL_CHECKSUM);
@@ -4463,6 +4466,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM;
 		clear_opt(sb, JOURNAL_CHECKSUM);
 		clear_opt(sb, DATA_FLAGS);
+		clear_opt2(sb, JOURNAL_FAST_COMMIT);
 		sbi->s_journal = NULL;
 		needs_recovery = 0;
 		goto no_journal;
@@ -4821,6 +4825,7 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal)
 	journal->j_commit_interval = sbi->s_commit_interval;
 	journal->j_min_batch_time = sbi->s_min_batch_time;
 	journal->j_max_batch_time = sbi->s_max_batch_time;
+	ext4_init_fast_commit(sb, journal);
 
 	write_lock(&journal->j_state_lock);
 	if (test_opt(sb, BARRIER))
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index a49d0e670ddf..4e5d41d79b24 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1196,6 +1196,14 @@ static journal_t *journal_init_common(struct block_device *bdev,
 	return NULL;
 }
 
+void jbd2_init_fast_commit(journal_t *journal, int num_fc_blks)
+{
+	journal->j_fc_wbufsize = num_fc_blks;
+	journal->j_wbufsize = journal->j_blocksize / sizeof(journal_block_tag_t)
+				- journal->j_fc_wbufsize;
+	journal->j_fc_wbuf = &journal->j_wbuf[journal->j_wbufsize];
+}
+
 /* jbd2_journal_init_dev and jbd2_journal_init_inode:
  *
  * Create a journal structure assigned some fixed set of disk blocks to
@@ -1722,6 +1730,9 @@ int jbd2_journal_load(journal_t *journal)
 	 */
 	journal->j_flags &= ~JBD2_ABORT;
 
+	if (journal->j_fc_wbufsize > 0)
+		jbd2_journal_set_features(journal, 0, 0,
+					  JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
 	/* OK, we've finished with the dynamic journal bits:
 	 * reinitialise the dynamic contents of the superblock in memory
 	 * and reset them on disk. */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f613d8529863..3bd1431cb222 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -288,6 +288,7 @@ typedef struct journal_superblock_s
 #define JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT	0x00000004
 #define JBD2_FEATURE_INCOMPAT_CSUM_V2		0x00000008
 #define JBD2_FEATURE_INCOMPAT_CSUM_V3		0x00000010
+#define JBD2_FEATURE_INCOMPAT_FAST_COMMIT	0x00000020
 
 /* See "journal feature predicate functions" below */
 
@@ -298,7 +299,8 @@ typedef struct journal_superblock_s
 					JBD2_FEATURE_INCOMPAT_64BIT | \
 					JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT | \
 					JBD2_FEATURE_INCOMPAT_CSUM_V2 | \
-					JBD2_FEATURE_INCOMPAT_CSUM_V3)
+					JBD2_FEATURE_INCOMPAT_CSUM_V3 | \
+					JBD2_FEATURE_INCOMPAT_FAST_COMMIT)
 
 #ifdef __KERNEL__
 
@@ -1058,6 +1060,12 @@ struct journal_s
 	 */
 	struct buffer_head	**j_wbuf;
 
+	/**
+	 * @j_fc_wbuf: Array of fast commit bhs for
+	 * jbd2_journal_commit_transaction.
+	 */
+	struct buffer_head	**j_fc_wbuf;
+
 	/**
 	 * @j_wbufsize:
 	 *
@@ -1065,6 +1073,13 @@ struct journal_s
 	 */
 	int			j_wbufsize;
 
+	/**
+	 * @j_fc_wbufsize:
+	 *
+	 * Size of @j_fc_wbuf array.
+ */ + int j_fc_wbufsize; + /** * @j_last_sync_writer: * @@ -1234,6 +1249,7 @@ JBD2_FEATURE_INCOMPAT_FUNCS(64bit, 64BIT) JBD2_FEATURE_INCOMPAT_FUNCS(async_commit, ASYNC_COMMIT) JBD2_FEATURE_INCOMPAT_FUNCS(csum2, CSUM_V2) JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3) +JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT) /* * Journal flag definitions @@ -1500,6 +1516,7 @@ void __jbd2_log_wait_for_space(journal_t *journal); extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *); extern int jbd2_cleanup_journal_tail(journal_t *); +void jbd2_init_fast_commit(journal_t *journal, int num_fc_blks); /* * is_journal_abort * From patchwork Wed Apr 8 21:55:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268346 From: Harshad Shirwadkar X-Google-Original-From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 04/20] jbd2: add fast commit block tracker variables Date: Wed, 8 Apr 2020 14:55:14 -0700 Message-Id: <20200408215530.25649-4-harshads@google.com> X-Mailer: git-send-email 2.26.0.110.g2183baf09c-goog In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Harshad Shirwadkar Add j_first_fc, j_last_fc and j_fc_off variables to track the fast commit area. j_first_fc and j_last_fc mark the start and the end of the area, while j_fc_off tracks how many blocks of the area are currently in use. 
Signed-off-by: Harshad Shirwadkar --- fs/jbd2/journal.c | 33 ++++++++++++++++++++++++++++----- include/linux/jbd2.h | 24 ++++++++++++++++++++++++ 2 files changed, 52 insertions(+), 5 deletions(-) diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 4e5d41d79b24..79f015f7bf54 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1178,6 +1178,11 @@ static journal_t *journal_init_common(struct block_device *bdev, if (!journal->j_wbuf) goto err_cleanup; + if (journal->j_fc_wbufsize > 0) { + journal->j_wbufsize = n - journal->j_fc_wbufsize; + journal->j_fc_wbuf = &journal->j_wbuf[journal->j_wbufsize]; + } + bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize); if (!bh) { pr_err("%s: Cannot get buffer for journal superblock\n", @@ -1321,11 +1326,20 @@ static int journal_reset(journal_t *journal) } journal->j_first = first; - journal->j_last = last; - journal->j_head = first; - journal->j_tail = first; - journal->j_free = last - first; + if (jbd2_has_feature_fast_commit(journal) && + journal->j_fc_wbufsize > 0) { + journal->j_last_fc = last; + journal->j_last = last - journal->j_fc_wbufsize; + journal->j_first_fc = journal->j_last + 1; + journal->j_fc_off = 0; + } else { + journal->j_last = last; + } + + journal->j_head = journal->j_first; + journal->j_tail = journal->j_first; + journal->j_free = journal->j_last -
journal->j_first; journal->j_tail_sequence = journal->j_transaction_sequence; journal->j_commit_sequence = journal->j_transaction_sequence - 1; @@ -1667,9 +1681,18 @@ static int load_superblock(journal_t *journal) journal->j_tail_sequence = be32_to_cpu(sb->s_sequence); journal->j_tail = be32_to_cpu(sb->s_start); journal->j_first = be32_to_cpu(sb->s_first); - journal->j_last = be32_to_cpu(sb->s_maxlen); journal->j_errno = be32_to_cpu(sb->s_errno); + if (jbd2_has_feature_fast_commit(journal) && + journal->j_fc_wbufsize > 0) { + journal->j_last_fc = be32_to_cpu(sb->s_maxlen); + journal->j_last = journal->j_last_fc - journal->j_fc_wbufsize; + journal->j_first_fc = journal->j_last + 1; + journal->j_fc_off = 0; + } else { + journal->j_last = be32_to_cpu(sb->s_maxlen); + } + return 0; } diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 3bd1431cb222..1fc981cca479 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -910,6 +910,30 @@ struct journal_s */ unsigned long j_last; + /** + * @j_first_fc: + * + * The block number of the first fast commit block in the journal + * [j_state_lock]. + */ + unsigned long j_first_fc; + + /** + * @j_fc_off: + * + * Number of fast commit blocks currently allocated. + * [j_state_lock]. + */ + unsigned long j_fc_off; + + /** + * @j_last_fc: + * + * The block number one beyond the last fast commit block in the journal + * [j_state_lock]. + */ + unsigned long j_last_fc; + /** * @j_dev: Device where we store the journal. 
*/ From patchwork Wed Apr 8 21:55:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268347 From: Harshad Shirwadkar X-Google-Original-From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 05/20] jbd2: disable fast commits if journal is empty Date: Wed, 8 Apr 2020 14:55:15 -0700 Message-Id: <20200408215530.25649-5-harshads@google.com> X-Mailer: git-send-email 2.26.0.110.g2183baf09c-goog In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Harshad Shirwadkar If the journal is empty, clear the fast commit feature flag in the on-disk superblock. A clean journal then does not carry the flag, which reduces the chance of running into backward compatibility issues with older kernels that do not know about fast commits.
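The clear-write-restore sequence that this patch adds to jbd2_mark_journal_empty() can be illustrated with a small user-space sketch. It is not JBD2 code: the struct layout, the sketch_* helpers and the memcpy() standing in for jbd2_write_superblock() are hypothetical, and only the 0x20 bit value mirrors JBD2_FEATURE_INCOMPAT_FAST_COMMIT from this series. It shows that the flag never reaches "disk" while the journal is empty, yet remains set in the in-memory superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the value of JBD2_FEATURE_INCOMPAT_FAST_COMMIT in this series. */
#define SKETCH_FEATURE_INCOMPAT_FAST_COMMIT 0x00000020u

/* Hypothetical, heavily reduced journal superblock. */
struct sketch_sb {
	uint32_t s_feature_incompat;
	uint32_t s_start;	/* 0 means the journal is empty */
};

/* In-memory superblock plus the copy that last reached "disk". */
struct sketch_journal {
	struct sketch_sb in_memory;
	struct sketch_sb on_disk;
};

/* Stand-in for jbd2_write_superblock(): copy memory to "disk". */
static void sketch_write_superblock(struct sketch_journal *j)
{
	memcpy(&j->on_disk, &j->in_memory, sizeof(j->on_disk));
}

static void sketch_mark_journal_empty(struct sketch_journal *j)
{
	bool had_fast_commit = false;

	j->in_memory.s_start = 0;
	if (j->in_memory.s_feature_incompat & SKETCH_FEATURE_INCOMPAT_FAST_COMMIT) {
		/* Hide the flag only for the duration of this write. */
		j->in_memory.s_feature_incompat &= ~SKETCH_FEATURE_INCOMPAT_FAST_COMMIT;
		had_fast_commit = true;
	}
	sketch_write_superblock(j);

	/* An empty journal must never advertise fast commits on disk. */
	assert(!(j->on_disk.s_feature_incompat & SKETCH_FEATURE_INCOMPAT_FAST_COMMIT));

	/* Restore the in-memory state so fast commits stay enabled. */
	if (had_fast_commit)
		j->in_memory.s_feature_incompat |= SKETCH_FEATURE_INCOMPAT_FAST_COMMIT;
}
```

Clearing the bit only around the write, instead of dropping it permanently, keeps the in-memory feature state intact; that is the role the had_fast_commit local plays in the patch.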
Signed-off-by: Harshad Shirwadkar --- fs/jbd2/journal.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 79f015f7bf54..f8f55d0814ea 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1482,6 +1482,7 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid, static void jbd2_mark_journal_empty(journal_t *journal, int write_op) { journal_superblock_t *sb = journal->j_superblock; + bool had_fast_commit = false; BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex)); lock_buffer(journal->j_sb_buffer); @@ -1495,9 +1496,20 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op) sb->s_sequence = cpu_to_be32(journal->j_tail_sequence); sb->s_start = cpu_to_be32(0); + if (jbd2_has_feature_fast_commit(journal)) { + /* + * When the journal is clean, don't write the fast commit flag; + * it would make the file system incompatible with older kernels. + */ + jbd2_clear_feature_fast_commit(journal); + had_fast_commit = true; + } jbd2_write_superblock(journal, write_op); + if (had_fast_commit) + jbd2_set_feature_fast_commit(journal); + /* Log is no longer empty */ write_lock(&journal->j_state_lock); journal->j_flags |= JBD2_FLUSHED; From patchwork Wed Apr 8 21:55:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268362 From: Harshad Shirwadkar X-Google-Original-From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 06/20] jbd2: fast commit main commit path changes Date: Wed, 8 Apr 2020 14:55:16 -0700 Message-Id: <20200408215530.25649-6-harshads@google.com> X-Mailer: git-send-email 2.26.0.110.g2183baf09c-goog In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: 
linux-ext4@vger.kernel.org From: Harshad Shirwadkar Add three new APIs, jbd2_start_async_fc_nowait(), jbd2_start_async_fc_wait() and jbd2_stop_async_fc(), which file systems can use to tell JBD2 that they are starting or stopping a fast commit. A fast commit may only start after at least one full commit has completed, and fast and full commits exclude each other: a full commit waits for an ongoing fast commit to finish, while an attempt to start a fast commit during a full commit fails with -EALREADY.
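The start/stop protocol above amounts to two mutually exclusive flags. The following single-threaded C sketch models the return values of jbd2_start_async_fc_nowait() and jbd2_stop_async_fc(), and the way a full commit sets its flag before checking for an ongoing fast commit. All sketch_* names are hypothetical and the flag values are copied from the patch. The j_state_lock locking and the sleep on j_wait_async_fc are deliberately left out: where the kernel would wait, the sketch simply returns false.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag values copied from the patch (JBD2_*_COMMIT_ONGOING). */
#define SKETCH_FAST_COMMIT_ONGOING 0x100u
#define SKETCH_FULL_COMMIT_ONGOING 0x200u

typedef unsigned int sketch_tid_t;

struct sketch_journal {
	unsigned int flags;
	sketch_tid_t last_full_tid;	/* plays the role of j_stats.ts_tid */
	sketch_tid_t commit_sequence;	/* plays the role of j_commit_sequence */
};

/* Mirrors the checks made by jbd2_start_async_fc_nowait(). */
static int sketch_start_fc_nowait(struct sketch_journal *j, sketch_tid_t tid)
{
	if (j->last_full_tid == 0)
		return -EINVAL;		/* no full commit yet */
	if (tid <= j->commit_sequence)
		return -EALREADY;	/* tid already committed */
	if (j->flags & (SKETCH_FAST_COMMIT_ONGOING | SKETCH_FULL_COMMIT_ONGOING))
		return -EALREADY;	/* another commit is in flight */
	j->flags |= SKETCH_FAST_COMMIT_ONGOING;
	return 0;
}

static void sketch_stop_fc(struct sketch_journal *j)
{
	j->flags &= ~SKETCH_FAST_COMMIT_ONGOING;
}

/*
 * A full commit announces itself first, so no new fast commit can start.
 * Returns false while a fast commit is still running; the kernel would
 * sleep on j_wait_async_fc at this point instead.
 */
static bool sketch_begin_full_commit(struct sketch_journal *j)
{
	j->flags |= SKETCH_FULL_COMMIT_ONGOING;
	return !(j->flags & SKETCH_FAST_COMMIT_ONGOING);
}

static void sketch_end_full_commit(struct sketch_journal *j, sketch_tid_t tid)
{
	assert(j->flags & SKETCH_FULL_COMMIT_ONGOING);
	j->flags &= ~(SKETCH_FULL_COMMIT_ONGOING | SKETCH_FAST_COMMIT_ONGOING);
	j->commit_sequence = tid;
	j->last_full_tid = tid;
}
```

In this sketch, as in the patch, setting the full commit flag before waiting means every new start attempt fails with -EALREADY from that point on. The full commit therefore only has to outlast the fast commit that is already in progress.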
Signed-off-by: Harshad Shirwadkar --- fs/jbd2/commit.c | 21 +++++++++++ fs/jbd2/journal.c | 85 +++++++++++++++++++++++++++++++++++++++++++- include/linux/jbd2.h | 21 +++++++++++ 3 files changed, 126 insertions(+), 1 deletion(-) diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index e855d8260433..280d11591bcb 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -413,6 +413,23 @@ void jbd2_journal_commit_transaction(journal_t *journal) J_ASSERT(journal->j_running_transaction != NULL); J_ASSERT(journal->j_committing_transaction == NULL); + write_lock(&journal->j_state_lock); + journal->j_flags |= JBD2_FULL_COMMIT_ONGOING; + while (journal->j_flags & JBD2_FAST_COMMIT_ONGOING) { + DEFINE_WAIT(wait); + + prepare_to_wait(&journal->j_wait_async_fc, &wait, + TASK_UNINTERRUPTIBLE); + write_unlock(&journal->j_state_lock); + schedule(); + write_lock(&journal->j_state_lock); + finish_wait(&journal->j_wait_async_fc, &wait); + } + write_unlock(&journal->j_state_lock); + + if (journal->j_fc_cleanup_callback) + journal->j_fc_cleanup_callback(journal); + commit_transaction = journal->j_running_transaction; trace_jbd2_start_commit(journal, commit_transaction); @@ -420,6 +437,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) commit_transaction->t_tid); write_lock(&journal->j_state_lock); + journal->j_fc_off = 0; J_ASSERT(commit_transaction->t_state == T_RUNNING); commit_transaction->t_state = T_LOCKED; @@ -1125,6 +1143,8 @@ void jbd2_journal_commit_transaction(journal_t *journal) journal->j_commit_sequence, journal->j_tail_sequence); write_lock(&journal->j_state_lock); + journal->j_flags &= ~JBD2_FULL_COMMIT_ONGOING; + journal->j_flags &= ~JBD2_FAST_COMMIT_ONGOING; spin_lock(&journal->j_list_lock); commit_transaction->t_state = T_FINISHED; /* Check if the transaction can be dropped now that we are finished */ @@ -1136,6 +1156,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) spin_unlock(&journal->j_list_lock); write_unlock(&journal->j_state_lock); 
wake_up(&journal->j_wait_done_commit); + wake_up(&journal->j_wait_async_fc); /* * Calculate overall stats diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index f8f55d0814ea..d3897d155fb9 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -157,7 +157,9 @@ static void commit_timeout(struct timer_list *t) * * 1) COMMIT: Every so often we need to commit the current state of the * filesystem to disk. The journal thread is responsible for writing - * all of the metadata buffers to disk. + * all of the metadata buffers to disk. If a fast commit is ongoing, + * the journal thread waits until it is done and then continues + * from there. * * 2) CHECKPOINT: We cannot reuse a used section of the log file until all * of the data in that part of the log has been rewritten elsewhere on @@ -714,6 +716,86 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) return err; } +/* + * Returns 0 if async fc could be started. Returns -EINVAL if no full + * commit has been done yet. Returns -EALREADY if @tid is already + * committed or another fast / full commit is ongoing. + */ +int jbd2_start_async_fc_nowait(journal_t *journal, tid_t tid) +{ + /* + * Fast commits are only allowed once at least one full commit + * has been processed. + */ + if (!journal->j_stats.ts_tid) + return -EINVAL; + + if (tid <= journal->j_commit_sequence) + return -EALREADY; + + write_lock(&journal->j_state_lock); + if (journal->j_flags & + (JBD2_FAST_COMMIT_ONGOING | JBD2_FULL_COMMIT_ONGOING)) { + write_unlock(&journal->j_state_lock); + return -EALREADY; + } + + journal->j_flags |= JBD2_FAST_COMMIT_ONGOING; + write_unlock(&journal->j_state_lock); + + return 0; +} + +/* + * Same as above but waits for any ongoing fast commits to complete. + * If a full commit is ongoing, this function returns with + * -EALREADY. + */ +int jbd2_start_async_fc_wait(journal_t *journal, tid_t tid) +{ + int ret; + + /* + * Fast commits are only allowed once at least one full commit + * has been processed.
+ */ + if (!journal->j_stats.ts_tid) + return -EINVAL; + + if (tid <= journal->j_commit_sequence) + return -EALREADY; + + write_lock(&journal->j_state_lock); +restart: + if (journal->j_flags & JBD2_FULL_COMMIT_ONGOING) { + ret = -EALREADY; + } else if (journal->j_flags & JBD2_FAST_COMMIT_ONGOING) { + DEFINE_WAIT(wait); + + prepare_to_wait(&journal->j_wait_async_fc, &wait, + TASK_UNINTERRUPTIBLE); + write_unlock(&journal->j_state_lock); + schedule(); + write_lock(&journal->j_state_lock); + finish_wait(&journal->j_wait_async_fc, &wait); + goto restart; + } else { + journal->j_flags |= JBD2_FAST_COMMIT_ONGOING; + ret = 0; + } + write_unlock(&journal->j_state_lock); + + return ret; +} + +void jbd2_stop_async_fc(journal_t *journal, tid_t tid) +{ + write_lock(&journal->j_state_lock); + journal->j_flags &= ~JBD2_FAST_COMMIT_ONGOING; + write_unlock(&journal->j_state_lock); + wake_up(&journal->j_wait_async_fc); +} + /* Return 1 when transaction with given tid has already committed. */ int jbd2_transaction_committed(journal_t *journal, tid_t tid) { @@ -1140,6 +1222,7 @@ static journal_t *journal_init_common(struct block_device *bdev, init_waitqueue_head(&journal->j_wait_commit); init_waitqueue_head(&journal->j_wait_updates); init_waitqueue_head(&journal->j_wait_reserved); + init_waitqueue_head(&journal->j_wait_async_fc); mutex_init(&journal->j_barrier); mutex_init(&journal->j_checkpoint_mutex); spin_lock_init(&journal->j_revoke_lock); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 1fc981cca479..0a4d9d484528 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -853,6 +853,13 @@ struct journal_s */ wait_queue_head_t j_wait_reserved; + /** + * @j_wait_async_fc: + * + * Wait queue to wait for completion of async fast commits. 
+ */ + wait_queue_head_t j_wait_async_fc; + /** * @j_checkpoint_mutex: * @@ -1203,6 +1210,14 @@ struct journal_s */ struct lockdep_map j_trans_commit_map; #endif + + /** + * @j_fc_cleanup_callback: + * + * Clean-up after fast commit or full commit. JBD2 calls this at the + * start of every full commit, once any ongoing fast commit is done. + */ + void (*j_fc_cleanup_callback)(struct journal_s *journal); }; #define jbd2_might_wait_for_commit(j) \ @@ -1288,6 +1303,8 @@ JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT) * data write error in ordered * mode */ #define JBD2_REC_ERR 0x080 /* The errno in the sb has been recorded */ +#define JBD2_FAST_COMMIT_ONGOING 0x100 /* Fast commit is ongoing */ +#define JBD2_FULL_COMMIT_ONGOING 0x200 /* Full commit is ongoing */ /* * Function declarations for the journaling transaction and buffer @@ -1540,6 +1557,10 @@ void __jbd2_log_wait_for_space(journal_t *journal); extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *); extern int jbd2_cleanup_journal_tail(journal_t *); +/* Fast commit related APIs */ +int jbd2_start_async_fc_nowait(journal_t *journal, tid_t tid); +int jbd2_start_async_fc_wait(journal_t *journal, tid_t tid); +void jbd2_stop_async_fc(journal_t *journal, tid_t tid); void jbd2_init_fast_commit(journal_t *journal, int num_fc_blks); /* * is_journal_abort From patchwork Wed Apr 8 21:55:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268348 From: Harshad Shirwadkar X-Google-Original-From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar, kbuild test robot Subject: [PATCH v6 07/20] ext4: add generic diff tracking routines and range tracking Date: Wed, 8 Apr 2020 14:55:17 -0700 Message-Id: <20200408215530.25649-7-harshads@google.com> X-Mailer: git-send-email 2.26.0.110.g2183baf09c-goog In-Reply-To: <20200408215530.25649-1-harshads@google.com> 
References: <20200408215530.25649-1-harshads@google.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Harshad Shirwadkar Fast commits need to track the changes made to the file system since the last full commit. Add generic diff tracking infrastructure and use it to track, per inode, the range of logical blocks that has been modified. Later patches in this series also use these helpers to track directory entry updates.
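The per-inode range tracking done by __ext4_fc_track_template() and __ext4_fc_track_range() boils down to "reset when the running TID changes, otherwise widen". Below is a minimal user-space sketch. It assumes a plain struct in place of ext4_inode_info and leaves out i_fc_lock, the s_fc_q queue and the special-inode check. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sketch_lblk_t;
typedef uint32_t sketch_tid_t;

/* Reduced stand-in for the i_fc_* fields added to ext4_inode_info. */
struct sketch_inode_fc {
	sketch_tid_t fc_tid;		/* transaction the range belongs to */
	sketch_lblk_t lblk_start;	/* first modified logical block */
	sketch_lblk_t lblk_end;		/* last modified logical block (inclusive) */
};

/* Record that blocks [start, end] changed in transaction running_tid. */
static void sketch_fc_track_range(struct sketch_inode_fc *fc,
				  sketch_tid_t running_tid,
				  sketch_lblk_t start, sketch_lblk_t end)
{
	assert(start <= end);

	if (fc->fc_tid == running_tid) {
		/* Same transaction: grow the range to cover both. */
		if (start < fc->lblk_start)
			fc->lblk_start = start;
		if (end > fc->lblk_end)
			fc->lblk_end = end;
	} else {
		/* A full commit happened since: the old range is stale. */
		fc->fc_tid = running_tid;
		fc->lblk_start = start;
		fc->lblk_end = end;
	}
}
```

Because only one interval is kept per inode, tracking blocks 10-19, then 4-12, then 30-31 in the same transaction yields 4-31. The fast commit may therefore cover blocks that were never modified, which is conservative but never misses a change.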
Signed-off-by: Harshad Shirwadkar Reported-by: kbuild test robot --- fs/ext4/ext4.h | 32 ++++++++++ fs/ext4/ext4_jbd2.c | 121 ++++++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.h | 3 + fs/ext4/inode.c | 18 ++++++ fs/ext4/super.c | 5 ++ include/trace/events/ext4.h | 27 ++++++++ 6 files changed, 206 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 57f8fd4fe6ad..c07ab844c335 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -983,6 +983,32 @@ struct ext4_inode_info { struct list_head i_orphan; /* unlinked but open inodes */ + struct list_head i_fc_list; /* + * inodes that need fast commit + * protected by sbi->s_fc_lock. + */ + /* + * TID of when this struct was last updated. If fc_tid != + * running transaction tid, then none of the other fields in this + * struct are valid. Don't directly modify fields in this struct. + * Use wrappers provided in ext4_jbd2.c. + */ + tid_t i_fc_tid; + + /* + * Start of logical block range that needs to be committed in + * this fast commit. + */ + ext4_lblk_t i_fc_lblk_start; + + /* + * End of logical block range that needs to be committed in this fast + * commit + */ + ext4_lblk_t i_fc_lblk_end; + + rwlock_t i_fc_lock; + /* * i_disksize keeps track of what the inode size is ON DISK, not * in memory. During truncate, i_size is set to the new size by @@ -1102,6 +1128,7 @@ struct ext4_inode_info { #define EXT4_VALID_FS 0x0001 /* Unmounted cleanly */ #define EXT4_ERROR_FS 0x0002 /* Errors detected */ #define EXT4_ORPHAN_FS 0x0004 /* Orphans being recovered */ +#define EXT4_FC_REPLAY 0x0008 /* Fast commit replay ongoing */ /* * Misc. filesystem flags @@ -1567,6 +1594,11 @@ struct ext4_sb_info { #ifdef CONFIG_EXT4_DEBUG unsigned long s_simulate_fail; #endif + /* Ext4 fast commit stuff */ + struct list_head s_fc_q; /* Inodes staged for fast commit + * that have data changes in them. 
+ */ + spinlock_t s_fc_lock; }; static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 91d6437bc9b3..151a4558c338 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -367,6 +367,127 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line, mark_buffer_dirty(bh); return err; } + +static inline +void ext4_reset_inode_fc_info(struct inode *inode) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + + ei->i_fc_tid = 0; + ei->i_fc_lblk_start = 0; + ei->i_fc_lblk_end = 0; +} + +void ext4_init_inode_fc_info(struct inode *inode) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + + ext4_reset_inode_fc_info(inode); + INIT_LIST_HEAD(&ei->i_fc_list); +} + +static void ext4_fc_enqueue_inode(struct inode *inode) +{ + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + + if (!ext4_should_fast_commit(inode->i_sb) || + (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)) + return; + + spin_lock(&sbi->s_fc_lock); + if (list_empty(&EXT4_I(inode)->i_fc_list)) + list_add_tail(&EXT4_I(inode)->i_fc_list, &sbi->s_fc_q); + spin_unlock(&sbi->s_fc_lock); +} + +static inline tid_t get_running_txn_tid(struct super_block *sb) +{ + if (EXT4_SB(sb)->s_journal) + return EXT4_SB(sb)->s_journal->j_commit_sequence + 1; + return 0; +} + +/* + * Generic fast commit tracking function. If this is the first + * time we are called after a full commit, we initialize the + * fast commit fields and then call __fc_track_fn() with + * update = false. If we have already been called since the last full + * commit, we pass update = true. Based on that, the track function + * can determine whether to track a field for the first time + * or just update the previously tracked value.
+ */ +static int __ext4_fc_track_template( + struct inode *inode, + int (*__fc_track_fn)(struct inode *, void *, bool), + void *args) +{ + tid_t running_txn_tid = get_running_txn_tid(inode->i_sb); + bool update = false; + struct ext4_inode_info *ei = EXT4_I(inode); + int ret; + + if (!ext4_should_fast_commit(inode->i_sb) || + (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)) + return -EOPNOTSUPP; + + write_lock(&ei->i_fc_lock); + if (running_txn_tid == ei->i_fc_tid) { + update = true; + } else { + ext4_reset_inode_fc_info(inode); + ei->i_fc_tid = running_txn_tid; + } + ret = __fc_track_fn(inode, args, update); + write_unlock(&ei->i_fc_lock); + + ext4_fc_enqueue_inode(inode); + + return ret; +} +struct __ext4_fc_track_range_args { + ext4_lblk_t start, end; +}; + +#define MIN(__a, __b) ((__a) < (__b) ? (__a) : (__b)) +#define MAX(__a, __b) ((__a) > (__b) ? (__a) : (__b)) + +int __ext4_fc_track_range(struct inode *inode, void *arg, bool update) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + struct __ext4_fc_track_range_args *__arg = + (struct __ext4_fc_track_range_args *)arg; + + if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb)) { + ext4_debug("Special inode %ld being modified\n", inode->i_ino); + return -ECANCELED; + } + + if (update) { + ei->i_fc_lblk_start = MIN(ei->i_fc_lblk_start, __arg->start); + ei->i_fc_lblk_end = MAX(ei->i_fc_lblk_end, __arg->end); + } else { + ei->i_fc_lblk_start = __arg->start; + ei->i_fc_lblk_end = __arg->end; + } + + return 0; +} + +void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start, + ext4_lblk_t end) +{ + struct __ext4_fc_track_range_args args; + int ret; + + args.start = start; + args.end = end; + + ret = __ext4_fc_track_template(inode, + __ext4_fc_track_range, &args); + + trace_ext4_fc_track_range(inode, start, end, ret); +} + void ext4_init_fast_commit(struct super_block *sb, journal_t *journal) { if (!ext4_should_fast_commit(sb)) diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 
b15cfa89cf1d..06d1e4a885b7 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -531,4 +531,7 @@ static inline int ext4_should_dioread_nolock(struct inode *inode) #define EXT4_NUM_FC_BLKS 128 void ext4_init_fast_commit(struct super_block *sb, journal_t *journal); +void ext4_init_inode_fc_info(struct inode *inode); +void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start, + ext4_lblk_t end); #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e416096fc081..3bf0ad4d7d32 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -725,6 +725,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, if (ret) return ret; } + ext4_fc_track_range(inode, map->m_lblk, + map->m_lblk + map->m_len - 1); } return retval; } @@ -4073,6 +4075,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) up_write(&EXT4_I(inode)->i_data_sem); } + ext4_fc_track_range(inode, first_block, stop_block); if (IS_SYNC(inode)) ext4_handle_sync(handle); @@ -4684,6 +4687,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, for (block = 0; block < EXT4_N_BLOCKS; block++) ei->i_data[block] = raw_inode->i_block[block]; INIT_LIST_HEAD(&ei->i_orphan); + ext4_init_inode_fc_info(&ei->vfs_inode); /* * Set transaction id's of transactions that have to be committed @@ -5351,6 +5355,20 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) inode->i_mtime = current_time(inode); inode->i_ctime = inode->i_mtime; } + + if (shrink) + ext4_fc_track_range( + inode, attr->ia_size >> + inode->i_sb->s_blocksize_bits, + oldsize >> + inode->i_sb->s_blocksize_bits); + else + ext4_fc_track_range( + inode, oldsize >> + inode->i_sb->s_blocksize_bits, + attr->ia_size >> + inode->i_sb->s_blocksize_bits); + down_write(&EXT4_I(inode)->i_data_sem); EXT4_I(inode)->i_disksize = attr->ia_size; rc = ext4_mark_inode_dirty(handle, inode); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 0bfaf76200d2..99b24156933a 100644 --- a/fs/ext4/super.c 
+++ b/fs/ext4/super.c @@ -1151,6 +1151,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) ei->i_datasync_tid = 0; atomic_set(&ei->i_unwritten, 0); INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work); + ext4_init_inode_fc_info(&ei->vfs_inode); + rwlock_init(&ei->i_fc_lock); return &ei->vfs_inode; } @@ -1193,6 +1195,7 @@ static void init_once(void *foo) init_rwsem(&ei->i_data_sem); init_rwsem(&ei->i_mmap_sem); inode_init_once(&ei->vfs_inode); + ext4_init_inode_fc_info(&ei->vfs_inode); } static int __init init_inodecache(void) @@ -4417,6 +4420,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */ mutex_init(&sbi->s_orphan_lock); + INIT_LIST_HEAD(&sbi->s_fc_q); + spin_lock_init(&sbi->s_fc_lock); sb->s_root = NULL; needs_recovery = (es->s_last_orphan != 0 || diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 19c87661eeec..9424ffb2a54b 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2723,6 +2723,33 @@ TRACE_EVENT(ext4_error, __entry->function, __entry->line) ); +TRACE_EVENT(ext4_fc_track_range, + TP_PROTO(struct inode *inode, long start, long end, int ret), + + TP_ARGS(inode, start, end, ret), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, ino) + __field(long, start) + __field(long, end) + __field(int, error) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->start = start; + __entry->end = end; + __entry->error = ret; + ), + + TP_printk("dev %d:%d, inode %d, error %d, start %ld, end %ld", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, __entry->error, __entry->start, + __entry->end) + ); + #endif /* _TRACE_EXT4_H */ /* This part must be outside protection */ From patchwork Wed Apr 8 21:55:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar 
X-Patchwork-Id: 1268349 From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 08/20] ext4: add directory entry tracking routines Date: Wed, 8 Apr 2020 14:55:18 -0700 Message-Id: <20200408215530.25649-8-harshads@google.com> In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> From: Harshad Shirwadkar Adds directory entry change tracking routines for fast commits. Use an in-memory list of directory updates to track directory entry updates.
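As an illustration only, the following userspace C sketch models the in-memory list of directory updates described above. Each node records the operation (the enum reuses the numeric values of EXT4_FC_TAG_CREAT_DENTRY, EXT4_FC_TAG_ADD_DENTRY and EXT4_FC_TAG_DEL_DENTRY), the parent and inode numbers, and the entry name, which is stored in an inline buffer when short and in a separate allocation otherwise. All sketch_* names and SKETCH_INLINE_LEN (a stand-in for DNAME_INLINE_LEN, whose real value is architecture-dependent) are hypothetical; malloc() stands in for kmem_cache_alloc()/kmalloc(), and no locking is modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for DNAME_INLINE_LEN (arch-dependent in the kernel). */
#define SKETCH_INLINE_LEN 32

/* Same numeric values as the EXT4_FC_TAG_*_DENTRY tags in the patch. */
enum sketch_fc_op { SKETCH_CREAT = 3, SKETCH_ADD = 4, SKETCH_DEL = 5 };

struct sketch_dentry_update {
	int op;
	unsigned long parent;             /* parent directory inode number */
	unsigned long ino;                /* inode the entry refers to */
	size_t name_len;
	const char *name;                 /* points at iname or a heap copy */
	char iname[SKETCH_INLINE_LEN];
	struct sketch_dentry_update *next;
};

/* Singly linked tail queue: appends are O(1) and keep operation order. */
struct sketch_dentry_queue {
	struct sketch_dentry_update *head;
	struct sketch_dentry_update **tail;
};

static void sketch_queue_init(struct sketch_dentry_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Returns 0 on success, -1 on allocation failure (nothing is queued then). */
static int sketch_track_dentry(struct sketch_dentry_queue *q, int op,
			       unsigned long parent, unsigned long ino,
			       const char *name, size_t len)
{
	struct sketch_dentry_update *node = malloc(sizeof(*node));
	char *buf;

	if (!node)
		return -1;
	if (len < SKETCH_INLINE_LEN) {
		buf = node->iname;        /* short name: no second allocation */
	} else {
		buf = malloc(len + 1);
		if (!buf) {               /* check the allocation just made */
			free(node);
			return -1;
		}
	}
	memcpy(buf, name, len);
	buf[len] = '\0';
	node->name = buf;
	node->name_len = len;
	node->op = op;
	node->parent = parent;
	node->ino = ino;
	node->next = NULL;
	*q->tail = node;                  /* append at the tail */
	q->tail = &node->next;
	return 0;
}

/* Frees every node, including heap-allocated long names. */
static void sketch_queue_free(struct sketch_dentry_queue *q)
{
	struct sketch_dentry_update *n = q->head, *next;

	while (n) {
		next = n->next;
		if (n->name != n->iname)
			free((char *)n->name);
		free(n);
		n = next;
	}
	sketch_queue_init(q);
}
```

Unlike the patch, which copies names of up to DNAME_INLINE_LEN bytes into the inline buffer without a terminator, this sketch keeps a name inline only when it is strictly shorter than the buffer, so every stored name is NUL-terminated. Appending at the tail preserves the order in which updates were made, which a later replay would presumably rely on.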
Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4.h | 26 +++++++++ fs/ext4/ext4_jbd2.c | 102 ++++++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.h | 4 ++ fs/ext4/super.c | 7 +++ include/trace/events/ext4.h | 28 ++++++++++ 5 files changed, 167 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index c07ab844c335..669ecf12d392 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -950,6 +950,26 @@ enum { }; +/* Fast commit tags */ +#define EXT4_FC_TAG_ADD_RANGE 0x1 +#define EXT4_FC_TAG_DEL_RANGE 0x2 +#define EXT4_FC_TAG_CREAT_DENTRY 0x3 +#define EXT4_FC_TAG_ADD_DENTRY 0x4 +#define EXT4_FC_TAG_DEL_DENTRY 0x5 + +/* + * In memory list of dentry updates that are performed on the file + * system used by fast commit code. + */ +struct ext4_fc_dentry_update { + int fcd_op; /* Type of update create / add / del */ + int fcd_parent; /* Parent inode number */ + int fcd_ino; /* Inode number */ + struct qstr fcd_name; /* Dirent name qstr */ + unsigned char fcd_iname[DNAME_INLINE_LEN]; /* Dirent name string */ + struct list_head fcd_list; +}; + /* * fourth extended file system inode data in memory */ @@ -1009,6 +1029,11 @@ struct ext4_inode_info { rwlock_t i_fc_lock; + /* + * Last mdata / dirent update that happened on this inode. + */ + struct ext4_fc_dentry_update *i_fc_mdata_update; + /* * i_disksize keeps track of what the inode size is ON DISK, not * in memory. During truncate, i_size is set to the new size by @@ -1598,6 +1623,7 @@ struct ext4_sb_info { struct list_head s_fc_q; /* Inodes staged for fast commit * that have data changes in them. 
*/ + struct list_head s_fc_dentry_q; spinlock_t s_fc_lock; }; diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 151a4558c338..ccaaf1c09ba6 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -368,6 +368,8 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line, return err; } +static struct kmem_cache *ext4_fc_dentry_cachep; + static inline void ext4_reset_inode_fc_info(struct inode *inode) { @@ -376,6 +378,7 @@ void ext4_reset_inode_fc_info(struct inode *inode) ei->i_fc_tid = 0; ei->i_fc_lblk_start = 0; ei->i_fc_lblk_end = 0; + ei->i_fc_mdata_update = NULL; } void ext4_init_inode_fc_info(struct inode *inode) @@ -444,6 +447,94 @@ static int __ext4_fc_track_template( return ret; } + +struct __ext4_dentry_update_args { + struct dentry *dentry; + int op; +}; + +static int __ext4_dentry_update(struct inode *inode, void *arg, bool update) +{ + struct ext4_fc_dentry_update *node; + struct ext4_inode_info *ei = EXT4_I(inode); + struct __ext4_dentry_update_args *dentry_update = + (struct __ext4_dentry_update_args *)arg; + struct dentry *dentry = dentry_update->dentry; + + write_unlock(&ei->i_fc_lock); + node = kmem_cache_alloc(ext4_fc_dentry_cachep, GFP_NOFS); + if (!node) { + write_lock(&ei->i_fc_lock); + return -ENOMEM; + } + + node->fcd_op = dentry_update->op; + node->fcd_parent = dentry->d_parent->d_inode->i_ino; + node->fcd_ino = inode->i_ino; + if (dentry->d_name.len > DNAME_INLINE_LEN) { + node->fcd_name.name = kmalloc(dentry->d_name.len + 1, + GFP_NOFS); + if (!node->fcd_name.name) { + kmem_cache_free(ext4_fc_dentry_cachep, node); + return -ENOMEM; + } + memcpy((u8 *)node->fcd_name.name, dentry->d_name.name, + dentry->d_name.len); + } else { + memcpy(node->fcd_iname, dentry->d_name.name, + dentry->d_name.len); + node->fcd_name.name = node->fcd_iname; + } + node->fcd_name.len = dentry->d_name.len; + + spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock); + list_add_tail(&node->fcd_list, &EXT4_SB(inode->i_sb)->s_fc_dentry_q); +
spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); + write_lock(&ei->i_fc_lock); + EXT4_I(inode)->i_fc_mdata_update = node; + + return 0; +} + +void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry) +{ + struct __ext4_dentry_update_args args; + int ret; + + args.dentry = dentry; + args.op = EXT4_FC_TAG_DEL_DENTRY; + + ret = __ext4_fc_track_template(inode, __ext4_dentry_update, + (void *)&args); + trace_ext4_fc_track_unlink(inode, dentry, ret); +} + +void ext4_fc_track_link(struct inode *inode, struct dentry *dentry) +{ + struct __ext4_dentry_update_args args; + int ret; + + args.dentry = dentry; + args.op = EXT4_FC_TAG_ADD_DENTRY; + + ret = __ext4_fc_track_template(inode, __ext4_dentry_update, + (void *)&args); + trace_ext4_fc_track_link(inode, dentry, ret); +} + +void ext4_fc_track_create(struct inode *inode, struct dentry *dentry) +{ + struct __ext4_dentry_update_args args; + int ret; + + args.dentry = dentry; + args.op = EXT4_FC_TAG_CREAT_DENTRY; + + ret = __ext4_fc_track_template(inode, __ext4_dentry_update, + (void *)&args); + trace_ext4_fc_track_create(inode, dentry, ret); +} + struct __ext4_fc_track_range_args { ext4_lblk_t start, end; }; @@ -494,3 +585,14 @@ void ext4_init_fast_commit(struct super_block *sb, journal_t *journal) return; jbd2_init_fast_commit(journal, EXT4_NUM_FC_BLKS); } + +int __init ext4_init_fc_dentry_cache(void) +{ + ext4_fc_dentry_cachep = KMEM_CACHE(ext4_fc_dentry_update, + SLAB_RECLAIM_ACCOUNT); + + if (ext4_fc_dentry_cachep == NULL) + return -ENOMEM; + + return 0; +} diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 06d1e4a885b7..8fbd09dbfeca 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -534,4 +534,8 @@ void ext4_init_fast_commit(struct super_block *sb, journal_t *journal); void ext4_init_inode_fc_info(struct inode *inode); void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end); +void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry); +void 
ext4_fc_track_link(struct inode *inode, struct dentry *dentry); +void ext4_fc_track_create(struct inode *inode, struct dentry *dentry); +int __init ext4_init_fc_dentry_cache(void); #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 99b24156933a..a93dada07623 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4421,6 +4421,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) mutex_init(&sbi->s_orphan_lock); INIT_LIST_HEAD(&sbi->s_fc_q); + INIT_LIST_HEAD(&sbi->s_fc_dentry_q); spin_lock_init(&sbi->s_fc_lock); sb->s_root = NULL; @@ -6249,6 +6250,11 @@ static int __init ext4_init_fs(void) err = init_inodecache(); if (err) goto out1; + + err = ext4_init_fc_dentry_cache(); + if (err) + goto out05; + register_as_ext3(); register_as_ext2(); err = register_filesystem(&ext4_fs_type); @@ -6259,6 +6265,7 @@ static int __init ext4_init_fs(void) out: unregister_as_ext2(); unregister_as_ext3(); +out05: destroy_inodecache(); out1: ext4_exit_mballoc(); diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 9424ffb2a54b..577c6230b23a 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2723,6 +2723,34 @@ TRACE_EVENT(ext4_error, __entry->function, __entry->line) ); +#define DEFINE_TRACE_DENTRY_EVENT(__type) \ + TRACE_EVENT(ext4_fc_track_##__type, \ + TP_PROTO(struct inode *inode, struct dentry *dentry, int ret), \ + \ + TP_ARGS(inode, dentry, ret), \ + \ + TP_STRUCT__entry( \ + __field(dev_t, dev) \ + __field(int, ino) \ + __field(int, error) \ + ), \ + \ + TP_fast_assign( \ + __entry->dev = inode->i_sb->s_dev; \ + __entry->ino = inode->i_ino; \ + __entry->error = ret; \ + ), \ + \ + TP_printk("dev %d:%d, inode %d, error %d, fc_%s", \ + MAJOR(__entry->dev), MINOR(__entry->dev), \ + __entry->ino, __entry->error, \ + #__type) \ + ) + +DEFINE_TRACE_DENTRY_EVENT(create); +DEFINE_TRACE_DENTRY_EVENT(link); +DEFINE_TRACE_DENTRY_EVENT(unlink); + TRACE_EVENT(ext4_fc_track_range, 
TP_PROTO(struct inode *inode, long start, long end, int ret), From patchwork Wed Apr 8 21:55:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268350 From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 09/20] ext4: add inode tracking and ineligible marking routines Date: Wed, 8 Apr 2020 14:55:19 -0700 Message-Id: <20200408215530.25649-9-harshads@google.com> In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> From: Harshad Shirwadkar Under certain situations, such as zeroing a range, there are only data updates and no
metadata updates. We need to track such inodes for fast commits. In other situations, we need to fall back to full commits, either because recording the delta is not yet supported or because a fast commit would not be "fast" enough. In such cases, inodes must be marked as ineligible for fast commits. Add routines that track just the inodes, and that mark individual inodes, as well as the entire file system, as fast commit ineligible. Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4.h | 2 ++ fs/ext4/ext4_jbd2.c | 57 +++++++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.h | 4 +++ fs/ext4/super.c | 1 + include/trace/events/ext4.h | 22 ++++++++++++++ 5 files changed, 86 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 669ecf12d392..e9c82f555b6d 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1154,6 +1154,7 @@ struct ext4_inode_info { #define EXT4_ERROR_FS 0x0002 /* Errors detected */ #define EXT4_ORPHAN_FS 0x0004 /* Orphans being recovered */ #define EXT4_FC_REPLAY 0x0008 /* Fast commit replay ongoing */ +#define EXT4_FC_INELIGIBLE 0x0010 /* Fast commit ineligible */ /* * Misc.
filesystem flags @@ -1735,6 +1736,7 @@ enum { EXT4_STATE_EXT_PRECACHED, /* extents have been precached */ EXT4_STATE_LUSTRE_EA_INODE, /* Lustre-style ea_inode */ EXT4_STATE_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */ + EXT4_STATE_FC_ELIGIBLE, /* File is Fast commit eligible */ }; #define EXT4_INODE_BIT_FNS(name, field, offset) \ diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index ccaaf1c09ba6..9f12ae2fb3ab 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -379,6 +379,7 @@ void ext4_reset_inode_fc_info(struct inode *inode) ei->i_fc_lblk_start = 0; ei->i_fc_lblk_end = 0; ei->i_fc_mdata_update = NULL; + ext4_clear_inode_state(inode, EXT4_STATE_FC_ELIGIBLE); } void ext4_init_inode_fc_info(struct inode *inode) @@ -410,6 +411,36 @@ static inline tid_t get_running_txn_tid(struct super_block *sb) return 0; } +bool ext4_is_inode_fc_ineligible(struct inode *inode) +{ + if (get_running_txn_tid(inode->i_sb) == EXT4_I(inode)->i_fc_tid) + return !ext4_test_inode_state(inode, EXT4_STATE_FC_ELIGIBLE); + return false; +} + +void ext4_fc_mark_ineligible(struct inode *inode, int reason) +{ + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + struct ext4_inode_info *ei = EXT4_I(inode); + + if (!ext4_should_fast_commit(inode->i_sb) || + (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)) + return; + + if (sbi->s_journal) + ei->i_fc_tid = get_running_txn_tid(inode->i_sb); + ext4_clear_inode_state(inode, EXT4_STATE_FC_ELIGIBLE); + + ext4_fc_enqueue_inode(inode); +} + +void ext4_fc_disable(struct super_block *sb, int reason) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + + sbi->s_mount_state |= EXT4_FC_INELIGIBLE; +} + /* * Generic fast commit tracking function. 
If this is the first * time this we are called after a full commit, we initialize @@ -435,10 +466,15 @@ static int __ext4_fc_track_template( write_lock(&ei->i_fc_lock); if (running_txn_tid == ei->i_fc_tid) { + if (!ext4_test_inode_state(inode, EXT4_STATE_FC_ELIGIBLE)) { + write_unlock(&ei->i_fc_lock); + return -EINVAL; + } update = true; } else { ext4_reset_inode_fc_info(inode); ei->i_fc_tid = running_txn_tid; + ext4_set_inode_state(inode, EXT4_STATE_FC_ELIGIBLE); } ret = __fc_track_fn(inode, args, update); write_unlock(&ei->i_fc_lock); @@ -535,6 +571,27 @@ void ext4_fc_track_create(struct inode *inode, struct dentry *dentry) trace_ext4_fc_track_create(inode, dentry, ret); } +static int __ext4_fc_add_inode(struct inode *inode, void *arg, bool update) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + + if (update) + return -EEXIST; + + ei->i_fc_lblk_start = (i_size_read(inode) - 1) >> inode->i_blkbits; + ei->i_fc_lblk_end = (i_size_read(inode) - 1) >> inode->i_blkbits; + + return 0; +} + +void ext4_fc_track_inode(struct inode *inode) +{ + int ret; + + ret = __ext4_fc_track_template(inode, __ext4_fc_add_inode, NULL); + trace_ext4_fc_track_inode(inode, ret); +} + struct __ext4_fc_track_range_args { ext4_lblk_t start, end; }; diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 8fbd09dbfeca..b1239d6be713 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -538,4 +538,8 @@ void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry); void ext4_fc_track_link(struct inode *inode, struct dentry *dentry); void ext4_fc_track_create(struct inode *inode, struct dentry *dentry); int __init ext4_init_fc_dentry_cache(void); +void ext4_fc_track_inode(struct inode *inode); +void ext4_fc_mark_ineligible(struct inode *inode, int reason); +void ext4_fc_disable(struct super_block *sb, int reason); + #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a93dada07623..695bc43d5916 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ 
-4422,6 +4422,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) INIT_LIST_HEAD(&sbi->s_fc_q); INIT_LIST_HEAD(&sbi->s_fc_dentry_q); + sbi->s_mount_state &= ~EXT4_FC_INELIGIBLE; spin_lock_init(&sbi->s_fc_lock); sb->s_root = NULL; diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 577c6230b23a..5d278a8082a7 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2751,6 +2751,28 @@ DEFINE_TRACE_DENTRY_EVENT(create); DEFINE_TRACE_DENTRY_EVENT(link); DEFINE_TRACE_DENTRY_EVENT(unlink); +TRACE_EVENT(ext4_fc_track_inode, + TP_PROTO(struct inode *inode, int ret), + + TP_ARGS(inode, ret), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, ino) + __field(int, error) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->error = ret; + ), + + TP_printk("dev %d:%d, inode %d, error %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, __entry->error) + ); + TRACE_EVENT(ext4_fc_track_range, TP_PROTO(struct inode *inode, long start, long end, int ret), From patchwork Wed Apr 8 21:55:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268361 From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 10/20] ext4: break ext4_unlink() and ext4_link() Date: Wed, 8 Apr 2020 14:55:20 -0700 Message-Id: <20200408215530.25649-10-harshads@google.com> In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> From: Harshad Shirwadkar Break ext4_link() and ext4_unlink() each into 2 parts in order to make them usable in recovery path as well.
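A minimal userspace sketch of this split, under stated assumptions: a toy directory (a fixed array of name/inode pairs) replaces real directory blocks, and all sketch_* names and SKETCH_EFSCORRUPTED (a stand-in for the kernel's EFSCORRUPTED) are hypothetical. The VFS-facing wrapper keeps the pre-checks (here only a shutdown flag), while the core helper takes a name and an inode instead of a dentry and, like the patched __ext4_unlink(), tolerates a name that no longer points at the expected inode while replaying: it skips removing the entry but still drops the link count.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's EFSCORRUPTED (EUCLEAN on Linux). */
#define SKETCH_EFSCORRUPTED 117
#define SKETCH_MAX_ENTRIES 8

struct sketch_inode {
	unsigned long ino;
	unsigned int nlink;
};

/* Toy directory: parallel arrays of names and inode numbers. */
struct sketch_dir {
	const char *names[SKETCH_MAX_ENTRIES];
	unsigned long inos[SKETCH_MAX_ENTRIES];
	int count;
};

/* Linear lookup of a name; returns its slot or -1 if absent. */
static int sketch_find(const struct sketch_dir *d, const char *name)
{
	for (int i = 0; i < d->count; i++)
		if (strcmp(d->names[i], name) == 0)
			return i;
	return -1;
}

/*
 * Core helper: takes (dir, name, inode) rather than a dentry, so a
 * replay path can call it. When replaying, a name that now maps to a
 * different inode is left in place, but the link count is still dropped.
 */
static int sketch_core_unlink(struct sketch_dir *d, const char *name,
			      struct sketch_inode *inode, int replaying)
{
	int i = sketch_find(d, name);

	if (i < 0)
		return -ENOENT;
	if (d->inos[i] != inode->ino) {
		if (!replaying)
			return -SKETCH_EFSCORRUPTED;
	} else {
		/* Remove slot i by moving the last entry into it. */
		d->names[i] = d->names[d->count - 1];
		d->inos[i] = d->inos[d->count - 1];
		d->count--;
	}
	if (inode->nlink > 0)
		inode->nlink--;
	return 0;
}

/* VFS-facing wrapper: pre-checks live here, then it calls the core helper. */
static int sketch_vfs_unlink(struct sketch_dir *d, const char *name,
			     struct sketch_inode *inode, int fs_shutdown)
{
	if (fs_shutdown)
		return -EIO;	/* mirrors the forced-shutdown check */
	return sketch_core_unlink(d, name, inode, 0);
}
```

Keeping checks such as the forced-shutdown test and quota initialization in the wrapper means a recovery path can call the core helper directly with only a name and an inode, without constructing a dentry.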
Signed-off-by: Harshad Shirwadkar
---
 fs/ext4/ext4.h  |   4 ++
 fs/ext4/namei.c | 139 +++++++++++++++++++++++++++++-------------------
 2 files changed, 88 insertions(+), 55 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e9c82f555b6d..c39966facf86 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3365,6 +3365,10 @@ extern int ext4_handle_dirty_dirblock(handle_t *handle, struct inode *inode,
 extern int ext4_ci_compare(const struct inode *parent,
 			   const struct qstr *fname,
 			   const struct qstr *entry, bool quick);
+extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
+			 struct inode *inode);
+extern int __ext4_link(struct inode *dir, struct inode *inode,
+		       struct dentry *dentry);
 
 #define S_SHIFT 12
 static const unsigned char ext4_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a8aca4772aaa..77fc136fe718 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3171,39 +3171,36 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
 	return retval;
 }
 
-static int ext4_unlink(struct inode *dir, struct dentry *dentry)
+int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
+		  struct inode *inode)
 {
-	int retval;
-	struct inode *inode;
 	struct buffer_head *bh;
 	struct ext4_dir_entry_2 *de;
 	handle_t *handle = NULL;
+	int retval = -ENOENT;
+	int skip_remove_dentry = 0;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
-		return -EIO;
-
-	trace_ext4_unlink_enter(dir, dentry);
-	/* Initialize quotas before so that eventual writes go
-	 * in separate transaction */
-	retval = dquot_initialize(dir);
-	if (retval)
-		return retval;
-	retval = dquot_initialize(d_inode(dentry));
-	if (retval)
-		return retval;
-
-	retval = -ENOENT;
-	bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
+	bh = ext4_find_entry(dir, d_name, &de, NULL);
 	if (IS_ERR(bh))
 		return PTR_ERR(bh);
-	if (!bh)
-		goto end_unlink;
 
-	inode = d_inode(dentry);
+	if (!bh) {
+		retval = -ENOENT;
+		goto end_unlink;
+	}
 
 	retval = -EFSCORRUPTED;
-	if (le32_to_cpu(de->inode) != inode->i_ino)
-		goto end_unlink;
+	if (le32_to_cpu(de->inode) != inode->i_ino) {
+		/*
+		 * It's okay if the dentry we found doesn't match
+		 * the inode, because it might have been renamed
+		 * to a different inode number.
+		 */
+		if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
+			skip_remove_dentry = 1;
+		else
+			goto end_unlink;
+	}
 
 	handle = ext4_journal_start(dir, EXT4_HT_DIR,
 				    EXT4_DATA_TRANS_BLOCKS(dir->i_sb));
@@ -3216,15 +3213,20 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
 	if (IS_DIRSYNC(dir))
 		ext4_handle_sync(handle);
 
-	retval = ext4_delete_entry(handle, dir, de, bh);
-	if (retval)
-		goto end_unlink;
-	dir->i_ctime = dir->i_mtime = current_time(dir);
-	ext4_update_dx_flag(dir);
-	ext4_mark_inode_dirty(handle, dir);
+	if (!skip_remove_dentry) {
+		retval = ext4_delete_entry(handle, dir, de, bh);
+		if (retval)
+			goto end_unlink;
+		dir->i_ctime = dir->i_mtime = current_time(dir);
+		ext4_update_dx_flag(dir);
+		ext4_mark_inode_dirty(handle, dir);
+	} else {
+		retval = 0;
+	}
+
 	if (inode->i_nlink == 0)
 		ext4_warning_inode(inode, "Deleting file '%.*s' with no links",
-				   dentry->d_name.len, dentry->d_name.name);
+				   d_name->len, d_name->name);
 	else
 		drop_nlink(inode);
 	if (!inode->i_nlink)
@@ -3232,6 +3234,33 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
 	inode->i_ctime = current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
 
+end_unlink:
+	brelse(bh);
+	if (handle)
+		ext4_journal_stop(handle);
+	return retval;
+}
+
+static int ext4_unlink(struct inode *dir, struct dentry *dentry)
+{
+	int retval;
+
+	if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
+		return -EIO;
+
+	trace_ext4_unlink_enter(dir, dentry);
+	/*
+	 * Initialize quotas before so that eventual writes go
+	 * in separate transaction
+	 */
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;
+	retval = dquot_initialize(d_inode(dentry));
+	if (retval)
+		return retval;
+
+	retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry));
 #ifdef CONFIG_UNICODE
 	/* VFS negative dentries are incompatible with Encoding and
 	 * Case-insensitiveness. Eventually we'll want avoid
@@ -3242,11 +3271,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
 	if (IS_CASEFOLDED(dir))
 		d_invalidate(dentry);
 #endif
-
-end_unlink:
-	brelse(bh);
-	if (handle)
-		ext4_journal_stop(handle);
 	trace_ext4_unlink_exit(dentry, retval);
 	return retval;
 }
@@ -3380,29 +3404,10 @@ static int ext4_symlink(struct inode *dir,
 	return err;
 }
 
-static int ext4_link(struct dentry *old_dentry,
-		     struct inode *dir, struct dentry *dentry)
+int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
 {
 	handle_t *handle;
-	struct inode *inode = d_inode(old_dentry);
 	int err, retries = 0;
-
-	if (inode->i_nlink >= EXT4_LINK_MAX)
-		return -EMLINK;
-
-	err = fscrypt_prepare_link(old_dentry, dir, dentry);
-	if (err)
-		return err;
-
-	if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
-	    (!projid_eq(EXT4_I(dir)->i_projid,
-			EXT4_I(old_dentry->d_inode)->i_projid)))
-		return -EXDEV;
-
-	err = dquot_initialize(dir);
-	if (err)
-		return err;
-
 retry:
 	handle = ext4_journal_start(dir, EXT4_HT_DIR,
 		(EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
@@ -3436,6 +3441,30 @@ static int ext4_link(struct dentry *old_dentry,
 	return err;
 }
 
+static int ext4_link(struct dentry *old_dentry,
+		     struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode = d_inode(old_dentry);
+	int err;
+
+	if (inode->i_nlink >= EXT4_LINK_MAX)
+		return -EMLINK;
+
+	err = fscrypt_prepare_link(old_dentry, dir, dentry);
+	if (err)
+		return err;
+
+	if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
+	    (!projid_eq(EXT4_I(dir)->i_projid,
+			EXT4_I(old_dentry->d_inode)->i_projid)))
+		return -EXDEV;
+
+	err = dquot_initialize(dir);
+	if (err)
+		return err;
+	return __ext4_link(dir, inode, dentry);
+}
+
 /*
  * Try to find buffer head where contains the parent block.
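
This refactor moves the lookup-and-remove work of ext4_unlink() into __ext4_unlink(). The new helper takes a name and an inode instead of a dentry, presumably so that fast commit replay code can call it. The shutdown check, tracing and quota setup stay in the VFS-facing wrapper, and ext4_link()/__ext4_link() are split the same way. Inside the helper, a directory entry that points at a different inode is normally reported as corruption. While EXT4_FC_REPLAY is set, the entry is instead left in place and only the inode side of the unlink is applied.

The userspace C sketch below models that control flow with a toy in-memory directory. Every type, function and the TOY_EFSCORRUPTED constant are hypothetical stand-ins, not ext4 code. Journaling, locking, tracing and quotas are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ext4's EFSCORRUPTED error code. */
#define TOY_EFSCORRUPTED 1000
#define TOY_MAX_ENTS 8

/* Toy directory entry and inode: illustrative only, not ext4 structures. */
struct toy_dirent {
	const char *name;
	unsigned long ino;
	bool used;
};

struct toy_inode {
	unsigned long ino;
	unsigned int nlink;
};

struct toy_dir {
	struct toy_dirent ents[TOY_MAX_ENTS];
	bool replaying;		/* plays the role of EXT4_FC_REPLAY */
};

static struct toy_dirent *toy_find_entry(struct toy_dir *dir, const char *name)
{
	for (size_t i = 0; i < TOY_MAX_ENTS; i++)
		if (dir->ents[i].used && strcmp(dir->ents[i].name, name) == 0)
			return &dir->ents[i];
	return NULL;
}

/* Core helper (hypothetical): works from a name plus an inode, no dentry. */
int toy_core_unlink(struct toy_dir *dir, const char *name,
		    struct toy_inode *inode)
{
	struct toy_dirent *de;
	bool skip_remove = false;

	assert(dir && name && inode);
	de = toy_find_entry(dir, name);
	if (!de)
		return -ENOENT;
	if (de->ino != inode->ino) {
		/* Normally corruption; during replay the name may already
		 * refer to another inode, so leave the entry alone. */
		if (!dir->replaying)
			return -TOY_EFSCORRUPTED;
		skip_remove = true;
	}
	if (!skip_remove)
		de->used = false;	/* remove the directory entry */
	if (inode->nlink > 0)
		inode->nlink--;		/* inode side of the unlink */
	return 0;
}

/* VFS-facing wrapper (hypothetical): entry checks stay here. */
int toy_vfs_unlink(struct toy_dir *dir, bool fs_shutdown, const char *name,
		   struct toy_inode *inode)
{
	if (fs_shutdown)
		return -EIO;
	/* tracing and quota initialisation would happen here */
	return toy_core_unlink(dir, name, inode);
}
```

In this toy model, replaying an unlink whose name already points at another inode returns 0 and leaves the entry in place, which mirrors the skip_remove_dentry path above.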

From patchwork Wed Apr 8 21:55:21 2020
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, Harshad Shirwadkar
Subject: [PATCH v6 11/20] ext4: add fast commit track points
Date: Wed, 8 Apr 2020 14:55:21 -0700
Message-Id: <20200408215530.25649-11-harshads@google.com>

Previous patches in the series have added following tracking routines:
- ext4_fc_track_inode() -> tracks just the inode
- ext4_fc_track_create() -> tracks creation of an inode and remembers
  its dirent
- ext4_fc_track_unlink() -> tracks inode unlink
- ext4_fc_track_link() -> tracks inode link
- ext4_fc_mark_ineligible() -> marks inode as ineligible for fast commits
- ext4_fc_disable() -> marks entire file system as fast commit ineligible

Add these different track points at various points in the file system.
This patch also adds high level stats to remember reasons why inodes
were marked ineligible.

Signed-off-by: Harshad Shirwadkar
---
 fs/ext4/acl.c       |  1 +
 fs/ext4/balloc.c    |  3 +++
 fs/ext4/ext4.h      | 26 ++++++++++++++++++++++++++
 fs/ext4/ext4_jbd2.c |  5 +++++
 fs/ext4/extents.c   |  5 +++++
 fs/ext4/inline.c    |  3 +++
 fs/ext4/inode.c     |  6 ++++++
 fs/ext4/ioctl.c     |  5 +++++
 fs/ext4/namei.c     | 41 +++++++++++++++++++++++++++++++++++++++--
 fs/ext4/super.c     |  9 +++++++++
 fs/ext4/xattr.c     |  6 ++++++
 11 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 8c7bbf3e566d..28e9e04a8e96 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -257,6 +257,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 		inode->i_mode = mode;
 		inode->i_ctime = current_time(inode);
 		ext4_mark_inode_dirty(handle, inode);
+		ext4_fc_track_inode(inode);
 	}
 out_stop:
 	ext4_journal_stop(handle);
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 0e0a4d6209c7..25960bb4fe69 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -664,6 +664,9 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
 	ar.len = count ? *count : 1;
 	ar.flags = flags;
 
+	ext4_fc_mark_ineligible(inode,
+		EXT4_FC_REASON_META_ALLOC);
+
 	ret = ext4_mb_new_blocks(handle, &ar, errp);
 	if (count)
 		*count = ar.len;
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c39966facf86..922939320d02 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1439,6 +1439,31 @@ struct ext4_super_block {
 #define ext4_has_strict_mode(sbi) \
 	(sbi->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL)
 
+/*
+ * Fast commit ineligible reasons.
+ */
+enum {
+	EXT4_FC_REASON_META_ALLOC,
+	EXT4_FC_REASON_QUOTA,
+	EXT4_FC_REASON_XATTR,
+	EXT4_FC_REASON_CROSS_RENAME,
+	EXT4_FC_REASON_FALLOC_RANGE_OP,
+	EXT4_FC_REASON_JOURNAL_FLAG_CHANGE,
+	EXT4_FC_REASON_DELETE,
+	EXT4_FC_REASON_MEM,
+	EXT4_FC_REASON_SWAP_BOOT,
+	EXT4_FC_REASON_RESIZE,
+	EXT4_FC_REASON_RENAME_DIR,
+	EXT4_FC_REASON_MAX
+};
+
+struct ext4_fc_stats {
+	int fc_ineligible_reason_count[EXT4_FC_REASON_MAX];
+	int fc_num_commits;
+	int fc_ineligible_commits;
+	int fc_numblks;
+};
+
 /*
  * fourth extended-fs super-block data in memory
  */
@@ -1626,6 +1651,7 @@ struct ext4_sb_info {
 	 */
 	struct list_head s_fc_dentry_q;
 	spinlock_t s_fc_lock;
+	struct ext4_fc_stats s_fc_stats;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 9f12ae2fb3ab..4bef01f9814a 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -427,6 +427,8 @@ void ext4_fc_mark_ineligible(struct inode *inode, int reason)
 	    (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
 		return;
 
+	WARN_ON(reason >= EXT4_FC_REASON_MAX);
+	sbi->s_fc_stats.fc_ineligible_reason_count[reason]++;
 	if (sbi->s_journal)
 		ei->i_fc_tid = get_running_txn_tid(inode->i_sb);
 	ext4_clear_inode_state(inode, EXT4_STATE_FC_ELIGIBLE);
@@ -439,6 +441,8 @@ void ext4_fc_disable(struct super_block *sb, int reason)
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 
 	sbi->s_mount_state |= EXT4_FC_INELIGIBLE;
+	WARN_ON(reason >= EXT4_FC_REASON_MAX);
+	sbi->s_fc_stats.fc_ineligible_reason_count[reason]++;
 }
 
 /*
@@ -500,6 +504,7 @@ static int __ext4_dentry_update(struct inode *inode, void *arg, bool update)
 	write_unlock(&ei->i_fc_lock);
 	node = kmem_cache_alloc(ext4_fc_dentry_cachep, GFP_NOFS);
 	if (!node) {
+		ext4_fc_disable(inode->i_sb, EXT4_FC_REASON_MEM);
 		write_lock(&ei->i_fc_lock);
 		return -ENOMEM;
 	}
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 031752cfb6f7..b19f0e596503 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4578,6 +4578,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	if (new_size)
 		ext4_update_inode_size(inode, new_size);
 	ext4_mark_inode_dirty(handle, inode);
+	ext4_fc_track_inode(inode);
 
 	/* Zero out partial block at the edges of the range */
 	ret = ext4_zero_partial_blocks(handle, inode, offset, len);
@@ -5273,6 +5274,8 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
 	inode->i_mtime = inode->i_ctime = current_time(inode);
+	ext4_fc_mark_ineligible(inode,
+		EXT4_FC_REASON_FALLOC_RANGE_OP);
 	ext4_mark_inode_dirty(handle, inode);
 	ext4_update_inode_fsync_trans(handle, inode, 1);
 
@@ -5383,6 +5386,8 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
 	inode->i_size += len;
 	EXT4_I(inode)->i_disksize += len;
 	inode->i_mtime = inode->i_ctime = current_time(inode);
+	ext4_fc_mark_ineligible(inode,
+		EXT4_FC_REASON_FALLOC_RANGE_OP);
 	ret = ext4_mark_inode_dirty(handle, inode);
 	if (ret)
 		goto out_stop;
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index f35e289e17aa..4d48a6d985c8 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -759,6 +759,7 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
 
 	ext4_write_unlock_xattr(inode, &no_expand);
 	brelse(iloc.bh);
+	ext4_fc_track_inode(inode);
 	mark_inode_dirty(inode);
 out:
 	return copied;
@@ -974,6 +975,7 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
 	 * ordering of page lock and transaction start for journaling
 	 * filesystems.
 	 */
+	ext4_fc_track_inode(inode);
 	mark_inode_dirty(inode);
 
 	return copied;
@@ -1946,6 +1948,7 @@ int ext4_inline_data_truncate(struct inode *inode, int *has_inline)
 
 	if (err == 0) {
 		inode->i_mtime = inode->i_ctime = current_time(inode);
+		ext4_fc_track_inode(inode);
 		err = ext4_mark_inode_dirty(handle, inode);
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3bf0ad4d7d32..e5b45f32dd30 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2849,6 +2849,7 @@ static int ext4_writepages(struct address_space *mapping,
 out_writepages:
 	trace_ext4_writepages_result(inode, wbc, ret,
 				     nr_to_write - wbc->nr_to_write);
+	ext4_fc_track_inode(inode);
 	percpu_up_read(&sbi->s_writepages_rwsem);
 	return ret;
 }
@@ -5295,6 +5296,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 		if (attr->ia_valid & ATTR_GID)
 			inode->i_gid = attr->ia_gid;
 		error = ext4_mark_inode_dirty(handle, inode);
+		ext4_fc_track_inode(inode);
 		ext4_journal_stop(handle);
 	}
 
@@ -5413,6 +5415,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 
 	if (!error) {
 		setattr_copy(inode, attr);
+		ext4_fc_track_inode(inode);
 		mark_inode_dirty(inode);
 	}
 
@@ -5843,6 +5846,7 @@ void ext4_dirty_inode(struct inode *inode, int flags)
 		goto out;
 
 	ext4_mark_inode_dirty(handle, inode);
+	ext4_fc_track_inode(inode);
 
 	ext4_journal_stop(handle);
 out:
@@ -5928,6 +5932,8 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
 
+	ext4_fc_mark_ineligible(inode,
+		EXT4_FC_REASON_JOURNAL_FLAG_CHANGE);
 	err = ext4_mark_inode_dirty(handle, inode);
 	ext4_handle_sync(handle);
 	ext4_journal_stop(handle);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index bfc1281fc4cb..f66bcf185f5b 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -204,6 +204,8 @@ static long swap_inode_boot_loader(struct super_block *sb,
 
 	ext4_discard_preallocations(inode);
 
+	if (EXT4_SB(sb)->s_journal)
+		ext4_fc_disable(sb, EXT4_FC_REASON_SWAP_BOOT);
 	err = ext4_mark_inode_dirty(handle, inode);
 	if (err < 0) {
 		/* No need to update quota information. */
@@ -385,6 +387,8 @@ static int ext4_ioctl_setflags(struct inode *inode,
 	inode->i_ctime = current_time(inode);
 
 	err = ext4_mark_iloc_dirty(handle, inode, &iloc);
+	ext4_fc_track_inode(inode);
+
 flags_err:
 	ext4_journal_stop(handle);
 	if (err)
@@ -1068,6 +1072,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
 		err = ext4_resize_fs(sb, n_blocks_count);
 		if (EXT4_SB(sb)->s_journal) {
+			ext4_fc_disable(sb, EXT4_FC_REASON_RESIZE);
 			jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
 			err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
 			jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 77fc136fe718..2d9c3767d8d6 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2601,7 +2601,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 		       bool excl)
 {
 	handle_t *handle;
-	struct inode *inode;
+	struct inode *inode, *inode_save;
 	int err, credits, retries = 0;
 
 	err = dquot_initialize(dir);
@@ -2619,7 +2619,11 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 		inode->i_op = &ext4_file_inode_operations;
 		inode->i_fop = &ext4_file_operations;
 		ext4_set_aops(inode);
+		inode_save = inode;
+		ihold(inode_save);
 		err = ext4_add_nondir(handle, dentry, &inode);
+		ext4_fc_track_create(inode_save, dentry);
+		iput(inode_save);
 	}
 	if (handle)
 		ext4_journal_stop(handle);
@@ -2634,7 +2638,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
 		      umode_t mode, dev_t rdev)
 {
 	handle_t *handle;
-	struct inode *inode;
+	struct inode *inode, *inode_save;
 	int err, credits, retries = 0;
 
 	err = dquot_initialize(dir);
@@ -2651,12 +2655,18 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
 	if (!IS_ERR(inode)) {
 		init_special_inode(inode, inode->i_mode, rdev);
 		inode->i_op = &ext4_special_inode_operations;
+		inode_save = inode;
+		ihold(inode_save);
 		err = ext4_add_nondir(handle, dentry, &inode);
+		if (!err)
+			ext4_fc_track_create(inode_save, dentry);
+		iput(inode_save);
 	}
 	if (handle)
 		ext4_journal_stop(handle);
 	if (!IS_ERR_OR_NULL(inode))
 		iput(inode);
+
 	if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
 		goto retry;
 	return err;
@@ -2688,6 +2698,8 @@ static int ext4_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
 		err = ext4_orphan_add(handle, inode);
 		if (err)
 			goto err_unlock_inode;
+
+		ext4_fc_track_inode(inode);
 		mark_inode_dirty(inode);
 		unlock_new_inode(inode);
 	}
@@ -2813,6 +2825,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 		iput(inode);
 		goto out_retry;
 	}
+	ext4_fc_track_create(inode, dentry);
 	ext4_inc_count(handle, dir);
 	ext4_update_dx_flag(dir);
 	err = ext4_mark_inode_dirty(handle, dir);
@@ -3261,6 +3274,8 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
 		return retval;
 
 	retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry));
+	if (!retval)
+		ext4_fc_track_unlink(d_inode(dentry), dentry);
 #ifdef CONFIG_UNICODE
 	/* VFS negative dentries are incompatible with Encoding and
 	 * Case-insensitiveness. Eventually we'll want avoid
@@ -3424,6 +3439,7 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
 
 	err = ext4_add_entry(handle, dentry, inode);
 	if (!err) {
+		ext4_fc_track_link(inode, dentry);
 		ext4_mark_inode_dirty(handle, inode);
 		/* this can happen only for tmpfile being
 		 * linked the first time
@@ -3872,6 +3888,23 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 			ext4_mark_inode_dirty(handle, new.dir);
 		}
 	}
+
+	if (S_ISDIR(old.inode->i_mode)) {
+		/*
+		 * We disable fast commits here because the
+		 * replay code is not yet capable of changing dot dot
+		 * dirents in directories. Since this is an ineligible
+		 * metadata update, we need to mark the entire fs
+		 * as ineligible.
+		 */
+		ext4_fc_disable(old.inode->i_sb, EXT4_FC_REASON_RENAME_DIR);
+	} else {
+		if (new.inode)
+			ext4_fc_track_unlink(new.inode, new.dentry);
+		ext4_fc_track_link(old.inode, new.dentry);
+		ext4_fc_track_unlink(old.inode, old.dentry);
+	}
+
 	ext4_mark_inode_dirty(handle, old.dir);
 	if (new.inode) {
 		ext4_mark_inode_dirty(handle, new.inode);
@@ -4008,7 +4041,11 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	ctime = current_time(old.inode);
 	old.inode->i_ctime = ctime;
 	new.inode->i_ctime = ctime;
+	ext4_fc_mark_ineligible(old.inode,
+		EXT4_FC_REASON_CROSS_RENAME);
 	ext4_mark_inode_dirty(handle, old.inode);
+	ext4_fc_mark_ineligible(new.inode,
+		EXT4_FC_REASON_CROSS_RENAME);
 	ext4_mark_inode_dirty(handle, new.inode);
 
 	if (old.dir_bh) {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 695bc43d5916..ea712c9cf77b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1381,6 +1381,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len,
 		 * S_DAX may be disabled
 		 */
 		ext4_set_inode_flags(inode);
+		ext4_fc_track_inode(inode);
 		res = ext4_mark_inode_dirty(handle, inode);
 		if (res)
 			EXT4_ERROR_INODE(inode, "Failed to mark inode dirty");
@@ -4424,6 +4425,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	INIT_LIST_HEAD(&sbi->s_fc_dentry_q);
 	sbi->s_mount_state &= ~EXT4_FC_INELIGIBLE;
 	spin_lock_init(&sbi->s_fc_lock);
+	memset(&sbi->s_fc_stats, 0, sizeof(sbi->s_fc_stats));
+
 	sb->s_root = NULL;
 
 	needs_recovery = (es->s_last_orphan != 0 ||
@@ -5914,6 +5917,8 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
 	EXT4_I(inode)->i_flags |= EXT4_NOATIME_FL | EXT4_IMMUTABLE_FL;
 	inode_set_flags(inode, S_NOATIME | S_IMMUTABLE,
 			S_NOATIME | S_IMMUTABLE);
+	ext4_fc_mark_ineligible(inode,
+		EXT4_FC_REASON_QUOTA);
 	ext4_mark_inode_dirty(handle, inode);
 	ext4_journal_stop(handle);
  unlock_inode:
@@ -6021,6 +6026,8 @@ static int ext4_quota_off(struct super_block *sb, int type)
 	EXT4_I(inode)->i_flags &= ~(EXT4_NOATIME_FL | EXT4_IMMUTABLE_FL);
 	inode_set_flags(inode, 0, S_NOATIME | S_IMMUTABLE);
 	inode->i_mtime = inode->i_ctime = current_time(inode);
+	ext4_fc_mark_ineligible(inode,
+		EXT4_FC_REASON_QUOTA);
 	ext4_mark_inode_dirty(handle, inode);
 	ext4_journal_stop(handle);
 out_unlock:
@@ -6127,6 +6134,8 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
 	if (inode->i_size < off + len) {
 		i_size_write(inode, off + len);
 		EXT4_I(inode)->i_disksize = inode->i_size;
+		ext4_fc_mark_ineligible(inode,
+			EXT4_FC_REASON_QUOTA);
 		ext4_mark_inode_dirty(handle, inode);
 	}
 	return len;
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 21df43a25328..5859d186796a 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2413,6 +2413,8 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
 	}
+	ext4_fc_mark_ineligible(inode,
+		EXT4_FC_REASON_XATTR);
 
 cleanup:
 	brelse(is.iloc.bh);
@@ -2490,6 +2492,8 @@ ext4_xattr_set(struct inode *inode, int name_index, const char *name,
 		if (error == 0)
 			error = error2;
 	}
+	ext4_fc_mark_ineligible(inode,
+		EXT4_FC_REASON_XATTR);
 
 	return error;
 }
@@ -2922,6 +2926,8 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 				 error);
 			goto cleanup;
 		}
+		ext4_fc_mark_ineligible(inode,
+			EXT4_FC_REASON_XATTR);
 	}
 	error = 0;
 cleanup:

From patchwork Wed Apr 8 21:55:22 2020
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, Harshad Shirwadkar
Subject: [PATCH v6 12/20] ext4: add fast commit on-disk format structs
Date: Wed, 8 Apr 2020 14:55:22 -0700
Message-Id: <20200408215530.25649-12-harshads@google.com>

Add structs representing on-disk format of the commit header and tlvs in
the commit header.
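
The diff below lays out a fast commit block as an ext4_fc_commit_hdr followed by a series of records. Each record has its tag and length in a struct ext4_fc_tl (two little-endian 16-bit fields), followed by fc_len bytes of value, such as an ext4_fc_dentry_info or an ext4_fc_lrange.

A minimal userspace sketch of walking such records follows. It assumes the records are packed back to back with no padding between them; the patch does not state this. Any tag numbers used with it are hypothetical, since the EXT4_FC_TAG_* values are not defined in this excerpt. The sketch also mirrors the header with fixed-width host types. Because the struct is not __packed, common ABIs such as x86-64 insert one padding byte after fc_features, so fc_num_tlvs sits at offset 6 and the header occupies 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one tag/length pair, i.e. two 16-bit fields as in ext4_fc_tl. */
#define TOY_TL_SIZE 4

/* Host-side mirror of struct ext4_fc_commit_hdr with fixed-width types.
 * Like the patch, it is not __packed, so the compiler may pad it. */
struct toy_fc_commit_hdr {
	uint32_t fc_magic;
	uint8_t fc_features;
	uint16_t fc_num_tlvs;
	uint32_t fc_ino;
	uint32_t fc_csum;
};

/* Decode a little-endian 16-bit field independent of host byte order. */
static uint16_t toy_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Called once per record; return nonzero to stop the walk early. */
typedef int (*toy_tlv_cb)(uint16_t tag, const uint8_t *val, uint16_t len,
			  void *arg);

/*
 * Walk records laid out as [le16 tag][le16 len][len bytes of value].
 * Returns the number of records visited, or -1 if a record would run
 * past the end of the buffer.
 */
int toy_walk_tlvs(const uint8_t *buf, size_t size, toy_tlv_cb cb, void *arg)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL || size == 0);
	while (off < size) {
		uint16_t tag, len;

		if (size - off < TOY_TL_SIZE)
			return -1;	/* truncated tag/length pair */
		tag = toy_get_le16(buf + off);
		len = toy_get_le16(buf + off + 2);
		off += TOY_TL_SIZE;
		if (size - off < len)
			return -1;	/* value overruns the buffer */
		count++;
		if (cb && cb(tag, buf + off, len, arg))
			break;
		off += len;
	}
	return count;
}
```

A natural extension is to compare the returned count with the header's fc_num_tlvs and reject the block on a mismatch. The sketch leaves that check out.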
Signed-off-by: Harshad Shirwadkar
---
 fs/ext4/ext4_jbd2.c |  1 +
 fs/ext4/ext4_jbd2.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 4bef01f9814a..fca478a3b7e9 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -4,6 +4,7 @@
  */
 
 #include "ext4_jbd2.h"
+#include "ext4_extents.h"
 
 #include
 
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index b1239d6be713..b8ac3f29949d 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -529,7 +529,52 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
 	return 1;
 }
 
+/* Ext4 fast commit related info */
+
+/* Magic of fast commit header */
+#define EXT4_FC_MAGIC			0xE2540090
+
 #define EXT4_NUM_FC_BLKS		128
+
+struct ext4_fc_commit_hdr {
+	/* Fast commit magic, should be EXT4_FC_MAGIC */
+	__le32 fc_magic;
+	/* Features used by this fast commit block */
+	__u8 fc_features;
+	/* Number of TLVs in this fast commit block */
+	__le16 fc_num_tlvs;
+	/* Inode number */
+	__le32 fc_ino;
+	/* Csum(hdr+contents) */
+	__le32 fc_csum;
+};
+
+/* Fast commit on disk tag length structure */
+struct ext4_fc_tl {
+	__le16 fc_tag;
+	__le16 fc_len;
+};
+
+/* On disk fast commit tlv value structure for dirent tags:
+ *  - EXT4_FC_TAG_CREATE_DENTRY
+ *  - EXT4_FC_TAG_ADD_DENTRY
+ *  - EXT4_FC_TAG_DEL_DENTRY
+ */
+struct ext4_fc_dentry_info {
+	__le32 fc_parent_ino;
+	__le32 fc_ino;
+	u8 fc_dname[0];
+};
+
+/*
+ * On disk fast commit tlv value structure for tag
+ * EXT4_FC_TAG_HOLE.
+ */ +struct ext4_fc_lrange { + __le32 fc_lblk; + __le32 fc_len; +}; + void ext4_init_fast_commit(struct super_block *sb, journal_t *journal); void ext4_init_inode_fc_info(struct inode *inode); void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start, From patchwork Wed Apr 8 21:55:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268352 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=Cco/J1RA; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 48yJ4Z2m4Jz9sSM for ; Thu, 9 Apr 2020 07:55:58 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726761AbgDHVz5 (ORCPT ); Wed, 8 Apr 2020 17:55:57 -0400 Received: from mail-pf1-f193.google.com ([209.85.210.193]:33517 "EHLO mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726702AbgDHVzz (ORCPT ); Wed, 8 Apr 2020 17:55:55 -0400 Received: by mail-pf1-f193.google.com with SMTP id c138so3733871pfc.0 for ; Wed, 08 Apr 2020 14:55:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xpBJs4b8LdhljW/mbELyC+7Lt7EHv44g0hNPCugQeCA=; b=Cco/J1RAjpj/6AjOuwb55YBbAacUCjs9l0lz3mgG3dj4Nw97onQrhKlzu9V32YmaSO 
AXXBlxa6lrMKy//aiJvr+DmzbQK+RCwkhJa4TwtSayVEXAVDhjtoboGDvcU06FwHRmeg EREiEyvcrNjjqzQvYuKPy+fCWAKNwSWWeSGkNPB3GIMda6tHUAWmNXkD+vV/TDS9dhg9 JqlV9ZTLHgdqKKwgR74Vqpd5U6pPOPL3WlByOsyzEfggeOf71OKbUjTDruzLhaInd9BL XR+oIGXQzTUySpI86BOKM1ja6PPGtoT7Fck9GhuV9aru7xTvmcYgQL7/q/HLmv8RV207 XN4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xpBJs4b8LdhljW/mbELyC+7Lt7EHv44g0hNPCugQeCA=; b=FB+/4fhulJlTBjd+g20mMM6h4o9a82RS/QVt089efbZ3z9uJ0fP1NenugQM7ccgmi+ jPqEzmJ2W/sqJ980mN3aiBOsHJBakhnHBgGNTSW/S/nRd7TyTrIryvlMzHPznpz9Rbw2 eQGHl4fl0S8Wo00/ZhtLQgtEU4VKPyaXErQOIINETc3gRY+f7Nsv2qLqQQV5TH6V7782 twWOfGR/eGZQQiGwBmAXGatG7RG9mO5do7z5CopVHXQPSypg8nW/3vOafwOLF1R+bJpP CzVN0TOpw/sS3xyzcC0yaC0B5sI9Sy1v7orbLVKUlSJCIPN5CyHYYahm/cVdCA70xmiO EhgQ== X-Gm-Message-State: AGi0PuYLL01LT1X3RE80rfTEC9ZhY9nbhVdkPAO40A8y79031Hr39R5X nWZYZcPqZLNSoH1T1bz5v6w/1pJD X-Google-Smtp-Source: APiQypLTSWSMQg33yOSOnyHy9AuA8n4wYOGR9TivDenZDOi0i+jqBpxj4Psyt1vYvwRtzRM7NDx0Iw== X-Received: by 2002:a63:721a:: with SMTP id n26mr8883861pgc.386.1586382953746; Wed, 08 Apr 2020 14:55:53 -0700 (PDT) Received: from harshads-520.kir.corp.google.com ([2620:15c:17:10:6271:607:aca0:b6f7]) by smtp.googlemail.com with ESMTPSA id z7sm450929pju.37.2020.04.08.14.55.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 Apr 2020 14:55:53 -0700 (PDT) From: Harshad Shirwadkar X-Google-Original-From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 13/20] jbd2: add new APIs for commit path of fast commits Date: Wed, 8 Apr 2020 14:55:23 -0700 Message-Id: <20200408215530.25649-13-harshads@google.com> X-Mailer: git-send-email 2.26.0.110.g2183baf09c-goog In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> MIME-Version: 1.0 Sender: 
linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Harshad Shirwadkar Add the following helpers for the fast commit path: - jbd2_map_fc_buf() - maps one fast commit buffer for the caller - jbd2_wait_on_fc_bufs() - waits on fast commit buffers allocated using jbd2_map_fc_buf() - jbd2_submit_inode_data() - submits data buffers for one inode - jbd2_wait_inode_data() - waits for inode data Signed-off-by: Harshad Shirwadkar --- fs/jbd2/commit.c | 40 +++++++++++++++++++++++ fs/jbd2/journal.c | 76 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/jbd2.h | 6 ++++ 3 files changed, 122 insertions(+) diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 280d11591bcb..2ef2dfb029e4 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -202,6 +202,46 @@ static int journal_submit_inode_data_buffers(struct address_space *mapping, return ret; } +/* Send all the data buffers related to an inode */ +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode) +{ + struct address_space *mapping; + loff_t dirty_start; + loff_t dirty_end; + int ret; + + if (!jinode) + return 0; + + dirty_start = jinode->i_dirty_start; + dirty_end = jinode->i_dirty_end; + + if (!(jinode->i_flags & JI_WRITE_DATA)) + return 0; + + dirty_start = jinode->i_dirty_start; + dirty_end = jinode->i_dirty_end; + + mapping = jinode->i_vfs_inode->i_mapping; + + trace_jbd2_submit_inode_data(jinode->i_vfs_inode); + ret = journal_submit_inode_data_buffers(mapping, dirty_start, + dirty_end); + + return ret; +} +EXPORT_SYMBOL(jbd2_submit_inode_data); + +int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode) +{ + if (!jinode || !(jinode->i_flags & JI_WAIT_DATA)) + return 0; + return filemap_fdatawait_range_keep_errors( + jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start, + jinode->i_dirty_end); +} +EXPORT_SYMBOL(jbd2_wait_inode_data); + /* * Submit all the data buffers of inode associated with the transaction to * disk.
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index d3897d155fb9..e4e0b55dd077 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -864,6 +864,82 @@ int jbd2_journal_next_log_block(journal_t *journal, unsigned long long *retp) return jbd2_journal_bmap(journal, blocknr, retp); } +/* Map one fast commit buffer for use by the file system */ +int jbd2_map_fc_buf(journal_t *journal, struct buffer_head **bh_out) +{ + unsigned long long pblock; + unsigned long blocknr; + int ret = 0; + struct buffer_head *bh; + int fc_off; + journal_header_t *jhdr; + + write_lock(&journal->j_state_lock); + + if (journal->j_fc_off + journal->j_first_fc < journal->j_last_fc) { + fc_off = journal->j_fc_off; + blocknr = journal->j_first_fc + fc_off; + journal->j_fc_off++; + } else { + ret = -EINVAL; + } + write_unlock(&journal->j_state_lock); + + if (ret) + return ret; + + ret = jbd2_journal_bmap(journal, blocknr, &pblock); + if (ret) + return ret; + + bh = __getblk(journal->j_dev, pblock, journal->j_blocksize); + if (!bh) + return -ENOMEM; + + lock_buffer(bh); + jhdr = (journal_header_t *)bh->b_data; + jhdr->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER); + jhdr->h_blocktype = cpu_to_be32(JBD2_FC_BLOCK); + jhdr->h_sequence = cpu_to_be32(journal->j_running_transaction->t_tid); + + set_buffer_uptodate(bh); + unlock_buffer(bh); + journal->j_fc_wbuf[fc_off] = bh; + + *bh_out = bh; + + return 0; +} +EXPORT_SYMBOL(jbd2_map_fc_buf); + +/* + * Wait on fast commit buffers that were allocated by jbd2_map_fc_buf + * for completion. 
+ */ +int jbd2_wait_on_fc_bufs(journal_t *journal, int num_blks) +{ + struct buffer_head *bh; + int i, j_fc_off; + + read_lock(&journal->j_state_lock); + j_fc_off = journal->j_fc_off; + read_unlock(&journal->j_state_lock); + + /* + * Wait in reverse order to minimize chances of us being woken up before + * all IOs have completed + */ + for (i = j_fc_off - 1; i >= j_fc_off - num_blks; i--) { + bh = journal->j_fc_wbuf[i]; + wait_on_buffer(bh); + if (unlikely(!buffer_uptodate(bh))) + return -EIO; + } + + return 0; +} +EXPORT_SYMBOL(jbd2_wait_on_fc_bufs); + /* * Conversion of logical to physical block numbers for the journal * diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 0a4d9d484528..599113bef67f 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -123,6 +123,7 @@ typedef struct journal_s journal_t; /* Journal control structure */ #define JBD2_SUPERBLOCK_V1 3 #define JBD2_SUPERBLOCK_V2 4 #define JBD2_REVOKE_BLOCK 5 +#define JBD2_FC_BLOCK 6 /* * Standard header for all descriptor blocks: @@ -1562,6 +1563,11 @@ int jbd2_start_async_fc_nowait(journal_t *journal, tid_t tid); int jbd2_start_async_fc_wait(journal_t *journal, tid_t tid); void jbd2_stop_async_fc(journal_t *journal, tid_t tid); void jbd2_init_fast_commit(journal_t *journal, int num_fc_blks); +int jbd2_map_fc_buf(journal_t *journal, struct buffer_head **bh_out); +int jbd2_wait_on_fc_bufs(journal_t *journal, int num_blks); +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode); +int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode); + /* * is_journal_abort * From patchwork Wed Apr 8 21:55:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268353 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org 
From: Harshad Shirwadkar X-Google-Original-From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 14/20] ext4: main commit routine for fast commits Date: Wed, 8 Apr 2020 14:55:24 -0700 Message-Id: <20200408215530.25649-14-harshads@google.com> X-Mailer: git-send-email 2.26.0.110.g2183baf09c-goog In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Harshad Shirwadkar Add the main commit routine that performs a fast commit. Also, handle the race between the commit routine and inode deletion, which removes the inode from the fast commit list while a commit may be in progress. Add commit routines for fast commits with hard consistency as well as with soft consistency.
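For readers new to the block layout: each fast commit block written by this patch starts with a journal_header_t, followed by struct ext4_fc_commit_hdr, the raw on-disk inode, and then a sequence of tag-length-value records appended by fc_try_add_tlv(). The userspace C sketch below illustrates only that bounds-checked append step. It assumes the tag/length header is two little-endian 16-bit fields, as in struct ext4_fc_tl; the helper names tlv_append() and put_le16() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag/length header: two 16-bit little-endian fields (assumed to
 * mirror struct ext4_fc_tl from this series). */
#define TL_SIZE 4

/* Store v at p in little-endian byte order, independent of host. */
static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/*
 * Append one tag-length-value record at *dst if it fits before end.
 * On success, advance *dst past the record and return true; if there
 * is not enough room, leave *dst untouched and return false.
 * fc_try_add_tlv() in the patch compares with ">=" and so also
 * rejects a record that would exactly fill the block; this sketch
 * accepts an exact fit.
 */
static bool tlv_append(uint8_t **dst, const uint8_t *end, uint16_t tag,
		       uint16_t len, const void *val)
{
	size_t room;

	/* The cursor must never already be past the end of the block. */
	assert(*dst <= end);
	room = (size_t)(end - *dst);

	/* Compare sizes rather than forming pointers past end. */
	if (room < TL_SIZE || room - TL_SIZE < len)
		return false;

	put_le16(*dst, tag);
	put_le16(*dst + 2, len);
	memcpy(*dst + TL_SIZE, val, len);
	*dst += TL_SIZE + len;
	return true;
}
```

A caller keeps appending records until tlv_append() returns false. In the patch, that case surfaces as -ENOSPC, which makes ext4_fc_async_commit_inode() fall back to a full commit through jbd2_complete_transaction().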
Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4.h | 7 + fs/ext4/ext4_jbd2.c | 714 ++++++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.h | 2 + fs/ext4/fsync.c | 2 +- fs/ext4/inode.c | 6 +- fs/ext4/migrate.c | 1 + fs/ext4/super.c | 12 + include/trace/events/ext4.h | 100 +++++ 8 files changed, 841 insertions(+), 3 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 922939320d02..c4e74bcbbf90 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1763,6 +1763,10 @@ enum { EXT4_STATE_LUSTRE_EA_INODE, /* Lustre-style ea_inode */ EXT4_STATE_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */ EXT4_STATE_FC_ELIGIBLE, /* File is Fast commit eligible */ + EXT4_STATE_FC_DATA_SUBMIT, /* File is going through fast commit */ + EXT4_STATE_FC_MDATA_SUBMIT, /* Fast commit block is + * being submitted + */ }; #define EXT4_INODE_BIT_FNS(name, field, offset) \ @@ -2853,6 +2857,9 @@ extern int ext4_group_extend(struct super_block *sb, extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count); /* super.c */ + +int ext4_fc_async_commit_inode(journal_t *journal, tid_t commit_tid, + struct inode *inode); extern struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block, int op_flags); extern int ext4_seq_options_show(struct seq_file *seq, void *offset); diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index fca478a3b7e9..bbba1b067fdf 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -388,6 +388,8 @@ void ext4_init_inode_fc_info(struct inode *inode) struct ext4_inode_info *ei = EXT4_I(inode); ext4_reset_inode_fc_info(inode); + ext4_clear_inode_state(inode, EXT4_STATE_FC_DATA_SUBMIT); + ext4_clear_inode_state(inode, EXT4_STATE_FC_MDATA_SUBMIT); INIT_LIST_HEAD(&ei->i_fc_list); } @@ -412,6 +414,43 @@ static inline tid_t get_running_txn_tid(struct super_block *sb) return 0; } +void ext4_fc_del(struct inode *inode) +{ + if (!ext4_should_fast_commit(inode->i_sb) || + (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)) + 
return; + + if (list_empty(&EXT4_I(inode)->i_fc_list)) + return; + + ext4_fc_disable(inode->i_sb, EXT4_FC_REASON_DELETE); + +restart: + spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock); + if (ext4_test_inode_state(inode, EXT4_STATE_FC_DATA_SUBMIT)) { + struct ext4_inode_info *ei = EXT4_I(inode); + wait_queue_head_t *wq; +#if (BITS_PER_LONG < 64) + DEFINE_WAIT_BIT(wait, &ei->i_state_flags, + EXT4_STATE_FC_DATA_SUBMIT); + wq = bit_waitqueue(&ei->i_state_flags, + EXT4_STATE_FC_DATA_SUBMIT); +#else + DEFINE_WAIT_BIT(wait, &ei->i_flags, + EXT4_STATE_FC_DATA_SUBMIT); + wq = bit_waitqueue(&ei->i_flags, + EXT4_STATE_FC_DATA_SUBMIT); +#endif + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE); + spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); + schedule(); + finish_wait(wq, &wait.wq_entry); + goto restart; + } + list_del_init(&EXT4_I(inode)->i_fc_list); + spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); +} + bool ext4_is_inode_fc_ineligible(struct inode *inode) { if (get_running_txn_tid(inode->i_sb) == EXT4_I(inode)->i_fc_tid) @@ -470,6 +509,7 @@ static int __ext4_fc_track_template( return -EOPNOTSUPP; write_lock(&ei->i_fc_lock); + ext4_clear_inode_state(inode, EXT4_STATE_FC_MDATA_SUBMIT); if (running_txn_tid == ei->i_fc_tid) { if (!ext4_test_inode_state(inode, EXT4_STATE_FC_ELIGIBLE)) { write_unlock(&ei->i_fc_lock); @@ -642,10 +682,684 @@ void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start, trace_ext4_fc_track_range(inode, start, end, ret); } +static void ext4_end_buffer_io_sync(struct buffer_head *bh, int uptodate) +{ + BUFFER_TRACE(bh, ""); + if (uptodate) { + ext4_debug("%s: Block %lld up-to-date", + __func__, bh->b_blocknr); + set_buffer_uptodate(bh); + } else { + ext4_debug("%s: Block %lld not up-to-date", + __func__, bh->b_blocknr); + clear_buffer_uptodate(bh); + } + + unlock_buffer(bh); +} + +void submit_fc_bh(struct buffer_head *bh) +{ + lock_buffer(bh); + clear_buffer_dirty(bh); + set_buffer_uptodate(bh); + bh->b_end_io = ext4_end_buffer_io_sync; + 
submit_bh(REQ_OP_WRITE, REQ_SYNC, bh); +} + +/* + * Writes fast commit header and inode structure at memory + * pointed to by start. Returns 0 on success, error on failure. + * If successful, *last is updated to point to the end of + * the inode that was copied. + */ +static int fc_write_hdr(struct inode *inode, u8 *start, u8 *end, + u8 **last) +{ + struct ext4_fc_commit_hdr *fc_hdr = (struct ext4_fc_commit_hdr *)start; + struct ext4_inode_info *ei = EXT4_I(inode); + int inode_len = EXT4_GOOD_OLD_INODE_SIZE; + struct ext4_iloc iloc; + u8 *cur = start; + int ret; + + if (ext4_is_inode_fc_ineligible(inode)) + return -ECANCELED; + + ret = ext4_get_inode_loc(inode, &iloc); + if (ret) + return ret; + + fc_hdr->fc_magic = cpu_to_le32(EXT4_FC_MAGIC); + fc_hdr->fc_ino = cpu_to_le32(inode->i_ino); + fc_hdr->fc_features = 0; + fc_hdr->fc_csum = 0; + + cur = (u8 *)(fc_hdr + 1); + if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) + inode_len += ei->i_extra_isize; + if (cur + inode_len >= end) + return -ECANCELED; + + memcpy(cur, ext4_raw_inode(&iloc), inode_len); + cur += inode_len; + *last = cur; + + return 0; +} + +/* + * Adds tag, length and value at memory pointed to by dst. Returns + * true if tlv was added. Returns false if there's not enough space. + * If successful, also updates *dst to point to the end of this tlv. + */ +static bool fc_try_add_tlv(u8 **dst, u8 *end, u16 tag, u16 len, u8 *val) +{ + struct ext4_fc_tl tl; + + if (*dst + sizeof(tl) + len >= end) + return false; + + tl.fc_tag = cpu_to_le16(tag); + tl.fc_len = cpu_to_le16(len); + memcpy(*dst, &tl, sizeof(tl)); + memcpy(*dst + sizeof(tl), val, len); + + *dst = *dst + sizeof(tl) + len; + return true; +} + +/* Same as above, but tries to add dentry tlv.
*/ +static bool fc_try_add_dentry_info_tlv(u8 **dst, u8 *end, u16 tag, + int parent_ino, int ino, int dlen, + const unsigned char *dname) +{ + struct ext4_fc_dentry_info fcd; + struct ext4_fc_tl tl; + + + if (*dst + sizeof(tl) + sizeof(fcd) + dlen >= end) + return false; + + fcd.fc_parent_ino = cpu_to_le32(parent_ino); + fcd.fc_ino = cpu_to_le32(ino); + tl.fc_tag = cpu_to_le16(tag); + tl.fc_len = cpu_to_le16(sizeof(fcd) + dlen); + memcpy(*dst, &tl, sizeof(tl)); + *dst += sizeof(tl); + memcpy(*dst, &fcd, sizeof(fcd)); + *dst += sizeof(fcd); + memcpy(*dst, dname, dlen); + *dst += dlen; + + return true; +} + +/* + * Writes data tags (EXT4_FC_TAG_ADD_RANGE / EXT4_FC_TAG_DEL_RANGE) + * at memory pointed to by start. Returns the number of TLVs that were + * added on success. Returns an error otherwise. + */ +static int fc_write_data(struct inode *inode, u8 *start, u8 *end, + u8 **last) +{ + ext4_lblk_t old_blk_size, cur_lblk_off, new_blk_size; + struct ext4_inode_info *ei = EXT4_I(inode); + struct ext4_map_blocks map; + struct ext4_extent extent; + struct ext4_fc_lrange lrange; + u8 *cur = start; + int num_tlvs = 0; + int ret; + + write_lock(&ei->i_fc_lock); + old_blk_size = ei->i_fc_lblk_start; + new_blk_size = ei->i_fc_lblk_end; + ei->i_fc_lblk_start = ei->i_fc_lblk_end; + write_unlock(&ei->i_fc_lock); + + cur_lblk_off = old_blk_size; + jbd_debug(1, "%s: will try writing %u to %u for inode %lu\n", + __func__, cur_lblk_off, new_blk_size, inode->i_ino); + while (cur_lblk_off <= new_blk_size) { + map.m_lblk = cur_lblk_off; + map.m_len = new_blk_size - cur_lblk_off + 1; + ret = ext4_map_blocks(NULL, inode, &map, 0); + if (ret < 0) + return ret; + if (map.m_len == 0) + return -ECANCELED; + if (map.m_flags & EXT4_MAP_UNWRITTEN) + return -ECANCELED; + + cur_lblk_off += map.m_len; + if (ret == 0) { + lrange.fc_lblk = cpu_to_le32(map.m_lblk); + lrange.fc_len = cpu_to_le32(map.m_len); + if (!fc_try_add_tlv(&cur, end, EXT4_FC_TAG_DEL_RANGE, + sizeof(lrange), (u8 *)&lrange)) +
return -ENOSPC; + + } else { + extent.ee_block = cpu_to_le32(map.m_lblk); + extent.ee_len = cpu_to_le16(map.m_len); + ext4_ext_store_pblock(&extent, map.m_pblk); + ext4_ext_mark_initialized(&extent); + if (!fc_try_add_tlv(&cur, end, EXT4_FC_TAG_ADD_RANGE, + sizeof(struct ext4_extent), (u8 *)&extent)) + return -ENOSPC; + } + num_tlvs++; + } + *last = cur; + + return num_tlvs; +} + +static int fc_commit_data_inode(journal_t *journal, struct inode *inode) +{ + struct ext4_fc_commit_hdr *hdr; + struct buffer_head *bh; + u8 *start, *cur, *end; + int ret; + int num_tlvs = 0; + + ret = jbd2_map_fc_buf(journal, &bh); + if (ret) + return -ECANCELED; + + start = cur = ((__u8 *)bh->b_data + sizeof(journal_header_t)); + end = (__u8 *)bh->b_data + journal->j_blocksize; + hdr = (struct ext4_fc_commit_hdr *)start; + + ret = fc_write_hdr(inode, start, end, &cur); + if (ret < 0) + return ret; + + ret = fc_write_data(inode, cur, end, &cur); + if (ret < 0) + return ret; + memset(cur, 0, end - cur); + + hdr->fc_num_tlvs = cpu_to_le16(num_tlvs + ret); + hdr->fc_csum = 0; + hdr->fc_csum = cpu_to_le32(ext4_chksum(EXT4_SB(inode->i_sb), + 0, start, end - start)); + submit_fc_bh(bh); + ext4_set_inode_state(inode, EXT4_STATE_FC_MDATA_SUBMIT); + + return 1; +} + +static int submit_all_inode_data(journal_t *journal) +{ + struct super_block *sb = (struct super_block *)(journal->j_private); + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_inode_info *iter; + struct list_head *pos; + int ret = 0; + + spin_lock(&sbi->s_fc_lock); + list_for_each(pos, &sbi->s_fc_q) { + iter = list_entry(pos, struct ext4_inode_info, i_fc_list); + ext4_set_inode_state(&iter->vfs_inode, + EXT4_STATE_FC_DATA_SUBMIT); + spin_unlock(&sbi->s_fc_lock); + ret = jbd2_submit_inode_data(journal, iter->jinode); + if (ret) + return ret; + spin_lock(&sbi->s_fc_lock); + } + spin_unlock(&sbi->s_fc_lock); + + return ret; +} + +static int wait_all_inode_data(journal_t *journal) +{ + struct super_block *sb = (struct super_block 
*)(journal->j_private); + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_inode_info *pos, *n; + int ret = 0; + + spin_lock(&sbi->s_fc_lock); + list_for_each_entry_safe(pos, n, &sbi->s_fc_q, i_fc_list) { + if (!ext4_test_inode_state(&pos->vfs_inode, + EXT4_STATE_FC_DATA_SUBMIT)) + continue; + spin_unlock(&sbi->s_fc_lock); + ret = jbd2_wait_inode_data(journal, pos->jinode); + if (ret) + return ret; + spin_lock(&sbi->s_fc_lock); + list_safe_reset_next(pos, n, i_fc_list); + list_del_init(&pos->i_fc_list); + + ext4_clear_inode_state(&pos->vfs_inode, + EXT4_STATE_FC_DATA_SUBMIT); + /* Make sure the DATA_SUBMIT bit is cleared before waking up */ + smp_mb(); +#if (BITS_PER_LONG < 64) + wake_up_bit(&pos->i_state_flags, EXT4_STATE_FC_DATA_SUBMIT); +#else + wake_up_bit(&pos->i_flags, EXT4_STATE_FC_DATA_SUBMIT); +#endif + } + spin_unlock(&sbi->s_fc_lock); + + return 0; +} + +static int fc_inode_match(struct inode *inode, void *data) +{ + if (inode->i_ino != (long)data) + return 0; + + if (inode->i_nlink) + return 1; + + /* + * Avoid returning a nearly dead inode (with i_nlink == 0). + */ + if (ext4_test_inode_state(inode, + EXT4_STATE_FC_DATA_SUBMIT)) { + /* + * This is a tricky situation: after we + * submitted data for this inode, someone + * tried to free it. ext4_fc_del() is + * waiting for the FC_DATA_SUBMIT bit to clear. + * Since we are never going to wait for its data, + * just wake the sleeper. + * TODO: Even in this case, don't fall back to a full commit; + * instead, tell the caller that this is a deleted inode. + */ + ext4_clear_inode_state( + inode, EXT4_STATE_FC_DATA_SUBMIT); + /* Make sure that the DATA_SUBMIT bit is cleared */ + smp_mb(); +#if (BITS_PER_LONG < 64) + wake_up_bit(&EXT4_I(inode)->i_state_flags, + EXT4_STATE_FC_DATA_SUBMIT); +#else + wake_up_bit(&EXT4_I(inode)->i_flags, + EXT4_STATE_FC_DATA_SUBMIT); +#endif + } + return 0; +} + +/* + * Commits all the dentry updates and their respective inodes up to and + * including "last".
+ */ +static int fc_commit_dentry_updates(journal_t *journal, + struct ext4_fc_dentry_update *last) +{ + struct super_block *sb = (struct super_block *)(journal->j_private); + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_fc_commit_hdr *hdr; + struct ext4_fc_dentry_update *fc_dentry; + struct inode *inode; + struct buffer_head *bh; + u8 *start, *cur, *end; + int len, ret; + int nblks = 0; + int num_tlvs = 0; + bool is_last; + + ret = jbd2_map_fc_buf(journal, &bh); + if (ret) + return -ECANCELED; + + start = cur = ((__u8 *)bh->b_data + sizeof(journal_header_t)); + end = (__u8 *)bh->b_data + journal->j_blocksize; + hdr = (struct ext4_fc_commit_hdr *)start; + + spin_lock(&sbi->s_fc_lock); + while (!list_empty(&sbi->s_fc_dentry_q)) { + fc_dentry = list_first_entry( + &sbi->s_fc_dentry_q, struct ext4_fc_dentry_update, + fcd_list); + list_del_init(&fc_dentry->fcd_list); + spin_unlock(&sbi->s_fc_lock); + if (!fc_try_add_dentry_info_tlv( + &cur, end, fc_dentry->fcd_op, + fc_dentry->fcd_parent, fc_dentry->fcd_ino, + fc_dentry->fcd_name.len, + fc_dentry->fcd_name.name)) { + kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry); + return -ENOSPC; + } + num_tlvs++; + inode = ilookup5_nowait(sb, fc_dentry->fcd_ino, fc_inode_match, + (void *)(long)fc_dentry->fcd_ino); + /* + * If this was the last metadata update for this inode, clear it, + * since we are going to handle it now. + */ + if (inode && EXT4_I(inode)->i_fc_mdata_update == fc_dentry) + EXT4_I(inode)->i_fc_mdata_update = NULL; + if (fc_dentry != last && + fc_dentry->fcd_op != EXT4_FC_TAG_CREAT_DENTRY) { + if (inode) + iput(inode); + spin_lock(&sbi->s_fc_lock); + kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry); + continue; + } + is_last = (fc_dentry == last); + kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry); + if (IS_ERR_OR_NULL(inode)) + /* + * Inode got evicted from memory for some + * reason. It's possible that someone deleted + * the inode after we started the fast commit.
+ * We just abort the fast commit in this case. + */ + return -ECANCELED; + + /* + * It's either the last dentry update or an inode + * creation. Until now, we have written all the + * directory entry updates since the beginning or + * the last creation in the current fast commit buffer. + * Move the contents towards the end of the block and + * then write the header first. We move it to the end + * because the header size is variable. + */ + len = cur - start; + memmove(end - len, start, len); + ret = fc_write_hdr(inode, start, end - len, &cur); + if (ret < 0) { + iput(inode); + return ret; + } + /* + * Place the directory entry updates right after the + * header and the inode, then write the remaining + * tags, if any. + */ + memmove(cur, end - len, len); + cur = cur + len; + if (inode->i_nlink) { + ret = fc_write_data(inode, cur, end, &cur); + if (ret < 0) { + iput(inode); + return ret; + } + } + memset(cur, 0, end - cur); + hdr->fc_num_tlvs = cpu_to_le16(num_tlvs + ret); + hdr->fc_csum = cpu_to_le32( + ext4_chksum(sbi, 0, start, end - start)); + submit_fc_bh(bh); + nblks++; + if (!inode->i_nlink) { + ext4_clear_inode_state(inode, + EXT4_STATE_FC_DATA_SUBMIT); + smp_mb(); /* Make sure the DATA_SUBMIT bit is cleared */ +#if (BITS_PER_LONG < 64) + wake_up_bit(&EXT4_I(inode)->i_state_flags, + EXT4_STATE_FC_DATA_SUBMIT); +#else + wake_up_bit(&EXT4_I(inode)->i_flags, + EXT4_STATE_FC_DATA_SUBMIT); +#endif + } else if (!ext4_test_inode_state(inode, + EXT4_STATE_FC_DATA_SUBMIT)) { + ret = jbd2_submit_inode_data( + journal, EXT4_I(inode)->jinode); + if (ret < 0) + return ret; + ext4_set_inode_state(inode, + EXT4_STATE_FC_DATA_SUBMIT); + } + ext4_set_inode_state(inode, EXT4_STATE_FC_MDATA_SUBMIT); + iput(inode); + if (is_last) { + bh = NULL; + goto skip_unlock; + } + ret = jbd2_map_fc_buf(journal, &bh); + if (ret < 0) + return ret; + start = cur = ((__u8 *)bh->b_data + sizeof(journal_header_t)); + hdr = (struct ext4_fc_commit_hdr *)start; + end = (__u8 *)bh->b_data + journal->j_blocksize; + memset(start,
0, end - start); + spin_lock(&sbi->s_fc_lock); + } + + spin_unlock(&sbi->s_fc_lock); +skip_unlock: + WARN_ON(bh != NULL); + return nblks; +} + +static void ext4_journal_fc_cleanup_cb(journal_t *journal) +{ + struct super_block *sb = journal->j_private; + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_inode_info *iter; + struct ext4_fc_dentry_update *fc_dentry; + + spin_lock(&sbi->s_fc_lock); + while (!list_empty(&sbi->s_fc_q)) { + iter = list_first_entry(&sbi->s_fc_q, + struct ext4_inode_info, i_fc_list); + iter->i_fc_mdata_update = NULL; + + list_del_init(&iter->i_fc_list); + ext4_clear_inode_state(&iter->vfs_inode, + EXT4_STATE_FC_DATA_SUBMIT); + ext4_clear_inode_state(&iter->vfs_inode, + EXT4_STATE_FC_MDATA_SUBMIT); + /* Make sure the DATA_SUBMIT bit is cleared */ + smp_mb(); + wake_up_bit(&iter->i_flags, EXT4_STATE_FC_DATA_SUBMIT); + } + INIT_LIST_HEAD(&sbi->s_fc_q); + while (!list_empty(&sbi->s_fc_dentry_q)) { + fc_dentry = list_first_entry(&sbi->s_fc_dentry_q, + struct ext4_fc_dentry_update, + fcd_list); + list_del_init(&fc_dentry->fcd_list); + spin_unlock(&sbi->s_fc_lock); + + if (fc_dentry->fcd_name.name && + fc_dentry->fcd_name.len > DNAME_INLINE_LEN) + kfree(fc_dentry->fcd_name.name); + kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry); + spin_lock(&sbi->s_fc_lock); + } + INIT_LIST_HEAD(&sbi->s_fc_dentry_q); + sbi->s_mount_state &= ~EXT4_FC_INELIGIBLE; + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_stats(sb); +} + +int ext4_fc_perform_hard_commit(journal_t *journal) +{ + struct super_block *sb = (struct super_block *)(journal->j_private); + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_inode_info *iter; + struct list_head *pos; + struct inode *inode; + int ret = 0, nblks = 0; + + ret = submit_all_inode_data(journal); + if (ret < 0) + return ret; + + if (!list_empty(&EXT4_SB(sb)->s_fc_dentry_q)) { + ret = fc_commit_dentry_updates( + journal, list_last_entry( + &EXT4_SB(sb)->s_fc_dentry_q, + struct ext4_fc_dentry_update, + fcd_list)); + if
(ret < 0) + return ret; + nblks = ret; + } + + spin_lock(&sbi->s_fc_lock); + list_for_each(pos, &sbi->s_fc_q) { + iter = list_entry(pos, struct ext4_inode_info, i_fc_list); + inode = &iter->vfs_inode; + if (ext4_test_inode_state( + inode, EXT4_STATE_FC_MDATA_SUBMIT) || + !ext4_test_inode_state( + inode, EXT4_STATE_FC_DATA_SUBMIT)) + continue; + + spin_unlock(&sbi->s_fc_lock); + ret = fc_commit_data_inode(journal, inode); + if (ret < 0) + return ret; + nblks += ret; + spin_lock(&sbi->s_fc_lock); + } + spin_unlock(&sbi->s_fc_lock); + + ret = wait_all_inode_data(journal); + if (ret < 0) + return ret; + + return nblks; +} + +int ext4_fc_async_commit_inode(journal_t *journal, tid_t commit_tid, + struct inode *inode) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + struct super_block *sb = inode->i_sb; + struct ext4_sb_info *sbi = EXT4_SB(sb); + int nblks = 0, ret; + int start_jiffies; + + trace_ext4_journal_fc_commit_cb_start(sb); + start_jiffies = jiffies; + + if (!ext4_should_fast_commit(sb) || + (sbi->s_mount_state & EXT4_FC_INELIGIBLE)) { + sbi->s_fc_stats.fc_ineligible_commits++; + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "disabled"); + trace_ext4_journal_fc_stats(sb); + return jbd2_complete_transaction(journal, commit_tid); + } + + if (ext4_is_inode_fc_ineligible(inode)) { + sbi->s_fc_stats.fc_ineligible_commits++; + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "ineligible"); + trace_ext4_journal_fc_stats(sb); + return jbd2_complete_transaction(journal, commit_tid); + } + + /* + * In case of soft consistency mode, we wait for any parallel + * fast commits to complete. In case of hard consistency, if a + * parallel fast commit is ongoing, it is going to take care + * of us as well, so we don't wait. 
+ */ + if (!test_opt2(sb, JOURNAL_FC_SOFT_CONSISTENCY)) + ret = jbd2_start_async_fc_nowait(journal, commit_tid); + else + ret = jbd2_start_async_fc_wait(journal, commit_tid); + if (ret == -EALREADY) { + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "already"); + trace_ext4_journal_fc_stats(sb); + return 0; + } + + if (ret) { + sbi->s_fc_stats.fc_ineligible_commits++; + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "start"); + trace_ext4_journal_fc_stats(sb); + return jbd2_complete_transaction(journal, commit_tid); + } + + if (ext4_test_inode_state(inode, EXT4_STATE_FC_MDATA_SUBMIT)) { + jbd2_stop_async_fc(journal, commit_tid); + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "committed"); + trace_ext4_journal_fc_stats(sb); + return 0; + } + + if (ei->i_fc_tid != commit_tid) { + jbd2_stop_async_fc(journal, commit_tid); + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "stale"); + trace_ext4_journal_fc_stats(sb); + return 0; + } + + if (!test_opt2(sb, JOURNAL_FC_SOFT_CONSISTENCY)) { + ret = ext4_fc_perform_hard_commit(journal); + nblks = ret; + } else if (ei->i_fc_mdata_update) { + ret = submit_all_inode_data(journal); + if (ret < 0) + goto out; + nblks = fc_commit_dentry_updates(journal, + ei->i_fc_mdata_update); + if (nblks < 0) { + ret = nblks; + goto out; + } + ret = wait_all_inode_data(journal); + } else if (!list_empty(&EXT4_I(inode)->i_fc_list)) { + ext4_set_inode_state(inode, EXT4_STATE_FC_DATA_SUBMIT); + ret = jbd2_submit_inode_data(journal, EXT4_I(inode)->jinode); + if (ret < 0) + goto out; + nblks = fc_commit_data_inode(journal, inode); + if (nblks < 0) { + ret = nblks; + goto out; + } + ext4_set_inode_state(inode, EXT4_STATE_FC_MDATA_SUBMIT); + ret = jbd2_wait_inode_data(journal, EXT4_I(inode)->jinode); + spin_lock(&sbi->s_fc_lock); + + list_del_init(&EXT4_I(inode)->i_fc_list); + ext4_clear_inode_state(inode, EXT4_STATE_FC_DATA_SUBMIT); + smp_mb(); /* Make sure the DATA_SUBMIT bit is cleared */ +#if (BITS_PER_LONG < 64) + wake_up_bit(&EXT4_I(inode)->i_state_flags, +
EXT4_STATE_FC_DATA_SUBMIT); +#else + wake_up_bit(&EXT4_I(inode)->i_flags, + EXT4_STATE_FC_DATA_SUBMIT); +#endif + spin_unlock(&sbi->s_fc_lock); + } + +out: + if (ret < 0) { + sbi->s_fc_stats.fc_ineligible_commits++; + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "fail1"); + jbd2_stop_async_fc(journal, commit_tid); + trace_ext4_journal_fc_stats(sb); + sbi->s_mount_state &= ~EXT4_FC_REPLAY; + return jbd2_complete_transaction(journal, commit_tid); + } + jbd2_wait_on_fc_bufs(journal, nblks); + jbd2_stop_async_fc(journal, commit_tid); + + EXT4_SB(sb)->s_fc_stats.fc_num_commits++; + EXT4_SB(sb)->s_fc_stats.fc_numblks += nblks; + trace_ext4_journal_fc_commit_cb_stop(sb, + nblks < 0 ? 0 : nblks, + nblks >= 0 ? "success" : "fail2"); + trace_ext4_journal_fc_stats(sb); + sbi->s_mount_state &= ~EXT4_FC_REPLAY; + return 0; +} + void ext4_init_fast_commit(struct super_block *sb, journal_t *journal) { if (!ext4_should_fast_commit(sb)) return; + journal->j_fc_cleanup_callback = ext4_journal_fc_cleanup_cb; jbd2_init_fast_commit(journal, EXT4_NUM_FC_BLKS); } diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index b8ac3f29949d..ac83967e4be0 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -586,5 +586,7 @@ int __init ext4_init_fc_dentry_cache(void); void ext4_fc_track_inode(struct inode *inode); void ext4_fc_mark_ineligible(struct inode *inode, int reason); void ext4_fc_disable(struct super_block *sb, int reason); +void ext4_fc_del(struct inode *inode); + #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index e10206e7f4bb..f3bab652f98c 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -114,7 +114,7 @@ static int ext4_fsync_journal(struct inode *inode, bool datasync, !jbd2_trans_will_send_data_barrier(journal, commit_tid)) *needs_barrier = true; - return jbd2_complete_transaction(journal, commit_tid); + return ext4_fc_async_commit_inode(journal, commit_tid, inode); } /* diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 
e5b45f32dd30..9cbd3f98c5f3 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5135,8 +5135,10 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc) if (wbc->sync_mode != WB_SYNC_ALL || wbc->for_sync) return 0; - err = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal, - EXT4_I(inode)->i_sync_tid); + err = ext4_fc_async_commit_inode(EXT4_SB(inode->i_sb) + ->s_journal, + EXT4_I(inode)->i_sync_tid, + inode); } else { struct ext4_iloc iloc; diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c index fb6520f37135..5ddb23c036a1 100644 --- a/fs/ext4/migrate.c +++ b/fs/ext4/migrate.c @@ -496,6 +496,7 @@ int ext4_ext_migrate(struct inode *inode) * work to orphan_list_cleanup() */ ext4_orphan_del(NULL, tmp_inode); + ext4_fc_del(inode); retval = PTR_ERR(handle); goto out_tmp_inode; } diff --git a/fs/ext4/super.c b/fs/ext4/super.c index ea712c9cf77b..16548b0cbe71 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1162,6 +1162,11 @@ static int ext4_drop_inode(struct inode *inode) if (!drop) drop = fscrypt_drop_inode(inode); + if (drop) { + spin_unlock(&inode->i_lock); + ext4_fc_del(inode); + spin_lock(&inode->i_lock); + } trace_ext4_drop_inode(inode, drop); return drop; @@ -1170,6 +1175,11 @@ static int ext4_drop_inode(struct inode *inode) static void ext4_free_in_core_inode(struct inode *inode) { fscrypt_free_inode(inode); + if (!list_empty(&(EXT4_I(inode)->i_fc_list))) { + pr_warn("%s: inode %lu still in fc list\n", + __func__, inode->i_ino); + ext4_fc_del(inode); + } kmem_cache_free(ext4_inode_cachep, EXT4_I(inode)); } @@ -1184,6 +1194,8 @@ static void ext4_destroy_inode(struct inode *inode) true); dump_stack(); } + if (!list_empty(&(EXT4_I(inode)->i_fc_list))) + ext4_fc_del(inode); } static void init_once(void *foo) diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 5d278a8082a7..4f7c9c00910e 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -92,6 +92,19 @@
TRACE_DEFINE_ENUM(ES_REFERENCED_B); { FALLOC_FL_COLLAPSE_RANGE, "COLLAPSE_RANGE"}, \ { FALLOC_FL_ZERO_RANGE, "ZERO_RANGE"}) +#define show_fc_reason(reason) \ + __print_symbolic(reason, \ + { EXT4_FC_REASON_META_ALLOC, "META_ALLOC"}, \ + { EXT4_FC_REASON_QUOTA, "QUOTA"}, \ + { EXT4_FC_REASON_XATTR, "XATTR"}, \ + { EXT4_FC_REASON_CROSS_RENAME, "CROSS_RENAME"}, \ + { EXT4_FC_REASON_FALLOC_RANGE_OP, "FALLOC_RANGE_OP"}, \ + { EXT4_FC_REASON_JOURNAL_FLAG_CHANGE, "JOURNAL_FLAG_CHANGE"}, \ + { EXT4_FC_REASON_MEM, "NO_MEM"}, \ + { EXT4_FC_REASON_SWAP_BOOT, "SWAP_BOOT"}, \ + { EXT4_FC_REASON_RESIZE, "RESIZE"}, \ + { EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}) + TRACE_EVENT(ext4_other_inode_update_time, TP_PROTO(struct inode *inode, ino_t orig_ino), @@ -2723,6 +2736,93 @@ TRACE_EVENT(ext4_error, __entry->function, __entry->line) ); +TRACE_EVENT(ext4_journal_fc_commit_cb_start, + TP_PROTO(struct super_block *sb), + + TP_ARGS(sb), + + TP_STRUCT__entry( + __field(dev_t, dev) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + ), + + TP_printk("fast_commit started on dev %d,%d", + MAJOR(__entry->dev), MINOR(__entry->dev)) +); + +TRACE_EVENT(ext4_journal_fc_commit_cb_stop, + TP_PROTO(struct super_block *sb, int nblks, const char *reason), + + TP_ARGS(sb, nblks, reason), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, nblks) + __field(const char *, reason) + __field(int, num_fc) + __field(int, num_fc_ineligible) + __field(int, nblks_agg) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->nblks = nblks; + __entry->reason = reason; + __entry->num_fc = EXT4_SB(sb)->s_fc_stats.fc_num_commits; + __entry->num_fc_ineligible = + EXT4_SB(sb)->s_fc_stats.fc_ineligible_commits; + __entry->nblks_agg = EXT4_SB(sb)->s_fc_stats.fc_numblks; + ), + + TP_printk("fc on [%d,%d] nblks %d, reason %s, fc = %d, ineligible = %d, agg_nblks %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->nblks, __entry->reason, __entry->num_fc, + __entry->num_fc_ineligible,
__entry->nblks_agg) +); + +#define FC_REASON_NAME_STAT(reason) \ + show_fc_reason(reason), \ + __entry->sbi->s_fc_stats.fc_ineligible_reason_count[reason] + +TRACE_EVENT(ext4_journal_fc_stats, + TP_PROTO(struct super_block *sb), + + TP_ARGS(sb), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(struct ext4_sb_info *, sbi) + __field(int, count) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->sbi = EXT4_SB(sb); + ), + + TP_printk("dev %d:%d fc ineligible reasons:\n" + "%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d; " + "num_commits:%d, ineligible: %d, numblks: %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + FC_REASON_NAME_STAT(EXT4_FC_REASON_META_ALLOC), + FC_REASON_NAME_STAT(EXT4_FC_REASON_QUOTA), + FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR), + FC_REASON_NAME_STAT(EXT4_FC_REASON_CROSS_RENAME), + FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE_OP), + FC_REASON_NAME_STAT(EXT4_FC_REASON_JOURNAL_FLAG_CHANGE), + FC_REASON_NAME_STAT(EXT4_FC_REASON_MEM), + FC_REASON_NAME_STAT(EXT4_FC_REASON_SWAP_BOOT), + FC_REASON_NAME_STAT(EXT4_FC_REASON_RESIZE), + FC_REASON_NAME_STAT(EXT4_FC_REASON_RENAME_DIR), + __entry->sbi->s_fc_stats.fc_num_commits, + __entry->sbi->s_fc_stats.fc_ineligible_commits, + __entry->sbi->s_fc_stats.fc_numblks) + +); + #define DEFINE_TRACE_DENTRY_EVENT(__type) \ TRACE_EVENT(ext4_fc_track_##__type, \ TP_PROTO(struct inode *inode, struct dentry *dentry, int ret), \ From patchwork Wed Apr 8 21:55:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268360 From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 15/20] jbd2: add fast commit recovery path support Date: Wed, 8 Apr 2020 14:55:25 -0700 Message-Id: <20200408215530.25649-15-harshads@google.com> In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> From: Harshad Shirwadkar Add fc_do_one_pass() to invoke the file system specific replay callback and pass it each fast commit block discovered in the journal, so that the file system can handle the replay.
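The scan that fc_do_one_pass() performs can be modelled in isolation. The following is an illustrative sketch, not the patch's code. struct fc_block, fc_replay_fn, fc_scan(), struct replay_log and record_offset() are hypothetical names. Blocks are assumed to already sit in a host-endian array instead of being read with jread() and decoded from a big-endian journal_header_t. The sketch also returns the number of valid blocks found, whereas the patch returns 0 or the callback's error. As in the patch, the scan walks j_first_fc..j_last_fc in order and stops at the first block whose magic number or sequence number does not match the expected transaction ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's JBD2_MAGIC_NUMBER. */
#define JBD2_MAGIC_NUMBER 0xc03b3998U

/* Hypothetical host-endian stand-in for the journal_header_t fields used. */
struct fc_block {
	uint32_t h_magic;
	uint32_t h_sequence;
};

/* Hypothetical callback type, loosely modelled on j_fc_replay_callback. */
typedef int (*fc_replay_fn)(const struct fc_block *blk, int off, void *arg);

/*
 * Walk blocks[first..last] (inclusive) in order and hand each valid block,
 * plus its offset from 'first', to cb (if any). Stop at the first block
 * whose magic or sequence number does not match. Returns the number of
 * valid blocks found, or the callback's negative error code.
 */
static int fc_scan(const struct fc_block *blocks, unsigned long first,
		   unsigned long last, uint32_t expected_seq,
		   fc_replay_fn cb, void *arg)
{
	int replayed = 0;

	assert(blocks != NULL);
	for (unsigned long blk = first; blk <= last; blk++) {
		const struct fc_block *b = &blocks[blk];

		/* A foreign or stale block marks the end of the fc area. */
		if (b->h_magic != JBD2_MAGIC_NUMBER ||
		    b->h_sequence != expected_seq)
			break;
		if (cb) {
			int err = cb(b, (int)(blk - first), arg);

			if (err)
				return err;
		}
		replayed++;
	}
	return replayed;
}

/* Example callback: remember which offsets were handed to it. */
struct replay_log {
	int off[8];
	int n;
};

static int record_offset(const struct fc_block *blk, int off, void *arg)
{
	struct replay_log *rl = arg;

	(void)blk;
	if (rl->n >= 8)
		return -1;	/* log full: report an error to stop the scan */
	rl->off[rl->n++] = off;
	return 0;
}
```

Suppose blocks 2..5 carry sequence numbers 7, 7, 6, 7. A scan for transaction 7 then passes only offsets 0 and 1 to the callback. The stale block with sequence 6 ends the scan, even though a later block carries the expected sequence number.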
Signed-off-by: Harshad Shirwadkar --- fs/jbd2/recovery.c | 67 +++++++++++++++++++++++++++++++++++++++++--- include/linux/jbd2.h | 13 +++++++++ 2 files changed, 76 insertions(+), 4 deletions(-) diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index a4967b27ffb6..09f069e59c36 100644 --- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -35,7 +35,6 @@ struct recovery_info int nr_revoke_hits; }; -enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY}; static int do_one_pass(journal_t *journal, struct recovery_info *info, enum passtype pass); static int scan_revoke_records(journal_t *, struct buffer_head *, @@ -225,10 +224,63 @@ static int count_tags(journal_t *journal, struct buffer_head *bh) /* Make sure we wrap around the log correctly! */ #define wrap(journal, var) \ do { \ - if (var >= (journal)->j_last) \ - var -= ((journal)->j_last - (journal)->j_first); \ + unsigned long _wrap_last = \ + jbd2_has_feature_fast_commit(journal) ? \ + (journal)->j_last_fc : (journal)->j_last; \ + \ + if (var >= _wrap_last) \ + var -= (_wrap_last - (journal)->j_first); \ } while (0) +static int fc_do_one_pass(journal_t *journal, + struct recovery_info *info, enum passtype pass) +{ + unsigned int expected_commit_id = info->end_transaction; + unsigned long next_fc_block; + struct buffer_head *bh; + unsigned int seq; + journal_header_t *jhdr; + int err = 0; + + next_fc_block = journal->j_first_fc; + + while (next_fc_block <= journal->j_last_fc) { + jbd_debug(3, "Fast commit replay: next block %ld", + next_fc_block); + err = jread(&bh, journal, next_fc_block); + if (err) { + jbd_debug(3, "Fast commit replay: read error"); + break; + } + + jhdr = (journal_header_t *)bh->b_data; + seq = be32_to_cpu(jhdr->h_sequence); + if (be32_to_cpu(jhdr->h_magic) != JBD2_MAGIC_NUMBER || + seq != expected_commit_id) { + jbd_debug(3, "Fast commit replay: magic / commitid error [%d / %d / %d]\n", + be32_to_cpu(jhdr->h_magic), seq, + expected_commit_id); + break; + } + jbd_debug(3, "Processing fast commit 
blk with seq %d", + seq); + if (journal->j_fc_replay_callback) { + err = journal->j_fc_replay_callback( + journal, bh, pass, + next_fc_block - + journal->j_first_fc); + if (err) + break; + } + next_fc_block++; + } + + if (err) + jbd_debug(3, "Fast commit replay failed, err = %d\n", err); + + return err; +} + /** * jbd2_journal_recover - recovers a on-disk journal * @journal: the journal to recover @@ -470,7 +522,7 @@ static int do_one_pass(journal_t *journal, break; jbd_debug(2, "Scanning for sequence ID %u at %lu/%lu\n", - next_commit_ID, next_log_block, journal->j_last); + next_commit_ID, next_log_block, journal->j_last_fc); /* Skip over each chunk of the transaction looking * either the next descriptor block or the final commit @@ -768,6 +820,9 @@ static int do_one_pass(journal_t *journal, if (err) goto failed; continue; + case JBD2_FC_BLOCK: + pr_warn("Unexpectedly found fast commit block.\n"); + continue; default: jbd_debug(3, "Unrecognised magic %d, end of scan.\n", @@ -799,6 +854,10 @@ static int do_one_pass(journal_t *journal, success = -EIO; } } + + if (jbd2_has_feature_fast_commit(journal) && pass != PASS_REVOKE) + success = fc_do_one_pass(journal, info, pass); + if (block_error && success == 0) success = -EIO; return success; diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 599113bef67f..01f6de8f6731 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -749,6 +749,8 @@ jbd2_time_diff(unsigned long start, unsigned long end) #define JBD2_NR_BATCH 64 +enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY}; + /** * struct journal_s - The journal_s type is the concrete type associated with * journal_t. @@ -1219,6 +1221,17 @@ struct journal_s * after every commit operation. */ void (*j_fc_cleanup_callback)(struct journal_s *journal); + + /* + * @j_fc_replay_callback: + * + * File-system specific function that performs replay of a fast + * commit. JBD2 calls this function for each fast commit block found in + * the journal. 
+ */ + int (*j_fc_replay_callback)(struct journal_s *journal, + struct buffer_head *bh, + enum passtype pass, int off); }; #define jbd2_might_wait_for_commit(j) \ From patchwork Wed Apr 8 21:55:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268355 From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 16/20] ext4: fast commit recovery path preparation Date: Wed, 8 Apr 2020 14:55:26 -0700 Message-Id: <20200408215530.25649-16-harshads@google.com> In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> From: Harshad Shirwadkar Prepare for making ext4 fast
commit recovery path changes. Make a few existing functions visible. Refactor __ext4_get_inode_loc() to take a super block and an inode number instead of a VFS inode, and add wrappers around it, so that an inode can be read from disk without a corresponding VFS inode. Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4.h | 7 ++++ fs/ext4/inode.c | 64 +++++++++++++++++++++++++++++---------- fs/ext4/ioctl.c | 6 ++-- fs/ext4/namei.c | 2 +- include/trace/events/ext4.h | 8 ++--- 5 files changed, 63 insertions(+), 24 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index c4e74bcbbf90..7c9ca8b962f8 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2740,6 +2740,8 @@ extern int ext4_trim_fs(struct super_block *, struct fstrim_range *); extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid); /* inode.c */ +void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, + struct ext4_inode_info *ei); int ext4_inode_is_fast_symlink(struct inode *inode); struct buffer_head *ext4_getblk(handle_t *, struct inode *, ext4_lblk_t, int); struct buffer_head *ext4_bread(handle_t *, struct inode *, ext4_lblk_t, int); @@ -2786,6 +2788,8 @@ extern int ext4_sync_inode(handle_t *, struct inode *); extern void ext4_dirty_inode(struct inode *, int); extern int ext4_change_inode_journal_flag(struct inode *, int); extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *); +extern int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino, + struct ext4_iloc *iloc); extern int ext4_inode_attach_jinode(struct inode *inode); extern int ext4_can_truncate(struct inode *inode); extern int ext4_truncate(struct inode *); @@ -2819,12 +2823,15 @@ extern int ext4_ind_remove_space(handle_t *handle, struct inode *inode, /* ioctl.c */ extern long ext4_ioctl(struct file *, unsigned int, unsigned long); extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long); +extern void ext4_reset_inode_seed(struct inode *inode); /* migrate.c */ extern int ext4_ext_migrate(struct inode *); 
extern int ext4_ind_migrate(struct
inode *inode); /* namei.c */ +extern int ext4_init_new_dir(handle_t *handle, struct inode *dir, + struct inode *inode); extern int ext4_dirblock_csum_verify(struct inode *inode, struct buffer_head *bh); extern int ext4_orphan_add(handle_t *, struct inode *); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9cbd3f98c5f3..b5ca07497bbc 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -101,8 +101,8 @@ static int ext4_inode_csum_verify(struct inode *inode, struct ext4_inode *raw, return provided == calculated; } -static void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, - struct ext4_inode_info *ei) +void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, + struct ext4_inode_info *ei) { __u32 csum; @@ -4251,22 +4251,22 @@ int ext4_truncate(struct inode *inode) * data in memory that is needed to recreate the on-disk version of this * inode. */ -static int __ext4_get_inode_loc(struct inode *inode, - struct ext4_iloc *iloc, int in_mem) +static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, + struct ext4_iloc *iloc, int in_mem, + ext4_fsblk_t *ret_block) { struct ext4_group_desc *gdp; struct buffer_head *bh; - struct super_block *sb = inode->i_sb; ext4_fsblk_t block; struct blk_plug plug; int inodes_per_block, inode_offset; iloc->bh = NULL; - if (inode->i_ino < EXT4_ROOT_INO || - inode->i_ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)) + if (ino < EXT4_ROOT_INO || + ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)) return -EFSCORRUPTED; - iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb); + iloc->block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb); gdp = ext4_get_group_desc(sb, iloc->block_group, NULL); if (!gdp) return -EIO; @@ -4275,7 +4275,7 @@ static int __ext4_get_inode_loc(struct inode *inode, * Figure out the offset within the block group inode table */ inodes_per_block = EXT4_SB(sb)->s_inodes_per_block; - inode_offset = ((inode->i_ino - 1) % + inode_offset = ((ino - 1) % 
EXT4_INODES_PER_GROUP(sb)); block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block); iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb); @@ -4376,7 +4376,7 @@ static int __ext4_get_inode_loc(struct inode *inode, * has in-inode xattrs, or we don't have this inode in memory. * Read the block from disk. */ - trace_ext4_load_inode(inode); + trace_ext4_load_inode(sb, ino); get_bh(bh); bh->b_end_io = end_buffer_read_sync; submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh); @@ -4384,8 +4384,8 @@ static int __ext4_get_inode_loc(struct inode *inode, wait_on_buffer(bh); if (!buffer_uptodate(bh)) { simulate_eio: - ext4_error_inode_block(inode, block, EIO, - "unable to read itable block"); + if (ret_block) + *ret_block = block; brelse(bh); return -EIO; } @@ -4395,11 +4395,43 @@ static int __ext4_get_inode_loc(struct inode *inode, return 0; } +static int __ext4_get_inode_loc_noinmem(struct inode *inode, + struct ext4_iloc *iloc) +{ + ext4_fsblk_t err_blk; + int ret; + + ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc, 0, + &err_blk); + + if (ret == -EIO) + ext4_error_inode_block(inode, err_blk, EIO, + "unable to read itable block"); + + return ret; +} + int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc) { + ext4_fsblk_t err_blk; + int ret; + /* We have all inode data except xattrs in memory here. 
*/ - return __ext4_get_inode_loc(inode, iloc, - !ext4_test_inode_state(inode, EXT4_STATE_XATTR)); + ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc, + !ext4_test_inode_state(inode, EXT4_STATE_XATTR), &err_blk); + + if (ret == -EIO) + ext4_error_inode_block(inode, err_blk, EIO, + "unable to read itable block"); + + return ret; +} + + +int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino, + struct ext4_iloc *iloc) +{ + return __ext4_get_inode_loc(sb, ino, iloc, 0, NULL); } static bool ext4_should_use_dax(struct inode *inode) @@ -4551,7 +4583,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, ei = EXT4_I(inode); iloc.bh = NULL; - ret = __ext4_get_inode_loc(inode, &iloc, 0); + ret = __ext4_get_inode_loc_noinmem(inode, &iloc); if (ret < 0) goto bad_inode; raw_inode = ext4_raw_inode(&iloc); @@ -5142,7 +5174,7 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc) } else { struct ext4_iloc iloc; - err = __ext4_get_inode_loc(inode, &iloc, 0); + err = __ext4_get_inode_loc_noinmem(inode, &iloc); if (err) return err; /* diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index f66bcf185f5b..93523709f039 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -86,7 +86,7 @@ static void swap_inode_data(struct inode *inode1, struct inode *inode2) i_size_write(inode2, isize); } -static void reset_inode_seed(struct inode *inode) +void ext4_reset_inode_seed(struct inode *inode) { struct ext4_inode_info *ei = EXT4_I(inode); struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); @@ -199,8 +199,8 @@ static long swap_inode_boot_loader(struct super_block *sb, inode->i_generation = prandom_u32(); inode_bl->i_generation = prandom_u32(); - reset_inode_seed(inode); - reset_inode_seed(inode_bl); + ext4_reset_inode_seed(inode); + ext4_reset_inode_seed(inode_bl); ext4_discard_preallocations(inode); diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 2d9c3767d8d6..3e69006e79f4 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2742,7 
+2742,7 @@ struct ext4_dir_entry_2 *ext4_init_dot_dotdot(struct inode *inode, return ext4_next_entry(de, blocksize); } -static int ext4_init_new_dir(handle_t *handle, struct inode *dir, +int ext4_init_new_dir(handle_t *handle, struct inode *dir, struct inode *inode) { struct buffer_head *dir_block = NULL; diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 4f7c9c00910e..8f31fd427ccc 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -1755,9 +1755,9 @@ TRACE_EVENT(ext4_ext_load_extent, ); TRACE_EVENT(ext4_load_inode, - TP_PROTO(struct inode *inode), + TP_PROTO(struct super_block *sb, unsigned long ino), - TP_ARGS(inode), + TP_ARGS(sb, ino), TP_STRUCT__entry( __field( dev_t, dev ) @@ -1765,8 +1765,8 @@ TRACE_EVENT(ext4_load_inode, ), TP_fast_assign( - __entry->dev = inode->i_sb->s_dev; - __entry->ino = inode->i_ino; + __entry->dev = sb->s_dev; + __entry->ino = ino; ), TP_printk("dev %d,%d ino %ld", From patchwork Wed Apr 8 21:55:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268357 From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar , kbuild test robot Subject: [PATCH v6 17/20] ext4: add idempotent helpers to manipulate bitmaps Date: Wed, 8 Apr 2020 14:55:27 -0700 Message-Id: <20200408215530.25649-17-harshads@google.com> In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> From: Harshad Shirwadkar The fast commit replay path needs helpers that mark inodes as used and mark data blocks as used or free. These helpers must be idempotent because we can crash while replaying, and the same fast commit blocks may then be replayed again. 
Signed-off-by: Harshad Shirwadkar Reported-by: kbuild test robot --- fs/ext4/ext4.h | 1 + fs/ext4/ialloc.c | 113 ++++++++++++++++++++++++++++++++++++++ fs/ext4/mballoc.c | 136 +++++++++++++++++++++++++++++++++++++++++++++- fs/ext4/mballoc.h | 2 + 4 files changed, 251 insertions(+), 1 deletion(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 7c9ca8b962f8..45b73c8bf5a6 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2690,6 +2690,7 @@ extern int ext4fs_dirhash(const struct inode *dir, const char *name, int len, struct dx_hash_info *hinfo); /* ialloc.c */ +extern int ext4_mark_inode_used(struct super_block *sb, int ino); extern struct inode *__ext4_new_inode(handle_t *, struct inode *, umode_t, const struct qstr *qstr, __u32 goal, uid_t *owner, __u32 i_flags, diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index b420c9dc444d..7ecac922d8a4 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -742,6 +742,119 @@ static int find_inode_bit(struct super_block *sb, ext4_group_t group, return 1;
} +int ext4_mark_inode_used(struct super_block *sb, int ino) +{ + unsigned long max_ino = le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count); + struct buffer_head *inode_bitmap_bh = NULL, *group_desc_bh = NULL; + struct ext4_group_desc *gdp; + ext4_group_t group; + int bit; + int err = -EFSCORRUPTED; + + if (ino < EXT4_FIRST_INO(sb) || ino > max_ino) + goto out; + + group = (ino - 1) / EXT4_INODES_PER_GROUP(sb); + bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb); + inode_bitmap_bh = ext4_read_inode_bitmap(sb, group); + if (IS_ERR(inode_bitmap_bh)) + return PTR_ERR(inode_bitmap_bh); + + if (ext4_test_bit(bit, inode_bitmap_bh->b_data)) { + err = -EEXIST; + goto out; + } + + gdp = ext4_get_group_desc(sb, group, &group_desc_bh); + if (!gdp || !group_desc_bh) { + err = -EINVAL; + goto out; + } + + ext4_set_bit(bit, inode_bitmap_bh->b_data); + + BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); + err = ext4_handle_dirty_metadata(NULL, NULL, inode_bitmap_bh); + if (err) { + ext4_std_error(sb, err); + goto out; + } + sync_dirty_buffer(inode_bitmap_bh); + BUFFER_TRACE(group_desc_bh, "get_write_access"); + + /* We may have to initialize the block bitmap if it isn't already */ + if (ext4_has_group_desc_csum(sb) && + gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + struct buffer_head *block_bitmap_bh; + + block_bitmap_bh = ext4_read_block_bitmap(sb, group); + if (IS_ERR(block_bitmap_bh)) { + err = PTR_ERR(block_bitmap_bh); + goto out; + } + + BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap"); + err = ext4_handle_dirty_metadata(NULL, NULL, block_bitmap_bh); + sync_dirty_buffer(block_bitmap_bh); + + /* recheck and clear flag under lock if we still need to */ + ext4_lock_group(sb, group); + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); + ext4_free_group_clusters_set(sb, gdp, + ext4_free_clusters_after_init(sb, group, gdp)); + ext4_block_bitmap_csum_set(sb, group, 
gdp, + block_bitmap_bh); + ext4_group_desc_csum_set(sb, group, gdp); + } + ext4_unlock_group(sb, group); + brelse(block_bitmap_bh); + + if (err) { + ext4_std_error(sb, err); + goto out; + } + } + + /* Update the relevant bg descriptor fields */ + if (ext4_has_group_desc_csum(sb)) { + int free; + + ext4_lock_group(sb, group); /* while we modify the bg desc */ + free = EXT4_INODES_PER_GROUP(sb) - + ext4_itable_unused_count(sb, gdp); + if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) { + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT); + free = 0; + } + + /* + * Check the relative inode number against the last used + * relative inode number in this group. if it is greater + * we need to update the bg_itable_unused count + */ + if (bit >= free) + ext4_itable_unused_set(sb, gdp, + (EXT4_INODES_PER_GROUP(sb) - bit - 1)); + } else { + ext4_lock_group(sb, group); + } + + ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1); + if (ext4_has_group_desc_csum(sb)) { + ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh, + EXT4_INODES_PER_GROUP(sb) / 8); + ext4_group_desc_csum_set(sb, group, gdp); + } + + ext4_unlock_group(sb, group); + err = ext4_handle_dirty_metadata(NULL, NULL, group_desc_bh); + sync_dirty_buffer(group_desc_bh); +out: + return err; +} + /* * There are two policies for allocating an inode. 
If the new inode is * a directory, then a forward search is made for a block group with both diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 87c85be4c12e..db08208c1137 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -3059,6 +3059,93 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, return err; } +void ext4_mb_mark_used(struct super_block *sb, ext4_fsblk_t block, + int len) +{ + struct buffer_head *bitmap_bh = NULL; + struct ext4_group_desc *gdp; + struct buffer_head *gdp_bh; + struct ext4_sb_info *sbi = EXT4_SB(sb); + ext4_group_t group; + ext4_fsblk_t cluster; + ext4_grpblk_t blkoff; + int i, clen, err; + int already_allocated_count; + + cluster = EXT4_B2C(sbi, block); + clen = EXT4_B2C(sbi, len); + + ext4_get_group_no_and_offset(sb, block, &group, &blkoff); + bitmap_bh = ext4_read_block_bitmap(sb, group); + if (IS_ERR(bitmap_bh)) { + err = PTR_ERR(bitmap_bh); + bitmap_bh = NULL; + goto out_err; + } + + err = -EIO; + gdp = ext4_get_group_desc(sb, group, &gdp_bh); + if (!gdp) + goto out_err; + + if (!ext4_data_block_valid(sbi, block, len)) { + ext4_error(sb, "Allocating blks %llu-%llu which overlap mdata", + cluster, cluster+clen); + /* File system mounted not to panic on error + * Fix the bitmap and return EFSCORRUPTED + * We leak some of the blocks here. 
+ */ + ext4_lock_group(sb, group); + ext4_set_bits(bitmap_bh->b_data, blkoff, clen); + ext4_unlock_group(sb, group); + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); + if (!err) + err = -EFSCORRUPTED; + sync_dirty_buffer(bitmap_bh); + goto out_err; + } + + ext4_lock_group(sb, group); + already_allocated_count = 0; + for (i = 0; i < clen; i++) + if (mb_test_bit(blkoff + i, bitmap_bh->b_data)) + already_allocated_count++; + + ext4_set_bits(bitmap_bh->b_data, blkoff, clen); + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); + ext4_free_group_clusters_set(sb, gdp, + ext4_free_clusters_after_init(sb, + group, gdp)); + } + clen = ext4_free_group_clusters(sb, gdp) - clen + + already_allocated_count; + ext4_free_group_clusters_set(sb, gdp, clen); + ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh); + ext4_group_desc_csum_set(sb, group, gdp); + + ext4_unlock_group(sb, group); + + if (sbi->s_log_groups_per_flex) { + ext4_group_t flex_group = ext4_flex_group(sbi, group); + + atomic64_sub(len, + &sbi_array_rcu_deref(sbi, s_flex_groups, + flex_group)->free_clusters); + } + + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); + if (err) + goto out_err; + sync_dirty_buffer(bitmap_bh); + err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh); + sync_dirty_buffer(gdp_bh); + +out_err: + brelse(bitmap_bh); +} + /* * here we normalize request for locality group * Group request are normalized to s_mb_group_prealloc, which goes to @@ -4721,6 +4808,47 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b, return 0; } +void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block, + unsigned long count) +{ + struct buffer_head *bitmap_bh; + struct super_block *sb = inode->i_sb; + struct ext4_group_desc *gdp; + struct buffer_head *gdp_bh; + ext4_group_t group; + ext4_grpblk_t blkoff; + int already_freed = 0, err, i; + + ext4_get_group_no_and_offset(sb, 
block, &group, &blkoff); + bitmap_bh = ext4_read_block_bitmap(sb, group); + if (IS_ERR(bitmap_bh)) { + err = PTR_ERR(bitmap_bh); + pr_warn("Failed to read block bitmap\n"); + return; + } + gdp = ext4_get_group_desc(sb, group, &gdp_bh); + if (!gdp) + return; + + for (i = 0; i < count; i++) { + if (!mb_test_bit(blkoff + i, bitmap_bh->b_data)) + already_freed++; + } + mb_clear_bits(bitmap_bh->b_data, blkoff, count); + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); + if (err) + return; + ext4_free_group_clusters_set( + sb, gdp, ext4_free_group_clusters(sb, gdp) + + count - already_freed); + ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh); + ext4_group_desc_csum_set(sb, group, gdp); + ext4_handle_dirty_metadata(NULL, NULL, gdp_bh); + sync_dirty_buffer(bitmap_bh); + sync_dirty_buffer(gdp_bh); + brelse(bitmap_bh); +} + /** * ext4_free_blocks() -- Free given blocks and update quota * @handle: handle for this transaction @@ -4747,6 +4875,13 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, int err = 0; int ret; + sbi = EXT4_SB(sb); + + if (sbi->s_mount_state & EXT4_FC_REPLAY) { + ext4_free_blocks_simple(inode, block, count); + return; + } + might_sleep(); if (bh) { if (block) @@ -4755,7 +4890,6 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, block = bh->b_blocknr; } - sbi = EXT4_SB(sb); if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) && !ext4_data_block_valid(sbi, block, count)) { ext4_error(sb, "Freeing blocks not in datazone - " diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h index 88c98f17e3d9..1881710041b6 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -215,4 +215,6 @@ ext4_mballoc_query_range( ext4_mballoc_query_range_fn formatter, void *priv); +void ext4_mb_mark_used(struct super_block *sb, ext4_fsblk_t block, + int len); #endif From patchwork Wed Apr 8 21:55:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 
1268359 From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 18/20] ext4: disable certain features in replay path Date: Wed, 8 Apr 2020 14:55:28 -0700 Message-Id: <20200408215530.25649-18-harshads@google.com> In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> From: Harshad Shirwadkar The replay path reuses the regular code paths to replay committed changes. However, since replay runs before the file system is fully initialized, and since performance is not critical during replay, we can (and must) disable certain file system features on the replay path.
More specifically, we disable most of the extent status tree stuff, mballoc and some places where we mark file system with errors. Signed-off-by: Harshad Shirwadkar --- fs/ext4/balloc.c | 7 +++++- fs/ext4/ext4_jbd2.c | 2 +- fs/ext4/extents_status.c | 24 +++++++++++++++++++ fs/ext4/ialloc.c | 52 +++++++++++++++++++++++++++------------- fs/ext4/inode.c | 14 +++++++---- fs/ext4/mballoc.c | 21 ++++++++++------ 6 files changed, 90 insertions(+), 30 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 25960bb4fe69..220d818030ef 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -368,7 +368,12 @@ static int ext4_validate_block_bitmap(struct super_block *sb, struct buffer_head *bh) { ext4_fsblk_t blk; - struct ext4_group_info *grp = ext4_get_group_info(sb, block_group); + struct ext4_group_info *grp; + + if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) + return 0; + + grp = ext4_get_group_info(sb, block_group); if (buffer_verified(bh)) return 0; diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index bbba1b067fdf..b9b3833c8fdd 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -101,7 +101,7 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, return ERR_PTR(err); journal = EXT4_SB(sb)->s_journal; - if (!journal) + if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)) return ext4_get_nojournal(); return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds, GFP_NOFS, type, line); diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c index d996b44d2265..69c16ac7416e 100644 --- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -311,6 +311,9 @@ void ext4_es_find_extent_range(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t end, struct extent_status *es) { + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + return; + trace_ext4_es_find_extent_range_enter(inode, lblk); read_lock(&EXT4_I(inode)->i_es_lock); @@ -361,6 +364,9 @@ bool ext4_es_scan_range(struct inode 
*inode, { bool ret; + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + return false; + read_lock(&EXT4_I(inode)->i_es_lock); ret = __es_scan_range(inode, matching_fn, lblk, end); read_unlock(&EXT4_I(inode)->i_es_lock); @@ -404,6 +410,9 @@ bool ext4_es_scan_clu(struct inode *inode, { bool ret; + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + return false; + read_lock(&EXT4_I(inode)->i_es_lock); ret = __es_scan_clu(inode, matching_fn, lblk); read_unlock(&EXT4_I(inode)->i_es_lock); @@ -812,6 +821,9 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, int err = 0; struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + return 0; + es_debug("add [%u/%u) %llu %x to extent status tree of inode %lu\n", lblk, len, pblk, status, inode->i_ino); @@ -873,6 +885,9 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk, struct extent_status newes; ext4_lblk_t end = lblk + len - 1; + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + return; + newes.es_lblk = lblk; newes.es_len = len; ext4_es_store_pblock_status(&newes, pblk, status); @@ -908,6 +923,9 @@ int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk, struct rb_node *node; int found = 0; + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + return 0; + trace_ext4_es_lookup_extent_enter(inode, lblk); es_debug("lookup extent in block %u\n", lblk); @@ -1419,6 +1437,9 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk, int err = 0; int reserved = 0; + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + return 0; + trace_ext4_es_remove_extent(inode, lblk, len); es_debug("remove [%u/%u) from extent status tree of inode %lu\n", lblk, len, inode->i_ino); @@ -1969,6 +1990,9 @@ int ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk, struct extent_status newes; int err = 0; + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + return 0; + es_debug("add 
[%u/1) delayed to extent status tree of inode %lu\n", lblk, inode->i_ino); diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 7ecac922d8a4..3ccdf7834ab6 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -82,7 +82,12 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, struct buffer_head *bh) { ext4_fsblk_t blk; - struct ext4_group_info *grp = ext4_get_group_info(sb, block_group); + struct ext4_group_info *grp; + + if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) + return 0; + + grp = ext4_get_group_info(sb, block_group); if (buffer_verified(bh)) return 0; @@ -284,15 +289,17 @@ void ext4_free_inode(handle_t *handle, struct inode *inode) bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb); bitmap_bh = ext4_read_inode_bitmap(sb, block_group); /* Don't bother if the inode bitmap is corrupt. */ - grp = ext4_get_group_info(sb, block_group); if (IS_ERR(bitmap_bh)) { fatal = PTR_ERR(bitmap_bh); bitmap_bh = NULL; goto error_return; } - if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) { - fatal = -EFSCORRUPTED; - goto error_return; + if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) { + grp = ext4_get_group_info(sb, block_group); + if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) { + fatal = -EFSCORRUPTED; + goto error_return; + } } BUFFER_TRACE(bitmap_bh, "get_write_access"); @@ -884,7 +891,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, struct inode *ret; ext4_group_t i; ext4_group_t flex_group; - struct ext4_group_info *grp; + struct ext4_group_info *grp = NULL; int encrypt = 0; /* Cannot create files in a deleted directory */ @@ -1022,15 +1029,21 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, if (ext4_free_inodes_count(sb, gdp) == 0) goto next_group; - grp = ext4_get_group_info(sb, group); - /* Skip groups with already-known suspicious inode tables */ - if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) - goto next_group; + if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) { + grp = ext4_get_group_info(sb, group); + /* + * Skip groups 
with already-known suspicious inode + * tables + */ + if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) + goto next_group; + } brelse(inode_bitmap_bh); inode_bitmap_bh = ext4_read_inode_bitmap(sb, group); /* Skip groups with suspicious inode tables */ - if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp) || + if (((!(sbi->s_mount_state & EXT4_FC_REPLAY)) + && EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) || IS_ERR(inode_bitmap_bh)) { inode_bitmap_bh = NULL; goto next_group; @@ -1049,7 +1062,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, goto next_group; } - if (!handle) { + if ((!(sbi->s_mount_state & EXT4_FC_REPLAY)) && !handle) { BUG_ON(nblocks <= 0); handle = __ext4_journal_start_sb(dir->i_sb, line_no, handle_type, nblocks, 0, @@ -1153,9 +1166,15 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, /* Update the relevant bg descriptor fields */ if (ext4_has_group_desc_csum(sb)) { int free; - struct ext4_group_info *grp = ext4_get_group_info(sb, group); - - down_read(&grp->alloc_sem); /* protect vs itable lazyinit */ + struct ext4_group_info *grp = NULL; + + if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) { + grp = ext4_get_group_info(sb, group); + down_read(&grp->alloc_sem); /* + * protect vs itable + * lazyinit + */ + } ext4_lock_group(sb, group); /* while we modify the bg desc */ free = EXT4_INODES_PER_GROUP(sb) - ext4_itable_unused_count(sb, gdp); @@ -1171,7 +1190,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, if (ino > free) ext4_itable_unused_set(sb, gdp, (EXT4_INODES_PER_GROUP(sb) - ino)); - up_read(&grp->alloc_sem); + if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) + up_read(&grp->alloc_sem); } else { ext4_lock_group(sb, group); } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index b5ca07497bbc..d6e5ffce5cf7 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -508,7 +508,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, return -EFSCORRUPTED; /* Lookup extent status tree firstly */ - if (ext4_es_lookup_extent(inode, 
map->m_lblk, NULL, &es)) { + if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) && + ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) { if (ext4_es_is_written(&es) || ext4_es_is_unwritten(&es)) { map->m_pblk = ext4_es_pblock(&es) + map->m_lblk - es.es_lblk; @@ -820,7 +821,8 @@ struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode, int create = map_flags & EXT4_GET_BLOCKS_CREATE; int err; - J_ASSERT(handle != NULL || create == 0); + J_ASSERT((EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + || handle != NULL || create == 0); map.m_lblk = block; map.m_len = 1; @@ -836,7 +838,8 @@ struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode, return ERR_PTR(-ENOMEM); if (map.m_flags & EXT4_MAP_NEW) { J_ASSERT(create != 0); - J_ASSERT(handle != NULL); + J_ASSERT((EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) + || (handle != NULL)); /* * Now that we do not always journal data, we should @@ -4631,8 +4634,9 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, if (!ext4_inode_csum_verify(inode, raw_inode, ei) || ext4_simulate_fail(sb, EXT4_SIM_INODE_CRC)) { - ext4_error_inode_err(inode, function, line, 0, EFSBADCRC, - "iget: checksum invalid"); + if (!(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)) + ext4_error_inode_err(inode, function, line, 0, + EFSBADCRC, "iget: checksum invalid"); ret = -EFSBADCRC; goto bad_inode; } diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index db08208c1137..d6cdcc3e125c 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1449,14 +1449,16 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b, blocknr = ext4_group_first_block_no(sb, e4b->bd_group); blocknr += EXT4_C2B(sbi, block); - ext4_grp_locked_error(sb, e4b->bd_group, - inode ?
inode->i_ino : 0, - blocknr, - "freeing already freed block " - "(bit %u); block bitmap corrupt.", - block); - ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group, + if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) { + ext4_grp_locked_error(sb, e4b->bd_group, + inode ? inode->i_ino : 0, + blocknr, + "freeing already freed block (bit %u); block bitmap corrupt.", + block); + ext4_mark_group_bitmap_corrupted( + sb, e4b->bd_group, EXT4_GROUP_INFO_BBITMAP_CORRUPT); + } mb_regenerate_buddy(e4b); goto done; } @@ -4116,6 +4118,9 @@ void ext4_discard_preallocations(struct inode *inode) return; } + if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) + return; + mb_debug(1, "discard preallocation for inode %lu\n", inode->i_ino); trace_ext4_discard_preallocations(inode); @@ -4591,6 +4596,8 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, sb = ar->inode->i_sb; sbi = EXT4_SB(sb); + WARN_ON(sbi->s_mount_state & EXT4_FC_REPLAY); + trace_ext4_request_blocks(ar); /* Allow to use superuser reservation for quota file */ From patchwork Wed Apr 8 21:55:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1268358 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=ki0QLS6m; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 48yJ4l419Dz9sSj for ; Thu, 9 Apr 2020 07:56:07 +1000 (AEST) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S1726817AbgDHV4C (ORCPT ); Wed, 8 Apr 2020 17:56:02 -0400 From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: tytso@mit.edu, Harshad Shirwadkar Subject: [PATCH v6 19/20] ext4: add fast commit replay path Date: Wed, 8 Apr 2020 14:55:29 -0700 Message-Id: <20200408215530.25649-19-harshads@google.com> In-Reply-To: <20200408215530.25649-1-harshads@google.com> References: <20200408215530.25649-1-harshads@google.com> From: Harshad Shirwadkar Add the main routine for replaying fast commit blocks. The fast commit replay routine must be idempotent so that, if we crash while replaying, we can restart from the beginning without corrupting the file system. Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4.h | 9 + fs/ext4/ext4_jbd2.c | 470 ++++++++++++++++++++++++++++++++++++ include/trace/events/ext4.h | 22 ++ 3 files changed, 501 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 45b73c8bf5a6..df88e408d0bf 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1439,6 +1439,14 @@ struct ext4_super_block { #define ext4_has_strict_mode(sbi) \ (sbi->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL) +/* + * Fast commit replay state. + */ +struct ext4_fc_replay_state { + int fc_replay_error; + int fc_replay_expected_off; +}; + /* * Fast commit ineligible reasons. */ @@ -1650,6 +1658,7 @@ struct ext4_sb_info { * that have data changes in them.
*/ struct list_head s_fc_dentry_q; + struct ext4_fc_replay_state s_fc_replay_state; spinlock_t s_fc_lock; struct ext4_fc_stats s_fc_stats; }; diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index b9b3833c8fdd..5effa1389705 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -5,6 +5,7 @@ #include "ext4_jbd2.h" #include "ext4_extents.h" +#include "mballoc.h" #include @@ -856,6 +857,18 @@ static int fc_write_data(struct inode *inode, u8 *start, u8 *end, return num_tlvs; } +/* Get length of a particular tlv */ +static int fc_tag_len(struct ext4_fc_tl *tl) +{ + return le16_to_cpu(tl->fc_len); +} + +/* Get a pointer to "value" of a tlv */ +static u8 *fc_tag_val(struct ext4_fc_tl *tl) +{ + return (u8 *)tl + sizeof(*tl); +} + static int fc_commit_data_inode(journal_t *journal, struct inode *inode) { struct ext4_fc_commit_hdr *hdr; @@ -1178,6 +1191,457 @@ static void ext4_journal_fc_cleanup_cb(journal_t *journal) trace_ext4_journal_fc_stats(sb); } +struct dentry_info_args { + int parent_ino, dname_len, ino, inode_len; + char *dname; +}; + +static int fc_replay_add_link(struct super_block *sb, struct inode *inode, + struct dentry_info_args *darg) +{ + struct inode *dir = NULL; + struct dentry *dentry_dir = NULL, *dentry_inode = NULL; + struct qstr qstr_dname = QSTR_INIT(darg->dname, darg->dname_len); + int ret = 0; + + dir = ext4_iget(sb, darg->parent_ino, EXT4_IGET_NORMAL); + if (IS_ERR(dir)) { + jbd_debug(1, "Dir with inode %d not found.", darg->parent_ino); + ret = PTR_ERR(dir); + dir = NULL; + goto out; + } + + dentry_dir = d_obtain_alias(dir); + if (IS_ERR(dentry_dir)) { + jbd_debug(1, "Failed to obtain dentry"); + ret = PTR_ERR(dentry_dir); + dentry_dir = NULL; + goto out; + } + + dentry_inode = d_alloc(dentry_dir, &qstr_dname); + if (!dentry_inode) { + jbd_debug(1, "Inode dentry not created."); + ret = -ENOMEM; + goto out; + } + + ret = __ext4_link(dir, inode, dentry_inode); + if (ret && ret != -EEXIST) { + jbd_debug(1, "Failed to link\n"); + goto 
out; + } + + /* + * It's possible that link already existed since data blocks + * for the dir in question got persisted before we crashed. + */ + if (ret == -EEXIST) + ret = 0; +out: + if (dentry_dir) { + d_drop(dentry_dir); + dput(dentry_dir); + } else if (dir) { + iput(dir); + } + if (dentry_inode) { + d_drop(dentry_inode); + dput(dentry_inode); + } + + return ret; +} + +static int fc_replay_create_inode(struct super_block *sb, + struct ext4_inode *raw_inode, + struct dentry_info_args *darg) +{ + int ret = 0; + struct ext4_iloc iloc; + int orig_nlink = 0; + struct inode *inode = NULL; + struct inode *dir = NULL; + + /* + * First let's setup the on-disk inode using the one found in + * the journal + */ + ret = ext4_get_fc_inode_loc(sb, darg->ino, &iloc); + if (ret) + goto out; + + orig_nlink = le16_to_cpu(ext4_raw_inode(&iloc)->i_links_count); + memcpy(ext4_raw_inode(&iloc), raw_inode, darg->inode_len); + ret = ext4_handle_dirty_metadata(NULL, NULL, iloc.bh); + if (ret) + goto out; + sync_dirty_buffer(iloc.bh); + brelse(iloc.bh); + iloc.bh = NULL; + + /* This takes care of update group descriptor and other metadata */ + ret = ext4_mark_inode_used(sb, darg->ino); + if (ret) + goto out; + + inode = ext4_iget(sb, darg->ino, EXT4_IGET_NORMAL); + if (IS_ERR(inode)) { + jbd_debug(1, "inode %d not found.", darg->ino); + ret = PTR_ERR(inode); + goto out; + } + + if (S_ISDIR(inode->i_mode)) { + dir = ext4_iget(sb, darg->parent_ino, EXT4_IGET_NORMAL); + if (IS_ERR_OR_NULL(dir)) { + iput(inode); + ret = PTR_ERR(dir); + goto out; + } + ret = ext4_init_new_dir(NULL, dir, inode); + iput(dir); + if (ret) + goto out; + } + ret = fc_replay_add_link(sb, inode, darg); + if (ret) + goto out; + set_nlink(inode, orig_nlink + 1); +out: + if (inode) + iput(inode); + if (iloc.bh) + brelse(iloc.bh); + return ret; +} + +static int fc_replay_dentries(journal_t *journal, + struct ext4_fc_commit_hdr *fc_hdr) +{ + struct dentry_info_args darg = {0}; + struct super_block *sb = 
journal->j_private; + struct ext4_fc_tl *tl; + __u8 *start; + int inode_len = EXT4_GOOD_OLD_INODE_SIZE; + struct inode *old_parent; + struct inode *inode; + int ret; + int i; + struct ext4_fc_dentry_info *fcd; + + if (EXT4_INODE_SIZE(sb) > EXT4_GOOD_OLD_INODE_SIZE) + inode_len += + le16_to_cpu(((struct ext4_inode *) + (fc_hdr + 1))->i_extra_isize); + tl = (struct ext4_fc_tl *)((u8 *)fc_hdr + + sizeof(struct ext4_fc_commit_hdr) + + inode_len); + start = (__u8 *)tl; + for (i = 0; i < le16_to_cpu(fc_hdr->fc_num_tlvs); i++) { + fcd = (struct ext4_fc_dentry_info *)fc_tag_val(tl); + + darg.parent_ino = le32_to_cpu(fcd->fc_parent_ino); + darg.ino = le32_to_cpu(fcd->fc_ino); + darg.dname = fcd->fc_dname; + darg.dname_len = fc_tag_len(tl) - + sizeof(struct ext4_fc_dentry_info); + if (le16_to_cpu(tl->fc_tag) == EXT4_FC_TAG_ADD_DENTRY) { + inode = ext4_iget(sb, darg.ino, EXT4_IGET_NORMAL); + if (IS_ERR_OR_NULL(inode)) { + jbd_debug(1, "Inode not found."); + return PTR_ERR(inode); + } + ret = fc_replay_add_link(sb, inode, &darg); + iput(inode); + if (ret) + return ret; + } else if (le16_to_cpu(tl->fc_tag) == EXT4_FC_TAG_DEL_DENTRY) { + const struct qstr entry = { + .name = darg.dname, + .len = darg.dname_len + }; + inode = ext4_iget(sb, darg.ino, EXT4_IGET_NORMAL); + + if (IS_ERR_OR_NULL(inode)) + return -ECANCELED; + + old_parent = ext4_iget(sb, darg.parent_ino, + EXT4_IGET_NORMAL); + if (IS_ERR_OR_NULL(old_parent)) { + iput(inode); + return -ECANCELED; + } + + ret = __ext4_unlink(old_parent, &entry, inode); + /* -ENOENT ok coz it might not exist anymore. 
*/
+			if (ret == -ENOENT)
+				ret = 0;
+			iput(old_parent);
+			iput(inode);
+			if (ret)
+				return ret;
+		} else if (le16_to_cpu(tl->fc_tag) ==
+			   EXT4_FC_TAG_CREAT_DENTRY) {
+			darg.inode_len = inode_len;
+			ret = fc_replay_create_inode(
+				sb, (struct ext4_inode *)(fc_hdr + 1), &darg);
+			if (ret) {
+				jbd_debug(1, "Failed to create ext4 inode.");
+				return ret;
+			}
+		}
+		tl = (struct ext4_fc_tl *)((__u8 *)tl +
+					   le16_to_cpu(tl->fc_len) +
+					   sizeof(*tl));
+	}
+	return 0;
+}
+
+static int ext4_journal_fc_replay_scan(journal_t *journal,
+				       struct buffer_head *bh, int off)
+{
+	struct super_block *sb = journal->j_private;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_fc_replay_state *state;
+	struct ext4_fc_commit_hdr *fc_hdr;
+	__u32 csum, old_csum;
+	__u8 *start, *end;
+
+	state = &sbi->s_fc_replay_state;
+	fc_hdr = (struct ext4_fc_commit_hdr *)
+		  ((__u8 *)bh->b_data + sizeof(journal_header_t));
+
+	start = (u8 *)fc_hdr;
+	end = (__u8 *)bh->b_data + journal->j_blocksize;
+
+	/* Check if we already concluded that this fast commit is not useful */
+	if (state->fc_replay_expected_off && state->fc_replay_error)
+		goto out_err;
+
+	if (le32_to_cpu(fc_hdr->fc_magic) != EXT4_FC_MAGIC) {
+		state->fc_replay_error = -ENOENT;
+		goto out_err;
+	}
+
+	if (off != state->fc_replay_expected_off) {
+		state->fc_replay_error = -EFSCORRUPTED;
+		goto out_err;
+	}
+
+	state->fc_replay_expected_off++;
+
+	if (le16_to_cpu(fc_hdr->fc_features)) {
+		state->fc_replay_error = -EOPNOTSUPP;
+		goto out_err;
+	}
+
+	old_csum = fc_hdr->fc_csum;
+	fc_hdr->fc_csum = 0;
+	csum = ext4_chksum(sbi, 0, start, end - start);
+	fc_hdr->fc_csum = old_csum;
+
+	if (csum != le32_to_cpu(fc_hdr->fc_csum)) {
+		state->fc_replay_error = -EFSBADCRC;
+		goto out_err;
+	}
+
+	trace_ext4_journal_fc_replay_scan(sb, state->fc_replay_error, off);
+	return 0;
+
+out_err:
+	trace_ext4_journal_fc_replay_scan(sb, state->fc_replay_error, off);
+	return state->fc_replay_error;
+}
+
+static int fc_add_range(struct inode *inode, struct ext4_extent *ex)
+{
+	struct ext4_extent newex;
+	ext4_lblk_t start, cur;
+	int remaining, len;
+	ext4_fsblk_t start_pblk;
+	struct ext4_map_blocks map;
+	struct ext4_ext_path *path = NULL;
+	int ret;
+
+	start = le32_to_cpu(ex->ee_block);
+	start_pblk = ext4_ext_pblock(ex);
+	len = ext4_ext_get_actual_len(ex);
+
+	cur = start;
+	remaining = len;
+
+	jbd_debug(1, "Adding extent %u:%d to inode %lu\n",
+		  start, len, inode->i_ino);
+
+	while (remaining > 0) {
+		map.m_lblk = cur;
+		map.m_len = remaining;
+
+		ret = ext4_map_blocks(NULL, inode, &map, 0);
+		if (ret < 0)
+			return -ECANCELED;
+		if (ret > 0) {
+			if (!!(ext4_ext_is_unwritten(ex)) ==
+				!!(map.m_flags & EXT4_MAP_UNWRITTEN)) {
+				remaining -= ret;
+				cur += ret;
+				ext4_mb_mark_used(inode->i_sb,
+						  ext4_ext_pblock(ex),
+						  map.m_len);
+				continue;
+			}
+
+			/* handle change of state */
+			map.m_lblk = cur;
+			map.m_len = ret;
+			map.m_flags = 0;
+			ret = ext4_map_blocks(
+				NULL, inode, &map,
+				EXT4_GET_BLOCKS_IO_CONVERT_EXT);
+			if (ret <= 0)
+				return -ECANCELED;
+			remaining -= ret;
+			cur += ret;
+		} else if (ret == 0) {
+			path = ext4_find_extent(inode, cur, NULL, 0);
+			if (!path)
+				continue;
+			memset(&newex, 0, sizeof(newex));
+			newex.ee_block = cpu_to_le32(cur);
+			ext4_ext_store_pblock(
+				&newex, start_pblk + cur - start);
+			newex.ee_len = cpu_to_le16(map.m_len);
+			if (ext4_ext_is_unwritten(ex))
+				ext4_ext_mark_unwritten(&newex);
+			down_write(&EXT4_I(inode)->i_data_sem);
+
+			ret = ext4_ext_insert_extent(
+				NULL, inode, &path, &newex, 0);
+			ext4_mb_mark_used(
+				inode->i_sb, ext4_ext_pblock(&newex),
+				map.m_len);
+			up_write(&EXT4_I(inode)->i_data_sem);
+			kfree(path);
+			if (ret)
+				return -ECANCELED;
+			cur += map.m_len;
+			remaining -= map.m_len;
+		}
+	}
+	return 0;
+}
+
+static int ext4_journal_fc_replay_cb(journal_t *journal, struct buffer_head *bh,
+				     enum passtype pass, int off)
+{
+	struct super_block *sb = journal->j_private;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_fc_commit_hdr *fc_hdr;
+	struct ext4_fc_tl *tl;
+	struct ext4_iloc iloc;
+	struct ext4_extent *ex;
+	struct ext4_fc_lrange *lrange;
+	struct inode *inode;
+
+	int i, ret;
+	int inode_len = EXT4_GOOD_OLD_INODE_SIZE;
+
+	if (pass == PASS_SCAN)
+		return ext4_journal_fc_replay_scan(journal, bh, off);
+
+	if (sbi->s_fc_replay_state.fc_replay_error) {
+		jbd_debug(1, "FC replay error set = %d\n",
+			  sbi->s_fc_replay_state.fc_replay_error);
+		return sbi->s_fc_replay_state.fc_replay_error;
+	}
+
+	sbi->s_mount_state |= EXT4_FC_REPLAY;
+	fc_hdr = (struct ext4_fc_commit_hdr *)
+		  ((__u8 *)bh->b_data + sizeof(journal_header_t));
+
+	jbd_debug(3, "%s: Got FC block for inode %d at [%d,%d]", __func__,
+		  le32_to_cpu(fc_hdr->fc_ino),
+		  be32_to_cpu(((journal_header_t *)bh->b_data)->h_sequence), off);
+
+	if (EXT4_INODE_SIZE(sb) > EXT4_GOOD_OLD_INODE_SIZE)
+		inode_len += le16_to_cpu(((struct ext4_inode *)
+					  (fc_hdr + 1))->i_extra_isize);
+
+	ret = fc_replay_dentries(journal, fc_hdr);
+
+	inode = ext4_iget(sb, le32_to_cpu(fc_hdr->fc_ino), EXT4_IGET_NORMAL);
+	if (IS_ERR(inode))
+		return 0;
+
+	ret = ext4_get_inode_loc(inode, &iloc);
+	if (ret)
+		return ret;
+
+	inode_lock(inode);
+	tl = (struct ext4_fc_tl *)((u8 *)fc_hdr +
+				   sizeof(struct ext4_fc_commit_hdr) +
+				   inode_len);
+	for (i = 0; i < le16_to_cpu(fc_hdr->fc_num_tlvs); i++) {
+		switch (le16_to_cpu(tl->fc_tag)) {
+		case EXT4_FC_TAG_ADD_RANGE:
+			ex = (struct ext4_extent *)(tl + 1);
+			ret = fc_add_range(inode, ex);
+			break;
+		case EXT4_FC_TAG_DEL_RANGE:
+			lrange = (struct ext4_fc_lrange *)(tl + 1);
+			inode_unlock(inode);
+			ret = ext4_punch_hole(inode,
+				le32_to_cpu(lrange->fc_lblk) <<
+				sb->s_blocksize_bits,
+				le32_to_cpu(lrange->fc_len) <<
+				sb->s_blocksize_bits);
+			inode_lock(inode);
+			break;
+		case EXT4_FC_TAG_ADD_DENTRY:
+			break;
+		default:
+			jbd_debug(1, "Unknown tag found.\n");
+		}
+		tl = (struct ext4_fc_tl *)((__u8 *)tl +
+					   le16_to_cpu(tl->fc_len) +
+					   sizeof(*tl));
+	}
+	ext4_reserve_inode_write(NULL, inode, &iloc);
+	inode_unlock(inode);
+
+	/*
+	 * Unless the inode contains inline data, copy everything except
+	 * i_block. i_block will already have been set up by the
+	 * fc_add_range() calls above.
+	 */
+	if (ext4_has_inline_data(inode)) {
+		memcpy(ext4_raw_inode(&iloc), fc_hdr + 1, inode_len);
+	} else {
+		memcpy(ext4_raw_inode(&iloc), fc_hdr + 1,
+		       offsetof(struct ext4_inode, i_block));
+		memcpy(&ext4_raw_inode(&iloc)->i_generation,
+		       &((struct ext4_inode *)(fc_hdr + 1))->i_generation,
+		       inode_len -
+		       offsetof(struct ext4_inode, i_generation));
+	}
+	inode->i_generation = le32_to_cpu(ext4_raw_inode(&iloc)->i_generation);
+	ext4_reset_inode_seed(inode);
+
+	ext4_inode_csum_set(inode, ext4_raw_inode(&iloc), EXT4_I(inode));
+	ret = ext4_handle_dirty_metadata(NULL, NULL, iloc.bh);
+	sync_dirty_buffer(iloc.bh);
+	brelse(iloc.bh);
+	iput(inode);
+	if (!ret)
+		ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
+
+	sbi->s_mount_state &= ~EXT4_FC_REPLAY;
+
+	return ret;
+}
+
 int ext4_fc_perform_hard_commit(journal_t *journal)
 {
 	struct super_block *sb = (struct super_block *)(journal->j_private);
@@ -1357,6 +1821,12 @@ int ext4_fc_async_commit_inode(journal_t *journal, tid_t commit_tid,
 
 void ext4_init_fast_commit(struct super_block *sb, journal_t *journal)
 {
+	/*
+	 * We set the replay callback even if fast commit is disabled because
+	 * there could still be fast commit blocks that need to be replayed
+	 * even though fast commit has since been turned off.
+	 */
+	journal->j_fc_replay_callback = ext4_journal_fc_replay_cb;
 	if (!ext4_should_fast_commit(sb))
 		return;
 	journal->j_fc_cleanup_callback = ext4_journal_fc_cleanup_cb;
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 8f31fd427ccc..22bacf860de8 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2736,6 +2736,28 @@ TRACE_EVENT(ext4_error,
 		  __entry->function, __entry->line)
 );
 
+TRACE_EVENT(ext4_journal_fc_replay_scan,
+	TP_PROTO(struct super_block *sb, int error, int off),
+
+	TP_ARGS(sb, error, off),
+
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(int, error)
+		__field(int, off)
+	),
+
+	TP_fast_assign(
+		__entry->dev = sb->s_dev;
+		__entry->error = error;
+		__entry->off = off;
+	),
+
+	TP_printk("FC scan pass on dev %d,%d: error %d, off %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->error, __entry->off)
+);
+
 TRACE_EVENT(ext4_journal_fc_commit_cb_start,
 	TP_PROTO(struct super_block *sb),
 

From patchwork Wed Apr 8 21:55:30 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1268356
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, Harshad Shirwadkar
Subject: [PATCH v6 20/20] ext4: add debug mount option to test fast commit replay
Date: Wed, 8 Apr 2020 14:55:30 -0700
Message-Id: <20200408215530.25649-20-harshads@google.com>
In-Reply-To: <20200408215530.25649-1-harshads@google.com>
References: <20200408215530.25649-1-harshads@google.com>

From: Harshad Shirwadkar

Add a debug mount option to simulate errors during fast commit replay.
If fc_debug_max_replay=N is set, ext4 replays only the first N fast
commit blocks and drops the rest. The option is only available when
ext4 is built with EXT4_FC_DEBUG.

Signed-off-by: Harshad Shirwadkar
---
 fs/ext4/ext4.h      |  3 +++
 fs/ext4/ext4_jbd2.c |  6 ++++++
 fs/ext4/super.c     | 12 +++++++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index df88e408d0bf..72e8b7078e77 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1660,6 +1660,9 @@ struct ext4_sb_info {
 	struct list_head s_fc_dentry_q;
 	struct ext4_fc_replay_state s_fc_replay_state;
 	spinlock_t s_fc_lock;
+#ifdef EXT4_FC_DEBUG
+	int s_fc_debug_max_replay;
+#endif
 	struct ext4_fc_stats s_fc_stats;
 };
 
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 5effa1389705..f1865a97f6f8 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -1557,6 +1557,12 @@ static int ext4_journal_fc_replay_cb(journal_t *journal, struct buffer_head *bh,
 		return sbi->s_fc_replay_state.fc_replay_error;
 	}
 
+#ifdef EXT4_FC_DEBUG
+	if (sbi->s_fc_debug_max_replay && off >= sbi->s_fc_debug_max_replay) {
+		pr_warn("Dropping fc block %d because max_replay set\n", off);
+		return -EINVAL;
+	}
+#endif
 	sbi->s_mount_state |= EXT4_FC_REPLAY;
 	fc_hdr = (struct ext4_fc_commit_hdr *)
 		  ((__u8 *)bh->b_data + sizeof(journal_header_t));
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 16548b0cbe71..995c61d19327 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1539,7 +1539,7 @@ enum {
 	Opt_dioread_nolock, Opt_dioread_lock,
 	Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
 	Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
-	Opt_no_fc, Opt_fc_soft_consistency
+	Opt_no_fc, Opt_fc_soft_consistency, Opt_fc_debug_max_replay
 };
 
 static const match_table_t tokens = {
@@ -1625,6 +1625,9 @@ static const match_table_t tokens = {
 	{Opt_noinit_itable, "noinit_itable"},
 	{Opt_no_fc, "no_fc"},
 	{Opt_fc_soft_consistency, "fc_soft_consistency"},
+#ifdef EXT4_FC_DEBUG
+	{Opt_fc_debug_max_replay, "fc_debug_max_replay=%u"},
+#endif
 	{Opt_max_dir_size_kb, "max_dir_size_kb=%u"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption"},
 	{Opt_nombcache, "nombcache"},
@@ -1844,6 +1847,9 @@ static const struct mount_opts {
 	 MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY},
 	{Opt_fc_soft_consistency, EXT4_MOUNT2_JOURNAL_FC_SOFT_CONSISTENCY,
 	 MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
+#ifdef EXT4_FC_DEBUG
+	{Opt_fc_debug_max_replay, 0, MOPT_GTE0},
+#endif
 	{Opt_err, 0, 0}
 };
 
@@ -2003,6 +2009,10 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 		sbi->s_li_wait_mult = arg;
 	} else if (token == Opt_max_dir_size_kb) {
 		sbi->s_max_dir_size_kb = arg;
+#ifdef EXT4_FC_DEBUG
+	} else if (token == Opt_fc_debug_max_replay) {
+		sbi->s_fc_debug_max_replay = arg;
+#endif
 	} else if (token == Opt_stripe) {
 		sbi->s_stripe = arg;
 	} else if (token == Opt_resuid) {
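The scan-pass checks in ext4_journal_fc_replay_scan() (magic, in-order offsets, feature bits, checksum computed with the csum field zeroed), the TLV walk in the replay callback, and the fc_debug_max_replay cutoff can be illustrated outside the kernel. What follows is a minimal userspace sketch under stated assumptions: the demo_* structures, DEMO_FC_MAGIC and every helper name are hypothetical simplifications (no journal header, no inode image between the header and the TLVs, host byte order instead of little-endian), and crc32c() is a plain bitwise CRC32C, not claimed to match ext4_chksum() bit for bit. The error codes mirror the patch, with -EUCLEAN and -EBADMSG standing in for -EFSCORRUPTED and -EFSBADCRC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux errno value; the kernel's EFSCORRUPTED is EUCLEAN */
#endif

/* Hypothetical, simplified layouts; NOT the real ext4 on-disk format. */
#define DEMO_FC_MAGIC 0xF00DCAFEu
struct demo_fc_hdr { uint32_t magic; uint16_t features, num_tlvs; uint32_t csum; };
struct demo_fc_tl { uint16_t tag, len; }; /* len = value bytes after it */
struct demo_replay_state { int expected_off, error; };

/* Plain bitwise CRC32C (reflected polynomial 0x82F63B78). */
static uint32_t crc32c(uint32_t crc, const uint8_t *p, size_t n)
{
	crc = ~crc;
	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}

/* Scan pass: same order of checks as ext4_journal_fc_replay_scan(). */
static int demo_scan(struct demo_replay_state *st, uint8_t *blk,
		     size_t blksz, int off)
{
	const size_t csum_at = offsetof(struct demo_fc_hdr, csum);
	const uint32_t zero = 0;
	struct demo_fc_hdr h;
	uint32_t csum;

	if (st->expected_off && st->error)
		return st->error; /* already known to be unusable */
	memcpy(&h, blk, sizeof(h));
	if (h.magic != DEMO_FC_MAGIC)
		return st->error = -ENOENT;
	if (off != st->expected_off)
		return st->error = -EUCLEAN; /* blocks must arrive in order */
	st->expected_off++;
	if (h.features)
		return st->error = -EOPNOTSUPP;
	/* Checksum with the csum field zeroed, then restore the field. */
	memcpy(blk + csum_at, &zero, sizeof(zero));
	csum = crc32c(0, blk, blksz);
	memcpy(blk + csum_at, &h.csum, sizeof(h.csum));
	return csum == h.csum ? 0 : (st->error = -EBADMSG);
}

/* Walk the TLVs: each step skips the record header plus its value. */
static int demo_count_tlvs(const uint8_t *blk, size_t blksz)
{
	struct demo_fc_hdr h;
	struct demo_fc_tl tl;
	size_t pos = sizeof(h);

	memcpy(&h, blk, sizeof(h));
	for (int i = 0; i < h.num_tlvs; i++) {
		if (pos + sizeof(tl) > blksz)
			return -EUCLEAN;
		memcpy(&tl, blk + pos, sizeof(tl));
		pos += sizeof(tl) + tl.len;
		if (pos > blksz)
			return -EUCLEAN;
	}
	return h.num_tlvs;
}

/* Replay-pass gate, mirroring the fc_debug_max_replay check. */
static int demo_replay_gate(const struct demo_replay_state *st,
			    int max_replay, int off)
{
	if (st->error)
		return st->error;
	return (max_replay && off >= max_replay) ? -EINVAL : 0;
}

/* Test helper: build a one-TLV block and seal its checksum. */
static void demo_build_block(uint8_t *blk, size_t blksz, uint16_t tag,
			     const void *val, uint16_t vlen)
{
	struct demo_fc_hdr h = { DEMO_FC_MAGIC, 0, 1, 0 };
	struct demo_fc_tl tl = { tag, vlen };

	assert(sizeof(h) + sizeof(tl) + vlen <= blksz);
	memset(blk, 0, blksz);
	memcpy(blk, &h, sizeof(h));
	memcpy(blk + sizeof(h), &tl, sizeof(tl));
	memcpy(blk + sizeof(h) + sizeof(tl), val, vlen);
	h.csum = crc32c(0, blk, blksz); /* computed while csum is still 0 */
	memcpy(blk + offsetof(struct demo_fc_hdr, csum), &h.csum, sizeof(h.csum));
}
```

A typical sequence builds blocks with demo_build_block(), runs demo_scan() with increasing off values, and then gates each replay with demo_replay_gate(&st, max_replay, off), so that with max_replay set to 1 only block 0 passes. One design choice differs from the patch: demo_count_tlvs() bounds-checks each record against the block size, so a corrupted length field cannot move the cursor past the end of the buffer.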