{"id":2211743,"url":"http://patchwork.ozlabs.org/api/1.2/patches/2211743/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-ext4/patch/20260317084624.457185-2-me@linux.beauty/","project":{"id":8,"url":"http://patchwork.ozlabs.org/api/1.2/projects/8/?format=json","name":"Linux ext4 filesystem development","link_name":"linux-ext4","list_id":"linux-ext4.vger.kernel.org","list_email":"linux-ext4@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20260317084624.457185-2-me@linux.beauty>","list_archive_url":null,"date":"2026-03-17T08:46:16","name":"[RFC,v5,1/7] ext4: fast commit: snapshot inode state before writing log","commit_ref":null,"pull_url":null,"state":"superseded","archived":false,"hash":"a860899ec5be43047e5172c7d55ddb43647de4f3","submitter":{"id":84264,"url":"http://patchwork.ozlabs.org/api/1.2/people/84264/?format=json","name":"Li Chen","email":"me@linux.beauty"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/linux-ext4/patch/20260317084624.457185-2-me@linux.beauty/mbox/","series":[{"id":496215,"url":"http://patchwork.ozlabs.org/api/1.2/series/496215/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-ext4/list/?series=496215","date":"2026-03-17T08:46:15","name":"ext4: fast commit: snapshot inode state for FC log","version":5,"mbox":"http://patchwork.ozlabs.org/series/496215/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/2211743/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2211743/checks/","tags":{},"related":[],"headers":{"Return-Path":"\n <SRS0=ieh2=BR=vger.kernel.org=linux-ext4+bounces-15098-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=linux.beauty header.i=me@linux.beauty\n header.a=rsa-sha256 header.s=zmail header.b=DPHeUuyX;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=150.107.74.76; helo=mail.ozlabs.org;\n envelope-from=srs0=ieh2=br=vger.kernel.org=linux-ext4+bounces-15098-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)","gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=\"2600:3c04:e001:36c::12fc:5321\"\n arc.chain=\"subspace.kernel.org:zohomail.com\"","gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=linux.beauty","gandalf.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=linux.beauty header.i=me@linux.beauty\n header.a=rsa-sha256 header.s=zmail header.b=DPHeUuyX;\n\tdkim-atps=neutral","gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c04:e001:36c::12fc:5321; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15098-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (1024-bit key) header.d=linux.beauty header.i=me@linux.beauty\n header.b=\"DPHeUuyX\"","smtp.subspace.kernel.org;\n arc=pass smtp.client-ip=136.143.188.15","smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=linux.beauty","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=linux.beauty"],"Received":["from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1 raw public key)\n server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fZlyD6vcrz1xqn\n\tfor <incoming@patchwork.ozlabs.org>; Tue, 17 Mar 2026 19:49:40 +1100 (AEDT)","from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fZlyD6cyPz4w9T\n\tfor <incoming@patchwork.ozlabs.org>; Tue, 17 Mar 2026 19:49:40 +1100 (AEDT)","by gandalf.ozlabs.org (Postfix)\n\tid 4fZlyD6Rfkz4wCm; Tue, 17 Mar 2026 19:49:40 +1100 (AEDT)","from tor.lore.kernel.org (tor.lore.kernel.org\n [IPv6:2600:3c04:e001:36c::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fZly93cyyz4w9T\n\tfor <patchwork-incoming@ozlabs.org>; Tue, 17 Mar 2026 19:49:37 +1100 (AEDT)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby tor.lore.kernel.org (Postfix) with ESMTP id 90AB130347AC\n\tfor <patchwork-incoming@ozlabs.org>; Tue, 17 Mar 2026 08:49:17 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id E565A28A1E6;\n\tTue, 17 Mar 2026 08:49:14 +0000 (UTC)","from sender4-op-o15.zoho.com (sender4-op-o15.zoho.com\n [136.143.188.15])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id C578A34DB4B;\n\tTue, 17 Mar 2026 08:49:12 +0000 (UTC)","by mx.zohomail.com with SMTPS id 1773737201936794.0160351190276;\n\tTue, 17 Mar 2026 01:46:41 -0700 (PDT)"],"ARC-Seal":["i=3; a=rsa-sha256; d=ozlabs.org; s=201707; t=1773737380; cv=pass;\n\tb=DE2UoO0Rppmgn3L7gp9yF7s6UkJBJXgsQd6O8W6ECiSWYVR9cd2U5H8h4gdD31o0qIOk9c9qeAku49R659QK2ofsWTktE3EdpJL9Cr+mZa6JvQNblGmjUzjKLiBzhhaaH3GHxgOoSsBHxxDjGFG0hmVYG0m43aG5oydMaoaKuySWvNjPE/7/IFOsg+jbsjfCHA93lw8MPkbnCfQvyyyhu3xe0qCObzzq+XsskSUOOqFvPfRiMBhhFPqZfRwEaRNFXKWDNxp7iUMlY5Vaq/0S8XMd/ehM3CIMcWBVMMsjCaaisemUaOlFbV5D4XM5Na+Hnplttz2qLhUfIH/tEdbpRw==","i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1773737354; cv=pass;\n b=nxFyz3fWyIXQb35r7sUkbUG4e2X8x/oSA9md/uHvZZ9nKA7fS3iKHuN7hdRTf8kGcQySqZenLUyzL6I4Rz6gyxFNMd34eTWJURjLotmoEN/Al34G4bnDWw3UCmtKzeMlR+QifN3OhEgHfe2ogw3U+3yu8FAZ1i3cl0UhMiGCLpg=","i=1; a=rsa-sha256; t=1773737203; cv=none;\n\td=zohomail.com; s=zohoarc;\n\tb=PXI+XJ03V1M0x1QBzMT2ZVJJ8eyWbrJ3ZOjmOr+ChfRh9MX29ElMbrbg0I/DaFHvzeb8MvWbCnPPVZpwuXe6GHAl6tdDHPaeVfnZwGy5BTwYWFHvjaxy0AaWD5g67/5udfsgTN6rtWSPV/oSBHbMoZNlgT/kkiOkS11BdGK5g+A="],"ARC-Message-Signature":["i=3; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1773737380; c=relaxed/relaxed;\n\tbh=tL/9hXvNCUQaU/c3iyjpe5tzJGEQNbiFXrrzi7CN0HU=;\n\th=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:\n\t MIME-Version;\n b=IsLIHFsk6FzV20klebrM4SmasvT1AifYgDb4ISCGhSF1L1K9UNWBdk6ZOr0jBJUncNpmSfdKdueStgwPZ2c3lIPKlJaPPLtnrTPQUHQfVwTSUsmUTvR29NJpXSgUTXgwUepnwpnxbkypdmBt2NWQI04WwLlsQIH8SUV5IZ5VosDu+7hj1HRHfZ6+p5n6Q7uoJFBijT9hIK6Tm3jVWqov5lzIUs9nF69C8mJbZ9+qQ3vPO8uF/O+7SOQ3hGim+ZP1sJsGRVUGGEhahTeU6WauTgTFLD9r9qazJXOSzpuvDTcdnabhjLUeHyo1y3Y+jQ15rskqOjQVJaqLCVyUx4svpA==","i=2; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1773737354; c=relaxed/simple;\n\tbh=2QQnnYAFsnVBSeKVvXgvRx2UHnhO/USOL8fHAb5/tgQ=;\n\th=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:\n\t MIME-Version;\n b=nVxnLjeN0gmI4BqtBHe3u4G/7B4AzWDFlLv3ZQuTgxqC+ffyogsR7PZPAUTPZUojTy51qhb41JETsZTqabgHSPfbY7+GGvwswCwvK0SD6ha2/f+DUZm4HAUW8YfX/1R8x9AnWOa5uTJI2iDnv6idPDNw5K57lsP9A9YJpG8mBQw=","i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com;\n s=zohoarc;\n\tt=1773737203;\n h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:Subject:To:To:Message-Id:Reply-To;\n\tbh=tL/9hXvNCUQaU/c3iyjpe5tzJGEQNbiFXrrzi7CN0HU=;\n\tb=PydlSFv8DyzVhV0+OXaxOd8mPsmPGPZ9xEHP4WosJ96xSyJULWiFIna0iQoeLQSDB/xeMSjFeKaAb9XGMbM1QnJwFvu6c+PBXE4p5KoRvkC92mUPvEobACxtNCsO7G5k4zIlVVG5UvSQURewLv2NsfFj6AXq4aaX2sa8az+sV5Q="],"ARC-Authentication-Results":["i=3; gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=linux.beauty;\n dkim=pass (1024-bit key;\n unprotected) header.d=linux.beauty header.i=me@linux.beauty\n header.a=rsa-sha256 header.s=zmail header.b=DPHeUuyX; dkim-atps=neutral;\n spf=pass (client-ip=2600:3c04:e001:36c::12fc:5321; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15098-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org","i=2; smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=linux.beauty;\n spf=pass smtp.mailfrom=linux.beauty;\n dkim=pass (1024-bit key) header.d=linux.beauty header.i=me@linux.beauty\n header.b=DPHeUuyX; arc=pass smtp.client-ip=136.143.188.15","i=1; mx.zohomail.com;\n\tdkim=pass  header.i=linux.beauty;\n\tspf=pass  smtp.mailfrom=me@linux.beauty;\n\tdmarc=pass header.from=<me@linux.beauty>"],"DKIM-Signature":"v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1773737203;\n\ts=zmail; d=linux.beauty; i=me@linux.beauty;\n\th=From:From:To:To:Cc:Cc:Subject:Subject:Date:Date:Message-ID:In-Reply-To:References:MIME-Version:Content-Transfer-Encoding:Message-Id:Reply-To;\n\tbh=tL/9hXvNCUQaU/c3iyjpe5tzJGEQNbiFXrrzi7CN0HU=;\n\tb=DPHeUuyX+VM3Ab0naXEIiooTWHNfHh4g8QJQ/2rhUVUHFckio8gg9bJ3XvxH8fQo\n\tVr9KMzI2ThZRQL0xtHGGAKJC6UrwrnHR8jEN/MewtI1BOeoYoIjO/vqrFHv4XTexb96\n\tNiW/wxzKv5GCf81PVAfZt257VPT+CR3F9jyBGItA=","From":"Li Chen <me@linux.beauty>","To":"Zhang Yi <yi.zhang@huaweicloud.com>,\n\tTheodore Ts'o <tytso@mit.edu>,\n\tAndreas Dilger <adilger.kernel@dilger.ca>,\n\tlinux-ext4@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org","Cc":"Steven Rostedt <rostedt@goodmis.org>,\n\tMasami Hiramatsu <mhiramat@kernel.org>,\n\tMathieu Desnoyers <mathieu.desnoyers@efficios.com>,\n\tlinux-trace-kernel@vger.kernel.org,\n\tLi Chen <me@linux.beauty>","Subject":"[RFC v5 1/7] ext4: fast commit: snapshot inode state before writing\n log","Date":"Tue, 17 Mar 2026 16:46:16 +0800","Message-ID":"<20260317084624.457185-2-me@linux.beauty>","X-Mailer":"git-send-email 2.53.0","In-Reply-To":"<20260317084624.457185-1-me@linux.beauty>","References":"<20260317084624.457185-1-me@linux.beauty>","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Transfer-Encoding":"8bit","X-ZohoMailClient":"External","X-Spam-Status":"No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tHEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,\n\tSPF_PASS autolearn=disabled version=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"},"content":"Fast commit writes inode metadata and data range updates after unlocking\njournal updates. New handles can start at that point, so the log writing\npath must not look at live inode state.\n\nAdd a commit-time per-inode snapshot and populate it while journal updates\nare locked and existing handles are drained. Store the snapshot behind\next4_inode_info->i_fc_snap so ext4_inode_info only grows by one pointer.\nThe snapshot contains a copy of the on-disk inode plus the data range\nrecords needed for fast commit TLVs.\n\nSnapshotting runs under jbd2_journal_lock_updates(). Avoid triggering I/O\nthere by using ext4_get_inode_loc_noio() and falling back to full commit\nif the inode table block is not present or not uptodate.\n\nLog writing then only serializes the snapshot, so it no longer needs to\ncall ext4_map_blocks() and take i_data_sem under s_fc_lock. The snapshot\nis installed and freed under s_fc_lock and is released from fast commit\ncleanup and inode eviction.\n\nSigned-off-by: Li Chen <me@linux.beauty>\n---\n fs/ext4/ext4.h        |  22 ++-\n fs/ext4/fast_commit.c | 330 +++++++++++++++++++++++++++++++++++-------\n fs/ext4/inode.c       |  51 +++++++\n 3 files changed, 351 insertions(+), 52 deletions(-)","diff":"diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h\nindex 1524276aeac79..bd30c24d4f948 100644\n--- a/fs/ext4/ext4.h\n+++ b/fs/ext4/ext4.h\n@@ -1033,6 +1033,7 @@ enum {\n \tI_DATA_SEM_EA\n };\n \n+struct ext4_fc_inode_snap;\n \n /*\n  * fourth extended file system inode data in memory\n@@ -1089,6 +1090,22 @@ struct ext4_inode_info {\n \t/* End of lblk range that needs to be committed in this fast commit */\n \text4_lblk_t i_fc_lblk_len;\n \n+\t/*\n+\t * Commit-time fast commit snapshots.\n+\t *\n+\t * i_fc_snap is installed and freed under sbi->s_fc_lock. The fast\n+\t * commit log writing path reads the snapshot under sbi->s_fc_lock while\n+\t * serializing fast commit TLVs.\n+\t *\n+\t * The snapshot lifetime is bounded by EXT4_STATE_FC_COMMITTING and the\n+\t * corresponding cleanup / eviction paths.\n+\t *\n+\t * i_fc_snap points to per-inode snapshot data for fast commit:\n+\t * - a raw inode snapshot for EXT4_FC_TAG_INODE\n+\t * - data range records for EXT4_FC_TAG_{ADD,DEL}_RANGE\n+\t */\n+\tstruct ext4_fc_inode_snap *i_fc_snap;\n+\n \tspinlock_t i_raw_lock;\t/* protects updates to the raw inode */\n \n \t/* Fast commit wait queue for this inode */\n@@ -3093,8 +3110,9 @@ extern int  ext4_file_getattr(struct mnt_idmap *, const struct path *,\n \t\t\t      struct kstat *, u32, unsigned int);\n extern void ext4_dirty_inode(struct inode *, int);\n extern int ext4_change_inode_journal_flag(struct inode *, int);\n-extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);\n-extern int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,\n+int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc);\n+int ext4_get_inode_loc_noio(struct inode *inode, struct ext4_iloc *iloc);\n+int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,\n \t\t\t  struct ext4_iloc *iloc);\n extern int ext4_inode_attach_jinode(struct inode *inode);\n extern int ext4_can_truncate(struct inode *inode);\ndiff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c\nindex 5bd57d7f921b9..d5c28304e8181 100644\n--- a/fs/ext4/fast_commit.c\n+++ b/fs/ext4/fast_commit.c\n@@ -55,21 +55,23 @@\n  *     deleted while it is being flushed.\n  * [2] Flush data buffers to disk and clear \"EXT4_STATE_FC_FLUSHING_DATA\"\n  *     state.\n- * [3] Lock the journal by calling jbd2_journal_lock_updates. This ensures that\n- *     all the exsiting handles finish and no new handles can start.\n- * [4] Mark all the fast commit eligible inodes as undergoing fast commit\n- *     by setting \"EXT4_STATE_FC_COMMITTING\" state.\n- * [5] Unlock the journal by calling jbd2_journal_unlock_updates. This allows\n+ * [3] Lock the journal by calling jbd2_journal_lock_updates(). This ensures\n+ *     that all the existing handles finish and no new handles can start.\n+ * [4] Mark all the fast commit eligible inodes as undergoing fast commit by\n+ *     setting \"EXT4_STATE_FC_COMMITTING\" state, and snapshot the inode state\n+ *     needed for log writing.\n+ * [5] Unlock the journal by calling jbd2_journal_unlock_updates(). This allows\n  *     starting of new handles. If new handles try to start an update on\n  *     any of the inodes that are being committed, ext4_fc_track_inode()\n  *     will block until those inodes have finished the fast commit.\n  * [6] Commit all the directory entry updates in the fast commit space.\n- * [7] Commit all the changed inodes in the fast commit space and clear\n- *     \"EXT4_STATE_FC_COMMITTING\" for these inodes.\n+ * [7] Commit all the changed inodes in the fast commit space.\n  * [8] Write tail tag (this tag ensures the atomicity, please read the following\n  *     section for more details).\n+ * [9] Clear \"EXT4_STATE_FC_COMMITTING\" and wake up waiters in\n+ *     ext4_fc_cleanup().\n  *\n- * All the inode updates must be enclosed within jbd2_jounrnal_start()\n+ * All the inode updates must be enclosed within jbd2_journal_start()\n  * and jbd2_journal_stop() similar to JBD2 journaling.\n  *\n  * Fast Commit Ineligibility\n@@ -199,6 +201,8 @@ static void ext4_end_buffer_io_sync(struct buffer_head *bh, int uptodate)\n \tunlock_buffer(bh);\n }\n \n+static void ext4_fc_free_inode_snap(struct inode *inode);\n+\n static inline void ext4_fc_reset_inode(struct inode *inode)\n {\n \tstruct ext4_inode_info *ei = EXT4_I(inode);\n@@ -215,6 +219,7 @@ void ext4_fc_init_inode(struct inode *inode)\n \text4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);\n \tINIT_LIST_HEAD(&ei->i_fc_list);\n \tINIT_LIST_HEAD(&ei->i_fc_dilist);\n+\tei->i_fc_snap = NULL;\n \tinit_waitqueue_head(&ei->i_fc_wait);\n }\n \n@@ -240,6 +245,7 @@ void ext4_fc_del(struct inode *inode)\n \n \talloc_ctx = ext4_fc_lock(inode->i_sb);\n \tif (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {\n+\t\text4_fc_free_inode_snap(inode);\n \t\text4_fc_unlock(inode->i_sb, alloc_ctx);\n \t\treturn;\n \t}\n@@ -281,6 +287,7 @@ void ext4_fc_del(struct inode *inode)\n \t\t}\n \t\tfinish_wait(wq, &wait.wq_entry);\n \t}\n+\text4_fc_free_inode_snap(inode);\n \tlist_del_init(&ei->i_fc_list);\n \n \t/*\n@@ -845,6 +852,21 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,\n \treturn true;\n }\n \n+struct ext4_fc_range {\n+\tstruct list_head list;\n+\tu16 tag;\n+\text4_lblk_t lblk;\n+\text4_lblk_t len;\n+\text4_fsblk_t pblk;\n+\tbool unwritten;\n+};\n+\n+struct ext4_fc_inode_snap {\n+\tstruct list_head data_list;\n+\tunsigned int inode_len;\n+\tu8 inode_buf[];\n+};\n+\n /*\n  * Writes inode in the fast commit space under TLV with tag @tag.\n  * Returns 0 on success, error on failure.\n@@ -852,21 +874,21 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,\n static int ext4_fc_write_inode(struct inode *inode, u32 *crc)\n {\n \tstruct ext4_inode_info *ei = EXT4_I(inode);\n-\tint inode_len = EXT4_GOOD_OLD_INODE_SIZE;\n-\tint ret;\n-\tstruct ext4_iloc iloc;\n+\tstruct ext4_fc_inode_snap *snap = ei->i_fc_snap;\n \tstruct ext4_fc_inode fc_inode;\n \tstruct ext4_fc_tl tl;\n \tu8 *dst;\n+\tu8 *src;\n+\tint inode_len;\n+\tint ret;\n \n-\tret = ext4_get_inode_loc(inode, &iloc);\n-\tif (ret)\n-\t\treturn ret;\n+\tif (!snap)\n+\t\treturn -ECANCELED;\n \n-\tif (ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA))\n-\t\tinode_len = EXT4_INODE_SIZE(inode->i_sb);\n-\telse if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE)\n-\t\tinode_len += ei->i_extra_isize;\n+\tsrc = snap->inode_buf;\n+\tinode_len = snap->inode_len;\n+\tif (!src || inode_len == 0)\n+\t\treturn -ECANCELED;\n \n \tfc_inode.fc_ino = cpu_to_le32(inode->i_ino);\n \ttl.fc_tag = cpu_to_le16(EXT4_FC_TAG_INODE);\n@@ -882,10 +904,9 @@ static int ext4_fc_write_inode(struct inode *inode, u32 *crc)\n \tdst += EXT4_FC_TAG_BASE_LEN;\n \tmemcpy(dst, &fc_inode, sizeof(fc_inode));\n \tdst += sizeof(fc_inode);\n-\tmemcpy(dst, (u8 *)ext4_raw_inode(&iloc), inode_len);\n+\tmemcpy(dst, src, inode_len);\n \tret = 0;\n err:\n-\tbrelse(iloc.bh);\n \treturn ret;\n }\n \n@@ -895,12 +916,74 @@ static int ext4_fc_write_inode(struct inode *inode, u32 *crc)\n  */\n static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)\n {\n-\text4_lblk_t old_blk_size, cur_lblk_off, new_blk_size;\n \tstruct ext4_inode_info *ei = EXT4_I(inode);\n-\tstruct ext4_map_blocks map;\n+\tstruct ext4_fc_inode_snap *snap = ei->i_fc_snap;\n \tstruct ext4_fc_add_range fc_ext;\n \tstruct ext4_fc_del_range lrange;\n \tstruct ext4_extent *ex;\n+\tstruct ext4_fc_range *range;\n+\n+\tif (!snap)\n+\t\treturn -ECANCELED;\n+\n+\tlist_for_each_entry(range, &snap->data_list, list) {\n+\t\tif (range->tag == EXT4_FC_TAG_DEL_RANGE) {\n+\t\t\tlrange.fc_ino = cpu_to_le32(inode->i_ino);\n+\t\t\tlrange.fc_lblk = cpu_to_le32(range->lblk);\n+\t\t\tlrange.fc_len = cpu_to_le32(range->len);\n+\t\t\tif (!ext4_fc_add_tlv(inode->i_sb, EXT4_FC_TAG_DEL_RANGE,\n+\t\t\t\t\t     sizeof(lrange), (u8 *)&lrange, crc))\n+\t\t\t\treturn -ENOSPC;\n+\t\t\tcontinue;\n+\t\t}\n+\n+\t\tfc_ext.fc_ino = cpu_to_le32(inode->i_ino);\n+\t\tex = (struct ext4_extent *)&fc_ext.fc_ex;\n+\t\tex->ee_block = cpu_to_le32(range->lblk);\n+\t\tex->ee_len = cpu_to_le16(range->len);\n+\t\text4_ext_store_pblock(ex, range->pblk);\n+\t\tif (range->unwritten)\n+\t\t\text4_ext_mark_unwritten(ex);\n+\t\telse\n+\t\t\text4_ext_mark_initialized(ex);\n+\n+\t\tif (!ext4_fc_add_tlv(inode->i_sb, EXT4_FC_TAG_ADD_RANGE,\n+\t\t\t\t     sizeof(fc_ext), (u8 *)&fc_ext, crc))\n+\t\t\treturn -ENOSPC;\n+\t}\n+\n+\treturn 0;\n+}\n+\n+static void ext4_fc_free_ranges(struct list_head *head)\n+{\n+\tstruct ext4_fc_range *range, *range_n;\n+\n+\tlist_for_each_entry_safe(range, range_n, head, list) {\n+\t\tlist_del(&range->list);\n+\t\tkfree(range);\n+\t}\n+}\n+\n+static void ext4_fc_free_inode_snap(struct inode *inode)\n+{\n+\tstruct ext4_inode_info *ei = EXT4_I(inode);\n+\tstruct ext4_fc_inode_snap *snap = ei->i_fc_snap;\n+\n+\tif (!snap)\n+\t\treturn;\n+\n+\text4_fc_free_ranges(&snap->data_list);\n+\tkfree(snap);\n+\tei->i_fc_snap = NULL;\n+}\n+\n+static int ext4_fc_snapshot_inode_data(struct inode *inode,\n+\t\t\t\t       struct list_head *ranges)\n+{\n+\tstruct ext4_inode_info *ei = EXT4_I(inode);\n+\text4_lblk_t start_lblk, end_lblk, cur_lblk;\n+\tstruct ext4_map_blocks map;\n \tint ret;\n \n \tspin_lock(&ei->i_fc_lock);\n@@ -908,18 +991,20 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)\n \t\tspin_unlock(&ei->i_fc_lock);\n \t\treturn 0;\n \t}\n-\told_blk_size = ei->i_fc_lblk_start;\n-\tnew_blk_size = ei->i_fc_lblk_start + ei->i_fc_lblk_len - 1;\n+\tstart_lblk = ei->i_fc_lblk_start;\n+\tend_lblk = ei->i_fc_lblk_start + ei->i_fc_lblk_len - 1;\n \tei->i_fc_lblk_len = 0;\n \tspin_unlock(&ei->i_fc_lock);\n \n-\tcur_lblk_off = old_blk_size;\n-\text4_debug(\"will try writing %d to %d for inode %ld\\n\",\n-\t\t   cur_lblk_off, new_blk_size, inode->i_ino);\n+\tcur_lblk = start_lblk;\n+\text4_debug(\"snapshot data ranges %u-%u for inode %lu\\n\",\n+\t\t   start_lblk, end_lblk, inode->i_ino);\n+\n+\twhile (cur_lblk <= end_lblk) {\n+\t\tstruct ext4_fc_range *range;\n \n-\twhile (cur_lblk_off <= new_blk_size) {\n-\t\tmap.m_lblk = cur_lblk_off;\n-\t\tmap.m_len = new_blk_size - cur_lblk_off + 1;\n+\t\tmap.m_lblk = cur_lblk;\n+\t\tmap.m_len = end_lblk - cur_lblk + 1;\n \t\tret = ext4_map_blocks(NULL, inode, &map,\n \t\t\t\t      EXT4_GET_BLOCKS_IO_SUBMIT |\n \t\t\t\t      EXT4_EX_NOCACHE);\n@@ -927,17 +1012,21 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)\n \t\t\treturn -ECANCELED;\n \n \t\tif (map.m_len == 0) {\n-\t\t\tcur_lblk_off++;\n+\t\t\tcur_lblk++;\n \t\t\tcontinue;\n \t\t}\n \n+\t\trange = kmalloc(sizeof(*range), GFP_NOFS);\n+\t\tif (!range)\n+\t\t\treturn -ENOMEM;\n+\n+\t\trange->lblk = map.m_lblk;\n+\t\trange->len = map.m_len;\n+\t\trange->pblk = 0;\n+\t\trange->unwritten = false;\n+\n \t\tif (ret == 0) {\n-\t\t\tlrange.fc_ino = cpu_to_le32(inode->i_ino);\n-\t\t\tlrange.fc_lblk = cpu_to_le32(map.m_lblk);\n-\t\t\tlrange.fc_len = cpu_to_le32(map.m_len);\n-\t\t\tif (!ext4_fc_add_tlv(inode->i_sb, EXT4_FC_TAG_DEL_RANGE,\n-\t\t\t\t\t    sizeof(lrange), (u8 *)&lrange, crc))\n-\t\t\t\treturn -ENOSPC;\n+\t\t\trange->tag = EXT4_FC_TAG_DEL_RANGE;\n \t\t} else {\n \t\t\tunsigned int max = (map.m_flags & EXT4_MAP_UNWRITTEN) ?\n \t\t\t\tEXT_UNWRITTEN_MAX_LEN : EXT_INIT_MAX_LEN;\n@@ -945,26 +1034,67 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)\n \t\t\t/* Limit the number of blocks in one extent */\n \t\t\tmap.m_len = min(max, map.m_len);\n \n-\t\t\tfc_ext.fc_ino = cpu_to_le32(inode->i_ino);\n-\t\t\tex = (struct ext4_extent *)&fc_ext.fc_ex;\n-\t\t\tex->ee_block = cpu_to_le32(map.m_lblk);\n-\t\t\tex->ee_len = cpu_to_le16(map.m_len);\n-\t\t\text4_ext_store_pblock(ex, map.m_pblk);\n-\t\t\tif (map.m_flags & EXT4_MAP_UNWRITTEN)\n-\t\t\t\text4_ext_mark_unwritten(ex);\n-\t\t\telse\n-\t\t\t\text4_ext_mark_initialized(ex);\n-\t\t\tif (!ext4_fc_add_tlv(inode->i_sb, EXT4_FC_TAG_ADD_RANGE,\n-\t\t\t\t\t    sizeof(fc_ext), (u8 *)&fc_ext, crc))\n-\t\t\t\treturn -ENOSPC;\n+\t\t\trange->tag = EXT4_FC_TAG_ADD_RANGE;\n+\t\t\trange->len = map.m_len;\n+\t\t\trange->pblk = map.m_pblk;\n+\t\t\trange->unwritten = !!(map.m_flags & EXT4_MAP_UNWRITTEN);\n \t\t}\n \n-\t\tcur_lblk_off += map.m_len;\n+\t\tINIT_LIST_HEAD(&range->list);\n+\t\tlist_add_tail(&range->list, ranges);\n+\n+\t\tcur_lblk += map.m_len;\n \t}\n \n \treturn 0;\n }\n \n+static int ext4_fc_snapshot_inode(struct inode *inode)\n+{\n+\tstruct ext4_inode_info *ei = EXT4_I(inode);\n+\tstruct ext4_fc_inode_snap *snap;\n+\tint inode_len = EXT4_GOOD_OLD_INODE_SIZE;\n+\tstruct ext4_iloc iloc;\n+\tLIST_HEAD(ranges);\n+\tint ret;\n+\tint alloc_ctx;\n+\n+\tret = ext4_get_inode_loc_noio(inode, &iloc);\n+\tif (ret)\n+\t\treturn ret;\n+\n+\tif (ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA))\n+\t\tinode_len = EXT4_INODE_SIZE(inode->i_sb);\n+\telse if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE)\n+\t\tinode_len += ei->i_extra_isize;\n+\n+\tsnap = kmalloc(struct_size(snap, inode_buf, inode_len), GFP_NOFS);\n+\tif (!snap) {\n+\t\tbrelse(iloc.bh);\n+\t\treturn -ENOMEM;\n+\t}\n+\tINIT_LIST_HEAD(&snap->data_list);\n+\tsnap->inode_len = inode_len;\n+\n+\tmemcpy(snap->inode_buf, (u8 *)ext4_raw_inode(&iloc), inode_len);\n+\tbrelse(iloc.bh);\n+\n+\tret = ext4_fc_snapshot_inode_data(inode, &ranges);\n+\tif (ret) {\n+\t\tkfree(snap);\n+\t\text4_fc_free_ranges(&ranges);\n+\t\treturn ret;\n+\t}\n+\n+\talloc_ctx = ext4_fc_lock(inode->i_sb);\n+\text4_fc_free_inode_snap(inode);\n+\tei->i_fc_snap = snap;\n+\tlist_splice_tail_init(&ranges, &snap->data_list);\n+\text4_fc_unlock(inode->i_sb, alloc_ctx);\n+\n+\treturn 0;\n+}\n+\n \n /* Flushes data of all the inodes in the commit queue. */\n static int ext4_fc_flush_data(journal_t *journal)\n@@ -1015,6 +1145,11 @@ static int ext4_fc_commit_dentry_updates(journal_t *journal, u32 *crc)\n \t\t */\n \t\tif (list_empty(&fc_dentry->fcd_dilist))\n \t\t\tcontinue;\n+\t\t/*\n+\t\t * For EXT4_FC_TAG_CREAT, fcd_dilist is linked on the created\n+\t\t * inode's i_fc_dilist list (kept singular), so we can recover the\n+\t\t * inode through it.\n+\t\t */\n \t\tei = list_first_entry(&fc_dentry->fcd_dilist,\n \t\t\t\tstruct ext4_inode_info, i_fc_dilist);\n \t\tinode = &ei->vfs_inode;\n@@ -1039,6 +1174,88 @@ static int ext4_fc_commit_dentry_updates(journal_t *journal, u32 *crc)\n \treturn 0;\n }\n \n+static int ext4_fc_snapshot_inodes(journal_t *journal)\n+{\n+\tstruct super_block *sb = journal->j_private;\n+\tstruct ext4_sb_info *sbi = EXT4_SB(sb);\n+\tstruct ext4_inode_info *iter;\n+\tstruct ext4_fc_dentry_update *fc_dentry;\n+\tstruct inode **inodes;\n+\tunsigned int nr_inodes = 0;\n+\tunsigned int i = 0;\n+\tint ret = 0;\n+\tint alloc_ctx;\n+\n+\talloc_ctx = ext4_fc_lock(sb);\n+\tlist_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list)\n+\t\tnr_inodes++;\n+\n+\tlist_for_each_entry(fc_dentry, &sbi->s_fc_dentry_q[FC_Q_MAIN], fcd_list) {\n+\t\tstruct ext4_inode_info *ei;\n+\n+\t\tif (fc_dentry->fcd_op != EXT4_FC_TAG_CREAT)\n+\t\t\tcontinue;\n+\t\tif (list_empty(&fc_dentry->fcd_dilist))\n+\t\t\tcontinue;\n+\n+\t\t/* See the comment in ext4_fc_commit_dentry_updates(). */\n+\t\tei = list_first_entry(&fc_dentry->fcd_dilist,\n+\t\t\t\t      struct ext4_inode_info, i_fc_dilist);\n+\t\tif (!list_empty(&ei->i_fc_list))\n+\t\t\tcontinue;\n+\n+\t\tnr_inodes++;\n+\t}\n+\text4_fc_unlock(sb, alloc_ctx);\n+\n+\tif (!nr_inodes)\n+\t\treturn 0;\n+\n+\tinodes = kvcalloc(nr_inodes, sizeof(*inodes), GFP_NOFS);\n+\tif (!inodes)\n+\t\treturn -ENOMEM;\n+\n+\talloc_ctx = ext4_fc_lock(sb);\n+\tlist_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {\n+\t\tinodes[i] = igrab(&iter->vfs_inode);\n+\t\tif (inodes[i])\n+\t\t\ti++;\n+\t}\n+\n+\tlist_for_each_entry(fc_dentry, &sbi->s_fc_dentry_q[FC_Q_MAIN], fcd_list) {\n+\t\tstruct ext4_inode_info *ei;\n+\n+\t\tif (fc_dentry->fcd_op != EXT4_FC_TAG_CREAT)\n+\t\t\tcontinue;\n+\t\tif (list_empty(&fc_dentry->fcd_dilist))\n+\t\t\tcontinue;\n+\n+\t\t/* See the comment in ext4_fc_commit_dentry_updates(). */\n+\t\tei = list_first_entry(&fc_dentry->fcd_dilist,\n+\t\t\t\t      struct ext4_inode_info, i_fc_dilist);\n+\t\tif (!list_empty(&ei->i_fc_list))\n+\t\t\tcontinue;\n+\n+\t\tinodes[i] = igrab(&ei->vfs_inode);\n+\t\tif (inodes[i])\n+\t\t\ti++;\n+\t}\n+\text4_fc_unlock(sb, alloc_ctx);\n+\n+\tfor (nr_inodes = 0; nr_inodes < i; nr_inodes++) {\n+\t\tret = ext4_fc_snapshot_inode(inodes[nr_inodes]);\n+\t\tif (ret)\n+\t\t\tbreak;\n+\t}\n+\n+\tfor (nr_inodes = 0; nr_inodes < i; nr_inodes++) {\n+\t\tif (inodes[nr_inodes])\n+\t\t\tiput(inodes[nr_inodes]);\n+\t}\n+\tkvfree(inodes);\n+\treturn ret;\n+}\n+\n static int ext4_fc_perform_commit(journal_t *journal)\n {\n \tstruct super_block *sb = journal->j_private;\n@@ -1111,7 +1328,11 @@ static int ext4_fc_perform_commit(journal_t *journal)\n \t\t\t\t     EXT4_STATE_FC_COMMITTING);\n \t}\n \text4_fc_unlock(sb, alloc_ctx);\n+\n+\tret = ext4_fc_snapshot_inodes(journal);\n \tjbd2_journal_unlock_updates(journal);\n+\tif (ret)\n+\t\treturn ret;\n \n \t/*\n \t * Step 5: If file system device is different from journal device,\n@@ -1308,6 +1529,7 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)\n \t\t\t\t\tstruct ext4_inode_info,\n \t\t\t\t\ti_fc_list);\n \t\tlist_del_init(&ei->i_fc_list);\n+\t\text4_fc_free_inode_snap(&ei->vfs_inode);\n \t\text4_clear_inode_state(&ei->vfs_inode,\n \t\t\t\t       EXT4_STATE_FC_COMMITTING);\n \t\tif (tid_geq(tid, ei->i_sync_tid)) {\n@@ -1343,6 +1565,14 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)\n \t\t\t\t\t     struct ext4_fc_dentry_update,\n \t\t\t\t\t     fcd_list);\n \t\tlist_del_init(&fc_dentry->fcd_list);\n+\t\tif (fc_dentry->fcd_op == EXT4_FC_TAG_CREAT &&\n+\t\t    !list_empty(&fc_dentry->fcd_dilist)) {\n+\t\t\t/* See the comment in ext4_fc_commit_dentry_updates(). */\n+\t\t\tei = list_first_entry(&fc_dentry->fcd_dilist,\n+\t\t\t\t\t      struct ext4_inode_info,\n+\t\t\t\t\t      i_fc_dilist);\n+\t\t\text4_fc_free_inode_snap(&ei->vfs_inode);\n+\t\t}\n \t\tlist_del_init(&fc_dentry->fcd_dilist);\n \n \t\trelease_dentry_name_snapshot(&fc_dentry->fcd_name);\ndiff --git a/fs/ext4/inode.c b/fs/ext4/inode.c\nindex a1c81ffdca2b9..385ff112d405e 100644\n--- a/fs/ext4/inode.c\n+++ b/fs/ext4/inode.c\n@@ -4969,6 +4969,57 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)\n \treturn ret;\n }\n \n+/*\n+ * ext4_get_inode_loc_noio() is a best-effort variant of ext4_get_inode_loc().\n+ * It looks up the inode table block in the buffer cache and returns -EAGAIN if\n+ * the block is not present or not uptodate, without starting any I/O.\n+ */\n+int ext4_get_inode_loc_noio(struct inode *inode, struct ext4_iloc *iloc)\n+{\n+\tstruct super_block *sb = inode->i_sb;\n+\tstruct ext4_group_desc *gdp;\n+\tstruct buffer_head *bh;\n+\text4_fsblk_t block;\n+\tint inodes_per_block, inode_offset;\n+\tunsigned long ino = inode->i_ino;\n+\n+\tiloc->bh = NULL;\n+\tif (ino < EXT4_ROOT_INO ||\n+\t    ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count))\n+\t\treturn -EFSCORRUPTED;\n+\n+\tiloc->block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);\n+\tgdp = ext4_get_group_desc(sb, iloc->block_group, NULL);\n+\tif (!gdp)\n+\t\treturn -EIO;\n+\n+\t/* Figure out the offset within the block group inode table. */\n+\tinodes_per_block = EXT4_SB(sb)->s_inodes_per_block;\n+\tinode_offset = ((ino - 1) % EXT4_INODES_PER_GROUP(sb));\n+\tiloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb);\n+\n+\tblock = ext4_inode_table(sb, gdp);\n+\tif (block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block) ||\n+\t    block >= ext4_blocks_count(EXT4_SB(sb)->s_es)) {\n+\t\text4_error(sb,\n+\t\t\t   \"Invalid inode table block %llu in block_group %u\",\n+\t\t\t   block, iloc->block_group);\n+\t\treturn -EFSCORRUPTED;\n+\t}\n+\tblock += inode_offset / inodes_per_block;\n+\n+\tbh = sb_find_get_block(sb, block);\n+\tif (!bh)\n+\t\treturn -EAGAIN;\n+\tif (!ext4_buffer_uptodate(bh)) {\n+\t\tbrelse(bh);\n+\t\treturn -EAGAIN;\n+\t}\n+\n+\tiloc->bh = bh;\n+\treturn 0;\n+}\n+\n \n int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,\n \t\t\t  struct ext4_iloc *iloc)\n","prefixes":["RFC","v5","1/7"]}