From patchwork Fri Aug 9 03:45:41 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1144304
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v2 01/12] ext4: add handling for extended mount options
Date: Thu, 8 Aug 2019 20:45:41 -0700
Message-Id: <20190809034552.148629-2-harshadshirwadkar@gmail.com>
X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog
In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
References: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
X-Mailing-List: linux-ext4@vger.kernel.org

We are running out of mount option bits. This patch adds handling for
s_mount_opt2 and also adds the ability to turn the fast commit feature
on and off. In order to use fast commits, a new version of e2fsprogs
needs to set the fast commit feature flag.
This also makes sure that we have fast commit compatible e2fsprogs
before starting to use the feature. The mount flag "no_fastcommit",
introduced in this patch, can be passed to disable the feature at mount
time.

Signed-off-by: Harshad Shirwadkar
Reviewed-by: Andreas Dilger
Reviewed-by: Theodore Ts'o
---
Changelog:

V2: No changes since V1
---
 fs/ext4/ext4.h | 4 ++++
 fs/ext4/super.c | 27 ++++++++++++++++++++++-----
 include/linux/jbd2.h | 5 ++++-
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index bf660aa7a9e0..becbda38b7db 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1146,6 +1146,8 @@ struct ext4_inode_info {
 #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM	0x00000008 /* User explicitly
 						specified journal checksum */
 
+#define EXT4_MOUNT2_JOURNAL_FAST_COMMIT	0x00000010 /* Journal fast commit */
+
 #define clear_opt(sb, opt)		EXT4_SB(sb)->s_mount_opt &= \
 						~EXT4_MOUNT_##opt
 #define set_opt(sb, opt)		EXT4_SB(sb)->s_mount_opt |= \
@@ -1643,6 +1645,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_COMPAT_RESIZE_INODE	0x0010
 #define EXT4_FEATURE_COMPAT_DIR_INDEX	0x0020
 #define EXT4_FEATURE_COMPAT_SPARSE_SUPER2	0x0200
+#define EXT4_FEATURE_COMPAT_FAST_COMMIT	0x0400
 
 #define EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER	0x0001
 #define EXT4_FEATURE_RO_COMPAT_LARGE_FILE	0x0002
@@ -1743,6 +1746,7 @@ EXT4_FEATURE_COMPAT_FUNCS(xattr, EXT_ATTR)
 EXT4_FEATURE_COMPAT_FUNCS(resize_inode, RESIZE_INODE)
 EXT4_FEATURE_COMPAT_FUNCS(dir_index, DIR_INDEX)
 EXT4_FEATURE_COMPAT_FUNCS(sparse_super2, SPARSE_SUPER2)
+EXT4_FEATURE_COMPAT_FUNCS(fast_commit, FAST_COMMIT)
 
 EXT4_FEATURE_RO_COMPAT_FUNCS(sparse_super, SPARSE_SUPER)
 EXT4_FEATURE_RO_COMPAT_FUNCS(large_file, LARGE_FILE)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4079605d437a..e376ac040cce 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1455,6 +1455,7 @@ enum {
 	Opt_dioread_nolock, Opt_dioread_lock,
 	Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
 	Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
+	Opt_no_fastcommit
 };
 
 static const match_table_t tokens = {
@@ -1537,6 +1538,7 @@ static const match_table_t tokens = {
 	{Opt_init_itable, "init_itable=%u"},
 	{Opt_init_itable, "init_itable"},
 	{Opt_noinit_itable, "noinit_itable"},
+	{Opt_no_fastcommit, "no_fastcommit"},
 	{Opt_max_dir_size_kb, "max_dir_size_kb=%u"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption"},
 	{Opt_nombcache, "nombcache"},
@@ -1659,6 +1661,7 @@ static int clear_qf_name(struct super_block *sb, int qtype)
 #define MOPT_NO_EXT3	0x0200
 #define MOPT_EXT4_ONLY	(MOPT_NO_EXT2 | MOPT_NO_EXT3)
 #define MOPT_STRING	0x0400
+#define MOPT_2		0x0800
 
 static const struct mount_opts {
 	int token;
@@ -1751,6 +1754,8 @@ static const struct mount_opts {
 	{Opt_max_dir_size_kb, 0, MOPT_GTE0},
 	{Opt_test_dummy_encryption, 0, MOPT_GTE0},
 	{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
+	{Opt_no_fastcommit, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
+	 MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY},
 	{Opt_err, 0, 0}
 };
 
@@ -1858,8 +1863,9 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 			set_opt2(sb, EXPLICIT_DELALLOC);
 		} else if (m->mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) {
 			set_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM);
-		} else
+		} else if (m->mount_opt) {
 			return -1;
+		}
 	}
 	if (m->flags & MOPT_CLEAR_ERR)
 		clear_opt(sb, ERRORS_MASK);
@@ -2027,10 +2033,17 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 			WARN_ON(1);
 			return -1;
 		}
-		if (arg != 0)
-			sbi->s_mount_opt |= m->mount_opt;
-		else
-			sbi->s_mount_opt &= ~m->mount_opt;
+		if (m->flags & MOPT_2) {
+			if (arg != 0)
+				sbi->s_mount_opt2 |= m->mount_opt;
+			else
+				sbi->s_mount_opt2 &= ~m->mount_opt;
+		} else {
+			if (arg != 0)
+				sbi->s_mount_opt |= m->mount_opt;
+			else
+				sbi->s_mount_opt &= ~m->mount_opt;
+		}
 	}
 	return 1;
 }
@@ -3733,6 +3746,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 #ifdef CONFIG_EXT4_FS_POSIX_ACL
 	set_opt(sb, POSIX_ACL);
 #endif
+	if (ext4_has_feature_fast_commit(sb))
+		set_opt2(sb, JOURNAL_FAST_COMMIT);
+
 	/* don't forget to enable journal_csum when metadata_csum is enabled. */
 	if (ext4_has_metadata_csum(sb))
 		set_opt(sb, JOURNAL_CHECKSUM);
@@ -4334,6 +4350,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM;
 		clear_opt(sb, JOURNAL_CHECKSUM);
 		clear_opt(sb, DATA_FLAGS);
+		clear_opt2(sb, JOURNAL_FAST_COMMIT);
 		sbi->s_journal = NULL;
 		needs_recovery = 0;
 		goto no_journal;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index df03825ad1a1..b7eed49b8ecd 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -288,6 +288,7 @@ typedef struct journal_superblock_s
 #define JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT	0x00000004
 #define JBD2_FEATURE_INCOMPAT_CSUM_V2	0x00000008
 #define JBD2_FEATURE_INCOMPAT_CSUM_V3	0x00000010
+#define JBD2_FEATURE_INCOMPAT_FAST_COMMIT	0x00000020
 
 /* See "journal feature predicate functions" below */
 
@@ -298,7 +299,8 @@ typedef struct journal_superblock_s
 					JBD2_FEATURE_INCOMPAT_64BIT | \
 					JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT | \
 					JBD2_FEATURE_INCOMPAT_CSUM_V2 | \
-					JBD2_FEATURE_INCOMPAT_CSUM_V3)
+					JBD2_FEATURE_INCOMPAT_CSUM_V3 | \
+					JBD2_FEATURE_INCOMPAT_FAST_COMMIT)
 
 #ifdef __KERNEL__
 
@@ -1235,6 +1237,7 @@ JBD2_FEATURE_INCOMPAT_FUNCS(64bit, 64BIT)
 JBD2_FEATURE_INCOMPAT_FUNCS(async_commit, ASYNC_COMMIT)
 JBD2_FEATURE_INCOMPAT_FUNCS(csum2, CSUM_V2)
 JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3)
+JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT)
 
 /*
  * Journal flag definitions

From patchwork Fri Aug 9 03:45:42 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1144305
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v2 02/12] jbd2: add fast commit fields to journal_s structure
Date: Thu, 8 Aug 2019 20:45:42 -0700
Message-Id: <20190809034552.148629-3-harshadshirwadkar@gmail.com>
X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog
In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
References: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
X-Mailing-List: linux-ext4@vger.kernel.org

For fast commits, JBD2 currently allocates a default of 128 blocks at
the end of the journalling area. Although JBD2 owns these blocks, it
doesn't control what exactly is written in them. It just provides the
right abstraction for making these blocks usable by file systems. This
patch adds the necessary fields to manage these fast commit blocks.
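
To make the reserved region concrete, here is a minimal userspace
sketch of how a journal block range could be split. It is not part of
this patch: struct fc_layout and fc_layout_split() are hypothetical
names used only for illustration, and the arithmetic simply follows
what journal_reset() does in patch 03 of this series.

```c
#include <assert.h>

#define JBD2_FAST_COMMIT_BLOCKS 128

/* Hypothetical stand-in for the block-range fields of struct journal_s. */
struct fc_layout {
	unsigned long j_first;    /* start of the normal journalling area */
	unsigned long j_last;     /* end of the normal journalling area */
	unsigned long j_first_fc; /* first fast commit block */
	unsigned long j_last_fc;  /* last fast commit block */
};

/*
 * Reserve the tail of the range [first, last] for fast commits, using
 * the same arithmetic as journal_reset() in patch 03.
 */
static struct fc_layout fc_layout_split(unsigned long first, unsigned long last)
{
	struct fc_layout l;

	/* Assumption: the journal must leave room for normal commits. */
	assert(last > first + JBD2_FAST_COMMIT_BLOCKS);

	l.j_first = first;
	l.j_last_fc = last;
	l.j_last = last - JBD2_FAST_COMMIT_BLOCKS;
	l.j_first_fc = l.j_last + 1;
	return l;
}
```

For example, fc_layout_split(1, 1024) leaves the normal area ending at
block 896 and the fast commit area covering blocks 897 to 1024.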

Signed-off-by: Harshad Shirwadkar
---
Changelog:

V2: Added struct transaction_run_stats_s * argument to
    j_fc_commit_callback to collect stats
---
 include/linux/jbd2.h | 79 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b7eed49b8ecd..9a750b732241 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -66,6 +66,7 @@ void __jbd2_debug(int level, const char *file, const char *func,
 extern void *jbd2_alloc(size_t size, gfp_t flags);
 extern void jbd2_free(void *ptr, size_t size);
 
+#define JBD2_FAST_COMMIT_BLOCKS 128
 #define JBD2_MIN_JOURNAL_BLOCKS 1024
 
 #ifdef __KERNEL__
@@ -918,6 +919,34 @@ struct journal_s
 	 */
 	unsigned long j_last;
 
+	/**
+	 * @j_first_fc:
+	 *
+	 * The block number of the first fast commit block in the journal
+	 */
+	unsigned long j_first_fc;
+
+	/**
+	 * @j_fc_off:
+	 *
+	 * Journal fc block iterator
+	 */
+	unsigned long j_fc_off;
+
+	/**
+	 * @j_last_fc:
+	 *
+	 * The block number of the last fast commit block in the journal
+	 */
+	unsigned long j_last_fc;
+
+	/**
+	 * @j_do_full_commit:
+	 *
+	 * Force a full commit. If this flag is set JBD2 won't try fast commits
+	 */
+	bool j_do_full_commit;
+
 	/**
 	 * @j_dev: Device where we store the journal.
 	 */
@@ -987,6 +1016,15 @@ struct journal_s
 	 */
 	tid_t j_transaction_sequence;
 
+	/**
+	 * @j_subtid:
+	 *
+	 * One plus the sequence number of the most recently committed fast
+	 * commit. This represents the sub transaction ID for the next fast
+	 * commit.
+	 */
+	tid_t j_subtid;
+
 	/**
 	 * @j_commit_sequence:
 	 *
@@ -1068,6 +1106,20 @@ struct journal_s
 	 */
 	int j_wbufsize;
 
+	/**
+	 * @j_fc_wbuf:
+	 *
+	 * Array of bhs for fast commit transactions
+	 */
+	struct buffer_head **j_fc_wbuf;
+
+	/**
+	 * @j_fc_wbufsize:
+	 *
+	 * Size of @j_fc_wbuf array.
+	 */
+	int j_fc_wbufsize;
+
 	/**
 	 * @j_last_sync_writer:
 	 *
@@ -1167,6 +1219,33 @@ struct journal_s
 	 */
 	struct lockdep_map j_trans_commit_map;
 #endif
+	/**
+	 * @j_fc_commit_callback:
+	 *
+	 * File-system specific function that performs actual fast commit
+	 * operation. Should return 0 if the fast commit was successful, in that
+	 * case, JBD2 will just increment journal->j_subtid and move on. If it
+	 * returns < 0, JBD2 will fall-back to full commit.
+	 */
+	int (*j_fc_commit_callback)(struct journal_s *journal, tid_t tid,
+				    tid_t subtid,
+				    struct transaction_run_stats_s *stats);
+	/**
+	 * @j_fc_replay_callback:
+	 *
+	 * File-system specific function that performs replay of a fast
+	 * commit. JBD2 calls this function for each fast commit block found in
+	 * the journal.
+	 */
+	int (*j_fc_replay_callback)(struct journal_s *journal,
+				    struct buffer_head *bh);
+	/**
+	 * @j_fc_cleanup_callback:
+	 *
+	 * Clean-up after fast commit or full commit. JBD2 calls this function
+	 * after every commit operation.
+	 */
+	void (*j_fc_cleanup_callback)(struct journal_s *journal);
 };
 
 #define jbd2_might_wait_for_commit(j) \

From patchwork Fri Aug 9 03:45:43 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1144306
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v2 03/12] jbd2: fast commit setup and enable
Date: Thu, 8 Aug 2019 20:45:43 -0700
Message-Id: <20190809034552.148629-4-harshadshirwadkar@gmail.com>
X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog
In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
References: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
X-Mailing-List: linux-ext4@vger.kernel.org

This patch allows file systems to turn fast commits on and thereby
restrict the normal journalling space to the total number of journal
blocks minus JBD2_FAST_COMMIT_BLOCKS.
Fast commits are not actually performed yet; this patch only opens the
interface for turning them on.

Signed-off-by: Harshad Shirwadkar
---
Changelog:

V2: No changes since V1
---
 fs/ext4/super.c | 3 ++-
 fs/jbd2/journal.c | 39 ++++++++++++++++++++++++++++++++-------
 fs/ocfs2/journal.c | 4 ++--
 include/linux/jbd2.h | 2 +-
 4 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e376ac040cce..81c3ec165822 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4933,7 +4933,8 @@ static int ext4_load_journal(struct super_block *sb,
 		if (save)
 			memcpy(save, ((char *) es) +
 			       EXT4_S_ERR_START, EXT4_S_ERR_LEN);
-		err = jbd2_journal_load(journal);
+		err = jbd2_journal_load(journal,
+					test_opt2(sb, JOURNAL_FAST_COMMIT));
 		if (save)
 			memcpy(((char *) es) + EXT4_S_ERR_START,
 			       save, EXT4_S_ERR_LEN);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 953990eb70a9..59ad709154a3 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1159,12 +1159,15 @@ static journal_t *journal_init_common(struct block_device *bdev,
 	journal->j_blk_offset = start;
 	journal->j_maxlen = len;
 	n = journal->j_blocksize / sizeof(journal_block_tag_t);
-	journal->j_wbufsize = n;
+	journal->j_wbufsize = n - JBD2_FAST_COMMIT_BLOCKS;
 	journal->j_wbuf = kmalloc_array(n, sizeof(struct buffer_head *),
 					GFP_KERNEL);
 	if (!journal->j_wbuf)
 		goto err_cleanup;
 
+	journal->j_fc_wbuf = &journal->j_wbuf[journal->j_wbufsize];
+	journal->j_fc_wbufsize = JBD2_FAST_COMMIT_BLOCKS;
+
 	bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize);
 	if (!bh) {
 		pr_err("%s: Cannot get buffer for journal superblock\n",
@@ -1297,11 +1300,19 @@ static int journal_reset(journal_t *journal)
 	}
 
 	journal->j_first = first;
-	journal->j_last = last;
 
-	journal->j_head = first;
-	journal->j_tail = first;
-	journal->j_free = last - first;
+	if (jbd2_has_feature_fast_commit(journal)) {
+		journal->j_last_fc = last;
+		journal->j_last = last - JBD2_FAST_COMMIT_BLOCKS;
+		journal->j_first_fc = journal->j_last + 1;
+		journal->j_fc_off = 0;
+	} else {
+		journal->j_last = last;
+	}
+
+	journal->j_head = journal->j_first;
+	journal->j_tail = journal->j_first;
+	journal->j_free = journal->j_last - journal->j_first;
 
 	journal->j_tail_sequence = journal->j_transaction_sequence;
 	journal->j_commit_sequence = journal->j_transaction_sequence - 1;
@@ -1626,9 +1637,17 @@ static int load_superblock(journal_t *journal)
 	journal->j_tail_sequence = be32_to_cpu(sb->s_sequence);
 	journal->j_tail = be32_to_cpu(sb->s_start);
 	journal->j_first = be32_to_cpu(sb->s_first);
-	journal->j_last = be32_to_cpu(sb->s_maxlen);
 	journal->j_errno = be32_to_cpu(sb->s_errno);
 
+	if (jbd2_has_feature_fast_commit(journal)) {
+		journal->j_last_fc = be32_to_cpu(sb->s_maxlen);
+		journal->j_last = journal->j_last_fc - JBD2_FAST_COMMIT_BLOCKS;
+		journal->j_first_fc = journal->j_last + 1;
+		journal->j_fc_off = 0;
+	} else {
+		journal->j_last = be32_to_cpu(sb->s_maxlen);
+	}
+
 	return 0;
 }
 
@@ -1641,7 +1660,7 @@ static int load_superblock(journal_t *journal)
  * a journal, read the journal from disk to initialise the in-memory
  * structures.
  */
-int jbd2_journal_load(journal_t *journal)
+int jbd2_journal_load(journal_t *journal, bool enable_fc)
 {
 	int err;
 	journal_superblock_t *sb;
@@ -1684,6 +1703,12 @@ int jbd2_journal_load(journal_t *journal)
 		return -EFSCORRUPTED;
 	}
 
+	if (enable_fc)
+		jbd2_journal_set_features(journal, 0, 0,
+					  JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
+	else
+		jbd2_journal_clear_features(journal, 0, 0,
+					    JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
 	/* OK, we've finished with the dynamic journal bits:
 	 * reinitialise the dynamic contents of the superblock in memory
 	 * and reset them on disk. */
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 930e3d388579..3b4d91b16e8e 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1057,7 +1057,7 @@ int ocfs2_journal_load(struct ocfs2_journal *journal, int local, int replayed)
 
 	osb = journal->j_osb;
 
-	status = jbd2_journal_load(journal->j_journal);
+	status = jbd2_journal_load(journal->j_journal, false);
 	if (status < 0) {
 		mlog(ML_ERROR, "Failed to load journal!\n");
 		goto done;
@@ -1642,7 +1642,7 @@ static int ocfs2_replay_journal(struct ocfs2_super *osb,
 		goto done;
 	}
 
-	status = jbd2_journal_load(journal);
+	status = jbd2_journal_load(journal, false);
 	if (status < 0) {
 		mlog_errno(status);
 		if (!igrab(inode))
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 9a750b732241..153840b422cc 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1476,7 +1476,7 @@ extern int jbd2_journal_set_features
 		   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern void jbd2_journal_clear_features
 		   (journal_t *, unsigned long, unsigned long, unsigned long);
-extern int jbd2_journal_load (journal_t *journal);
+extern int jbd2_journal_load(journal_t *journal, bool enable_fc);
 extern int jbd2_journal_destroy (journal_t *);
 extern int jbd2_journal_recover (journal_t *journal);
 extern int jbd2_journal_wipe (journal_t *, int);

From patchwork Fri Aug 9 03:45:44 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1144307
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v2 04/12] jbd2: fast-commit commit path changes
Date: Thu, 8 Aug 2019 20:45:44 -0700
Message-Id: <20190809034552.148629-5-harshadshirwadkar@gmail.com>
X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog
In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
References: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
X-Mailing-List: linux-ext4@vger.kernel.org

This patch adds the core fast commit changes to the commit path and
modifies existing JBD2 APIs to allow the use of fast commits. If fast
commits are enabled and journal->j_do_full_commit is not set, the
commit routine tries the file system specific fast commit first and
falls back to a full commit only if that fails. The commit start and
wait APIs now take an additional argument that indicates whether fast
commits are allowed.

This patch also adds a new entry to journal->j_stats that counts the
number of fast commits performed.

Signed-off-by: Harshad Shirwadkar
---
Changelog:

V2: The JBD2 commit routine passes stats to the fast commit callback.
    Also added a new entry to journal->j_stats and its tracking.
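
Note for readers: the decision made at the top of the commit routine
can be summarised by the userspace sketch below. It is an illustration
only, not the kernel code. struct mock_journal, mock_commit(), fc_ok()
and fc_fail() are hypothetical names, locking and the cleanup callback
are left out, and the full commit is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the journal fields used here. */
struct mock_journal {
	bool fc_enabled;               /* jbd2_has_feature_fast_commit() */
	bool do_full_commit;           /* journal->j_do_full_commit */
	unsigned int subtid;           /* journal->j_subtid */
	unsigned int num_fast_commits; /* fast commits performed */
	unsigned int num_full_commits; /* full commits performed */
	int (*fc_commit_callback)(struct mock_journal *j); /* 0 on success */
};

/* Returns true if a fast commit was performed, false for a full commit. */
static bool mock_commit(struct mock_journal *j, bool fc_allowed)
{
	assert(j != NULL);

	if (j->fc_enabled) {
		if (!j->do_full_commit && fc_allowed &&
		    j->fc_commit_callback && j->fc_commit_callback(j) == 0) {
			/* Fast commit succeeded: bump the sub transaction ID. */
			j->subtid++;
			j->num_fast_commits++;
			return true;
		}
		/* Falling back: reset fast commit state before a full commit. */
		j->subtid = 0;
		j->do_full_commit = false;
	}

	j->num_full_commits++; /* placeholder for the real full commit */
	return false;
}

/* Hypothetical file system callbacks: one succeeds, one fails. */
static int fc_ok(struct mock_journal *j)   { (void)j; return 0; }
static int fc_fail(struct mock_journal *j) { (void)j; return -1; }
```

With fc_ok() installed and fast commits allowed, mock_commit() takes
the fast path and increments subtid. With fc_fail(), or with fast
commits disallowed by the caller, it resets subtid and counts a full
commit instead.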
--- fs/ext4/super.c | 2 +- fs/jbd2/checkpoint.c | 2 +- fs/jbd2/commit.c | 47 +++++++++++++++++++++++-- fs/jbd2/journal.c | 81 +++++++++++++++++++++++++++++++++++-------- fs/jbd2/transaction.c | 6 ++-- fs/ocfs2/alloc.c | 2 +- fs/ocfs2/super.c | 2 +- include/linux/jbd2.h | 9 +++-- 8 files changed, 124 insertions(+), 27 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 81c3ec165822..6bab59ae81f7 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5148,7 +5148,7 @@ static int ext4_sync_fs(struct super_block *sb, int wait) !jbd2_trans_will_send_data_barrier(sbi->s_journal, target)) needs_barrier = true; - if (jbd2_journal_start_commit(sbi->s_journal, &target)) { + if (jbd2_journal_start_commit(sbi->s_journal, &target, true)) { if (wait) ret = jbd2_log_wait_commit(sbi->s_journal, target); diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index a1909066bde6..6297978ae3bc 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -277,7 +277,7 @@ int jbd2_log_do_checkpoint(journal_t *journal) if (batch_count) __flush_batch(journal, &batch_count); - jbd2_log_start_commit(journal, tid); + jbd2_log_start_commit(journal, tid, true); /* * jbd2_journal_commit_transaction() may want * to take the checkpoint_mutex if JBD2_FLUSHED diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 132fb92098c7..9281814606e7 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -351,8 +351,12 @@ static void jbd2_block_tag_csum_set(journal_t *j, journal_block_tag_t *tag, * * The primary function for committing a transaction to the log. This * function is called by the journal thread to begin a complete commit. + * + * fc is input / output parameter. If fc is non-null and is set to true, this + * function tries to perform fast commit. If the fast commit is successfully + * performed, *fc is set to true. 
*/ -void jbd2_journal_commit_transaction(journal_t *journal) +void jbd2_journal_commit_transaction(journal_t *journal, bool *fc) { struct transaction_stats_s stats; transaction_t *commit_transaction; @@ -380,6 +384,7 @@ void jbd2_journal_commit_transaction(journal_t *journal) tid_t first_tid; int update_tail; int csum_size = 0; + bool full_commit; LIST_HEAD(io_bufs); LIST_HEAD(log_bufs); @@ -413,6 +418,40 @@ void jbd2_journal_commit_transaction(journal_t *journal) J_ASSERT(journal->j_running_transaction != NULL); J_ASSERT(journal->j_committing_transaction == NULL); + read_lock(&journal->j_state_lock); + full_commit = journal->j_do_full_commit; + read_unlock(&journal->j_state_lock); + + /* Let file-system try its own fast commit */ + if (jbd2_has_feature_fast_commit(journal)) { + if (!full_commit && fc && *fc == true && + journal->j_fc_commit_callback && + !journal->j_fc_commit_callback( + journal, journal->j_running_transaction->t_tid, + journal->j_subtid, &stats.run)) { + jbd_debug(3, "fast commit success.\n"); + if (journal->j_fc_cleanup_callback) + journal->j_fc_cleanup_callback(journal); + write_lock(&journal->j_state_lock); + journal->j_subtid++; + if (fc) + *fc = true; + write_unlock(&journal->j_state_lock); + goto update_overall_stats; + } + if (journal->j_fc_cleanup_callback) + journal->j_fc_cleanup_callback(journal); + write_lock(&journal->j_state_lock); + journal->j_fc_off = 0; + journal->j_subtid = 0; + journal->j_do_full_commit = false; + write_unlock(&journal->j_state_lock); + } + + jbd_debug(3, "fast commit not performed, trying full.\n"); + if (fc) + *fc = false; + commit_transaction = journal->j_running_transaction; trace_jbd2_start_commit(journal, commit_transaction); @@ -1129,8 +1168,12 @@ void jbd2_journal_commit_transaction(journal_t *journal) /* * Calculate overall stats */ +update_overall_stats: spin_lock(&journal->j_history_lock); - journal->j_stats.ts_tid++; + if (fc && *fc == true) + journal->j_stats.ts_num_fast_commits++; + else + 
journal->j_stats.ts_tid++; journal->j_stats.ts_requested += stats.ts_requested; journal->j_stats.run.rs_wait += stats.run.rs_wait; journal->j_stats.run.rs_request_delay += stats.run.rs_request_delay; diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 59ad709154a3..ab05e47ed2d4 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -160,7 +160,13 @@ static void commit_timeout(struct timer_list *t) * * 1) COMMIT: Every so often we need to commit the current state of the * filesystem to disk. The journal thread is responsible for writing - * all of the metadata buffers to disk. + * all of the metadata buffers to disk. If fast commits are allowed, + * journal thread passes the control to the file system and file system + * is then responsible for writing metadata buffers to disk (in whichever + * format it wants). If fast commit succeeds, journal thread won't perform + * a normal commit. In case the fast commit fails, journal thread performs + * full commit as normal. + * * * 2) CHECKPOINT: We cannot reuse a used section of the log file until all * of the data in that part of the log has been rewritten elsewhere on @@ -172,6 +178,7 @@ static int kjournald2(void *arg) { journal_t *journal = arg; transaction_t *transaction; + bool fc_flag = true, fc_flag_save; /* * Set up an interval timer which can be used to trigger a commit wakeup @@ -209,9 +216,14 @@ static int kjournald2(void *arg) jbd_debug(1, "OK, requests differ\n"); write_unlock(&journal->j_state_lock); del_timer_sync(&journal->j_commit_timer); - jbd2_journal_commit_transaction(journal); + fc_flag_save = fc_flag; + jbd2_journal_commit_transaction(journal, &fc_flag); write_lock(&journal->j_state_lock); - goto loop; + if (!fc_flag) { + /* fast commit not performed */ + fc_flag = fc_flag_save; + goto loop; + } } wake_up(&journal->j_wait_done_commit); @@ -235,16 +247,18 @@ static int kjournald2(void *arg) prepare_to_wait(&journal->j_wait_commit, &wait, TASK_INTERRUPTIBLE); - if (journal->j_commit_sequence !=
journal->j_commit_request) + if (!fc_flag && + journal->j_commit_sequence != journal->j_commit_request) should_sleep = 0; transaction = journal->j_running_transaction; if (transaction && time_after_eq(jiffies, - transaction->t_expires)) + transaction->t_expires)) should_sleep = 0; if (journal->j_flags & JBD2_UNMOUNT) should_sleep = 0; if (should_sleep) { write_unlock(&journal->j_state_lock); + jbd_debug(1, "%s sleeps\n", __func__); schedule(); write_lock(&journal->j_state_lock); } @@ -259,7 +273,10 @@ static int kjournald2(void *arg) transaction = journal->j_running_transaction; if (transaction && time_after_eq(jiffies, transaction->t_expires)) { journal->j_commit_request = transaction->t_tid; + fc_flag = false; jbd_debug(1, "woke because of timeout\n"); + } else { + fc_flag = true; } goto loop; @@ -517,11 +534,17 @@ int __jbd2_log_start_commit(journal_t *journal, tid_t target) return 0; } -int jbd2_log_start_commit(journal_t *journal, tid_t tid) +int jbd2_log_start_commit(journal_t *journal, tid_t tid, bool full_commit) { int ret; write_lock(&journal->j_state_lock); + /* + * If someone has already requested a full commit, + * we have to honor it. 
+ */ + if (!journal->j_do_full_commit) + journal->j_do_full_commit = full_commit; ret = __jbd2_log_start_commit(journal, tid); write_unlock(&journal->j_state_lock); return ret; @@ -556,7 +579,7 @@ static int __jbd2_journal_force_commit(journal_t *journal) tid = transaction->t_tid; read_unlock(&journal->j_state_lock); if (need_to_start) - jbd2_log_start_commit(journal, tid); + jbd2_log_start_commit(journal, tid, true); ret = jbd2_log_wait_commit(journal, tid); if (!ret) ret = 1; @@ -603,11 +626,14 @@ int jbd2_journal_force_commit(journal_t *journal) * if a transaction is going to be committed (or is currently already * committing), and fills its tid in at *ptid */ -int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid) +int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid, bool full_commit) { int ret = 0; write_lock(&journal->j_state_lock); + if (!journal->j_do_full_commit) + journal->j_do_full_commit = full_commit; + if (journal->j_running_transaction) { tid_t tid = journal->j_running_transaction->t_tid; @@ -675,7 +701,7 @@ EXPORT_SYMBOL(jbd2_trans_will_send_data_barrier); * Wait for a specified commit to complete. * The caller may not hold the journal lock. */ -int jbd2_log_wait_commit(journal_t *journal, tid_t tid) +int __jbd2_log_wait_commit(journal_t *journal, tid_t tid, tid_t subtid) { int err = 0; @@ -702,12 +728,25 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) } #endif while (tid_gt(tid, journal->j_commit_sequence)) { - jbd_debug(1, "JBD2: want %u, j_commit_sequence=%u\n", - tid, journal->j_commit_sequence); + if ((!journal->j_do_full_commit) && + !tid_geq(subtid, journal->j_subtid)) + break; + jbd_debug(1, "JBD2: want full commit %u %s %u, ", + tid, journal->j_do_full_commit ? 
+ "and ignoring fast commit request for " : + "or want fast commit", + journal->j_subtid); + jbd_debug(1, "j_commit_sequence=%u, j_subtid=%u\n", + journal->j_commit_sequence, journal->j_subtid); read_unlock(&journal->j_state_lock); wake_up(&journal->j_wait_commit); - wait_event(journal->j_wait_done_commit, - !tid_gt(tid, journal->j_commit_sequence)); + if (journal->j_do_full_commit) + wait_event(journal->j_wait_done_commit, + !tid_gt(tid, journal->j_commit_sequence)); + else + wait_event(journal->j_wait_done_commit, + !tid_gt(tid, journal->j_commit_sequence) || + !tid_geq(subtid, journal->j_subtid)); read_lock(&journal->j_state_lock); } read_unlock(&journal->j_state_lock); @@ -717,6 +756,13 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) return err; } +int jbd2_log_wait_commit(journal_t *journal, tid_t tid) +{ + journal->j_do_full_commit = true; + return __jbd2_log_wait_commit(journal, tid, 0); +} + + /* Return 1 when transaction with given tid has already committed. */ int jbd2_transaction_committed(journal_t *journal, tid_t tid) { @@ -751,7 +797,7 @@ int jbd2_complete_transaction(journal_t *journal, tid_t tid) if (journal->j_commit_request != tid) { /* transaction not yet started, so request it */ read_unlock(&journal->j_state_lock); - jbd2_log_start_commit(journal, tid); + jbd2_log_start_commit(journal, tid, true); goto wait_commit; } } else if (!(journal->j_committing_transaction && @@ -996,6 +1042,8 @@ static int jbd2_seq_info_show(struct seq_file *seq, void *v) "each up to %u blocks\n", s->stats->ts_tid, s->stats->ts_requested, s->journal->j_max_transaction_buffers); + seq_printf(seq, "%lu fast commits performed\n", + s->stats->ts_num_fast_commits); if (s->stats->ts_tid == 0) return 0; seq_printf(seq, "average: \n %ums waiting for transaction\n", @@ -1020,6 +1068,9 @@ static int jbd2_seq_info_show(struct seq_file *seq, void *v) s->stats->run.rs_blocks / s->stats->ts_tid); seq_printf(seq, " %lu logged blocks per transaction\n", 
s->stats->run.rs_blocks_logged / s->stats->ts_tid); + seq_printf(seq, " %lu logged blocks per commit\n", + s->stats->run.rs_blocks_logged / + (s->stats->ts_tid + s->stats->ts_num_fast_commits)); return 0; } @@ -1741,7 +1792,7 @@ int jbd2_journal_destroy(journal_t *journal) /* Force a final log commit */ if (journal->j_running_transaction) - jbd2_journal_commit_transaction(journal); + jbd2_journal_commit_transaction(journal, NULL); /* Force any old transactions to disk */ diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 990e7b5062e7..87f6627d78aa 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -154,7 +154,7 @@ static void wait_transaction_locked(journal_t *journal) need_to_start = !tid_geq(journal->j_commit_request, tid); read_unlock(&journal->j_state_lock); if (need_to_start) - jbd2_log_start_commit(journal, tid); + jbd2_log_start_commit(journal, tid, true); jbd2_might_wait_for_commit(journal); schedule(); finish_wait(&journal->j_wait_transaction_locked, &wait); @@ -708,7 +708,7 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask) need_to_start = !tid_geq(journal->j_commit_request, tid); read_unlock(&journal->j_state_lock); if (need_to_start) - jbd2_log_start_commit(journal, tid); + jbd2_log_start_commit(journal, tid, true); rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_); handle->h_buffer_credits = nblocks; @@ -1822,7 +1822,7 @@ int jbd2_journal_stop(handle_t *handle) jbd_debug(2, "transaction too old, requesting commit for " "handle %p\n", handle); /* This is non-blocking */ - jbd2_log_start_commit(journal, transaction->t_tid); + jbd2_log_start_commit(journal, transaction->t_tid, true); /* * Special case: JBD2_SYNC synchronous updates require us diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c index 0c335b51043d..df41c43573b7 100644 --- a/fs/ocfs2/alloc.c +++ b/fs/ocfs2/alloc.c @@ -6117,7 +6117,7 @@ int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb, goto out; } - if 
(jbd2_journal_start_commit(osb->journal->j_journal, &target)) { + if (jbd2_journal_start_commit(osb->journal->j_journal, &target, true)) { jbd2_log_wait_commit(osb->journal->j_journal, target); ret = 1; } diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 8b2f39506648..60ecc51759ae 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -410,7 +410,7 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait) } if (jbd2_journal_start_commit(osb->journal->j_journal, - &target)) { + &target, true)) { if (wait) jbd2_log_wait_commit(osb->journal->j_journal, target); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 153840b422cc..535f88dff653 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -742,6 +742,7 @@ struct transaction_run_stats_s { struct transaction_stats_s { unsigned long ts_tid; + unsigned long ts_num_fast_commits; unsigned long ts_requested; struct transaction_run_stats_s run; }; @@ -1364,7 +1365,8 @@ int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block); void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block); /* Commit management */ -extern void jbd2_journal_commit_transaction(journal_t *); +extern void jbd2_journal_commit_transaction(journal_t *journal, + bool *full_commit); /* Checkpoint list management */ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy); @@ -1571,9 +1573,10 @@ extern void jbd2_clear_buffer_revoked_flags(journal_t *journal); * transitions on demand. 
*/ -int jbd2_log_start_commit(journal_t *journal, tid_t tid); +int jbd2_log_start_commit(journal_t *journal, tid_t tid, bool full_commit); int __jbd2_log_start_commit(journal_t *journal, tid_t tid); -int jbd2_journal_start_commit(journal_t *journal, tid_t *tid); +int jbd2_journal_start_commit(journal_t *journal, tid_t *tid, + bool full_commit); int jbd2_log_wait_commit(journal_t *journal, tid_t tid); int jbd2_transaction_committed(journal_t *journal, tid_t tid); int jbd2_complete_transaction(journal_t *journal, tid_t tid); From patchwork Fri Aug 9 03:45:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1144308 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="BAZTJ/LE"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 464WPn09NQz9sP9 for ; Fri, 9 Aug 2019 13:46:37 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405194AbfHIDqe (ORCPT ); Thu, 8 Aug 2019 23:46:34 -0400 Received: from mail-pg1-f194.google.com ([209.85.215.194]:33934 "EHLO mail-pg1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733140AbfHIDqd (ORCPT ); Thu, 8 Aug 2019 23:46:33 -0400 Received: by mail-pg1-f194.google.com with SMTP id n9so38927927pgc.1 for ; Thu, 08 Aug 2019 20:46:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=bpt0b1KZUhGaMVUg1HH6jpR3z+8EsMhPMABTfN2nNvM=; b=BAZTJ/LEkQEeMJhM335/Wwzv6QnXexYVKO0oGHrG0WPMkHFGQNLKnkwnMcXKJRclwe M1+3D6Ov/A4GIHdHM0QyZippii/vgkf5AfZgKxslP95ltDSp4swOtJeMUT991xY5ohBe 5/TjRTdUyhra/KArbxI4+7b76OkWkmf/r32N2mLKbJS0rZ7cULjBbPPT2Olq7VnDfhUf JhuJpk7i5MzQDbC5qGCN+LzGpsxEctxs2BRDXrnboJN8KD7AYtB1uG1N+lE0rGcGgVU9 FYUGM7cn4A20dou1Qo363TGL2a5l6fn1KdXGMZodSacbyHYH0h8W0pOp8hULHmhHhvnJ xUCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bpt0b1KZUhGaMVUg1HH6jpR3z+8EsMhPMABTfN2nNvM=; b=GQeFTnVVZT4QzR9EeBPPtsnc5DiNpDOQ+Ly0+P9GEd16iongdnEpLMQW1NXkbYBxmG zvKNJBUpbHluJ5mE17SqQHrqQc7LTX2vjJQyYvDn+ftlhDN2hQ30FS2lExbkFSEeEMe2 OpxSUDf0BTA2SHMIw7hkjytlLdEhFAdABIuKsJIJ8eHOfRTp02kXyVmAg3aadXxY+5JC PgUHly66Qa2DXI14O+xN42CuPCbd0zbuRV6sadK7+iD3tM5/PcSGKzjrPUCIrt0Je/l7 zhvSu4BNja9GJaUMvvluzi9OO6ynaQW7jF4H3VgJ5snzoU73iLSsK1qCEEccEMlpzfr4 4tYw== X-Gm-Message-State: APjAAAVKAWMXAvgeCNH2Pq2yIvQ6vNZ5z2b0kN4gtaXf16RhBNLzZ9x1 luu89gxyOtzckSmSMWdJBbBZNP8z X-Google-Smtp-Source: APXvYqwjWsbuE7HtJODNkz6YbCkMTVIPkWXgr9gQRIIqteJaEMuZdGiEtX6yWryyIkCYskY6NRzi6w== X-Received: by 2002:a17:90a:710c:: with SMTP id h12mr7226114pjk.36.1565322391608; Thu, 08 Aug 2019 20:46:31 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id s5sm80191085pfm.97.2019.08.08.20.46.31 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 08 Aug 2019 20:46:31 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v2 05/12] jbd2: fast-commit commit path new APIs Date: Thu, 8 Aug 2019 20:45:45 -0700 Message-Id: <20190809034552.148629-6-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 
2.23.0.rc1.153.gdeed80330f-goog In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com> References: <20190809034552.148629-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org

This patch adds new helper APIs that ext4 needs for fast commits. These
APIs are used by subsequent patches in this series to implement fast
commits. The following new APIs are added:

/*
 * Returns when either a full commit or a fast commit
 * completes
 */
int jbd2_fc_complete_commit(journal_t *journal, tid_t tid, tid_t subtid)

/* Send all the data buffers related to an inode */
int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode)

/* Map one fast commit buffer for use by the file system */
int jbd2_map_fc_buf(journal_t *journal, struct buffer_head **bh_out)

/* Wait on fast commit buffers to complete IO */
int jbd2_wait_on_fc_bufs(journal_t *journal, int num_blks)

Signed-off-by: Harshad Shirwadkar
Reviewed-by: Andreas Dilger
---
Changelog:

V2: 1) Fixed error reported by kbuild test robot. Removed duplicate
       EXPORT_SYMBOL() call. Also, added EXPORT_SYMBOL() for the new APIs
       introduced.
    2) Changed jbd2_submit_fc_bufs() to jbd2_wait_on_fc_bufs(). This
       lets the client file system submit JBD2 buffers at its own
       convenience.
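
For illustration only (not part of the patch): a minimal userspace sketch
of the block accounting behind jbd2_map_fc_buf() and the wait order used
by jbd2_wait_on_fc_bufs(). struct fc_area and the fc_area_*() and
fc_wait_order() helpers are hypothetical stand-ins for the j_first_fc,
j_last_fc and j_fc_off journal fields; locking, buffer heads and I/O are
deliberately left out, so this only models the arithmetic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for j_first_fc, j_last_fc and j_fc_off. */
struct fc_area {
	unsigned long first_fc;	/* first block of the fast commit area */
	unsigned long last_fc;	/* end of the area; never handed out by the check below */
	unsigned long fc_off;	/* next free offset inside the area */
};

/*
 * Same bounds check as jbd2_map_fc_buf(): hand out the next block number,
 * or return -EINVAL once the area is used up.
 */
int fc_area_next_block(struct fc_area *a, unsigned long *blocknr)
{
	if (a->fc_off + a->first_fc >= a->last_fc)
		return -EINVAL;
	*blocknr = a->first_fc + a->fc_off;
	a->fc_off++;
	return 0;
}

/* The full commit path resets j_fc_off to 0, making the area reusable. */
void fc_area_reset(struct fc_area *a)
{
	a->fc_off = 0;
}

/*
 * Fill idx[] with the offsets jbd2_wait_on_fc_bufs() would wait on:
 * the num_blks most recently mapped buffers, newest first.
 */
size_t fc_wait_order(const struct fc_area *a, size_t num_blks,
		     unsigned long *idx)
{
	size_t i;

	assert(num_blks <= a->fc_off);
	for (i = 0; i < num_blks; i++)
		idx[i] = a->fc_off - 1 - i;
	return num_blks;
}
```

With first_fc = 100 and last_fc = 103, three blocks (100 to 102) can be
handed out before -EINVAL is returned, which follows from the strict '<'
comparison against j_last_fc in jbd2_map_fc_buf().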
--- fs/jbd2/commit.c | 32 +++++++++++++++ fs/jbd2/journal.c | 98 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/jbd2.h | 6 +++ 3 files changed, 136 insertions(+) diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 9281814606e7..db62a53436e3 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -202,6 +202,38 @@ static int journal_submit_inode_data_buffers(struct address_space *mapping, return ret; } +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode) +{ + struct address_space *mapping; + loff_t dirty_start; + loff_t dirty_end; + int ret; + + if (!jinode) + return 0; + + if (!(jinode->i_flags & JI_WRITE_DATA)) + return 0; + + dirty_start = jinode->i_dirty_start; + dirty_end = jinode->i_dirty_end; + + mapping = jinode->i_vfs_inode->i_mapping; + jinode->i_flags |= JI_COMMIT_RUNNING; + + trace_jbd2_submit_inode_data(jinode->i_vfs_inode); + ret = journal_submit_inode_data_buffers(mapping, dirty_start, + dirty_end); + + jinode->i_flags &= ~JI_COMMIT_RUNNING; + /* Protect JI_COMMIT_RUNNING flag */ + smp_mb(); + wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING); + + return ret; +} +EXPORT_SYMBOL(jbd2_submit_inode_data); + /* * Submit all the data buffers of inode associated with the transaction to * disk.
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index ab05e47ed2d4..1e15804b2c3c 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -811,6 +811,33 @@ int jbd2_complete_transaction(journal_t *journal, tid_t tid) } EXPORT_SYMBOL(jbd2_complete_transaction); +int jbd2_fc_complete_commit(journal_t *journal, tid_t tid, tid_t subtid) +{ + int need_to_wait = 1; + + read_lock(&journal->j_state_lock); + if (journal->j_running_transaction && + journal->j_running_transaction->t_tid == tid) { + /* Check if fast commit was already done */ + if (journal->j_subtid > subtid) + need_to_wait = 0; + if (journal->j_commit_request != tid) { + /* transaction not yet started, so request it */ + read_unlock(&journal->j_state_lock); + jbd2_log_start_commit(journal, tid, false); + goto wait_commit; + } + } else if (!(journal->j_committing_transaction && + journal->j_committing_transaction->t_tid == tid)) + need_to_wait = 0; + read_unlock(&journal->j_state_lock); + if (!need_to_wait) + return 0; +wait_commit: + return __jbd2_log_wait_commit(journal, tid, subtid); +} +EXPORT_SYMBOL(jbd2_fc_complete_commit); + /* * Log buffer allocation routines: */ @@ -831,6 +858,77 @@ int jbd2_journal_next_log_block(journal_t *journal, unsigned long long *retp) return jbd2_journal_bmap(journal, blocknr, retp); } +int jbd2_map_fc_buf(journal_t *journal, struct buffer_head **bh_out) +{ + unsigned long long pblock; + unsigned long blocknr; + int ret = 0; + struct buffer_head *bh; + int fc_off; + journal_header_t *jhdr; + + write_lock(&journal->j_state_lock); + + if (journal->j_fc_off + journal->j_first_fc < journal->j_last_fc) { + fc_off = journal->j_fc_off; + blocknr = journal->j_first_fc + fc_off; + journal->j_fc_off++; + } else { + ret = -EINVAL; + } + write_unlock(&journal->j_state_lock); + + if (ret) + return ret; + + ret = jbd2_journal_bmap(journal, blocknr, &pblock); + if (ret) + return ret; + + bh = __getblk(journal->j_dev, pblock, journal->j_blocksize); + if (!bh) + return -ENOMEM; + + 
lock_buffer(bh); + jhdr = (journal_header_t *)bh->b_data; + jhdr->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER); + jhdr->h_blocktype = cpu_to_be32(JBD2_FC_BLOCK); + jhdr->h_sequence = cpu_to_be32(journal->j_running_transaction->t_tid); + + set_buffer_uptodate(bh); + unlock_buffer(bh); + journal->j_fc_wbuf[fc_off] = bh; + + *bh_out = bh; + + return 0; +} +EXPORT_SYMBOL(jbd2_map_fc_buf); + +int jbd2_wait_on_fc_bufs(journal_t *journal, int num_blks) +{ + struct buffer_head *bh; + int i, j_fc_off; + + read_lock(&journal->j_state_lock); + j_fc_off = journal->j_fc_off; + read_unlock(&journal->j_state_lock); + + /* + * Wait in reverse order to minimize chances of us being woken up before + * all IOs have completed + */ + for (i = j_fc_off - 1; i >= j_fc_off - num_blks; i--) { + bh = journal->j_fc_wbuf[i]; + wait_on_buffer(bh); + if (unlikely(!buffer_uptodate(bh))) + return -EIO; + } + + return 0; +} +EXPORT_SYMBOL(jbd2_wait_on_fc_bufs); + /* * Conversion of logical to physical block numbers for the journal * diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 535f88dff653..5362777d06f8 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -124,6 +124,7 @@ typedef struct journal_s journal_t; /* Journal control structure */ #define JBD2_SUPERBLOCK_V1 3 #define JBD2_SUPERBLOCK_V2 4 #define JBD2_REVOKE_BLOCK 5 +#define JBD2_FC_BLOCK 6 /* * Standard header for all descriptor blocks: @@ -1582,6 +1583,7 @@ int jbd2_transaction_committed(journal_t *journal, tid_t tid); int jbd2_complete_transaction(journal_t *journal, tid_t tid); int jbd2_log_do_checkpoint(journal_t *journal); int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid); +int jbd2_fc_complete_commit(journal_t *journal, tid_t tid, tid_t subtid); void __jbd2_log_wait_for_space(journal_t *journal); extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *); @@ -1732,6 +1734,10 @@ static inline tid_t jbd2_get_latest_transaction(journal_t *journal) return tid; } +int 
jbd2_map_fc_buf(journal_t *journal, struct buffer_head **bh_out); +int jbd2_wait_on_fc_bufs(journal_t *journal, int num_blks); +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode); + #ifdef __KERNEL__ #define buffer_trace_init(bh) do {} while (0) From patchwork Fri Aug 9 03:45:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1144309 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="NuQbZAuu"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 464WPn32D2z9sPS for ; Fri, 9 Aug 2019 13:46:37 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405205AbfHIDqf (ORCPT ); Thu, 8 Aug 2019 23:46:35 -0400 Received: from mail-pf1-f194.google.com ([209.85.210.194]:40165 "EHLO mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2404804AbfHIDqd (ORCPT ); Thu, 8 Aug 2019 23:46:33 -0400 Received: by mail-pf1-f194.google.com with SMTP id p184so45270228pfp.7 for ; Thu, 08 Aug 2019 20:46:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=HqweDCINfq7SD3bd+W+SfiaT2AiY7FPjQY2Mf2yBYaw=; b=NuQbZAuuxd29LssFWCIy0Meo7dqwrGV94+8VGKLgr+/GM54fIDBUbMT6VJjcl4BPxR p6Bmg8xakxYyd2MlVAJreWYnNXY4PaDIeWFU33SI78XsczG22SFZP6zZYLqOqk8ebra2 
vOf3E28eIH+j/F15ajS+eFzJBpPMkMveeAPdzpYD2+oky3ifJwwrgMzyumIgSCycBj6c uzvqIAqABk9SNFGrk1i5Gnv66XpP4zqi5CFDPq3LgQ8S0tcferYBgXfYKIv35xH0X5+z dmSyA4MbGCxQwria1wwOLUA7L1O4LhG4Ac+UQ4bFWopepS8GGt4pjXYhJXxcs0FSh0uq BvsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=HqweDCINfq7SD3bd+W+SfiaT2AiY7FPjQY2Mf2yBYaw=; b=Y1wVCev7cGJbiZK5vzyQSjqCYhCXCcWDxKukpiMvm7cJ9xAqV5GAy1k7t3/q//3i5z 54aNAykUVdV9aM1wFXPr3ie5bL1sbO+ClJadViQbfw6vbp2k4CCT5H6e6mKcgKNgbFcE XT3oBt9k2cHupFUlkMAZTEVjPOxbYZerE6MEn9ANif3viKKKhNRpmaBzA31DMgZrAr3J VLocBn6rC0P2Z+KCXjKHswBCK6fK0evzsiYNYQVoLl8NODBK423C9fOM8mrL1bLIe6An KRN/qow69NYJ3QFgAJ8WDBJpzqD+/UMj4N26Myo9FOZ2TwyvdMi9ZRfl2J5ShLxMDD7L 0g0w== X-Gm-Message-State: APjAAAV/OIIKDbxMYRaCoPwCmv6jaNlmXzWUagva2DMAVRkiXRQN7zPY 5Wsc99gV5F4ryjjwXRgDNGk8sYL7 X-Google-Smtp-Source: APXvYqxGrrBUJ3RRiXvE/YbFnt5Vj1B4bX3Lga3mCEHiS0GRIkd07A4gQu1jv+RVC1MkglqmW0xjAw== X-Received: by 2002:a63:4a04:: with SMTP id x4mr8398505pga.411.1565322392169; Thu, 08 Aug 2019 20:46:32 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id s5sm80191085pfm.97.2019.08.08.20.46.31 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 08 Aug 2019 20:46:31 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v2 06/12] jbd2: fast-commit recovery path changes Date: Thu, 8 Aug 2019 20:45:46 -0700 Message-Id: <20190809034552.148629-7-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com> References: <20190809034552.148629-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org This patch adds 
fast-commit recovery path changes for JBD2. If we find a valid fast commit block during recovery, we call a file system specific routine to handle that block. We also clear the fast commit flag in jbd2_mark_journal_empty(), which is called after successful recovery as well as after successful checkpointing. This keeps the JBD2 journal compatible with older versions when there are no fast commit blocks. Signed-off-by: Harshad Shirwadkar Reviewed-by: Andreas Dilger --- Changelog: V2: Fixed checkpatch error. --- fs/jbd2/journal.c | 12 ++++++++++ fs/jbd2/recovery.c | 59 +++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 68 insertions(+), 3 deletions(-) diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 1e15804b2c3c..ae4584a60cc3 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1604,6 +1604,7 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid, static void jbd2_mark_journal_empty(journal_t *journal, int write_op) { journal_superblock_t *sb = journal->j_superblock; + bool had_fast_commit = false; BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex)); lock_buffer(journal->j_sb_buffer); @@ -1617,6 +1618,14 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op) sb->s_sequence = cpu_to_be32(journal->j_tail_sequence); sb->s_start = cpu_to_be32(0); + if (jbd2_has_feature_fast_commit(journal)) { + /* + * When journal is clean, no need to commit fast commit flag and + * make file system incompatible with older kernels.
+ */ + jbd2_clear_feature_fast_commit(journal); + had_fast_commit = true; + } jbd2_write_superblock(journal, write_op); @@ -1624,6 +1633,9 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op) write_lock(&journal->j_state_lock); journal->j_flags |= JBD2_FLUSHED; write_unlock(&journal->j_state_lock); + + if (had_fast_commit) + jbd2_set_feature_fast_commit(journal); } diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index a4967b27ffb6..3a6cd1497504 100644 --- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -225,8 +225,12 @@ static int count_tags(journal_t *journal, struct buffer_head *bh) /* Make sure we wrap around the log correctly! */ #define wrap(journal, var) \ do { \ - if (var >= (journal)->j_last) \ - var -= ((journal)->j_last - (journal)->j_first); \ + unsigned long _wrap_last = \ + jbd2_has_feature_fast_commit(journal) ? \ + (journal)->j_last_fc : (journal)->j_last; \ + \ + if (var >= _wrap_last) \ + var -= (_wrap_last - (journal)->j_first); \ } while (0) /** @@ -413,6 +417,49 @@ static int jbd2_block_tag_csum_verify(journal_t *j, journal_block_tag_t *tag, return tag->t_checksum == cpu_to_be16(csum32); } +static int fc_do_one_pass(journal_t *journal, + struct recovery_info *info, enum passtype pass) +{ + unsigned int expected_commit_id = info->end_transaction; + unsigned long next_fc_block; + struct buffer_head *bh; + unsigned int seq; + journal_header_t *jhdr; + int err = 0; + + next_fc_block = journal->j_first_fc; + + while (next_fc_block != journal->j_last_fc) { + jbd_debug(3, "Fast commit replay: next block %lu\n", + next_fc_block); + err = jread(&bh, journal, next_fc_block); + if (err) + break; + + jhdr = (journal_header_t *)bh->b_data; + seq = be32_to_cpu(jhdr->h_sequence); + if (be32_to_cpu(jhdr->h_magic) != JBD2_MAGIC_NUMBER || + seq != expected_commit_id) { + break; + } + jbd_debug(3, "Processing fast commit blk with seq %u\n", + seq); + if (pass == PASS_REPLAY && + journal->j_fc_replay_callback) { + err =
journal->j_fc_replay_callback(journal, + bh); + if (err) + break; + } + next_fc_block++; + } + + if (err) + jbd_debug(3, "Fast commit replay failed, err = %d\n", err); + + return err; +} + static int do_one_pass(journal_t *journal, struct recovery_info *info, enum passtype pass) { @@ -470,7 +517,7 @@ static int do_one_pass(journal_t *journal, break; jbd_debug(2, "Scanning for sequence ID %u at %lu/%lu\n", - next_commit_ID, next_log_block, journal->j_last); + next_commit_ID, next_log_block, journal->j_last_fc); /* Skip over each chunk of the transaction looking * either the next descriptor block or the final commit @@ -768,6 +815,8 @@ static int do_one_pass(journal_t *journal, if (err) goto failed; continue; + case JBD2_FC_BLOCK: + continue; default: jbd_debug(3, "Unrecognised magic %d, end of scan.\n", @@ -799,6 +848,10 @@ static int do_one_pass(journal_t *journal, success = -EIO; } } + + if (jbd2_has_feature_fast_commit(journal) && pass == PASS_REPLAY) + fc_do_one_pass(journal, info, pass); + if (block_error && success == 0) success = -EIO; return success; From patchwork Fri Aug 9 03:45:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1144310 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="dbcILWAn"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 464WPn6YrKz9sPX for ; Fri, 9 Aug 2019 13:46:37 +1000 (AEST) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405211AbfHIDqf (ORCPT ); Thu, 8 Aug 2019 23:46:35 -0400 Received: from mail-pg1-f196.google.com ([209.85.215.196]:35982 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405192AbfHIDqe (ORCPT ); Thu, 8 Aug 2019 23:46:34 -0400 Received: by mail-pg1-f196.google.com with SMTP id l21so45142581pgm.3 for ; Thu, 08 Aug 2019 20:46:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=9imIcesxR2ONOiIMiWLAgvV9/LerKPuKsi6B5H8lJbQ=; b=dbcILWAncVNBTq009AXNRWw8smV3iVQPLVcj+sGuDG4Rr6v0rH9xIZM2glX+d9Ui7C o+KbIoVk3QkR3lumgqGRsKLjWqISVQNEbQcifvEHhaq9Cjbs6M7CDu3RD7xF7KISPpU2 czMwQQpRpF5Xbt1y7iNZ/KeF+NWLpYOsq68skIsyVS2KYcw+uwLC/eK1cM51tbWpfjCj /8BPFn0v6HhLhszpHOXSE+gxvbTWNAy79xFan3cD0i10DjeDnI2JEfYhw4Ndg86NHTDr BqbrjvS05shYdI1Ule6UpaxP/JaALW2Xa3d5qS5DNj7/Wc+20iayeUGw4zr+6zFWgnPb 69ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9imIcesxR2ONOiIMiWLAgvV9/LerKPuKsi6B5H8lJbQ=; b=T0NbMei1lET1+yqUgc1F6YoDa9qoynR+MYiYG6npodZfJfi+cU6xfCngeE03xTTtHj 2GHxAQpJzQ+N0JXADA+DHqDCj4yUNbVMIZaHp3kd/mS8TopL3hnNcjIRzRk0/jpgkaIC 4nLmhF7fMc4qn2L54QNrn6Fmsp08G45TCJnFqLPje3QZSkuFI1sdqmXx5vY+Wx+ajxCN 99nDoSZ41EKg1VZyhN4SCJqYkG+0L+kQiqPNupD9ZaBkJmTEl8kkZxcaLs1apf7CctA6 gzUpiv85AoFfPeUvZNkdg7HJHK/QV1olP1wgTzJ6inCX5FeEMWwp+QJFPfRAFLk2ecxX iLEA== X-Gm-Message-State: APjAAAWI5rdLjyJZc5896I31a9LFmZ7baZFjhmK1WEyrUdd5O9ijXyaM T9jQd5uqvdY8OVG63YLHED62SyDm X-Google-Smtp-Source: APXvYqwU4PqSlM4ugrOc8uf/OZ857/ibUeCLN1Zxi7H8sG25jGN22xkigkGWFCdctap4Yy7qentRNg== X-Received: by 2002:a65:5342:: with SMTP id w2mr15468817pgr.261.1565322392873; Thu, 08 Aug 2019 20:46:32 -0700 (PDT) Received: from 
harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id s5sm80191085pfm.97.2019.08.08.20.46.32 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 08 Aug 2019 20:46:32 -0700 (PDT)
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v2 07/12] ext4: add fields that are needed to track changed files
Date: Thu, 8 Aug 2019 20:45:47 -0700
Message-Id: <20190809034552.148629-8-harshadshirwadkar@gmail.com>
X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog
In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
References: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
MIME-Version: 1.0
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

Ext4's fast commit feature tracks changed files and maintains them in a
queue. We also remember for each file the logical block range that needs
to be committed. This patch adds these fields to ext4_inode_info and
ext4_sb_info and also adds initialization calls.

Signed-off-by: Harshad Shirwadkar
---
Changelog:

V2: Converted s_fc_lock from mutex to spinlock to improve parallelism
    performance.
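
To illustrate the bookkeeping described above, here is a minimal userspace
sketch. It is not kernel code and not part of this patch: fc_node, fc_inode,
fc_sb and fc_enqueue are hypothetical stand-ins for list_head/i_fc_list,
ext4_inode_info, ext4_sb_info and an enqueue helper, and the sketch leaves out
the s_fc_lock spinlock that would guard the queue in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct list_head (circular, doubly linked). */
struct fc_node {
	struct fc_node *prev, *next;
};

/* Hypothetical stand-in for the per-inode fast commit fields. */
struct fc_inode {
	struct fc_node link;		/* plays the role of i_fc_list */
	long long lblk_start;		/* mirrors fc_lblk_start */
	long long lblk_end;		/* mirrors fc_lblk_end */
};

/* Hypothetical stand-in for the per-superblock fast commit fields. */
struct fc_sb {
	struct fc_node q;		/* mirrors s_fc_q */
	unsigned int q_cnt;		/* mirrors s_fc_q_cnt */
	bool eligible;			/* mirrors s_fc_eligible */
};

static void fc_node_init(struct fc_node *n)
{
	n->prev = n->next = n;
}

/* A node that points at itself is not on any queue. */
static bool fc_node_unlinked(const struct fc_node *n)
{
	return n->next == n;
}

static void fc_sb_init(struct fc_sb *sb)
{
	fc_node_init(&sb->q);
	sb->q_cnt = 0;
	sb->eligible = true;
}

static void fc_inode_init(struct fc_inode *ino)
{
	fc_node_init(&ino->link);
	ino->lblk_start = 0;
	ino->lblk_end = 0;
}

/*
 * Queue a changed inode at most once. Nothing is queued once the file
 * system has been marked ineligible for fast commit. (The kernel version
 * would hold a lock here; this sketch is single-threaded.)
 */
static void fc_enqueue(struct fc_sb *sb, struct fc_inode *ino)
{
	if (!sb->eligible || !fc_node_unlinked(&ino->link))
		return;

	ino->link.next = sb->q.next;
	ino->link.prev = &sb->q;
	sb->q.next->prev = &ino->link;
	sb->q.next = &ino->link;
	sb->q_cnt++;
}
```

With this layout, enqueueing the same inode twice leaves the count unchanged,
and an inode that changes after the superblock flag is cleared stays off the
queue.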
---
 fs/ext4/ext4.h      | 34 ++++++++++++++++++++++++++++++++++
 fs/ext4/ext4_jbd2.c | 13 +++++++++++++
 fs/ext4/ext4_jbd2.h |  2 ++
 fs/ext4/inode.c     |  1 +
 fs/ext4/super.c     |  7 +++++++
 5 files changed, 57 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index becbda38b7db..0d15d4539dda 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -921,6 +921,27 @@ enum {
 	I_DATA_SEM_QUOTA,
 };

+/*
+ * Ext4 fast commit inode specific information
+ */
+struct ext4_fast_commit_inode_info {
+	/* TID / SUB-TID when old_i_size and i_size were recorded */
+	tid_t fc_tid;
+	tid_t fc_subtid;
+
+	/*
+	 * Start of logical block range that needs to be committed in this fast
+	 * commit
+	 */
+	loff_t fc_lblk_start;
+
+	/*
+	 * End of logical block range that needs to be committed in this fast
+	 * commit
+	 */
+	loff_t fc_lblk_end;
+};
+

 /*
  * fourth extended file system inode data in memory
@@ -955,6 +976,9 @@ struct ext4_inode_info {

 	struct list_head i_orphan;	/* unlinked but open inodes */

+	struct list_head i_fc_list;	/* inodes that need fast commit */
+	struct ext4_fast_commit_inode_info i_fc;
+
 	/*
 	 * i_disksize keeps track of what the inode size is ON DISK, not
 	 * in memory. During truncate, i_size is set to the new size by
@@ -1529,6 +1553,16 @@ struct ext4_sb_info {
 	/* Barrier between changing inodes' journal flags and writepages ops. */
 	struct percpu_rw_semaphore s_journal_flag_rwsem;
 	struct dax_device *s_daxdev;
+
+	/* Ext4 fast commit stuff */
+	bool fc_replay;			/* Fast commit replay in progress */
+	struct list_head s_fc_q;	/* Inodes that need fast commit. */
+	__u32 s_fc_q_cnt;		/* Number of inodes in the fc queue */
+	bool s_fc_eligible;		/*
+					 * Are changes after the last commit
+					 * eligible for fast commit?
+					 */
+	spinlock_t s_fc_lock;
 };

 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 7c70b08d104c..75b6db808837 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -330,3 +330,16 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line,
 		mark_buffer_dirty(bh);
 	return err;
 }
+
+void ext4_init_inode_fc_info(struct inode *inode)
+{
+	handle_t *handle = ext4_journal_current_handle();
+	struct ext4_inode_info *ei = EXT4_I(inode);
+
+	memset(&ei->i_fc, 0, sizeof(ei->i_fc));
+	if (ext4_handle_valid(handle)) {
+		ei->i_fc.fc_tid = handle->h_transaction->t_tid;
+		ei->i_fc.fc_subtid = handle->h_transaction->t_journal->j_subtid;
+	}
+	INIT_LIST_HEAD(&ei->i_fc_list);
+}
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index ef8fcf7d0d3b..2305c1acd415 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -459,4 +459,6 @@ static inline int ext4_should_dioread_nolock(struct inode *inode)
 	return 1;
 }

+void ext4_init_inode_fc_info(struct inode *inode);
+
 #endif	/* _EXT4_JBD2_H */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 420fe3deed39..f230a888eddd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4996,6 +4996,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 	for (block = 0; block < EXT4_N_BLOCKS; block++)
 		ei->i_data[block] = raw_inode->i_block[block];
 	INIT_LIST_HEAD(&ei->i_orphan);
+	ext4_init_inode_fc_info(&ei->vfs_inode);

 	/*
 	 * Set transaction id's of transactions that have to be committed
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6bab59ae81f7..0b833e9b61c1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1100,6 +1100,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 	ei->i_datasync_tid = 0;
 	atomic_set(&ei->i_unwritten, 0);
 	INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
+	ext4_init_inode_fc_info(&ei->vfs_inode);
 	return &ei->vfs_inode;
 }

@@ -1139,6 +1140,7 @@ static void init_once(void
*foo)
 	init_rwsem(&ei->i_data_sem);
 	init_rwsem(&ei->i_mmap_sem);
 	inode_init_once(&ei->vfs_inode);
+	ext4_init_inode_fc_info(&ei->vfs_inode);
 }

 static int __init init_inodecache(void)
@@ -4301,6 +4303,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
 	mutex_init(&sbi->s_orphan_lock);

+	INIT_LIST_HEAD(&sbi->s_fc_q);
+	sbi->s_fc_q_cnt = 0;
+	sbi->s_fc_eligible = true;
+	spin_lock_init(&sbi->s_fc_lock);
+
 	sb->s_root = NULL;

 	needs_recovery = (es->s_last_orphan != 0 ||

From patchwork Fri Aug 9 03:45:48 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1144312
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="BeXhuuLz"; dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 464WPq2hrhz9sPJ for ; Fri, 9 Aug 2019 13:46:39 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405242AbfHIDqh (ORCPT ); Thu, 8 Aug 2019 23:46:37 -0400
Received: from mail-pg1-f196.google.com ([209.85.215.196]:37546 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405195AbfHIDqf (ORCPT ); Thu, 8 Aug 2019 23:46:35 -0400
Received: by mail-pg1-f196.google.com with SMTP id d1so12331039pgp.4 for ; Thu, 08 Aug 2019 20:46:34 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Pr4mcYfXdOHlnHrSzvHFB3M6Gn1mUEIhZlY0QXlBbXU=; b=BeXhuuLzhhRnVlJlobMOi+PBR7F8Lx9YH/EPXF0XnB2DxosDkBlS0mRjPJEzTPcFzP 784jjgdvwCjuaIUSr4stYv0UcoR545g0djvk0BoOINgU9DE6Z1OlV27zoDIjfAnN5foK 06i1OcBOwTUgIiOIuOL+9bnFFYopwmSPG/gxqMhJDZ5AZIdM6n0ILXOlZ59MBBZc2CKz XR5Fb+2dFKnb2b/kS/CpwGRK+ZMbnsAKzpZ6x3mY2OttVD+e55Pqeo18pI5lYZn66AVJ K+OXG3rQ+ONtl5kGhNvvzYf44qOSGewimrxunf3csy9sUdF9tODWj8qnsj5kjv2vVhX3 spWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Pr4mcYfXdOHlnHrSzvHFB3M6Gn1mUEIhZlY0QXlBbXU=; b=OdcxRLf0dHDv95Vg6BvV7Iedn1aGrb59cf8RsXTRGBtkMH0bEZKVvOXyC74hQwX4kB dRrBcsL9lXdlp3RbfYdJh3/fVTTto8vGFxWp6bubqCzpyOFhHZ5IghZRuTWNQzPLssKO 5+2NaIL2vogBTSsy6vnUXM7PIVpiDl7n+cnPjXRcdWZshGDguIqdX/BQ2MKpNsG/aoUe mTl2BapZJnLfDouiODdL7sBNLzgi79LfqzZr0kcXvhF+qf0iSWywhz7i0io9sOEFMFxA BpBns2eykpT2XKm2+U372AKnDjUSqHmc00+dANrItvYoUsd1geNEMsq5plAeyRRHK03e gYbw== X-Gm-Message-State: APjAAAUQfzYzxi8a6qJq/MmpnX1ediK6xYPGA2KErTY6ZlhOWdJBn+my +gZPwZ2qXYofOy+JCR+jnodLRXNw X-Google-Smtp-Source: APXvYqzgjciQr1yUssl6LnKrlpO2jcvZVYcJheeQjsyQuZZ/oq37goRhc3SpdWdxnLt/7S83F6tBVg== X-Received: by 2002:a65:6454:: with SMTP id s20mr15065789pgv.15.1565322393570; Thu, 08 Aug 2019 20:46:33 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id s5sm80191085pfm.97.2019.08.08.20.46.32 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 08 Aug 2019 20:46:33 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v2 08/12] ext4: track changed files for fast commit Date: Thu, 8 Aug 2019 20:45:48 -0700 Message-Id: <20190809034552.148629-9-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 
2.23.0.rc1.153.gdeed80330f-goog
In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
References: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
MIME-Version: 1.0
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

For fast commit, we need to remember all the files that have changed
since the last fast commit / full commit. For changes that are fast
commit incompatible, we mark the file system as fast commit ineligible.
This patch adds code to either remember files that have changed or to
mark ext4 as fast commit ineligible. We inspect every
ext4_mark_inode_dirty() call and decide whether that particular file
change is fast commit compatible or not.

Signed-off-by: Harshad Shirwadkar
---
Changelog:

V2: Using spinlocks instead of mutexes for s_fc_lock.
---
 fs/ext4/acl.c       |  1 +
 fs/ext4/ext4_jbd2.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/ext4_jbd2.h | 25 ++++++++++++++++++++++++
 fs/ext4/extents.c   | 17 +++++++++++++++--
 fs/ext4/ialloc.c    |  1 +
 fs/ext4/inline.c    | 12 ++++++++++++
 fs/ext4/inode.c     | 30 +++++++++++++++++++++++++++--
 fs/ext4/ioctl.c     |  3 +++
 fs/ext4/migrate.c   |  1 +
 fs/ext4/namei.c     | 14 +++++++++++++-
 fs/ext4/super.c     | 15 +++++++++++++++
 fs/ext4/xattr.c     |  1 +
 12 files changed, 161 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 8c7bbf3e566d..e84be9c315db 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -257,6 +257,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 		inode->i_mode = mode;
 		inode->i_ctime = current_time(inode);
 		ext4_mark_inode_dirty(handle, inode);
+		ext4_fc_enqueue_inode(handle, inode);
 	}
 out_stop:
 	ext4_journal_stop(handle);
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 75b6db808837..d77b9f1e9dab 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -343,3 +343,49 @@ void ext4_init_inode_fc_info(struct inode *inode)
 	}
 	INIT_LIST_HEAD(&ei->i_fc_list);
 }
+
+void ext4_fc_enqueue_inode(handle_t
*handle, struct inode *inode) +{ + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + struct ext4_inode_info *ei = EXT4_I(inode); + + if (!ext4_should_fast_commit(inode->i_sb)) + return; + + spin_lock(&sbi->s_fc_lock); + if (!sbi->s_fc_eligible) { + spin_unlock(&sbi->s_fc_lock); + return; + } + if (list_empty(&EXT4_I(inode)->i_fc_list)) { + list_add(&EXT4_I(inode)->i_fc_list, &sbi->s_fc_q); + sbi->s_fc_q_cnt++; + } + spin_unlock(&sbi->s_fc_lock); + + if (!ext4_handle_valid(handle)) + return; + + if (ei->i_fc.fc_tid == handle->h_transaction->t_tid && + ei->i_fc.fc_subtid == + handle->h_transaction->t_journal->j_subtid) + return; + + ei->i_fc.fc_lblk_start = i_size_read(inode); + ei->i_fc.fc_lblk_end = i_size_read(inode); + ei->i_fc.fc_subtid = handle->h_transaction->t_journal->j_subtid; + ei->i_fc.fc_tid = handle->h_transaction->t_tid; +} + +void ext4_fc_del(struct inode *inode) +{ + if (!ext4_should_fast_commit(inode->i_sb)) + return; + + if (list_empty(&EXT4_I(inode)->i_fc_list)) + return; + + spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock); + list_del_init(&EXT4_I(inode)->i_fc_list); + spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); +} diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 2305c1acd415..a27cc3a5c676 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -459,6 +459,31 @@ static inline int ext4_should_dioread_nolock(struct inode *inode) return 1; } +static inline int ext4_should_fast_commit(struct super_block *sb) +{ + if (!ext4_has_feature_fast_commit(sb)) + return 0; + if (!test_opt2(sb, JOURNAL_FAST_COMMIT)) + return 0; + if (test_opt(sb, QUOTA)) + return 0; + return 1; +} + void ext4_init_inode_fc_info(struct inode *inode); +extern void ext4_fc_enqueue_inode(handle_t *handle, + struct inode *inode); +extern void ext4_fc_del(struct inode *inode); + +static inline void +ext4_fc_mark_ineligible(struct super_block *sb) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + + spin_lock(&sbi->s_fc_lock); + sbi->s_fc_eligible = false; + 
spin_unlock(&sbi->s_fc_lock); +} + #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 92266a2da7d6..eb77e306a82b 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -163,6 +163,7 @@ int __ext4_ext_dirty(const char *where, unsigned int line, handle_t *handle, } else { /* path points to leaf/index in inode body */ err = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } return err; } @@ -1371,6 +1372,7 @@ static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode, struct ext4_ext_path *curp; int depth, i, err = 0; + ext4_fc_mark_ineligible(inode->i_sb); repeat: i = depth = ext_depth(inode); @@ -3714,6 +3716,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, err = ext4_zeroout_es(inode, &zero_ex1); if (!err) err = ext4_zeroout_es(inode, &zero_ex2); + } else { + ext4_fc_mark_ineligible(inode->i_sb); } return err ? err : allocated; } @@ -3856,7 +3860,7 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode, struct ext4_ext_path *path, unsigned int len) { - int i, depth; + int i, ret, depth; struct ext4_extent_header *eh; struct ext4_extent *last_ex; @@ -3898,7 +3902,10 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode, return 0; out: ext4_clear_inode_flag(inode, EXT4_INODE_EOFBLOCKS); - return ext4_mark_inode_dirty(handle, inode); + ret = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); + + return ret; } static int @@ -4607,6 +4614,7 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset, inode->i_ino, map.m_lblk, map.m_len, ret); ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); ret2 = ext4_journal_stop(handle); break; } @@ -4624,6 +4632,7 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset, ext4_set_inode_flag(inode, EXT4_INODE_EOFBLOCKS); } + ext4_fc_enqueue_inode(handle, inode); ext4_mark_inode_dirty(handle, inode); 
ext4_update_inode_fsync_trans(handle, inode, 1); ret2 = ext4_journal_stop(handle); @@ -4786,6 +4795,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, ext4_set_inode_flag(inode, EXT4_INODE_EOFBLOCKS); } ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); /* Zero out partial block at the edges of the range */ ret = ext4_zero_partial_blocks(handle, inode, offset, len); @@ -4957,6 +4967,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode, "ext4_ext_map_blocks returned %d", inode->i_ino, map.m_lblk, map.m_len, ret); + ext4_fc_mark_ineligible(inode->i_sb); ext4_mark_inode_dirty(handle, inode); if (credits) ret2 = ext4_journal_stop(handle); @@ -5485,6 +5496,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) if (IS_SYNC(inode)) ext4_handle_sync(handle); inode->i_mtime = inode->i_ctime = current_time(inode); + ext4_fc_mark_ineligible(inode->i_sb); ext4_mark_inode_dirty(handle, inode); ext4_update_inode_fsync_trans(handle, inode, 1); @@ -5599,6 +5611,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) inode->i_size += len; EXT4_I(inode)->i_disksize += len; inode->i_mtime = inode->i_ctime = current_time(inode); + ext4_fc_mark_ineligible(inode->i_sb); ret = ext4_mark_inode_dirty(handle, inode); if (ret) goto out_stop; diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 764ff4c56233..97a9882a3363 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1175,6 +1175,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, ei->i_datasync_tid = handle->h_transaction->t_tid; } + ext4_fc_mark_ineligible(sb); err = ext4_mark_inode_dirty(handle, inode); if (err) { ext4_std_error(sb, err); diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 88cdf3c90bd1..190968996bc6 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -435,6 +435,8 @@ static int ext4_destroy_inline_data_nolock(handle_t *handle, if (error) goto out; + 
ext4_fc_mark_ineligible(inode->i_sb); + memset((void *)ext4_raw_inode(&is.iloc)->i_block, 0, EXT4_MIN_INLINE_DATA_SIZE); memset(ei->i_data, 0, EXT4_MIN_INLINE_DATA_SIZE); @@ -759,6 +761,8 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, ext4_write_unlock_xattr(inode, &no_expand); brelse(iloc.bh); + ext4_fc_enqueue_inode(ext4_journal_current_handle(), + inode); mark_inode_dirty(inode); out: return copied; @@ -974,6 +978,8 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, * ordering of page lock and transaction start for journaling * filesystems. */ + ext4_fc_enqueue_inode(ext4_journal_current_handle(), + inode); mark_inode_dirty(inode); return copied; @@ -1165,6 +1171,7 @@ static int ext4_finish_convert_inline_dir(handle_t *handle, if (err) return err; set_buffer_verified(dir_block); + ext4_fc_mark_ineligible(inode->i_sb); return ext4_mark_inode_dirty(handle, inode); } @@ -1216,6 +1223,8 @@ static int ext4_convert_inline_data_nolock(handle_t *handle, goto out_restore; } + ext4_fc_mark_ineligible(inode->i_sb); + data_bh = sb_getblk(inode->i_sb, map.m_pblk); if (!data_bh) { error = -ENOMEM; @@ -1709,6 +1718,8 @@ int ext4_delete_inline_entry(handle_t *handle, if (err) goto out; + ext4_fc_enqueue_inode(handle, dir); + ext4_show_inline_dir(dir, iloc.bh, inline_start, inline_size); out: ext4_write_unlock_xattr(dir, &no_expand); @@ -1986,6 +1997,7 @@ int ext4_inline_data_truncate(struct inode *inode, int *has_inline) if (err == 0) { inode->i_mtime = inode->i_ctime = current_time(inode); + ext4_fc_enqueue_inode(handle, inode); err = ext4_mark_inode_dirty(handle, inode); if (IS_SYNC(inode)) ext4_handle_sync(handle); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f230a888eddd..379e911b48c4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -262,6 +262,7 @@ void ext4_evict_inode(struct inode *inode) * cleaned up. 
*/ ext4_orphan_del(NULL, inode); + ext4_fc_del(inode); sb_end_intwrite(inode->i_sb); goto no_delete; } @@ -279,6 +280,8 @@ void ext4_evict_inode(struct inode *inode) if (ext4_inode_is_fast_symlink(inode)) memset(EXT4_I(inode)->i_data, 0, sizeof(EXT4_I(inode)->i_data)); inode->i_size = 0; + ext4_fc_del(inode); + ext4_fc_mark_ineligible(inode->i_sb); err = ext4_mark_inode_dirty(handle, inode); if (err) { ext4_warning(inode->i_sb, @@ -303,6 +306,7 @@ void ext4_evict_inode(struct inode *inode) stop_handle: ext4_journal_stop(handle); ext4_orphan_del(NULL, inode); + ext4_fc_del(inode); sb_end_intwrite(inode->i_sb); ext4_xattr_inode_array_free(ea_inode_array); goto no_delete; @@ -326,6 +330,8 @@ void ext4_evict_inode(struct inode *inode) * having errors), but we can't free the inode if the mark_dirty * fails. */ + ext4_fc_del(inode); + ext4_fc_mark_ineligible(inode->i_sb); if (ext4_mark_inode_dirty(handle, inode)) /* If that failed, just do the required in-core inode clear. */ ext4_clear_inode(inode); @@ -1436,8 +1442,10 @@ static int ext4_write_end(struct file *file, * ordering of page lock and transaction start for journaling * filesystems. 
*/ - if (i_size_changed || inline_data) + if (i_size_changed || inline_data) { ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); + } if (pos + len > inode->i_size && ext4_can_truncate(inode)) /* if we have allocated more blocks and copied @@ -1550,6 +1558,7 @@ static int ext4_journalled_write_end(struct file *file, pagecache_isize_extended(inode, old_size, pos); if (size_changed || inline_data) { + ext4_fc_enqueue_inode(handle, inode); ret2 = ext4_mark_inode_dirty(handle, inode); if (!ret) ret = ret2; @@ -2077,6 +2086,7 @@ static int __ext4_journalled_writepage(struct page *page, if (inline_data) { ret = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } else { ret = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL, do_journal_get_write_access); @@ -2604,6 +2614,7 @@ static int mpage_map_and_submit_extent(handle_t *handle, EXT4_I(inode)->i_disksize = disksize; up_write(&EXT4_I(inode)->i_data_sem); err2 = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); if (err2) ext4_error(inode->i_sb, "Failed to mark inode %lu dirty", @@ -3205,6 +3216,7 @@ static int ext4_da_write_end(struct file *file, * bu greater than i_disksize.(hint delalloc) */ ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } } @@ -3614,8 +3626,12 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length, ret = PTR_ERR(handle); goto orphan_del; } - if (ext4_update_inode_size(inode, offset + written)) + + if (ext4_update_inode_size(inode, offset + written)) { ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); + } + /* * We may need to truncate allocated but not written blocks beyond EOF. */ @@ -3851,6 +3867,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) * ignore it. 
*/ ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } } err = ext4_journal_stop(handle); @@ -4372,6 +4389,8 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) goto out_dio; } + ext4_fc_mark_ineligible(inode->i_sb); + ret = ext4_zero_partial_blocks(handle, inode, offset, length); if (ret) @@ -4525,6 +4544,7 @@ int ext4_truncate(struct inode *inode) if (inode->i_size & (inode->i_sb->s_blocksize - 1)) ext4_block_truncate_page(handle, mapping, inode->i_size); + ext4_fc_mark_ineligible(inode->i_sb); /* * We add the inode to the orphan list, so that if this * truncate spans multiple transactions, and we crash, we will @@ -5593,6 +5613,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (attr->ia_valid & ATTR_GID) inode->i_gid = attr->ia_gid; error = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); ext4_journal_stop(handle); } @@ -5653,6 +5674,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) inode->i_mtime = current_time(inode); inode->i_ctime = inode->i_mtime; } + ext4_fc_enqueue_inode(handle, inode); down_write(&EXT4_I(inode)->i_data_sem); EXT4_I(inode)->i_disksize = attr->ia_size; rc = ext4_mark_inode_dirty(handle, inode); @@ -5697,6 +5719,8 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (!error) { setattr_copy(inode, attr); + ext4_fc_enqueue_inode(ext4_journal_current_handle(), + inode); mark_inode_dirty(inode); } @@ -6109,6 +6133,7 @@ void ext4_dirty_inode(struct inode *inode, int flags) goto out; ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); ext4_journal_stop(handle); out: @@ -6194,6 +6219,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val) if (IS_ERR(handle)) return PTR_ERR(handle); + ext4_fc_mark_ineligible(inode->i_sb); err = ext4_mark_inode_dirty(handle, inode); ext4_handle_sync(handle); ext4_journal_stop(handle); diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 
442f7ef873fc..c676fa118414 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -987,6 +987,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) err = mnt_want_write_file(filp); if (err) return err; + ext4_fc_mark_ineligible(sb); err = swap_inode_boot_loader(sb, inode); mnt_drop_write_file(filp); return err; @@ -997,6 +998,8 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) int err = 0, err2 = 0; ext4_group_t o_group = EXT4_SB(sb)->s_groups_count; + ext4_fc_mark_ineligible(sb); + if (copy_from_user(&n_blocks_count, (__u64 __user *)arg, sizeof(__u64))) { return -EFAULT; diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c index b1e4d359f73b..b995690d73ce 100644 --- a/fs/ext4/migrate.c +++ b/fs/ext4/migrate.c @@ -513,6 +513,7 @@ int ext4_ext_migrate(struct inode *inode) * work to orphan_list_cleanup() */ ext4_orphan_del(NULL, tmp_inode); + ext4_fc_del(inode); retval = PTR_ERR(handle); goto out; } diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 129029534075..e77ff130c045 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2022,6 +2022,7 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname, ext4_update_dx_flag(dir); inode_inc_iversion(dir); ext4_mark_inode_dirty(handle, dir); + ext4_fc_mark_ineligible(dir->i_sb); BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); err = ext4_handle_dirty_dirblock(handle, dir, bh); if (err) @@ -2140,8 +2141,10 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname, * out all the changes we did so far. Otherwise we can end up * with corrupted filesystem. 
*/ - if (retval) + if (retval) { ext4_mark_inode_dirty(handle, dir); + ext4_fc_mark_ineligible(dir->i_sb); + } dx_release(frames); brelse(bh2); return retval; @@ -2208,6 +2211,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry, ext4_clear_inode_flag(dir, EXT4_INODE_INDEX); dx_fallback++; ext4_mark_inode_dirty(handle, dir); + ext4_fc_mark_ineligible(dir->i_sb); } blocks = dir->i_size >> sb->s_blocksize_bits; for (block = 0; block < blocks; block++) { @@ -2553,6 +2557,7 @@ static int ext4_add_nondir(handle_t *handle, int err = ext4_add_entry(handle, dentry, inode); if (!err) { ext4_mark_inode_dirty(handle, inode); + ext4_fc_mark_ineligible(inode->i_sb); d_instantiate_new(dentry, inode); return 0; } @@ -2661,6 +2666,7 @@ static int ext4_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode) err = ext4_orphan_add(handle, inode); if (err) goto err_unlock_inode; + ext4_fc_enqueue_inode(handle, inode); mark_inode_dirty(inode); unlock_new_inode(inode); } @@ -2773,6 +2779,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) err = ext4_init_new_dir(handle, dir, inode); if (err) goto out_clear_inode; + ext4_fc_mark_ineligible(inode->i_sb); err = ext4_mark_inode_dirty(handle, inode); if (!err) err = ext4_add_entry(handle, dentry, inode); @@ -3114,6 +3121,7 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry) inode->i_size = 0; ext4_orphan_add(handle, inode); inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode); + ext4_fc_mark_ineligible(inode->i_sb); ext4_mark_inode_dirty(handle, inode); ext4_dec_count(handle, dir); ext4_update_dx_flag(dir); @@ -3192,6 +3200,7 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry) goto end_unlink; dir->i_ctime = dir->i_mtime = current_time(dir); ext4_update_dx_flag(dir); + ext4_fc_mark_ineligible(dir->i_sb); ext4_mark_inode_dirty(handle, dir); drop_nlink(inode); if (!inode->i_nlink) @@ -3387,6 +3396,7 @@ static int ext4_link(struct dentry 
*old_dentry, err = ext4_add_entry(handle, dentry, inode); if (!err) { + ext4_fc_mark_ineligible(inode->i_sb); ext4_mark_inode_dirty(handle, inode); /* this can happen only for tmpfile being * linked the first time @@ -3991,6 +4001,8 @@ static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry, if (err) return err; + ext4_fc_mark_ineligible(old_dir->i_sb); + if (flags & RENAME_EXCHANGE) { return ext4_cross_rename(old_dir, old_dentry, new_dir, new_dentry); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 0b833e9b61c1..c7bb52bdaf6e 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1129,6 +1129,16 @@ static void ext4_destroy_inode(struct inode *inode) true); dump_stack(); } + if (!list_empty(&(EXT4_I(inode)->i_fc_list))) { +#ifdef EXT4FS_DEBUG + if (EXT4_SB(inode->i_sb)->s_fc_eligible) { + pr_warn("%s: INODE %ld in FC List with FC allowd", + __func__, inode->i_ino); + dump_stack(); + } +#endif + ext4_fc_del(inode); + } } static void init_once(void *foo) @@ -1181,6 +1191,7 @@ void ext4_clear_inode(struct inode *inode) EXT4_I(inode)->jinode = NULL; } fscrypt_put_encryption_info(inode); + ext4_fc_del(inode); } static struct inode *ext4_nfs_get_inode(struct super_block *sb, @@ -1325,6 +1336,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len, * S_DAX may be disabled */ ext4_set_inode_flags(inode); + ext4_fc_mark_ineligible(inode->i_sb); res = ext4_mark_inode_dirty(handle, inode); if (res) EXT4_ERROR_INODE(inode, "Failed to mark inode dirty"); @@ -5795,6 +5807,7 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id, EXT4_I(inode)->i_flags |= EXT4_NOATIME_FL | EXT4_IMMUTABLE_FL; inode_set_flags(inode, S_NOATIME | S_IMMUTABLE, S_NOATIME | S_IMMUTABLE); + ext4_fc_mark_ineligible(inode->i_sb); ext4_mark_inode_dirty(handle, inode); ext4_journal_stop(handle); unlock_inode: @@ -5902,6 +5915,7 @@ static int ext4_quota_off(struct super_block *sb, int type) EXT4_I(inode)->i_flags &= ~(EXT4_NOATIME_FL | 
EXT4_IMMUTABLE_FL); inode_set_flags(inode, 0, S_NOATIME | S_IMMUTABLE); inode->i_mtime = inode->i_ctime = current_time(inode); + ext4_fc_mark_ineligible(inode->i_sb); ext4_mark_inode_dirty(handle, inode); ext4_journal_stop(handle); out_unlock: @@ -6008,6 +6022,7 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type, if (inode->i_size < off + len) { i_size_write(inode, off + len); EXT4_I(inode)->i_disksize = inode->i_size; + ext4_fc_mark_ineligible(inode->i_sb); ext4_mark_inode_dirty(handle, inode); } return len; diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 491f9ee4040e..19bc4046658c 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1406,6 +1406,7 @@ static int ext4_xattr_inode_write(handle_t *handle, struct inode *ea_inode, inode_unlock(ea_inode); ext4_mark_inode_dirty(handle, ea_inode); + ext4_fc_enqueue_inode(handle, ea_inode); out: brelse(bh); From patchwork Fri Aug 9 03:45:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1144311 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="ACZhGKiR"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 464WPp65cHz9sP7 for ; Fri, 9 Aug 2019 13:46:38 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405226AbfHIDqg (ORCPT ); Thu, 8 Aug 2019 23:46:36 -0400 Received: from mail-pf1-f196.google.com ([209.85.210.196]:44413 "EHLO 
mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405192AbfHIDqf (ORCPT ); Thu, 8 Aug 2019 23:46:35 -0400 Received: by mail-pf1-f196.google.com with SMTP id t16so45244639pfe.11 for ; Thu, 08 Aug 2019 20:46:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=UYxGtB2AAo13bc1nbrh+Vvilu06ItaaLOGpOmCFjms4=; b=ACZhGKiRJxFRF3gqe6E1LgNgf5jm2CUpnDFrJ9lmnNh5hQ55msk5sKdtb/YfFChchb WjmClOtNpbMtf6lpEdcyfNRPwhjHziTIqEtcafmsdBLyTLOlf+Ic+TOJYeDfNyCRPZOi tUYpXn2T6orltnIZpbgRUgOFlbxKGzs9FDev5qc9oKob1Pg4+0+6Ki1BqKz3cPah709P SDvQ6chh4zZQsssvEVhsbGF5vqbQbep3Q+hx0v4P3fXMfvEcu/TI5N4dy/4Cln3ORNn3 A1nBQdKhaxuX3SBruGT5rBIyAEirSFKD6dCKdX6uaYMhlBY3UEfu+w9JgfKiuF8de/Qn zxGQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=UYxGtB2AAo13bc1nbrh+Vvilu06ItaaLOGpOmCFjms4=; b=bPTwVzD5EGF3X3pRYp15CibGDEdrWHbFv/dGHlVRrA9V6OyjtAk+wEuLI0lUPvwPHV bX3ZRsGkxEOBV3LpGPyZ5JVTtqfNbmyLiNTCNJaiJJszYr34itzGFxAs87D4p/Q8XLng EIcL+X8tf1XQCJQ8TUd3Kl4MJKU2/7KbnlnphdqhTHYY6ADbqhCECYK+igyCr4OTC6YG yTqt9Yk814uVH348V5CCp/Q4b8rQgrHr4RHHP8uziyhRgcGdUO0jeMkT17Ucpd4Zkk5s J5sYPbfj13jFBivsedoeUnEri9WxY8BJUtXTtKwS1x83QdryPBSBMyYLd24XNHUAsx4a Pn5Q== X-Gm-Message-State: APjAAAXCNVHXrWw/5K7HdJV3zXozuxF62WHVUon7vVhdEhKZhUGJ04Pm wV/XE9pQfN7+JC9YGNhU7P6Or/FL X-Google-Smtp-Source: APXvYqwl313YZgqdDGsY8LBMZb/W+anUKsuM34tXlpHBVJyyYittqApiMwbArlJXqNwS3Sv+8moBBg== X-Received: by 2002:a65:6495:: with SMTP id e21mr15735302pgv.359.1565322394367; Thu, 08 Aug 2019 20:46:34 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id s5sm80191085pfm.97.2019.08.08.20.46.33 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); 
Thu, 08 Aug 2019 20:46:33 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v2 09/12] ext4: fast-commit commit range tracking Date: Thu, 8 Aug 2019 20:45:49 -0700 Message-Id: <20190809034552.148629-10-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com> References: <20190809034552.148629-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org With this patch, we track logical range of file offsets that need to be committed using fast commit. This allows us to find file extents that need to be committed during the commit time. Signed-off-by: Harshad Shirwadkar --- Changelog: V2: Since s_fc_lock is now a spinlock, updated calls appropriately. --- fs/ext4/ext4_jbd2.c | 33 +++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.h | 2 ++ fs/ext4/inline.c | 5 ++++- fs/ext4/inode.c | 18 +++++++++++++++++- 4 files changed, 56 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index d77b9f1e9dab..2897cbf4cc03 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -389,3 +389,36 @@ void ext4_fc_del(struct inode *inode) list_del_init(&EXT4_I(inode)->i_fc_list); spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); } + +void ext4_fc_update_commit_range(handle_t *handle, struct inode *inode, + loff_t start, loff_t end) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + + if (!ext4_should_fast_commit(inode->i_sb)) + return; + + if (!ext4_handle_valid(handle)) + return; + + if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb)) + ext4_debug("Special inode %ld being modified\n", inode->i_ino); + + if (!EXT4_SB(inode->i_sb)->s_fc_eligible) + return; + + if (ei->i_fc.fc_tid == handle->h_transaction->t_tid && + ei->i_fc.fc_subtid == + handle->h_transaction->t_journal->j_subtid) { + ei->i_fc.fc_lblk_start = 
ei->i_fc.fc_lblk_start < start ? + ei->i_fc.fc_lblk_start : start; + ei->i_fc.fc_lblk_end = ei->i_fc.fc_lblk_end > end ? + ei->i_fc.fc_lblk_end : end; + return; + } + + ei->i_fc.fc_lblk_start = start; + ei->i_fc.fc_lblk_end = end; + ei->i_fc.fc_subtid = handle->h_transaction->t_journal->j_subtid; + ei->i_fc.fc_tid = handle->h_transaction->t_tid; +} diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index a27cc3a5c676..1badb142dc2a 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -485,5 +485,7 @@ ext4_fc_mark_ineligible(struct super_block *sb) spin_unlock(&sbi->s_fc_lock); } +void ext4_fc_update_commit_range(handle_t *handle, struct inode *inode, + loff_t start, loff_t end); #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 190968996bc6..de61c15e1b17 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -967,8 +967,11 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, * But it's important to update i_size while still holding page lock: * page writeout could otherwise come in and zero beyond i_size. 
*/ - if (pos+copied > inode->i_size) + if (pos+copied > inode->i_size) { + ext4_fc_update_commit_range(ext4_journal_current_handle(), + inode, inode->i_size, pos + copied); i_size_write(inode, pos+copied); + } unlock_page(page); put_page(page); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 379e911b48c4..f79b185c013e 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1549,6 +1549,8 @@ static int ext4_journalled_write_end(struct file *file, SetPageUptodate(page); } size_changed = ext4_update_inode_size(inode, pos + copied); + ext4_fc_update_commit_range(handle, inode, pos, pos + copied); + ext4_set_inode_state(inode, EXT4_STATE_JDATA); EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid; unlock_page(page); @@ -2610,8 +2612,12 @@ static int mpage_map_and_submit_extent(handle_t *handle, i_size = i_size_read(inode); if (disksize > i_size) disksize = i_size; - if (disksize > EXT4_I(inode)->i_disksize) + if (disksize > EXT4_I(inode)->i_disksize) { + ext4_fc_update_commit_range(handle, inode, + EXT4_I(inode)->i_disksize, + disksize); EXT4_I(inode)->i_disksize = disksize; + } up_write(&EXT4_I(inode)->i_data_sem); err2 = ext4_mark_inode_dirty(handle, inode); ext4_fc_enqueue_inode(handle, inode); @@ -3220,6 +3226,8 @@ static int ext4_da_write_end(struct file *file, } } + ext4_fc_update_commit_range(handle, inode, pos, pos + copied); + if (write_mode != CONVERT_INLINE_DATA && ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) && ext4_has_inline_data(inode)) @@ -3627,6 +3635,7 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length, goto orphan_del; } + ext4_fc_update_commit_range(handle, inode, offset, offset + written); if (ext4_update_inode_size(inode, offset + written)) { ext4_mark_inode_dirty(handle, inode); ext4_fc_enqueue_inode(handle, inode); @@ -3751,6 +3760,8 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) ext4_update_i_disksize(inode, inode->i_size); ext4_journal_stop(handle); } + 
ext4_fc_update_commit_range(journal_current_handle(), inode, offset, + offset + count); BUG_ON(iocb->private == NULL); @@ -3869,6 +3880,8 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) ext4_mark_inode_dirty(handle, inode); ext4_fc_enqueue_inode(handle, inode); } + ext4_fc_update_commit_range(handle, inode, offset, + offset + end); } err = ext4_journal_stop(handle); if (ret == 0) @@ -5327,6 +5340,9 @@ static int ext4_do_update_inode(handle_t *handle, cpu_to_le16(ei->i_file_acl >> 32); raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl); if (ei->i_disksize != ext4_isize(inode->i_sb, raw_inode)) { + ext4_fc_update_commit_range(handle, inode, + ext4_isize(inode->i_sb, raw_inode), + ei->i_disksize); ext4_isize_set(raw_inode, ei->i_disksize); need_datasync = 1; } From patchwork Fri Aug 9 03:45:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1144313 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="A/0ilNT8"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 464WPr0Lcfz9sP7 for ; Fri, 9 Aug 2019 13:46:40 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405244AbfHIDqi (ORCPT ); Thu, 8 Aug 2019 23:46:38 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:39766 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S2405213AbfHIDqg (ORCPT ); Thu, 8 Aug 2019 23:46:36 -0400 Received: by mail-pg1-f195.google.com with SMTP id u17so45147545pgi.6 for ; Thu, 08 Aug 2019 20:46:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wbThzPFT64mMkeZBQbJu0txXkO7wJjf6cZ45UHkMQk4=; b=A/0ilNT85egTaQy134Eiau62drKOeN7p2OGVGm7wibsVXQ21hfbQDPOmy/nGM9NO3Z STiOwf7ipjHBHWautZgnVjN0pM5KM1KgEH3eDNfJeUzSs713mxL2Iwhboi+PZiKTWlf3 bwvY+N/9yea5+xX9ONvMDp0yTrDazDCacSz4tgl28UZ8Sg6OLLZdx2Ff1+lj/6GnRQtT VLtyb4UOptezxGZA4m6xejHZfhP40blUfpsybnFQSG7MporADNcpsh79FfsE754wegLi RmnBDHpcYGwzJB4nO8ygQ1gWPfpDSjwd6l8F6K2ruuobiE6Ck/dkvFeAJ+SIRE6yq91/ O+2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=wbThzPFT64mMkeZBQbJu0txXkO7wJjf6cZ45UHkMQk4=; b=uLxY2i1d+2w6o5Fef/p0V+P7mtjZHIXGq0sHx+xYkrMlUBTS1zmcjtySuHwFdjCkGJ sWbFpin1kw/Kww/2mb7CoGb2kSzZ3pnqjuMnTMaakhxmJmXmZQECovZpRsZ6B5t1d3nc rPquVyiqJA3fTR9mKyj0fj32idW0iGQXkESReHPSZ2NMczQCCiJrIQVIicgn4FTCtwCs +GI272u563dTd0nfcj873kTkV9/M543lhL4dE0BiUJpkwfPlHUmRYYZFmN/DVOvrK/8l Bl7RgxropMlL7dGuHb12jnU43tw98xwYz6gE6u29QOnwKJgWcnu9KvP1i3yMGDg3VVrv R2ow== X-Gm-Message-State: APjAAAWx8y4wTARhn4oSU/AwlAcBo7FCpAFuZVOLoKDjjlJoOEjd4za+ 2lneIX5u0ACNuiIaY2YZFYixWaGN X-Google-Smtp-Source: APXvYqyjN2oXsWHdXb4tSb7BIEXKnEFdqYAGExgShxeaUN0KdqzQKGWiruetWBl8gTtf51hGv1a9Ng== X-Received: by 2002:a63:ec03:: with SMTP id j3mr16156070pgh.325.1565322395227; Thu, 08 Aug 2019 20:46:35 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id s5sm80191085pfm.97.2019.08.08.20.46.34 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 08 Aug 2019 20:46:34 -0700 (PDT) From: Harshad Shirwadkar To: 
linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v2 10/12] ext4: fast-commit commit path changes
Date: Thu, 8 Aug 2019 20:45:50 -0700
Message-Id: <20190809034552.148629-11-harshadshirwadkar@gmail.com>
X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog
In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
References: <20190809034552.148629-1-harshadshirwadkar@gmail.com>
MIME-Version: 1.0
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

This patch implements the actual commit path for fast commit. Based on
the tracked inodes and the logical ranges remembered for each of them,
this patch adds code to create a fast commit block that stores the
extents added to the inode. We use the new JBD2 interfaces added by
previous patches in this series. The fast commit blocks that are created
have extents that _should_ be present in the file. The commit path does
not yet support removing extents, which makes operations such as
truncate and delete incompatible with fast commit.

Signed-off-by: Harshad Shirwadkar
---
Changelog:

V2: 1) Use jbd2_wait_on_fc_bufs() instead of jbd2_fc_submit_bufs(). This
       also implies that the fast commit callback now submits the
       relevant bhs by itself.
    2) Added tracepoints for the commit path.
    3) Several changes to the fast commit on-disk format:
       - Removed fc_tid from the fast commit header. That's because the
         TID can be obtained from the journal header that precedes the
         fast commit header.
       - Removed fc_len since it's always 1.
       - Added the fc_flags field. We set the "last" flag for the last
         block in a sub-transaction. This allows us to maintain
         atomicity of sub-transactions.
       - Added fc_features to indicate what fast commit features are
         used by this fast commit block. In the future, we plan to add
         support for handling of file create and file truncate.
         fc_features can be used by future patches to indicate
         incompatibility of those fast commit blocks.
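Illustration only, not part of the patch: a minimal userspace sketch of
the block layout described above, i.e. a small header carrying a "last"
flag, followed by a run of tag/length records that each hold one extent.
All demo_* names and the trimmed-down structures are hypothetical. The
real header in this series also embeds an on-disk inode copy and a
checksum and stores its fields little-endian, while the sketch uses host
byte order for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values chosen to mirror EXT4_FC_FL_LAST and EXT4_FC_TAG_EXT. */
#define DEMO_FC_FL_LAST 0x01u
#define DEMO_FC_TAG_EXT 0x1u

/* Hypothetical, simplified header (no inode copy, no checksum). */
struct demo_fc_hdr {
	uint32_t subtid;	/* sub-transaction ID */
	uint8_t features;	/* features used by this block */
	uint8_t flags;		/* DEMO_FC_FL_* */
	uint16_t num_tlvs;	/* number of records that follow */
};

/* Tag/length prefix placed in front of every record. */
struct demo_fc_tl {
	uint16_t tag;
	uint16_t len;
};

/* Hypothetical extent record; 16 bytes, no padding. */
struct demo_extent {
	uint32_t lblk;		/* first logical block */
	uint32_t len;		/* number of blocks */
	uint64_t pblk;		/* first physical block */
};

/* Append one extent record at *cur. Returns 0, or -1 if it does not fit. */
static int demo_fc_add_extent(struct demo_fc_hdr *hdr, uint8_t **cur,
			      const uint8_t *end,
			      const struct demo_extent *ext)
{
	struct demo_fc_tl tl = { DEMO_FC_TAG_EXT, (uint16_t)sizeof(*ext) };

	assert(*cur <= end);
	if ((size_t)(end - *cur) < sizeof(tl) + sizeof(*ext))
		return -1;

	memcpy(*cur, &tl, sizeof(tl));
	*cur += sizeof(tl);
	memcpy(*cur, ext, sizeof(*ext));
	*cur += sizeof(*ext);
	hdr->num_tlvs++;
	return 0;
}

/* Walk the records; returns the extent count, or -1 if one is truncated. */
static int demo_fc_count_extents(const struct demo_fc_hdr *hdr,
				 const uint8_t *cur, const uint8_t *end)
{
	int n = 0;

	for (uint16_t i = 0; i < hdr->num_tlvs; i++) {
		struct demo_fc_tl tl;

		if ((size_t)(end - cur) < sizeof(tl))
			return -1;
		memcpy(&tl, cur, sizeof(tl));
		cur += sizeof(tl);
		if ((size_t)(end - cur) < tl.len)
			return -1;
		if (tl.tag == DEMO_FC_TAG_EXT)
			n++;
		cur += tl.len;	/* unknown tags are skipped by length */
	}
	return n;
}

/* Mark this block as the final one of its sub-transaction. */
static void demo_fc_mark_last(struct demo_fc_hdr *hdr)
{
	hdr->flags |= DEMO_FC_FL_LAST;
}

static int demo_fc_is_last(const struct demo_fc_hdr *hdr)
{
	return (hdr->flags & DEMO_FC_FL_LAST) != 0;
}
```

In this sketch a writer would call demo_fc_add_extent() once per mapped
extent and demo_fc_mark_last() on the final block of a sub-transaction.
A reader could then treat a sub-transaction as complete only once it has
seen a block for which demo_fc_is_last() is true, which is the atomicity
property the "last" flag is meant to provide.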
--- fs/ext4/ext4.h | 37 ++++++ fs/ext4/extents.c | 8 +- fs/ext4/fsync.c | 2 +- fs/ext4/inode.c | 5 +- fs/ext4/super.c | 259 +++++++++++++++++++++++++++++++++++- include/trace/events/ext4.h | 37 ++++++ 6 files changed, 340 insertions(+), 8 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 0d15d4539dda..210bd4c86d4f 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2276,6 +2276,43 @@ struct mmpd_data { */ #define EXT4_MMP_MAX_CHECK_INTERVAL 300UL +/* Magic of fast commit header */ +#define EXT4_FC_MAGIC 0xE2540090 + +#define EXT4_FC_FL_LAST 0x00000001 + +#define ext4_fc_is_last(__fc_hdr) (((__fc_hdr)->fc_flags) & \ + EXT4_FC_FL_LAST) + +#define ext4_fc_mark_last(__fc_hdr) (((__fc_hdr)->fc_flags) |= \ + EXT4_FC_FL_LAST) + +struct ext4_fc_commit_hdr { + /* Fast commit magic, should be EXT4_FC_MAGIC */ + __le32 fc_magic; + /* Sub transaction ID */ + __le32 fc_subtid; + /* Features used by this fast commit block */ + __u8 fc_features; + /* Flags for this block. */ + __u8 fc_flags; + /* Number of TLVs in this fast commmit block */ + __le16 fc_num_tlvs; + /* Inode number */ + __le32 fc_ino; + /* ext4 inode on disk copy */ + struct ext4_inode inode; + /* Csum(hdr+contents) */ + __le32 fc_csum; +}; + +#define EXT4_FC_TAG_EXT 0x1 /* Extent */ + +struct ext4_fc_tl { + __le16 fc_tag; + __le16 fc_len; +}; + /* * Function prototypes */ diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index eb77e306a82b..66f7f4fb1612 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4899,10 +4899,10 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) if (ret) goto out; - if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) { - ret = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal, - EXT4_I(inode)->i_sync_tid); - } + if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) + ret = jbd2_fc_complete_commit( + EXT4_SB(inode->i_sb)->s_journal, EXT4_I(inode)->i_sync_tid, + journal_current_handle()->h_journal->j_subtid); 
out: inode_unlock(inode); trace_ext4_fallocate_exit(inode, offset, max_blocks, ret); diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index 5508baa11bb6..4f783f9723c5 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -151,7 +151,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) if (journal->j_flags & JBD2_BARRIER && !jbd2_trans_will_send_data_barrier(journal, commit_tid)) needs_barrier = true; - ret = jbd2_complete_transaction(journal, commit_tid); + ret = jbd2_fc_complete_commit(journal, commit_tid, journal->j_subtid); if (needs_barrier) { issue_flush: err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f79b185c013e..dd5d39a48363 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5476,8 +5476,9 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc) if (wbc->sync_mode != WB_SYNC_ALL || wbc->for_sync) return 0; - err = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal, - EXT4_I(inode)->i_sync_tid); + err = jbd2_fc_complete_commit( + EXT4_SB(inode->i_sb)->s_journal, EXT4_I(inode)->i_sync_tid, + EXT4_SB(inode->i_sb)->s_journal->j_subtid); } else { struct ext4_iloc iloc; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index c7bb52bdaf6e..1191ebbb55c5 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -437,6 +437,260 @@ static bool system_going_down(void) || system_state == SYSTEM_RESTART; } +static void ext4_end_buffer_io_sync(struct buffer_head *bh, int uptodate) +{ + struct buffer_head *orig_bh = bh->b_private; + + BUFFER_TRACE(bh, ""); + if (uptodate) { + ext4_debug("%s: Block %lld up-to-date", + __func__, bh->b_blocknr); + set_buffer_uptodate(bh); + } else { + ext4_debug("%s: Block %lld not up-to-date", + __func__, bh->b_blocknr); + clear_buffer_uptodate(bh); + } + if (orig_bh) { + clear_bit_unlock(BH_Shadow, &orig_bh->b_state); + /* Protect BH_Shadow bit in b_state */ + smp_mb__after_atomic(); + wake_up_bit(&orig_bh->b_state, 
BH_Shadow); + } + unlock_buffer(bh); +} + +static int ext4_fc_write_inode(journal_t *journal, struct buffer_head *bh, + struct inode *inode, tid_t tid, tid_t subtid, + int is_last) +{ + loff_t old_blk_size, cur_lblk_off, new_blk_size; + struct super_block *sb = journal->j_private; + struct ext4_inode_info *ei = EXT4_I(inode); + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_fc_commit_hdr *fc_hdr; + struct ext4_map_blocks map; + struct ext4_iloc iloc; + struct ext4_fc_tl tl; + struct ext4_extent extent; + __u32 dummy_csum = 0, csum; + __u8 *start, *cur, *end; + __u16 num_tlvs = 0; + int ret; + + if (tid != ei->i_fc.fc_tid || subtid != ei->i_fc.fc_subtid) { + jbd_debug(3, + "File not modified. Modified %d:%d, expected %d:%d", + ei->i_fc.fc_tid, ei->i_fc.fc_subtid, tid, subtid); + return 0; + } + + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) + return -ECANCELED; + + ret = ext4_get_inode_loc(inode, &iloc); + if (ret) + return ret; + + end = (__u8 *)bh->b_data + journal->j_blocksize; + + old_blk_size = (ei->i_fc.fc_lblk_start + sb->s_blocksize - 1) >> + inode->i_blkbits; + new_blk_size = ei->i_fc.fc_lblk_end >> inode->i_blkbits; + + jbd_debug(3, "Committing as tid = %d, subtid = %d on buffer %lld\n", + tid, subtid, bh->b_blocknr); + + ei->i_fc.fc_lblk_start = ei->i_fc.fc_lblk_end; + + fc_hdr = (struct ext4_fc_commit_hdr *) + ((__u8 *)bh->b_data + sizeof(journal_header_t)); + fc_hdr->fc_magic = cpu_to_le32(EXT4_FC_MAGIC); + fc_hdr->fc_subtid = cpu_to_le32(subtid); + fc_hdr->fc_ino = cpu_to_le32(inode->i_ino); + fc_hdr->fc_features = 0; + fc_hdr->fc_flags = 0; + + if (is_last) + ext4_fc_mark_last(fc_hdr); + + memcpy(&fc_hdr->inode, ext4_raw_inode(&iloc), EXT4_INODE_SIZE(sb)); + cur = (__u8 *)(fc_hdr + 1); + start = cur; + csum = 0; + cur_lblk_off = old_blk_size; + while (cur_lblk_off <= new_blk_size) { + map.m_lblk = cur_lblk_off; + map.m_len = new_blk_size - cur_lblk_off + 1; + ret = ext4_map_blocks(NULL, inode, &map, 0); + if (!ret) { + cur_lblk_off += 
map.m_len; + continue; + } + + if (map.m_flags & EXT4_MAP_UNWRITTEN) + return -ECANCELED; + extent.ee_block = cpu_to_le32(map.m_lblk); + cur_lblk_off += map.m_len; + if (cur + sizeof(struct ext4_extent) + + sizeof(struct ext4_fc_tl) >= end) + return -ENOSPC; + + tl.fc_tag = cpu_to_le16(EXT4_FC_TAG_EXT); + tl.fc_len = cpu_to_le16(sizeof(struct ext4_extent)); + extent.ee_len = cpu_to_le16(map.m_len); + ext4_ext_store_pblock(&extent, map.m_pblk); + if (map.m_flags & EXT4_MAP_UNWRITTEN) + ext4_ext_mark_unwritten(&extent); + else + ext4_ext_mark_initialized(&extent); + memcpy(cur, &tl, sizeof(struct ext4_fc_tl)); + cur += sizeof(struct ext4_fc_tl); + memcpy(cur, &extent, sizeof(struct ext4_extent)); + cur += sizeof(struct ext4_extent); + num_tlvs++; + } + + fc_hdr->fc_num_tlvs = cpu_to_le16(num_tlvs); + csum = ext4_chksum(sbi, csum, (__u8 *)fc_hdr, + offsetof(struct ext4_fc_commit_hdr, fc_csum)); + csum = ext4_chksum(sbi, csum, &dummy_csum, sizeof(dummy_csum)); + csum = ext4_chksum(sbi, csum, start, cur - start); + fc_hdr->fc_csum = cpu_to_le32(csum); + + jbd_debug(3, "Created FC block for inode %ld with [%d, %d]", + inode->i_ino, tid, subtid); + + return 1; +} + +static void ext4_journal_fc_cleanup_cb(journal_t *journal) +{ + struct super_block *sb = journal->j_private; + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_inode_info *iter; + struct inode *inode; + + spin_lock(&sbi->s_fc_lock); + while (!list_empty(&sbi->s_fc_q)) { + iter = list_first_entry(&sbi->s_fc_q, + struct ext4_inode_info, i_fc_list); + list_del_init(&iter->i_fc_list); + inode = &iter->vfs_inode; + } + INIT_LIST_HEAD(&sbi->s_fc_q); + sbi->s_fc_q_cnt = 0; + spin_unlock(&sbi->s_fc_lock); +} + +/* + * Fast-commit commit callback. There is contention between sbi->s_fc_lock and + * i_data_sem. 
Locking order is - i_data_sem then s_fc_lock + */ +static int ext4_journal_fc_commit_cb(journal_t *journal, tid_t tid, + tid_t subtid, + struct transaction_run_stats_s *stats) +{ + struct super_block *sb = journal->j_private; + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct list_head *pos, *tmp; + struct ext4_inode_info *iter; + struct jbd2_inode *jinode; + int num_bufs = 0, ret; + + memset(stats, 0, sizeof(*stats)); + + trace_ext4_journal_fc_commit_cb_start(sb); + sbi = sbi; + spin_lock(&sbi->s_fc_lock); + if (!sbi->s_fc_eligible) { + sbi->s_fc_eligible = true; + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_commit_cb_stop(sb, 0); + return -ECANCELED; + } + + stats->rs_flushing = jiffies; + /* Submit data buffers first */ + list_for_each(pos, &sbi->s_fc_q) { + iter = list_entry(pos, struct ext4_inode_info, i_fc_list); + jinode = iter->jinode; + ret = jbd2_submit_inode_data(journal, jinode); + if (ret) { + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_commit_cb_stop(sb, 0); + return ret; + } + } + stats->rs_logging = jiffies; + stats->rs_flushing = jbd2_time_diff(stats->rs_flushing, + stats->rs_logging); + + list_for_each_safe(pos, tmp, &sbi->s_fc_q) { + struct inode *inode; + struct buffer_head *bh; + int is_last; + + iter = list_entry(pos, struct ext4_inode_info, i_fc_list); + inode = &iter->vfs_inode; + + is_last = list_is_last(pos, &sbi->s_fc_q); + spin_unlock(&sbi->s_fc_lock); + + ret = jbd2_map_fc_buf(journal, &bh); + if (ret) + return -ENOMEM; + + /* + * Release s_fc_lock here since fc_write_inode calls + * ext4_map_blocks which needs i_data_sem. 
+ */ + ret = ext4_fc_write_inode(journal, bh, inode, tid, subtid, + is_last); + if (ret < 0) { + trace_ext4_journal_fc_commit_cb_stop(sb, 0); + return ret; + } + lock_buffer(bh); + clear_buffer_dirty(bh); + set_buffer_uptodate(bh); + bh->b_end_io = ext4_end_buffer_io_sync; + submit_bh(REQ_OP_WRITE, REQ_SYNC, bh); + + spin_lock(&sbi->s_fc_lock); + + num_bufs += ret; + } + + stats->rs_logging = jbd2_time_diff(stats->rs_logging, jiffies); + if (num_bufs == 0) { + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_commit_cb_stop(sb, 0); + stats->rs_blocks_logged = num_bufs; + return 0; + } + + /* + * Before returning, check if s_fc_eligible was modified since we + * started. + */ + if (!sbi->s_fc_eligible) { + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_commit_cb_stop(sb, 0); + return -ECANCELED; + } + + spin_unlock(&sbi->s_fc_lock); + + jbd_debug(3, "%s: Journal blocks ready for fast commit\n", __func__); + + stats->rs_blocks_logged = num_bufs; + + trace_ext4_journal_fc_commit_cb_stop(sb, num_bufs); + + return jbd2_wait_on_fc_bufs(journal, num_bufs); +} + /* Deal with the reporting of failure conditions on a filesystem such as * inconsistencies detected or read IO failures. 
* @@ -4723,7 +4977,10 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal) journal->j_commit_interval = sbi->s_commit_interval; journal->j_min_batch_time = sbi->s_min_batch_time; journal->j_max_batch_time = sbi->s_max_batch_time; - + if (ext4_should_fast_commit(sb)) { + journal->j_fc_commit_callback = ext4_journal_fc_commit_cb; + journal->j_fc_cleanup_callback = ext4_journal_fc_cleanup_cb; + } write_lock(&journal->j_state_lock); if (test_opt(sb, BARRIER)) journal->j_flags |= JBD2_BARRIER; diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index d68e9e536814..8ef67b61d54a 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2703,6 +2703,43 @@ TRACE_EVENT(ext4_error, __entry->function, __entry->line) ); +TRACE_EVENT(ext4_journal_fc_commit_cb_start, + TP_PROTO(struct super_block *sb), + + TP_ARGS(sb), + + TP_STRUCT__entry( + __field(dev_t, dev) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + ), + + TP_printk("fast_commit started on dev %d,%d", + MAJOR(__entry->dev), MINOR(__entry->dev)) +); + +TRACE_EVENT(ext4_journal_fc_commit_cb_stop, + TP_PROTO(struct super_block *sb, int nblks), + + TP_ARGS(sb, nblks), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, nblks) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->nblks = nblks; + ), + + TP_printk("fast_commit done on dev %d,%d, nblks %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->nblks) +); + #endif /* _TRACE_EXT4_H */ /* This part must be outside protection */ From patchwork Fri Aug 9 03:45:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1144314 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; 
envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="fLnatKr8"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 464WPr3jvqz9sPX for ; Fri, 9 Aug 2019 13:46:40 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405245AbfHIDqj (ORCPT ); Thu, 8 Aug 2019 23:46:39 -0400 Received: from mail-pf1-f194.google.com ([209.85.210.194]:36728 "EHLO mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405192AbfHIDqh (ORCPT ); Thu, 8 Aug 2019 23:46:37 -0400 Received: by mail-pf1-f194.google.com with SMTP id r7so45277325pfl.3 for ; Thu, 08 Aug 2019 20:46:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=70mL84Zj+oqJwB1/rpWefyytz1K+z1achazCJoanKpM=; b=fLnatKr8XK+rZrFSwAKA3CyzSiznLG7DjSSaDr5b3RExAqZ2Xp3lUFu6dZGehbXB3/ lgN0BiHWmGN1szcvhGPqqfEqMWUy7nWnx9ArB2OsxPuAT/GqA0FN85CgY3gHaRvlFhzs GR03HXniuPdpEt39EVCA4FgLjKnopvEP8hoLgPNHnr3iube16+UUMDUHs+FyBTVdUcyd sfDqmyL4M4Hl9L36WPVV5PAoNKg+aBH9AZqwnzq/1VqkftWcLQZJO+U2Pb4dlZiYy3b2 EPbwbozjbKDQhBQTpzUqUA/tZcXclCB6Gve0p/jaNQoR5HE9Hier1Ignyo4ntLUHAhOt fJJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=70mL84Zj+oqJwB1/rpWefyytz1K+z1achazCJoanKpM=; b=Mt4eV/o6puZhNDWDffvPcFeO6fq7cs8yOgCysEt+3D2/Ow3LnsljleWtnVHQsflhx/ vNg2KmwmyunvA8yrly7OZEvz4I+vwO5QOAkIM9RkMZKGZFcRXKkpjRoU7ZlhL2gWQI3F 4yRQRPc9Hm2Z7GR4BIOpekolAhddbVWV2KJjAEwr0OAJu/6RaOn31t8VVO+dhtz3Yo/C 
pSBdBTIkMdVLp2c4u9YBgzvkRlmhS8QbD1PkLxTfndwvNUjSz13Kw8OPXKs8y3THqogl gpixd8qkDhjMaZ3rpEOdgOwxQDt6JSbeKRdvkHzdUXU19rCBYr8T6PwU7rMsVJN9K81W caLw== X-Gm-Message-State: APjAAAUpM/Zc/96xIYUuvD9ED5S96CHsS8fL3u6Y+JC7hUdCPmKTmiFc aswVYyihZgbICbinn+pHwdYIW1ES X-Google-Smtp-Source: APXvYqwNccvxQkUwFP7jLsc6K9vUAkw5GNEg480Z3z4ecXMJD1MVsjbyY0P+cBTajCV0Uat/94XidQ== X-Received: by 2002:a62:b615:: with SMTP id j21mr18649527pff.190.1565322396018; Thu, 08 Aug 2019 20:46:36 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id s5sm80191085pfm.97.2019.08.08.20.46.35 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 08 Aug 2019 20:46:35 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v2 11/12] ext4: fast-commit recovery path changes Date: Thu, 8 Aug 2019 20:45:51 -0700 Message-Id: <20190809034552.148629-12-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com> References: <20190809034552.148629-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org This patch adds core fast-commit recovery path changes. Each fast commit block stores modified extents for a particular file. Replay code maps blocks in each such extent to the actual file one-by-one. We also update corresponding file system metadata to account for newly mapped blocks. In order to achieve all of these, ext4_inode_csum_set(), ext4_inode_blocks() which were earlier static are now made visible. Signed-off-by: Harshad Shirwadkar --- Changelog: V2: 1) Fixed warning reported by Kbuild. 2) Implement scan pass. - we look for "last" blocks to maintain atomicity of subtransactions. - Implement CRC checksum verification. - If scan pass detects error, we don't perform replay pass. 
3) Calling j_fc_replay_callback for SCAN pass as well. So added passtype and fast commit block offset parameters to j_fc_replay_callback. Added tracepoint for replay SCAN pass --- fs/ext4/balloc.c | 7 +- fs/ext4/ext4.h | 12 ++ fs/ext4/extents.c | 19 +-- fs/ext4/inode.c | 8 +- fs/ext4/mballoc.c | 83 +++++++++++++ fs/ext4/mballoc.h | 2 + fs/ext4/super.c | 225 ++++++++++++++++++++++++++++++++++++ fs/jbd2/commit.c | 6 +- fs/jbd2/recovery.c | 11 +- include/linux/jbd2.h | 5 +- include/trace/events/ext4.h | 22 ++++ include/trace/events/jbd2.h | 9 +- 12 files changed, 386 insertions(+), 23 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 0b202e00d93f..75c3025c7089 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -360,7 +360,12 @@ static int ext4_validate_block_bitmap(struct super_block *sb, struct buffer_head *bh) { ext4_fsblk_t blk; - struct ext4_group_info *grp = ext4_get_group_info(sb, block_group); + struct ext4_group_info *grp; + + if (EXT4_SB(sb)->fc_replay) + return 0; + + grp = ext4_get_group_info(sb, block_group); if (buffer_verified(bh)) return 0; diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 210bd4c86d4f..ca1fbd77a934 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1378,6 +1378,13 @@ struct ext4_super_block { #define ext4_has_strict_mode(sbi) \ (sbi->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL) +struct ext4_fc_replay_state { + int fc_replay_error; + int fc_replay_expected_off; + int fc_replay_expected_tid; + int fc_replay_current_subtid; +}; + /* * fourth extended-fs super-block data in memory */ @@ -1562,6 +1569,7 @@ struct ext4_sb_info { * Are changes after the last commit * eligible for fast commit? 
*/ + struct ext4_fc_replay_state s_fc_replay_state; spinlock_t s_fc_lock; }; @@ -2588,6 +2596,10 @@ extern int ext4_trim_fs(struct super_block *, struct fstrim_range *); extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid); /* inode.c */ +void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, + struct ext4_inode_info *ei); +blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, + struct ext4_inode_info *ei); int ext4_inode_is_fast_symlink(struct inode *inode); struct buffer_head *ext4_getblk(handle_t *, struct inode *, ext4_lblk_t, int); struct buffer_head *ext4_bread(handle_t *, struct inode *, ext4_lblk_t, int); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 66f7f4fb1612..59fe596ce97d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2894,7 +2894,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, int depth = ext_depth(inode); struct ext4_ext_path *path = NULL; struct partial_cluster partial; - handle_t *handle; + handle_t *handle = NULL; int i = 0, err = 0; partial.pclu = 0; @@ -2904,9 +2904,11 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, ext_debug("truncate since %u to %u\n", start, end); /* probably first extent we're gonna free will be last in block */ - handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, depth + 1); - if (IS_ERR(handle)) - return PTR_ERR(handle); + if (!sbi->fc_replay) { + handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, depth + 1); + if (IS_ERR(handle)) + return PTR_ERR(handle); + } again: trace_ext4_ext_remove_space(inode, start, end, depth); @@ -2926,7 +2928,8 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, /* find extent for or closest extent to this block */ path = ext4_find_extent(inode, end, NULL, EXT4_EX_NOCACHE); if (IS_ERR(path)) { - ext4_journal_stop(handle); + if (!sbi->fc_replay) + ext4_journal_stop(handle); return PTR_ERR(path); } depth = ext_depth(inode); @@ -3012,7 +3015,8 @@ int 
ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, path = kcalloc(depth + 1, sizeof(struct ext4_ext_path), GFP_NOFS); if (path == NULL) { - ext4_journal_stop(handle); + if (!sbi->fc_replay) + ext4_journal_stop(handle); return -ENOMEM; } path[0].p_maxdepth = path[0].p_depth = depth; @@ -3142,7 +3146,8 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, path = NULL; if (err == -EAGAIN) goto again; - ext4_journal_stop(handle); + if (!sbi->fc_replay) + ext4_journal_stop(handle); return err; } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index dd5d39a48363..21c9b5197c72 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -103,8 +103,8 @@ static int ext4_inode_csum_verify(struct inode *inode, struct ext4_inode *raw, return provided == calculated; } -static void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, - struct ext4_inode_info *ei) +void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, + struct ext4_inode_info *ei) { __u32 csum; @@ -4801,8 +4801,8 @@ void ext4_set_inode_flags(struct inode *inode) S_ENCRYPTED|S_CASEFOLD); } -static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, - struct ext4_inode_info *ei) +blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, + struct ext4_inode_info *ei) { blkcnt_t i_blocks ; struct inode *inode = &(ei->vfs_inode); diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index a3e2767bdf2f..70551fa91237 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2915,6 +2915,89 @@ void ext4_exit_mballoc(void) } +void ext4_mb_mark_used(struct super_block *sb, ext4_fsblk_t block, + int len) +{ + struct buffer_head *bitmap_bh = NULL; + struct ext4_group_desc *gdp; + struct buffer_head *gdp_bh; + struct ext4_sb_info *sbi = EXT4_SB(sb); + ext4_group_t group; + ext4_fsblk_t cluster; + ext4_grpblk_t blkoff; + int i, clen, err; + int already_allocated_count; + + cluster = EXT4_B2C(sbi, block); + clen = EXT4_B2C(sbi, len); + + ext4_get_group_no_and_offset(sb, block, 
&group, &blkoff); + bitmap_bh = ext4_read_block_bitmap(sb, group); + if (IS_ERR(bitmap_bh)) { + err = PTR_ERR(bitmap_bh); + bitmap_bh = NULL; + goto out_err; + } + + err = -EIO; + gdp = ext4_get_group_desc(sb, group, &gdp_bh); + if (!gdp) + goto out_err; + + if (!ext4_data_block_valid(sbi, block, len)) { + ext4_error(sb, "Allocating blks %llu-%llu which overlap mdata", + cluster, cluster+clen); + /* File system mounted not to panic on error + * Fix the bitmap and return EFSCORRUPTED + * We leak some of the blocks here. + */ + ext4_lock_group(sb, group); + ext4_set_bits(bitmap_bh->b_data, blkoff, clen); + ext4_unlock_group(sb, group); + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); + if (!err) + err = -EFSCORRUPTED; + goto out_err; + } + + ext4_lock_group(sb, group); + already_allocated_count = 0; + for (i = 0; i < clen; i++) + if (mb_test_bit(blkoff + i, bitmap_bh->b_data)) + already_allocated_count++; + + ext4_set_bits(bitmap_bh->b_data, blkoff, clen); + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); + ext4_free_group_clusters_set(sb, gdp, + ext4_free_clusters_after_init(sb, + group, gdp)); + } + clen = ext4_free_group_clusters(sb, gdp) - clen + + already_allocated_count; + ext4_free_group_clusters_set(sb, gdp, clen); + ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh); + ext4_group_desc_csum_set(sb, group, gdp); + + ext4_unlock_group(sb, group); + + if (sbi->s_log_groups_per_flex) { + ext4_group_t flex_group = ext4_flex_group(sbi, group); + + atomic64_sub(len, + &sbi->s_flex_groups[flex_group].free_clusters); + } + + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); + if (err) + goto out_err; + err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh); + +out_err: + brelse(bitmap_bh); +} + /* * Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps * Returns 0 if success or error code diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h 
index 88c98f17e3d9..1881710041b6 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -215,4 +215,6 @@ ext4_mballoc_query_range( ext4_mballoc_query_range_fn formatter, void *priv); +void ext4_mb_mark_used(struct super_block *sb, ext4_fsblk_t block, + int len); #endif diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 1191ebbb55c5..3b535eb624a7 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -408,6 +408,224 @@ static int block_device_ejected(struct super_block *sb) return bdi->dev == NULL; } +static void ext4_fc_add_block(struct inode *inode, ext4_lblk_t lblk, + ext4_fsblk_t pblk, int unwritten) +{ + struct ext4_extent ex; + struct ext4_ext_path *path = NULL; + struct ext4_map_blocks map; + int ret; + + map.m_lblk = lblk; + map.m_len = 0x1; + ret = ext4_map_blocks(NULL, inode, &map, 0); + if (ret > 0) { + if (pblk != map.m_pblk) + jbd_debug(1, "Bad mapping found while replaying fc\n"); + return; + } + + ex.ee_block = cpu_to_le32(lblk); + ext4_ext_store_pblock(&ex, pblk); + ex.ee_len = cpu_to_le16(0x1); + if (unwritten) + ext4_ext_mark_unwritten(&ex); + + path = ext4_find_extent(inode, lblk, NULL, 0); + if (path) { + down_write(&EXT4_I(inode)->i_data_sem); + ret = ext4_ext_insert_extent(NULL, inode, &path, &ex, 0); + ext4_mb_mark_used(inode->i_sb, ext4_ext_pblock(&ex), 0x1); + up_write((&EXT4_I(inode)->i_data_sem)); + kfree(path); + } +} + +static int ext4_journal_fc_replay_scan(struct super_block *sb, + struct buffer_head *bh, int off) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_fc_replay_state *state; + struct ext4_fc_commit_hdr *fc_hdr; + struct ext4_fc_tl *tl; + __u32 csum, dummy_csum = 0; + __u8 *start; + tid_t fc_subtid; + int i; + + state = &sbi->s_fc_replay_state; + fc_hdr = (struct ext4_fc_commit_hdr *) + ((__u8 *)bh->b_data + sizeof(journal_header_t)); + + fc_subtid = le32_to_cpu(fc_hdr->fc_subtid); + + if (le32_to_cpu(fc_hdr->fc_magic) != EXT4_FC_MAGIC) { + state->fc_replay_error = -ENOENT; + goto out_err; + } + + if (off != 
state->fc_replay_expected_off) { + state->fc_replay_error = -EFSCORRUPTED; + goto out_err; + } + + if (le16_to_cpu(fc_hdr->fc_features)) { + state->fc_replay_error = -EOPNOTSUPP; + goto out_err; + } + + /* Check if we already concluded that this fast commit is not useful */ + if (state->fc_replay_error && state->fc_replay_error != -EPROTO) + goto out_err; + + if (state->fc_replay_expected_off == 0) { + /* This is a first block */ + state->fc_replay_current_subtid = fc_subtid; + /* + * We set replay error by default until we find an end + * block for a particular subtid + */ + state->fc_replay_error = -EPROTO; + } + + if (state->fc_replay_error != 0) { + if (state->fc_replay_current_subtid != fc_subtid) { + state->fc_replay_error = -EFSCORRUPTED; + goto out_err; + } + } else { + /* + * We encountered _last_ block for previous subtid. So we should + * only find a bigger subtid here. + */ + if (fc_subtid <= state->fc_replay_current_subtid) { + state->fc_replay_error = -EFSCORRUPTED; + goto out_err; + } + state->fc_replay_current_subtid = fc_subtid; + } + + /* + * We can replay fast commit blocks only if we find a _last_ block for + * all subtids. 
+ */ + if (ext4_fc_is_last(fc_hdr)) + state->fc_replay_error = 0; + + csum = ext4_chksum(sbi, 0, fc_hdr, + offsetof(struct ext4_fc_commit_hdr, fc_csum)); + csum = ext4_chksum(sbi, csum, &dummy_csum, sizeof(dummy_csum)); + + tl = (struct ext4_fc_tl *)(fc_hdr + 1); + start = (__u8 *)tl; + for (i = 0; i < le16_to_cpu(fc_hdr->fc_num_tlvs); i++) { + if (le16_to_cpu(tl->fc_tag) != EXT4_FC_TAG_EXT) + goto out_err; + tl = (struct ext4_fc_tl *)((__u8 *)tl + + le16_to_cpu(tl->fc_len) + + sizeof(*tl)); + } + csum = ext4_chksum(sbi, csum, start, (__u8 *)tl - start); + if (csum != le32_to_cpu(fc_hdr->fc_csum)) { + state->fc_replay_error = -EFSBADCRC; + goto out_err; + } + + state->fc_replay_expected_off++; + return 0; + +out_err: + trace_ext4_journal_fc_replay_scan(sb, state->fc_replay_error, off); + return state->fc_replay_error; +} + +static int ext4_journal_fc_replay_cb(journal_t *journal, struct buffer_head *bh, + enum passtype pass, int off) +{ + struct super_block *sb = journal->j_private; + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_fc_commit_hdr *fc_hdr; + struct ext4_fc_tl *tl; + struct ext4_iloc iloc; + struct ext4_extent *ex; + struct inode *inode; + int ret; + + if (pass == PASS_SCAN) + return ext4_journal_fc_replay_scan(sb, bh, off); + + if (sbi->s_fc_replay_state.fc_replay_error) + return sbi->s_fc_replay_state.fc_replay_error; + + sbi->fc_replay = true; + fc_hdr = (struct ext4_fc_commit_hdr *) + ((__u8 *)bh->b_data + sizeof(journal_header_t)); + + jbd_debug(3, "%s: Got FC block for inode %d at [%d,%d]", __func__, + le32_to_cpu(fc_hdr->fc_ino), + be32_to_cpu(((journal_header_t *)bh->b_data)->h_sequence), + le32_to_cpu(fc_hdr->fc_subtid)); + + inode = ext4_iget(sb, le32_to_cpu(fc_hdr->fc_ino), EXT4_IGET_NORMAL); + if (IS_ERR(inode)) + return 0; + + ret = ext4_get_inode_loc(inode, &iloc); + if (ret) + return ret; + + inode_lock(inode); + tl = (struct ext4_fc_tl *)(fc_hdr + 1); + while (le16_to_cpu(tl->fc_tag) == EXT4_FC_TAG_EXT) { + int i; + + ex =
(struct ext4_extent *)(tl + 1); + tl = (struct ext4_fc_tl *)((__u8 *)tl + + le16_to_cpu(tl->fc_len) + + sizeof(*tl)); + /* + * We add block by block because part of extent may already have + * been added by a previous fast commit replay. + */ + for (i = 0; i < ext4_ext_get_actual_len(ex); i++) + ext4_fc_add_block(inode, le32_to_cpu(ex->ee_block) + i, + ext4_ext_pblock(ex) + i, + ext4_ext_is_unwritten(ex)); + } + + /* + * Unless inode contains inline data, copy everything except + * i_blocks. i_blocks would have been set alright by ext4_fc_add_block + * call above. + */ + if (ext4_has_inline_data(inode)) { + memcpy(ext4_raw_inode(&iloc), &fc_hdr->inode, + sizeof(struct ext4_inode)); + } else { + memcpy(ext4_raw_inode(&iloc), &fc_hdr->inode, + offsetof(struct ext4_inode, i_block)); + memcpy(&ext4_raw_inode(&iloc)->i_generation, + &fc_hdr->inode.i_generation, + sizeof(struct ext4_inode) - + offsetof(struct ext4_inode, i_generation)); + } + + ext4_reserve_inode_write(NULL, inode, &iloc); + inode_unlock(inode); + sbi->fc_replay = false; + + ext4_inode_csum_set(inode, ext4_raw_inode(&iloc), EXT4_I(inode)); + ret = ext4_handle_dirty_metadata(NULL, inode, iloc.bh); + iput(inode); + if (!ret) + ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL); + + brelse(iloc.bh); + + return ret; +} + + static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn) { struct super_block *sb = journal->j_private; @@ -4981,6 +5199,13 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal) journal->j_fc_commit_callback = ext4_journal_fc_commit_cb; journal->j_fc_cleanup_callback = ext4_journal_fc_cleanup_cb; } + + /* + * We set the replay callback even if fast commit is disabled because we + * may still have fast commit blocks that need to be replayed even if + * fast commit has now been turned off.
+ */ + journal->j_fc_replay_callback = ext4_journal_fc_replay_cb; write_lock(&journal->j_state_lock); if (test_opt(sb, BARRIER)) journal->j_flags |= JBD2_BARRIER; diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index db62a53436e3..1875cdc839fb 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -469,6 +469,10 @@ void jbd2_journal_commit_transaction(journal_t *journal, bool *fc) if (fc) *fc = true; write_unlock(&journal->j_state_lock); + trace_jbd2_run_stats(journal->j_fs_dev->bd_dev, + journal->j_running_transaction + ->t_tid, + &stats.run, true); goto update_overall_stats; } if (journal->j_fc_cleanup_callback) @@ -1156,7 +1160,7 @@ void jbd2_journal_commit_transaction(journal_t *journal, bool *fc) stats.run.rs_handle_count = atomic_read(&commit_transaction->t_handle_count); trace_jbd2_run_stats(journal->j_fs_dev->bd_dev, - commit_transaction->t_tid, &stats.run); + commit_transaction->t_tid, &stats.run, false); stats.ts_requested = (commit_transaction->t_requested) ? 1 : 0; commit_transaction->t_state = T_COMMIT_CALLBACK; diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index 3a6cd1497504..ba049a31febc 100644 --- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -35,7 +35,6 @@ struct recovery_info int nr_revoke_hits; }; -enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY}; static int do_one_pass(journal_t *journal, struct recovery_info *info, enum passtype pass); static int scan_revoke_records(journal_t *, struct buffer_head *, @@ -444,10 +443,10 @@ static int fc_do_one_pass(journal_t *journal, } jbd_debug(3, "Processing fast commit blk with seq %d", seq); - if (pass == PASS_REPLAY && - journal->j_fc_replay_callback) { - err = journal->j_fc_replay_callback(journal, - bh); + if (journal->j_fc_replay_callback) { + err = journal->j_fc_replay_callback( + journal, bh, pass, + next_fc_block - journal->j_first_fc); if (err) break; } @@ -849,7 +848,7 @@ static int do_one_pass(journal_t *journal, } } - if (jbd2_has_feature_fast_commit(journal) && pass == 
PASS_REPLAY) + if (jbd2_has_feature_fast_commit(journal) && pass != PASS_REVOKE) fc_do_one_pass(journal, info, pass); if (block_error && success == 0) diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 5362777d06f8..000363d994bb 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -759,6 +759,8 @@ jbd2_time_diff(unsigned long start, unsigned long end) #define JBD2_NR_BATCH 64 +enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY}; + /** * struct journal_s - The journal_s type is the concrete type associated with * journal_t. @@ -1240,7 +1242,8 @@ struct journal_s * the journal. */ int (*j_fc_replay_callback)(struct journal_s *journal, - struct buffer_head *bh); + struct buffer_head *bh, + enum passtype pass, int off); /** * @j_fc_cleanup_callback: * diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 8ef67b61d54a..9aef10c8e16d 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2703,6 +2703,28 @@ TRACE_EVENT(ext4_error, __entry->function, __entry->line) ); +TRACE_EVENT(ext4_journal_fc_replay_scan, + TP_PROTO(struct super_block *sb, int error, int off), + + TP_ARGS(sb, error, off), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, error) + __field(int, off) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->error = error; + __entry->off = off; + ), + + TP_printk("FC scan pass on dev %d,%d: error %d, off %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->error, __entry->off) +); + TRACE_EVENT(ext4_journal_fc_commit_cb_start, TP_PROTO(struct super_block *sb), diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h index 2310b259329f..af78bacdae83 100644 --- a/include/trace/events/jbd2.h +++ b/include/trace/events/jbd2.h @@ -233,9 +233,9 @@ TRACE_EVENT(jbd2_handle_stats, TRACE_EVENT(jbd2_run_stats, TP_PROTO(dev_t dev, unsigned long tid, - struct transaction_run_stats_s *stats), + struct transaction_run_stats_s *stats, bool fc), - TP_ARGS(dev, tid, 
stats), + TP_ARGS(dev, tid, stats, fc), TP_STRUCT__entry( __field( dev_t, dev ) @@ -249,6 +249,7 @@ TRACE_EVENT(jbd2_run_stats, __field( __u32, handle_count ) __field( __u32, blocks ) __field( __u32, blocks_logged ) + __field( bool, fc ) ), TP_fast_assign( @@ -263,11 +264,13 @@ TRACE_EVENT(jbd2_run_stats, __entry->handle_count = stats->rs_handle_count; __entry->blocks = stats->rs_blocks; __entry->blocks_logged = stats->rs_blocks_logged; + __entry->fc = fc; ), - TP_printk("dev %d,%d tid %lu wait %u request_delay %u running %u " + TP_printk("%s commit, dev %d,%d tid %lu wait %u request_delay %u running %u " "locked %u flushing %u logging %u handle_count %u " "blocks %u blocks_logged %u", + __entry->fc ? "fast" : "full", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid, jiffies_to_msecs(__entry->wait), jiffies_to_msecs(__entry->request_delay), From patchwork Fri Aug 9 03:45:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1144315 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="OfnPzcWv"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 464WPs0B2Rz9sPY for ; Fri, 9 Aug 2019 13:46:41 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2405250AbfHIDqk (ORCPT ); Thu, 8 Aug 2019 23:46:40 -0400 Received: from mail-pf1-f196.google.com ([209.85.210.196]:42295 "EHLO mail-pf1-f196.google.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405241AbfHIDqi (ORCPT ); Thu, 8 Aug 2019 23:46:38 -0400 Received: by mail-pf1-f196.google.com with SMTP id q10so45258657pff.9 for ; Thu, 08 Aug 2019 20:46:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=dJqXJqAeOKVKiXAlejo9+7GDfeBArCeFG4cEOTw+pTU=; b=OfnPzcWv8+39mwnRXgtCsKtvTZ6avLBKvVXO28HFn4texfTnOD8IvM17wJfbXTlorl 8VF0EfJuehNiYCQK3R8cQgeyj9/YhtfI5RLob0HPUKTdWxniDYpgtshCkliwzYXmO70c nmxPolfKFmLTZiFI+lfQuSI2cl8C5syF/1ZG2n/xiSS1ARnHGgEdieumVBFrd5sqgM/J 7eLnwrFCQ6nkLLU5s+N1ZEVmltuZ2fxAfhiM0kilwa3HoY2TCCJTaaBXBigDKp4tDwOy zB+B8mgOYwa4QLoA+F/7mtzSDGhaga15OH21AoJNNaM2YPR0IEyqK+N6mvm4pzfZtcYj GIsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=dJqXJqAeOKVKiXAlejo9+7GDfeBArCeFG4cEOTw+pTU=; b=QvduoshRAZ9zkS5FZc7uZcMKGenZqUp+8bZ7G2E3oSWyKEUV7W5qSEEGkDmRVgdhiG e1LZf13V39uCBu0AL6H8fzEYoryq/Bmaq+za0H3DDNPr32vkKt53bl90+SW7vEx7IbM6 lRYYH5DPwpM+KC2MvkgMxlcN136OE8/tdyGd1ufVdEpS4MvyeP8m+oZ+dRWyBxQKeAxp oTcfq+uiday6C2emYiOXbHnBHktO3MEXhKJAYemA6PIk5hkjH2ovSxGhdZ/2LXbHtjKH i6fUwlzeYvpmSKUzgS34/mFycpxw2h5q3f6sNhOSSyERAZZNyDywAb6yWhdwP7ydiBmO RDsA== X-Gm-Message-State: APjAAAVj8UtPV2fR7WVed5xbGkv6bl+2aeJ465VVZjocqYWh8QhGUXkL pemedcU/R22PZybrjZzgu23QX0ry X-Google-Smtp-Source: APXvYqzutHbG91AccfanqnTTnwpNMJDEGNrhhlOfeS/n1UQzyesI7Ltph5pveu0eb9PosYiof+oL5Q== X-Received: by 2002:a63:d301:: with SMTP id b1mr15422849pgg.379.1565322396609; Thu, 08 Aug 2019 20:46:36 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id s5sm80191085pfm.97.2019.08.08.20.46.36 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 08 Aug 2019 20:46:36 
-0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v2 12/12] docs: Add fast commit documentation Date: Thu, 8 Aug 2019 20:45:52 -0700 Message-Id: <20190809034552.148629-13-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog In-Reply-To: <20190809034552.148629-1-harshadshirwadkar@gmail.com> References: <20190809034552.148629-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org This patch adds necessary documentation to Documentation/filesystems/journalling.rst and Documentation/filesystems/ext4/journal.rst. Signed-off-by: Harshad Shirwadkar --- Documentation/filesystems/ext4/journal.rst | 96 ++++++++++++++++++++-- Documentation/filesystems/journalling.rst | 15 ++++ 2 files changed, 105 insertions(+), 6 deletions(-) diff --git a/Documentation/filesystems/ext4/journal.rst b/Documentation/filesystems/ext4/journal.rst index ea613ee701f5..d6e4a698e208 100644 --- a/Documentation/filesystems/ext4/journal.rst +++ b/Documentation/filesystems/ext4/journal.rst @@ -29,10 +29,14 @@ safest. If ``data=writeback``, dirty data blocks are not flushed to the disk before the metadata are written to disk through the journal. The journal inode is typically inode 8. The first 68 bytes of the -journal inode are replicated in the ext4 superblock. The journal itself -is normal (but hidden) file within the filesystem. The file usually -consumes an entire block group, though mke2fs tries to put it in the -middle of the disk. +journal inode are replicated in the ext4 superblock. The journal +itself is normal (but hidden) file within the filesystem. The file +usually consumes an entire block group, though mke2fs tries to put it +in the middle of the disk. Last 128 blocks in the journal are reserved +for fast commits. Fast commits store metadata changes to inodes in an +incremental fashion. 
A fast commit is valid only if there is no full +commit after that particular fast commit. That makes fast commit space +reusable after every full commit. All fields in jbd2 are written to disk in big-endian order. This is the opposite of ext4. @@ -48,16 +52,18 @@ Layout Generally speaking, the journal has this format: .. list-table:: - :widths: 16 48 16 + :widths: 16 48 16 18 :header-rows: 1 * - Superblock - descriptor\_block (data\_blocks or revocation\_block) [more data or revocations] commmit\_block - [more transactions...] + - [Fast commits...] * - - One transaction - + - Notice that a transaction begins with either a descriptor and some data, or a block revocation list. A finished transaction always ends with a @@ -76,7 +82,7 @@ The journal superblock will be in the next full block after the superblock. .. list-table:: - :widths: 12 12 12 32 12 + :widths: 12 12 12 32 12 12 :header-rows: 1 * - 1024 bytes of padding @@ -85,11 +91,13 @@ superblock. - descriptor\_block (data\_blocks or revocation\_block) [more data or revocations] commmit\_block - [more transactions...] + - [Fast commits...] * - - - - One transaction - + - Block Header ~~~~~~~~~~~~ @@ -609,3 +617,79 @@ bytes long (but uses a full block): - h\_commit\_nsec - Nanoseconds component of the above timestamp. +Fast Commit Block +~~~~~~~~~~~~~~~~~ + +The fast commit block indicates an append to the last commit block +that was written to the journal. One fast commit block records updates +to one inode. So, typically you would find as many fast commit blocks +as the number of inodes that got changed since the last commit. A fast +commit block is valid only if there is no commit block present with +transaction ID greater than that of the fast commit block. If such a +block is present, then there is no need to replay the fast commit +block. + +Multiple fast commit blocks are a part of one sub-transaction.
To +indicate the last block in a fast commit sub-transaction, the fc_flags field +in the last block of every sub-transaction is marked with the "LAST" (0x1) +flag. A sub-transaction is valid only if all the following conditions +are met: + +1) SUBTID of all blocks is either equal to or greater than SUBTID of + the previous fast commit block. +2) For every sub-transaction, last block is marked with LAST flag. +3) There are no invalid blocks in between. + +.. list-table:: + :widths: 8 8 24 40 + :header-rows: 1 + + * - Offset + - Type + - Name + - Descriptor + * - 0x0 + - journal\_header\_s + - (open coded) + - Common block header. + * - 0xC + - \_\_le32 + - fc\_magic + - Magic value which should be set to 0xE2540090. This identifies + that this block is a fast commit block. + * - 0x10 + - \_\_le32 + - fc\_subtid + - Sub-transaction ID for this commit block + * - 0x14 + - \_\_u8 + - fc\_features + - Features used by this fast commit block. + * - 0x15 + - \_\_u8 + - fc\_flags + - Flags. 0x1 (LAST): this is the last block in the sub-transaction. + * - 0x16 + - \_\_le16 + - fc\_num\_tlvs + - Number of TLVs contained in this fast commit block + * - 0x18 + - \_\_le32 + - \_\_fc\_len + - Length of the fast commit block in terms of number of blocks + * - 0x2c + - \_\_le32 + - fc\_ino + - Inode number of the inode that will be recovered using this fast commit + * - 0x30 + - struct ext4\_inode + - inode + - On-disk copy of the inode at the commit time + * - 0x34 + - struct ext4\_fc\_tl + - Array of struct ext4\_fc\_tl + - The actual delta with the last commit. Starting at this offset, + there is an array of TLVs that indicates which extents + should be present in the corresponding inode. Currently, the + only tag that is supported is EXT4\_FC\_TAG\_EXT. That tag + indicates that the corresponding value is an extent.
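The sub-transaction validity rules documented above can be sketched as a small user-space state machine. This is only an illustration, not code from the patch: struct fc_block_info, enum fc_scan_result and fc_scan() are hypothetical names, and the per-block "is_valid" flag stands in for the magic, offset and checksum checks. The sketch follows the stricter ordering that ext4_journal_fc_replay_scan() appears to apply: SUBTID must stay equal inside a sub-transaction and must grow strictly after a LAST block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes; the patch itself uses negative errno values. */
enum fc_scan_result {
	FC_SCAN_OK = 0,      /* every sub-transaction is complete and ordered */
	FC_SCAN_BAD_BLOCK,   /* rule 3: an invalid block was found */
	FC_SCAN_BAD_ORDER,   /* rule 1: SUBTID ordering was violated */
	FC_SCAN_INCOMPLETE,  /* rule 2: trailing sub-transaction has no LAST block */
};

/* Hypothetical, simplified summary of one fast commit block header. */
struct fc_block_info {
	uint32_t subtid;   /* value of fc_subtid */
	bool     is_last;  /* LAST (0x1) bit set in fc_flags */
	bool     is_valid; /* magic/checksum checks assumed done elsewhere */
};

static enum fc_scan_result fc_scan(const struct fc_block_info *blk, size_t n)
{
	bool in_subtxn = false; /* true while waiting for a LAST block */
	bool have_prev = false; /* true once any sub-transaction was seen */
	uint32_t cur = 0;       /* SUBTID of the current/previous sub-transaction */

	for (size_t i = 0; i < n; i++) {
		if (!blk[i].is_valid)
			return FC_SCAN_BAD_BLOCK;

		if (in_subtxn) {
			/* Inside a sub-transaction: SUBTID must not change. */
			if (blk[i].subtid != cur)
				return FC_SCAN_BAD_ORDER;
		} else {
			/* New sub-transaction: SUBTID must be strictly bigger. */
			if (have_prev && blk[i].subtid <= cur)
				return FC_SCAN_BAD_ORDER;
			cur = blk[i].subtid;
			have_prev = true;
			in_subtxn = true;
		}

		if (blk[i].is_last)
			in_subtxn = false;
	}

	return in_subtxn ? FC_SCAN_INCOMPLETE : FC_SCAN_OK;
}
```

A caller would run such a scan over all fast commit blocks first and replay them only when the result is FC_SCAN_OK, which matches the two-pass (PASS_SCAN, then PASS_REPLAY) structure used in this series.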
diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst index 58ce6b395206..2e0d550b546c 100644 --- a/Documentation/filesystems/journalling.rst +++ b/Documentation/filesystems/journalling.rst @@ -115,6 +115,21 @@ called after each transaction commit. You can also use ``transaction->t_private_list`` for attaching entries to a transaction that need processing when the transaction commits. +JBD2 also allows client file systems to implement file system specific +commits, which are called ``fast commits``. File systems that wish +to use this feature should first set +``journal->j_fc_commit_callback``. That function is called before +performing a commit. The file system can call :c:func:`jbd2_map_fc_buf()` +to get buffers reserved for fast commits. If the callback returns 0, +JBD2 assumes that the file system performed a fast commit and backs off +from performing a full commit. Otherwise, JBD2 falls back to a normal full +commit. After performing either a fast or a full commit, JBD2 calls +``journal->j_fc_cleanup_callback`` to allow file systems to clean up +their internal fast commit related data structures. At replay +time, JBD2 passes every fast commit block to the file system +via ``journal->j_fc_replay_callback``. Ext4 uses this fast +commit mechanism to improve journal commit performance. + JBD2 also provides a way to block all transaction updates via :c:func:`jbd2_journal_lock_updates()` / :c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a