From patchwork Tue Oct 1 07:40:50 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1169760
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar, Andreas Dilger, Theodore Ts'o
Subject: [PATCH v3 01/13] ext4: add handling for extended mount options
Date: Tue, 1 Oct 2019 00:40:50 -0700
Message-Id: <20191001074101.256523-2-harshadshirwadkar@gmail.com>
In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com>
References: <20191001074101.256523-1-harshadshirwadkar@gmail.com>

We are running out of mount option bits. This patch adds handling for
using s_mount_opt2 and also adds the ability to turn the fast commit
feature on and off. In order to use fast commits, a new version of
e2fsprogs needs to set the fast commit feature flag.
This also makes sure that we have a fast commit compatible e2fsprogs
before starting to use the feature. The mount option "no_fastcommit",
introduced in this patch, can be passed to disable the feature at mount
time.

Signed-off-by: Harshad Shirwadkar
Reviewed-by: Andreas Dilger
Reviewed-by: Theodore Ts'o
---
 fs/ext4/ext4.h | 4 ++++
 fs/ext4/super.c | 27 ++++++++++++++++++++++-----
 include/linux/jbd2.h | 5 ++++-
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index bf660aa7a9e0..becbda38b7db 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1146,6 +1146,8 @@ struct ext4_inode_info {
 #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM 0x00000008 /* User explicitly
 						specified journal checksum */
 
+#define EXT4_MOUNT2_JOURNAL_FAST_COMMIT 0x00000010 /* Journal fast commit */
+
 #define clear_opt(sb, opt) EXT4_SB(sb)->s_mount_opt &= \
 						~EXT4_MOUNT_##opt
 #define set_opt(sb, opt) EXT4_SB(sb)->s_mount_opt |= \
@@ -1643,6 +1645,7 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_COMPAT_RESIZE_INODE 0x0010
 #define EXT4_FEATURE_COMPAT_DIR_INDEX 0x0020
 #define EXT4_FEATURE_COMPAT_SPARSE_SUPER2 0x0200
+#define EXT4_FEATURE_COMPAT_FAST_COMMIT 0x0400
 
 #define EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER 0x0001
 #define EXT4_FEATURE_RO_COMPAT_LARGE_FILE 0x0002
@@ -1743,6 +1746,7 @@ EXT4_FEATURE_COMPAT_FUNCS(xattr, EXT_ATTR)
 EXT4_FEATURE_COMPAT_FUNCS(resize_inode, RESIZE_INODE)
 EXT4_FEATURE_COMPAT_FUNCS(dir_index, DIR_INDEX)
 EXT4_FEATURE_COMPAT_FUNCS(sparse_super2, SPARSE_SUPER2)
+EXT4_FEATURE_COMPAT_FUNCS(fast_commit, FAST_COMMIT)
 
 EXT4_FEATURE_RO_COMPAT_FUNCS(sparse_super, SPARSE_SUPER)
 EXT4_FEATURE_RO_COMPAT_FUNCS(large_file, LARGE_FILE)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4079605d437a..e376ac040cce 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1455,6 +1455,7 @@ enum {
 	Opt_dioread_nolock, Opt_dioread_lock,
 	Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
 	Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
+	Opt_no_fastcommit
 };
 
 static const match_table_t tokens = {
@@ -1537,6 +1538,7 @@ static const match_table_t tokens = {
 	{Opt_init_itable, "init_itable=%u"},
 	{Opt_init_itable, "init_itable"},
 	{Opt_noinit_itable, "noinit_itable"},
+	{Opt_no_fastcommit, "no_fastcommit"},
 	{Opt_max_dir_size_kb, "max_dir_size_kb=%u"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption"},
 	{Opt_nombcache, "nombcache"},
@@ -1659,6 +1661,7 @@ static int clear_qf_name(struct super_block *sb, int qtype)
 #define MOPT_NO_EXT3 0x0200
 #define MOPT_EXT4_ONLY (MOPT_NO_EXT2 | MOPT_NO_EXT3)
 #define MOPT_STRING 0x0400
+#define MOPT_2 0x0800
 
 static const struct mount_opts {
 	int token;
@@ -1751,6 +1754,8 @@ static const struct mount_opts {
 	{Opt_max_dir_size_kb, 0, MOPT_GTE0},
 	{Opt_test_dummy_encryption, 0, MOPT_GTE0},
 	{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
+	{Opt_no_fastcommit, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
+	 MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY},
 	{Opt_err, 0, 0}
 };
 
@@ -1858,8 +1863,9 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 			set_opt2(sb, EXPLICIT_DELALLOC);
 		} else if (m->mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) {
 			set_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM);
-		} else
+		} else if (m->mount_opt) {
 			return -1;
+		}
 	}
 	if (m->flags & MOPT_CLEAR_ERR)
 		clear_opt(sb, ERRORS_MASK);
@@ -2027,10 +2033,17 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 			WARN_ON(1);
 			return -1;
 		}
-		if (arg != 0)
-			sbi->s_mount_opt |= m->mount_opt;
-		else
-			sbi->s_mount_opt &= ~m->mount_opt;
+		if (m->flags & MOPT_2) {
+			if (arg != 0)
+				sbi->s_mount_opt2 |= m->mount_opt;
+			else
+				sbi->s_mount_opt2 &= ~m->mount_opt;
+		} else {
+			if (arg != 0)
+				sbi->s_mount_opt |= m->mount_opt;
+			else
+				sbi->s_mount_opt &= ~m->mount_opt;
+		}
 	}
 	return 1;
 }
@@ -3733,6 +3746,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 #ifdef CONFIG_EXT4_FS_POSIX_ACL
 	set_opt(sb, POSIX_ACL);
 #endif
+	if (ext4_has_feature_fast_commit(sb))
+		set_opt2(sb, JOURNAL_FAST_COMMIT);
+
 	/* don't forget to enable journal_csum when metadata_csum is enabled. */
 	if (ext4_has_metadata_csum(sb))
 		set_opt(sb, JOURNAL_CHECKSUM);
@@ -4334,6 +4350,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM;
 		clear_opt(sb, JOURNAL_CHECKSUM);
 		clear_opt(sb, DATA_FLAGS);
+		clear_opt2(sb, JOURNAL_FAST_COMMIT);
 		sbi->s_journal = NULL;
 		needs_recovery = 0;
 		goto no_journal;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index df03825ad1a1..b7eed49b8ecd 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -288,6 +288,7 @@ typedef struct journal_superblock_s
 #define JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT 0x00000004
 #define JBD2_FEATURE_INCOMPAT_CSUM_V2 0x00000008
 #define JBD2_FEATURE_INCOMPAT_CSUM_V3 0x00000010
+#define JBD2_FEATURE_INCOMPAT_FAST_COMMIT 0x00000020
 
 /* See "journal feature predicate functions" below */
 
@@ -298,7 +299,8 @@ typedef struct journal_superblock_s
 					JBD2_FEATURE_INCOMPAT_64BIT | \
 					JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT | \
 					JBD2_FEATURE_INCOMPAT_CSUM_V2 | \
-					JBD2_FEATURE_INCOMPAT_CSUM_V3)
+					JBD2_FEATURE_INCOMPAT_CSUM_V3 | \
+					JBD2_FEATURE_INCOMPAT_FAST_COMMIT)
 
 #ifdef __KERNEL__
 
@@ -1235,6 +1237,7 @@ JBD2_FEATURE_INCOMPAT_FUNCS(64bit, 64BIT)
 JBD2_FEATURE_INCOMPAT_FUNCS(async_commit, ASYNC_COMMIT)
 JBD2_FEATURE_INCOMPAT_FUNCS(csum2, CSUM_V2)
 JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3)
+JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT)
 
 /*
  * Journal flag definitions

From patchwork Tue Oct 1 07:40:51 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1169761
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v3 02/13] jbd2: fast commit setup and enable
Date: Tue, 1 Oct 2019 00:40:51 -0700
Message-Id: <20191001074101.256523-3-harshadshirwadkar@gmail.com>
In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com>
References: <20191001074101.256523-1-harshadshirwadkar@gmail.com>

This patch allows file systems to turn fast commits on and thereby
restrict the normal journalling space to the total journal blocks minus
JBD2_FAST_COMMIT_BLOCKS. Fast commits are not actually performed yet;
this patch only opens the interface for turning them on.
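
As a rough illustration of the space restriction described above, the
following sketch reproduces the block arithmetic in isolation. It is not
code from this series: struct fc_layout and fc_split() are hypothetical
names, and the guard assert is an added assumption. Only
JBD2_FAST_COMMIT_BLOCKS and its value of 128 are taken from the patch.

```c
#include <assert.h>

/* Value taken from the patch; everything else here is illustrative. */
#define JBD2_FAST_COMMIT_BLOCKS 128

/* Hypothetical stand-in for the few journal fields involved. */
struct fc_layout {
	unsigned long first;    /* first block usable by full commits */
	unsigned long last;     /* end of the area used by full commits */
	unsigned long first_fc; /* first block of the fast commit area */
	unsigned long last_fc;  /* end of the fast commit area */
	unsigned long free;     /* blocks left for full commits */
};

/*
 * Split a journal spanning [first, last] the way the patch describes:
 * with fast commits on, the trailing JBD2_FAST_COMMIT_BLOCKS blocks are
 * set aside and full commits only see what remains.
 */
static struct fc_layout fc_split(unsigned long first, unsigned long last,
				 int fast_commit)
{
	struct fc_layout l = { .first = first, .last = last };

	if (fast_commit) {
		/* Assumed sanity check, not present in the patch. */
		assert(last - first > JBD2_FAST_COMMIT_BLOCKS);
		l.last_fc = last;
		l.last = last - JBD2_FAST_COMMIT_BLOCKS;
		l.first_fc = l.last + 1;
	}
	l.free = l.last - l.first;
	return l;
}
```

With first = 1 and last = 1024, for example, full commits would be
confined to blocks up to 896 and the fast commit area would start at
block 897.
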
Signed-off-by: Harshad Shirwadkar
---
 fs/ext4/super.c | 5 +++-
 fs/jbd2/journal.c | 68 +++++++++++++++++++++++++++++++++-----------
 include/linux/jbd2.h | 39 +++++++++++++++++++++++++
 3 files changed, 95 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e376ac040cce..7725eb2105f4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4933,7 +4933,10 @@ static int ext4_load_journal(struct super_block *sb,
 	if (save)
 		memcpy(save, ((char *) es) +
 		       EXT4_S_ERR_START, EXT4_S_ERR_LEN);
-	err = jbd2_journal_load(journal);
+	if (test_opt2(sb, JOURNAL_FAST_COMMIT))
+		err = jbd2_journal_load_with_fc(journal);
+	else
+		err = jbd2_journal_load(journal);
 	if (save)
 		memcpy(((char *) es) + EXT4_S_ERR_START,
 		       save, EXT4_S_ERR_LEN);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 953990eb70a9..7c13834873ad 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1159,12 +1159,15 @@ static journal_t *journal_init_common(struct block_device *bdev,
 	journal->j_blk_offset = start;
 	journal->j_maxlen = len;
 	n = journal->j_blocksize / sizeof(journal_block_tag_t);
-	journal->j_wbufsize = n;
+	journal->j_wbufsize = n - JBD2_FAST_COMMIT_BLOCKS;
 	journal->j_wbuf = kmalloc_array(n, sizeof(struct buffer_head *),
 					GFP_KERNEL);
 	if (!journal->j_wbuf)
 		goto err_cleanup;
 
+	journal->j_fc_wbuf = &journal->j_wbuf[journal->j_wbufsize];
+	journal->j_fc_wbufsize = JBD2_FAST_COMMIT_BLOCKS;
+
 	bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize);
 	if (!bh) {
 		pr_err("%s: Cannot get buffer for journal superblock\n",
@@ -1297,11 +1300,19 @@ static int journal_reset(journal_t *journal)
 	}
 
 	journal->j_first = first;
-	journal->j_last = last;
 
-	journal->j_head = first;
-	journal->j_tail = first;
-	journal->j_free = last - first;
+	if (jbd2_has_feature_fast_commit(journal)) {
+		journal->j_last_fc = last;
+		journal->j_last = last - JBD2_FAST_COMMIT_BLOCKS;
+		journal->j_first_fc = journal->j_last + 1;
+		journal->j_fc_off = 0;
+	} else {
+		journal->j_last = last;
+	}
+
+	journal->j_head = journal->j_first;
+	journal->j_tail = journal->j_first;
+	journal->j_free = journal->j_last - journal->j_first;
 
 	journal->j_tail_sequence = journal->j_transaction_sequence;
 	journal->j_commit_sequence = journal->j_transaction_sequence - 1;
@@ -1626,22 +1637,21 @@ static int load_superblock(journal_t *journal)
 	journal->j_tail_sequence = be32_to_cpu(sb->s_sequence);
 	journal->j_tail = be32_to_cpu(sb->s_start);
 	journal->j_first = be32_to_cpu(sb->s_first);
-	journal->j_last = be32_to_cpu(sb->s_maxlen);
 	journal->j_errno = be32_to_cpu(sb->s_errno);
 
+	if (jbd2_has_feature_fast_commit(journal)) {
+		journal->j_last_fc = be32_to_cpu(sb->s_maxlen);
+		journal->j_last = journal->j_last_fc - JBD2_FAST_COMMIT_BLOCKS;
+		journal->j_first_fc = journal->j_last + 1;
+		journal->j_fc_off = 0;
+	} else {
+		journal->j_last = be32_to_cpu(sb->s_maxlen);
+	}
+
 	return 0;
 }
 
-
-/**
- * int jbd2_journal_load() - Read journal from disk.
- * @journal: Journal to act on.
- *
- * Given a journal_t structure which tells us which disk blocks contain
- * a journal, read the journal from disk to initialise the in-memory
- * structures.
- */
-int jbd2_journal_load(journal_t *journal)
+static int __jbd2_journal_load(journal_t *journal, bool enable_fc)
 {
 	int err;
 	journal_superblock_t *sb;
@@ -1684,6 +1694,12 @@ int jbd2_journal_load(journal_t *journal)
 		return -EFSCORRUPTED;
 	}
 
+	if (enable_fc)
+		jbd2_journal_set_features(journal, 0, 0,
+					  JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
+	else
+		jbd2_journal_clear_features(journal, 0, 0,
+					    JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
 	/* OK, we've finished with the dynamic journal bits:
 	 * reinitialise the dynamic contents of the superblock in memory
 	 * and reset them on disk. */
@@ -1699,6 +1715,26 @@ int jbd2_journal_load(journal_t *journal)
 	return -EIO;
 }
 
+/**
+ * int jbd2_journal_load() - Read journal from disk.
+ * @journal: Journal to act on.
+ *
+ * Given a journal_t structure which tells us which disk blocks contain
+ * a journal, read the journal from disk to initialise the in-memory
+ * structures.
+ */
+int jbd2_journal_load(journal_t *journal)
+{
+	return __jbd2_journal_load(journal, false);
+}
+
+/* Same as above but also enables fast commits. */
+int jbd2_journal_load_with_fc(journal_t *journal)
+{
+	return __jbd2_journal_load(journal, true);
+}
+
+
 /**
  * void jbd2_journal_destroy() - Release a journal_t structure.
  * @journal: Journal to act on.
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b7eed49b8ecd..84d04e1f3d92 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -67,6 +67,7 @@ extern void *jbd2_alloc(size_t size, gfp_t flags);
 extern void jbd2_free(void *ptr, size_t size);
 
 #define JBD2_MIN_JOURNAL_BLOCKS 1024
+#define JBD2_FAST_COMMIT_BLOCKS 128
 
 #ifdef __KERNEL__
 
@@ -918,6 +919,30 @@ struct journal_s
 	 */
 	unsigned long j_last;
 
+	/**
+	 * @j_first_fc:
+	 *
+	 * The block number of the first fast commit block in the journal
+	 * [j_state_lock].
+	 */
+	unsigned long j_first_fc;
+
+	/**
+	 * @j_fc_off:
+	 *
+	 * Number of fast commit blocks currently allocated.
+	 * [j_state_lock].
+	 */
+	unsigned long j_fc_off;
+
+	/**
+	 * @j_last_fc:
+	 *
+	 * The block number one beyond the last fast commit block in the journal
+	 * [j_state_lock].
+	 */
+	unsigned long j_last_fc;
+
 	/**
 	 * @j_dev: Device where we store the journal.
 	 */
@@ -1061,6 +1086,12 @@ struct journal_s
 	 */
 	struct buffer_head **j_wbuf;
 
+	/**
+	 * @j_fc_wbuf: Array of fast commit bhs for
+	 * jbd2_journal_commit_transaction.
+	 */
+	struct buffer_head **j_fc_wbuf;
+
 	/**
 	 * @j_wbufsize:
 	 *
@@ -1068,6 +1099,13 @@ struct journal_s
 	 */
 	int j_wbufsize;
 
+	/**
+	 * @j_fc_wbufsize:
+	 *
+	 * Size of @j_fc_wbuf array.
+	 */
+	int j_fc_wbufsize;
+
 	/**
 	 * @j_last_sync_writer:
 	 *
@@ -1398,6 +1436,7 @@ extern int jbd2_journal_set_features
 extern void jbd2_journal_clear_features
 		(journal_t *, unsigned long, unsigned long, unsigned long);
 extern int jbd2_journal_load (journal_t *journal);
+extern int jbd2_journal_load_with_fc(journal_t *journal);
 extern int jbd2_journal_destroy (journal_t *);
 extern int jbd2_journal_recover (journal_t *journal);
 extern int jbd2_journal_wipe (journal_t *, int);

From patchwork Tue Oct 1 07:40:52 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1169763
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v3 03/13] jbd2: fast-commit commit path changes
Date: Tue, 1 Oct 2019 00:40:52 -0700
Message-Id: <20191001074101.256523-4-harshadshirwadkar@gmail.com>
In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com>
References: <20191001074101.256523-1-harshadshirwadkar@gmail.com>

This patch adds the core fast-commit commit path changes. It also
modifies existing JBD2 APIs to allow usage of fast commits. If fast
commits are enabled and journal->j_do_full_commit is not set, the commit
routine tries the file system specific fast commit first. It falls back
to the full commit only if the fast commit fails. The commit start and
wait routines have their own variants that support fast commits. This
patch also adds a new entry to journal->j_stats which counts the number
of fast commits performed.

Signed-off-by: Harshad Shirwadkar
---
 fs/jbd2/commit.c | 55 ++++++++++++++++++++--
 fs/jbd2/journal.c | 94 ++++++++++++++++++++++++++++++++-----
 fs/jbd2/transaction.c | 1 +
 include/linux/jbd2.h | 42 ++++++++++++++++-
 include/trace/events/jbd2.h | 9 ++--
 5 files changed, 182 insertions(+), 19 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 132fb92098c7..7db3e2b6336d 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -351,8 +351,12 @@ static void jbd2_block_tag_csum_set(journal_t *j, journal_block_tag_t *tag,
  *
  * The primary function for committing a transaction to the log. This
  * function is called by the journal thread to begin a complete commit.
+ *
+ * fc is input / output parameter. If fc is non-null and is set to true, this
+ * function tries to perform fast commit. If the fast commit is successfully
+ * performed, *fc is set to true.
  */
-void jbd2_journal_commit_transaction(journal_t *journal)
+void jbd2_journal_commit_transaction(journal_t *journal, bool *fc)
 {
 	struct transaction_stats_s stats;
 	transaction_t *commit_transaction;
@@ -380,6 +384,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	tid_t first_tid;
 	int update_tail;
 	int csum_size = 0;
+	bool full_commit;
 	LIST_HEAD(io_bufs);
 	LIST_HEAD(log_bufs);
 
@@ -413,6 +418,44 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	J_ASSERT(journal->j_running_transaction != NULL);
 	J_ASSERT(journal->j_committing_transaction == NULL);
 
+	write_lock(&journal->j_state_lock);
+	full_commit = journal->j_do_full_commit;
+	write_unlock(&journal->j_state_lock);
+
+	/* Let file-system try its own fast commit */
+	if (jbd2_has_feature_fast_commit(journal)) {
+		if (!full_commit && fc && *fc == true &&
+			journal->j_fc_commit_callback &&
+			!journal->j_fc_commit_callback(
+				journal, journal->j_running_transaction->t_tid,
+				journal->j_running_transaction->t_subtid, &stats.run)) {
+			jbd_debug(3, "fast commit success.\n");
+			if (journal->j_fc_cleanup_callback)
+				journal->j_fc_cleanup_callback(journal);
+			write_lock(&journal->j_state_lock);
+			journal->j_fc_sequence = journal->j_running_transaction
+				->t_subtid;
+			journal->j_running_transaction->t_subtid++;
+			if (fc)
+				*fc = true;
+			write_unlock(&journal->j_state_lock);
+			trace_jbd2_run_stats(journal->j_fs_dev->bd_dev,
+				journal->j_running_transaction
+				->t_tid,
+				&stats.run, true);
+			goto update_overall_stats;
+		}
+		if (journal->j_fc_cleanup_callback)
+			journal->j_fc_cleanup_callback(journal);
+		write_lock(&journal->j_state_lock);
+		journal->j_do_full_commit = false;
+		write_unlock(&journal->j_state_lock);
+	}
+
+	jbd_debug(3, "fast commit not performed, trying full.\n");
+	if (fc)
+		*fc = false;
+
 	commit_transaction = journal->j_running_transaction;
 
 	trace_jbd2_start_commit(journal, commit_transaction);
@@ -420,6 +463,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 			commit_transaction->t_tid);
 
 	write_lock(&journal->j_state_lock);
+	journal->j_fc_off = 0;
 	J_ASSERT(commit_transaction->t_state == T_RUNNING);
 	commit_transaction->t_state = T_LOCKED;
 
@@ -1085,12 +1129,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	stats.run.rs_handle_count =
 		atomic_read(&commit_transaction->t_handle_count);
 	trace_jbd2_run_stats(journal->j_fs_dev->bd_dev,
-			     commit_transaction->t_tid, &stats.run);
+			     commit_transaction->t_tid, &stats.run, false);
 	stats.ts_requested = (commit_transaction->t_requested) ? 1 : 0;
 
 	commit_transaction->t_state = T_COMMIT_CALLBACK;
 	J_ASSERT(commit_transaction == journal->j_committing_transaction);
 	journal->j_commit_sequence = commit_transaction->t_tid;
+	journal->j_fc_sequence = 0;
 	journal->j_committing_transaction = NULL;
 	commit_time = ktime_to_ns(ktime_sub(ktime_get(), start_time));
 
@@ -1129,8 +1174,12 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	/*
 	 * Calculate overall stats
 	 */
+update_overall_stats:
 	spin_lock(&journal->j_history_lock);
-	journal->j_stats.ts_tid++;
+	if (fc && *fc == true)
+		journal->j_stats.ts_num_fast_commits++;
+	else
+		journal->j_stats.ts_tid++;
 	journal->j_stats.ts_requested += stats.ts_requested;
 	journal->j_stats.run.rs_wait += stats.run.rs_wait;
 	journal->j_stats.run.rs_request_delay += stats.run.rs_request_delay;
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 7c13834873ad..6853064605ff 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -160,7 +160,13 @@ static void commit_timeout(struct timer_list *t)
  *
  * 1) COMMIT: Every so often we need to commit the current state of the
  * filesystem to disk. The journal thread is responsible for writing
- * all of the metadata buffers to disk.
+ * all of the metadata buffers to disk. If fast commits are allowed,
+ * journal thread passes the control to the file system and file system
+ * is then responsible for writing metadata buffers to disk (in whichever
+ * format it wants). If fast commit succeeds, journal thread won't perform
+ * a normal commit. In case the fast commit fails, journal thread performs
+ * full commit as normal.
+ *
  *
  * 2) CHECKPOINT: We cannot reuse a used section of the log file until all
  * of the data in that part of the log has been rewritten elsewhere on
@@ -172,6 +178,7 @@ static int kjournald2(void *arg)
 {
 	journal_t *journal = arg;
 	transaction_t *transaction;
+	bool fc_flag = true, fc_flag_save;
 
 	/*
 	 * Set up an interval timer which can be used to trigger a commit wakeup
@@ -209,9 +216,14 @@ static int kjournald2(void *arg)
 		jbd_debug(1, "OK, requests differ\n");
 		write_unlock(&journal->j_state_lock);
 		del_timer_sync(&journal->j_commit_timer);
-		jbd2_journal_commit_transaction(journal);
+		fc_flag_save = fc_flag;
+		jbd2_journal_commit_transaction(journal, &fc_flag);
 		write_lock(&journal->j_state_lock);
-		goto loop;
+		if (!fc_flag) {
+			/* fast commit not performed */
+			fc_flag = fc_flag_save;
+			goto loop;
+		}
 	}
 
 	wake_up(&journal->j_wait_done_commit);
@@ -235,16 +247,18 @@ static int kjournald2(void *arg)
 
 		prepare_to_wait(&journal->j_wait_commit, &wait,
 				TASK_INTERRUPTIBLE);
-		if (journal->j_commit_sequence != journal->j_commit_request)
+		if (!fc_flag &&
+			journal->j_commit_sequence != journal->j_commit_request)
 			should_sleep = 0;
 		transaction = journal->j_running_transaction;
 		if (transaction && time_after_eq(jiffies,
-						transaction->t_expires))
+					transaction->t_expires))
 			should_sleep = 0;
 		if (journal->j_flags & JBD2_UNMOUNT)
 			should_sleep = 0;
 		if (should_sleep) {
 			write_unlock(&journal->j_state_lock);
+			jbd_debug(1, "%s sleeps\n", __func__);
 			schedule();
 			write_lock(&journal->j_state_lock);
 		}
@@ -259,7 +273,10 @@ static int kjournald2(void *arg)
 	transaction = journal->j_running_transaction;
 	if (transaction && time_after_eq(jiffies, transaction->t_expires)) {
 		journal->j_commit_request = transaction->t_tid;
+		fc_flag = false;
 		jbd_debug(1, "woke because of timeout\n");
+	} else {
+		fc_flag = true;
 	}
 	goto loop;
 
@@ -522,11 +539,23 @@ int jbd2_log_start_commit(journal_t *journal, tid_t tid)
 	int ret;
 
 	write_lock(&journal->j_state_lock);
+	journal->j_do_full_commit = true;
 	ret = __jbd2_log_start_commit(journal, tid);
 	write_unlock(&journal->j_state_lock);
 	return ret;
 }
 
+int jbd2_log_start_commit_fast(journal_t *journal, tid_t tid)
+{
+	int ret;
+
+	write_lock(&journal->j_state_lock);
+	ret = __jbd2_log_start_commit(journal, tid);
+	write_unlock(&journal->j_state_lock);
+
+	return ret;
+}
+
 /*
  * Force and wait any uncommitted transactions. We can only force the running
  * transaction if we don't have an active handle, otherwise, we will deadlock.
@@ -603,11 +632,15 @@ int jbd2_journal_force_commit(journal_t *journal)
  * if a transaction is going to be committed (or is currently already
  * committing), and fills its tid in at *ptid
  */
-int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid)
+int __jbd2_journal_start_commit(journal_t *journal, tid_t *ptid,
+	bool full_commit)
 {
 	int ret = 0;
 
 	write_lock(&journal->j_state_lock);
+	if (!journal->j_do_full_commit)
+		journal->j_do_full_commit = full_commit;
+
 	if (journal->j_running_transaction) {
 		tid_t tid = journal->j_running_transaction->t_tid;
 
@@ -630,6 +663,16 @@ int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid)
 	return ret;
 }
 
+int jbd2_journal_start_commit_fast(journal_t *journal, tid_t *ptid)
+{
+	return __jbd2_journal_start_commit(journal, ptid, false);
+}
+
+int jbd2_journal_start_commit(journal_t *journal, tid_t *ptid)
+{
+	return __jbd2_journal_start_commit(journal, ptid, true);
+}
+
 /*
  * Return 1 if a given transaction has not yet sent barrier request
  * connected with a transaction commit. If 0 is returned, transaction
@@ -675,7 +718,7 @@ EXPORT_SYMBOL(jbd2_trans_will_send_data_barrier);
  * Wait for a specified commit to complete.
  * The caller may not hold the journal lock.
*/ -int jbd2_log_wait_commit(journal_t *journal, tid_t tid) +int __jbd2_log_wait_commit(journal_t *journal, tid_t tid, tid_t subtid) { int err = 0; @@ -702,12 +745,27 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) } #endif while (tid_gt(tid, journal->j_commit_sequence)) { - jbd_debug(1, "JBD2: want %u, j_commit_sequence=%u\n", - tid, journal->j_commit_sequence); + if ((!journal->j_do_full_commit) && + !tid_gt(subtid, journal->j_fc_sequence)) + break; + jbd_debug(1, "JBD2: want full commit %u %s %u, ", + tid, journal->j_do_full_commit ? + "and ignoring fast commit request for " : + "or want fast commit", + journal->j_fc_sequence); + jbd_debug(1, "j_commit_sequence=%u, j_fc_sequence=%u\n", + journal->j_commit_sequence, + journal->j_fc_sequence); read_unlock(&journal->j_state_lock); wake_up(&journal->j_wait_commit); - wait_event(journal->j_wait_done_commit, - !tid_gt(tid, journal->j_commit_sequence)); + if (journal->j_do_full_commit) + wait_event(journal->j_wait_done_commit, + !tid_gt(tid, journal->j_commit_sequence)); + else + wait_event(journal->j_wait_done_commit, + !tid_gt(tid, journal->j_commit_sequence) || + !tid_gt(subtid, + journal->j_fc_sequence)); read_lock(&journal->j_state_lock); } read_unlock(&journal->j_state_lock); @@ -717,6 +775,13 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) return err; } +int jbd2_log_wait_commit(journal_t *journal, tid_t tid) +{ + journal->j_do_full_commit = true; + return __jbd2_log_wait_commit(journal, tid, 0); +} + + /* Return 1 when transaction with given tid has already committed. 
*/ int jbd2_transaction_committed(journal_t *journal, tid_t tid) { @@ -996,6 +1061,8 @@ static int jbd2_seq_info_show(struct seq_file *seq, void *v) "each up to %u blocks\n", s->stats->ts_tid, s->stats->ts_requested, s->journal->j_max_transaction_buffers); + seq_printf(seq, "%lu fast commits performed\n", + s->stats->ts_num_fast_commits); if (s->stats->ts_tid == 0) return 0; seq_printf(seq, "average: \n %ums waiting for transaction\n", @@ -1020,6 +1087,9 @@ static int jbd2_seq_info_show(struct seq_file *seq, void *v) s->stats->run.rs_blocks / s->stats->ts_tid); seq_printf(seq, " %lu logged blocks per transaction\n", s->stats->run.rs_blocks_logged / s->stats->ts_tid); + seq_printf(seq, " %lu logged blocks per commit\n", + s->stats->run.rs_blocks_logged / + (s->stats->ts_tid + s->stats->ts_num_fast_commits)); return 0; } @@ -1752,7 +1822,7 @@ int jbd2_journal_destroy(journal_t *journal) /* Force a final log commit */ if (journal->j_running_transaction) - jbd2_journal_commit_transaction(journal); + jbd2_journal_commit_transaction(journal, NULL); /* Force any old transactions to disk */ diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 990e7b5062e7..ce7f03cfd90b 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -84,6 +84,7 @@ static void jbd2_get_transaction(journal_t *journal, transaction->t_state = T_RUNNING; transaction->t_start_time = ktime_get(); transaction->t_tid = journal->j_transaction_sequence++; + transaction->t_subtid = 1; transaction->t_expires = jiffies + journal->j_commit_interval; spin_lock_init(&transaction->t_handle_lock); atomic_set(&transaction->t_updates, 0); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 84d04e1f3d92..41315f648c0f 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -580,6 +580,9 @@ struct transaction_s /* Sequence number for this transaction [no locking] */ tid_t t_tid; + /* Sequence number of the current ongoing fast commit [no locking] */ + tid_t t_subtid; + /* * 
Transaction's current state * [no locking - only kjournald2 alters this] @@ -742,6 +745,7 @@ struct transaction_run_stats_s { struct transaction_stats_s { unsigned long ts_tid; + unsigned long ts_num_fast_commits; unsigned long ts_requested; struct transaction_run_stats_s run; }; @@ -943,6 +947,13 @@ struct journal_s */ unsigned long j_last_fc; + /* + * @j_do_full_commit: + * + * Force a full commit. If this flag is set JBD2 won't try fast commits + */ + bool j_do_full_commit; + /** * @j_dev: Device where we store the journal. */ @@ -1012,6 +1023,14 @@ struct journal_s */ tid_t j_transaction_sequence; + /** + * @j_fc_sequence: + * + * The sequence number of the most recently committed fast + * commit. [j_state_lock] + */ + tid_t j_fc_sequence; + /** * @j_commit_sequence: * @@ -1205,6 +1224,24 @@ struct journal_s */ struct lockdep_map j_trans_commit_map; #endif + /** + * @j_fc_commit_callback: + * + * File-system specific function that performs actual fast commit + * operation. Should return 0 if the fast commit was successful, in that + * case, JBD2 will just increment journal->j_subtid and move on. If it + * returns < 0, JBD2 will fall-back to full commit. + */ + int (*j_fc_commit_callback)(struct journal_s *journal, tid_t tid, + tid_t subtid, + struct transaction_run_stats_s *stats); + /** + * @j_fc_cleanup_callback: + * + * Clean-up after fast commit or full commit. JBD2 calls this function + * after every commit operation. 
+ */ + void (*j_fc_cleanup_callback)(struct journal_s *journal); }; #define jbd2_might_wait_for_commit(j) \ @@ -1323,7 +1360,8 @@ int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block); void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block); /* Commit management */ -extern void jbd2_journal_commit_transaction(journal_t *); +extern void jbd2_journal_commit_transaction(journal_t *journal, + bool *full_commit); /* Checkpoint list management */ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy); @@ -1532,8 +1570,10 @@ extern void jbd2_clear_buffer_revoked_flags(journal_t *journal); */ int jbd2_log_start_commit(journal_t *journal, tid_t tid); +int jbd2_log_start_commit_fast(journal_t *journal, tid_t tid); int __jbd2_log_start_commit(journal_t *journal, tid_t tid); int jbd2_journal_start_commit(journal_t *journal, tid_t *tid); +int jbd2_journal_start_commit_fast(journal_t *journal, tid_t *tid); int jbd2_log_wait_commit(journal_t *journal, tid_t tid); int jbd2_transaction_committed(journal_t *journal, tid_t tid); int jbd2_complete_transaction(journal_t *journal, tid_t tid); diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h index 2310b259329f..af78bacdae83 100644 --- a/include/trace/events/jbd2.h +++ b/include/trace/events/jbd2.h @@ -233,9 +233,9 @@ TRACE_EVENT(jbd2_handle_stats, TRACE_EVENT(jbd2_run_stats, TP_PROTO(dev_t dev, unsigned long tid, - struct transaction_run_stats_s *stats), + struct transaction_run_stats_s *stats, bool fc), - TP_ARGS(dev, tid, stats), + TP_ARGS(dev, tid, stats, fc), TP_STRUCT__entry( __field( dev_t, dev ) @@ -249,6 +249,7 @@ TRACE_EVENT(jbd2_run_stats, __field( __u32, handle_count ) __field( __u32, blocks ) __field( __u32, blocks_logged ) + __field( bool, fc ) ), TP_fast_assign( @@ -263,11 +264,13 @@ TRACE_EVENT(jbd2_run_stats, __entry->handle_count = stats->rs_handle_count; __entry->blocks = stats->rs_blocks; __entry->blocks_logged = 
stats->rs_blocks_logged; + __entry->fc = fc; ), - TP_printk("dev %d,%d tid %lu wait %u request_delay %u running %u " + TP_printk("%s commit, dev %d,%d tid %lu wait %u request_delay %u running %u " "locked %u flushing %u logging %u handle_count %u " "blocks %u blocks_logged %u", + __entry->fc ? "fast" : "full", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid, jiffies_to_msecs(__entry->wait), jiffies_to_msecs(__entry->request_delay), From patchwork Tue Oct 1 07:40:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1169764 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="J/v2WAn6"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46jB740NNzz9sQn for ; Tue, 1 Oct 2019 17:42:08 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733094AbfJAHmG (ORCPT ); Tue, 1 Oct 2019 03:42:06 -0400 Received: from mail-pg1-f194.google.com ([209.85.215.194]:34610 "EHLO mail-pg1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728249AbfJAHmF (ORCPT ); Tue, 1 Oct 2019 03:42:05 -0400 Received: by mail-pg1-f194.google.com with SMTP id y35so9029855pgl.1 for ; Tue, 01 Oct 2019 00:42:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; 
bh=bz+rdl11Fft4nus7aI+CndEpiy5sIXJpnFCsdgrzL78=; b=J/v2WAn6fHJPJUdPrARnwP+7PiV5Sqlfs7/H/B+irldozDbseaE633P/+vBaVAXWbl Aazj954Frn8lgxc7ynQALMxyd8sD5PukIIS9R0UVC4c4nDoORrEppMfeAwjV3cV21Q6O UHkOCiOYFdrCk5DxXYwUF3RtOlgwV1FZ3OHrPV6oG4NL6Q4yv1rkDtX0OEkr5rrv2+WC Wc3KoOPbiynlpWLr/ZNAyMxRyyMC4pk9B3err7qas8Bq3Go4SXNaxzrzQ0nPgUnpLD/P UydrVsnvHKkrEKiQhxrTBDLix2tHqhQjeALyrN47Ig1+phq2xtHkjrv/bgO1v4+/+pzS U2Kg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bz+rdl11Fft4nus7aI+CndEpiy5sIXJpnFCsdgrzL78=; b=sU9aEAnNlZUb3ty2AzBf3AqjQYZ4E070IkIJI3zA2mZrMSS8ALwXKDfdz660P/ptkn ueWUr790YH+yucB8470ROCoGHVeOLtlog3x94CqfVquKbpcz+vtJj35vxBdfSQLH2mAM kNiZFkvx1NfnGH6i6rgr/sv2+7xN+23IZQ91X4Uld3sGzsvUxTfbGA4xfLlbOBJM/l4O HiLP768dErNyI/bRA5xDUDQ12A6u5vI8vc7e1lvClhA0QfdatTCS4l7PHNbr6b+SWiJn IRLSJ7z5AKzZm8tgx0PMlRCX26EhtG2eZh7kUGpRTcs5QRtD0vqk2q+pgvB6/DpLAdwR t1og== X-Gm-Message-State: APjAAAWY5BI2uQLrrByGvcm6hRzwBaV+o8PoCguJO/d71/eaQKIaUJVo EG8b0pRzzkuLpkpH/CyUCCNdAVpwZXE= X-Google-Smtp-Source: APXvYqy2mEV7ILpD2faGuOcploLj4Nc9oKApAgmyhOly5atCRfn3Duh7GKFkeiCALwwvsLElVmZTLg== X-Received: by 2002:a62:be01:: with SMTP id l1mr26372219pff.236.1569915724262; Tue, 01 Oct 2019 00:42:04 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id q13sm2287668pjq.0.2019.10.01.00.42.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2019 00:42:03 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v3 04/13] jbd2: fast-commit commit path new APIs Date: Tue, 1 Oct 2019 00:40:53 -0700 Message-Id: <20191001074101.256523-5-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com> References: 
<20191001074101.256523-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org This patch adds new helper APIs that ext4 needs for fast commits. Subsequent patches in this series use them to implement fast commits. The following new APIs are added: /* * Returns when either a full commit or a fast commit * completes */ int jbd2_fc_complete_commit(journal_t *journal, tid_t tid, tid_t subtid) /* Send all the data buffers related to an inode */ int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode) /* Map one fast commit buffer for use by the file system */ int jbd2_map_fc_buf(journal_t *journal, struct buffer_head **bh_out) /* Wait on fast commit buffers to complete IO */ int jbd2_wait_on_fc_bufs(journal_t *journal, int num_blks) /* * Returns 1 if transaction identified by tid:subtid is already * committed. */ int jbd2_commit_check(journal_t *journal, tid_t tid, tid_t subtid) Signed-off-by: Harshad Shirwadkar --- fs/jbd2/commit.c | 32 +++++++++++++ fs/jbd2/journal.c | 110 +++++++++++++++++++++++++++++++++++++++++++ include/linux/jbd2.h | 8 ++++ 3 files changed, 150 insertions(+) diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 7db3e2b6336d..e85f51e1cc70 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -202,6 +202,38 @@ static int journal_submit_inode_data_buffers(struct address_space *mapping, return ret; } +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode) +{ + struct address_space *mapping; + loff_t dirty_start; + loff_t dirty_end; + int ret; + + if (!jinode) + return 0; + + if (!(jinode->i_flags & JI_WRITE_DATA)) + return 0; + + dirty_start = jinode->i_dirty_start; + dirty_end = jinode->i_dirty_end; + + mapping = jinode->i_vfs_inode->i_mapping; + jinode->i_flags |= JI_COMMIT_RUNNING; + + trace_jbd2_submit_inode_data(jinode->i_vfs_inode); + 
ret = journal_submit_inode_data_buffers(mapping, dirty_start, + dirty_end); + + jinode->i_flags &= ~JI_COMMIT_RUNNING; + /* Protect JI_COMMIT_RUNNING flag */ + smp_mb(); + wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING); + + return ret; +} +EXPORT_SYMBOL(jbd2_submit_inode_data); + /* * Submit all the data buffers of inode associated with the transaction to * disk. diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 6853064605ff..14d549445418 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -781,6 +781,18 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) return __jbd2_log_wait_commit(journal, tid, 0); } +int jbd2_commit_check(journal_t *journal, tid_t tid, tid_t subtid) +{ + if (journal->j_commit_sequence >= tid) + return 1; + if (!journal->j_running_transaction) + return 0; + if (journal->j_running_transaction->t_tid > tid) + return 1; + if (journal->j_running_transaction->t_subtid > subtid) + return 1; + return 0; +} /* Return 1 when transaction with given tid has already committed. 
*/ int jbd2_transaction_committed(journal_t *journal, tid_t tid) @@ -830,6 +842,33 @@ int jbd2_complete_transaction(journal_t *journal, tid_t tid) } EXPORT_SYMBOL(jbd2_complete_transaction); +int jbd2_fc_complete_commit(journal_t *journal, tid_t tid, tid_t subtid) +{ + int need_to_wait = 1; + + read_lock(&journal->j_state_lock); + if (journal->j_running_transaction && + journal->j_running_transaction->t_tid == tid) { + /* Check if fast commit was already done */ + if (tid_geq(journal->j_fc_sequence, subtid)) + need_to_wait = 0; + if (journal->j_commit_request != tid) { + /* transaction not yet started, so request it */ + read_unlock(&journal->j_state_lock); + jbd2_log_start_commit_fast(journal, tid); + goto wait_commit; + } + } else if (!(journal->j_committing_transaction && + journal->j_committing_transaction->t_tid == tid)) + need_to_wait = 0; + read_unlock(&journal->j_state_lock); + if (!need_to_wait) + return 0; +wait_commit: + return __jbd2_log_wait_commit(journal, tid, subtid); +} +EXPORT_SYMBOL(jbd2_fc_complete_commit); + /* * Log buffer allocation routines: */ @@ -850,6 +889,77 @@ int jbd2_journal_next_log_block(journal_t *journal, unsigned long long *retp) return jbd2_journal_bmap(journal, blocknr, retp); } +int jbd2_map_fc_buf(journal_t *journal, struct buffer_head **bh_out) +{ + unsigned long long pblock; + unsigned long blocknr; + int ret = 0; + struct buffer_head *bh; + int fc_off; + journal_header_t *jhdr; + + write_lock(&journal->j_state_lock); + + if (journal->j_fc_off + journal->j_first_fc < journal->j_last_fc) { + fc_off = journal->j_fc_off; + blocknr = journal->j_first_fc + fc_off; + journal->j_fc_off++; + } else { + ret = -EINVAL; + } + write_unlock(&journal->j_state_lock); + + if (ret) + return ret; + + ret = jbd2_journal_bmap(journal, blocknr, &pblock); + if (ret) + return ret; + + bh = __getblk(journal->j_dev, pblock, journal->j_blocksize); + if (!bh) + return -ENOMEM; + + lock_buffer(bh); + jhdr = (journal_header_t *)bh->b_data; + 
jhdr->h_magic = cpu_to_be32(JBD2_MAGIC_NUMBER); + jhdr->h_blocktype = cpu_to_be32(JBD2_FC_BLOCK); + jhdr->h_sequence = cpu_to_be32(journal->j_running_transaction->t_tid); + + set_buffer_uptodate(bh); + unlock_buffer(bh); + journal->j_fc_wbuf[fc_off] = bh; + + *bh_out = bh; + + return 0; +} +EXPORT_SYMBOL(jbd2_map_fc_buf); + +int jbd2_wait_on_fc_bufs(journal_t *journal, int num_blks) +{ + struct buffer_head *bh; + int i, j_fc_off; + + read_lock(&journal->j_state_lock); + j_fc_off = journal->j_fc_off; + read_unlock(&journal->j_state_lock); + + /* + * Wait in reverse order to minimize chances of us being woken up before + * all IOs have completed + */ + for (i = j_fc_off - 1; i >= j_fc_off - num_blks; i--) { + bh = journal->j_fc_wbuf[i]; + wait_on_buffer(bh); + if (unlikely(!buffer_uptodate(bh))) + return -EIO; + } + + return 0; +} +EXPORT_SYMBOL(jbd2_wait_on_fc_bufs); + /* * Conversion of logical to physical block numbers for the journal * diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 41315f648c0f..c6a2b82de4cf 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -124,6 +124,7 @@ typedef struct journal_s journal_t; /* Journal control structure */ #define JBD2_SUPERBLOCK_V1 3 #define JBD2_SUPERBLOCK_V2 4 #define JBD2_REVOKE_BLOCK 5 +#define JBD2_FC_BLOCK 6 /* * Standard header for all descriptor blocks: @@ -1579,6 +1580,7 @@ int jbd2_transaction_committed(journal_t *journal, tid_t tid); int jbd2_complete_transaction(journal_t *journal, tid_t tid); int jbd2_log_do_checkpoint(journal_t *journal); int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid); +int jbd2_fc_complete_commit(journal_t *journal, tid_t tid, tid_t subtid); void __jbd2_log_wait_for_space(journal_t *journal); extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *); @@ -1729,6 +1731,12 @@ static inline tid_t jbd2_get_latest_transaction(journal_t *journal) return tid; } +/* Fast commit related APIs */ +int jbd2_map_fc_buf(journal_t *journal, 
struct buffer_head **bh_out); +int jbd2_wait_on_fc_bufs(journal_t *journal, int num_blks); +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode); +int jbd2_commit_check(journal_t *journal, tid_t tid, tid_t subtid); + #ifdef __KERNEL__ #define buffer_trace_init(bh) do {} while (0) From patchwork Tue Oct 1 07:40:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1169766 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="BEIj0pSG"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46jB752Mh3z9sRD for ; Tue, 1 Oct 2019 17:42:09 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733118AbfJAHmH (ORCPT ); Tue, 1 Oct 2019 03:42:07 -0400 Received: from mail-pf1-f194.google.com ([209.85.210.194]:38939 "EHLO mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732982AbfJAHmG (ORCPT ); Tue, 1 Oct 2019 03:42:06 -0400 Received: by mail-pf1-f194.google.com with SMTP id v4so7301149pff.6 for ; Tue, 01 Oct 2019 00:42:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=IG/o22FPTmuzsi+5RtU+Zm5XgdntXDau0oWlT14o+EA=; b=BEIj0pSG5WVGIxXPxW3D5Kfd/8B0po/MdSLT5syvlQ+fHbe6gUbg7Y1xQ1RYbJ1FYI 
4Bufg81XQ75BjmxdwN/Lfa4Vv8uCKrJnL3jvkVPOmUly8ryJmrO87njFJPmvRGcs2s1+ nZ0Ixa92AxZ8SIVznaVfPWQIhXyE4aYdFK3Lk47bEHhBIzOh41srEi/bQNO6iVHqE6u1 stzvq0G/f4X56Hc9rnOvQ1lc2g9zu+DqsREk23gAFENZSEr6LuVn4mDnomo7aHYnNjrg mZ+jVNf7XCJThe41n34FpK9ovb2dbEuWS5CpFOBeP8iE4DEfZWBL04i6Df1pZgMfJGH/ 9CIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=IG/o22FPTmuzsi+5RtU+Zm5XgdntXDau0oWlT14o+EA=; b=JJABDNHvRvxGuSQ04t1Lr+xi7Pg9dRS9UDDJTAKa04ojTCFyolCWb3M9PYOeSCV6tx t/E2BU+RZ6BjOV5K9ehu/2Vp95LpGRV3fyNd1PglUjh5d4EQzTzXBM2d4R1Lgg/FwB3S FBC3mmqiwik1qtKIgRvc4yHovPXqXn0TZ9L4474oHwsBr2zGe4iU/xA1QaEsBelBtrBg qer60rTYVTJFb360V8pYHWxbgqRzwh+GDsMwCleM7PGIkY8i3ScuIvRSIFlJ9KBBnneb nkJZJdpANlg1bBqtjAGbIYGt+WNFFtVMfee7P/+6WlMxSO+EjWHQ56YL3mjS2YVYX9c4 WGFw== X-Gm-Message-State: APjAAAWA8OKO6j71+bSo5dvop5xlTFhsx1vtJP4uJJUurWK2tK8d+xJu Su37Bc4PeS/M88fP4Bd0CXad7IzihJ8= X-Google-Smtp-Source: APXvYqwMzFoq0u5anKYfPaMDmdfHDBcU76sc/ykGRBksNPmwA1dglWCT03GGeUfHn/LQI97H+JU57g== X-Received: by 2002:aa7:920d:: with SMTP id 13mr25970072pfo.17.1569915724883; Tue, 01 Oct 2019 00:42:04 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id q13sm2287668pjq.0.2019.10.01.00.42.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2019 00:42:04 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v3 05/13] jbd2: fast-commit recovery path changes Date: Tue, 1 Oct 2019 00:40:54 -0700 Message-Id: <20191001074101.256523-6-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com> References: <20191001074101.256523-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk 
List-ID: X-Mailing-List: linux-ext4@vger.kernel.org This patch adds fast-commit recovery path changes for JBD2. If we find a valid fast commit block during the recovery phase, we call a file-system-specific routine to handle that block. We also clear the fast commit flag in jbd2_mark_journal_empty(), which is called after successful recovery as well as after successful checkpointing. This keeps the JBD2 journal compatible with older versions when there are no fast commit blocks. Signed-off-by: Harshad Shirwadkar --- fs/jbd2/journal.c | 12 +++++++++ fs/jbd2/recovery.c | 63 +++++++++++++++++++++++++++++++++++++++++--- include/linux/jbd2.h | 13 +++++++++ 3 files changed, 84 insertions(+), 4 deletions(-) diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 14d549445418..e0684212384d 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1635,6 +1635,7 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid, static void jbd2_mark_journal_empty(journal_t *journal, int write_op) { journal_superblock_t *sb = journal->j_superblock; + bool had_fast_commit = false; BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex)); lock_buffer(journal->j_sb_buffer); @@ -1648,9 +1649,20 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op) sb->s_sequence = cpu_to_be32(journal->j_tail_sequence); sb->s_start = cpu_to_be32(0); + if (jbd2_has_feature_fast_commit(journal)) { + /* + * When journal is clean, no need to commit fast commit flag and + * make file system incompatible with older kernels. 
+ */ + jbd2_clear_feature_fast_commit(journal); + had_fast_commit = true; + } jbd2_write_superblock(journal, write_op); + if (had_fast_commit) + jbd2_set_feature_fast_commit(journal); + /* Log is no longer empty */ write_lock(&journal->j_state_lock); journal->j_flags |= JBD2_FLUSHED; diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index a4967b27ffb6..c1f4c94ed375 100644 --- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -35,7 +35,6 @@ struct recovery_info int nr_revoke_hits; }; -enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY}; static int do_one_pass(journal_t *journal, struct recovery_info *info, enum passtype pass); static int scan_revoke_records(journal_t *, struct buffer_head *, @@ -225,8 +224,12 @@ static int count_tags(journal_t *journal, struct buffer_head *bh) /* Make sure we wrap around the log correctly! */ #define wrap(journal, var) \ do { \ - if (var >= (journal)->j_last) \ - var -= ((journal)->j_last - (journal)->j_first); \ + unsigned long _wrap_last = \ + jbd2_has_feature_fast_commit(journal) ? 
\ + (journal)->j_last_fc : (journal)->j_last; \ + \ + if (var >= _wrap_last) \ + var -= (_wrap_last - (journal)->j_first); \ } while (0) /** @@ -413,6 +416,51 @@ static int jbd2_block_tag_csum_verify(journal_t *j, journal_block_tag_t *tag, return tag->t_checksum == cpu_to_be16(csum32); } +static int fc_do_one_pass(journal_t *journal, + struct recovery_info *info, enum passtype pass) +{ + unsigned int expected_commit_id = info->end_transaction; + unsigned long next_fc_block; + struct buffer_head *bh; + unsigned int seq; + journal_header_t *jhdr; + int err = 0; + + next_fc_block = journal->j_first_fc; + + while (next_fc_block <= journal->j_last_fc) { + jbd_debug(3, "Fast commit replay: next block %lu\n", + next_fc_block); + err = jread(&bh, journal, next_fc_block); + if (err) + break; + + jhdr = (journal_header_t *)bh->b_data; + seq = be32_to_cpu(jhdr->h_sequence); + if (be32_to_cpu(jhdr->h_magic) != JBD2_MAGIC_NUMBER || + seq != expected_commit_id) { + break; + } + jbd_debug(3, "Processing fast commit blk with seq %u\n", + seq); + if (journal->j_fc_replay_callback) { + err = journal->j_fc_replay_callback( + journal, bh, pass, + next_fc_block - + journal->j_first_fc); + if (err) + break; + } + next_fc_block++; + } + + if (err) + jbd_debug(3, "Fast commit replay failed, err = %d\n", err); + + return err; +} + + static int do_one_pass(journal_t *journal, struct recovery_info *info, enum passtype pass) { @@ -470,7 +518,7 @@ static int do_one_pass(journal_t *journal, break; jbd_debug(2, "Scanning for sequence ID %u at %lu/%lu\n", - next_commit_ID, next_log_block, journal->j_last); + next_commit_ID, next_log_block, journal->j_last_fc); /* Skip over each chunk of the transaction looking * either the next descriptor block or the final commit @@ -768,6 +816,8 @@ static int do_one_pass(journal_t *journal, if (err) goto failed; continue; + case JBD2_FC_BLOCK: + continue; default: jbd_debug(3, "Unrecognised magic %d, end of scan.\n", @@ -799,6 +849,11 @@ static int 
do_one_pass(journal_t *journal, success = -EIO; } } + + + if (jbd2_has_feature_fast_commit(journal) && pass != PASS_REVOKE) + fc_do_one_pass(journal, info, pass); + if (block_error && success == 0) success = -EIO; return success; diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index c6a2b82de4cf..312103fc9581 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -762,6 +762,8 @@ jbd2_time_diff(unsigned long start, unsigned long end) #define JBD2_NR_BATCH 64 +enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY}; + /** * struct journal_s - The journal_s type is the concrete type associated with * journal_t. @@ -1243,6 +1245,17 @@ struct journal_s * after every commit operation. */ void (*j_fc_cleanup_callback)(struct journal_s *journal); + + /* + * @j_fc_replay_callback: + * + * File-system specific function that performs replay of a fast + * commit. JBD2 calls this function for each fast commit block found in + * the journal. + */ + int (*j_fc_replay_callback)(struct journal_s *journal, + struct buffer_head *bh, + enum passtype pass, int off); }; #define jbd2_might_wait_for_commit(j) \ From patchwork Tue Oct 1 07:40:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1169765 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="vhPn11gl"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46jB744XXLz9sQq for ; Tue, 1 Oct 2019 
17:42:08 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733117AbfJAHmH (ORCPT ); Tue, 1 Oct 2019 03:42:07 -0400 Received: from mail-pg1-f194.google.com ([209.85.215.194]:45944 "EHLO mail-pg1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733115AbfJAHmG (ORCPT ); Tue, 1 Oct 2019 03:42:06 -0400 Received: by mail-pg1-f194.google.com with SMTP id q7so8981568pgi.12 for ; Tue, 01 Oct 2019 00:42:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=AZIwIIKhkPHpe75S3MVPOuZ40TBL+PpfgqOpwQ3TVG0=; b=vhPn11glkQ31XZQfS0gIMaRpFvNgKNXjpsJX5Qx/FulBl/cGB+g1W41vkoUdPDOSQf 1FJXf8ICI3Z5rkAVX5pVesp/3VbghvyLd/4gBCxmvar1LFrJnorrENH2J9obWu5bo23y vIh4KpPSGLuzjHPoaz45YHiN2O0D0PpCxBBKS2AkWZNmPBTBLjBdNzy51+I0K5J4KSxS OXyuYPsAy0oj6/hr+Ksb6p0lF7bh9R+QTsip42Tt6hmkp58HilYgYdXFzzd+ka5fJxez UijDzW5IzEsz4Jzn+uHze1rdjqKR6gKntknNalAiZ8dUzKjyaT3dOzaRHK6s7OHQHMtn XsGQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=AZIwIIKhkPHpe75S3MVPOuZ40TBL+PpfgqOpwQ3TVG0=; b=EGOiiOjokuC65QR/r5tsiI7ZSfed3jpTTT/7jb/hDSVGtp2SaapfedmvJ5kz4/Z18Z MDGje2DXJLioQ3CkPa9OICklZ1jc/RrxvYx33fB/3ycJlh/P+M5uhVUi8TjrUZ54Z4gL HLymufTSonrKdp85Y76EODs7P9XrS3s0Y356rPAisOo6taW08wgpIwgtDYbvv9HKnm5h rA84bvUlidWURq5MlL0DyDeJiekG6FuUAWifW0YEEXc6iAORzWmEijz4v7guIzCy7KUH 8gGgldpy1/nec1/DOdwy8RLjqU2OyH70Fl4no5Ox0XTUn3V54hrdPQ7joJ7RYlddY89A wCNg== X-Gm-Message-State: APjAAAX9XI+qKSKjsc/uwtDGK5VMDab9aw0jiRoA0rnO9KYGMn7P29jB TzQJpXmlqDJxGYcQfkXDNs6Hpy4UGXQ= X-Google-Smtp-Source: APXvYqy9Ub2B37yVl69lpT2oPGljLXjfuGtQ3qljtkwgOzqN7+reYYQ3Q6Cbzn/ih8RBDXR82z2B4Q== X-Received: by 2002:a62:1d12:: with SMTP id d18mr26467934pfd.53.1569915725536; Tue, 01 Oct 2019 00:42:05 -0700 (PDT) 
Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id q13sm2287668pjq.0.2019.10.01.00.42.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2019 00:42:05 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v3 06/13] ext4: add fields that are needed to track changed files Date: Tue, 1 Oct 2019 00:40:55 -0700 Message-Id: <20191001074101.256523-7-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com> References: <20191001074101.256523-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Ext4's fast commit feature tracks changed files and maintains them in a queue. We also remember for each file the logical block range that needs to be committed. This patch adds these fields to ext4_inode_info and ext4_sb_info and also adds initialization calls. Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4.h | 60 +++++++++++++++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.c | 20 +++++++++++++++ fs/ext4/ext4_jbd2.h | 2 ++ fs/ext4/ialloc.c | 1 + fs/ext4/inode.c | 1 + fs/ext4/super.c | 7 ++++++ 6 files changed, 91 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index becbda38b7db..c36ec23046f3 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -921,6 +921,48 @@ enum { I_DATA_SEM_QUOTA, }; +/* + * Ext4 fast commit inode specific information + */ +struct ext4_fast_commit_inode_info { + /* + * TID of when this struct was last updated. If fc_tid != + * running transaction tid, then none of the other fields in this struct + * are valid. Don't directly modify fields in this struct. Use wrappers + * provided in ext4_jbd2.c. 
+ */ + tid_t fc_tid; + /* + * Start of logical block range that needs to be committed in this fast + * commit + */ + ext4_lblk_t fc_lblk_start; + + /* + * End of logical block range that needs to be committed in this fast + * commit + */ + ext4_lblk_t fc_lblk_end; + + /* + * Inode number of the directory that contains this inode. This field + * is only valid if fc_new is set. + */ + u32 fc_parent_ino; + + /* + * Flag indicating whether this inode is eligible for fast commits or + * not. + */ + bool fc_eligible; + + /* + * Flag indicating whether this inode is newly created during this + * tid:subtid. + */ + bool fc_new; + rwlock_t fc_lock; +}; /* * fourth extended file system inode data in memory @@ -955,6 +997,12 @@ struct ext4_inode_info { struct list_head i_orphan; /* unlinked but open inodes */ + struct list_head i_fc_list; /* + * inodes that need fast commit + * protected by sbi->s_fc_lock. + */ + struct ext4_fast_commit_inode_info i_fc; + /* * i_disksize keeps track of what the inode size is ON DISK, not * in memory. During truncate, i_size is set to the new size by @@ -1058,7 +1106,9 @@ struct ext4_inode_info { * fsync and fdatasync, respectively. */ tid_t i_sync_tid; + tid_t i_sync_subtid; tid_t i_datasync_tid; + tid_t i_datasync_subtid; #ifdef CONFIG_QUOTA struct dquot *i_dquot[MAXQUOTAS]; @@ -1529,6 +1579,16 @@ struct ext4_sb_info { /* Barrier between changing inodes' journal flags and writepages ops. */ struct percpu_rw_semaphore s_journal_flag_rwsem; struct dax_device *s_daxdev; + + /* Ext4 fast commit stuff */ + bool s_fc_replay; /* Fast commit replay in progress */ + struct list_head s_fc_q; /* Inodes that need fast commit. */ + __u32 s_fc_q_cnt; /* Number of inodes in the fc queue */ + bool s_fc_eligible; /* + * Are changes after the last commit + * eligible for fast commit?
+ */ + spinlock_t s_fc_lock; }; static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 7c70b08d104c..9066bcfbee29 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -330,3 +330,23 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line, mark_buffer_dirty(bh); return err; } + +static inline +void ext4_reset_inode_fc_info(struct ext4_fast_commit_inode_info *i_fc) +{ + i_fc->fc_tid = 0; + i_fc->fc_lblk_start = 0; + i_fc->fc_lblk_end = 0; + i_fc->fc_parent_ino = 0; + i_fc->fc_eligible = false; + i_fc->fc_new = false; +} + +void ext4_init_inode_fc_info(struct inode *inode) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + + ext4_reset_inode_fc_info(&ei->i_fc); + INIT_LIST_HEAD(&ei->i_fc_list); + rwlock_init(&ei->i_fc.fc_lock); +} diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index ef8fcf7d0d3b..2305c1acd415 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -459,4 +459,6 @@ static inline int ext4_should_dioread_nolock(struct inode *inode) return 1; } +void ext4_init_inode_fc_info(struct inode *inode); + #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 764ff4c56233..ff30f3015551 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1131,6 +1131,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, ext4_clear_state_flags(ei); /* Only relevant on 32-bit archs */ ext4_set_inode_state(inode, EXT4_STATE_NEW); + ext4_init_inode_fc_info(inode); ei->i_extra_isize = sbi->s_want_extra_isize; ei->i_inline_off = 0; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 420fe3deed39..f230a888eddd 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4996,6 +4996,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, for (block = 0; block < EXT4_N_BLOCKS; block++) ei->i_data[block] = raw_inode->i_block[block]; INIT_LIST_HEAD(&ei->i_orphan); + ext4_init_inode_fc_info(&ei->vfs_inode); /* * Set 
transaction id's of transactions that have to be committed diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 7725eb2105f4..c90337fc98c1 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1100,6 +1100,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) ei->i_datasync_tid = 0; atomic_set(&ei->i_unwritten, 0); INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work); + ext4_init_inode_fc_info(&ei->vfs_inode); return &ei->vfs_inode; } @@ -1139,6 +1140,7 @@ static void init_once(void *foo) init_rwsem(&ei->i_data_sem); init_rwsem(&ei->i_mmap_sem); inode_init_once(&ei->vfs_inode); + ext4_init_inode_fc_info(&ei->vfs_inode); } static int __init init_inodecache(void) @@ -4301,6 +4303,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */ mutex_init(&sbi->s_orphan_lock); + INIT_LIST_HEAD(&sbi->s_fc_q); + sbi->s_fc_q_cnt = 0; + sbi->s_fc_eligible = true; + spin_lock_init(&sbi->s_fc_lock); + sb->s_root = NULL; needs_recovery = (es->s_last_orphan != 0 || From patchwork Tue Oct 1 07:40:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1169767 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="riKWZbTR"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46jB783hT1z9sRW for ; Tue, 1 Oct 2019 17:42:12 +1000 (AEST) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733119AbfJAHmI (ORCPT ); Tue, 1 Oct 2019 03:42:08 -0400 Received: from mail-pf1-f196.google.com ([209.85.210.196]:38944 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728249AbfJAHmH (ORCPT ); Tue, 1 Oct 2019 03:42:07 -0400 Received: by mail-pf1-f196.google.com with SMTP id v4so7301190pff.6 for ; Tue, 01 Oct 2019 00:42:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=I/YWvxeiYzEGIXeQlsiw80iCHG2DR3Xru95STQdZmlE=; b=riKWZbTRRScCM4Tm+/JWb2/4oo/R9jzQ76l4E/zR7UfIUT3G6kg06BVevOxFbDJdIM O5XFllgRUfoaX+LRbaOCoaCOf6raWY+4NA8BIseesL0KgsYIFgMz5O7w9kMaIIpALrJz b3RAz7+4dneIn9NHPduZHX4V/PLuDd8lp7utr5y3DqKdsJcE/Gl4MvwIefPjwWqnRkPO E4Y64q0DEOvudbdZLkvMLLCMEgUmL0Ii1I4r3iRl0CbwmIrgMgwE4A6Bm1m5GG60jzky SoAJE1epNJECEWZOIBvZs7BU9R7wn/1imbeaJtGWw2SlTVSL2PEbsOl5+p6Qr03wXFVm jdVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=I/YWvxeiYzEGIXeQlsiw80iCHG2DR3Xru95STQdZmlE=; b=AUUQLdv8lfmj4n2oqAIEecOUACMRvq6fkuAnYfvEzpA+WntoB2alEjcd54PlJ2Db3T iIm7nF8hkHOw86xKdHU0B9b92/5XWYuWI+zTUuUizMJgU0pZH6EuFl7UfoQg9CuSmXGc hSdJTyU6jY+EaH5Y0Ic8TZADeRYvlspRjS9asRMPYF7oT5GrYLy+kPGDcm18A/G6/s/m TsOzGv0vpHwVu3ROUHSX0WjdSZuYmG/QQo4nGRJkmDBP8epCjqlyy6TT+laTtTAZlSML u1qmBmUOwkjT+OQAR7sYsALR3xxvS4Imoj1GJdFcgduRLjq2T2pXkLV7GrFZc6Mr46XW 4w6A== X-Gm-Message-State: APjAAAXuPrkufNRFTNsTdpvgC7G6HMdNP+HFGWDReVTTC6O5XUDikfEd xi15Ni1cgblYnZ88MCgsQqTLUhb7asc= X-Google-Smtp-Source: APXvYqxeJjpYyGntkRhnUKA699eXfTbz3ZQQxzJB9P+tGpHh1dJptfUt3hK4gV2aOrO29tZ6Wug79g== X-Received: by 2002:aa7:99da:: with SMTP id v26mr26473899pfi.258.1569915726240; Tue, 01 Oct 2019 00:42:06 -0700 (PDT) Received: from 
harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id q13sm2287668pjq.0.2019.10.01.00.42.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2019 00:42:05 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v3 07/13] ext4: track changed files for fast commit Date: Tue, 1 Oct 2019 00:40:56 -0700 Message-Id: <20191001074101.256523-8-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com> References: <20191001074101.256523-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org For fast commit, we need to remember all the files that have changed since the last fast commit or full commit. When a change is incompatible with fast commit, we mark the file system as fast commit ineligible. This patch adds code to either remember files that have changed or mark ext4 as fast commit ineligible. We inspect every ext4_mark_inode_dirty call and decide whether that particular file change is fast commit compatible or not.
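The tracking decision described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions, not the kernel code: the types fc_sb_info and fc_inode_info, the queued flag (standing in for membership in sbi->s_fc_q) and the running_tid field (standing in for j_commit_sequence + 1) are hypothetical simplifications, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the per-inode fast commit state. */
struct fc_inode_info {
	uint32_t fc_tid;   /* transaction the fields below belong to */
	bool fc_eligible;  /* only meaningful if fc_tid is the running tid */
	bool queued;       /* stands in for membership in sbi->s_fc_q */
};

/* Hypothetical userspace stand-in for the per-superblock state. */
struct fc_sb_info {
	uint32_t running_tid; /* stands in for j_commit_sequence + 1 */
	bool s_fc_eligible;
	uint32_t s_fc_q_cnt;
};

/* Compatible change: queue the inode once and refresh stale per-tid state. */
static void fc_enqueue(struct fc_sb_info *sbi, struct fc_inode_info *ii)
{
	assert(sbi && ii);
	if (!sbi->s_fc_eligible)
		return;
	if (!ii->queued) {
		ii->queued = true;
		sbi->s_fc_q_cnt++;
	}
	if (ii->fc_tid == sbi->running_tid)
		return; /* state already belongs to the running transaction */
	ii->fc_tid = sbi->running_tid;
	ii->fc_eligible = true;
}

/* Incompatible change: both the inode and the file system lose eligibility. */
static void fc_mark_ineligible(struct fc_sb_info *sbi, struct fc_inode_info *ii)
{
	assert(sbi && ii);
	ii->fc_tid = sbi->running_tid;
	ii->fc_eligible = false;
	sbi->s_fc_eligible = false;
}

/* An inode counts as ineligible only while its state is current. */
static bool fc_inode_is_ineligible(const struct fc_sb_info *sbi,
				   const struct fc_inode_info *ii)
{
	return ii->fc_tid == sbi->running_tid && !ii->fc_eligible;
}
```

Stamping the state with a tid means nothing has to be cleared when a transaction commits: once the running tid advances, the old per-inode state is simply treated as invalid.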
Signed-off-by: Harshad Shirwadkar --- fs/ext4/acl.c | 1 + fs/ext4/ext4_jbd2.c | 96 +++++++++++++++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.h | 44 +++++++++++++++++++++ fs/ext4/extents.c | 16 +++++++- fs/ext4/ialloc.c | 8 ++++ fs/ext4/inline.c | 10 +++++ fs/ext4/inode.c | 24 +++++++++++- fs/ext4/ioctl.c | 3 ++ fs/ext4/migrate.c | 1 + fs/ext4/namei.c | 12 +++++- fs/ext4/super.c | 5 +++ fs/ext4/xattr.c | 1 + 12 files changed, 216 insertions(+), 5 deletions(-) diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c index 8c7bbf3e566d..e84be9c315db 100644 --- a/fs/ext4/acl.c +++ b/fs/ext4/acl.c @@ -257,6 +257,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type) inode->i_mode = mode; inode->i_ctime = current_time(inode); ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } out_stop: ext4_journal_stop(handle); diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 9066bcfbee29..e70ad7a8e46e 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -331,6 +331,13 @@ int __ext4_handle_dirty_super(const char *where, unsigned int line, return err; } +static inline tid_t get_running_txn_tid(struct super_block *sb) +{ + if (EXT4_SB(sb)->s_journal) + return EXT4_SB(sb)->s_journal->j_commit_sequence + 1; + return 0; +} + static inline void ext4_reset_inode_fc_info(struct ext4_fast_commit_inode_info *i_fc) { @@ -350,3 +357,92 @@ void ext4_init_inode_fc_info(struct inode *inode) INIT_LIST_HEAD(&ei->i_fc_list); rwlock_init(&ei->i_fc.fc_lock); } + +void ext4_fc_enqueue_inode(handle_t *handle, struct inode *inode) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + tid_t running_txn_tid = get_running_txn_tid(inode->i_sb); + + if (!ext4_should_fast_commit(inode->i_sb)) + return; + + spin_lock(&sbi->s_fc_lock); + if (!sbi->s_fc_eligible) { + spin_unlock(&sbi->s_fc_lock); + return; + } + if (list_empty(&EXT4_I(inode)->i_fc_list)) { + list_add(&EXT4_I(inode)->i_fc_list, &sbi->s_fc_q); + 
sbi->s_fc_q_cnt++; + } + spin_unlock(&sbi->s_fc_lock); + + write_lock(&ei->i_fc.fc_lock); + if (ei->i_fc.fc_tid == running_txn_tid) { + write_unlock(&ei->i_fc.fc_lock); + return; + } + + ext4_reset_inode_fc_info(&ei->i_fc); + ei->i_fc.fc_lblk_start = i_size_read(inode); + ei->i_fc.fc_lblk_end = i_size_read(inode); + ei->i_fc.fc_eligible = true; + ei->i_fc.fc_tid = running_txn_tid; + write_unlock(&ei->i_fc.fc_lock); +} + +void ext4_fc_del(struct inode *inode) +{ + if (!ext4_should_fast_commit(inode->i_sb)) + return; + + if (list_empty(&EXT4_I(inode)->i_fc_list)) + return; + + spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock); + list_del_init(&EXT4_I(inode)->i_fc_list); + spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); +} + +void ext4_fc_mark_new(struct inode *inode) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + tid_t running_txn_tid = get_running_txn_tid(inode->i_sb); + + write_lock(&ei->i_fc.fc_lock); + if (ei->i_fc.fc_tid != running_txn_tid) { + ext4_reset_inode_fc_info(&ei->i_fc); + ei->i_fc.fc_tid = running_txn_tid; + ei->i_fc.fc_eligible = true; + } + ei->i_fc.fc_new = true; + write_unlock(&ei->i_fc.fc_lock); +} + +bool ext4_is_inode_fc_ineligible(struct inode *inode) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + tid_t running_txn_tid = get_running_txn_tid(inode->i_sb); + bool ret = false; + + read_lock(&ei->i_fc.fc_lock); + if (running_txn_tid == ei->i_fc.fc_tid) + ret = !ei->i_fc.fc_eligible; + read_unlock(&ei->i_fc.fc_lock); + return ret; +} + +bool ext4_is_inode_fc_new(struct inode *inode) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + tid_t running_txn_tid = get_running_txn_tid(inode->i_sb); + bool ret = false; + + read_lock(&ei->i_fc.fc_lock); + if (running_txn_tid == ei->i_fc.fc_tid) + ret = ei->i_fc.fc_new; + read_unlock(&ei->i_fc.fc_lock); + + return ret; +} diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 2305c1acd415..65f20fbfb002 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -378,6 +378,17 @@ static inline int 
ext4_jbd2_inode_add_wait(handle_t *handle, return 0; } +static inline int ext4_should_fast_commit(struct super_block *sb) +{ + if (!ext4_has_feature_fast_commit(sb)) + return 0; + if (!test_opt2(sb, JOURNAL_FAST_COMMIT)) + return 0; + if (test_opt(sb, QUOTA)) + return 0; + return 1; +} + static inline void ext4_update_inode_fsync_trans(handle_t *handle, struct inode *inode, int datasync) @@ -460,5 +471,38 @@ static inline int ext4_should_dioread_nolock(struct inode *inode) } void ext4_init_inode_fc_info(struct inode *inode); +extern void ext4_fc_enqueue_inode(handle_t *handle, struct inode *inode); +extern void ext4_fc_del(struct inode *inode); + +static inline void +ext4_fc_mark_sb_ineligible(struct super_block *sb) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + + spin_lock(&sbi->s_fc_lock); + sbi->s_fc_eligible = false; + spin_unlock(&sbi->s_fc_lock); +} + + +static inline void +ext4_fc_mark_ineligible(struct inode *inode) +{ + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + struct ext4_inode_info *ei = EXT4_I(inode); + + write_lock(&ei->i_fc.fc_lock); + if (sbi->s_journal) + ei->i_fc.fc_tid = sbi->s_journal->j_commit_sequence + 1; + ei->i_fc.fc_eligible = false; + write_unlock(&ei->i_fc.fc_lock); + spin_lock(&sbi->s_fc_lock); + sbi->s_fc_eligible = false; + spin_unlock(&sbi->s_fc_lock); +} + +void ext4_fc_mark_new(struct inode *inode); +bool ext4_is_inode_fc_ineligible(struct inode *inode); +bool ext4_is_inode_fc_new(struct inode *inode); #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 92266a2da7d6..b30f6175eb71 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -163,6 +163,7 @@ int __ext4_ext_dirty(const char *where, unsigned int line, handle_t *handle, } else { /* path points to leaf/index in inode body */ err = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } return err; } @@ -3714,6 +3715,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, err = ext4_zeroout_es(inode, 
&zero_ex1); if (!err) err = ext4_zeroout_es(inode, &zero_ex2); + } else { + ext4_fc_mark_ineligible(inode); } return err ? err : allocated; } @@ -3856,7 +3859,7 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode, struct ext4_ext_path *path, unsigned int len) { - int i, depth; + int i, ret, depth; struct ext4_extent_header *eh; struct ext4_extent *last_ex; @@ -3898,7 +3901,10 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode, return 0; out: ext4_clear_inode_flag(inode, EXT4_INODE_EOFBLOCKS); - return ext4_mark_inode_dirty(handle, inode); + ret = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); + + return ret; } static int @@ -4607,6 +4613,7 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset, inode->i_ino, map.m_lblk, map.m_len, ret); ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); ret2 = ext4_journal_stop(handle); break; } @@ -4624,6 +4631,7 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset, ext4_set_inode_flag(inode, EXT4_INODE_EOFBLOCKS); } + ext4_fc_enqueue_inode(handle, inode); ext4_mark_inode_dirty(handle, inode); ext4_update_inode_fsync_trans(handle, inode, 1); ret2 = ext4_journal_stop(handle); @@ -4786,6 +4794,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, ext4_set_inode_flag(inode, EXT4_INODE_EOFBLOCKS); } ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); /* Zero out partial block at the edges of the range */ ret = ext4_zero_partial_blocks(handle, inode, offset, len); @@ -4957,6 +4966,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode, "ext4_ext_map_blocks returned %d", inode->i_ino, map.m_lblk, map.m_len, ret); + ext4_fc_mark_ineligible(inode); ext4_mark_inode_dirty(handle, inode); if (credits) ret2 = ext4_journal_stop(handle); @@ -5485,6 +5495,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) if 
(IS_SYNC(inode)) ext4_handle_sync(handle); inode->i_mtime = inode->i_ctime = current_time(inode); + ext4_fc_mark_ineligible(inode); ext4_mark_inode_dirty(handle, inode); ext4_update_inode_fsync_trans(handle, inode, 1); @@ -5599,6 +5610,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) inode->i_size += len; EXT4_I(inode)->i_disksize += len; inode->i_mtime = inode->i_ctime = current_time(inode); + ext4_fc_mark_ineligible(inode); ret = ext4_mark_inode_dirty(handle, inode); if (ret) goto out_stop; diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index ff30f3015551..47d04a33a3ca 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1133,6 +1133,14 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, ext4_set_inode_state(inode, EXT4_STATE_NEW); ext4_init_inode_fc_info(inode); + if (S_ISDIR(mode) || ext4_is_inode_fc_ineligible(dir) || + ext4_is_inode_fc_new(dir)) { + ext4_fc_mark_ineligible(inode); + } else { + ext4_fc_mark_new(inode); + ei->i_fc.fc_parent_ino = dir->i_ino; + } + ei->i_extra_isize = sbi->s_want_extra_isize; ei->i_inline_off = 0; if (ext4_has_feature_inline_data(sb)) diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 88cdf3c90bd1..fbd561cba098 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -435,6 +435,8 @@ static int ext4_destroy_inline_data_nolock(handle_t *handle, if (error) goto out; + ext4_fc_mark_ineligible(inode); + memset((void *)ext4_raw_inode(&is.iloc)->i_block, 0, EXT4_MIN_INLINE_DATA_SIZE); memset(ei->i_data, 0, EXT4_MIN_INLINE_DATA_SIZE); @@ -759,6 +761,7 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, ext4_write_unlock_xattr(inode, &no_expand); brelse(iloc.bh); + ext4_fc_enqueue_inode(ext4_journal_current_handle(), inode); mark_inode_dirty(inode); out: return copied; @@ -974,6 +977,7 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, * ordering of page lock and transaction start for journaling * filesystems. 
*/ + ext4_fc_enqueue_inode(ext4_journal_current_handle(), inode); mark_inode_dirty(inode); return copied; @@ -1165,6 +1169,7 @@ static int ext4_finish_convert_inline_dir(handle_t *handle, if (err) return err; set_buffer_verified(dir_block); + ext4_fc_mark_ineligible(inode); return ext4_mark_inode_dirty(handle, inode); } @@ -1216,6 +1221,8 @@ static int ext4_convert_inline_data_nolock(handle_t *handle, goto out_restore; } + ext4_fc_mark_ineligible(inode); + data_bh = sb_getblk(inode->i_sb, map.m_pblk); if (!data_bh) { error = -ENOMEM; @@ -1709,6 +1716,8 @@ int ext4_delete_inline_entry(handle_t *handle, if (err) goto out; + ext4_fc_enqueue_inode(handle, dir); + ext4_show_inline_dir(dir, iloc.bh, inline_start, inline_size); out: ext4_write_unlock_xattr(dir, &no_expand); @@ -1986,6 +1995,7 @@ int ext4_inline_data_truncate(struct inode *inode, int *has_inline) if (err == 0) { inode->i_mtime = inode->i_ctime = current_time(inode); + ext4_fc_enqueue_inode(handle, inode); err = ext4_mark_inode_dirty(handle, inode); if (IS_SYNC(inode)) ext4_handle_sync(handle); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f230a888eddd..6d2efbd9aba9 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -262,6 +262,7 @@ void ext4_evict_inode(struct inode *inode) * cleaned up. 
*/ ext4_orphan_del(NULL, inode); + ext4_fc_del(inode); sb_end_intwrite(inode->i_sb); goto no_delete; } @@ -279,6 +280,8 @@ void ext4_evict_inode(struct inode *inode) if (ext4_inode_is_fast_symlink(inode)) memset(EXT4_I(inode)->i_data, 0, sizeof(EXT4_I(inode)->i_data)); inode->i_size = 0; + ext4_fc_del(inode); + ext4_fc_mark_ineligible(inode); err = ext4_mark_inode_dirty(handle, inode); if (err) { ext4_warning(inode->i_sb, @@ -303,6 +306,7 @@ void ext4_evict_inode(struct inode *inode) stop_handle: ext4_journal_stop(handle); ext4_orphan_del(NULL, inode); + ext4_fc_del(inode); sb_end_intwrite(inode->i_sb); ext4_xattr_inode_array_free(ea_inode_array); goto no_delete; @@ -326,6 +330,8 @@ void ext4_evict_inode(struct inode *inode) * having errors), but we can't free the inode if the mark_dirty * fails. */ + ext4_fc_del(inode); + ext4_fc_mark_ineligible(inode); if (ext4_mark_inode_dirty(handle, inode)) /* If that failed, just do the required in-core inode clear. */ ext4_clear_inode(inode); @@ -1436,8 +1442,10 @@ static int ext4_write_end(struct file *file, * ordering of page lock and transaction start for journaling * filesystems. 
*/ - if (i_size_changed || inline_data) + if (i_size_changed || inline_data) { ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); + } if (pos + len > inode->i_size && ext4_can_truncate(inode)) /* if we have allocated more blocks and copied @@ -1550,6 +1558,7 @@ static int ext4_journalled_write_end(struct file *file, pagecache_isize_extended(inode, old_size, pos); if (size_changed || inline_data) { + ext4_fc_enqueue_inode(handle, inode); ret2 = ext4_mark_inode_dirty(handle, inode); if (!ret) ret = ret2; @@ -2077,6 +2086,7 @@ static int __ext4_journalled_writepage(struct page *page, if (inline_data) { ret = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } else { ret = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL, do_journal_get_write_access); @@ -2604,6 +2614,7 @@ static int mpage_map_and_submit_extent(handle_t *handle, EXT4_I(inode)->i_disksize = disksize; up_write(&EXT4_I(inode)->i_data_sem); err2 = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); if (err2) ext4_error(inode->i_sb, "Failed to mark inode %lu dirty", @@ -3205,6 +3216,7 @@ static int ext4_da_write_end(struct file *file, * bu greater than i_disksize.(hint delalloc) */ ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } } @@ -3614,8 +3626,12 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length, ret = PTR_ERR(handle); goto orphan_del; } - if (ext4_update_inode_size(inode, offset + written)) + + if (ext4_update_inode_size(inode, offset + written)) { ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); + } + /* * We may need to truncate allocated but not written blocks beyond EOF. */ @@ -3851,6 +3867,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) * ignore it. 
*/ ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); } } err = ext4_journal_stop(handle); @@ -4372,6 +4389,8 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) goto out_dio; } + ext4_fc_mark_ineligible(inode); + ret = ext4_zero_partial_blocks(handle, inode, offset, length); if (ret) @@ -4525,6 +4544,7 @@ int ext4_truncate(struct inode *inode) if (inode->i_size & (inode->i_sb->s_blocksize - 1)) ext4_block_truncate_page(handle, mapping, inode->i_size); + ext4_fc_mark_ineligible(inode); /* * We add the inode to the orphan list, so that if this * truncate spans multiple transactions, and we crash, we will diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 442f7ef873fc..a8e23acb5c03 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -987,6 +987,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) err = mnt_want_write_file(filp); if (err) return err; + ext4_fc_mark_sb_ineligible(sb); err = swap_inode_boot_loader(sb, inode); mnt_drop_write_file(filp); return err; @@ -997,6 +998,8 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) int err = 0, err2 = 0; ext4_group_t o_group = EXT4_SB(sb)->s_groups_count; + ext4_fc_mark_sb_ineligible(sb); + if (copy_from_user(&n_blocks_count, (__u64 __user *)arg, sizeof(__u64))) { return -EFAULT; diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c index b1e4d359f73b..b995690d73ce 100644 --- a/fs/ext4/migrate.c +++ b/fs/ext4/migrate.c @@ -513,6 +513,7 @@ int ext4_ext_migrate(struct inode *inode) * work to orphan_list_cleanup() */ ext4_orphan_del(NULL, tmp_inode); + ext4_fc_del(inode); retval = PTR_ERR(handle); goto out; } diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 129029534075..8b73c5a38d49 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2140,8 +2140,10 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname, * out all the changes we did so far. Otherwise we can end up * with corrupted filesystem. 
*/ - if (retval) + if (retval) { ext4_mark_inode_dirty(handle, dir); + ext4_fc_mark_ineligible(dir); + } dx_release(frames); brelse(bh2); return retval; @@ -2661,6 +2663,7 @@ static int ext4_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode) err = ext4_orphan_add(handle, inode); if (err) goto err_unlock_inode; + ext4_fc_enqueue_inode(handle, inode); mark_inode_dirty(inode); unlock_new_inode(inode); } @@ -2773,6 +2776,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) err = ext4_init_new_dir(handle, dir, inode); if (err) goto out_clear_inode; + ext4_fc_mark_ineligible(inode); err = ext4_mark_inode_dirty(handle, inode); if (!err) err = ext4_add_entry(handle, dentry, inode); @@ -3114,6 +3118,7 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry) inode->i_size = 0; ext4_orphan_add(handle, inode); inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode); + ext4_fc_mark_ineligible(inode); ext4_mark_inode_dirty(handle, inode); ext4_dec_count(handle, dir); ext4_update_dx_flag(dir); @@ -3192,6 +3197,7 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry) goto end_unlink; dir->i_ctime = dir->i_mtime = current_time(dir); ext4_update_dx_flag(dir); + ext4_fc_mark_ineligible(dir); ext4_mark_inode_dirty(handle, dir); drop_nlink(inode); if (!inode->i_nlink) @@ -3387,6 +3393,7 @@ static int ext4_link(struct dentry *old_dentry, err = ext4_add_entry(handle, dentry, inode); if (!err) { + ext4_fc_mark_ineligible(inode); ext4_mark_inode_dirty(handle, inode); /* this can happen only for tmpfile being * linked the first time @@ -3991,6 +3998,9 @@ static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry, if (err) return err; + ext4_fc_mark_ineligible(old_dir); + ext4_fc_mark_ineligible(new_dir); + if (flags & RENAME_EXCHANGE) { return ext4_cross_rename(old_dir, old_dentry, new_dir, new_dentry); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index c90337fc98c1..3e9570ea9748 100644 --- 
a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1181,6 +1181,7 @@ void ext4_clear_inode(struct inode *inode) EXT4_I(inode)->jinode = NULL; } fscrypt_put_encryption_info(inode); + ext4_fc_del(inode); } static struct inode *ext4_nfs_get_inode(struct super_block *sb, @@ -1325,6 +1326,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len, * S_DAX may be disabled */ ext4_set_inode_flags(inode); + ext4_fc_mark_ineligible(inode); res = ext4_mark_inode_dirty(handle, inode); if (res) EXT4_ERROR_INODE(inode, "Failed to mark inode dirty"); @@ -5797,6 +5799,7 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id, EXT4_I(inode)->i_flags |= EXT4_NOATIME_FL | EXT4_IMMUTABLE_FL; inode_set_flags(inode, S_NOATIME | S_IMMUTABLE, S_NOATIME | S_IMMUTABLE); + ext4_fc_mark_ineligible(inode); ext4_mark_inode_dirty(handle, inode); ext4_journal_stop(handle); unlock_inode: @@ -5904,6 +5907,7 @@ static int ext4_quota_off(struct super_block *sb, int type) EXT4_I(inode)->i_flags &= ~(EXT4_NOATIME_FL | EXT4_IMMUTABLE_FL); inode_set_flags(inode, 0, S_NOATIME | S_IMMUTABLE); inode->i_mtime = inode->i_ctime = current_time(inode); + ext4_fc_mark_ineligible(inode); ext4_mark_inode_dirty(handle, inode); ext4_journal_stop(handle); out_unlock: @@ -6010,6 +6014,7 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type, if (inode->i_size < off + len) { i_size_write(inode, off + len); EXT4_I(inode)->i_disksize = inode->i_size; + ext4_fc_mark_ineligible(inode); ext4_mark_inode_dirty(handle, inode); } return len; diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 491f9ee4040e..19bc4046658c 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1406,6 +1406,7 @@ static int ext4_xattr_inode_write(handle_t *handle, struct inode *ea_inode, inode_unlock(ea_inode); ext4_mark_inode_dirty(handle, ea_inode); + ext4_fc_enqueue_inode(handle, ea_inode); out: brelse(bh); From patchwork Tue Oct 1 07:40:57 2019 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1169768 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="rce9Wv7G"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46jB795m8Sz9sRd for ; Tue, 1 Oct 2019 17:42:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733120AbfJAHmI (ORCPT ); Tue, 1 Oct 2019 03:42:08 -0400 Received: from mail-pf1-f195.google.com ([209.85.210.195]:44485 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725777AbfJAHmI (ORCPT ); Tue, 1 Oct 2019 03:42:08 -0400 Received: by mail-pf1-f195.google.com with SMTP id q21so7287445pfn.11 for ; Tue, 01 Oct 2019 00:42:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=J3uAdSaeo3SK4NEMv1bA39yXg1xoIX91DGqpS4F5v5A=; b=rce9Wv7GrLZkgaLuV3U+iCNJp3FWNl2LqRhBZZQ+9PrbHABzwvzE8SR6MBlHujsaz7 og4Gzj15dP4N19e8sDftD+7sJyKlfjH5TeZ+sks+vtL0Of8i6DjsD5EE0cgkyIJvMsbn ipOx94pRPjxYdsIVp+uN7nHaVtnnqsaLM7NqCpwXbSkCBljay/+TfPUxRFXlqlWRxUqs fBsMVvrHzXZsSFVFgol9f0c62KI6ADosxQoljtzbvyuTXaMCevH+O6aMMHeZL9yADfiw tVy8f8Gm6GGviNL09RlgI2GFgIhg1cEoEBRygkdxz1azKBFCc70xJLlmJRGScRDuNMvC DeKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; 
h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=J3uAdSaeo3SK4NEMv1bA39yXg1xoIX91DGqpS4F5v5A=; b=ohB/mwljGq+mBMV4KFrCO543cg+yoYbAbSU0pkfIZqeskz2TsDABeruAd/rN/dMzZH jIBnEI2uNQ3/0VNUui4i+xUJzxtmGqkj0v2IH+HGSQLnCNa78z5V+k6hPDXhrbtPfCdz fGZj4yui5j1igPTlAUyzcddkmKP+Su+04Bj8D1kzdEL4kTlonbTtOc4vS4Vn9wEc/HCy NhzTFZZWul1CQnAzeaF4OiKXDKRELtMGOSyCRHr2G6amtuVPkHqthvqs7rVZzk8YRNxb GybRXrGQMI3/dMiTKXdNKJOJ/sxz5HIkJWzioqpUXJNV8BJkxwUkuZU7tkPu2hTiodqG 5Zgg== X-Gm-Message-State: APjAAAXiyk4zzsK4+Bn8M0JxdIP2kRa+Forkf7VjbBVbrE+AkuHjzdNj HHF6de2fnRsn9EtKpRqwzKM5aD6d85g= X-Google-Smtp-Source: APXvYqwsQQNxTZSgFmEhArypNBP/jxe1i/tNY9CO6pt9VAzDzJsbvoZvM1WpoI3iyJ+c5jjVvHy6mA== X-Received: by 2002:aa7:998f:: with SMTP id k15mr27228769pfh.203.1569915726947; Tue, 01 Oct 2019 00:42:06 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id q13sm2287668pjq.0.2019.10.01.00.42.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2019 00:42:06 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v3 08/13] ext4: fast-commit commit range tracking Date: Tue, 1 Oct 2019 00:40:57 -0700 Message-Id: <20191001074101.256523-9-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com> References: <20191001074101.256523-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org With this patch, we track the logical range of file offsets that needs to be committed using fast commit. This allows us to find the file extents that need to be committed at commit time.
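The range tracking described above amounts to keeping one covering interval per inode per transaction. A minimal userspace sketch, assuming a hypothetical struct fc_range and lblk_t typedef in place of the i_fc fields and ext4_lblk_t, with locking and the eligibility checks left out:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t lblk_t; /* hypothetical stand-in for ext4_lblk_t */

/* Hypothetical stand-in for the range fields of the per-inode state. */
struct fc_range {
	uint32_t tid; /* transaction this range belongs to */
	lblk_t start; /* first logical block to commit */
	lblk_t end;   /* last logical block to commit */
};

/*
 * Record that blocks [start, end] changed in transaction running_tid.
 * Within one transaction the range only grows; a new transaction
 * discards the old range and starts over.
 */
static void fc_update_range(struct fc_range *r, uint32_t running_tid,
			    lblk_t start, lblk_t end)
{
	assert(start <= end);
	if (r->tid == running_tid) {
		if (start < r->start)
			r->start = start;
		if (end > r->end)
			r->end = end;
		return;
	}
	r->tid = running_tid;
	r->start = start;
	r->end = end;
}
```

Because only a single interval is kept, two disjoint writes in the same transaction produce a range that also covers the untouched blocks between them. That trades some extra work at commit time for constant-size per-inode state.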
Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4_jbd2.c | 34 ++++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.h | 2 ++ fs/ext4/inline.c | 4 +++- fs/ext4/inode.c | 17 ++++++++++++++++- 4 files changed, 55 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index e70ad7a8e46e..0bb8de2139a5 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -405,6 +405,40 @@ void ext4_fc_del(struct inode *inode) spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock); } +void ext4_fc_update_commit_range(struct inode *inode, ext4_lblk_t start, + ext4_lblk_t end) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + tid_t running_txn_tid = get_running_txn_tid(inode->i_sb); + + if (!ext4_should_fast_commit(inode->i_sb)) + return; + + if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb)) + ext4_debug("Special inode %ld being modified\n", inode->i_ino); + + if (!EXT4_SB(inode->i_sb)->s_fc_eligible) + return; + + write_lock(&ei->i_fc.fc_lock); + if (ei->i_fc.fc_tid == running_txn_tid) { + ei->i_fc.fc_lblk_start = ei->i_fc.fc_lblk_start < start ? + ei->i_fc.fc_lblk_start : start; + ei->i_fc.fc_lblk_end = ei->i_fc.fc_lblk_end > end ? 
+ ei->i_fc.fc_lblk_end : end; + write_unlock(&ei->i_fc.fc_lock); + return; + } + + ext4_reset_inode_fc_info(&ei->i_fc); + ei->i_fc.fc_eligible = true; + ei->i_fc.fc_lblk_start = start; + ei->i_fc.fc_lblk_end = end; + ei->i_fc.fc_tid = running_txn_tid; + write_unlock(&ei->i_fc.fc_lock); + +} + void ext4_fc_mark_new(struct inode *inode) { struct ext4_inode_info *ei = EXT4_I(inode); diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 65f20fbfb002..2cb7e7e1f025 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -501,6 +501,8 @@ ext4_fc_mark_ineligible(struct inode *inode) spin_unlock(&sbi->s_fc_lock); } +void ext4_fc_update_commit_range(struct inode *inode, ext4_lblk_t start, + ext4_lblk_t end); void ext4_fc_mark_new(struct inode *inode); bool ext4_is_inode_fc_ineligible(struct inode *inode); diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index fbd561cba098..66b2c0e3f7e4 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -966,8 +966,10 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, * But it's important to update i_size while still holding page lock: * page writeout could otherwise come in and zero beyond i_size. 
*/ - if (pos+copied > inode->i_size) + if (pos+copied > inode->i_size) { + ext4_fc_update_commit_range(inode, inode->i_size, pos + copied); i_size_write(inode, pos+copied); + } unlock_page(page); put_page(page); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 6d2efbd9aba9..ea039e3e1a4d 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1549,6 +1549,8 @@ static int ext4_journalled_write_end(struct file *file, SetPageUptodate(page); } size_changed = ext4_update_inode_size(inode, pos + copied); + ext4_fc_update_commit_range(inode, pos, pos + copied); + ext4_set_inode_state(inode, EXT4_STATE_JDATA); EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid; unlock_page(page); @@ -2610,8 +2612,12 @@ static int mpage_map_and_submit_extent(handle_t *handle, i_size = i_size_read(inode); if (disksize > i_size) disksize = i_size; - if (disksize > EXT4_I(inode)->i_disksize) + if (disksize > EXT4_I(inode)->i_disksize) { + ext4_fc_update_commit_range(inode, + EXT4_I(inode)->i_disksize, + disksize); EXT4_I(inode)->i_disksize = disksize; + } up_write(&EXT4_I(inode)->i_data_sem); err2 = ext4_mark_inode_dirty(handle, inode); ext4_fc_enqueue_inode(handle, inode); @@ -3220,6 +3226,8 @@ static int ext4_da_write_end(struct file *file, } } + ext4_fc_update_commit_range(inode, pos, pos + copied); + if (write_mode != CONVERT_INLINE_DATA && ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) && ext4_has_inline_data(inode)) @@ -3627,6 +3635,7 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length, goto orphan_del; } + ext4_fc_update_commit_range(inode, offset, offset + written); if (ext4_update_inode_size(inode, offset + written)) { ext4_mark_inode_dirty(handle, inode); ext4_fc_enqueue_inode(handle, inode); @@ -3751,6 +3760,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) ext4_update_i_disksize(inode, inode->i_size); ext4_journal_stop(handle); } + ext4_fc_update_commit_range(inode, offset, offset + count); 
BUG_ON(iocb->private == NULL); @@ -3869,6 +3879,8 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) ext4_mark_inode_dirty(handle, inode); ext4_fc_enqueue_inode(handle, inode); } + ext4_fc_update_commit_range(inode, offset, + offset + end); } err = ext4_journal_stop(handle); if (ret == 0) @@ -5327,6 +5339,9 @@ static int ext4_do_update_inode(handle_t *handle, cpu_to_le16(ei->i_file_acl >> 32); raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl); if (ei->i_disksize != ext4_isize(inode->i_sb, raw_inode)) { + ext4_fc_update_commit_range(inode, + ext4_isize(inode->i_sb, raw_inode), + ei->i_disksize); ext4_isize_set(raw_inode, ei->i_disksize); need_datasync = 1; } From patchwork Tue Oct 1 07:40:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1169770 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="FdtKKXRa"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46jB7F1Xshz9sP7 for ; Tue, 1 Oct 2019 17:42:17 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733127AbfJAHmL (ORCPT ); Tue, 1 Oct 2019 03:42:11 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:43231 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733121AbfJAHmK (ORCPT ); Tue, 1 Oct 2019 03:42:10 -0400 Received: by mail-pg1-f193.google.com with SMTP id 
v27so8986147pgk.10 for ; Tue, 01 Oct 2019 00:42:08 -0700 (PDT) X-Received: by 2002:a63:408:: with SMTP id 8mr29275411pge.334.1569915727661; Tue, 01 Oct 2019 00:42:07 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by smtp.googlemail.com with ESMTPSA id q13sm2287668pjq.0.2019.10.01.00.42.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2019 00:42:07 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v3 09/13] ext4: fast-commit commit path changes Date: Tue, 1
Oct 2019 00:40:58 -0700 Message-Id: <20191001074101.256523-10-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com> References: <20191001074101.256523-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org

This patch implements the actual commit path for fast commit. Based on the tracked inodes and the changes remembered for each of them, it adds code to create a fast commit block that stores the extents added to the inode as well as the dentries created for it. We use the new JBD2 interfaces added in previous patches in this series.

The fast commit blocks that are created contain the extents that _should_ be present in the file. Removing extents is not yet supported, which makes operations such as truncate and delete incompatible with fast commit.

Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4_jbd2.c | 309 ++++++++++++++++++++++++++++++++++++ fs/ext4/ext4_jbd2.h | 50 +++++- fs/ext4/extents.c | 8 +- fs/ext4/inode.c | 22 ++- fs/ext4/super.c | 11 ++ include/trace/events/ext4.h | 39 +++++ 6 files changed, 429 insertions(+), 10 deletions(-) diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 0bb8de2139a5..fd7740372438 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -4,6 +4,7 @@ */ #include "ext4_jbd2.h" +#include "ext4_extents.h" #include @@ -480,3 +481,311 @@ bool ext4_is_inode_fc_new(struct inode *inode) return ret; } + +static void ext4_end_buffer_io_sync(struct buffer_head *bh, int uptodate) +{ + struct buffer_head *orig_bh = bh->b_private; + + BUFFER_TRACE(bh, ""); + if (uptodate) { + ext4_debug("%s: Block %lld up-to-date", + __func__, bh->b_blocknr); + set_buffer_uptodate(bh); + } else { + ext4_debug("%s: Block %lld not up-to-date", + __func__, bh->b_blocknr); + clear_buffer_uptodate(bh); + } + if (orig_bh) { + clear_bit_unlock(BH_Shadow, &orig_bh->b_state); + /* Protect
BH_Shadow bit in b_state */ + smp_mb__after_atomic(); + wake_up_bit(&orig_bh->b_state, BH_Shadow); + } + unlock_buffer(bh); +} + +static inline u8 *fc_add_tag(u8 *dst, u16 tag, u16 len, u8 *val) +{ + struct ext4_fc_tl tl; + + tl.fc_tag = cpu_to_le16(tag); + tl.fc_len = cpu_to_le16(len); + memcpy(dst, &tl, sizeof(tl)); + memcpy(dst + sizeof(tl), val, len); + + return dst + sizeof(tl) + len; +} + +int ext4_fc_write_inode(journal_t *journal, struct buffer_head *bh, + struct inode *inode, tid_t tid, tid_t subtid, + int is_last, struct dentry *dentry) +{ + ext4_lblk_t old_blk_size, cur_lblk_off, new_blk_size; + struct super_block *sb = journal->j_private; + struct ext4_inode_info *ei = EXT4_I(inode); + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_fc_commit_hdr *fc_hdr; + struct ext4_map_blocks map; + struct ext4_iloc iloc; + struct ext4_extent extent; + struct inode *parent; + __u32 dummy_csum = 0, csum; + __u8 *start, *cur, *end; + __u16 num_tlvs = 0; + int ret; + + read_lock(&ei->i_fc.fc_lock); + if (tid != ei->i_fc.fc_tid) { + jbd_debug(3, + "File not modified. 
Modified %d, expected %d", + ei->i_fc.fc_tid, tid); + read_unlock(&ei->i_fc.fc_lock); + return 0; + } + read_unlock(&ei->i_fc.fc_lock); + + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) + return -ECANCELED; + + if (ext4_is_inode_fc_new(inode)) { + parent = d_inode(dentry->d_parent); + if (parent && ext4_is_inode_fc_ineligible(parent)) + return -ECANCELED; + } + + ret = ext4_get_inode_loc(inode, &iloc); + if (ret) + return ret; + + end = (__u8 *)bh->b_data + journal->j_blocksize; + + write_lock(&ei->i_fc.fc_lock); + old_blk_size = (ei->i_fc.fc_lblk_start + sb->s_blocksize - 1) >> + inode->i_blkbits; + new_blk_size = ei->i_fc.fc_lblk_end >> inode->i_blkbits; + ei->i_fc.fc_lblk_start = ei->i_fc.fc_lblk_end; + write_unlock(&ei->i_fc.fc_lock); + + jbd_debug(3, "Committing as tid = %d, subtid = %d on buffer %lld\n", + tid, subtid, bh->b_blocknr); + + fc_hdr = (struct ext4_fc_commit_hdr *) + ((__u8 *)bh->b_data + sizeof(journal_header_t)); + fc_hdr->fc_magic = cpu_to_le32(EXT4_FC_MAGIC); + fc_hdr->fc_subtid = cpu_to_le32(subtid); + fc_hdr->fc_ino = cpu_to_le32(inode->i_ino); + fc_hdr->fc_features = 0; + fc_hdr->fc_flags = 0; + + if (is_last) + ext4_fc_mark_last(fc_hdr); + + memcpy(&fc_hdr->inode, ext4_raw_inode(&iloc), EXT4_INODE_SIZE(sb)); + cur = (__u8 *)(fc_hdr + 1); + start = cur; + if (ext4_is_inode_fc_new(inode)) { + __le32 parent_ino; + + read_lock(&ei->i_fc.fc_lock); + parent_ino = cpu_to_le32(ei->i_fc.fc_parent_ino); + read_unlock(&ei->i_fc.fc_lock); + + if (!dentry) + return -ECANCELED; + + cur = fc_add_tag(cur, EXT4_FC_TAG_PARENT_INO, + sizeof(parent_ino), (u8 *)&parent_ino); + cur = fc_add_tag(cur, EXT4_FC_TAG_DNAME, + dentry->d_name.len, + (u8 *)dentry->d_name.name); + num_tlvs = 2; + } + csum = 0; + cur_lblk_off = old_blk_size; + while (cur_lblk_off <= new_blk_size) { + map.m_lblk = cur_lblk_off; + map.m_len = new_blk_size - cur_lblk_off + 1; + ret = ext4_map_blocks(NULL, inode, &map, 0); + if (!ret) { + cur_lblk_off += map.m_len; + continue; + } + + 
if (map.m_flags & EXT4_MAP_UNWRITTEN) + return -ECANCELED; + extent.ee_block = cpu_to_le32(map.m_lblk); + cur_lblk_off += map.m_len; + if (cur + sizeof(struct ext4_extent) + + sizeof(struct ext4_fc_tl) >= end) + return -ENOSPC; + + extent.ee_len = cpu_to_le16(map.m_len); + ext4_ext_store_pblock(&extent, map.m_pblk); + ext4_ext_mark_initialized(&extent); + cur = fc_add_tag(cur, EXT4_FC_TAG_EXT, + sizeof(struct ext4_extent), + (u8 *)&extent); + num_tlvs++; + } + + fc_hdr->fc_num_tlvs = cpu_to_le16(num_tlvs); + csum = ext4_chksum(sbi, csum, (__u8 *)fc_hdr, + offsetof(struct ext4_fc_commit_hdr, fc_csum)); + csum = ext4_chksum(sbi, csum, &dummy_csum, sizeof(dummy_csum)); + csum = ext4_chksum(sbi, csum, start, cur - start); + fc_hdr->fc_csum = cpu_to_le32(csum); + + jbd_debug(3, "Created FC block for inode %ld with [%d, %d]", + inode->i_ino, tid, subtid); + + return 1; +} + +static void ext4_journal_fc_cleanup_cb(journal_t *journal) +{ + struct super_block *sb = journal->j_private; + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_inode_info *iter; + struct inode *inode; + + spin_lock(&sbi->s_fc_lock); + while (!list_empty(&sbi->s_fc_q)) { + iter = list_first_entry(&sbi->s_fc_q, + struct ext4_inode_info, i_fc_list); + list_del_init(&iter->i_fc_list); + inode = &iter->vfs_inode; + } + INIT_LIST_HEAD(&sbi->s_fc_q); + sbi->s_fc_q_cnt = 0; + spin_unlock(&sbi->s_fc_lock); + sbi->s_fc_eligible = true; +} + +/* + * Fast-commit commit callback. There is contention between sbi->s_fc_lock and + * i_data_sem. 
Locking order is - i_data_sem then s_fc_lock + */ +static int ext4_journal_fc_commit_cb(journal_t *journal, tid_t tid, + tid_t subtid, + struct transaction_run_stats_s *stats) +{ + struct super_block *sb = journal->j_private; + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct list_head *pos, *tmp; + struct ext4_inode_info *iter; + int num_bufs = 0, ret; + + memset(stats, 0, sizeof(*stats)); + + trace_ext4_journal_fc_commit_cb_start(sb); + sbi = sbi; + spin_lock(&sbi->s_fc_lock); + if (!sbi->s_fc_eligible) { + sbi->s_fc_eligible = true; + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "ineligible"); + return -ECANCELED; + } + + if (unlikely(ext4_forced_shutdown(EXT4_SB(sb)))) { + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "shutdown"); + return -EIO; + } + + stats->rs_flushing = jiffies; + /* Submit data buffers first */ + list_for_each(pos, &sbi->s_fc_q) { + iter = list_entry(pos, struct ext4_inode_info, i_fc_list); + ret = jbd2_submit_inode_data(journal, iter->jinode); + if (ret) { + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_commit_cb_stop(sb, 0, + "data_commit"); + return ret; + } + } + stats->rs_logging = jiffies; + stats->rs_flushing = jbd2_time_diff(stats->rs_flushing, + stats->rs_logging); + + list_for_each_safe(pos, tmp, &sbi->s_fc_q) { + struct inode *inode; + struct buffer_head *bh; + int is_last; + + iter = list_entry(pos, struct ext4_inode_info, i_fc_list); + inode = &iter->vfs_inode; + + is_last = list_is_last(pos, &sbi->s_fc_q); + spin_unlock(&sbi->s_fc_lock); + + ret = jbd2_map_fc_buf(journal, &bh); + if (ret) { + trace_ext4_journal_fc_commit_cb_stop(sb, 0, + "map_fc_buf"); + return -ENOMEM; + } + + /* + * Release s_fc_lock here since fc_write_inode calls + * ext4_map_blocks which needs i_data_sem. 
+ */ + ret = ext4_fc_write_inode(journal, bh, inode, tid, subtid, + is_last, NULL); + if (ret < 0) { + trace_ext4_journal_fc_commit_cb_stop(sb, 0, + "fc_write_inode"); + return ret; + } + lock_buffer(bh); + clear_buffer_dirty(bh); + set_buffer_uptodate(bh); + bh->b_end_io = ext4_end_buffer_io_sync; + submit_bh(REQ_OP_WRITE, REQ_SYNC, bh); + spin_lock(&sbi->s_fc_lock); + + num_bufs++; + } + + stats->rs_logging = jbd2_time_diff(stats->rs_logging, jiffies); + if (num_bufs == 0) { + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "no_data"); + stats->rs_blocks_logged = num_bufs; + return 0; + } + + /* + * Before returning, check if s_fc_eligible was modified since we + * started. + */ + if (!sbi->s_fc_eligible) { + spin_unlock(&sbi->s_fc_lock); + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "ineligible2"); + return -ECANCELED; + } + + if (unlikely(ext4_forced_shutdown(EXT4_SB(sb)))) { + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "shutdown2"); + return -EIO; + } + + spin_unlock(&sbi->s_fc_lock); + + jbd_debug(3, "%s: Journal blocks ready for fast commit\n", __func__); + + stats->rs_blocks_logged = num_bufs; + + trace_ext4_journal_fc_commit_cb_stop(sb, num_bufs, "success"); + + return jbd2_wait_on_fc_bufs(journal, num_bufs); +} + +void ext4_init_fast_commit(struct super_block *sb, journal_t *journal) +{ + if (ext4_should_fast_commit(sb)) { + journal->j_fc_commit_callback = ext4_journal_fc_commit_cb; + journal->j_fc_cleanup_callback = ext4_journal_fc_cleanup_cb; + } +} diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 2cb7e7e1f025..acb9533068c4 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -397,8 +397,14 @@ static inline void ext4_update_inode_fsync_trans(handle_t *handle, if (ext4_handle_valid(handle) && !is_handle_aborted(handle)) { ei->i_sync_tid = handle->h_transaction->t_tid; - if (datasync) + if (ext4_should_fast_commit(inode->i_sb)) + ei->i_sync_subtid = handle->h_transaction->t_subtid; + if (datasync) { 
ei->i_datasync_tid = handle->h_transaction->t_tid; + if (ext4_should_fast_commit(inode->i_sb)) + ei->i_datasync_subtid = + handle->h_transaction->t_subtid; + } } } @@ -470,6 +476,47 @@ static inline int ext4_should_dioread_nolock(struct inode *inode) return 1; } +/* Ext4 fast commit related info */ + +/* Magic of fast commit header */ +#define EXT4_FC_MAGIC 0xE2540090 + +#define EXT4_FC_FL_LAST 0x00000001 + +#define ext4_fc_is_last(__fc_hdr) (((__fc_hdr)->fc_flags) & \ + EXT4_FC_FL_LAST) + +#define ext4_fc_mark_last(__fc_hdr) (((__fc_hdr)->fc_flags) |= \ + EXT4_FC_FL_LAST) + +struct ext4_fc_commit_hdr { + /* Fast commit magic, should be EXT4_FC_MAGIC */ + __le32 fc_magic; + /* Sub transaction ID */ + __le32 fc_subtid; + /* Features used by this fast commit block */ + __u8 fc_features; + /* Flags for this block. */ + __u8 fc_flags; + /* Number of TLVs in this fast commmit block */ + __le16 fc_num_tlvs; + /* Inode number */ + __le32 fc_ino; + /* ext4 inode on disk copy */ + struct ext4_inode inode; + /* Csum(hdr+contents) */ + __le32 fc_csum; +}; + +#define EXT4_FC_TAG_EXT 0x1 /* Extent */ +#define EXT4_FC_TAG_DNAME 0x2 +#define EXT4_FC_TAG_PARENT_INO 0x3 + +struct ext4_fc_tl { + __le16 fc_tag; + __le16 fc_len; +}; + void ext4_init_inode_fc_info(struct inode *inode); extern void ext4_fc_enqueue_inode(handle_t *handle, struct inode *inode); extern void ext4_fc_del(struct inode *inode); @@ -507,4 +554,5 @@ void ext4_fc_update_commit_range(struct inode *inode, ext4_lblk_t start, void ext4_fc_mark_new(struct inode *inode); bool ext4_is_inode_fc_ineligible(struct inode *inode); bool ext4_is_inode_fc_new(struct inode *inode); +void ext4_init_fast_commit(struct super_block *sb, journal_t *journal); #endif /* _EXT4_JBD2_H */ diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index b30f6175eb71..dea4c2632272 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4898,10 +4898,10 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) if (ret) goto 
out; - if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) { - ret = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal, - EXT4_I(inode)->i_sync_tid); - } + if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) + ret = jbd2_fc_complete_commit( + EXT4_SB(inode->i_sb)->s_journal, EXT4_I(inode)->i_sync_tid, + EXT4_I(inode)->i_sync_subtid); out: inode_unlock(inode); trace_ext4_fallocate_exit(inode, offset, max_blocks, ret); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index ea039e3e1a4d..cbfa1ec858a1 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5039,20 +5039,25 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, */ if (journal) { transaction_t *transaction; - tid_t tid; + tid_t tid, subtid; read_lock(&journal->j_state_lock); if (journal->j_running_transaction) transaction = journal->j_running_transaction; else transaction = journal->j_committing_transaction; - if (transaction) + if (transaction) { tid = transaction->t_tid; - else + subtid = transaction->t_subtid; + } else { tid = journal->j_commit_sequence; + subtid = journal->j_fc_sequence; + } read_unlock(&journal->j_state_lock); ei->i_sync_tid = tid; ei->i_datasync_tid = tid; + ei->i_sync_subtid = subtid; + ei->i_datasync_subtid = subtid; } if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) { @@ -5475,8 +5480,9 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc) if (wbc->sync_mode != WB_SYNC_ALL || wbc->for_sync) return 0; - err = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal, - EXT4_I(inode)->i_sync_tid); + err = jbd2_fc_complete_commit( + EXT4_SB(inode->i_sb)->s_journal, EXT4_I(inode)->i_sync_tid, + EXT4_I(inode)->i_sync_subtid); } else { struct ext4_iloc iloc; @@ -5628,6 +5634,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (attr->ia_valid & ATTR_GID) inode->i_gid = attr->ia_gid; error = ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); ext4_journal_stop(handle); 
} @@ -5688,6 +5695,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) inode->i_mtime = current_time(inode); inode->i_ctime = inode->i_mtime; } + ext4_fc_enqueue_inode(handle, inode); down_write(&EXT4_I(inode)->i_data_sem); EXT4_I(inode)->i_disksize = attr->ia_size; rc = ext4_mark_inode_dirty(handle, inode); @@ -5732,6 +5740,8 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (!error) { setattr_copy(inode, attr); + ext4_fc_enqueue_inode(ext4_journal_current_handle(), + inode); mark_inode_dirty(inode); } @@ -6144,6 +6154,7 @@ void ext4_dirty_inode(struct inode *inode, int flags) goto out; ext4_mark_inode_dirty(handle, inode); + ext4_fc_enqueue_inode(handle, inode); ext4_journal_stop(handle); out: @@ -6229,6 +6240,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val) if (IS_ERR(handle)) return PTR_ERR(handle); + ext4_fc_mark_ineligible(inode); err = ext4_mark_inode_dirty(handle, inode); ext4_handle_sync(handle); ext4_journal_stop(handle); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 3e9570ea9748..208c57b5ac80 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1129,6 +1129,16 @@ static void ext4_destroy_inode(struct inode *inode) true); dump_stack(); } + if (!list_empty(&(EXT4_I(inode)->i_fc_list))) { +#ifdef EXT4FS_DEBUG + if (EXT4_SB(inode->i_sb)->s_fc_eligible) { + pr_warn("%s: INODE %ld in FC List with FC allowd", + __func__, inode->i_ino); + dump_stack(); + } +#endif + ext4_fc_del(inode); + } } static void init_once(void *foo) @@ -4713,6 +4723,7 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal) journal->j_commit_interval = sbi->s_commit_interval; journal->j_min_batch_time = sbi->s_min_batch_time; journal->j_max_batch_time = sbi->s_max_batch_time; + ext4_init_fast_commit(sb, journal); write_lock(&journal->j_state_lock); if (test_opt(sb, BARRIER)) diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index d68e9e536814..9c24b1c5239f 100644 --- 
a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2703,6 +2703,45 @@ TRACE_EVENT(ext4_error, __entry->function, __entry->line) ); +TRACE_EVENT(ext4_journal_fc_commit_cb_start, + TP_PROTO(struct super_block *sb), + + TP_ARGS(sb), + + TP_STRUCT__entry( + __field(dev_t, dev) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + ), + + TP_printk("fast_commit started on dev %d,%d", + MAJOR(__entry->dev), MINOR(__entry->dev)) +); + +TRACE_EVENT(ext4_journal_fc_commit_cb_stop, + TP_PROTO(struct super_block *sb, int nblks, const char *reason), + + TP_ARGS(sb, nblks, reason), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, nblks) + __field(const char *, reason) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->nblks = nblks; + __entry->reason = reason; + ), + + TP_printk("fast_commit done on dev %d,%d, nblks %d, reason %s", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->nblks, __entry->reason) +); + #endif /* _TRACE_EXT4_H */ /* This part must be outside protection */ From patchwork Tue Oct 1 07:40:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1169769 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="Ysg7GglA"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46jB7C6Pq5z9sQy for ; Tue, 1 Oct 2019 17:42:15 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id 
S1733124AbfJAHmK (ORCPT ); Tue, 1 Oct 2019 03:42:10 -0400 Received: from mail-pf1-f196.google.com ([209.85.210.196]:47021 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725777AbfJAHmK (ORCPT ); Tue, 1 Oct 2019 03:42:10 -0400 Received: by mail-pf1-f196.google.com with SMTP id q5so7280723pfg.13 for ; Tue, 01 Oct 2019 00:42:09 -0700 (PDT) X-Received: by 2002:a17:90a:ba91:: with SMTP id t17mr4125233pjr.116.1569915728449; Tue, 01 Oct 2019 00:42:08 -0700 (PDT) Received: from harshads0.svl.corp.google.com ([2620:15c:2cd:202:ec1e:207a:e951:9a5b]) by
smtp.googlemail.com with ESMTPSA id q13sm2287668pjq.0.2019.10.01.00.42.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Oct 2019 00:42:08 -0700 (PDT) From: Harshad Shirwadkar To: linux-ext4@vger.kernel.org Cc: Harshad Shirwadkar Subject: [PATCH v3 10/13] ext4: fast-commit recovery path changes Date: Tue, 1 Oct 2019 00:40:59 -0700 Message-Id: <20191001074101.256523-11-harshadshirwadkar@gmail.com> X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com> References: <20191001074101.256523-1-harshadshirwadkar@gmail.com> MIME-Version: 1.0 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org

This patch adds the core fast-commit recovery path. Each fast commit block stores the modified extents and the added dentry for a particular file. The replay code maps the blocks in each such extent to the actual file one by one and updates the corresponding file system metadata to account for the newly mapped blocks. For each newly added dentry, we open the parent inode and add the dentry found in the fast commit block to the parent directory.

To make this possible, ext4_inode_csum_set(), ext4_inode_blocks(), ext4_find_entry(), ext4_add_nondir() and ext4_reset_inode_seed(), which were previously static, are now made visible.
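To make the shape of a fast commit block's payload more concrete, here is a minimal userspace sketch of walking the tag-length-value records that follow the commit header. It assumes the layout introduced earlier in this series (a little-endian 16-bit tag followed by a little-endian 16-bit length, then the value) and the tag values EXT = 0x1, DNAME = 0x2, PARENT_INO = 0x3. The names `fc_walk_tlvs()`, `struct fc_summary`, `get_le16()` and `get_le32()` are hypothetical, and the sketch only counts and inspects records; it does not map extents or create dentries the way the replay code in this patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values mirroring the EXT4_FC_TAG_* definitions in this series. */
#define FC_TAG_EXT        0x1
#define FC_TAG_DNAME      0x2
#define FC_TAG_PARENT_INO 0x3

/* Hypothetical summary of what one block's TLV area contains. */
struct fc_summary {
	unsigned int extents;   /* number of extent records seen */
	unsigned int dname_len; /* length of the dentry name, if any */
	uint32_t parent_ino;    /* parent inode number, if any */
};

/* Read little-endian values without assuming alignment. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk num_tlvs records stored in buf[0..size).
 * Returns 0 on success, -1 if a record would run past the buffer
 * or a fixed-size value has an unexpected length.
 */
static int fc_walk_tlvs(const uint8_t *buf, size_t size,
			unsigned int num_tlvs, struct fc_summary *out)
{
	size_t off = 0;
	unsigned int i;

	assert(buf != NULL && out != NULL);
	out->extents = 0;
	out->dname_len = 0;
	out->parent_ino = 0;

	for (i = 0; i < num_tlvs; i++) {
		uint16_t tag, len;

		if (size - off < 4)      /* no room for tag + length */
			return -1;
		tag = get_le16(buf + off);
		len = get_le16(buf + off + 2);
		off += 4;
		if (size - off < len)    /* value would overrun the buffer */
			return -1;

		switch (tag) {
		case FC_TAG_EXT:
			out->extents++;
			break;
		case FC_TAG_DNAME:
			out->dname_len = len;
			break;
		case FC_TAG_PARENT_INO:
			if (len != 4)
				return -1;
			out->parent_ino = get_le32(buf + off);
			break;
		default:
			break;           /* unknown tags are skipped here */
		}
		off += len;
	}
	return 0;
}
```

Checking the remaining space before reading each header and each value is the main point of the sketch: a replay path reads blocks that may be torn or stale, so every length taken from disk has to be validated against the buffer before it is used.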
Signed-off-by: Harshad Shirwadkar --- fs/ext4/balloc.c | 7 +- fs/ext4/ext4.h | 19 ++ fs/ext4/ext4_jbd2.c | 369 ++++++++++++++++++++++++++++++++++++ fs/ext4/extents.c | 19 +- fs/ext4/ialloc.c | 51 +++-- fs/ext4/inode.c | 13 +- fs/ext4/ioctl.c | 6 +- fs/ext4/mballoc.c | 83 ++++++++ fs/ext4/mballoc.h | 2 + fs/ext4/namei.c | 4 +- include/trace/events/ext4.h | 22 +++ 11 files changed, 560 insertions(+), 35 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 0b202e00d93f..2433f12d2d88 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -360,7 +360,12 @@ static int ext4_validate_block_bitmap(struct super_block *sb, struct buffer_head *bh) { ext4_fsblk_t blk; - struct ext4_group_info *grp = ext4_get_group_info(sb, block_group); + struct ext4_group_info *grp; + + if (EXT4_SB(sb)->s_fc_replay) + return 0; + + grp = ext4_get_group_info(sb, block_group); if (buffer_verified(bh)) return 0; diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index c36ec23046f3..cd5b567d8ca8 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1404,6 +1404,13 @@ struct ext4_super_block { #define ext4_has_strict_mode(sbi) \ (sbi->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL) +struct ext4_fc_replay_state { + int fc_replay_error; + int fc_replay_expected_off; + int fc_replay_expected_tid; + int fc_replay_current_subtid; +}; + /* * fourth extended-fs super-block data in memory */ @@ -1588,6 +1595,7 @@ struct ext4_sb_info { * Are changes after the last commit * eligible for fast commit? 
*/ + struct ext4_fc_replay_state s_fc_replay_state; spinlock_t s_fc_lock; }; @@ -2577,6 +2585,10 @@ extern int ext4_trim_fs(struct super_block *, struct fstrim_range *); extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid); /* inode.c */ +void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, + struct ext4_inode_info *ei); +blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, + struct ext4_inode_info *ei); int ext4_inode_is_fast_symlink(struct inode *inode); struct buffer_head *ext4_getblk(handle_t *, struct inode *, ext4_lblk_t, int); struct buffer_head *ext4_bread(handle_t *, struct inode *, ext4_lblk_t, int); @@ -2660,12 +2672,19 @@ extern int ext4_ind_remove_space(handle_t *handle, struct inode *inode, /* ioctl.c */ extern long ext4_ioctl(struct file *, unsigned int, unsigned long); extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long); +extern void ext4_reset_inode_seed(struct inode *inode); /* migrate.c */ extern int ext4_ext_migrate(struct inode *); extern int ext4_ind_migrate(struct inode *inode); /* namei.c */ +extern struct buffer_head *ext4_find_entry(struct inode *dir, + const struct qstr *d_name, + struct ext4_dir_entry_2 **res_dir, + int *inlined); +extern int ext4_add_nondir(handle_t *handle, + struct dentry *dentry, struct inode *inode); extern int ext4_dirblock_csum_verify(struct inode *inode, struct buffer_head *bh); extern int ext4_orphan_add(handle_t *, struct inode *); diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index fd7740372438..12d6e70bf676 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -5,6 +5,7 @@ #include "ext4_jbd2.h" #include "ext4_extents.h" +#include "mballoc.h" #include @@ -517,6 +518,16 @@ static inline u8 *fc_add_tag(u8 *dst, u16 tag, u16 len, u8 *val) return dst + sizeof(tl) + len; } +static int fc_tag_len(struct ext4_fc_tl *tl) +{ + return le16_to_cpu(tl->fc_len); +} + +static u8 *fc_tag_val(struct ext4_fc_tl *tl) +{ + return (u8 *)tl + 
sizeof(*tl); +} + int ext4_fc_write_inode(journal_t *journal, struct buffer_head *bh, struct inode *inode, tid_t tid, tid_t subtid, int is_last, struct dentry *dentry) @@ -782,10 +793,368 @@ static int ext4_journal_fc_commit_cb(journal_t *journal, tid_t tid, return jbd2_wait_on_fc_bufs(journal, num_bufs); } +int ext4_fc_create_inode(struct super_block *sb, struct ext4_inode *raw_inode, + int ino, unsigned long parent, const char *dname, + int dlen) +{ + struct inode *dir = NULL, *inode = NULL; + struct dentry *dentry_dir = NULL, *dentry_inode = NULL; + struct qstr qstr_dname = QSTR_INIT(dname, dlen); + struct ext4_dir_entry_2 *res_dir = NULL; + struct buffer_head *dirent_bh; + int ret = 0, inlined; + + inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL); + if (!IS_ERR(inode)) { + jbd_debug(1, "Inode %d already exists.", inode->i_ino); + iput(inode); + return PTR_ERR(inode); + } + + dir = ext4_iget(sb, parent, EXT4_IGET_NORMAL); + if (IS_ERR(dir)) { + jbd_debug(1, "Dir with inode %d not found.", parent); + ret = PTR_ERR(inode); + goto out; + } + + dentry_dir = d_obtain_alias(dir); + if (IS_ERR(dentry_dir)) { + jbd_debug(1, "Failed to obtain dentry"); + ret = PTR_ERR(dentry_dir); + goto out; + } + + dentry_inode = d_alloc(dentry_dir, &qstr_dname); + if (!dentry_inode) { + jbd_debug(1, "Inode dentry not created."); + ret = -ENOMEM; + goto out; + } + + inode = ext4_new_inode(NULL, dir, le16_to_cpu(raw_inode->i_mode), NULL, + ino, NULL, le32_to_cpu(raw_inode->i_flags)); + if (IS_ERR(inode)) { + jbd_debug(1, "Failed to create a new inode."); + ret = PTR_ERR(inode); + goto out; + } + + dirent_bh = ext4_find_entry(dir, &qstr_dname, &res_dir, &inlined); + if (!dirent_bh || IS_ERR(dirent_bh)) { + ret = ext4_add_nondir(NULL, dentry_inode, inode); + if (ret != 0) { + jbd_debug(1, "Failed to add dentry\n"); + goto out; + } + } else { + if (le32_to_cpu(res_dir->inode) != inode->i_ino) { + jbd_debug(1, "Entry exists and mismatched inode nos."); + brelse(dirent_bh); + ret = -EEXIST; + 
goto out; + } + brelse(dirent_bh); + } + + ext4_mark_inode_dirty(NULL, dir); + +out: + if (dentry_dir) { + d_drop(dentry_dir); + dput(dentry_dir); + } else if (dir) { + iput(dir); + } + if (dentry_inode) { + d_drop(dentry_inode); + dput(dentry_inode); + } + + return 0; +} + +static int ext4_journal_fc_replay_scan(struct super_block *sb, + struct buffer_head *bh, int off) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_fc_replay_state *state; + struct ext4_fc_commit_hdr *fc_hdr; + struct ext4_fc_tl *tl; + __u32 csum, dummy_csum = 0; + __u8 *start; + tid_t fc_subtid; + int i; + + state = &sbi->s_fc_replay_state; + fc_hdr = (struct ext4_fc_commit_hdr *) + ((__u8 *)bh->b_data + sizeof(journal_header_t)); + + fc_subtid = le32_to_cpu(fc_hdr->fc_subtid); + + if (le32_to_cpu(fc_hdr->fc_magic) != EXT4_FC_MAGIC) { + state->fc_replay_error = -ENOENT; + goto out_err; + } + + if (off != state->fc_replay_expected_off) { + state->fc_replay_error = -EFSCORRUPTED; + goto out_err; + } + + if (le16_to_cpu(fc_hdr->fc_features)) { + state->fc_replay_error = -EOPNOTSUPP; + goto out_err; + } + + /* Check if we already concluded that this fast commit is not useful */ + if (state->fc_replay_error && state->fc_replay_error != -EPROTO) + goto out_err; + + if (state->fc_replay_expected_off == 0) { + /* This is a first block */ + state->fc_replay_current_subtid = fc_subtid; + /* + * We set replay error by default until we find an end + * block for a particular subtid + */ + state->fc_replay_error = -EPROTO; + } + + if (state->fc_replay_error == 0) { + /* + * We have already encountered _last_ block for previous + * subtid. So we should only find a bigger subtid here. 
+ */ + if (fc_subtid <= state->fc_replay_current_subtid) { + state->fc_replay_error = -EFSCORRUPTED; + goto out_err; + } + state->fc_replay_current_subtid = fc_subtid; + state->fc_replay_error = -EPROTO; + } else if (state->fc_replay_current_subtid != fc_subtid) { + /* + * Different subtid found before we found the end of this + * subtid. + */ + state->fc_replay_error = -EFSCORRUPTED; + goto out_err; + } + + /* + * We can replay fast commit blocks only if we find a _last_ block for + * all subtids. + */ + if (ext4_fc_is_last(fc_hdr)) + state->fc_replay_error = 0; + + csum = ext4_chksum(sbi, 0, fc_hdr, + offsetof(struct ext4_fc_commit_hdr, fc_csum)); + csum = ext4_chksum(sbi, csum, &dummy_csum, sizeof(dummy_csum)); + + tl = (struct ext4_fc_tl *)(fc_hdr + 1); + start = (__u8 *)tl; + for (i = 0; i < le16_to_cpu(fc_hdr->fc_num_tlvs); i++) { + switch (le16_to_cpu(tl->fc_tag)) { + case EXT4_FC_TAG_PARENT_INO: + case EXT4_FC_TAG_DNAME: + case EXT4_FC_TAG_EXT: + break; + default: + goto out_err; + } + tl = (struct ext4_fc_tl *)((__u8 *)tl + + le16_to_cpu(tl->fc_len) + + sizeof(*tl)); + } + csum = ext4_chksum(sbi, csum, start, (__u8 *)tl - start); + if (csum != le32_to_cpu(fc_hdr->fc_csum)) { + state->fc_replay_error = -EFSBADCRC; + goto out_err; + } + + state->fc_replay_expected_off++; + return 0; + +out_err: + trace_ext4_journal_fc_replay_scan(sb, off, state->fc_replay_error); + return state->fc_replay_error; +} + +static void ext4_fc_add_block(struct inode *inode, ext4_lblk_t lblk, + ext4_fsblk_t pblk, int unwritten) +{ + struct ext4_extent ex; + struct ext4_ext_path *path = NULL; + struct ext4_map_blocks map; + int ret; + + map.m_lblk = lblk; + map.m_len = 0x1; + ret = ext4_map_blocks(NULL, inode, &map, 0); + if (ret > 0) { + if (pblk != map.m_pblk) + jbd_debug(1, "Bad mapping found while replaying fc\n"); + return; + } + + ex.ee_block = cpu_to_le32(lblk); + ext4_ext_store_pblock(&ex, pblk); + ex.ee_len = cpu_to_le16(0x1); + if (unwritten) + 
ext4_ext_mark_unwritten(&ex); + + path = ext4_find_extent(inode, lblk, NULL, 0); + if (path) { + down_write(&EXT4_I(inode)->i_data_sem); + ret = ext4_ext_insert_extent(NULL, inode, &path, &ex, 0); + ext4_mb_mark_used(inode->i_sb, ext4_ext_pblock(&ex), 0x1); + up_write((&EXT4_I(inode)->i_data_sem)); + kfree(path); + } +} + +static int ext4_journal_fc_replay_cb(journal_t *journal, struct buffer_head *bh, + enum passtype pass, int off) +{ + struct super_block *sb = journal->j_private; + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_fc_commit_hdr *fc_hdr; + struct ext4_fc_tl *tl; + struct ext4_iloc iloc; + struct ext4_extent *ex; + struct inode *inode; + char *dname = NULL; + int dname_len = 0; + int parent_ino = -1; + int i, j, ret; + + if (pass == PASS_SCAN) + return ext4_journal_fc_replay_scan(sb, bh, off); + + if (sbi->s_fc_replay_state.fc_replay_error) { + jbd_debug(1, "FC replay error set = %d\n", + sbi->s_fc_replay_state.fc_replay_error); + return sbi->s_fc_replay_state.fc_replay_error; + } + + sbi->s_fc_replay = true; + fc_hdr = (struct ext4_fc_commit_hdr *) + ((__u8 *)bh->b_data + sizeof(journal_header_t)); + + jbd_debug(3, "%s: Got FC block for inode %d at [%d,%d]", __func__, + le32_to_cpu(fc_hdr->fc_ino), + be32_to_cpu(((journal_header_t *)bh->b_data)->h_sequence), + le32_to_cpu(fc_hdr->fc_subtid)); + + tl = (struct ext4_fc_tl *)(fc_hdr + 1); + if (le16_to_cpu(fc_hdr->fc_num_tlvs) >= 2) { + for (i = 0; i < 2; i++) { + switch (le16_to_cpu(tl->fc_tag)) { + case EXT4_FC_TAG_DNAME: + dname = fc_tag_val(tl); + dname_len = fc_tag_len(tl); + break; + case EXT4_FC_TAG_PARENT_INO: + parent_ino = le32_to_cpu( + *(__le32 *)fc_tag_val(tl)); + break; + } + tl = (struct ext4_fc_tl *)(fc_tag_val(tl) + + fc_tag_len(tl)); + } + } + + if (parent_ino && dname) { + ret = ext4_fc_create_inode(sb, &fc_hdr->inode, + le32_to_cpu(fc_hdr->fc_ino), parent_ino, + dname, dname_len); + if (ret) { + jbd_debug(1, "Failed to create ext4 inode."); + return ret; + } + } + + inode = 
ext4_iget(sb, le32_to_cpu(fc_hdr->fc_ino), EXT4_IGET_NORMAL); + if (IS_ERR(inode)) + return 0; + + ret = ext4_get_inode_loc(inode, &iloc); + if (ret) + return ret; + + inode_lock(inode); + tl = (struct ext4_fc_tl *)(fc_hdr + 1); + for (i = 0; i < le16_to_cpu(fc_hdr->fc_num_tlvs); i++) { + switch (le16_to_cpu(tl->fc_tag)) { + case EXT4_FC_TAG_EXT: + ex = (struct ext4_extent *)(tl + 1); + /* + * We add block by block because part of extent may + * already have been added by a previous fast commit + * replay. + */ + for (j = 0; j < ext4_ext_get_actual_len(ex); j++) + ext4_fc_add_block(inode, + le32_to_cpu(ex->ee_block) + j, + ext4_ext_pblock(ex) + j, + ext4_ext_is_unwritten(ex)); + break; + case EXT4_FC_TAG_PARENT_INO: + case EXT4_FC_TAG_DNAME: + break; + default: + jbd_debug(1, "Unknown tag found.\n"); + } + tl = (struct ext4_fc_tl *)((__u8 *)tl + + le16_to_cpu(tl->fc_len) + + sizeof(*tl)); + } + ext4_reserve_inode_write(NULL, inode, &iloc); + inode_unlock(inode); + + /* + * Unless inode contains inline data, copy everything except + * i_blocks. i_blocks would have been set alright by ext4_fc_add_block + * call above. 
+ */ + if (ext4_has_inline_data(inode)) { + memcpy(ext4_raw_inode(&iloc), &fc_hdr->inode, + sizeof(struct ext4_inode)); + } else { + memcpy(ext4_raw_inode(&iloc), &fc_hdr->inode, + offsetof(struct ext4_inode, i_block)); + memcpy(&ext4_raw_inode(&iloc)->i_generation, + &fc_hdr->inode.i_generation, + sizeof(struct ext4_inode) - + offsetof(struct ext4_inode, i_generation)); + } + inode->i_generation = le32_to_cpu(ext4_raw_inode(&iloc)->i_generation); + ext4_reset_inode_seed(inode); + + ext4_inode_csum_set(inode, ext4_raw_inode(&iloc), EXT4_I(inode)); + ret = ext4_handle_dirty_metadata(NULL, inode, iloc.bh); + brelse(iloc.bh); + iput(inode); + if (!ret) + ret = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL); + + sbi->s_fc_replay = false; + + return ret; +} + void ext4_init_fast_commit(struct super_block *sb, journal_t *journal) { if (ext4_should_fast_commit(sb)) { journal->j_fc_commit_callback = ext4_journal_fc_commit_cb; journal->j_fc_cleanup_callback = ext4_journal_fc_cleanup_cb; } + + /* + * We set the replay callback even if fast commit is disabled because we + * could still have fast commit blocks that need to be replayed even if + * fast commit has now been turned off. 
+ */ + journal->j_fc_replay_callback = ext4_journal_fc_replay_cb; } diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index dea4c2632272..d70c09cbbc3f 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2893,7 +2893,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, int depth = ext_depth(inode); struct ext4_ext_path *path = NULL; struct partial_cluster partial; - handle_t *handle; + handle_t *handle = NULL; int i = 0, err = 0; partial.pclu = 0; @@ -2903,9 +2903,11 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, ext_debug("truncate since %u to %u\n", start, end); /* probably first extent we're gonna free will be last in block */ - handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, depth + 1); - if (IS_ERR(handle)) - return PTR_ERR(handle); + if (!sbi->s_fc_replay) { + handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, depth + 1); + if (IS_ERR(handle)) + return PTR_ERR(handle); + } again: trace_ext4_ext_remove_space(inode, start, end, depth); @@ -2925,7 +2927,8 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, /* find extent for or closest extent to this block */ path = ext4_find_extent(inode, end, NULL, EXT4_EX_NOCACHE); if (IS_ERR(path)) { - ext4_journal_stop(handle); + if (!sbi->s_fc_replay) + ext4_journal_stop(handle); return PTR_ERR(path); } depth = ext_depth(inode); @@ -3011,7 +3014,8 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, path = kcalloc(depth + 1, sizeof(struct ext4_ext_path), GFP_NOFS); if (path == NULL) { - ext4_journal_stop(handle); + if (!sbi->s_fc_replay) + ext4_journal_stop(handle); return -ENOMEM; } path[0].p_maxdepth = path[0].p_depth = depth; @@ -3141,7 +3145,8 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, path = NULL; if (err == -EAGAIN) goto again; - ext4_journal_stop(handle); + if (!sbi->s_fc_replay) + ext4_journal_stop(handle); return err; } diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 47d04a33a3ca..d32dea0757fe 
100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -82,7 +82,12 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, struct buffer_head *bh) { ext4_fsblk_t blk; - struct ext4_group_info *grp = ext4_get_group_info(sb, block_group); + struct ext4_group_info *grp; + + if (EXT4_SB(sb)->s_fc_replay) + return 0; + + grp = ext4_get_group_info(sb, block_group); if (buffer_verified(bh)) return 0; @@ -287,15 +292,17 @@ void ext4_free_inode(handle_t *handle, struct inode *inode) bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb); bitmap_bh = ext4_read_inode_bitmap(sb, block_group); /* Don't bother if the inode bitmap is corrupt. */ - grp = ext4_get_group_info(sb, block_group); if (IS_ERR(bitmap_bh)) { fatal = PTR_ERR(bitmap_bh); bitmap_bh = NULL; goto error_return; } - if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) { - fatal = -EFSCORRUPTED; - goto error_return; + if (!sbi->s_fc_replay) { + grp = ext4_get_group_info(sb, block_group); + if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) { + fatal = -EFSCORRUPTED; + goto error_return; + } } BUFFER_TRACE(bitmap_bh, "get_write_access"); @@ -758,7 +765,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, struct inode *ret; ext4_group_t i; ext4_group_t flex_group; - struct ext4_group_info *grp; + struct ext4_group_info *grp = NULL; int encrypt = 0; /* Cannot create files in a deleted directory */ @@ -896,15 +903,20 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, if (ext4_free_inodes_count(sb, gdp) == 0) goto next_group; - grp = ext4_get_group_info(sb, group); - /* Skip groups with already-known suspicious inode tables */ - if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) - goto next_group; + if (!sbi->s_fc_replay) { + grp = ext4_get_group_info(sb, group); + /* + * Skip groups with already-known suspicious inode + * tables + */ + if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) + goto next_group; + } brelse(inode_bitmap_bh); inode_bitmap_bh = ext4_read_inode_bitmap(sb, group); /* Skip groups with 
suspicious inode tables */ - if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp) || + if ((!sbi->s_fc_replay && EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) || IS_ERR(inode_bitmap_bh)) { inode_bitmap_bh = NULL; goto next_group; @@ -923,7 +935,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, goto next_group; } - if (!handle) { + if (!sbi->s_fc_replay && !handle) { BUG_ON(nblocks <= 0); handle = __ext4_journal_start_sb(dir->i_sb, line_no, handle_type, nblocks, @@ -1027,9 +1039,15 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, /* Update the relevant bg descriptor fields */ if (ext4_has_group_desc_csum(sb)) { int free; - struct ext4_group_info *grp = ext4_get_group_info(sb, group); - - down_read(&grp->alloc_sem); /* protect vs itable lazyinit */ + struct ext4_group_info *grp = NULL; + + if (!sbi->s_fc_replay) { + grp = ext4_get_group_info(sb, group); + down_read(&grp->alloc_sem); /* + * protect vs itable + * lazyinit + */ + } ext4_lock_group(sb, group); /* while we modify the bg desc */ free = EXT4_INODES_PER_GROUP(sb) - ext4_itable_unused_count(sb, gdp); @@ -1045,7 +1063,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, if (ino > free) ext4_itable_unused_set(sb, gdp, (EXT4_INODES_PER_GROUP(sb) - ino)); - up_read(&grp->alloc_sem); + if (!sbi->s_fc_replay) + up_read(&grp->alloc_sem); } else { ext4_lock_group(sb, group); } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index cbfa1ec858a1..9e5d8a82556f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -103,8 +103,8 @@ static int ext4_inode_csum_verify(struct inode *inode, struct ext4_inode *raw, return provided == calculated; } -static void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, - struct ext4_inode_info *ei) +void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw, + struct ext4_inode_info *ei) { __u32 csum; @@ -4800,8 +4800,8 @@ void ext4_set_inode_flags(struct inode *inode) S_ENCRYPTED|S_CASEFOLD); } -static blkcnt_t 
ext4_inode_blocks(struct ext4_inode *raw_inode, - struct ext4_inode_info *ei) +blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, + struct ext4_inode_info *ei) { blkcnt_t i_blocks ; struct inode *inode = &(ei->vfs_inode); @@ -4951,8 +4951,9 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, } if (!ext4_inode_csum_verify(inode, raw_inode, ei)) { - ext4_error_inode(inode, function, line, 0, - "iget: checksum invalid"); + if (!EXT4_SB(sb)->s_fc_replay) + ext4_error_inode(inode, function, line, 0, + "iget: checksum invalid"); ret = -EFSBADCRC; goto bad_inode; } diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index a8e23acb5c03..35019e9d2803 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -86,7 +86,7 @@ static void swap_inode_data(struct inode *inode1, struct inode *inode2) i_size_write(inode2, isize); } -static void reset_inode_seed(struct inode *inode) +void ext4_reset_inode_seed(struct inode *inode) { struct ext4_inode_info *ei = EXT4_I(inode); struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); @@ -199,8 +199,8 @@ static long swap_inode_boot_loader(struct super_block *sb, inode->i_generation = prandom_u32(); inode_bl->i_generation = prandom_u32(); - reset_inode_seed(inode); - reset_inode_seed(inode_bl); + ext4_reset_inode_seed(inode); + ext4_reset_inode_seed(inode_bl); ext4_discard_preallocations(inode); diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index a3e2767bdf2f..70551fa91237 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2915,6 +2915,89 @@ void ext4_exit_mballoc(void) } +void ext4_mb_mark_used(struct super_block *sb, ext4_fsblk_t block, + int len) +{ + struct buffer_head *bitmap_bh = NULL; + struct ext4_group_desc *gdp; + struct buffer_head *gdp_bh; + struct ext4_sb_info *sbi = EXT4_SB(sb); + ext4_group_t group; + ext4_fsblk_t cluster; + ext4_grpblk_t blkoff; + int i, clen, err; + int already_allocated_count; + + cluster = EXT4_B2C(sbi, block); + clen = EXT4_B2C(sbi, len); + + ext4_get_group_no_and_offset(sb, 
block, &group, &blkoff); + bitmap_bh = ext4_read_block_bitmap(sb, group); + if (IS_ERR(bitmap_bh)) { + err = PTR_ERR(bitmap_bh); + bitmap_bh = NULL; + goto out_err; + } + + err = -EIO; + gdp = ext4_get_group_desc(sb, group, &gdp_bh); + if (!gdp) + goto out_err; + + if (!ext4_data_block_valid(sbi, block, len)) { + ext4_error(sb, "Allocating blks %llu-%llu which overlap mdata", + cluster, cluster+clen); + /* File system mounted not to panic on error + * Fix the bitmap and return EFSCORRUPTED + * We leak some of the blocks here. + */ + ext4_lock_group(sb, group); + ext4_set_bits(bitmap_bh->b_data, blkoff, clen); + ext4_unlock_group(sb, group); + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); + if (!err) + err = -EFSCORRUPTED; + goto out_err; + } + + ext4_lock_group(sb, group); + already_allocated_count = 0; + for (i = 0; i < clen; i++) + if (mb_test_bit(blkoff + i, bitmap_bh->b_data)) + already_allocated_count++; + + ext4_set_bits(bitmap_bh->b_data, blkoff, clen); + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); + ext4_free_group_clusters_set(sb, gdp, + ext4_free_clusters_after_init(sb, + group, gdp)); + } + clen = ext4_free_group_clusters(sb, gdp) - clen + + already_allocated_count; + ext4_free_group_clusters_set(sb, gdp, clen); + ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh); + ext4_group_desc_csum_set(sb, group, gdp); + + ext4_unlock_group(sb, group); + + if (sbi->s_log_groups_per_flex) { + ext4_group_t flex_group = ext4_flex_group(sbi, group); + + atomic64_sub(len, + &sbi->s_flex_groups[flex_group].free_clusters); + } + + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); + if (err) + goto out_err; + err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh); + +out_err: + brelse(bitmap_bh); +} + /* * Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps * Returns 0 if success or error code diff --git a/fs/ext4/mballoc.h 
b/fs/ext4/mballoc.h index 88c98f17e3d9..1881710041b6 100644 --- a/fs/ext4/mballoc.h +++ b/fs/ext4/mballoc.h @@ -215,4 +215,6 @@ ext4_mballoc_query_range( ext4_mballoc_query_range_fn formatter, void *priv); +void ext4_mb_mark_used(struct super_block *sb, ext4_fsblk_t block, + int len); #endif diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 8b73c5a38d49..0f0b6a64b3b1 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1578,7 +1578,7 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir, return ret; } -static struct buffer_head *ext4_find_entry(struct inode *dir, +struct buffer_head *ext4_find_entry(struct inode *dir, const struct qstr *d_name, struct ext4_dir_entry_2 **res_dir, int *inlined) @@ -2549,7 +2549,7 @@ static void ext4_dec_count(handle_t *handle, struct inode *inode) } -static int ext4_add_nondir(handle_t *handle, +int ext4_add_nondir(handle_t *handle, struct dentry *dentry, struct inode *inode) { int err = ext4_add_entry(handle, dentry, inode); diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h index 9c24b1c5239f..59329d69d0fc 100644 --- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -2703,6 +2703,28 @@ TRACE_EVENT(ext4_error, __entry->function, __entry->line) ); +TRACE_EVENT(ext4_journal_fc_replay_scan, + TP_PROTO(struct super_block *sb, int error, int off), + + TP_ARGS(sb, error, off), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(int, error) + __field(int, off) + ), + + TP_fast_assign( + __entry->dev = sb->s_dev; + __entry->error = error; + __entry->off = off; + ), + + TP_printk("FC scan pass on dev %d,%d: error %d, off %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->error, __entry->off) +); + TRACE_EVENT(ext4_journal_fc_commit_cb_start, TP_PROTO(struct super_block *sb), From patchwork Tue Oct 1 07:41:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: harshad shirwadkar X-Patchwork-Id: 1169771 
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v3 11/13] ext4: add support for asynchronous fast commits
Date: Tue, 1 Oct 2019 00:41:00 -0700
Message-Id: <20191001074101.256523-12-harshadshirwadkar@gmail.com>
In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com>
References: <20191001074101.256523-1-harshadshirwadkar@gmail.com>
X-Mailing-List: linux-ext4@vger.kernel.org

Until this patch, fast commits could only be invoked by the jbd2 thread.
This patch allows the file system to perform a fast commit asynchronously,
without involving the jbd2 thread. This makes fast commits even faster, as
it eliminates the time spent context switching to the jbd2 thread. To avoid
races between the jbd2 thread and async fast commits, we add new jbd2 APIs
that allow file systems to indicate their intent to perform an async fast
commit.
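
The handshake described above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the jbd2 code: a pthread mutex and condition variable stand in for j_state_lock and the j_wait_async_fc wait queue, and the names fc_txn, fc_txn_init, fc_start_async, fc_stop_async, fc_begin_full_commit and fc_sketch_demo are hypothetical. As a simplification, the stop helper wakes waiters itself, whereas in the patch the wake-up is issued by the ext4 caller.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the running transaction and its wait queue. */
struct fc_txn {
	pthread_mutex_t lock;   /* plays the role of j_state_lock */
	pthread_cond_t wait;    /* plays the role of j_wait_async_fc */
	bool async_fc_allowed;  /* cleared once a full commit begins */
	bool async_fc_ongoing;  /* set while one async fast commit runs */
	unsigned int subtid;    /* bumped after every async fast commit */
};

static void fc_txn_init(struct fc_txn *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->wait, NULL);
	t->async_fc_allowed = true;
	t->async_fc_ongoing = false;
	t->subtid = 0;
}

/* File system side: announce the intent to run an async fast commit. */
static int fc_start_async(struct fc_txn *t)
{
	int ret;

	pthread_mutex_lock(&t->lock);
	for (;;) {
		if (!t->async_fc_allowed) {
			ret = -ECANCELED;  /* a full commit has taken over */
			break;
		}
		if (!t->async_fc_ongoing) {
			t->async_fc_ongoing = true;
			ret = 0;
			break;
		}
		/* Another async fast commit is running; wait for it. */
		pthread_cond_wait(&t->wait, &t->lock);
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* File system side: the async fast commit has been submitted. */
static void fc_stop_async(struct fc_txn *t)
{
	pthread_mutex_lock(&t->lock);
	t->async_fc_ongoing = false;
	t->subtid++;
	pthread_cond_broadcast(&t->wait);
	pthread_mutex_unlock(&t->lock);
}

/* Commit thread side: forbid new async commits, drain the running one. */
static void fc_begin_full_commit(struct fc_txn *t)
{
	pthread_mutex_lock(&t->lock);
	t->async_fc_allowed = false;
	while (t->async_fc_ongoing)
		pthread_cond_wait(&t->wait, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

static void *fc_committer(void *arg)
{
	fc_begin_full_commit(arg);
	return NULL;
}

/* Returns 0 when the model behaves as described in the lead-in. */
static int fc_sketch_demo(void)
{
	struct fc_txn t;
	pthread_t th;

	fc_txn_init(&t);
	if (fc_start_async(&t) != 0)
		return 1;
	/* The committer blocks until the async fast commit is stopped. */
	if (pthread_create(&th, NULL, fc_committer, &t) != 0)
		return 2;
	fc_stop_async(&t);
	pthread_join(th, NULL);
	/* Once the full commit has begun, async fast commits are refused. */
	if (fc_start_async(&t) != -ECANCELED)
		return 3;
	return t.subtid == 1 ? 0 : 4;
}
```

Calling fc_sketch_demo() from a test program exercises both outcomes: an async fast commit that starts before the full commit is allowed to finish, and one that arrives afterwards is turned away so the caller can fall back to waiting for the full commit.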
Signed-off-by: Harshad Shirwadkar --- fs/ext4/ext4.h | 3 ++ fs/ext4/ext4_jbd2.c | 74 +++++++++++++++++++++++++++++++++++++++++++ fs/ext4/fsync.c | 7 ++-- fs/jbd2/commit.c | 11 +++++++ fs/jbd2/journal.c | 59 ++++++++++++++++++++++++++++++++++ fs/jbd2/transaction.c | 2 ++ include/linux/jbd2.h | 10 ++++++ 7 files changed, 164 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index cd5b567d8ca8..a8a481c5ffa4 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2716,6 +2716,9 @@ extern int ext4_group_extend(struct super_block *sb, extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count); /* super.c */ +int ext4_fc_async_commit(journal_t *journal, tid_t commit_tid, + tid_t commit_subtid, struct inode *inode, + struct dentry *dentry); extern struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block, int op_flags); extern int ext4_seq_options_show(struct seq_file *seq, void *offset); diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 12d6e70bf676..cf796268322b 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -1144,6 +1144,80 @@ static int ext4_journal_fc_replay_cb(journal_t *journal, struct buffer_head *bh, return ret; } +int ext4_fc_async_commit(journal_t *journal, tid_t commit_tid, + tid_t commit_subtid, struct inode *inode, + struct dentry *dentry) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + struct super_block *sb = inode->i_sb; + struct buffer_head *bh; + int ret; + + if (!ext4_should_fast_commit(sb)) + return jbd2_complete_transaction(journal, commit_tid); + + read_lock(&ei->i_fc.fc_lock); + if (ei->i_fc.fc_tid != commit_tid) { + read_unlock(&ei->i_fc.fc_lock); + return 0; + } + read_unlock(&ei->i_fc.fc_lock); + + if (ext4_is_inode_fc_ineligible(inode)) + return jbd2_complete_transaction(journal, commit_tid); + + if (jbd2_commit_check(journal, commit_tid, commit_subtid)) + return 0; + + ret = jbd2_start_async_fc(journal, commit_tid); + if (ret) + return 
jbd2_fc_complete_commit(journal, commit_tid, + commit_subtid); + + trace_ext4_journal_fc_commit_cb_start(sb); + + ret = jbd2_submit_inode_data(journal, ei->jinode); + if (ret) + goto out; + + ret = jbd2_map_fc_buf(journal, &bh); + if (ret) { + jbd2_stop_async_fc(journal, commit_tid); + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "map_fc_buf"); + return jbd2_complete_transaction(journal, commit_tid); + + } + + ret = ext4_fc_write_inode(journal, bh, inode, commit_tid, + commit_subtid, 1, dentry); + + if (ret < 0) { + brelse(bh); + jbd2_stop_async_fc(journal, commit_tid); + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "fc_write_inode"); + return jbd2_complete_transaction(journal, commit_tid); + } + lock_buffer(bh); + clear_buffer_dirty(bh); + set_buffer_uptodate(bh); + bh->b_end_io = ext4_end_buffer_io_sync; + submit_bh(REQ_OP_WRITE, REQ_SYNC, bh); + + jbd2_stop_async_fc(journal, commit_tid); + wait_on_buffer(bh); + if (unlikely(!buffer_uptodate(bh))) { + trace_ext4_journal_fc_commit_cb_stop(sb, 0, "IO"); + return -EIO; + } + +out: + trace_ext4_journal_fc_commit_cb_stop(sb, + ret < 0 ? 0 : ret, + ret >= 0 ? "success" : "fail"); + wake_up(&journal->j_wait_async_fc); + return ret; +} + void ext4_init_fast_commit(struct super_block *sb, journal_t *journal) { if (ext4_should_fast_commit(sb)) { diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index 5508baa11bb6..5bbfc55e1756 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -98,7 +98,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) struct ext4_inode_info *ei = EXT4_I(inode); journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; int ret = 0, err; - tid_t commit_tid; + tid_t commit_tid, commit_subtid; bool needs_barrier = false; if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) @@ -148,10 +148,13 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync) } commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; + commit_subtid = datasync ? 
ei->i_datasync_subtid : ei->i_sync_subtid;
+
 	if (journal->j_flags & JBD2_BARRIER &&
 	    !jbd2_trans_will_send_data_barrier(journal, commit_tid))
 		needs_barrier = true;
-	ret = jbd2_complete_transaction(journal, commit_tid);
+	ret = ext4_fc_async_commit(journal, commit_tid, commit_subtid,
+				   inode, file->f_path.dentry);
 	if (needs_barrier) {
 	issue_flush:
 		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index e85f51e1cc70..18cb70fa2421 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -452,6 +452,17 @@ void jbd2_journal_commit_transaction(journal_t *journal, bool *fc)
 
 	write_lock(&journal->j_state_lock);
 	full_commit = journal->j_do_full_commit;
+	journal->j_running_transaction->t_async_fc_allowed = false;
+	while (journal->j_running_transaction->t_async_fc_ongoing) {
+		DEFINE_WAIT(wait);
+
+		prepare_to_wait(&journal->j_wait_async_fc, &wait,
+				TASK_UNINTERRUPTIBLE);
+		write_unlock(&journal->j_state_lock);
+		schedule();
+		write_lock(&journal->j_state_lock);
+		finish_wait(&journal->j_wait_async_fc, &wait);
+	}
 	write_unlock(&journal->j_state_lock);
 
 	/* Let file-system try its own fast commit */
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e0684212384d..81daa2cff67f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -794,6 +794,64 @@ int jbd2_commit_check(journal_t *journal, tid_t tid, tid_t subtid)
 	return 0;
 }
 
+int jbd2_start_async_fc(journal_t *journal, tid_t tid)
+{
+	transaction_t *txn;
+	int ret = -EINVAL;
+
+	if (!journal->j_running_transaction)
+		return ret;
+
+	if (journal->j_running_transaction->t_tid != tid)
+		return ret;
+
+	txn = journal->j_running_transaction;
+	write_lock(&journal->j_state_lock);
+	while (txn->t_state == T_RUNNING) {
+		DEFINE_WAIT(wait);
+
+		if (txn->t_async_fc_allowed) {
+			if (!txn->t_async_fc_ongoing) {
+				txn->t_async_fc_ongoing = true;
+				ret = 0;
+				break;
+			}
+			prepare_to_wait(&journal->j_wait_async_fc,
+					&wait, TASK_UNINTERRUPTIBLE);
+			write_unlock(&journal->j_state_lock);
+			schedule();
+			write_lock(&journal->j_state_lock);
+			finish_wait(&journal->j_wait_async_fc, &wait);
+		} else {
+			ret = -ECANCELED;
+			break;
+		}
+	}
+	write_unlock(&journal->j_state_lock);
+
+	return ret;
+}
+
+int jbd2_stop_async_fc(journal_t *journal, tid_t tid)
+{
+	transaction_t *txn;
+
+	if (!journal->j_running_transaction)
+		return -EINVAL;
+
+	if (journal->j_running_transaction->t_tid != tid)
+		return -EINVAL;
+
+	txn = journal->j_running_transaction;
+	write_lock(&journal->j_state_lock);
+	J_ASSERT(txn->t_state == T_RUNNING);
+	txn->t_async_fc_ongoing = false;
+	txn->t_subtid++;
+	write_unlock(&journal->j_state_lock);
+	return 0;
+
+}
+
 /* Return 1 when transaction with given tid has already committed. */
 int jbd2_transaction_committed(journal_t *journal, tid_t tid)
 {
@@ -1308,6 +1366,7 @@ static journal_t *journal_init_common(struct block_device *bdev,
 	init_waitqueue_head(&journal->j_wait_commit);
 	init_waitqueue_head(&journal->j_wait_updates);
 	init_waitqueue_head(&journal->j_wait_reserved);
+	init_waitqueue_head(&journal->j_wait_async_fc);
 	mutex_init(&journal->j_barrier);
 	mutex_init(&journal->j_checkpoint_mutex);
 	spin_lock_init(&journal->j_revoke_lock);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index ce7f03cfd90b..f17f813b5610 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -103,6 +103,8 @@ static void jbd2_get_transaction(journal_t *journal,
 	transaction->t_max_wait = 0;
 	transaction->t_start = jiffies;
 	transaction->t_requested = 0;
+	transaction->t_async_fc_allowed = true;
+	transaction->t_async_fc_ongoing = false;
 }
 
 /*
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 312103fc9581..5610f16de919 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -604,6 +604,7 @@ struct transaction_s
 		T_FINISHED
 	}			t_state;
 
+	bool t_async_fc_allowed, t_async_fc_ongoing;
 	/*
 	 * Where in the log does this transaction's commit start? [no locking]
 	 */
@@ -869,6 +870,13 @@ struct journal_s
 	 */
 	wait_queue_head_t	j_wait_reserved;
 
+	/**
+	 * @j_wait_async_fc:
+	 *
+	 * Wait queue to wait for completion of async fast commits.
+	 */
+	wait_queue_head_t j_wait_async_fc;
+
 	/**
 	 * @j_checkpoint_mutex:
 	 *
@@ -1594,6 +1602,8 @@ int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
 int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
 int jbd2_fc_complete_commit(journal_t *journal, tid_t tid, tid_t subtid);
+int jbd2_start_async_fc(journal_t *journal, tid_t tid);
+int jbd2_stop_async_fc(journal_t *journal, tid_t tid);
 
 void __jbd2_log_wait_for_space(journal_t *journal);
 extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);

From patchwork Tue Oct 1 07:41:01 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harshad shirwadkar
X-Patchwork-Id: 1169772
From: Harshad Shirwadkar
To: linux-ext4@vger.kernel.org
Cc: Harshad Shirwadkar
Subject: [PATCH v3 12/13] docs: Add fast commit documentation
Date: Tue, 1 Oct 2019 00:41:01 -0700
Message-Id: <20191001074101.256523-13-harshadshirwadkar@gmail.com>
X-Mailer: git-send-email 2.23.0.444.g18eeb5a265-goog
In-Reply-To: <20191001074101.256523-1-harshadshirwadkar@gmail.com>
References: <20191001074101.256523-1-harshadshirwadkar@gmail.com>
MIME-Version: 1.0
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

This patch adds necessary documentation to
Documentation/filesystems/journalling.rst and
Documentation/filesystems/ext4/journal.rst.

Signed-off-by: Harshad Shirwadkar
---
 Documentation/filesystems/ext4/journal.rst | 98 ++++++++++++++++++++--
 Documentation/filesystems/journalling.rst  | 22 +++++
 2 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/ext4/journal.rst b/Documentation/filesystems/ext4/journal.rst
index ea613ee701f5..23e7db89fc6a 100644
--- a/Documentation/filesystems/ext4/journal.rst
+++ b/Documentation/filesystems/ext4/journal.rst
@@ -29,10 +29,14 @@ safest. If ``data=writeback``, dirty data blocks are not flushed to the
 disk before the metadata are written to disk through the journal.
 
 The journal inode is typically inode 8. The first 68 bytes of the
-journal inode are replicated in the ext4 superblock. The journal itself
-is normal (but hidden) file within the filesystem. The file usually
-consumes an entire block group, though mke2fs tries to put it in the
-middle of the disk.
+journal inode are replicated in the ext4 superblock. The journal
+itself is a normal (but hidden) file within the filesystem. The file
+usually consumes an entire block group, though mke2fs tries to put it
+in the middle of the disk. Ext4 also utilizes JBD2's fast
+commits. Fast commits store metadata changes to inodes in an
+incremental fashion. A fast commit is valid only if there is no full
+commit after that particular fast commit. Because of this, fast commit
+blocks are overwritten by a following transaction.
 
 All fields in jbd2 are written to disk in big-endian order. This is the
 opposite of ext4.
@@ -48,16 +52,18 @@ Layout
 Generally speaking, the journal has this format:
 
 .. list-table::
-   :widths: 16 48 16
+   :widths: 16 48 16 18
    :header-rows: 1
 
    * - Superblock
      - descriptor\_block (data\_blocks or revocation\_block) [more data or
        revocations] commmit\_block
      - [more transactions...]
+     - [Fast commits...]
    * -
      - One transaction
      -
+     -
 
 Notice that a transaction begins with either a descriptor and some data,
 or a block revocation list. A finished transaction always ends with a
@@ -76,7 +82,7 @@ The journal superblock will be in the next full block after the
 superblock.
 
 .. list-table::
-   :widths: 12 12 12 32 12
+   :widths: 12 12 12 32 12 12
    :header-rows: 1
 
    * - 1024 bytes of padding
@@ -85,11 +91,13 @@ superblock.
      - descriptor\_block (data\_blocks or revocation\_block) [more data or
        revocations] commmit\_block
      - [more transactions...]
+     - [Fast commits...]
    * -
      -
      -
      - One transaction
      -
+     -
 
 Block Header
 ~~~~~~~~~~~~
@@ -609,3 +617,81 @@ bytes long (but uses a full block):
      - h\_commit\_nsec
      - Nanoseconds component of the above timestamp.
 
+Fast Commit Block
+~~~~~~~~~~~~~~~~~
+
+The fast commit block indicates an append to the last commit block
+that was written to the journal. One fast commit block records updates
+to one inode. So, typically you would find as many fast commit blocks
+as the number of inodes that got changed since the last commit. A fast
+commit block is valid only if there is no commit block present with
+transaction ID greater than that of the fast commit block. If such a
+block is present, then there is no need to replay the fast commit
+block.
+
+Multiple fast commit blocks are a part of one sub-transaction. To
+indicate the last block in a fast commit transaction, the fc\_flags field
+in the last block of every subtransaction is marked with the "LAST" (0x1)
+flag. A subtransaction is valid only if all the following conditions
+are met:
+
+1) The SUBTID of every block is equal to or greater than the SUBTID of
+   the previous fast commit block.
+2) For every sub-transaction, the last block is marked with the LAST flag.
+3) There are no invalid blocks in between.
+
+.. list-table::
+   :widths: 8 8 24 40
+   :header-rows: 1
+
+   * - Offset
+     - Type
+     - Name
+     - Descriptor
+   * - 0x0
+     - journal\_header\_s
+     - (open coded)
+     - Common block header.
+   * - 0xC
+     - \_\_le32
+     - fc\_magic
+     - Magic value which should be set to 0xE2540090. This identifies
+       that this block is a fast commit block.
+   * - 0x10
+     - \_\_le32
+     - fc\_subtid
+     - Sub-transaction ID for this commit block
+   * - 0x14
+     - \_\_u8
+     - fc\_features
+     - Features used by this fast commit block.
+   * - 0x15
+     - \_\_u8
+     - fc\_flags
+     - Flags. 0x1 (LAST) indicates that this is the last block in the sub-transaction.
+   * - 0x16
+     - \_\_le16
+     - fc\_num\_tlvs
+     - Number of TLVs contained in this fast commit block
+   * - 0x18
+     - \_\_le32
+     - \_\_fc\_len
+     - Length of the fast commit block in terms of number of blocks
+   * - 0x2c
+     - \_\_le32
+     - fc\_ino
+     - Inode number of the inode that will be recovered using this fast commit
+   * - 0x30
+     - struct ext4\_inode
+     - inode
+     - On-disk copy of the inode at the commit time
+   * - 0x34
+     - struct ext4\_fc\_tl
+     - Array of struct ext4\_fc\_tl
+     - The actual delta with the last commit. Starting at this offset,
+       there is an array of TLVs that indicates which extents
+       should be present in the corresponding inode. Currently, the
+       following tags are supported: EXT4\_FC\_TAG\_EXT (extent that
+       should be present in the inode), EXT4\_FC\_TAG\_DNAME (dentry
+       name of the inode), EXT4\_FC\_TAG\_PARENT\_INO (inode number of
+       the directory that should contain the dentry of the inode).
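The three validity conditions for a subtransaction listed in the Fast Commit Block section above can be illustrated with a small user-space sketch. This is not the replay code from this series: `struct fc_block_view` and `fc_valid_prefix()` are hypothetical names, the headers are assumed to be already decoded into host byte order, and treating a SUBTID change before a LAST block as invalid is one reading of condition 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FC_MAGIC	0xE2540090u	/* fc_magic value from the table above */
#define FC_FLAG_LAST	0x1u		/* "LAST" bit in fc_flags */

/* Hypothetical decoded view of one fast commit block header. */
struct fc_block_view {
	uint32_t magic;
	uint32_t subtid;
	uint8_t flags;
};

/*
 * Return how many leading blocks belong to complete, valid
 * subtransactions. Scanning stops at the first block that breaks
 * one of the three documented conditions.
 */
static size_t fc_valid_prefix(const struct fc_block_view *blk, size_t n)
{
	size_t valid = 0;	/* blocks covered by finished subtransactions */
	int open = 0;		/* inside a subtransaction with no LAST yet */
	uint32_t prev = 0;	/* SUBTID of the previous block */

	for (size_t i = 0; i < n; i++) {
		if (blk[i].magic != FC_MAGIC)
			break;	/* condition 3: invalid block */
		if (i > 0 && blk[i].subtid < prev)
			break;	/* condition 1: SUBTID went backwards */
		if (open && blk[i].subtid != prev)
			break;	/* condition 2: previous one never saw LAST */
		prev = blk[i].subtid;
		open = !(blk[i].flags & FC_FLAG_LAST);
		if (!open)
			valid = i + 1;
	}
	return valid;
}
```

A caller would replay only the first `fc_valid_prefix()` blocks and ignore a trailing subtransaction that never reached its LAST block.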
diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst
index 58ce6b395206..217f66d67f9d 100644
--- a/Documentation/filesystems/journalling.rst
+++ b/Documentation/filesystems/journalling.rst
@@ -115,6 +115,28 @@ called after each transaction commit. You can also use
 ``transaction->t_private_list`` for attaching entries to a transaction
 that need processing when the transaction commits.
 
+JBD2 also allows client file systems to implement file system specific
+commits, which are called ``fast commits``. File systems that wish
+to use this feature should first set
+``journal->j_fc_commit_callback``. That function is called before
+performing a commit. The file system can call :c:func:`jbd2_map_fc_buf()`
+to get buffers reserved for fast commits. If the file system returns 0,
+JBD2 assumes that the file system performed a fast commit and it backs off
+from performing a commit. Otherwise, JBD2 falls back to a normal full
+commit. After performing either a fast or a full commit, JBD2 calls
+``journal->j_fc_cleanup_cb`` to allow file systems to perform cleanups
+for their internal fast commit related data structures. At replay
+time, JBD2 passes each and every fast commit block to the file system
+via ``journal->j_fc_replay_cb``. Ext4 uses this fast
+commit mechanism to improve journal commit performance.
+
+It is possible for file systems to perform fast commits
+asynchronously (without involvement of the journalling thread). All file
+systems really need to do is to call :c:func:`jbd2_start_async_fc()`
+before starting the commit and call :c:func:`jbd2_stop_async_fc()`
+after the commit. This makes sure that the journalling thread and
+other async fast committers don't interfere with each other.
+
 JBD2 also provides a way to block all transaction updates via
 :c:func:`jbd2_journal_lock_updates()` /
 :c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a