From patchwork Fri May 15 10:07:06 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Zhuravlev
X-Patchwork-Id: 1290992
From: Alex Zhuravlev
To: "linux-ext4@vger.kernel.org"
Subject: [PATCH 1/2] ext4: mballoc - prefetching for bitmaps
Date: Fri, 15 May 2020 10:07:06 +0000
Message-ID: <262A2973-9B2D-4DBE-8752-67E91D52C632@whamcloud.com>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

This should significantly improve bitmap loading, especially for flex
groups, as it tries to load all bitmaps within a flex group instead of
loading them one by one synchronously.

Prefetching is done in 8 * flex_bg groups, so it should be 8 read-ahead
reads for a single allocating thread. At the end of allocation the
thread waits for read-ahead completion and initializes buddy information
so that read-aheads are not lost in case of memory pressure.

At cr=0 the number of prefetching IOs is limited per allocation context
to prevent a situation when mballoc loads thousands of bitmaps looking
for a perfect group and ignoring groups with good chunks.

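To make the sizing above concrete, here is a minimal user-space sketch of
how the prefetch window and the per-allocation IO limit are derived. It
mirrors the arithmetic this patch adds to ext4_mb_init_backend(); the
struct and function names (prefetch_cfg, prefetch_cfg_init) are
hypothetical and exist only for illustration, and the assert on the shift
width is an assumption of the model, not something the patch checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the sizing done in ext4_mb_init_backend();
 * struct and function names are illustrative, not kernel API. */
struct prefetch_cfg {
	uint32_t window;	/* groups per prefetch batch (s_mb_prefetch) */
	uint32_t limit;		/* prefetch IO cap per allocation (s_mb_prefetch_limit) */
};

static struct prefetch_cfg
prefetch_cfg_init(bool has_flex_bg, unsigned int log_groups_per_flex,
		  uint32_t ngroups)
{
	struct prefetch_cfg cfg;

	/* model-only guard: keep the shift and multiplications within 32 bits */
	assert(log_groups_per_flex < 24);

	if (has_flex_bg)
		/* one flex group per IO, 8 IOs in flight at most */
		cfg.window = (UINT32_C(1) << log_groups_per_flex) * 8;
	else
		cfg.window = 32;
	if (cfg.window > ngroups)
		cfg.window = ngroups;

	/* the limit is 4 windows, capped by the number of groups */
	cfg.limit = cfg.window * 4;
	if (cfg.limit > ngroups)
		cfg.limit = ngroups;
	return cfg;
}
```

For example, with flex_bg and 16 groups per flex group
(log_groups_per_flex = 4) on a filesystem with 1,000,000 groups, this
gives a window of 128 groups and a limit of 512.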
Together with the patch "ext4: limit scanning of uninitialized groups"
the mount time (which includes a few tiny allocations) of a 1PB
filesystem is reduced significantly:

               0% full    50%-full unpatched    patched
  mount time       33s                9279s       563s

Signed-off-by: Alex Zhuravlev
Reviewed-by: Andreas Dilger
Reported-by: kbuild test robot
---
 fs/ext4/balloc.c  |  12 ++++-
 fs/ext4/ext4.h    |   8 +++-
 fs/ext4/mballoc.c | 116 +++++++++++++++++++++++++++++++++++++++++++++-
 fs/ext4/mballoc.h |   2 +
 fs/ext4/sysfs.c   |   4 ++
 5 files changed, 138 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index a32e5f7b5385..6712146195ed 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -413,7 +413,8 @@ static int ext4_validate_block_bitmap(struct super_block *sb,
  * Return buffer_head on success or an ERR_PTR in case of failure.
  */
 struct buffer_head *
-ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group)
+ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
+			      bool ignore_locked)
 {
 	struct ext4_group_desc *desc;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -444,6 +445,13 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group)
 	if (bitmap_uptodate(bh))
 		goto verify;
 
+	if (ignore_locked && buffer_locked(bh)) {
+		/* buffer under IO already, do not wait
+		 * if called for prefetching */
+		put_bh(bh);
+		return NULL;
+	}
+
 	lock_buffer(bh);
 	if (bitmap_uptodate(bh)) {
 		unlock_buffer(bh);
@@ -534,7 +542,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 	struct buffer_head *bh;
 	int err;
 
-	bh = ext4_read_block_bitmap_nowait(sb, block_group);
+	bh = ext4_read_block_bitmap_nowait(sb, block_group, false);
 	if (IS_ERR(bh))
 		return bh;
 	err = ext4_wait_block_bitmap(sb, block_group, bh);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 91eb4381cae5..521fbcd8efc7 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1483,6 +1483,8 @@ struct ext4_sb_info {
 	/* where last allocation was done - for stream allocation */
 	unsigned long s_mb_last_group;
 	unsigned long s_mb_last_start;
+	unsigned int s_mb_prefetch;
+	unsigned int s_mb_prefetch_limit;
 
 	/* stats for buddy allocator */
 	atomic_t s_bal_reqs;	/* number of reqs with len > 1 */
@@ -2420,7 +2422,8 @@ extern struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
 extern int ext4_should_retry_alloc(struct super_block *sb, int *retries);
 
 extern struct buffer_head *ext4_read_block_bitmap_nowait(struct super_block *sb,
-						ext4_group_t block_group);
+						ext4_group_t block_group,
+						bool ignore_locked);
 extern int ext4_wait_block_bitmap(struct super_block *sb,
 				  ext4_group_t block_group,
 				  struct buffer_head *bh);
@@ -3119,6 +3122,7 @@ struct ext4_group_info {
 	(1 << EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT)
 #define EXT4_GROUP_INFO_IBITMAP_CORRUPT		\
 	(1 << EXT4_GROUP_INFO_IBITMAP_CORRUPT_BIT)
+#define EXT4_GROUP_INFO_BBITMAP_READ_BIT	4
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
@@ -3133,6 +3137,8 @@ struct ext4_group_info {
 	(set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
 #define EXT4_MB_GRP_CLEAR_TRIMMED(grp)	\
 	(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_TEST_AND_SET_READ(grp)	\
+	(test_and_set_bit(EXT4_GROUP_INFO_BBITMAP_READ_BIT, &((grp)->bb_state)))
 
 #define EXT4_MAX_CONTENTION		8
 #define EXT4_CONTENTION_THRESHOLD	2
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index afb8bd9a10e9..ebfe258bfd0f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -861,7 +861,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp)
 			bh[i] = NULL;
 			continue;
 		}
-		bh[i] = ext4_read_block_bitmap_nowait(sb, group);
+		bh[i] = ext4_read_block_bitmap_nowait(sb, group, false);
 		if (IS_ERR(bh[i])) {
 			err = PTR_ERR(bh[i]);
 			bh[i] = NULL;
@@ -2127,6 +2127,96 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 	return 0;
 }
 
+/*
+ * each allocation context (i.e. a thread doing allocation) has own
+ * sliding prefetch window of @s_mb_prefetch size which starts at the
+ * very first goal and moves ahead of scanning.
+ * a side effect is that subsequent allocations will likely find
+ * the bitmaps in cache or at least in-flight.
+ */
+static void
+ext4_mb_prefetch(struct ext4_allocation_context *ac,
+		    ext4_group_t start)
+{
+	struct super_block *sb = ac->ac_sb;
+	ext4_group_t ngroups = ext4_get_groups_count(sb);
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_group_info *grp;
+	ext4_group_t nr, group = start;
+	struct buffer_head *bh;
+
+	/* limit prefetching at cr=0, otherwise mballoc can
+	 * spend a lot of time loading imperfect groups */
+	if (ac->ac_criteria < 2 && ac->ac_prefetch_ios >= sbi->s_mb_prefetch_limit)
+		return;
+
+	/* batch prefetching to get few READs in flight */
+	nr = ac->ac_prefetch - group;
+	if (ac->ac_prefetch < group)
+		/* wrapped to the first groups */
+		nr += ngroups;
+	if (nr > 0)
+		return;
+	BUG_ON(nr < 0);
+
+	nr = sbi->s_mb_prefetch;
+	if (ext4_has_feature_flex_bg(sb)) {
+		/* align to flex_bg to get more bitmaps with a single IO */
+		nr = (group / sbi->s_mb_prefetch) * sbi->s_mb_prefetch;
+		nr = nr + sbi->s_mb_prefetch - group;
+	}
+	while (nr-- > 0) {
+		grp = ext4_get_group_info(sb, group);
+
+		/* prevent expensive getblk() on groups w/ IO in progress */
+		if (EXT4_MB_GRP_TEST_AND_SET_READ(grp))
+			goto next;
+
+		/* ignore empty groups - those will be skipped
+		 * during the scanning as well */
+		if (grp->bb_free > 0 && EXT4_MB_GRP_NEED_INIT(grp)) {
+			bh = ext4_read_block_bitmap_nowait(sb, group, true);
+			if (bh && !IS_ERR(bh)) {
+				if (!buffer_uptodate(bh))
+					ac->ac_prefetch_ios++;
+				brelse(bh);
+			}
+		}
+next:
+		if (++group >= ngroups)
+			group = 0;
+	}
+	ac->ac_prefetch = group;
+}
+
+static void
+ext4_mb_prefetch_fini(struct ext4_allocation_context *ac)
+{
+	struct ext4_group_info *grp;
+	ext4_group_t group;
+	int nr, rc;
+
+	/* initialize last window of prefetched groups */
+	nr = ac->ac_prefetch_ios;
+	if (nr > EXT4_SB(ac->ac_sb)->s_mb_prefetch)
+		nr = EXT4_SB(ac->ac_sb)->s_mb_prefetch;
+	group = ac->ac_prefetch;
+	if (!group)
+		group = ext4_get_groups_count(ac->ac_sb);
+	group--;
+	while (nr-- > 0) {
+		grp = ext4_get_group_info(ac->ac_sb, group);
+		if (grp->bb_free > 0 && EXT4_MB_GRP_NEED_INIT(grp)) {
+			rc = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
+			if (rc)
+				break;
+		}
+		if (!group)
+			group = ext4_get_groups_count(ac->ac_sb);
+		group--;
+	}
+}
+
 static noinline_for_stack int
 ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 {
@@ -2200,6 +2290,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 		 * from the goal value specified
 		 */
 		group = ac->ac_g_ex.fe_group;
+		ac->ac_prefetch = group;
 
 		for (i = 0; i < ngroups; group++, i++) {
 			int ret = 0;
@@ -2211,6 +2302,8 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 			if (group >= ngroups)
 				group = 0;
 
+			ext4_mb_prefetch(ac, group);
+
 			/* This now checks without needing the buddy page */
 			ret = ext4_mb_good_group(ac, group, cr);
 			if (ret <= 0) {
@@ -2283,6 +2376,8 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 out:
 	if (!err && ac->ac_status != AC_STATUS_FOUND && first_err)
 		err = first_err;
+	/* use prefetched bitmaps to init buddy so that read info is not lost */
+	ext4_mb_prefetch_fini(ac);
 	return err;
 }
 
@@ -2542,6 +2637,25 @@ static int ext4_mb_init_backend(struct super_block *sb)
 		goto err_freebuddy;
 	}
 
+	if (ext4_has_feature_flex_bg(sb)) {
+		/* a single flex group is supposed to be read by a single IO */
+		sbi->s_mb_prefetch = 1 << sbi->s_es->s_log_groups_per_flex;
+		sbi->s_mb_prefetch *= 8; /* 8 prefetch IOs in flight at most */
+	} else {
+		sbi->s_mb_prefetch = 32;
+	}
+	if (sbi->s_mb_prefetch > ext4_get_groups_count(sb))
+		sbi->s_mb_prefetch = ext4_get_groups_count(sb);
+	/* how many real IOs to prefetch within a single allocation at cr=0
+	 * given cr=0 is a CPU-related optimization we shouldn't try to
+	 * load too many groups, at some point we should start to use what
+	 * we've got in memory.
+	 * with an average random access time 5ms, it'd take a second to get
+	 * 200 groups (* N with flex_bg), so let's make this limit 4 */
+	sbi->s_mb_prefetch_limit = sbi->s_mb_prefetch * 4;
+	if (sbi->s_mb_prefetch_limit > ext4_get_groups_count(sb))
+		sbi->s_mb_prefetch_limit = ext4_get_groups_count(sb);
+
 	return 0;
 
 err_freebuddy:
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 88c98f17e3d9..c96a2bd81f72 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -175,6 +175,8 @@ struct ext4_allocation_context {
 	struct page *ac_buddy_page;
 	struct ext4_prealloc_space *ac_pa;
 	struct ext4_locality_group *ac_lg;
+	ext4_group_t ac_prefetch;
+	int ac_prefetch_ios; /* number of initiated prefetch IO */
 };
 
 #define AC_STATUS_CONTINUE	1
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 04bfaf63752c..5f443f9d54b8 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -240,6 +240,8 @@ EXT4_RO_ATTR_ES_STRING(last_error_func, s_last_error_func, 32);
 EXT4_ATTR(first_error_time, 0444, first_error_time);
 EXT4_ATTR(last_error_time, 0444, last_error_time);
 EXT4_ATTR(journal_task, 0444, journal_task);
+EXT4_RW_ATTR_SBI_UI(mb_prefetch, s_mb_prefetch);
+EXT4_RW_ATTR_SBI_UI(mb_prefetch_limit, s_mb_prefetch_limit);
 
 static unsigned int old_bump_val = 128;
 EXT4_ATTR_PTR(max_writeback_mb_bump, 0444, pointer_ui, &old_bump_val);
@@ -283,6 +285,8 @@ static struct attribute *ext4_attrs[] = {
 #ifdef CONFIG_EXT4_DEBUG
 	ATTR_LIST(simulate_fail),
 #endif
+	ATTR_LIST(mb_prefetch),
+	ATTR_LIST(mb_prefetch_limit),
 	NULL,
 };
 ATTRIBUTE_GROUPS(ext4);

From patchwork Fri May 15 10:14:38 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Zhuravlev
X-Patchwork-Id: 1291004
From: Alex Zhuravlev
To: "linux-ext4@vger.kernel.org"
Subject: [PATCH 2/2] ext4: mballoc - limit scanning of uninitialized groups
Date: Fri, 15 May 2020 10:14:38 +0000
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

cr=0 is supposed to be an optimization to save CPU cycles, but if the
in-memory buddy data is not initialized then this makes no sense, as we
have to do sync IO, which takes a lot of cycles. Also, at cr=0 mballoc
doesn't store any available chunk. cr=1 also skips groups using a
heuristic based on the average fragment size. It's more useful to skip
such groups and switch to cr=2, where groups will be scanned for
available chunks.

Using a sparse image and dm-slow, a virtual device of 120TB was
simulated. Then the image was formatted and filled using debugfs to mark
~85% of the available space as busy. The very first allocation w/o the
patch could not complete in half an hour (according to vmstat it would
take ~10-1 hours). With the patch applied the allocation took
~20 seconds.

Signed-off-by: Alex Zhuravlev
Reviewed-by: Andreas Dilger
---
 fs/ext4/mballoc.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 30d5d97548c4..afb8bd9a10e9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1877,6 +1877,21 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
 	return 0;
 }
 
+static inline int ext4_mb_uninit_on_disk(struct super_block *sb,
+				    ext4_group_t group)
+{
+	struct ext4_group_desc *desc;
+
+	if (!ext4_has_group_desc_csum(sb))
+		return 0;
+
+	desc = ext4_get_group_desc(sb, group, NULL);
+	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
+		return 1;
+
+	return 0;
+}
+
 /*
  * The routine scans buddy structures (not bitmap!) from given order
  * to max order and tries to find big enough chunk to satisfy the req
@@ -2060,7 +2075,15 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 
 	/* We only do this if the grp has never been initialized */
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
+		int ret;
+
+		/* cr=0/1 is a very optimistic search to find large
+		 * good chunks almost for free. if buddy data is
+		 * not ready, then this optimization makes no sense */
+
+		if (cr < 2 && !ext4_mb_uninit_on_disk(ac->ac_sb, group))
+			return 0;
+		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
 		if (ret)
 			return ret;
 	}
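
The decision this patch adds to ext4_mb_good_group() can be modeled in
user space as a small pure function. This is only a sketch of the rule as
it reads in the hunk above: the enum and the function classify_group()
are hypothetical names, the boolean parameters stand in for
EXT4_MB_GRP_NEED_INIT(), ext4_has_group_desc_csum() and the
EXT4_BG_BLOCK_UNINIT flag, and the assumed 0..3 range for cr is an
assumption of the model. Groups marked uninitialized on disk are still
initialized at cr < 2, presumably because that does not require reading
the bitmap from disk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the check; not a kernel type. */
enum group_action {
	GROUP_SKIP,		/* return 0: do not consider the group at this cr */
	GROUP_INIT_BUDDY,	/* call ext4_mb_init_group(), then evaluate */
	GROUP_USE_BUDDY,	/* buddy data already in memory, evaluate */
};

static enum group_action
classify_group(int cr, bool need_init, bool has_desc_csum, bool block_uninit)
{
	bool uninit_on_disk;

	/* model-only assumption: criteria run from 0 to 3 */
	assert(cr >= 0 && cr <= 3);

	if (!need_init)
		return GROUP_USE_BUDDY;

	/* mirrors ext4_mb_uninit_on_disk(): the flag only counts with csum */
	uninit_on_disk = has_desc_csum && block_uninit;

	/* cr=0/1 is an optimistic search; do not pay for sync IO here */
	if (cr < 2 && !uninit_on_disk)
		return GROUP_SKIP;

	return GROUP_INIT_BUDDY;
}
```

With this model, a group that needs initialization and has a real bitmap
on disk is skipped at cr=0 and cr=1 and only initialized once the scan
reaches cr=2.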