From patchwork Mon Feb 4 16:03:01 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Theodore Ts'o X-Patchwork-Id: 217978 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 8E1572C02A6 for ; Tue, 5 Feb 2013 03:03:09 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755016Ab3BDQDI (ORCPT ); Mon, 4 Feb 2013 11:03:08 -0500 Received: from li9-11.members.linode.com ([67.18.176.11]:46910 "EHLO imap.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754585Ab3BDQDH (ORCPT ); Mon, 4 Feb 2013 11:03:07 -0500 Received: from root (helo=closure.thunk.org) by imap.thunk.org with local-esmtp (Exim 4.80) (envelope-from ) id 1U2OVa-0008Kt-MA; Mon, 04 Feb 2013 16:03:02 +0000 Received: by closure.thunk.org (Postfix, from userid 15806) id 4BDDA368576; Mon, 4 Feb 2013 11:03:01 -0500 (EST) From: Theodore Ts'o To: Ext4 Developers List Cc: Theodore Ts'o , Andreas Dilger Subject: [PATCH RFC] ext4: optimize mballoc for large allocations Date: Mon, 4 Feb 2013 11:03:01 -0500 Message-Id: <1359993781-1542-1-git-send-email-tytso@mit.edu> X-Mailer: git-send-email 1.7.12.rc0.22.gcdd159b X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: tytso@thunk.org X-SA-Exim-Scanned: No (on imap.thunk.org); SAEximRunCond expanded to false Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org The ext4 block allocator only maintains buddy bitmaps for chunks which are less than or equal to one quarter of a block group. That is, for a file aystem with a 1k blocksize, and where the number of blocks in a block group is 8192 blocks, the largest chunk size tracked by buddy bitmaps is 2048 blocks. For a file system with a 4k blocksize, and where the number of blocks in a block group is 32768 blocks, the largest chunk size tracked by buddy bitmaps is 8192 blocks. To work around this code, mballoc.c before this commit would truncate allocation requests to the number of blocks in a block group minus 10. Why 10? Aside from being a completely arbitrary number, it avoids block allocation to be a power of two larger than 25% of the block group. If you try to explicitly fallocate 50% of the block group size, this will demonstrate the problem; the block allocation code will scan the all of the blocks in the file system with cr==0 (since the request is for a natural power of two), but then completely fail for all blocks groups, since the buddy bitmaps don't track chunk sizes of 50% of the block group. To fix this, in these we use ext4_mb_complex_scan_group() instead of ext4_mb_simple_scan_group(). Signed-off-by: "Theodore Ts'o" Cc: Andreas Dilger --- fs/ext4/mballoc.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 061727a..ff95fac 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1884,15 +1884,19 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac, case 0: BUG_ON(ac->ac_2order == 0); - if (grp->bb_largest_free_order < ac->ac_2order) - return 0; - /* Avoid using the first bg of a flexgroup for data files */ if ((ac->ac_flags & EXT4_MB_HINT_DATA) && (flex_size >= EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME) && ((group % flex_size) == 0)) return 0; + if ((ac->ac_2order > ac->ac_sb->s_bllocksize_bits+1) || + (free / fragments) >= ac->ac_g_ex.fe_len) + return 1; + + if (grp->bb_largest_free_order < ac->ac_2order) + return 0; + return 1; case 1: if ((free / fragments) >= ac->ac_g_ex.fe_len) @@ -2007,7 +2011,7 @@ repeat: } ac->ac_groups_scanned++; - if (cr == 0) + if (cr == 0 && ac->ac_2order < sb->s_blocksize_bits+2) ext4_mb_simple_scan_group(ac, &e4b); else if (cr == 1 && sbi->s_stripe && !(ac->ac_g_ex.fe_len % sbi->s_stripe)) @@ -4005,8 +4009,8 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac, len = ar->len; /* just a dirty hack to filter too big requests */ - if (len >= EXT4_CLUSTERS_PER_GROUP(sb) - 10) - len = EXT4_CLUSTERS_PER_GROUP(sb) - 10; + if (len >= EXT4_CLUSTERS_PER_GROUP(sb)) + len = EXT4_CLUSTERS_PER_GROUP(sb); /* start searching from the goal */ goal = ar->goal;