ext2/ext3: allocate ->s_blockgroup_lock separately to avoid wasting space

Message ID Pine.LNX.4.64.0811141115100.29826@melkki.cs.Helsinki.FI
State Superseded, archived

Commit Message

Pekka Enberg Nov. 14, 2008, 9:17 a.m. UTC
From: Pekka Enberg <penberg@cs.helsinki.fi>

As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes and ext3_sb_info is
17152 bytes on 64-bit, which makes them a very bad fit for SLAB allocators. In
fact, both allocations are rounded up to the next available page size of
order 3, which is 32 KB.

The culprit of the wasted memory is ->s_blockgroup_lock, which can be as big
as 16 KB when CONFIG_NR_CPUS is set to 32. As struct blockgroup_lock is a
perfect fit for an order 2 page in the worst case, allocate ->s_blockgroup_lock
separately to avoid wasting space.

The change shrinks struct ext2_sb_info to 592 bytes and struct ext3_sb_info to
640 bytes, which fit into a 1024 byte slab cache, so now we allocate 16 KB + 1
KB instead of 32 KB, saving 15 KB of memory!

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
---
 fs/ext2/super.c                 |   10 +++++++++-
 fs/ext3/super.c                 |   10 +++++++++-
 include/linux/blockgroup_lock.h |    2 +-
 include/linux/ext2_fs_sb.h      |    2 +-
 include/linux/ext3_fs_sb.h      |    2 +-
 5 files changed, 21 insertions(+), 5 deletions(-)

Comments

Andreas Dilger Nov. 14, 2008, 9:26 p.m. UTC | #1
On Nov 14, 2008  11:17 +0200, Pekka J Enberg wrote:
> As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes and ext3_sb_info is
> 17152 bytes on 64-bit which makes them a very bad fit for SLAB allocators. In
> fact, both allocations are rounded up to the next available page size of
> order 3, which is 32 KB.
> 
> The culprit of the wasted memory is ->s_blockgroup_lock, which can be as
> big as 16 KB when CONFIG_NR_CPUS is set to 32. As struct blockgroup_lock is a
> perfect fit for an order 2 page in the worst case, allocate ->s_blockgroup_lock
> separately to avoid wasting space.
> 
> The change shrinks struct ext2_sb_info to 592 bytes and struct ext3_sb_info to
> 640 bytes, which fit into a 1024 byte slab cache, so now we allocate 16 KB + 1
> KB instead of 32 KB, saving 15 KB of memory!
> 
> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

This looks very reasonable, with some minor comments below.
Could you please also include a patch for ext4?  Also, Andrew prefers that
the patches for ext2/ext3/ext4 are in separate emails.

> --- a/include/linux/blockgroup_lock.h
> +++ b/include/linux/blockgroup_lock.h
>  #define sb_bgl_lock(sb, block_group) \
> -	(&(sb)->s_blockgroup_lock.locks[(block_group) & (NR_BG_LOCKS-1)].lock)
> +	(&(sb)->s_blockgroup_lock->locks[(block_group) & (NR_BG_LOCKS-1)].lock)

How the struct is allocated seems like an implementation detail that doesn't
belong in blockgroup_lock.h at all, because "sb" is not "struct super_block"
but rather "struct ext[23]_sb_info".  In fact, changing this without also
patching ext4 would cause ext4 to break.

I would suggest changing this to take the s_blockgroup_lock as a parameter,

#define bgl_lock_ptr(bgl, block_group) \
	(&(bgl)->locks[(block_group) & (NR_BG_LOCKS - 1)].lock)

and then in ext[234]_fs_sb.h add a new helper in the same (first) patch:

#define sb_bgl_lock(sbi, block_group) \
	bgl_lock_ptr(&(sbi)->s_blockgroup_lock, block_group)

and remove sb_bgl_lock() from blockgroup_lock.h entirely.  As part of the
later patches to change the s_blockgroup_lock allocations for each of
ext[234] this changes in ext[234]_fs_sb.h to:

#define sb_bgl_lock(sbi, block_group) \
	bgl_lock_ptr((sbi)->s_blockgroup_lock, block_group)


This allows each of the later patches to be landed separately without
breaking the build.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Nov. 16, 2008, 12:58 p.m. UTC | #2
On Fri, 2008-11-14 at 11:17 +0200, Pekka J Enberg wrote:
> From: Pekka Enberg <penberg@cs.helsinki.fi>
> 
> As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes and ext3_sb_info is
> 17152 bytes on 64-bit which makes them a very bad fit for SLAB allocators. In
> fact, both allocations are rounded up to the next available page size of
> order 3, which is 32 KB.
> 
> The culprit of the wasted memory is ->s_blockgroup_lock, which can be as big
> as 16 KB when CONFIG_NR_CPUS is set to 32. As struct blockgroup_lock is a
> perfect fit for an order 2 page in the worst case, allocate ->s_blockgroup_lock
> separately to avoid wasting space.

And here I was thinking that NR_CPUS=4096 is currently our worst
case ;-)

Pekka Enberg Nov. 17, 2008, 9:27 p.m. UTC | #3
Peter Zijlstra wrote:
> On Fri, 2008-11-14 at 11:17 +0200, Pekka J Enberg wrote:
>> From: Pekka Enberg <penberg@cs.helsinki.fi>
>>
>> As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes and ext3_sb_info is
>> 17152 bytes on 64-bit which makes them a very bad fit for SLAB allocators. In
>> fact, both allocations are rounded up to the next available page size of
>> order 3, which is 32 KB.
>>
>> The culprit of the wasted memory is ->s_blockgroup_lock, which can be as big
>> as 16 KB when CONFIG_NR_CPUS is set to 32. As struct blockgroup_lock is a
>> perfect fit for an order 2 page in the worst case, allocate ->s_blockgroup_lock
>> separately to avoid wasting space.
> 
> And here I was thinking that NR_CPUS=4096 is currently our worst
> case ;-)

Sure, but look at <linux/blockgroup_lock.h>: NR_BG_LOCKS is capped at 128
for >= 32 CPUs.
Patch

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 647cd88..da8bdea 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -132,6 +132,7 @@  static void ext2_put_super (struct super_block * sb)
 	percpu_counter_destroy(&sbi->s_dirs_counter);
 	brelse (sbi->s_sbh);
 	sb->s_fs_info = NULL;
+	kfree(sbi->s_blockgroup_lock);
 	kfree(sbi);
 
 	return;
@@ -756,6 +757,13 @@  static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
 	if (!sbi)
 		return -ENOMEM;
+
+	sbi->s_blockgroup_lock =
+		kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
+	if (!sbi->s_blockgroup_lock) {
+		kfree(sbi);
+		return -ENOMEM;
+	}
 	sb->s_fs_info = sbi;
 	sbi->s_sb_block = sb_block;
 
@@ -983,7 +991,7 @@  static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 		printk ("EXT2-fs: not enough memory\n");
 		goto failed_mount;
 	}
-	bgl_lock_init(&sbi->s_blockgroup_lock);
+	bgl_lock_init(sbi->s_blockgroup_lock);
 	sbi->s_debts = kcalloc(sbi->s_groups_count, sizeof(*sbi->s_debts), GFP_KERNEL);
 	if (!sbi->s_debts) {
 		printk ("EXT2-fs: not enough memory\n");
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index f6c94f2..f41df22 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -439,6 +439,7 @@  static void ext3_put_super (struct super_block * sb)
 		ext3_blkdev_remove(sbi);
 	}
 	sb->s_fs_info = NULL;
+	kfree(sbi->s_blockgroup_lock);
 	kfree(sbi);
 	return;
 }
@@ -1548,6 +1549,13 @@  static int ext3_fill_super (struct super_block *sb, void *data, int silent)
 	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
 	if (!sbi)
 		return -ENOMEM;
+
+	sbi->s_blockgroup_lock =
+		kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
+	if (!sbi->s_blockgroup_lock) {
+		kfree(sbi);
+		return -ENOMEM;
+	}
 	sb->s_fs_info = sbi;
 	sbi->s_mount_opt = 0;
 	sbi->s_resuid = EXT3_DEF_RESUID;
@@ -1788,7 +1796,7 @@  static int ext3_fill_super (struct super_block *sb, void *data, int silent)
 		goto failed_mount;
 	}
 
-	bgl_lock_init(&sbi->s_blockgroup_lock);
+	bgl_lock_init(sbi->s_blockgroup_lock);
 
 	for (i = 0; i < db_count; i++) {
 		block = descriptor_loc(sb, logic_sb_block, i);
diff --git a/include/linux/blockgroup_lock.h b/include/linux/blockgroup_lock.h
index 8607312..d6d4787 100644
--- a/include/linux/blockgroup_lock.h
+++ b/include/linux/blockgroup_lock.h
@@ -54,6 +54,6 @@  static inline void bgl_lock_init(struct blockgroup_lock *bgl)
  * superblock types
  */
 #define sb_bgl_lock(sb, block_group) \
-	(&(sb)->s_blockgroup_lock.locks[(block_group) & (NR_BG_LOCKS-1)].lock)
+	(&(sb)->s_blockgroup_lock->locks[(block_group) & (NR_BG_LOCKS-1)].lock)
 
 #endif
diff --git a/include/linux/ext2_fs_sb.h b/include/linux/ext2_fs_sb.h
index f273415..7e61de9 100644
--- a/include/linux/ext2_fs_sb.h
+++ b/include/linux/ext2_fs_sb.h
@@ -101,7 +101,7 @@  struct ext2_sb_info {
 	struct percpu_counter s_freeblocks_counter;
 	struct percpu_counter s_freeinodes_counter;
 	struct percpu_counter s_dirs_counter;
-	struct blockgroup_lock s_blockgroup_lock;
+	struct blockgroup_lock *s_blockgroup_lock;
 	/* root of the per fs reservation window tree */
 	spinlock_t s_rsv_window_lock;
 	struct rb_root s_rsv_window_root;
diff --git a/include/linux/ext3_fs_sb.h b/include/linux/ext3_fs_sb.h
index b65f028..ec10d96 100644
--- a/include/linux/ext3_fs_sb.h
+++ b/include/linux/ext3_fs_sb.h
@@ -60,7 +60,7 @@  struct ext3_sb_info {
 	struct percpu_counter s_freeblocks_counter;
 	struct percpu_counter s_freeinodes_counter;
 	struct percpu_counter s_dirs_counter;
-	struct blockgroup_lock s_blockgroup_lock;
+	struct blockgroup_lock *s_blockgroup_lock;
 
 	/* root of the per fs reservation window tree */
 	spinlock_t s_rsv_window_lock;