[RFC,V2] ext4: limit block allocations for indirect-block files to < 2^32

Message ID 4AA1D94F.8060703@redhat.com
State Superseded, archived

Commit Message

Eric Sandeen Sept. 5, 2009, 3:21 a.m. UTC
Today, the ext4 allocator will happily allocate blocks past
2^32 for indirect-block files, which results in the block
numbers getting truncated, and corruption ensues.
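
To make the failure concrete, here is a minimal user-space sketch of the truncation. It assumes only that indirect-block files keep their block pointers in 32 bits (the u32 pointers mentioned later in this thread); the function name is hypothetical and is not taken from ext4.

```c
#include <stdint.h>

/*
 * Hypothetical helper: what is left of a 64-bit physical block
 * number once it is stored in a 32-bit indirect block pointer.
 */
static uint32_t demo_store_indirect(uint64_t phys_block)
{
	return (uint32_t)phys_block;	/* high 32 bits are silently dropped */
}
```

For example, block 2^32 + 5 would be recorded as block 5, so a later read follows the pointer to an unrelated block.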

This patch limits such allocations to < 2^32, and adds
WARN_ONs (maybe should be BUG_ONs) if we do get blocks
larger than that.

This should address RH Bug 519471, ext4 bitmap allocator
must limit blocks to < 2^32

* ext4_find_goal() is modified to choose a goal < UINT_MAX,
  so that our starting point is in an acceptable range.

* ext4_xattr_block_set() is modified such that the goal block
  is < UINT_MAX, as above.

* ext4_mb_regular_allocator() is modified so that the group
  search does not continue into groups which are too high

* ext4_mb_use_preallocated() has a check that we don't use
  preallocated space which is too far out

* ext4_alloc_blocks() and ext4_xattr_block_set() add some WARN_ONs

No attempt has been made to limit inode locations to < 2^32,
so we may wind up with blocks far from their inodes.  Doing
this much already will lead to some odd ENOSPC issues when the
"lower 32" gets full, and further restricting inodes could
make that even weirder.

For high inodes, choosing a goal of the original, % UINT_MAX,
may be a bit odd, but then we're in an odd situation anyway,
and I don't know of a better heuristic.

Perhaps an ext4-specific #define would be better than UINT_MAX?

The allocator being what it is, I may have missed some spots,
so I'd welcome review.

Thanks,
-Eric

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

V2: got modulo-happy in ext4_mb_regular_allocator, just limit
ngroups to no more than UINT_MAX.


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andreas Dilger Sept. 5, 2009, 4:45 p.m. UTC | #1
On Sep 04, 2009  22:21 -0500, Eric Sandeen wrote:
> Today, the ext4 allocator will happily allocate blocks past
> 2^32 for indirect-block files, which results in the block
> numbers getting truncated, and corruption ensues.
>
> This patch limits such allocations to < 2^32, and adds
> WARN_ONs (maybe should be BUG_ONs) if we do get blocks
> larger than that.

Eric, thanks for making the patch.

> This should address RH Bug 519471, ext4 bitmap allocator must limit 
> blocks to < 2^32
>
> * ext4_find_goal() is modified to choose a goal < UINT_MAX,
>  so that our starting point is in an acceptable range.
>
> * ext4_xattr_block_set() is modified such that the goal block
>  is < UINT_MAX, as above.

Using UINT_MAX probably isn't wholly safe, as I know of systems
that have e.g. 64-bit ints (though I guess none that have Linux
kernel ports).  It should use (u32)~0 or ((1 << 32) - 1) directly.

> Perhaps an ext4-specific #define would be better than UINT_MAX?

I think yes, since we know the maximum value is tied specifically
to the u32 indirect block pointers, and not necessarily to an "int".

> static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
> 				   Indirect *partial)
> {
> +	goal = ext4_find_near(inode, partial);
> +	goal = goal % UINT_MAX;
> +	return goal;

Using "% UINT_MAX" here will result in a 64-bit division on 32-bit
platforms, since ext4_fsblk_t is declared as an unsigned long long.
This should instead be "(u32)" or "& 0xffffffff".
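
A small user-space sketch of the two options, for illustration only. The macro and function names are hypothetical; the thread agrees only that some ext4-specific define would be better than UINT_MAX. Note that the two forms are not equivalent: the modulo reduces by 2^32 - 1, while the mask (like a (u32) cast) keeps the low 32 bits. Also, in C the constant has to be built from a 64-bit operand, e.g. (1ULL << 32) - 1, because shifting a 32-bit int by 32 is undefined.

```c
#include <stdint.h>

/* Hypothetical name for the largest block number a u32 pointer can hold. */
#define DEMO_MAX_BLOCK_32	0xffffffffULL

/* What "goal % UINT_MAX" computes: a 64-bit modulo by 2^32 - 1. */
static uint64_t demo_goal_by_modulo(uint64_t goal)
{
	return goal % DEMO_MAX_BLOCK_32;
}

/* The suggested alternative: keep the low 32 bits, no division needed. */
static uint64_t demo_goal_by_mask(uint64_t goal)
{
	return goal & DEMO_MAX_BLOCK_32;
}
```

Both results fit in 32 bits, but they differ: for a goal of 2^32 + 5 the modulo gives 6 and the mask gives 5, and for a goal of exactly 0xffffffff the modulo gives 0 while the mask leaves it unchanged.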

> @@ -1943,6 +1943,11 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
> +	/* non-extent files are limited to low blocks/groups */
> +	if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
> +		ngroups = min_t(unsigned long, ngroups,
> +				(UINT_MAX / EXT4_BLOCKS_PER_GROUP(sb)));

Since EXT4_BLOCKS_PER_GROUP() is a run-time variable, but is constant
for the life of the filesystem, this could be computed once and stored
in the superblock?
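
One way this could look, as a user-space sketch under stated assumptions: the structure, its fields, and both function names are hypothetical stand-ins, not the real ext4 superblock info. The limit is computed once when the filesystem is set up and then simply read by the allocator.

```c
#include <stdint.h>

#define DEMO_MAX_BLOCK_32	0xffffffffULL	/* hypothetical define */

/* Hypothetical stand-in for the in-memory superblock info. */
struct demo_sb_info {
	uint32_t blocks_per_group;
	uint32_t groups_count;
	uint32_t max_indirect_groups;	/* cached limit for non-extent files */
};

/* Compute the group limit once, e.g. at mount time. */
static void demo_fill_super(struct demo_sb_info *sbi,
			    uint32_t blocks_per_group, uint32_t groups_count)
{
	uint64_t limit = DEMO_MAX_BLOCK_32 / blocks_per_group;

	sbi->blocks_per_group = blocks_per_group;
	sbi->groups_count = groups_count;
	sbi->max_indirect_groups =
		limit < groups_count ? (uint32_t)limit : groups_count;
}

/* The allocator then only picks the precomputed value. */
static uint32_t demo_ngroups(const struct demo_sb_info *sbi, int uses_extents)
{
	return uses_extents ? sbi->groups_count : sbi->max_indirect_groups;
}
```

With 32768 blocks per group and 200000 groups, a non-extent file would search only the first 131071 groups, while an extent-based file still sees all 200000.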

> +++ b/fs/ext4/xattr.c
> @@ -810,12 +810,22 @@ inserted:
> +			if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> +				goal = goal % UINT_MAX;

As above.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Eric Sandeen Sept. 5, 2009, 6:16 p.m. UTC | #2
Andreas Dilger wrote:
> On Sep 04, 2009  22:21 -0500, Eric Sandeen wrote:
>> Today, the ext4 allocator will happily allocate blocks past
>> 2^32 for indirect-block files, which results in the block
>> numbers getting truncated, and corruption ensues.
>>
>> This patch limits such allocations to < 2^32, and adds
>> WARN_ONs (maybe should be BUG_ONs) if we do get blocks
>> larger than that.
> 
> Eric, thanks for making the patch.
> 
>> This should address RH Bug 519471, ext4 bitmap allocator must limit 
>> blocks to < 2^32
>>
>> * ext4_find_goal() is modified to choose a goal < UINT_MAX,
>>  so that our starting point is in an acceptable range.
>>
>> * ext4_xattr_block_set() is modified such that the goal block
>>  is < UINT_MAX, as above.
> 
> Using UINT_MAX probably isn't wholly safe, as I know of systems
> that have e.g. 64-bit ints (though I guess none that have Linux
> kernel ports).  It should use (u32)~0 or ((1 << 32) - 1) directly.
> 
>> Perhaps an ext4-specific #define would be better than UINT_MAX?
> 
> I think yes, since we know the maximum value is tied specifically
> to the u32 indirect block pointers, and not necessarily to an "int".

yep, I had considered that, I should have just done it :)  (esp 
considering the patch I sent a while back to get rid of similar things) :)

>> static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
>> 				   Indirect *partial)
>> {
>> +	goal = ext4_find_near(inode, partial);
>> +	goal = goal % UINT_MAX;
>> +	return goal;
> 
> Using "% UINT_MAX" here will result in a 64-bit division on 32-bit
> platforms, since ext4_fsblk_t is declared as an unsigned long long.
> This should instead be "(u32)" or "& 0xffffffff".

whoops good point.  I wasn't thinking of 32-bit boxes, thinking they 
can't go past 16T but for smaller blocks we still could go past 2^32 
blocks... and it is a 64-bit modulo regardless.
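
The arithmetic behind this can be sketched as follows, assuming the 16T figure means 16 TiB; the helper name is hypothetical.

```c
#include <stdint.h>

/* Number of filesystem blocks in a 16 TiB device for a given block size. */
static uint64_t demo_blocks_in_16t(uint32_t block_size)
{
	const uint64_t sixteen_tib = 16ULL << 40;	/* 2^44 bytes */

	return sixteen_tib / block_size;
}
```

With 4 KiB blocks, 16 TiB is exactly 2^32 blocks, so every block number still fits in 32 bits; with 1 KiB blocks the same device holds 2^34 blocks, well past the limit.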

>> @@ -1943,6 +1943,11 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>> +	/* non-extent files are limited to low blocks/groups */
>> +	if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
>> +		ngroups = min_t(unsigned long, ngroups,
>> +				(UINT_MAX / EXT4_BLOCKS_PER_GROUP(sb)));
> 
> Since EXT4_BLOCKS_PER_GROUP() is a run-time variable, but is constant
> for the life of the filesystem, this could be computed once and stored
> in the superblock?

ok.

>> +++ b/fs/ext4/xattr.c
>> @@ -810,12 +810,22 @@ inserted:
>> +			if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
>> +				goal = goal % UINT_MAX;
> 
> As above.

Thanks for the review, will fix those up.

-Eric

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9c642b..cda3f8d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -551,15 +551,21 @@  static ext4_fsblk_t ext4_find_near(struct inode *inode, Indirect *ind)
  *
  *	Normally this function find the preferred place for block allocation,
  *	returns it.
+ *	Because this is only used for non-extent files, we limit the block nr
+ *	to 32 bits.
  */
 static ext4_fsblk_t ext4_find_goal(struct inode *inode, ext4_lblk_t block,
 				   Indirect *partial)
 {
+	ext4_fsblk_t goal;
+
 	/*
 	 * XXX need to get goal block from mballoc's data structures
 	 */
 
-	return ext4_find_near(inode, partial);
+	goal = ext4_find_near(inode, partial);
+	goal = goal % UINT_MAX;
+	return goal;
 }
 
 /**
@@ -640,6 +646,8 @@  static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
 		if (*err)
 			goto failed_out;
 
+		WARN_ON(current_block + count > UINT_MAX);
+
 		target -= count;
 		/* allocate blocks for indirect blocks */
 		while (index < indirect_blks && count) {
@@ -674,6 +682,7 @@  static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
 		ar.flags = EXT4_MB_HINT_DATA;
 
 	current_block = ext4_mb_new_blocks(handle, &ar, err);
+	WARN_ON(current_block + ar.len > UINT_MAX);
 
 	if (*err && (target == blks)) {
 		/*
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index cd25846..10384c3 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1943,6 +1943,11 @@  ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 	sb = ac->ac_sb;
 	sbi = EXT4_SB(sb);
 	ngroups = ext4_get_groups_count(sb);
+	/* non-extent files are limited to low blocks/groups */
+	if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL))
+		ngroups = min_t(unsigned long, ngroups,
+				(UINT_MAX / EXT4_BLOCKS_PER_GROUP(sb)));
+
 	BUG_ON(ac->ac_status == AC_STATUS_FOUND);
 
 	/* first, try the goal */
@@ -3382,6 +3387,10 @@  ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 			ac->ac_o_ex.fe_logical >= pa->pa_lstart + pa->pa_len)
 			continue;
 
+		if (!(EXT4_I(ac->ac_inode)->i_flags & EXT4_EXTENTS_FL) &&
+			pa->pa_pstart + pa->pa_len > UINT_MAX)
+			continue;
+
 		/* found preallocated blocks, use them */
 		spin_lock(&pa->pa_lock);
 		if (pa->pa_deleted == 0 && pa->pa_free) {
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 62b31c2..9ed0f12 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -810,12 +810,22 @@  inserted:
 			get_bh(new_bh);
 		} else {
 			/* We need to allocate a new block */
-			ext4_fsblk_t goal = ext4_group_first_block_no(sb,
+			ext4_fsblk_t goal, block;
+
+			goal = ext4_group_first_block_no(sb,
 						EXT4_I(inode)->i_block_group);
-			ext4_fsblk_t block = ext4_new_meta_blocks(handle, inode,
+
+			if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+				goal = goal % UINT_MAX;
+
+			block = ext4_new_meta_blocks(handle, inode,
 						  goal, NULL, &error);
 			if (error)
 				goto cleanup;
+
+			if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+				WARN_ON(block > UINT_MAX);
+
 			ea_idebug(inode, "creating block %d", block);
 
 			new_bh = sb_getblk(sb, block);