diff mbox

ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg

Message ID 1339077153-13212-1-git-send-email-tytso@mit.edu
State Accepted, archived
Headers show

Commit Message

Theodore Ts'o June 7, 2012, 1:52 p.m. UTC
Ext3 filesystems that are converted to use as many ext4 file system
features as possible will enable uninit_bg to speed up e2fsck times.
These file systems will have a native ext3 layout of inode tables and
block allocation bitmaps (as opposed to ext4's flex_bg layout).
Unfortunately, in these cases, when first allocating a block in an
uninitialized block group, ext4 would incorrectly calculate the number
of free blocks in that block group, and then errorneously report that
the file system was corrupt:

EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd

This problem can be reproduced via:

    mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
    mount -t ext4 /dev/vdd /mnt
    fallocate -l 4600m /mnt/test

The problem was caused by a bone headed mistake in the check to see if a
particular metadata block was part of the block group.

Many thanks to Kees Cook for finding and bisecting the buggy commit
which introduced this bug (commit fd034a84e1, present since v3.2).

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Kees Cook <keescook@chromium.org>
---

Note: I send to push this to Linus soon, preferably before 3.5-rc2, so
I'd appreciate any comments ASAP.  Thanks!!

 fs/ext4/balloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Sander Eikelenboom June 7, 2012, 2:01 p.m. UTC | #1
Thursday, June 7, 2012, 3:52:33 PM, you wrote:

> Ext3 filesystems that are converted to use as many ext4 file system
> features as possible will enable uninit_bg to speed up e2fsck times.
> These file systems will have a native ext3 layout of inode tables and
> block allocation bitmaps (as opposed to ext4's flex_bg layout).
> Unfortunately, in these cases, when first allocating a block in an
> uninitialized block group, ext4 would incorrectly calculate the number
> of free blocks in that block group, and then errorneously report that
> the file system was corrupt:

> EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd

> This problem can be reproduced via:

>     mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
>     mount -t ext4 /dev/vdd /mnt
>     fallocate -l 4600m /mnt/test

> The problem was caused by a bone headed mistake in the check to see if a
> particular metadata block was part of the block group.

> Many thanks to Kees Cook for finding and bisecting the buggy commit
> which introduced this bug (commit fd034a84e1, present since v3.2).

> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Reported-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: stable@kernel.org
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> Cc: Kees Cook <keescook@chromium.org>
> ---

> Note: I send to push this to Linus soon, preferably before 3.5-rc2, so
> I'd appreciate any comments ASAP.  Thanks!!

Compiled it, turned uninit_bg back on, now copying some GB's worth of data to the FS.
No problems so far...

BTW: Is there any (real) advantage to recreate the FS using the ext4's flex_bg instead of converting the ext3 to ext4 and using ext3 uninit_bg ?

--
Sander


>  fs/ext4/balloc.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 99b6324..cee7812 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -90,8 +90,8 @@ unsigned ext4_num_overhead_clusters(struct super_block *sb,
>          * unusual file system layouts.
>          */
>         if (ext4_block_in_group(sb, ext4_block_bitmap(sb, gdp), block_group)) {
> -               block_cluster = EXT4_B2C(sbi, (start -
> -                                              ext4_block_bitmap(sb, gdp)));
> +               block_cluster = EXT4_B2C(sbi,
> +                                        ext4_block_bitmap(sb, gdp) - start);
>                 if (block_cluster < num_clusters)
>                         block_cluster = -1;
>                 else if (block_cluster == num_clusters) {
> @@ -102,7 +102,7 @@ unsigned ext4_num_overhead_clusters(struct super_block *sb,
>  
>         if (ext4_block_in_group(sb, ext4_inode_bitmap(sb, gdp), block_group)) {
>                 inode_cluster = EXT4_B2C(sbi,
> -                                        start - ext4_inode_bitmap(sb, gdp));
> +                                        ext4_inode_bitmap(sb, gdp) - start);
>                 if (inode_cluster < num_clusters)
>                         inode_cluster = -1;
>                 else if (inode_cluster == num_clusters) {
> @@ -114,7 +114,7 @@ unsigned ext4_num_overhead_clusters(struct super_block *sb,
>         itbl_blk = ext4_inode_table(sb, gdp);
>         for (i = 0; i < sbi->s_itb_per_group; i++) {
>                 if (ext4_block_in_group(sb, itbl_blk + i, block_group)) {
> -                       c = EXT4_B2C(sbi, start - itbl_blk + i);
> +                       c = EXT4_B2C(sbi, itbl_blk + i - start);
>                         if ((c < num_clusters) || (c == inode_cluster) ||
>                             (c == block_cluster) || (c == itbl_cluster))
>                                 continue;




--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o June 7, 2012, 2:10 p.m. UTC | #2
On Thu, Jun 07, 2012 at 04:01:16PM +0200, Sander Eikelenboom wrote:
> 
> BTW: Is there any (real) advantage to recreate the FS using the ext4's flex_bg instead of converting the ext3 to ext4 and using ext3 uninit_bg ?
> 

It speeds up e2fsck times, and it allows for more contiguous free
space in the file system.  So yes, there is an advantage to using a
native ext4 file system --- but of course that's a pain in the tuckus
for people with large RAID arrays.  Which also explains why we were
mainly seeing this bug reported with people using large RAID devices;
those are the people who are most likely to want to do an
upgrade-in-place, because it's too painful to recreate and then copy
the data.  Which is fine; one of the major features of ext4 is to be
able to support upgrades-in-place from ext3.

Unfortunately, while I had native (unconverted) ext3 file systems in
my test matrix, I didn't have converted ext3 file system formats in my
regular regression test suite script.  That is going to be fixed
shortly....

		   	   	   	    - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sander Eikelenboom June 7, 2012, 2:39 p.m. UTC | #3
Thursday, June 7, 2012, 4:10:51 PM, you wrote:

> On Thu, Jun 07, 2012 at 04:01:16PM +0200, Sander Eikelenboom wrote:
>> 
>> BTW: Is there any (real) advantage to recreate the FS using the ext4's flex_bg instead of converting the ext3 to ext4 and using ext3 uninit_bg ?
>> 

> It speeds up e2fsck times, and it allows for more contiguous free
> space in the file system.  So yes, there is an advantage to using a
> native ext4 file system --- but of course that's a pain in the tuckus
> for people with large RAID arrays.  Which also explains why we were
> mainly seeing this bug reported with people using large RAID devices;
> those are the people who are most likely to want to do an
> upgrade-in-place, because it's too painful to recreate and then copy
> the data.  Which is fine; one of the major features of ext4 is to be
> able to support upgrades-in-place from ext3.

Ok thx for the info and the fix !

--
Sander

> Unfortunately, while I had native (unconverted) ext3 file systems in
> my test matrix, I didn't have converted ext3 file system formats in my
> regular regression test suite script.  That is going to be fixed
> shortly....

>                                             - Ted
Kees Cook June 7, 2012, 5:12 p.m. UTC | #4
On Thu, Jun 7, 2012 at 6:52 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> Ext3 filesystems that are converted to use as many ext4 file system
> features as possible will enable uninit_bg to speed up e2fsck times.
> These file systems will have a native ext3 layout of inode tables and
> block allocation bitmaps (as opposed to ext4's flex_bg layout).
> Unfortunately, in these cases, when first allocating a block in an
> uninitialized block group, ext4 would incorrectly calculate the number
> of free blocks in that block group, and then errorneously report that
> the file system was corrupt:
>
> EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd
>
> This problem can be reproduced via:
>
>    mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
>    mount -t ext4 /dev/vdd /mnt
>    fallocate -l 4600m /mnt/test
>
> The problem was caused by a bone headed mistake in the check to see if a
> particular metadata block was part of the block group.
>
> Many thanks to Kees Cook for finding and bisecting the buggy commit
> which introduced this bug (commit fd034a84e1, present since v3.2).
>
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Reported-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: stable@kernel.org
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> Cc: Kees Cook <keescook@chromium.org>

Tested-by: Kees Cook <keescook@chromium.org>

Thanks! I can confirm that this fixes it.

-Kees
diff mbox

Patch

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 99b6324..cee7812 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -90,8 +90,8 @@  unsigned ext4_num_overhead_clusters(struct super_block *sb,
 	 * unusual file system layouts.
 	 */
 	if (ext4_block_in_group(sb, ext4_block_bitmap(sb, gdp), block_group)) {
-		block_cluster = EXT4_B2C(sbi, (start -
-					       ext4_block_bitmap(sb, gdp)));
+		block_cluster = EXT4_B2C(sbi,
+					 ext4_block_bitmap(sb, gdp) - start);
 		if (block_cluster < num_clusters)
 			block_cluster = -1;
 		else if (block_cluster == num_clusters) {
@@ -102,7 +102,7 @@  unsigned ext4_num_overhead_clusters(struct super_block *sb,
 
 	if (ext4_block_in_group(sb, ext4_inode_bitmap(sb, gdp), block_group)) {
 		inode_cluster = EXT4_B2C(sbi,
-					 start - ext4_inode_bitmap(sb, gdp));
+					 ext4_inode_bitmap(sb, gdp) - start);
 		if (inode_cluster < num_clusters)
 			inode_cluster = -1;
 		else if (inode_cluster == num_clusters) {
@@ -114,7 +114,7 @@  unsigned ext4_num_overhead_clusters(struct super_block *sb,
 	itbl_blk = ext4_inode_table(sb, gdp);
 	for (i = 0; i < sbi->s_itb_per_group; i++) {
 		if (ext4_block_in_group(sb, itbl_blk + i, block_group)) {
-			c = EXT4_B2C(sbi, start - itbl_blk + i);
+			c = EXT4_B2C(sbi, itbl_blk + i - start);
 			if ((c < num_clusters) || (c == inode_cluster) ||
 			    (c == block_cluster) || (c == itbl_cluster))
 				continue;