Patchwork [05/25] libext2fs: don't overflow when punching indirect blocks with large blocks

Submitter Darrick J. Wong
Date Oct. 18, 2013, 4:49 a.m.
Message ID <20131018044928.7339.30260.stgit@birch.djwong.org>
Permalink /patch/284412/
State Superseded

Comments

Darrick J. Wong - Oct. 18, 2013, 4:49 a.m.
On a FS with a rather large blocksize (> 4K), the old block map
structure can construct a fat enough "tree" (or whatever we call that
lopsided thing) that (at least in theory) one could create mappings
for logical blocks higher than 32 bits.  In practice this doesn't
happen, but the 'max' and 'incr' variables that the punch helpers use
will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
even if the file is < 16T.  So enlarge the fields to fit.

(Yes this is an obscure corner case...)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/punch.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Oct. 24, 2013, 12:08 a.m.
On Thu, Oct 17, 2013 at 09:49:28PM -0700, Darrick J. Wong wrote:
> On a FS with a rather large blocksize (> 4K), the old block map
> structure can construct a fat enough "tree" (or whatever we call that
> lopsided thing) that (at least in theory) one could create mappings
> for logical blocks higher than 32 bits.  In practice this doesn't
> happen, but the 'max' and 'incr' variables that the punch helpers use
> will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
> a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
> even if the file is < 16T.  So enlarge the fields to fit.

Hmm.... this brings up the question of whether we should support
inodes that have indirect block maps that result in mappings for
logical blocks > 32-bits.  There is probably a lot of code that
assumes that the logical block number is 32-bits that will break
horribly.

So this brings up a couple of different questions.

#1) Does e2fsck notice, and does it complain if it trips against one
of these?

#2) What should e2fsprogs do when it comes across one of these inodes?
It may be that simply returning an error is enough, once we notice
that it has blocks larger than this.  Would it be cleaner and more
efficient for the punch code to simply make sure that it stops before
the logical block number overflows?  64-bit variables have a cost,
especially on 32-bit machines.
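
One way to picture the "stop before the logical block number overflows"
option is a helper that keeps everything in 32 bits and saturates instead
of wrapping.  This is only a sketch under assumptions: span_clamped() is a
hypothetical name, not an e2fsprogs function, and the callers in punch.c
would still need their own checks (for example, so that adding the span to
a 32-bit offset does not wrap).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of e2fsprogs: number of logical blocks
 * covered by one block pointer at the given indirection level, clamped
 * to UINT32_MAX when the true value does not fit in 32 bits.
 * block_size_bits is log2 of the block size; level is expected in 0..3.
 */
static uint32_t span_clamped(unsigned block_size_bits, int level)
{
	/* Each indirect block holds 2^(block_size_bits - 2) 4-byte pointers. */
	unsigned shift = (block_size_bits - 2) * (unsigned)level;

	if (shift >= 32)
		return UINT32_MAX;	/* saturate rather than overflow */
	return (uint32_t)1 << shift;
}
```

With 64k blocks (block_size_bits = 16) the level-3 shift is 42, so the
helper returns UINT32_MAX; with 4k blocks the level-3 shift is 30 and the
value still fits.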

					- Ted
Darrick J. Wong - Dec. 4, 2013, 4:40 a.m.
On Wed, Oct 23, 2013 at 08:08:34PM -0400, Theodore Ts'o wrote:
> On Thu, Oct 17, 2013 at 09:49:28PM -0700, Darrick J. Wong wrote:
> > On a FS with a rather large blocksize (> 4K), the old block map
> > structure can construct a fat enough "tree" (or whatever we call that
> > lopsided thing) that (at least in theory) one could create mappings
> > for logical blocks higher than 32 bits.  In practice this doesn't
> > happen, but the 'max' and 'incr' variables that the punch helpers use
> > will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
> > a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
> > even if the file is < 16T.  So enlarge the fields to fit.
> 
> Hmm.... this brings up the question of whether we should support
> inodes that have indirect block maps that result in mappings for
> logical blocks > 32-bits.  There is probably a lot of code that
> assumes that the logical block number is 32-bits that will break
> horribly.

I'm not sure.  The way I noticed this brokenness was by creating a FS with 64k
blocks, sparse-writing a range of blocks at lblk 268451854 (to force it to
create a tind map) and then trying to punch it.  The file itself had a size of
just under 16T.  e2fsck seemed fine with the file, and as you can see the lblk
number was nowhere close to 2^32.
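
To see why lblk 268451854 lands in the triply-indirect range on a 64k-block
filesystem, the layout arithmetic can be sketched as follows.  The helper
name first_lblk_at_level() is made up for illustration and is not part of
libext2fs; it assumes 12 direct pointers and 4-byte block pointers.

```c
#include <stdint.h>

/* Number of direct block pointers in an ext2 inode (EXT2_NDIR_BLOCKS). */
#define NDIR_BLOCKS 12

/*
 * Hypothetical helper, not part of libext2fs: return the first logical
 * block mapped through the indirect tree of the given level
 * (1 = IND, 2 = DIND, 3 = TIND) for a block size given in bytes.
 * 64-bit arithmetic is used so the intermediate products cannot wrap.
 */
static uint64_t first_lblk_at_level(uint32_t blocksize, int level)
{
	uint64_t addr_per_block = blocksize / 4;	/* 4-byte pointers */
	uint64_t first = NDIR_BLOCKS;	/* IND starts after the direct blocks */
	uint64_t span = addr_per_block;	/* blocks covered by the current tree */
	int i;

	for (i = 1; i < level; i++) {
		first += span;		/* skip everything the lower tree maps */
		span *= addr_per_block;
	}
	return first;
}
```

For a 64k block size this gives 12 + 16384 + 16384^2 = 268451852 as the
first TIND-mapped block, so lblk 268451854 sits just inside the TIND range.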

I think the problem is that the punch code is using two variables max and incr
as upper limits on how many blocks it should try to punch for a given level.
Since the variables aren't wide enough, they overflow (effectively becoming
zero) and then things like (offset + incr(0) <= start) become true and so it
quits early.
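
The overflow described above can be modeled in a few lines.  This is a
sketch, not the punch.c code: span64() and span32() are hypothetical names,
and span32() models the narrow variable by keeping only the low 32 bits of
the true value (the original expression shifts a plain int, which C leaves
undefined for shift counts this large, so "becomes zero" is the observed
behaviour rather than a guarantee).

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only.
 * span64(): logical blocks covered by one pointer at the given level,
 * computed in 64 bits the way the patched code does with 1ULL.
 * block_size_bits is log2 of the block size; level is expected in 0..3.
 */
static uint64_t span64(unsigned block_size_bits, int level)
{
	return 1ULL << ((block_size_bits - 2) * (unsigned)level);
}

/*
 * span32(): the same value after being stored in a 32-bit variable,
 * modeled here as truncation to the low 32 bits (an assumption).
 */
static uint32_t span32(unsigned block_size_bits, int level)
{
	return (uint32_t)span64(block_size_bits, level);
}
```

With 64k blocks (block_size_bits = 16) the level-3 shift is (16 - 2) * 3 =
42, so span64() yields 2^42 while span32() yields 0.  With 4k blocks the
shift is 30 and both agree; 8k blocks already give a shift of 33, which
matches the "> 4K" condition in the patch description.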

---

If I use fuse2fs to create a non-extent file that exceeds 2^32 blocks (and
blocksize > 4k), fsck doesn't complain.

If the blocksize is 4k or less, the kernel refuses to write the file, but
fuse2fs creates a garbled filesystem (with enormous i_size but no blocks
mapped) and fsck complains.  Hmm, I'll look into that.

--D

> 
> So this brings up a couple of different questions.
> 
> #1) Does e2fsck notice, and does it complain if it trips against one
> of these?
> 
> #2) What should e2fsprogs do when it comes across one of these inodes?
> It may be that simply returning an error is enough, once we notice
> that it has blocks larger than this.  Would it be cleaner and more
> efficient for the punch code to simply make sure that it stops before
> the logical block number overflows?  64-bit variables have a cost,
> especially on 32-bit machines.
> 
> 					- Ted

Patch

diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
index 4471f46..790a0ad8 100644
--- a/lib/ext2fs/punch.c
+++ b/lib/ext2fs/punch.c
@@ -50,15 +50,16 @@  static errcode_t ind_punch(ext2_filsys fs, struct ext2_inode *inode,
 			   blk_t start, blk_t count, int max)
 {
 	errcode_t	retval;
-	blk_t		b, offset;
-	int		i, incr;
+	blk_t		b;
+	int		i;
+	blk64_t		offset, incr;
 	int		freed = 0;
 
 #ifdef PUNCH_DEBUG
 	printf("Entering ind_punch, level %d, start %u, count %u, "
 	       "max %d\n", level, start, count, max);
 #endif
-	incr = 1 << ((EXT2_BLOCK_SIZE_BITS(fs->super)-2)*level);
+	incr = 1ULL << ((EXT2_BLOCK_SIZE_BITS(fs->super)-2)*level);
 	for (i=0, offset=0; i < max; i++, p++, offset += incr) {
 		if (offset >= start + count)
 			break;
@@ -87,7 +88,7 @@  static errcode_t ind_punch(ext2_filsys fs, struct ext2_inode *inode,
 				continue;
 		}
 #ifdef PUNCH_DEBUG
-		printf("Freeing block %u (offset %d)\n", b, offset);
+		printf("Freeing block %u (offset %llu)\n", b, offset);
 #endif
 		ext2fs_block_alloc_stats(fs, b, -1);
 		*p = 0;
@@ -108,7 +109,7 @@  static errcode_t ext2fs_punch_ind(ext2_filsys fs, struct ext2_inode *inode,
 	int			num = EXT2_NDIR_BLOCKS;
 	blk_t			*bp = inode->i_block;
 	blk_t			addr_per_block;
-	blk_t			max = EXT2_NDIR_BLOCKS;
+	blk64_t			max = EXT2_NDIR_BLOCKS;
 
 	if (!block_buf) {
 		retval = ext2fs_get_array(3, fs->blocksize, &buf);
@@ -119,10 +120,10 @@  static errcode_t ext2fs_punch_ind(ext2_filsys fs, struct ext2_inode *inode,
 
 	addr_per_block = (blk_t) fs->blocksize >> 2;
 
-	for (level=0; level < 4; level++, max *= addr_per_block) {
+	for (level = 0; level < 4; level++, max *= (blk64_t)addr_per_block) {
 #ifdef PUNCH_DEBUG
 		printf("Main loop level %d, start %u count %u "
-		       "max %d num %d\n", level, start, count, max, num);
+		       "max %llu num %d\n", level, start, count, max, num);
 #endif
 		if (start < max) {
 			retval = ind_punch(fs, inode, block_buf, bp, level,