Message ID | 20131018044928.7339.30260.stgit@birch.djwong.org |
---|---|
State | Superseded, archived |
On Thu, Oct 17, 2013 at 09:49:28PM -0700, Darrick J. Wong wrote:
> On a FS with a rather large blocksize (> 4K), the old block map
> structure can construct a fat enough "tree" (or whatever we call that
> lopsided thing) that (at least in theory) one could create mappings
> for logical blocks higher than 32 bits.  In practice this doesn't
> happen, but the 'max' and 'incr' variables that the punch helpers use
> will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
> a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
> even if the file is < 16T.  So enlarge the fields to fit.

Hmm.... this brings up the question of whether we should support
inodes that have indirect block maps that result in mappings for
logical blocks > 32-bits.  There is probably a lot of code that
assumes that the logical block number is 32-bits that will break
horribly.

So this brings up a couple of different questions.

#1) Does e2fsck notice, and does it complain if it trips against one
of these?

#2) What should e2fsprogs do when it comes across one of these inodes?
It may be that simply returning an error is enough, once we notice
that it has blocks larger than this.  Would it be cleaner and more
efficient for the punch code to simply make sure that it stops before
the logical block number overflows?  64-bit variables have a cost,
especially on 32-bit machines.

- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Wed, Oct 23, 2013 at 08:08:34PM -0400, Theodore Ts'o wrote:
> On Thu, Oct 17, 2013 at 09:49:28PM -0700, Darrick J. Wong wrote:
> > On a FS with a rather large blocksize (> 4K), the old block map
> > structure can construct a fat enough "tree" (or whatever we call that
> > lopsided thing) that (at least in theory) one could create mappings
> > for logical blocks higher than 32 bits.  In practice this doesn't
> > happen, but the 'max' and 'incr' variables that the punch helpers use
> > will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
> > a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
> > even if the file is < 16T.  So enlarge the fields to fit.
>
> Hmm.... this brings up the question of whether we should support
> inodes that have indirect block maps that result in mappings for
> logical blocks > 32-bits.  There is probably a lot of code that
> assumes that the logical block number is 32-bits that will break
> horribly.

I'm not sure.  The way I noticed this brokenness was by creating a FS
with 64k blocks, sparse-writing a range of blocks at lblk 268451854 (to
force it to create a tind map) and then trying to punch it.  The file
itself had a size of just under 16T.  e2fsck seemed fine with the file,
and as you can see the lblk number was nowhere close to 2^32.

I think the problem is that the punch code is using two variables, max
and incr, as upper limits on how many blocks it should try to punch for
a given level.  Since the variables aren't wide enough, they overflow
(effectively becoming zero) and then things like
(offset + incr(0) <= start) become true and so it quits early.

---

If I use fuse2fs to create a non-extent file that exceeds 2^32 blocks
(and blocksize > 4k), fsck doesn't complain.  If the blocksize is 4k or
less, the kernel refuses to write the file, but fuse2fs creates a
garbled filesystem (with enormous i_size but no blocks mapped) and fsck
complains.  Hmm, I'll look into that.

--D

> So this brings up a couple of different questions.
>
> #1) Does e2fsck notice, and does it complain if it trips against one
> of these?
>
> #2) What should e2fsprogs do when it comes across one of these inodes?
> It may be that simply returning an error is enough, once we notice
> that it has blocks larger than this.  Would it be cleaner and more
> efficient for the punch code to simply make sure that it stops before
> the logical block number overflows?  64-bit variables have a cost,
> especially on 32-bit machines.
>
> - Ted

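The overflow described in the message above can be reproduced with plain arithmetic. The following is a minimal sketch, not code from e2fsprogs: the helper names `incr64`, `incr32`, and `skips_entry` are hypothetical, and the 32-bit case is modelled by truncating a 64-bit result, because shifting a 32-bit value by 32 or more bits directly is undefined behaviour in C. It assumes the stride formula from the patch, `1 << ((block_size_bits - 2) * level)`, and the early-exit comparison quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of logical blocks covered by one entry at
 * the given indirection level, computed in 64 bits. */
static uint64_t incr64(unsigned block_size_bits, unsigned level)
{
	unsigned shift = (block_size_bits - 2) * level;

	assert(shift < 64);	/* keep the shift well defined */
	return 1ULL << shift;
}

/* The same value squeezed into a 32-bit variable (illustrative model of
 * a too-narrow 'incr'; the high bits are simply lost). */
static uint32_t incr32(unsigned block_size_bits, unsigned level)
{
	return (uint32_t)incr64(block_size_bits, level);
}

/* The comparison quoted in the message: nonzero means the entry at
 * 'offset' is treated as lying entirely before 'start'. */
static int skips_entry(uint64_t offset, uint64_t incr, uint64_t start)
{
	return offset + incr <= start;
}
```

With 64K blocks (`block_size_bits` of 16) and level 3, `incr64` yields 2^42 while `incr32` yields 0, so `skips_entry(0, incr32(16, 3), 2)` is true and the entry is passed over, whereas the 64-bit value makes the same test false. With 4K blocks the level 3 stride is 2^30 and still fits in 32 bits, which matches the observation that only block sizes above 4K are affected.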
diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
index 4471f46..790a0ad8 100644
--- a/lib/ext2fs/punch.c
+++ b/lib/ext2fs/punch.c
@@ -50,15 +50,16 @@ static errcode_t ind_punch(ext2_filsys fs, struct ext2_inode *inode,
 			   blk_t start, blk_t count, int max)
 {
 	errcode_t retval;
-	blk_t b, offset;
-	int i, incr;
+	blk_t b;
+	int i;
+	blk64_t offset, incr;
 	int freed = 0;
 
 #ifdef PUNCH_DEBUG
 	printf("Entering ind_punch, level %d, start %u, count %u, "
 	       "max %d\n", level, start, count, max);
 #endif
-	incr = 1 << ((EXT2_BLOCK_SIZE_BITS(fs->super)-2)*level);
+	incr = 1ULL << ((EXT2_BLOCK_SIZE_BITS(fs->super)-2)*level);
 	for (i=0, offset=0; i < max; i++, p++, offset += incr) {
 		if (offset >= start + count)
 			break;
@@ -87,7 +88,7 @@ static errcode_t ind_punch(ext2_filsys fs, struct ext2_inode *inode,
 			continue;
 		}
 #ifdef PUNCH_DEBUG
-		printf("Freeing block %u (offset %d)\n", b, offset);
+		printf("Freeing block %u (offset %llu)\n", b, offset);
 #endif
 		ext2fs_block_alloc_stats(fs, b, -1);
 		*p = 0;
@@ -108,7 +109,7 @@ static errcode_t ext2fs_punch_ind(ext2_filsys fs, struct ext2_inode *inode,
 	int num = EXT2_NDIR_BLOCKS;
 	blk_t *bp = inode->i_block;
 	blk_t addr_per_block;
-	blk_t max = EXT2_NDIR_BLOCKS;
+	blk64_t max = EXT2_NDIR_BLOCKS;
 
 	if (!block_buf) {
 		retval = ext2fs_get_array(3, fs->blocksize, &buf);
@@ -119,10 +120,10 @@ static errcode_t ext2fs_punch_ind(ext2_filsys fs, struct ext2_inode *inode,
 
 	addr_per_block = (blk_t) fs->blocksize >> 2;
 
-	for (level=0; level < 4; level++, max *= addr_per_block) {
+	for (level = 0; level < 4; level++, max *= (blk64_t)addr_per_block) {
 #ifdef PUNCH_DEBUG
 		printf("Main loop level %d, start %u count %u "
-		       "max %d num %d\n", level, start, count, max, num);
+		       "max %llu num %d\n", level, start, count, max, num);
 #endif
 		if (start < max) {
 			retval = ind_punch(fs, inode, block_buf, bp, level,

On a FS with a rather large blocksize (> 4K), the old block map
structure can construct a fat enough "tree" (or whatever we call that
lopsided thing) that (at least in theory) one could create mappings
for logical blocks higher than 32 bits.  In practice this doesn't
happen, but the 'max' and 'incr' variables that the punch helpers use
will overflow because the BLOCK_SIZE_BITS shifts are too large to fit
a 32-bit variable.  This causes punch to fail on TIND-mapped blocks
even if the file is < 16T.  So enlarge the fields to fit.

(Yes this is an obscure corner case...)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/punch.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
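
As a rough check on the claim that block sizes above 4K can map logical blocks beyond 32 bits, the capacity of the old block map can be computed directly. This is an illustrative sketch under stated assumptions rather than anything taken from the patch: it assumes the classic ext2 layout of 12 direct pointers plus one single, one double, and one triple indirect block, with 4-byte block numbers, and the helper names `blockmap_capacity` and `exceeds_32bit_lblk` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NDIR_BLOCKS 12	/* direct block pointers in the inode */

/* Hypothetical helper: how many logical blocks the old block map can
 * address for a given block size, computed in 64 bits. */
static uint64_t blockmap_capacity(uint32_t blocksize)
{
	uint64_t n = blocksize >> 2;	/* 4-byte block numbers per block */

	assert(blocksize <= 65536);	/* keeps n*n*n inside 64 bits */
	return NDIR_BLOCKS + n + n * n + n * n * n;
}

/* Nonzero when some mappable logical block number needs more than 32 bits. */
static int exceeds_32bit_lblk(uint32_t blocksize)
{
	return blockmap_capacity(blocksize) > (1ULL << 32);
}
```

Under these assumptions a 4K block size tops out at 1074791436 mappable blocks, comfortably below 2^32, while 8K already exceeds it because the triple indirect level alone covers 2^33 blocks.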