diff mbox

[01/54] libext2fs: Read and write full size inodes

Message ID 20120306235727.11945.11066.stgit@elm3b70.beaverton.ibm.com
State Superseded, archived
Delegated to: Theodore Ts'o
Headers show

Commit Message

Darrick J. Wong March 6, 2012, 11:57 p.m. UTC
Change libext2fs to read and write full-size inodes in preparation for the
metadata checksumming patchset, which will require this.  Due to ABI
compatibility requirements, this change must be hidden from client programs.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
---
 e2fsck/pass1.c       |    3 +
 lib/ext2fs/ext2fsP.h |    2 -
 lib/ext2fs/inode.c   |  139 +++++++++++++++++++++++++++-----------------------
 3 files changed, 79 insertions(+), 65 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Theodore Ts'o March 9, 2012, 11:36 p.m. UTC | #1
On Tue, Mar 06, 2012 at 03:57:27PM -0800, Darrick J. Wong wrote:
> Change libext2fs to read and write full-size inodes in preparation for the
> metadata checksumming patchset, which will require this.  Due to ABI
> compatibility requirements, this change must be hidden from client programs.
> 
> Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>

After applying this first patch, the e2fsprogs regression test suite
blew up spectacularly caused by the malloc-managed free lists pointers
getting corrupted:

*** glibc detected *** /usr/projects/e2fsprogs/e2fsprogs/build/debugfs/debugfs: free(): invalid next size (fast): 0x000000000063f9d0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x77806)[0x7ffff6e4a806]
/lib/libc.so.6(cfree+0x73)[0x7ffff6e510d3]
/usr/projects/e2fsprogs/e2fsprogs/build/lib/libext2fs.so.2(ext2fs_free_mem+0x30)[0x7ffff7bb562f]


Interestingly, valgrind was *not* useful in finding the problem;
apparently it was getting confused by the ext2fs_get_mem()
abstraction, which is unfortunate.  I'll have to look into that at
some point.

Anyway, the problem was in ext2fs_write_inode_full(), and it could be
replicated by simply writing to an inode, i.e.

   mke2fs  -F -O resize_inode -o Linux -b 1024 /tmp/image  16384
   debugfs -w -R "set_inode_field <7> mtime now" /tmp/image

is enough to trigger it.  The problem is with a 128 byte ext2 file
system, ext2fs_write_inode_full() is passed a large inode and so
bufsize is 156, but EXT2_INODE_SIZE(fs->super) is 128.  So at
lib/ext2fs/inode.c:698:

	memcpy(w_inode, inode, bufsize);

you end up writing 156 bytes into a memory buffer that was allocated
to a size of 128 bytes.  Hilarity ensues.

The fix is relatively simple:

	memcpy(w_inode, inode, (bufsize > length) ? length : bufsize);

Anyway, this is *why* running the regression tests are important.
(And why projects which don't have regression test suites are just
asking for trouble.)

They catch all sorts of interesting oversights like this....

     	       	     		    	       - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong March 12, 2012, 10:23 p.m. UTC | #2
On Fri, Mar 09, 2012 at 06:36:31PM -0500, Ted Ts'o wrote:
> On Tue, Mar 06, 2012 at 03:57:27PM -0800, Darrick J. Wong wrote:
> > Change libext2fs to read and write full-size inodes in preparation for the
> > metadata checksumming patchset, which will require this.  Due to ABI
> > compatibility requirements, this change must be hidden from client programs.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
> 
> After applying this first patch, the e2fsprogs regression test suite
> blew up spectacularly caused by the malloc-managed free lists pointers
> getting corrupted:
> 
> *** glibc detected *** /usr/projects/e2fsprogs/e2fsprogs/build/debugfs/debugfs: free(): invalid next size (fast): 0x000000000063f9d0 ***
> ======= Backtrace: =========
> /lib/libc.so.6(+0x77806)[0x7ffff6e4a806]
> /lib/libc.so.6(cfree+0x73)[0x7ffff6e510d3]
> /usr/projects/e2fsprogs/e2fsprogs/build/lib/libext2fs.so.2(ext2fs_free_mem+0x30)[0x7ffff7bb562f]
> 
> 
> Interestingly, valgrind was *not* useful in finding the problem;
> apparently it was getting confused by the ext2fs_get_mem()
> abstraction, which is unfortunate.  I'll have to look into that at
> some point.
> 
> Anyway, the problem was in ext2fs_write_inode_full(), and it could be
> replicated by simply writing to an inode, i.e.
> 
>    mke2fs  -F -O resize_inode -o Linux -b 1024 /tmp/image  16384
>    debugfs -w -R "set_inode_field <7> mtime now" /tmp/image
> 
> is enough to trigger it.  The problem is with a 128 byte ext2 file
> system, ext2fs_write_inode_full() is passed a large inode and so
> bufsize is 156, but EXT2_INODE_SIZE(fs->super) is 128.  So at
> lib/ext2fs/inode.c:698:
> 
> 	memcpy(w_inode, inode, bufsize);
> 
> you end up writing 156 bytes into a memory buffer that was allocated
> to a size of 128 bytes.  Hilarity ensues.
> 
> The fix is relatively simple:
> 
> 	memcpy(w_inode, inode, (bufsize > length) ? length : bufsize);
> 
> Anyway, this is *why* running the regression tests are important.
> (And why projects which don't have regression test suites are just
> asking for trouble.)
> 
> They catch all sorts of interesting oversights like this....

Hmm, good catch.  To be honest I wasn't aware that there /was/ a make check.
I'll contribute my checksum tests when I figure out how to integrate them.

I'll send out patches to fix the other 45 failures this afternoon; I traced
most of them to a mistake I made in e2fsck/rehash.c.

--D

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o March 12, 2012, 10:53 p.m. UTC | #3
On Mon, Mar 12, 2012 at 03:23:25PM -0700, Darrick J. Wong wrote:
> 
> I'll send out patches to fix the other 45 failures this afternoon; I traced
> most of them to a mistake I made in e2fsck/rehash.c.

Do you have a github login?  It will be a lot easier if you just
update the patch queue at:

	https://github.com/tytso/e2fsprogs-cksum-patch-queue

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 5faa093..c31e073 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -691,7 +691,8 @@  void e2fsck_pass1(e2fsck_t ctx)
 	}
 	block_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize * 3,
 						    "block interate buffer");
-	e2fsck_use_inode_shortcuts(ctx, 1);
+	if (EXT2_INODE_SIZE(fs->super) == EXT2_GOOD_OLD_INODE_SIZE)
+		e2fsck_use_inode_shortcuts(ctx, 1);
 	old_op = ehandler_operation(_("opening inode scan"));
 	pctx.errcode = ext2fs_open_inode_scan(fs, ctx->inode_buffer_blocks,
 					      &scan);
diff --git a/lib/ext2fs/ext2fsP.h b/lib/ext2fs/ext2fsP.h
index 729d5c5..4db0ae9 100644
--- a/lib/ext2fs/ext2fsP.h
+++ b/lib/ext2fs/ext2fsP.h
@@ -75,7 +75,7 @@  struct ext2_inode_cache {
 
 struct ext2_inode_cache_ent {
 	ext2_ino_t		ino;
-	struct ext2_inode	inode;
+	struct ext2_inode	*inode;
 };
 
 /* Function prototypes */
diff --git a/lib/ext2fs/inode.c b/lib/ext2fs/inode.c
index 6c524ff..a4f6670 100644
--- a/lib/ext2fs/inode.c
+++ b/lib/ext2fs/inode.c
@@ -74,7 +74,9 @@  errcode_t ext2fs_flush_icache(ext2_filsys fs)
 
 static errcode_t create_icache(ext2_filsys fs)
 {
+	int		i;
 	errcode_t	retval;
+	void		*p;
 
 	if (fs->icache)
 		return 0;
@@ -93,13 +95,20 @@  static errcode_t create_icache(ext2_filsys fs)
 	fs->icache->cache_size = 4;
 	fs->icache->refcount = 1;
 	retval = ext2fs_get_array(fs->icache->cache_size,
-				  sizeof(struct ext2_inode_cache_ent),
+				  sizeof(struct ext2_inode_cache_ent) +
+				  EXT2_INODE_SIZE(fs->super),
 				  &fs->icache->cache);
 	if (retval) {
 		ext2fs_free_mem(&fs->icache->buffer);
 		ext2fs_free_mem(&fs->icache);
 		return retval;
 	}
+
+	for (i = 0, p = (void *)(fs->icache->cache + fs->icache->cache_size);
+	     i < fs->icache->cache_size;
+	     i++, p += EXT2_INODE_SIZE(fs->super))
+		fs->icache->cache[i].inode = p;
+
 	ext2fs_flush_icache(fs);
 	return 0;
 }
@@ -407,6 +416,8 @@  errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan, ext2_ino_t *ino,
 {
 	errcode_t	retval;
 	int		extra_bytes = 0;
+	const int	length = EXT2_INODE_SIZE(scan->fs->super);
+	struct ext2_inode	*iptr = inode;
 
 	EXT2_CHECK_MAGIC(scan, EXT2_ET_MAGIC_INODE_SCAN);
 
@@ -469,6 +480,12 @@  errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan, ext2_ino_t *ino,
 #endif
 	}
 
+	if (bufsize < length) {
+		retval = ext2fs_get_mem(length, &iptr);
+		if (retval)
+			return retval;
+	}
+
 	retval = 0;
 	if (extra_bytes) {
 		memcpy(scan->temp_buffer+extra_bytes, scan->ptr,
@@ -477,26 +494,26 @@  errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan, ext2_ino_t *ino,
 		scan->bytes_left -= scan->inode_size - extra_bytes;
 
 #ifdef WORDS_BIGENDIAN
-		memset(inode, 0, bufsize);
+		memset(iptr, 0, length);
 		ext2fs_swap_inode_full(scan->fs,
-			       (struct ext2_inode_large *) inode,
+			       (struct ext2_inode_large *) iptr,
 			       (struct ext2_inode_large *) scan->temp_buffer,
-			       0, bufsize);
+			       0, length);
 #else
-		*inode = *((struct ext2_inode *) scan->temp_buffer);
+		memcpy(iptr, scan->temp_buffer, length);
 #endif
 		if (scan->scan_flags & EXT2_SF_BAD_EXTRA_BYTES)
 			retval = EXT2_ET_BAD_BLOCK_IN_INODE_TABLE;
 		scan->scan_flags &= ~EXT2_SF_BAD_EXTRA_BYTES;
 	} else {
 #ifdef WORDS_BIGENDIAN
-		memset(inode, 0, bufsize);
+		memset(iptr, 0, length);
 		ext2fs_swap_inode_full(scan->fs,
-				(struct ext2_inode_large *) inode,
+				(struct ext2_inode_large *) iptr,
 				(struct ext2_inode_large *) scan->ptr,
-				0, bufsize);
+				0, length);
 #else
-		memcpy(inode, scan->ptr, bufsize);
+		memcpy(iptr, scan->ptr, length);
 #endif
 		scan->ptr += scan->inode_size;
 		scan->bytes_left -= scan->inode_size;
@@ -507,6 +524,10 @@  errcode_t ext2fs_get_next_inode_full(ext2_inode_scan scan, ext2_ino_t *ino,
 	scan->inodes_left--;
 	scan->current_inode++;
 	*ino = scan->current_inode;
+	if (inode != iptr) {
+		memcpy(inode, iptr, bufsize);
+		ext2fs_free_mem(&iptr);
+	}
 	return retval;
 }
 
@@ -527,8 +548,11 @@  errcode_t ext2fs_read_inode_full(ext2_filsys fs, ext2_ino_t ino,
 	unsigned long 	group, block, offset;
 	char 		*ptr;
 	errcode_t	retval;
-	int 		clen, i, inodes_per_block, length;
+	int		clen, i, inodes_per_block;
 	io_channel	io;
+	int		length = EXT2_INODE_SIZE(fs->super);
+	struct ext2_inode	*iptr;
+	int		cache_slot;
 
 	EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
 
@@ -547,13 +571,10 @@  errcode_t ext2fs_read_inode_full(ext2_filsys fs, ext2_ino_t ino,
 			return retval;
 	}
 	/* Check to see if it's in the inode cache */
-	if (bufsize == sizeof(struct ext2_inode)) {
-		/* only old good inode can be retrieved from the cache */
-		for (i=0; i < fs->icache->cache_size; i++) {
-			if (fs->icache->cache[i].ino == ino) {
-				*inode = fs->icache->cache[i].inode;
-				return 0;
-			}
+	for (i = 0; i < fs->icache->cache_size; i++) {
+		if (fs->icache->cache[i].ino == ino) {
+			memcpy(inode, fs->icache->cache[i].inode, bufsize);
+			return 0;
 		}
 	}
 	if (fs->flags & EXT2_FLAG_IMAGE_FILE) {
@@ -578,11 +599,10 @@  errcode_t ext2fs_read_inode_full(ext2_filsys fs, ext2_ino_t ino,
 	}
 	offset &= (EXT2_BLOCK_SIZE(fs->super) - 1);
 
-	length = EXT2_INODE_SIZE(fs->super);
-	if (bufsize < length)
-		length = bufsize;
+	cache_slot = (fs->icache->cache_last + 1) % fs->icache->cache_size;
+	iptr = fs->icache->cache[cache_slot].inode;
 
-	ptr = (char *) inode;
+	ptr = (char *) iptr;
 	while (length) {
 		clen = length;
 		if ((offset + length) > fs->blocksize)
@@ -604,18 +624,18 @@  errcode_t ext2fs_read_inode_full(ext2_filsys fs, ext2_ino_t ino,
 		ptr += clen;
 		block_nr++;
 	}
+	length = EXT2_INODE_SIZE(fs->super);
 
 #ifdef WORDS_BIGENDIAN
-	ext2fs_swap_inode_full(fs, (struct ext2_inode_large *) inode,
-			       (struct ext2_inode_large *) inode,
-			       0, bufsize);
+	ext2fs_swap_inode_full(fs, (struct ext2_inode_large *) iptr,
+			       (struct ext2_inode_large *) iptr,
+			       0, length);
 #endif
 
-	/* Update the inode cache */
-	fs->icache->cache_last = (fs->icache->cache_last + 1) %
-		fs->icache->cache_size;
-	fs->icache->cache[fs->icache->cache_last].ino = ino;
-	fs->icache->cache[fs->icache->cache_last].inode = *inode;
+	/* Update the inode cache bookkeeping */
+	fs->icache->cache_last = cache_slot;
+	fs->icache->cache[cache_slot].ino = ino;
+	memcpy(inode, iptr, bufsize);
 
 	return 0;
 }
@@ -633,9 +653,10 @@  errcode_t ext2fs_write_inode_full(ext2_filsys fs, ext2_ino_t ino,
 	blk64_t block_nr;
 	unsigned long group, block, offset;
 	errcode_t retval = 0;
-	struct ext2_inode_large temp_inode, *w_inode;
+	struct ext2_inode_large *w_inode;
 	char *ptr;
-	int clen, i, length;
+	int clen, i;
+	int length = EXT2_INODE_SIZE(fs->super);
 
 	EXT2_CHECK_MAGIC(fs, EXT2_ET_MAGIC_EXT2FS_FILSYS);
 
@@ -646,46 +667,43 @@  errcode_t ext2fs_write_inode_full(ext2_filsys fs, ext2_ino_t ino,
 			return retval;
 	}
 
+	if ((ino == 0) || (ino > fs->super->s_inodes_count))
+		return EXT2_ET_BAD_INODE_NUM;
+
+	/* Prepare our shadow buffer for read/modify/byteswap/write */
+	retval = ext2fs_get_mem(length, &w_inode);
+	if (retval)
+		return retval;
+	if (bufsize < length)
+		retval = ext2fs_read_inode_full(fs, ino,
+						(struct ext2_inode *)w_inode,
+						length);
+	if (retval)
+		goto errout;
+
 	/* Check to see if the inode cache needs to be updated */
 	if (fs->icache) {
 		for (i=0; i < fs->icache->cache_size; i++) {
 			if (fs->icache->cache[i].ino == ino) {
-				fs->icache->cache[i].inode = *inode;
+				memcpy(fs->icache->cache[i].inode, inode,
+				       bufsize);
 				break;
 			}
 		}
 	} else {
 		retval = create_icache(fs);
 		if (retval)
-			return retval;
+			goto errout;
 	}
+	memcpy(w_inode, inode, bufsize);
 
-	if (!(fs->flags & EXT2_FLAG_RW))
-		return EXT2_ET_RO_FILSYS;
-
-	if ((ino == 0) || (ino > fs->super->s_inodes_count))
-		return EXT2_ET_BAD_INODE_NUM;
-
-	length = bufsize;
-	if (length < EXT2_INODE_SIZE(fs->super))
-		length = EXT2_INODE_SIZE(fs->super);
-
-	if (length > (int) sizeof(struct ext2_inode_large)) {
-		w_inode = malloc(length);
-		if (!w_inode) {
-			retval = ENOMEM;
-			goto errout;
-		}
-	} else
-		w_inode = &temp_inode;
-	memset(w_inode, 0, length);
+	if (!(fs->flags & EXT2_FLAG_RW)) {
+		retval = EXT2_ET_RO_FILSYS;
+		goto errout;
+	}
 
 #ifdef WORDS_BIGENDIAN
-	ext2fs_swap_inode_full(fs, w_inode,
-			       (struct ext2_inode_large *) inode,
-			       1, bufsize);
-#else
-	memcpy(w_inode, inode, bufsize);
+	ext2fs_swap_inode_full(fs, w_inode, w_inode, 1, length);
 #endif
 
 	group = (ino - 1) / EXT2_INODES_PER_GROUP(fs->super);
@@ -700,10 +718,6 @@  errcode_t ext2fs_write_inode_full(ext2_filsys fs, ext2_ino_t ino,
 
 	offset &= (EXT2_BLOCK_SIZE(fs->super) - 1);
 
-	length = EXT2_INODE_SIZE(fs->super);
-	if (length > bufsize)
-		length = bufsize;
-
 	ptr = (char *) w_inode;
 
 	while (length) {
@@ -736,8 +750,7 @@  errcode_t ext2fs_write_inode_full(ext2_filsys fs, ext2_ino_t ino,
 
 	fs->flags |= EXT2_FLAG_CHANGED;
 errout:
-	if (w_inode && w_inode != &temp_inode)
-		free(w_inode);
+	ext2fs_free_mem(&w_inode);
 	return retval;
 }