Patchwork ext4: handle NULL p_ext in ext4_ext_next_allocated_block()

login
register
mail settings
Submitter Dmitri Monakho
Date Oct. 11, 2011, 7:01 a.m.
Message ID <87ipnwnh96.fsf@dmbot.sw.ru>
Download mbox | patch
Permalink /patch/118863/
State New
Headers show

Comments

Dmitri Monakho - Oct. 11, 2011, 7:01 a.m.
On Mon, 10 Oct 2011 08:28:23 -0700, Curt Wohlgemuth <curtw@google.com> wrote:
> Hi Lukas:
> 
> On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> > On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
> >
> >> In ext4_ext_next_allocated_block(), the path[depth] might
> >> have a p_ext that is NULL -- see ext4_ext_binsearch().  In
> >> such a case, dereferencing it will crash the machine.
> >>
> >> This patch checks for p_ext == NULL in
> >> ext4_ext_next_allocated_block() before dereferencinging it.
> >>
> >> Tested using a hand-crafted an inode with eh_entries == 0 in
> >> an extent block, verified that running FIEMAP on it crashes
> >> without this patch, works fine with it.
> >
> > Hi Curt,
> >
> > It seems to me that even that the patch fixes the NULL dereference, it
> > is not a complete solution. Is it possible that in "normal" case p_ext
> > would be NULL ? I think that this is only possible in extent split/add
> > case (as noted in ext4_ext_binsearch()) which should be atomic to the
> > other operations (locked i_mutex?).
> 
> Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.
> 
> We've seen this problem during what appears to be a race between an
> inode growth (or truncate?) and another task doing a FIEMAP ioctl.
> The real problem is that FIEMAP handing in ext4 is just, well, buggy?
Wow, IMHO it not just buggy, it is obviously incorrect, IMHO it is more
fair just return -ENOTSUPP, at least it is much safer.
Yes calling FIEMAP and truncate/write in parallel is stupid, but not
prohibited. 
> 
> ext4_ext_walk_space() will get the i_data_sem, construct the path
> array, then release the semaphore.  But then it does a bazillion
> accesses on the extent/header/index pointers in the path array, with
> no protection against truncate, growth, or any other changes.  As far
> as I can tell, this is the only use of a path array retrieved from
> ext4_ext_find_extent() that isn't completely covered by i_data_sem.
In that case i_data sem protects us from nothing. Path collected can
simply disappear under us. And in fact i don't understand the
reason why we drop i_data_sem too soon. Are any reason to do that?
Seems like following patch should fix the issue.
From 12cd56ccd86c4d132f186034a9c11b0a2441a19f Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Tue, 11 Oct 2011 10:44:31 +0400
Subject: [PATCH] ext4: do not drop i_data_sem too soon

Path returned from ext4_find_extent is valid only while we hold
i_data_sem, so we can drop it only after we nolonger need it.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)
Theodore Ts'o - Oct. 26, 2011, 8:26 a.m.
On Tue, Oct 11, 2011 at 11:01:57AM +0400, Dmitry Monakhov wrote:
> > ext4_ext_walk_space() will get the i_data_sem, construct the path
> > array, then release the semaphore.  But then it does a bazillion
> > accesses on the extent/header/index pointers in the path array, with
> > no protection against truncate, growth, or any other changes.  As far
> > as I can tell, this is the only use of a path array retrieved from
> > ext4_ext_find_extent() that isn't completely covered by i_data_sem.
>
> In that case i_data sem protects us from nothing. Path collected can
> simply disappear under us. And in fact i don't understand the
> reason why we drop i_data_sem too soon. Are any reason to do that?

The one concern I have is that I don't want FIEMAP to slow down "real"
ext4 processing.  So what I've hoping we'll be able to do is use a
seq_lock sort of design, where if the pointer changes out from under
us, FIEMAP is forced to redo its sampling.  But if there is some crazy
userspace program which is calling FIEMAP all the time, I'd much
rather that it not block ext4_map_blocks() if possible (which is what
I using a seqlock to protect the FIEMAP routines would help).

  	  	     	     	 	- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index da4583f..c716a1f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1847,23 +1847,32 @@  static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
 		num = last - block;
 		/* find extent for this block */
 		down_read(&EXT4_I(inode)->i_data_sem);
+		if (ext_depth(inode) != depth) {
+			/* depth was changed. we have to realloc path */
+			kfree(path);
+			path = NULL;
+		}
+
 		path = ext4_ext_find_extent(inode, block, path);
-		up_read(&EXT4_I(inode)->i_data_sem);
 		if (IS_ERR(path)) {
 			err = PTR_ERR(path);
+			up_read(&EXT4_I(inode)->i_data_sem);
 			path = NULL;
 			break;
 		}
 
 		depth = ext_depth(inode);
 		if (unlikely(path[depth].p_hdr == NULL)) {
+			up_read(&EXT4_I(inode)->i_data_sem);
 			EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
 			err = -EIO;
 			break;
 		}
+
 		ex = path[depth].p_ext;
 		next = ext4_ext_next_allocated_block(path);
-
+		up_read(&EXT4_I(inode)->i_data_sem);
+		ext4_ext_drop_refs(path);
 		exists = 0;
 		if (!ex) {
 			/* there is no extent yet, so try to allocate
@@ -1915,7 +1924,6 @@  static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
 			break;
 		}
 		err = func(inode, next, &cbex, ex, cbdata);
-		ext4_ext_drop_refs(path);
 
 		if (err < 0)
 			break;
@@ -1927,12 +1935,6 @@  static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
 			break;
 		}
 
-		if (ext_depth(inode) != depth) {
-			/* depth was changed. we have to realloc path */
-			kfree(path);
-			path = NULL;
-		}
-
 		block = cbex.ec_block + cbex.ec_len;
 	}