Patchwork ext4: handle NULL p_ext in ext4_ext_next_allocated_block()

login
register
mail settings
Submitter Curt Wohlgemuth
Date Oct. 9, 2011, 1:01 a.m.
Message ID <1318122074-16056-1-git-send-email-curtw@google.com>
Download mbox | patch
Permalink /patch/118570/
State Accepted
Headers show

Comments

Curt Wohlgemuth - Oct. 9, 2011, 1:01 a.m.
In ext4_ext_next_allocated_block(), the path[depth] might
have a p_ext that is NULL -- see ext4_ext_binsearch().  In
such a case, dereferencing it will crash the machine.

This patch checks for p_ext == NULL in
ext4_ext_next_allocated_block() before dereferencinging it.

Tested using a hand-crafted an inode with eh_entries == 0 in
an extent block, verified that running FIEMAP on it crashes
without this patch, works fine with it.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
---
 fs/ext4/extents.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
Lukas Czerner - Oct. 10, 2011, 7:19 a.m.
On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:

> In ext4_ext_next_allocated_block(), the path[depth] might
> have a p_ext that is NULL -- see ext4_ext_binsearch().  In
> such a case, dereferencing it will crash the machine.
> 
> This patch checks for p_ext == NULL in
> ext4_ext_next_allocated_block() before dereferencinging it.
> 
> Tested using a hand-crafted an inode with eh_entries == 0 in
> an extent block, verified that running FIEMAP on it crashes
> without this patch, works fine with it.

Hi Curt,

It seems to me that even that the patch fixes the NULL dereference, it
is not a complete solution. Is it possible that in "normal" case p_ext
would be NULL ? I think that this is only possible in extent split/add
case (as noted in ext4_ext_binsearch()) which should be atomic to the
other operations (locked i_mutex?).

However this seems like an inode corruption so we should probably be
more verbose about it and print an appropriate EXT4_ERROR_INODE() or
even better check for the corrupted tree in the ext4_ext_find_extent()
(instead in ext4_ext_map_blocks()), however this will need to distinguish
between the normal and the tree modification case.

Thanks!
-Lukas

> 
> Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> ---
>  fs/ext4/extents.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 57cf568..063a5b8 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
>  	while (depth >= 0) {
>  		if (depth == path->p_depth) {
>  			/* leaf */
> -			if (path[depth].p_ext !=
> +			/* p_ext can be NULL */
> +			if (path[depth].p_ext &&
> +				path[depth].p_ext !=
>  					EXT_LAST_EXTENT(path[depth].p_hdr))
>  			  return le32_to_cpu(path[depth].p_ext[1].ee_block);
>  		} else {
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Curt Wohlgemuth - Oct. 10, 2011, 3:28 p.m.
Hi Lukas:

On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
>
>> In ext4_ext_next_allocated_block(), the path[depth] might
>> have a p_ext that is NULL -- see ext4_ext_binsearch().  In
>> such a case, dereferencing it will crash the machine.
>>
>> This patch checks for p_ext == NULL in
>> ext4_ext_next_allocated_block() before dereferencinging it.
>>
>> Tested using a hand-crafted an inode with eh_entries == 0 in
>> an extent block, verified that running FIEMAP on it crashes
>> without this patch, works fine with it.
>
> Hi Curt,
>
> It seems to me that even that the patch fixes the NULL dereference, it
> is not a complete solution. Is it possible that in "normal" case p_ext
> would be NULL ? I think that this is only possible in extent split/add
> case (as noted in ext4_ext_binsearch()) which should be atomic to the
> other operations (locked i_mutex?).

Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.

We've seen this problem during what appears to be a race between an
inode growth (or truncate?) and another task doing a FIEMAP ioctl.
The real problem is that FIEMAP handing in ext4 is just, well, buggy?

ext4_ext_walk_space() will get the i_data_sem, construct the path
array, then release the semaphore.  But then it does a bazillion
accesses on the extent/header/index pointers in the path array, with
no protection against truncate, growth, or any other changes.  As far
as I can tell, this is the only use of a path array retrieved from
ext4_ext_find_extent() that isn't completely covered by i_data_sem.

> However this seems like an inode corruption so we should probably be
> more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> even better check for the corrupted tree in the ext4_ext_find_extent()
> (instead in ext4_ext_map_blocks()), however this will need to distinguish
> between the normal and the tree modification case.

What we've observed many times is a crash during a FIEMAP call to
ext4_ext_next_allocated_block(), which appears to me to be during a
race with another thread that's splitting the extent tree.  This
causes the machine to go down with the inode in a bad state.  But of
course, fsck won't detect and fix this, so when the machine comes back
up, and a FIEMAP call is done on this same inode -- without any other
threads -- it'll crash again.  Hence a nasty crash loop.

So you're right, in that this isn't the "real solution."  But devising
a safe, non-racy design for FIEMAP is not so simple, unless of course
you want to just hold the i_data_sem during the entire loop body of
ext4_ext_walk_space(), which would be pretty ugly.  Hence the
"band-aid" approach in my patch, which at least seems correct, if not
thorough.

Thanks,
Curt

>
> Thanks!
> -Lukas
>
>>
>> Signed-off-by: Curt Wohlgemuth <curtw@google.com>
>> ---
>>  fs/ext4/extents.c |    4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 57cf568..063a5b8 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
>>       while (depth >= 0) {
>>               if (depth == path->p_depth) {
>>                       /* leaf */
>> -                     if (path[depth].p_ext !=
>> +                     /* p_ext can be NULL */
>> +                     if (path[depth].p_ext &&
>> +                             path[depth].p_ext !=
>>                                       EXT_LAST_EXTENT(path[depth].p_hdr))
>>                         return le32_to_cpu(path[depth].p_ext[1].ee_block);
>>               } else {
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Oct. 26, 2011, 8:38 a.m.
On Sat, Oct 08, 2011 at 06:01:14PM -0700, Curt Wohlgemuth wrote:
> In ext4_ext_next_allocated_block(), the path[depth] might
> have a p_ext that is NULL -- see ext4_ext_binsearch().  In
> such a case, dereferencing it will crash the machine.
> 
> This patch checks for p_ext == NULL in
> ext4_ext_next_allocated_block() before dereferencinging it.
> 
> Tested using a hand-crafted an inode with eh_entries == 0 in
> an extent block, verified that running FIEMAP on it crashes
> without this patch, works fine with it.
> 
> Signed-off-by: Curt Wohlgemuth <curtw@google.com>

Thanks, applied.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 57cf568..063a5b8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1395,7 +1395,9 @@  ext4_ext_next_allocated_block(struct ext4_ext_path *path)
 	while (depth >= 0) {
 		if (depth == path->p_depth) {
 			/* leaf */
-			if (path[depth].p_ext !=
+			/* p_ext can be NULL */
+			if (path[depth].p_ext &&
+				path[depth].p_ext !=
 					EXT_LAST_EXTENT(path[depth].p_hdr))
 			  return le32_to_cpu(path[depth].p_ext[1].ee_block);
 		} else {