diff mbox

ext4: don't cache out of order extents

Message ID 1382002073-27862-1-git-send-email-guaneryu@gmail.com
State Superseded, archived
Headers show

Commit Message

Eryu Guan Oct. 17, 2013, 9:27 a.m. UTC
A corrupted ext4 may have out of order leaf extents, i.e.

extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
             ^^^^ overlap with previous extent

Reading such extent could hit BUG_ON() in ext4_es_cache_extent().

	BUG_ON(end < lblk);

The problem is that __read_extent_tree_block() tries to cache holes as
well but assumes 'lblk' is greater than 'prev' and passes underflowed
length to ext4_es_cache_extent().

I hit this when fuzz testing ext4, and am able to reproduce it by
modifying the on-disk extent by hand.

Ran xfstests on patched ext4 and no regression.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
---
 fs/ext4/extents.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Lukas Czerner Oct. 17, 2013, 1:44 p.m. UTC | #1
On Thu, 17 Oct 2013, Eryu Guan wrote:

> Date: Thu, 17 Oct 2013 17:27:53 +0800
> From: Eryu Guan <guaneryu@gmail.com>
> To: linux-ext4@vger.kernel.org
> Cc: Eryu Guan <guaneryu@gmail.com>, Theodore Ts'o <tytso@mit.edu>
> Subject: [PATCH] ext4: don't cache out of order extents
> 
> A corrupted ext4 may have out of order leaf extents, i.e.
> 
> extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
> extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
>              ^^^^ overlap with previous extent
> 
> Reading such extent could hit BUG_ON() in ext4_es_cache_extent().
> 
> 	BUG_ON(end < lblk);
> 
> The problem is that __read_extent_tree_block() tries to cache holes as
> well but assumes 'lblk' is greater than 'prev' and passes underflowed
> length to ext4_es_cache_extent().
> 
> I hit this when fuzz testing ext4, and am able to reproduce it by
> modifying the on-disk extent by hand.
> 
> Ran xfstests on patched ext4 and no regression.

So what will happen with the file system with this patch when
presented with such corruption ?

It seems to me that ext4_es_cache_extent() will happily skip this
extent because it will find that this particular offset is already
in the tree. Hence we'll have a gap in the status tree which really
should not be there and I suspect that something bad will happen.

I think that we should deal with this corruption immediately when we
spot it there, not just hide it.

Thanks!
-Lukas

> 
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Signed-off-by: Eryu Guan <guaneryu@gmail.com>
> ---
>  fs/ext4/extents.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 54d52af..c9ebcb9 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -503,7 +503,7 @@ __read_extent_tree_block(const char *function, unsigned int line,
>  			ext4_lblk_t lblk = le32_to_cpu(ex->ee_block);
>  			int len = ext4_ext_get_actual_len(ex);
>  
> -			if (prev && (prev != lblk))
> +			if (prev && (prev < lblk))
>  				ext4_es_cache_extent(inode, prev,
>  						     lblk - prev, ~0,
>  						     EXTENT_STATUS_HOLE);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Oct. 17, 2013, 2:40 p.m. UTC | #2
On Thu, Oct 17, 2013 at 03:44:35PM +0200, Lukáš Czerner wrote:
> 
> So what will happen with the file system with this patch when
> presented with such corruption ?
> 
> It seems to me that ext4_es_cache_extent() will happily skip this
> extent because it will find that this particular offset is already
> in the tree. Hence we'll have a gap in the status tree which really
> should not be there and I suspect that something bad will happen.

Ah, I see what's going wrong.  __read_extent_tree_block() calls
__ext4_ext_check() which is supposed to validate that extent tree
block is valid.  The __ext4_ext_check() function calls
ext4_valid_extent_entries() which is supposed to validate the
individual entries in the extent.  However, it is not checking to make
sure there are no overlapping extents.  We should add that check to
ext4_valid_extent_entries(); that way, we will call ext4_error_inode()
to mark the file system as corrupted.

Eryu's patch, or something like it, will still be needed so that in
the case of errors=countinue, we don't end up calling BUG_ON().

Thanks for finding this!

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan Oct. 17, 2013, 3:06 p.m. UTC | #3
On Thu, Oct 17, 2013 at 03:44:35PM +0200, Lukáš Czerner wrote:
> On Thu, 17 Oct 2013, Eryu Guan wrote:
> 
> > Date: Thu, 17 Oct 2013 17:27:53 +0800
> > From: Eryu Guan <guaneryu@gmail.com>
> > To: linux-ext4@vger.kernel.org
> > Cc: Eryu Guan <guaneryu@gmail.com>, Theodore Ts'o <tytso@mit.edu>
> > Subject: [PATCH] ext4: don't cache out of order extents
> > 
> > A corrupted ext4 may have out of order leaf extents, i.e.
> > 
> > extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
> > extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
> >              ^^^^ overlap with previous extent
> > 
> > Reading such extent could hit BUG_ON() in ext4_es_cache_extent().
> > 
> > 	BUG_ON(end < lblk);
> > 
> > The problem is that __read_extent_tree_block() tries to cache holes as
> > well but assumes 'lblk' is greater than 'prev' and passes underflowed
> > length to ext4_es_cache_extent().
> > 
> > I hit this when fuzz testing ext4, and am able to reproduce it by
> > modifying the on-disk extent by hand.
> > 
> > Ran xfstests on patched ext4 and no regression.
> 
> So what will happen with the file system with this patch when
> presented with such corruption ?
> 
> It seems to me that ext4_es_cache_extent() will happily skip this
> extent because it will find that this particular offset is already
> in the tree. Hence we'll have a gap in the status tree which really
> should not be there and I suspect that something bad will happen.
> 
> I think that we should deal with this corruption immediately when we
> spot it there, not just hide it.

Yes, agreed, we should validate the extent not cover the corruption as
Ted pointed out. Don't know why I didn't think about it more in the
first place..

Thanks!

Eryu Guan
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner Oct. 17, 2013, 3:22 p.m. UTC | #4
On Thu, 17 Oct 2013, Theodore Ts'o wrote:

> Date: Thu, 17 Oct 2013 10:40:44 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Eryu Guan <guaneryu@gmail.com>, linux-ext4@vger.kernel.org
> Subject: Re: [PATCH] ext4: don't cache out of order extents
> 
> On Thu, Oct 17, 2013 at 03:44:35PM +0200, Lukáš Czerner wrote:
> > 
> > So what will happen with the file system with this patch when
> > presented with such corruption ?
> > 
> > It seems to me that ext4_es_cache_extent() will happily skip this
> > extent because it will find that this particular offset is already
> > in the tree. Hence we'll have a gap in the status tree which really
> > should not be there and I suspect that something bad will happen.
> 
> Ah, I see what's going wrong.  __read_extent_tree_block() calls
> __ext4_ext_check() which is supposed to validate that extent tree
> block is valid.  The __ext4_ext_check() function calls
> ext4_valid_extent_entries() which is supposed to validate the
> individual entries in the extent.  However, it is not checking to make
> sure there are no overlapping extents.  We should add that check to
> ext4_valid_extent_entries(); that way, we will call ext4_error_inode()
> to mark the file system as corrupted.

I agree, since ext4_ext_check() should be really only used when
reading data from disk. That said, we might actually remove the
check from ext4_ext_precache() and ext4_ext_remove_space() because
it seems to be that the check has already been done in ext4_iget()
and it should be enough to check it once when reading from disk,
right ?

> 
> Eryu's patch, or something like it, will still be needed so that in
> the case of errors=countinue, we don't end up calling BUG_ON().

Hmm shouldn't we avoid that data in the case that it's corrupted
rather than using it ? It seems like this is what the code would do
anyway even with errors=continue when __ext4_ext_check() returns
error.

Thanks!
-Lukas

> 
> Thanks for finding this!
> 
> 						- Ted
>
Theodore Ts'o Oct. 17, 2013, 3:58 p.m. UTC | #5
On Thu, Oct 17, 2013 at 05:22:58PM +0200, Lukáš Czerner wrote:
> 
> I agree, since ext4_ext_check() should be really only used when
> reading data from disk. That said, we might actually remove the
> check from ext4_ext_precache() and ext4_ext_remove_space() because
> it seems to be that the check has already been done in ext4_iget()
> and it should be enough to check it once when reading from disk,
> right ?

Yes, since we do call ext4_ext_check() in ext4_iget() to verify the
root node of the extent tree located in ei->i_blocks[], it's strictly
speaking not necessary.  OTOH, there are at most four entries in
i_blocks[] that need to be verified, and it's a bit easier for the
contents of i_blocks[] to get corrupted by buggy code, so it's a
toss-up whether it's really worth it to remove it from those two
places, which aren't really hotspots.  It could be argued that are
plenty of other places that where we're not validating the root extent
tree node, so we might as well remove it from those two functions,
though.

> > Eryu's patch, or something like it, will still be needed so that in
> > the case of errors=countinue, we don't end up calling BUG_ON().
> 
> Hmm shouldn't we avoid that data in the case that it's corrupted
> rather than using it ? It seems like this is what the code would do
> anyway even with errors=continue when __ext4_ext_check() returns
> error.

Hmm, maybe we should set a flag indicating that the inode is bad, and
then cause attempts to read or write the contents of that inode should
return EIO.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 54d52af..c9ebcb9 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -503,7 +503,7 @@  __read_extent_tree_block(const char *function, unsigned int line,
 			ext4_lblk_t lblk = le32_to_cpu(ex->ee_block);
 			int len = ext4_ext_get_actual_len(ex);
 
-			if (prev && (prev != lblk))
+			if (prev && (prev < lblk))
 				ext4_es_cache_extent(inode, prev,
 						     lblk - prev, ~0,
 						     EXTENT_STATUS_HOLE);