Message ID | 1382002073-27862-1-git-send-email-guaneryu@gmail.com |
---|---|
State | Superseded, archived |
Headers | show |
On Thu, 17 Oct 2013, Eryu Guan wrote: > Date: Thu, 17 Oct 2013 17:27:53 +0800 > From: Eryu Guan <guaneryu@gmail.com> > To: linux-ext4@vger.kernel.org > Cc: Eryu Guan <guaneryu@gmail.com>, Theodore Ts'o <tytso@mit.edu> > Subject: [PATCH] ext4: don't cache out of order extents > > A corrupted ext4 may have out of order leaf extents, i.e. > > extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT > extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT > ^^^^ overlap with previous extent > > Reading such extent could hit BUG_ON() in ext4_es_cache_extent(). > > BUG_ON(end < lblk); > > The problem is that __read_extent_tree_block() tries to cache holes as > well but assumes 'lblk' is greater than 'prev' and passes underflowed > length to ext4_es_cache_extent(). > > I hit this when fuzz testing ext4, and am able to reproduce it by > modifying the on-disk extent by hand. > > Ran xfstests on patched ext4 and no regression. So what will happen with the file system with this patch when presented with such corruption ? It seems to me that ext4_es_cache_extent() will happily skip this extent because it will find that this particular offset is already in the tree. Hence we'll have a gap in the status tree which really should not be there and I suspect that something bad will happen. I think that we should deal with this corruption immediately when we spot it there, not just hide it. Thanks! -Lukas > > Cc: "Theodore Ts'o" <tytso@mit.edu> > Signed-off-by: Eryu Guan <guaneryu@gmail.com> > --- > fs/ext4/extents.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 54d52af..c9ebcb9 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -503,7 +503,7 @@ __read_extent_tree_block(const char *function, unsigned int line, > ext4_lblk_t lblk = le32_to_cpu(ex->ee_block); > int len = ext4_ext_get_actual_len(ex); > > - if (prev && (prev != lblk)) > + if (prev && (prev < lblk)) > ext4_es_cache_extent(inode, prev, > lblk - prev, ~0, > EXTENT_STATUS_HOLE); > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 17, 2013 at 03:44:35PM +0200, Lukáš Czerner wrote: > > So what will happen with the file system with this patch when > presented with such corruption ? > > It seems to me that ext4_es_cache_extent() will happily skip this > extent because it will find that this particular offset is already > in the tree. Hence we'll have a gap in the status tree which really > should not be there and I suspect that something bad will happen. Ah, I see what's going wrong. __read_extent_tree_block() calls __ext4_ext_check() which is supposed to validate that extent tree block is valid. The __ext4_ext_check() function calls ext4_valid_extent_entries() which is supposed to validate the individual entries in the extent. However, it is not checking to make sure there are no overlapping extents. We should add that check to ext4_valid_extent_entries(); that way, we will call ext4_error_inode() to mark the file system as corrupted. Eryu's patch, or something like it, will still be needed so that in the case of errors=countinue, we don't end up calling BUG_ON(). Thanks for finding this! - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 17, 2013 at 03:44:35PM +0200, Lukáš Czerner wrote: > On Thu, 17 Oct 2013, Eryu Guan wrote: > > > Date: Thu, 17 Oct 2013 17:27:53 +0800 > > From: Eryu Guan <guaneryu@gmail.com> > > To: linux-ext4@vger.kernel.org > > Cc: Eryu Guan <guaneryu@gmail.com>, Theodore Ts'o <tytso@mit.edu> > > Subject: [PATCH] ext4: don't cache out of order extents > > > > A corrupted ext4 may have out of order leaf extents, i.e. > > > > extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT > > extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT > > ^^^^ overlap with previous extent > > > > Reading such extent could hit BUG_ON() in ext4_es_cache_extent(). > > > > BUG_ON(end < lblk); > > > > The problem is that __read_extent_tree_block() tries to cache holes as > > well but assumes 'lblk' is greater than 'prev' and passes underflowed > > length to ext4_es_cache_extent(). > > > > I hit this when fuzz testing ext4, and am able to reproduce it by > > modifying the on-disk extent by hand. > > > > Ran xfstests on patched ext4 and no regression. > > So what will happen with the file system with this patch when > presented with such corruption ? > > It seems to me that ext4_es_cache_extent() will happily skip this > extent because it will find that this particular offset is already > in the tree. Hence we'll have a gap in the status tree which really > should not be there and I suspect that something bad will happen. > > I think that we should deal with this corruption immediately when we > spot it there, not just hide it. Yes, agreed, we should validate the extent not cover the corruption as Ted pointed out. Don't know why I didn't think about it more in the first place.. Thanks! Eryu Guan -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 17 Oct 2013, Theodore Ts'o wrote: > Date: Thu, 17 Oct 2013 10:40:44 -0400 > From: Theodore Ts'o <tytso@mit.edu> > To: Lukáš Czerner <lczerner@redhat.com> > Cc: Eryu Guan <guaneryu@gmail.com>, linux-ext4@vger.kernel.org > Subject: Re: [PATCH] ext4: don't cache out of order extents > > On Thu, Oct 17, 2013 at 03:44:35PM +0200, Lukáš Czerner wrote: > > > > So what will happen with the file system with this patch when > > presented with such corruption ? > > > > It seems to me that ext4_es_cache_extent() will happily skip this > > extent because it will find that this particular offset is already > > in the tree. Hence we'll have a gap in the status tree which really > > should not be there and I suspect that something bad will happen. > > Ah, I see what's going wrong. __read_extent_tree_block() calls > __ext4_ext_check() which is supposed to validate that extent tree > block is valid. The __ext4_ext_check() function calls > ext4_valid_extent_entries() which is supposed to validate the > individual entries in the extent. However, it is not checking to make > sure there are no overlapping extents. We should add that check to > ext4_valid_extent_entries(); that way, we will call ext4_error_inode() > to mark the file system as corrupted. I agree, since ext4_ext_check() should be really only used when reading data from disk. That said, we might actually remove the check from ext4_ext_precache() and ext4_ext_remove_space() because it seems to be that the check has already been done in ext4_iget() and it should be enough to check it once when reading from disk, right ? > > Eryu's patch, or something like it, will still be needed so that in > the case of errors=countinue, we don't end up calling BUG_ON(). Hmm shouldn't we avoid that data in the case that it's corrupted rather than using it ? It seems like this is what the code would do anyway even with errors=continue when __ext4_ext_check() returns error. Thanks! -Lukas > > Thanks for finding this! > > - Ted >
On Thu, Oct 17, 2013 at 05:22:58PM +0200, Lukáš Czerner wrote: > > I agree, since ext4_ext_check() should be really only used when > reading data from disk. That said, we might actually remove the > check from ext4_ext_precache() and ext4_ext_remove_space() because > it seems to be that the check has already been done in ext4_iget() > and it should be enough to check it once when reading from disk, > right ? Yes, since we do call ext4_ext_check() in ext4_iget() to verify the root node of the extent tree located in ei->i_blocks[], it's strictly speaking not necessary. OTOH, there are at most four entries in i_blocks[] that need to be verified, and it's a bit easier for the contents of i_blocks[] to get corrupted by buggy code, so it's a toss-up whether it's really worth it to remove it from those two places, which aren't really hotspots. It could be argued that are plenty of other places that where we're not validating the root extent tree node, so we might as well remove it from those two functions, though. > > Eryu's patch, or something like it, will still be needed so that in > > the case of errors=countinue, we don't end up calling BUG_ON(). > > Hmm shouldn't we avoid that data in the case that it's corrupted > rather than using it ? It seems like this is what the code would do > anyway even with errors=continue when __ext4_ext_check() returns > error. Hmm, maybe we should set a flag indicating that the inode is bad, and then cause attempts to read or write the contents of that inode should return EIO. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 54d52af..c9ebcb9 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -503,7 +503,7 @@ __read_extent_tree_block(const char *function, unsigned int line, ext4_lblk_t lblk = le32_to_cpu(ex->ee_block); int len = ext4_ext_get_actual_len(ex); - if (prev && (prev != lblk)) + if (prev && (prev < lblk)) ext4_es_cache_extent(inode, prev, lblk - prev, ~0, EXTENT_STATUS_HOLE);
A corrupted ext4 may have out of order leaf extents, i.e. extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT ^^^^ overlap with previous extent Reading such extent could hit BUG_ON() in ext4_es_cache_extent(). BUG_ON(end < lblk); The problem is that __read_extent_tree_block() tries to cache holes as well but assumes 'lblk' is greater than 'prev' and passes underflowed length to ext4_es_cache_extent(). I hit this when fuzz testing ext4, and am able to reproduce it by modifying the on-disk extent by hand. Ran xfstests on patched ext4 and no regression. Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Eryu Guan <guaneryu@gmail.com> --- fs/ext4/extents.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)