diff mbox

[20/49] libext2fs: fix parents when modifying extents

Message ID 20140311065605.30585.47791.stgit@birch.djwong.org
State Accepted, archived
Headers show

Commit Message

Darrick Wong March 11, 2014, 6:56 a.m. UTC
In ext2fs_extent_set_bmap() and ext2fs_punch_extent(), fix the parents
when altering either end of an extent so that the parent nodes reflect
the added mapping.

There's a slight complication to using fix_parents: if there are two
mappings to an lblk in the tree, the value of handle->path->curr can
point to either extent afterwards), which is documented in a comment.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 lib/ext2fs/extent.c |   30 ++++++++++++++++++++++++------
 lib/ext2fs/punch.c  |   14 ++++++++++----
 2 files changed, 34 insertions(+), 10 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Theodore Ts'o March 14, 2014, 2:01 p.m. UTC | #1
On Mon, Mar 10, 2014 at 11:56:05PM -0700, Darrick J. Wong wrote:
> In ext2fs_extent_set_bmap() and ext2fs_punch_extent(), fix the parents
> when altering either end of an extent so that the parent nodes reflect
> the added mapping.

Can you say more about what bug/symptom this fixes?

> There's a slight complication to using fix_parents: if there are two
> mappings to an lblk in the tree, the value of handle->path->curr can
> point to either extent afterwards), which is documented in a comment.

It's horribly wrong to map an lblk with two extents.  So the question
is at what places should we complain if we notice this.  In the ideal
world we would never allow an extent tree to be mutated in such a way
that it is invalid like this, but I am worried about the overhead
costs of always checking.  But if there are places where it wouldn't
take much effort to check, we should probably do so and return an
error.  (If the extent tree is already invalid, I suppose we should
allow error out the operation, since this would affect e2fsck and
debugfs.  I'm talking about checks to make sure that libext2fs or its
callers don't accidentally make things worse.)

	    	  	       	  - Ted

P.S.  I suppose the one possible valid use case for making an invalid
extent tree is a developer making an invalid extent tree for
regression testing, so maybe there should be an exemption for debugfs.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick Wong March 14, 2014, 8:13 p.m. UTC | #2
On Fri, Mar 14, 2014 at 10:01:58AM -0400, Theodore Ts'o wrote:
> On Mon, Mar 10, 2014 at 11:56:05PM -0700, Darrick J. Wong wrote:
> > In ext2fs_extent_set_bmap() and ext2fs_punch_extent(), fix the parents
> > when altering either end of an extent so that the parent nodes reflect
> > the added mapping.
> 
> Can you say more about what bug/symptom this fixes?

I first observed symptoms when calls to _set_bmap() or _punch_extent() on a
leaf node would leave the index node's ei_block set to the wrong value, which
e2fsck complains about.

In the _set_bmap() case, I noticed that the "remapping last block in extent"
case would produce symptoms if we are trying to remap a block from "extent" to
"next_extent", and the two extents are pointed to by different index nodes.
_extent_replace(..., next_extent) updates e_lblk in the leaf extent, but
because there's no _extent_fix_parents() call, the index nodes never get
updated.

In the _punch_extent() case, we conclude that we need to split an extent into
two pieces since we're punching out the middle.  If the extent is the last
extent in the block, the second extent will be inserted into a new leaf node
block.  Without _fix_parents(), the index node doesn't seem to get updated.

> > There's a slight complication to using fix_parents: if there are two
> > mappings to an lblk in the tree, the value of handle->path->curr can
> > point to either extent afterwards), which is documented in a comment.
> 
> It's horribly wrong to map an lblk with two extents.  So the question
> is at what places should we complain if we notice this.  In the ideal
> world we would never allow an extent tree to be mutated in such a way
> that it is invalid like this, but I am worried about the overhead
> costs of always checking.  But if there are places where it wouldn't
> take much effort to check, we should probably do so and return an
> error.  (If the extent tree is already invalid, I suppose we should
> allow error out the operation, since this would affect e2fsck and
> debugfs.  I'm talking about checks to make sure that libext2fs or its
> callers don't accidentally make things worse.)

Both cases above cause the (temporary) use of two extents to map the same lblk.
_set_bmap() first adjusts next_extent.e_lblk to cover the lblk, then decrements
extent.e_len so that "extent" no longer covers the lblk.  _punch_extent()
splits an extent by first inserting the second part of the extent and then
shortening the original extent to reflect the punchout.

These two cases seemed slightly suspect to me, but since e2fsprogs doesn't
journal, changing the code to reduce one extent and enlarge the other opens the
possibility that we could lose the lblk mapping entirely if something happens
in between the two operations.  I prefer "block still mapped but fsck is
unhappy about redundant bookeepping" to "the block is gone", so I added the two
fixes and let it go.

--D
> 
> 	    	  	       	  - Ted
> 
> P.S.  I suppose the one possible valid use case for making an invalid
> extent tree is a developer making an invalid extent tree for
> regression testing, so maybe there should be an exemption for debugfs.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o March 15, 2014, 3:46 p.m. UTC | #3
On Fri, Mar 14, 2014 at 01:13:19PM -0700, Darrick J. Wong wrote:
> Both cases above cause the (temporary) use of two extents to map the same lblk.
> _set_bmap() first adjusts next_extent.e_lblk to cover the lblk, then decrements
> extent.e_len so that "extent" no longer covers the lblk.  _punch_extent()
> splits an extent by first inserting the second part of the extent and then
> shortening the original extent to reflect the punchout.
> 
> These two cases seemed slightly suspect to me, but since e2fsprogs doesn't
> journal, changing the code to reduce one extent and enlarge the other opens the
> possibility that we could lose the lblk mapping entirely if something happens
> in between the two operations.  I prefer "block still mapped but fsck is
> unhappy about redundant bookeepping" to "the block is gone", so I added the two
> fixes and let it go.

OK, fair enough, I'll merge in this patch since it's fixing a real
bug, and I can't think of a good way to fix this issue without making
pretty massive changes.

I'll note though that at the moment e2fsck doesn't have sophisticated
extent tree recovery support; so if the extent tree is corrupt, it
offers to zap the entire extent tree, instead of trying to fix up the
extent tree.  That wasn't an issue because it was likely that if
absent bugs, the most likely case if the extent tree is corrupted,
there's not much you can do, so it wasn't worth it to add some code to
handle these cases.

However, if a large number of users start using your FUSE server in
production, then making sure the right thing happens when they suffer
power failures start becoming more important --- but it may not be
trivial since libext2fs wasn't originally intended for that use case.
I am glad that you implemented it, since it's a great way to get
better testing for various corner cases.  But that's different from
advertising that the FUSE server should be used in production use
cases; in particular, we might need to figure out some kind of
journalling system for FUSE, either using the ext4's internal journal,
or some user-space journalling system.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick Wong March 17, 2014, 4:59 p.m. UTC | #4
On Sat, Mar 15, 2014 at 11:46:26AM -0400, Theodore Ts'o wrote:
> On Fri, Mar 14, 2014 at 01:13:19PM -0700, Darrick J. Wong wrote:
> > Both cases above cause the (temporary) use of two extents to map the same lblk.
> > _set_bmap() first adjusts next_extent.e_lblk to cover the lblk, then decrements
> > extent.e_len so that "extent" no longer covers the lblk.  _punch_extent()
> > splits an extent by first inserting the second part of the extent and then
> > shortening the original extent to reflect the punchout.
> > 
> > These two cases seemed slightly suspect to me, but since e2fsprogs doesn't
> > journal, changing the code to reduce one extent and enlarge the other opens the
> > possibility that we could lose the lblk mapping entirely if something happens
> > in between the two operations.  I prefer "block still mapped but fsck is
> > unhappy about redundant bookeepping" to "the block is gone", so I added the two
> > fixes and let it go.
> 
> OK, fair enough, I'll merge in this patch since it's fixing a real
> bug, and I can't think of a good way to fix this issue without making
> pretty massive changes.
> 
> I'll note though that at the moment e2fsck doesn't have sophisticated
> extent tree recovery support; so if the extent tree is corrupt, it
> offers to zap the entire extent tree, instead of trying to fix up the
> extent tree.  That wasn't an issue because it was likely that if
> absent bugs, the most likely case if the extent tree is corrupted,
> there's not much you can do, so it wasn't worth it to add some code to
> handle these cases.
> 
> However, if a large number of users start using your FUSE server in
> production, then making sure the right thing happens when they suffer
> power failures start becoming more important --- but it may not be
> trivial since libext2fs wasn't originally intended for that use case.
> I am glad that you implemented it, since it's a great way to get
> better testing for various corner cases.  But that's different from
> advertising that the FUSE server should be used in production use
> cases; in particular, we might need to figure out some kind of
> journalling system for FUSE, either using the ext4's internal journal,
> or some user-space journalling system.

I wasn't planning to advertise fuse2fs for production use.  But perhaps it
could cough out a few more warnings if you're not mounting ro, and/or default
to ro?

--D
> 
> 					- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/lib/ext2fs/extent.c b/lib/ext2fs/extent.c
index f27344e..80ce88f 100644
--- a/lib/ext2fs/extent.c
+++ b/lib/ext2fs/extent.c
@@ -720,7 +720,14 @@  errcode_t ext2fs_extent_goto(ext2_extent_handle_t handle,
  * and so on.
  *
  * Safe to call for any position in node; if not at the first entry,
- * will  simply return.
+ * it will simply return.
+ *
+ * Note a subtlety of this function -- if there happen to be two extents
+ * mapping the same lblk and someone calls fix_parents on the second of the two
+ * extents, the position of the extent handle after the call will be the second
+ * extent if nothing happened, or the first extent if something did.  A caller
+ * in this situation must use ext2fs_extent_goto() after calling this function.
+ * Or simply don't map the same lblk with two extents, ever.
  */
 errcode_t ext2fs_extent_fix_parents(ext2_extent_handle_t handle)
 {
@@ -1379,17 +1386,25 @@  errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
 							       &next_extent);
 				if (retval)
 					goto done;
-				retval = ext2fs_extent_fix_parents(handle);
-				if (retval)
-					goto done;
 			} else
 				retval = ext2fs_extent_insert(handle,
 				      EXT2_EXTENT_INSERT_AFTER, &newextent);
 			if (retval)
 				goto done;
-			/* Now pointing at inserted extent; move back to prev */
+			retval = ext2fs_extent_fix_parents(handle);
+			if (retval)
+				goto done;
+			/*
+			 * Now pointing at inserted extent; move back to prev.
+			 *
+			 * We cannot use EXT2_EXTENT_PREV to go back; note the
+			 * subtlety in the comment for fix_parents().
+			 */
+			retval = ext2fs_extent_goto(handle, logical);
+			if (retval)
+				goto done;
 			retval = ext2fs_extent_get(handle,
-						   EXT2_EXTENT_PREV_LEAF,
+						   EXT2_EXTENT_CURRENT,
 						   &extent);
 			if (retval)
 				goto done;
@@ -1422,6 +1437,9 @@  errcode_t ext2fs_extent_set_bmap(ext2_extent_handle_t handle,
 							      0, &newextent);
 			if (retval)
 				goto done;
+			retval = ext2fs_extent_fix_parents(handle);
+			if (retval)
+				goto done;
 			retval = ext2fs_extent_get(handle,
 						   EXT2_EXTENT_NEXT_LEAF,
 						   &extent);
diff --git a/lib/ext2fs/punch.c b/lib/ext2fs/punch.c
index 532c4b8..60cd2a3 100644
--- a/lib/ext2fs/punch.c
+++ b/lib/ext2fs/punch.c
@@ -344,10 +344,16 @@  static errcode_t ext2fs_punch_extent(ext2_filsys fs, ext2_ino_t ino,
 					EXT2_EXTENT_INSERT_AFTER, &newex);
 			if (retval)
 				goto errout;
-			/* Now pointing at inserted extent; so go back */
-			retval = ext2fs_extent_get(handle,
-						   EXT2_EXTENT_PREV_LEAF,
-						   &newex);
+			retval = ext2fs_extent_fix_parents(handle);
+			if (retval)
+				goto errout;
+			/*
+			 * Now pointing at inserted extent; so go back.
+			 *
+			 * We cannot use EXT2_EXTENT_PREV to go back; note the
+			 * subtlety in the comment for fix_parents().
+			 */
+			retval = ext2fs_extent_goto(handle, extent.e_lblk);
 			if (retval)
 				goto errout;
 		}