diff mbox

ext4: allow inode expansion for nojournal file systems

Message ID 20161025211405.GA15502@localhost.localdomain
State Awaiting Upstream, archived
Headers show

Commit Message

Eric Whitney Oct. 25, 2016, 9:14 p.m. UTC
Runs of xfstest ext4/022 on nojournal file systems result in failures
because the inodes of some of its test files do not expand as expected.
The cause is a conditional in ext4_mark_inode_dirty() that prevents inode
expansion unless the test file system has a journal.  Remove this
unnecessary restriction.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
---
 fs/ext4/inode.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Andreas Dilger Oct. 25, 2016, 9:54 p.m. UTC | #1
On Oct 25, 2016, at 3:14 PM, Eric Whitney <enwlinux@gmail.com> wrote:
> 
> Runs of xfstest ext4/022 on nojournal file systems result in failures
> because the inodes of some of its test files do not expand as expected.
> The cause is a conditional in ext4_mark_inode_dirty() that prevents inode
> expansion unless the test file system has a journal.  Remove this
> unnecessary restriction.

I think the reason we required that the filesystem be journaled to expand
the inode reserved space is to ensure that the update was an all-or-nothing
approach.  If there are in-inode xattrs that need to be moved to an external
xattr block, and that block needs allocation and such, it is possible that
a nojournal filesystem could result in the xattrs being moved and the inode
is written (w/o the xattrs), but the xattr block is not written to disk
before a crash.

That said, this check could potentially be moved later in the xattr handling,
since most xattrs will stay within the inode (when growing only a few bytes),
which is always safe.  If moving xattrs to an external block, we could sync
write the xattr block before marking the inode dirty to ensure that the xattr
is not lost in a crash (though it may be duplicated, e2fsck should handle
that?).  However, that has a potentially large performance cost.

Cheers, Andreas

> 
> Signed-off-by: Eric Whitney <enwlinux@gmail.com>
> ---
> fs/ext4/inode.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 9c06472..260da4d 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5455,18 +5455,20 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
> 	err = ext4_reserve_inode_write(handle, inode, &iloc);
> 	if (err)
> 		return err;
> -	if (ext4_handle_valid(handle) &&
> -	    EXT4_I(inode)->i_extra_isize < sbi->s_want_extra_isize &&
> +	if (EXT4_I(inode)->i_extra_isize < sbi->s_want_extra_isize &&
> 	    !ext4_test_inode_state(inode, EXT4_STATE_NO_EXPAND)) {
> 		/*
> -		 * We need extra buffer credits since we may write into EA block
> +		 * In nojournal mode, we can immediately attempt to expand
> +		 * the inode.  When journaled, we first need to obtain extra
> +		 * buffer credits since we may write into the EA block
> 		 * with this same handle. If journal_extend fails, then it will
> 		 * only result in a minor loss of functionality for that inode.
> 		 * If this is felt to be critical, then e2fsck should be run to
> 		 * force a large enough s_min_extra_isize.
> 		 */
> -		if ((jbd2_journal_extend(handle,
> -			     EXT4_DATA_TRANS_BLOCKS(inode->i_sb))) == 0) {
> +		if (!ext4_handle_valid(handle) ||
> +		    jbd2_journal_extend(handle,
> +			     EXT4_DATA_TRANS_BLOCKS(inode->i_sb)) == 0) {
> 			ret = ext4_expand_extra_isize(inode,
> 						      sbi->s_want_extra_isize,
> 						      iloc, handle);
> --
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas
Theodore Ts'o Oct. 25, 2016, 11:05 p.m. UTC | #2
On Tue, Oct 25, 2016 at 03:54:16PM -0600, Andreas Dilger wrote:
> I think the reason we required that the filesystem be journaled to expand
> the inode reserved space is to ensure that the update was an all-or-nothing
> approach.  If there are in-inode xattrs that need to be moved to an external
> xattr block, and that block needs allocation and such, it is possible that
> a nojournal filesystem could result in the xattrs being moved and the inode
> is written (w/o the xattrs), but the xattr block is not written to disk
> before a crash.

In nojournal mode, any number of failures can cause data loss.  For
example, we could be splitting a node in an extent tree, or in a htree
directory block, and we can end up losing data if we crash at an
inconvenient time.  The chances of losing half of an interior node of
an extent tree block after a crash is going to be the same, and the
data loss form such same is going to be far worse than the loss if we
are unluck expanding an inode and moving xattrs to an external block....

    	   	     	      	  - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Nov. 15, 2016, 2:52 a.m. UTC | #3
On Tue, Oct 25, 2016 at 05:14:05PM -0400, Eric Whitney wrote:
> Runs of xfstest ext4/022 on nojournal file systems result in failures
> because the inodes of some of its test files do not expand as expected.
> The cause is a conditional in ext4_mark_inode_dirty() that prevents inode
> expansion unless the test file system has a journal.  Remove this
> unnecessary restriction.
> 
> Signed-off-by: Eric Whitney <enwlinux@gmail.com>

Thanks, applied.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9c06472..260da4d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5455,18 +5455,20 @@  int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
 	err = ext4_reserve_inode_write(handle, inode, &iloc);
 	if (err)
 		return err;
-	if (ext4_handle_valid(handle) &&
-	    EXT4_I(inode)->i_extra_isize < sbi->s_want_extra_isize &&
+	if (EXT4_I(inode)->i_extra_isize < sbi->s_want_extra_isize &&
 	    !ext4_test_inode_state(inode, EXT4_STATE_NO_EXPAND)) {
 		/*
-		 * We need extra buffer credits since we may write into EA block
+		 * In nojournal mode, we can immediately attempt to expand
+		 * the inode.  When journaled, we first need to obtain extra
+		 * buffer credits since we may write into the EA block
 		 * with this same handle. If journal_extend fails, then it will
 		 * only result in a minor loss of functionality for that inode.
 		 * If this is felt to be critical, then e2fsck should be run to
 		 * force a large enough s_min_extra_isize.
 		 */
-		if ((jbd2_journal_extend(handle,
-			     EXT4_DATA_TRANS_BLOCKS(inode->i_sb))) == 0) {
+		if (!ext4_handle_valid(handle) ||
+		    jbd2_journal_extend(handle,
+			     EXT4_DATA_TRANS_BLOCKS(inode->i_sb)) == 0) {
 			ret = ext4_expand_extra_isize(inode,
 						      sbi->s_want_extra_isize,
 						      iloc, handle);