[05/22] ext4: Fix ext4_should_journal_data() for EA inodes

Message ID 20191003220613.10791-5-jack@suse.cz
State Superseded
Series ext4: Fix transaction overflow due to revoke descriptors

Commit Message

Jan Kara Oct. 3, 2019, 10:05 p.m. UTC
Similarly to directories, EA inodes do only journalled modifications to
their data. Change ext4_should_journal_data() to return true for them so
that we don't have to special-case them during truncate.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4_jbd2.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Theodore Ts'o Oct. 21, 2019, 1:38 a.m. UTC | #1
On Fri, Oct 04, 2019 at 12:05:51AM +0200, Jan Kara wrote:
> Similarly to directories, EA inodes do only journalled modifications to
> their data. Change ext4_should_journal_data() to return true for them so
> that we don't have to special-case them during truncate.

We are already special-casing EA inodes in ext4_clear_blocks() in
fs/ext4/indirect.c, and get_default_free_blocks_flags() in
fs/ext4/extents.c, and like S_ISDIR, we want to treat EA inode blocks
as metadata.   So I'm not sure I see the value of this change?

As an aside, I was looking at fs/ext4/mballoc.c to see what the
difference is for treating a block as a metadata block versus a
journaled data block, and what I found made my hair stand on end:

	/*
	 * We need to make sure we don't reuse the freed block until after the
	 * transaction is committed. We make an exception if the inode is to be
	 * written in writeback mode since writeback mode has weak data
	 * consistency guarantees.
	 */

So in data=writeback, if a file is deleted, its blocks are available
for immediate reallocation, and if we are under heavy memory pressure,
the deleted file's blocks could get overwritten --- even in the case
where we crash and the transaction never committed.

While it's true that data=writeback mode has weaker guarantees, my
understanding is that it only applied to the exposure of stale data,
and not to a long-standing file's blocks getting corrupted if the file
is almost, but not quite, deleted before a crash.

Granted, the situation where this would happen is quite rare, but it
seems quite wrong....

						- Ted
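
To make the concern above concrete, here is a minimal user-space sketch of
the reuse decision that the quoted mballoc.c comment describes. It is not
the kernel code: `free_request`, `data_mode`, and `must_defer_reuse()` are
hypothetical names chosen for illustration, and the exact condition in
fs/ext4/mballoc.c should be checked against the source.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real ext4 types. */
enum data_mode { MODE_JOURNAL, MODE_ORDERED, MODE_WRITEBACK };

struct free_request {
	bool handle_valid;    /* a journal handle is active */
	bool metadata_block;  /* the caller flagged the block as metadata */
	enum data_mode mode;  /* data mode of the inode that owned the block */
};

/*
 * Returns true if the freed block must stay unavailable until the
 * running transaction commits, false if it may be reallocated at once.
 */
static bool must_defer_reuse(const struct free_request *req)
{
	if (!req->handle_valid)
		return false;	/* no journal, so no commit to wait for */
	if (req->metadata_block)
		return true;	/* metadata is held back until commit */
	/* Data blocks: only writeback mode is exempt from the wait. */
	return req->mode != MODE_WRITEBACK;
}
```

In this model, a data block freed under data=writeback is the only
journalled case in which the block becomes reusable before the commit,
which is the window the paragraph above worries about.
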
Jan Kara Oct. 23, 2019, 4:55 p.m. UTC | #2
On Sun 20-10-19 21:38:42, Theodore Y. Ts'o wrote:
> On Fri, Oct 04, 2019 at 12:05:51AM +0200, Jan Kara wrote:
> > Similarly to directories, EA inodes do only journalled modifications to
> > their data. Change ext4_should_journal_data() to return true for them so
> > that we don't have to special-case them during truncate.
> 
> We are already special-casing EA inodes in ext4_clear_blocks() in
> fs/ext4/indirect.c, and get_default_free_blocks_flags() in
> fs/ext4/extents.c, and like S_ISDIR, we want to treat EA inode blocks
> as metadata.   So I'm not sure I see the value of this change?

Firstly, ext4_should_journal_data() should tell whether an inode's data
blocks are modified through journalling. So by the principle of least
surprise it should return true for EA inodes, because that's how the data
blocks of those inodes are modified.

Secondly, once ext4_should_journal_data() is fixed by this patch, I think
that we can drop that special-casing from ext4_clear_blocks() and
get_default_free_blocks_flags() and simply have:

	if (ext4_should_journal_data(inode))
		flags |= EXT4_FREE_BLOCKS_FORGET;

> As an aside, I was looking at fs/ext4/mballoc.c to see what the
> difference is for treating a block as a metadata block versus a
> journaled data block, and what I found made my hair stand on end:
> 
> 	/*
> 	 * We need to make sure we don't reuse the freed block until after the
> 	 * transaction is committed. We make an exception if the inode is to be
> 	 * written in writeback mode since writeback mode has weak data
> 	 * consistency guarantees.
> 	 */
> 
> So in data=writeback, if a file is deleted, its blocks are available
> for immediate reallocation, and if we are under heavy memory pressure,
> the deleted file's blocks could get overwritten --- even in the case
> where we crash and the transaction never committed.
> 
> While it's true that data=writeback mode has weaker guarantees, my
> understanding is that it only applied to the exposure of stale data,
> and not to a long-standing file's blocks getting corrupted if the file
> is almost, but not quite, deleted before a crash.
> 
> Granted, the situation where this would happen is quite rare, but it
> seems quite wrong....

I've always understood data=writeback to mean: you don't know what the
data is going to be if the file was touched shortly before the crash (i.e.,
similar to the old ext2 non-guarantees).

								Honza
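
Jan's second point, that the explicit EA-inode check in the callers becomes
redundant once the predicate is fixed, can be illustrated with a small
user-space model. Everything here is a stand-in: `ea_model_inode`,
`FREE_BLOCKS_FORGET`, and both helper functions are hypothetical, and the
real callers also handle other flags (such as the metadata one) that this
sketch leaves out.

```c
#include <stdbool.h>

/* Hypothetical value; stands in for EXT4_FREE_BLOCKS_FORGET. */
#define FREE_BLOCKS_FORGET 0x1u

struct ea_model_inode {
	bool is_ea_inode;    /* models the EXT4_INODE_EA_INODE flag */
	bool journals_data;  /* models the result of ext4_should_journal_data() */
};

/* Shape with the special case: EA inodes are tested explicitly. */
static unsigned forget_flags_special_cased(const struct ea_model_inode *inode)
{
	unsigned flags = 0;

	if (inode->is_ea_inode || inode->journals_data)
		flags |= FREE_BLOCKS_FORGET;
	return flags;
}

/* Proposed shape: rely on the predicate alone. */
static unsigned forget_flags_simplified(const struct ea_model_inode *inode)
{
	unsigned flags = 0;

	if (inode->journals_data)
		flags |= FREE_BLOCKS_FORGET;
	return flags;
}
```

The two helpers agree whenever `is_ea_inode` implies `journals_data`, which
is the property the patch is meant to establish. For an EA inode whose
predicate still returns false, the simplified helper would miss the forget
flag, so the simplification only makes sense on top of the fix.
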

Patch

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index ef8fcf7d0d3b..99fe72522960 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -407,6 +407,7 @@  static inline int ext4_inode_journal_mode(struct inode *inode)
 		return EXT4_INODE_WRITEBACK_DATA_MODE;	/* writeback */
 	/* We do not support data journalling with delayed allocation */
 	if (!S_ISREG(inode->i_mode) ||
+	    ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
 	    test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
 	    (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
 	    !test_opt(inode->i_sb, DELALLOC))) {
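
As a reading aid, the patched condition can be modelled in user space as
below. Only the middle condition mirrors the hunk. The guard on the first
return and everything after the condition are not visible in the diff, so
`has_journal`, the journal-data return value, and the ordered/writeback
fallthrough are assumptions, as are all type and field names.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the EXT4_INODE_*_DATA_MODE values. */
enum journal_mode { MODE_JOURNAL_DATA, MODE_ORDERED_DATA, MODE_WRITEBACK_DATA };

/* Hypothetical stand-ins for the data= mount option. */
enum mount_data_opt { MOUNT_JOURNAL_DATA, MOUNT_ORDERED_DATA, MOUNT_WRITEBACK_DATA };

struct jm_inode {
	bool has_journal;        /* assumption: guards the first return */
	bool is_regular;         /* models S_ISREG(inode->i_mode) */
	bool is_ea_inode;        /* models EXT4_INODE_EA_INODE */
	bool journal_data_flag;  /* models EXT4_INODE_JOURNAL_DATA */
	bool delalloc;           /* models test_opt(sb, DELALLOC) */
	enum mount_data_opt data_opt;
};

static enum journal_mode model_inode_journal_mode(const struct jm_inode *inode)
{
	if (!inode->has_journal)
		return MODE_WRITEBACK_DATA;
	/* Mirrors the patched condition, including the new EA-inode test. */
	if (!inode->is_regular ||
	    inode->is_ea_inode ||
	    inode->data_opt == MOUNT_JOURNAL_DATA ||
	    (inode->journal_data_flag && !inode->delalloc))
		return MODE_JOURNAL_DATA;	/* assumed result of the branch */
	/* Assumed fallthrough: follow the mount option. */
	if (inode->data_opt == MOUNT_ORDERED_DATA)
		return MODE_ORDERED_DATA;
	return MODE_WRITEBACK_DATA;
}
```

With the added test, an EA inode on an ordered or writeback mount lands in
the journal-data branch regardless of the delalloc setting, in the same way
a directory does through the `!is_regular` test.
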