Message ID | 20191003220613.10791-5-jack@suse.cz |
---|---|
State | Superseded |
Series | ext4: Fix transaction overflow due to revoke descriptors |
On Fri, Oct 04, 2019 at 12:05:51AM +0200, Jan Kara wrote:
> Similarly to directories, EA inodes do only journalled modifications to
> their data. Change ext4_should_journal_data() to return true for them so
> that we don't have to special-case them during truncate.

We are already special-casing EA inodes in ext4_clear_blocks() in
fs/ext4/indirect.c, and get_default_free_blocks_flags() in
fs/ext4/extents.c, and like S_ISDIR, we want to treat EA inode blocks
as metadata. So I'm not sure I see the value of this change?

As an aside, I was looking at fs/ext4/mballoc.c to see what the
difference is for treating a block as a metadata block versus a
journaled data block, and what I found made my hair stand on end:

	/*
	 * We need to make sure we don't reuse the freed block until after the
	 * transaction is committed. We make an exception if the inode is to be
	 * written in writeback mode since writeback mode has weak data
	 * consistency guarantees.
	 */

So in data=writeback, if a file is deleted, its blocks are available
for immediate reallocation, and if we are under heavy memory pressure,
the deleted file's blocks could get overwritten --- even in the case
where we crash and the transaction never committed.

While it's true that data=writeback mode has weaker guarantees, my
understanding is that it only applied to the exposure of stale data, and
not to a long-standing file's blocks getting corrupted if it is almost
deleted, but not quite, before a crash.

Granted, the situation where this would happen is quite rare, but it
seems quite wrong....

						- Ted
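The reuse rule described by the quoted mballoc.c comment can be sketched as a small userspace model in C. This is not the kernel code: `struct freed_block`, `enum reuse_policy`, and `model_reuse_policy()` are hypothetical names, and the assumption that metadata blocks always wait for the commit is inferred from the discussion above rather than taken from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of freeing a block in this model. */
enum reuse_policy {
	REUSE_AFTER_COMMIT,	/* block stays reserved until the transaction commits */
	REUSE_IMMEDIATELY	/* block may be reallocated right away */
};

/* Hypothetical description of a block being freed. */
struct freed_block {
	bool is_metadata;		/* block is treated as metadata (assumption) */
	bool inode_is_writeback;	/* owning inode is written in writeback mode */
};

/*
 * Models the rule from the quoted comment: a freed block must not be
 * reused before commit, except for data blocks of writeback-mode inodes.
 */
static enum reuse_policy model_reuse_policy(const struct freed_block *blk)
{
	assert(blk != NULL);
	if (blk->is_metadata || !blk->inode_is_writeback)
		return REUSE_AFTER_COMMIT;
	return REUSE_IMMEDIATELY;
}
```

In this model, a deleted file's data blocks on a data=writeback mount fall into the `REUSE_IMMEDIATELY` case, which is the window the message above is worried about.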
On Sun 20-10-19 21:38:42, Theodore Y. Ts'o wrote:
> On Fri, Oct 04, 2019 at 12:05:51AM +0200, Jan Kara wrote:
> > Similarly to directories, EA inodes do only journalled modifications to
> > their data. Change ext4_should_journal_data() to return true for them so
> > that we don't have to special-case them during truncate.
>
> We are already special-casing EA inodes in ext4_clear_blocks() in
> fs/ext4/indirect.c, and get_default_free_blocks_flags() in
> fs/ext4/extents.c, and like S_ISDIR, we want to treat EA inode blocks
> as metadata. So I'm not sure I see the value of this change?

Firstly, ext4_should_journal_data() should tell whether an inode's data
blocks are modified through journalling. So by the principle of least
surprise it should return true for EA inodes, because that's how the data
blocks of those inodes are modified.

Secondly, once ext4_should_journal_data() is fixed by this patch, I think
that we can just drop that special-casing from ext4_clear_blocks() and
get_default_free_blocks_flags() and just have there:

	if (ext4_should_journal_data(inode))
		flags |= EXT4_FREE_BLOCKS_FORGET;

> As an aside, I was looking at fs/ext4/mballoc.c to see what the
> difference is for treating a block as a metadata block versus a
> journaled data block, and what I found made my hair stand on end:
>
> 	/*
> 	 * We need to make sure we don't reuse the freed block until after the
> 	 * transaction is committed. We make an exception if the inode is to be
> 	 * written in writeback mode since writeback mode has weak data
> 	 * consistency guarantees.
> 	 */
>
> So in data=writeback, if a file is deleted, its blocks are available
> for immediate reallocation, and if we are under heavy memory pressure,
> the deleted file's blocks could get overwritten --- even in the case
> where we crash and the transaction never committed.
>
> While it's true that data=writeback mode has weaker guarantees, my
> understanding is that it only applied to the exposure of stale data, and
> not to a long-standing file's blocks getting corrupted if it is almost
> deleted, but not quite, before a crash.
>
> Granted, the situation where this would happen is quite rare, but it
> seems quite wrong....

I've always considered data=writeback as: You don't know what the data is
going to be if the file was touched shortly before crashing (i.e., similar
to the old ext2 non-guarantees).

								Honza
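The simplification suggested in the reply above can be illustrated with a minimal userspace sketch in C. Nothing here is kernel code: `struct free_model_inode`, `MODEL_FREE_BLOCKS_FORGET` (with an arbitrary value), and both helper functions are hypothetical stand-ins, and only the FORGET flag is modelled. The `journals_data` field stands for whatever ext4_should_journal_data() would return for the inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EXT4_FREE_BLOCKS_FORGET; the value is arbitrary. */
#define MODEL_FREE_BLOCKS_FORGET 0x2

/* Hypothetical, reduced inode: only what the flag selection needs. */
struct free_model_inode {
	bool is_ea_inode;	/* inode carries the EA-inode flag */
	bool journals_data;	/* result of the "should journal data" predicate */
};

/* Form with an explicit EA-inode special case next to the predicate. */
static int flags_special_cased(const struct free_model_inode *inode)
{
	int flags = 0;

	assert(inode != NULL);
	if (inode->is_ea_inode || inode->journals_data)
		flags |= MODEL_FREE_BLOCKS_FORGET;
	return flags;
}

/* Simplified form: rely on the predicate alone. */
static int flags_simplified(const struct free_model_inode *inode)
{
	int flags = 0;

	assert(inode != NULL);
	if (inode->journals_data)
		flags |= MODEL_FREE_BLOCKS_FORGET;
	return flags;
}
```

The two forms agree for EA inodes only when the predicate already reports true for them, which is why the simplification depends on the patch being applied first.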
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index ef8fcf7d0d3b..99fe72522960 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -407,6 +407,7 @@ static inline int ext4_inode_journal_mode(struct inode *inode)
 		return EXT4_INODE_WRITEBACK_DATA_MODE;	/* writeback */
 	/* We do not support data journalling with delayed allocation */
 	if (!S_ISREG(inode->i_mode) ||
+	    ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
 	    test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
 	    (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
 	    !test_opt(inode->i_sb, DELALLOC))) {
Similarly to directories, EA inodes do only journalled modifications to
their data. Change ext4_should_journal_data() to return true for them so
that we don't have to special-case them during truncate.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4_jbd2.h | 1 +
 1 file changed, 1 insertion(+)
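For readers who want to trace the decision the hunk changes, here is a hedged userspace model in C of the condition in ext4_inode_journal_mode(). All names (`struct mode_model_inode`, `enum model_data_mode`, `model_inode_journal_mode()`) are hypothetical. The branch structure follows the lines visible in the diff; the assumptions that the early writeback return is taken when no journal is present, and that other inodes fall through to the mount-wide mode, are not shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three ext4 data journalling modes. */
enum model_data_mode {
	MODEL_JOURNAL_DATA,
	MODEL_ORDERED_DATA,
	MODEL_WRITEBACK_DATA
};

/* Hypothetical, reduced inode plus the mount options the check reads. */
struct mode_model_inode {
	bool has_journal;		/* assumption: false means no journal at all */
	bool is_regular;		/* S_ISREG() in the kernel */
	bool is_ea_inode;		/* EXT4_INODE_EA_INODE flag (the added check) */
	bool journal_data_flag;		/* EXT4_INODE_JOURNAL_DATA flag */
	bool delalloc;			/* DELALLOC mount option */
	enum model_data_mode mount_mode;/* mount-wide data= mode */
};

/* Models the patched condition: EA inodes always journal their data. */
static enum model_data_mode
model_inode_journal_mode(const struct mode_model_inode *inode)
{
	assert(inode != NULL);
	if (!inode->has_journal)
		return MODEL_WRITEBACK_DATA;
	if (!inode->is_regular ||
	    inode->is_ea_inode ||
	    inode->mount_mode == MODEL_JOURNAL_DATA ||
	    (inode->journal_data_flag && !inode->delalloc))
		return MODEL_JOURNAL_DATA;
	/* Assumption: remaining inodes use the mount-wide mode. */
	return inode->mount_mode;
}
```

With this model, a regular-file EA inode on a data=ordered mount resolves to `MODEL_JOURNAL_DATA`, while an ordinary regular file on the same mount stays in `MODEL_ORDERED_DATA`.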