[05/22] ext4: Fix ext4_should_journal_data() for EA inodes


Message ID

20191003220613.10791-5-jack@suse.cz

State

Superseded

Headers

From: Jan Kara <jack@suse.cz>
To: <linux-ext4@vger.kernel.org>
Cc: Ted Tso <tytso@mit.edu>, Jan Kara <jack@suse.cz>
Subject: [PATCH 05/22] ext4: Fix ext4_should_journal_data() for EA inodes
Date: Fri,  4 Oct 2019 00:05:51 +0200
Message-Id: <20191003220613.10791-5-jack@suse.cz>
In-Reply-To: <20191003215523.7313-1-jack@suse.cz>
References: <20191003215523.7313-1-jack@suse.cz>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk

Series

ext4: Fix transaction overflow due to revoke descriptors

Commit Message

Jan Kara Oct. 3, 2019, 10:05 p.m. UTC

Similarly to directories, EA inodes do only journalled modifications to
their data. Change ext4_should_journal_data() to return true for them so
that we don't have to special-case them during truncate.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4_jbd2.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Theodore Ts'o Oct. 21, 2019, 1:38 a.m. UTC | #1

On Fri, Oct 04, 2019 at 12:05:51AM +0200, Jan Kara wrote:
> Similarly to directories, EA inodes do only journalled modifications to
> their data. Change ext4_should_journal_data() to return true for them so
> that we don't have to special-case them during truncate.

We are already special-casing EA inodes in ext4_clear_blocks() in
fs/ext4/indirect.c, and get_default_free_blocks_flags() in
fs/ext4/extents.c, and like S_ISDIR, we want to treat EA inode blocks
as metadata.   So I'm not sure I see the value of this change?

As an aside, I was looking at fs/ext4/mballoc.c to see what the
difference is for treating a block as a metadata block versus a
journaled data block, and what I found made my hair stand on end:

	/*
	 * We need to make sure we don't reuse the freed block until after the
	 * transaction is committed. We make an exception if the inode is to be
	 * written in writeback mode since writeback mode has weak data
	 * consistency guarantees.
	 */

So in data=writeback, if a file is deleted, its blocks are available
for immediate reallocation, and if we are under heavy memory pressure,
the deleted file's blocks could get overwritten --- even in the case
where we crash and the transaction never committed.

While it's true that data=writeback mode has weaker guarantees, my
understanding is that they only apply to the exposure of stale data, and
not to a long-standing file's blocks getting corrupted if it is almost
deleted, but not quite, before a crash.

Granted, the situation where this would happen is quite rare, but it
seems quite wrong....

						- Ted

Jan Kara Oct. 23, 2019, 4:55 p.m. UTC | #2

On Sun 20-10-19 21:38:42, Theodore Y. Ts'o wrote:
> On Fri, Oct 04, 2019 at 12:05:51AM +0200, Jan Kara wrote:
> > Similarly to directories, EA inodes do only journalled modifications to
> > their data. Change ext4_should_journal_data() to return true for them so
> > that we don't have to special-case them during truncate.
> 
> We are already special-casing EA inodes in ext4_clear_blocks() in
> fs/ext4/indirect.c, and get_default_free_blocks_flags() in
> fs/ext4/extents.c, and like S_ISDIR, we want to treat EA inode blocks
> as metadata.   So I'm not sure I see the value of this change?

Firstly, ext4_should_journal_data() should tell whether an inode's data blocks
are modified through journalling. So by the principle of least surprise it
should return true for EA inodes, because that's how the data blocks of those
inodes are modified.

Secondly, once ext4_should_journal_data() is fixed by this patch, I think
that we can just drop the special-casing from ext4_clear_blocks() and
get_default_free_blocks_flags() and have just this there:

	if (ext4_should_journal_data(inode))
		flags |= EXT4_FREE_BLOCKS_FORGET;
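
To make the simplification argued for above concrete, here is a minimal user-space sketch, not kernel code. It assumes a file system with a journal and collapses each inode to three booleans. `struct sketch_inode`, `SKETCH_FREE_BLOCKS_FORGET`, and all function names are hypothetical stand-ins. The "special-cased" shape is an assumption about what the call sites look like, not a copy of them. The sketch only checks that, once the predicate itself covers EA inodes, a single check gives the same FORGET decision as repeating the type checks at the call site.

```c
#include <stdbool.h>

/* Hypothetical value; stands in for EXT4_FREE_BLOCKS_FORGET. */
#define SKETCH_FREE_BLOCKS_FORGET 0x1

/* Hypothetical, flattened inode state (a journal is assumed present). */
struct sketch_inode {
	bool is_dir;          /* directory */
	bool is_ea_inode;     /* inode holding extended attribute data */
	bool data_journalled; /* mount option or inode flag asks for data journalling */
};

/* Model of ext4_should_journal_data() with the patch applied. */
static bool sketch_should_journal_data(const struct sketch_inode *inode)
{
	return inode->is_dir || inode->is_ea_inode || inode->data_journalled;
}

/* Assumed call-site shape with the inode types special-cased. */
static int sketch_flags_special_cased(const struct sketch_inode *inode)
{
	int flags = 0;

	if (inode->is_dir || inode->is_ea_inode)
		flags |= SKETCH_FREE_BLOCKS_FORGET;
	else if (inode->data_journalled)
		flags |= SKETCH_FREE_BLOCKS_FORGET;
	return flags;
}

/* Call-site shape proposed in the reply: one check, no special cases. */
static int sketch_flags_simplified(const struct sketch_inode *inode)
{
	int flags = 0;

	if (sketch_should_journal_data(inode))
		flags |= SKETCH_FREE_BLOCKS_FORGET;
	return flags;
}

/* Walk all eight combinations and count where the two shapes disagree. */
static int sketch_count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		struct sketch_inode inode = {
			.is_dir = (bits & 1) != 0,
			.is_ea_inode = (bits & 2) != 0,
			.data_journalled = (bits & 4) != 0,
		};

		if (sketch_flags_special_cased(&inode) !=
		    sketch_flags_simplified(&inode))
			mismatches++;
	}
	return mismatches;
}
```

Under these assumptions `sketch_count_mismatches()` finds no disagreement. The sketch says nothing about other flags the real call sites may set, nor about file systems without a journal.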

> As an aside, I was looking at fs/ext4/mballoc.c to see what the
> difference is for treating a block as a metadata block versus a
> journaled data block, and what I found made my hair stand on end:
> 
> 	/*
> 	 * We need to make sure we don't reuse the freed block until after the
> 	 * transaction is committed. We make an exception if the inode is to be
> 	 * written in writeback mode since writeback mode has weak data
> 	 * consistency guarantees.
> 	 */
> 
> So in data=writeback, if a file is deleted, its blocks are available
> for immediate reallocation, and if we are under heavy memory pressure,
> the deleted file's blocks could get overwritten --- even in the case
> where we crash and the transaction never committed.
> 
> While it's true that data=writeback mode has weaker guarantees, my
> understanding is that they only apply to the exposure of stale data, and
> not to a long-standing file's blocks getting corrupted if it is almost
> deleted, but not quite, before a crash.
> 
> Granted, the situation where this would happen is quite rare, but it
> seems quite wrong....

I've always considered data=writeback as: You don't know what the data is
going to be if the file was touched shortly before crashing (i.e., similar
to old ext2 non-guarantees).

								Honza

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index ef8fcf7d0d3b..99fe72522960 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -407,6 +407,7 @@  static inline int ext4_inode_journal_mode(struct inode *inode)
 		return EXT4_INODE_WRITEBACK_DATA_MODE;	/* writeback */
 	/* We do not support data journalling with delayed allocation */
 	if (!S_ISREG(inode->i_mode) ||
+	    ext4_test_inode_flag(inode, EXT4_INODE_EA_INODE) ||
 	    test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA ||
 	    (ext4_test_inode_flag(inode, EXT4_INODE_JOURNAL_DATA) &&
 	    !test_opt(inode->i_sb, DELALLOC))) {
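
The effect of the added condition can be tried out in user space with a small model of the check in the hunk above. This is a sketch under stated assumptions, not the kernel function. `struct model_inode`, `enum model_data_mode`, and the function names are hypothetical. The early return is assumed to cover the no-journal case, and the fall-through to the mount-wide mode is assumed because the hunk does not show the rest of the function. Anything else the real function may check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-inode data modes. */
enum model_data_mode {
	MODEL_WRITEBACK,
	MODEL_ORDERED,
	MODEL_JOURNAL,
};

/* Hypothetical, flattened view of the state the check consults. */
struct model_inode {
	bool has_journal;                /* assumed guard for the early return */
	bool is_regular;                 /* S_ISREG(inode->i_mode) */
	bool is_ea_inode;                /* EXT4_INODE_EA_INODE is set */
	bool journal_data_flag;          /* EXT4_INODE_JOURNAL_DATA is set */
	bool delalloc;                   /* mounted with delayed allocation */
	enum model_data_mode mount_mode; /* mount-wide data= mode */
};

/* Model of the patched condition in ext4_inode_journal_mode(). */
static enum model_data_mode model_journal_mode(const struct model_inode *inode)
{
	assert(inode != NULL);

	if (!inode->has_journal)
		return MODEL_WRITEBACK;
	if (!inode->is_regular ||
	    inode->is_ea_inode ||    /* the line this patch adds */
	    inode->mount_mode == MODEL_JOURNAL ||
	    (inode->journal_data_flag && !inode->delalloc))
		return MODEL_JOURNAL;
	/* Assumption: otherwise the mount-wide mode applies. */
	return inode->mount_mode;
}

/* Model of ext4_should_journal_data(): true when data goes via the journal. */
static bool model_should_journal_data(const struct model_inode *inode)
{
	return model_journal_mode(inode) == MODEL_JOURNAL;
}
```

In this model an EA inode on a data=ordered mount with delayed allocation reports journalled data, while an ordinary regular file with the same mount options does not. That is the behaviour the commit message describes.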
