diff mbox series

[v2,4/4] ext4: prevent getting empty inode buffer

Message ID 20210819065704.1248402-5-yi.zhang@huawei.com
State Superseded
Headers show
Series ext4: fix a inode checksum error | expand

Commit Message

Zhang Yi Aug. 19, 2021, 6:57 a.m. UTC
In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
inode buffer when the inode monopolize an inode block for performance
reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
buffer to make it fine, but we could miss this call if something bad
happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
empty inode buffer and trigger ext4 error.

For example, if we remove a nonexistent xattr on inode A,
ext4_xattr_set_handle() will return ENODATA before invoking
ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
will get checksum error message in ext4_iget() when getting inode again.

  EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

Even worse, if we allocate another inode B at the same inode block, it
will corrupt the inode A on disk when write back inode B.

So this patch postpone the init and mark buffer uptodate logic until
before filling correct inode data in ext4_do_update_inode() if skip read
I/O, ensure the buffer is real uptodate.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

Comments

Jan Kara Aug. 19, 2021, 10:35 a.m. UTC | #1
On Thu 19-08-21 14:57:04, Zhang Yi wrote:
> In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
> inode buffer when the inode monopolize an inode block for performance
> reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
> buffer to make it fine, but we could miss this call if something bad
> happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
> empty inode buffer and trigger ext4 error.
> 
> For example, if we remove a nonexistent xattr on inode A,
> ext4_xattr_set_handle() will return ENODATA before invoking
> ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
> will get checksum error message in ext4_iget() when getting inode again.
> 
>   EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid
> 
> Even worse, if we allocate another inode B at the same inode block, it
> will corrupt the inode A on disk when write back inode B.
> 
> So this patch postpone the init and mark buffer uptodate logic until
> before filling correct inode data in ext4_do_update_inode() if skip read
> I/O, ensure the buffer is real uptodate.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Just some language fixes below. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

> ---
>  fs/ext4/inode.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d0d7a5295bf9..02d910c9d8f1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4367,9 +4367,11 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
>  		}
>  		brelse(bitmap_bh);
>  		if (i == start + inodes_per_block) {
> -			/* all other inodes are free, so skip I/O */
> -			memset(bh->b_data, 0, bh->b_size);
> -			set_buffer_uptodate(bh);
> +			/*
> +			 * All other inodes are free, skip I/O. Return
> +			 * un-inited buffer (which is postponed until

I'd repharse this sentence as: Return uninitialized buffer immediately,
initialization is postponed until shortly before we fill inode contents.

> +			 * before filling inode data) immediately.
> +			 */
>  			unlock_buffer(bh);
>  			goto has_buffer;
>  		}
> @@ -5026,6 +5028,24 @@ static int ext4_do_update_inode(handle_t *handle,
>  	gid_t i_gid;
>  	projid_t i_projid;
>  
> +	/*
> +	 * If the buffer is not uptodate, it means all information of inode
								   ^^^^^^^^
of the inode is

> +	 * in memory and we got this buffer without reading the block. We
> +	 * must be cautious that once we mark the buffer as uptodate, we
> +	 * rely on filling in the correct inode data later in this function.
> +	 * Otherwise if we getting the left falsepositive buffer when

I'd rephrase this sentence as: Otherwise if we left uptodate buffer without
copying proper inode contents, we could corrupt the inode on disk after
allocating another inode in the same block.

								Honza
diff mbox series

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d0d7a5295bf9..02d910c9d8f1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4367,9 +4367,11 @@  static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 		}
 		brelse(bitmap_bh);
 		if (i == start + inodes_per_block) {
-			/* all other inodes are free, so skip I/O */
-			memset(bh->b_data, 0, bh->b_size);
-			set_buffer_uptodate(bh);
+			/*
+			 * All other inodes are free, skip I/O. Return
+			 * un-inited buffer (which is postponed until
+			 * before filling inode data) immediately.
+			 */
 			unlock_buffer(bh);
 			goto has_buffer;
 		}
@@ -5026,6 +5028,24 @@  static int ext4_do_update_inode(handle_t *handle,
 	gid_t i_gid;
 	projid_t i_projid;
 
+	/*
+	 * If the buffer is not uptodate, it means all information of inode
+	 * in memory and we got this buffer without reading the block. We
+	 * must be cautious that once we mark the buffer as uptodate, we
+	 * rely on filling in the correct inode data later in this function.
+	 * Otherwise if we getting the left falsepositive buffer when
+	 * creating other inode on the same block, it could corrupt the
+	 * inode data on disk.
+	 */
+	if (!buffer_uptodate(bh)) {
+		lock_buffer(bh);
+		if (!buffer_uptodate(bh)) {
+			memset(bh->b_data, 0, bh->b_size);
+			set_buffer_uptodate(bh);
+		}
+		unlock_buffer(bh);
+	}
+
 	spin_lock(&ei->i_raw_lock);
 
 	/* For fields not tracked in the in-memory inode,