diff mbox series

[2/3] ext4: Do not create EA inode under buffer lock

Message ID 20240209112107.10585-2-jack@suse.cz
State Awaiting Upstream
Headers show
Series ext4: Create EA inodes outside of buffer lock | expand

Commit Message

Jan Kara Feb. 9, 2024, 11:21 a.m. UTC
ext4_xattr_set_entry() creates new EA inodes while holding buffer lock
on the external xattr block. This is problematic as it nests all the
allocation locking (which acquires locks on other buffers) under the
buffer lock. This can even deadlock when the filesystem is corrupted and
e.g. quota file is setup to contain xattr block as data block. Move the
allocation of EA inode out of ext4_xattr_set_entry() into the callers.

Reported-by: syzbot+a43d4f48b8397d0e41a9@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/xattr.c | 100 +++++++++++++++++++++++-------------------------
 1 file changed, 48 insertions(+), 52 deletions(-)

Comments

Theodore Ts'o Feb. 29, 2024, 3:59 p.m. UTC | #1
On Fri, Feb 09, 2024 at 12:21:00PM +0100, Jan Kara wrote:
> ext4_xattr_set_entry() creates new EA inodes while holding buffer lock
> on the external xattr block. This is problematic as it nests all the
> allocation locking (which acquires locks on other buffers) under the
> buffer lock. This can even deadlock when the filesystem is corrupted and
> e.g. quota file is setup to contain xattr block as data block. Move the
> allocation of EA inode out of ext4_xattr_set_entry() into the callers.
> 
> Reported-by: syzbot+a43d4f48b8397d0e41a9@syzkaller.appspotmail.com
> Signed-off-by: Jan Kara <jack@suse.cz>

In my testing I've found that this is causing a regression for
ext4/026 in the encrypt configuration.  This can be replicated using
"kvm-xfstests -c encrypt ext4/026.   Logs follow below.

I'll try to take a closer look, but I may end up deciding to drop this
patch or possible the whole patch series until we can figure out
what's going on.

						- Ted

ext4/026 1s ...  [10:51:57][    3.111475] run fstests ext4/026 at 2024-02-29 10:51:57
EXT4-fs (vdc): Test dummy encryption mode enabled
EXT4-fs (vdc): Test dummy encryption mode enabled
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
------------[ cut here ]------------
EA inode 18 ref_count=-1
WARNING: CPU: 0 PID: 2391 at fs/ext4/xattr.c:1064 ext4_xattr_inode_update_ref+0x1c0/0x230
CPU: 0 PID: 2391 Comm: setfattr Not tainted 6.8.0-rc3-xfstests-00021-gf7528aea5d49 #320
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:ext4_xattr_inode_update_ref+0x1c0/0x230
Code: 0b e9 21 ff ff ff 80 3d 13 47 5a 01 00 0f 85 14 ff ff ff 48 8b 73 40 48 c7 c7 a6 8e 5d 82 c6 05 fb 46 5a 01 01 e8 50 40 c1 ff <0f> 0b e9 f6 fe ff ff 80 3d e7 46 5a 01 00 0f 85 5d ff ff ff 48 8b
RSP: 0018:ffffc900032cb980 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8880043438a8 RCX: 0000000000000027
RDX: ffff88807dc1c848 RSI: 0000000000000001 RDI: ffff88807dc1c840
RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff82860e00
R10: ffffc900032cb828 R11: ffffffff828d0e48 R12: ffff888007c93150
R13: ffff888004343948 R14: 00000000ffffffff R15: ffff8880043438a8
FS:  00007fab1e02b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e0e4fd2000 CR3: 000000000a0a6002 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? ext4_xattr_inode_update_ref+0x1c0/0x230
 ? __warn+0x7c/0x130
 ? ext4_xattr_inode_update_ref+0x1c0/0x230
 ? report_bug+0x173/0x1d0
 ? handle_bug+0x3a/0x70
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? ext4_xattr_inode_update_ref+0x1c0/0x230
 ext4_xattr_inode_dec_ref_all+0x166/0x330
 ext4_xattr_release_block+0x29e/0x300
 ext4_xattr_block_set+0x795/0xc70
 ext4_xattr_set_handle+0x468/0x680
 ext4_xattr_set+0x88/0x160
 __vfs_setxattr+0x96/0xd0
 __vfs_setxattr_noperm+0x79/0x1d0
 vfs_setxattr+0x9f/0x180
 setxattr+0x9e/0xc0
 path_setxattr+0xc9/0xf0
 __x64_sys_setxattr+0x2b/0x40
 do_syscall_64+0x52/0x120
 entry_SYSCALL_64_after_hwframe+0x6e/0x76
RIP: 0033:0x7fab1e12f4f9
Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d7 08 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007fffe7694618 EFLAGS: 00000206 ORIG_RAX: 00000000000000bc
RAX: ffffffffffffffda RBX: 000055e0e4fc3340 RCX: 00007fab1e12f4f9
RDX: 000055e0e4fc3340 RSI: 00007fffe7695a22 RDI: 00007fffe76a5a96
RBP: 00007fffe76a5a96 R08: 0000000000000000 R09: 00007fab1e247020
R10: 0000000000010000 R11: 0000000000000206 R12: 00007fffe7695a22
R13: 000055e0e3e8008c R14: 000055e0e3e82100 R15: 00007fab1e247020
 </TASK>
---[ end trace 0000000000000000 ]---
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs error (device vdc): ext4_xattr_inode_iget:443: comm getfattr: error while reading EA inode 18 err=-116
EXT4-fs (vdc): Test dummy encryption mode enabled
EXT4-fs (vdc): warning: mounting fs with errors, running e2fsck is recommended
Jan Kara March 14, 2024, 6:12 p.m. UTC | #2
On Thu 29-02-24 10:59:17, Theodore Ts'o wrote:
> On Fri, Feb 09, 2024 at 12:21:00PM +0100, Jan Kara wrote:
> > ext4_xattr_set_entry() creates new EA inodes while holding buffer lock
> > on the external xattr block. This is problematic as it nests all the
> > allocation locking (which acquires locks on other buffers) under the
> > buffer lock. This can even deadlock when the filesystem is corrupted and
> > e.g. quota file is setup to contain xattr block as data block. Move the
> > allocation of EA inode out of ext4_xattr_set_entry() into the callers.
> > 
> > Reported-by: syzbot+a43d4f48b8397d0e41a9@syzkaller.appspotmail.com
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> In my testing I've found that this is causing a regression for
> ext4/026 in the encrypt configuration.  This can be replicated using
> "kvm-xfstests -c encrypt ext4/026.   Logs follow below.
> 
> I'll try to take a closer look, but I may end up deciding to drop this
> patch or possible the whole patch series until we can figure out
> what's going on.

OK, I've found the problem. I'll rebase patches on top of rc1 when it
happens and send fixed version. Thanks for catching this!

								Honza
diff mbox series

Patch

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 146690c10c73..e7e1ffff8eba 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1619,6 +1619,7 @@  static struct inode *ext4_xattr_inode_lookup_create(handle_t *handle,
 static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 				struct ext4_xattr_search *s,
 				handle_t *handle, struct inode *inode,
+				struct inode *new_ea_inode,
 				bool is_block)
 {
 	struct ext4_xattr_entry *last, *next;
@@ -1626,7 +1627,6 @@  static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 	size_t min_offs = s->end - s->base, name_len = strlen(i->name);
 	int in_inode = i->in_inode;
 	struct inode *old_ea_inode = NULL;
-	struct inode *new_ea_inode = NULL;
 	size_t old_size, new_size;
 	int ret;
 
@@ -1711,38 +1711,11 @@  static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 			old_ea_inode = NULL;
 			goto out;
 		}
-	}
-	if (i->value && in_inode) {
-		WARN_ON_ONCE(!i->value_len);
-
-		new_ea_inode = ext4_xattr_inode_lookup_create(handle, inode,
-					i->value, i->value_len);
-		if (IS_ERR(new_ea_inode)) {
-			ret = PTR_ERR(new_ea_inode);
-			new_ea_inode = NULL;
-			goto out;
-		}
-	}
 
-	if (old_ea_inode) {
 		/* We are ready to release ref count on the old_ea_inode. */
 		ret = ext4_xattr_inode_dec_ref(handle, old_ea_inode);
-		if (ret) {
-			/* Release newly required ref count on new_ea_inode. */
-			if (new_ea_inode) {
-				int err;
-
-				err = ext4_xattr_inode_dec_ref(handle,
-							       new_ea_inode);
-				if (err)
-					ext4_warning_inode(new_ea_inode,
-						  "dec ref new_ea_inode err=%d",
-						  err);
-				ext4_xattr_inode_free_quota(inode, new_ea_inode,
-							    i->value_len);
-			}
+		if (ret)
 			goto out;
-		}
 
 		ext4_xattr_inode_free_quota(inode, old_ea_inode,
 					    le32_to_cpu(here->e_value_size));
@@ -1866,7 +1839,6 @@  static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 	ret = 0;
 out:
 	iput(old_ea_inode);
-	iput(new_ea_inode);
 	return ret;
 }
 
@@ -1929,9 +1901,21 @@  ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 	size_t old_ea_inode_quota = 0;
 	unsigned int ea_ino;
 
-
 #define header(x) ((struct ext4_xattr_header *)(x))
 
+	/* If we need EA inode, prepare it before locking the buffer */
+	if (i->value && i->in_inode) {
+		WARN_ON_ONCE(!i->value_len);
+
+		ea_inode = ext4_xattr_inode_lookup_create(handle, inode,
+					i->value, i->value_len);
+		if (IS_ERR(ea_inode)) {
+			error = PTR_ERR(ea_inode);
+			ea_inode = NULL;
+			goto cleanup;
+		}
+	}
+
 	if (s->base) {
 		int offset = (char *)s->here - bs->bh->b_data;
 
@@ -1940,6 +1924,7 @@  ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 						      EXT4_JTR_NONE);
 		if (error)
 			goto cleanup;
+
 		lock_buffer(bs->bh);
 
 		if (header(s->base)->h_refcount == cpu_to_le32(1)) {
@@ -1966,7 +1951,7 @@  ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			}
 			ea_bdebug(bs->bh, "modifying in-place");
 			error = ext4_xattr_set_entry(i, s, handle, inode,
-						     true /* is_block */);
+					     ea_inode, true /* is_block */);
 			ext4_xattr_block_csum_set(inode, bs->bh);
 			unlock_buffer(bs->bh);
 			if (error == -EFSCORRUPTED)
@@ -2034,29 +2019,13 @@  ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 		s->end = s->base + sb->s_blocksize;
 	}
 
-	error = ext4_xattr_set_entry(i, s, handle, inode, true /* is_block */);
+	error = ext4_xattr_set_entry(i, s, handle, inode, ea_inode,
+				     true /* is_block */);
 	if (error == -EFSCORRUPTED)
 		goto bad_block;
 	if (error)
 		goto cleanup;
 
-	if (i->value && s->here->e_value_inum) {
-		/*
-		 * A ref count on ea_inode has been taken as part of the call to
-		 * ext4_xattr_set_entry() above. We would like to drop this
-		 * extra ref but we have to wait until the xattr block is
-		 * initialized and has its own ref count on the ea_inode.
-		 */
-		ea_ino = le32_to_cpu(s->here->e_value_inum);
-		error = ext4_xattr_inode_iget(inode, ea_ino,
-					      le32_to_cpu(s->here->e_hash),
-					      &ea_inode);
-		if (error) {
-			ea_inode = NULL;
-			goto cleanup;
-		}
-	}
-
 inserted:
 	if (!IS_LAST_ENTRY(s->first)) {
 		new_bh = ext4_xattr_block_cache_find(inode, header(s->base),
@@ -2277,14 +2246,40 @@  int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode,
 {
 	struct ext4_xattr_ibody_header *header;
 	struct ext4_xattr_search *s = &is->s;
+	struct inode *ea_inode = NULL;
 	int error;
 
 	if (!EXT4_INODE_HAS_XATTR_SPACE(inode))
 		return -ENOSPC;
 
-	error = ext4_xattr_set_entry(i, s, handle, inode, false /* is_block */);
-	if (error)
+	/* If we need EA inode, prepare it before locking the buffer */
+	if (i->value && i->in_inode) {
+		WARN_ON_ONCE(!i->value_len);
+
+		ea_inode = ext4_xattr_inode_lookup_create(handle, inode,
+					i->value, i->value_len);
+		if (IS_ERR(ea_inode))
+			return PTR_ERR(ea_inode);
+	}
+	error = ext4_xattr_set_entry(i, s, handle, inode, ea_inode,
+				     false /* is_block */);
+	if (error) {
+		if (ea_inode) {
+			int error2;
+
+			error2 = ext4_xattr_inode_dec_ref(handle, ea_inode);
+			if (error2)
+				ext4_warning_inode(ea_inode, "dec ref error=%d",
+						   error2);
+
+			/* If there was an error, revert the quota charge. */
+			if (error)
+				ext4_xattr_inode_free_quota(inode, ea_inode,
+						    i_size_read(ea_inode));
+			iput(ea_inode);
+		}
 		return error;
+	}
 	header = IHDR(inode, ext4_raw_inode(&is->iloc));
 	if (!IS_LAST_ENTRY(s->first)) {
 		header->h_magic = cpu_to_le32(EXT4_XATTR_MAGIC);
@@ -2293,6 +2288,7 @@  int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode,
 		header->h_magic = cpu_to_le32(0);
 		ext4_clear_inode_state(inode, EXT4_STATE_XATTR);
 	}
+	iput(ea_inode);
 	return 0;
 }