diff mbox series

[v3] xfs: clear PF_MEMALLOC before exiting xfsaild thread

Message ID 20200309185714.42850-1-ebiggers@kernel.org
State Not Applicable
Headers show
Series [v3] xfs: clear PF_MEMALLOC before exiting xfsaild thread | expand

Commit Message

Eric Biggers March 9, 2020, 6:57 p.m. UTC
From: Eric Biggers <ebiggers@google.com>

Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  In particular, if BSD
process accounting is enabled, then do_exit() writes data to an
accounting file.  If that file has FS_SYNC_FL set, then this write
occurs synchronously and can misbehave if PF_MEMALLOC is set.

For example, if the accounting file is located on an XFS filesystem,
then a WARN_ON_ONCE() in iomap_do_writepage() is triggered and the data
doesn't get written when it should.  Or if the accounting file is
located on an ext4 filesystem without a journal, then a WARN_ON_ONCE()
in ext4_write_inode() is triggered and the inode doesn't get written.

Fix this in xfsaild() by using the helper functions to save and restore
PF_MEMALLOC.

This can be reproduced as follows in the kvm-xfstests test appliance
modified to add the 'acct' Debian package, and with kvm-xfstests's
recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:

        mkfs.xfs -f /dev/vdb
        mount /vdb
        touch /vdb/file
        chattr +S /vdb/file
        accton /vdb/file
        mkfs.xfs -f /dev/vdc
        mount /vdc
        umount /vdc

It causes:
	WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
	CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
	RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
	[...]
	Call Trace:
	 write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
	 iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
	 xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
	 do_writepages+0x41/0xe0 mm/page-writeback.c:2344
	 __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
	 file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
	 xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
	 generic_write_sync include/linux/fs.h:2867 [inline]
	 xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
	 call_write_iter include/linux/fs.h:1901 [inline]
	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
	 __kernel_write+0x54/0xe0 fs/read_write.c:515
	 do_acct_process+0x122/0x170 kernel/acct.c:522
	 slow_acct_process kernel/acct.c:581 [inline]
	 acct_process+0x1d4/0x27c kernel/acct.c:607
	 do_exit+0x83d/0xbc0 kernel/exit.c:791
	 kthread+0xf1/0x140 kernel/kthread.c:257
	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352

This bug was originally reported by syzbot at
https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.

Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
---

v3: updated commit message again, this time to take into account the bug
    also being reproducible when the accounting file is located on XFS.

v2: include more details in the commit message.

 fs/xfs/xfs_trans_ail.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong March 10, 2020, 3:47 p.m. UTC | #1
On Mon, Mar 09, 2020 at 11:57:14AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
> during do_exit().  That can confuse things.  In particular, if BSD
> process accounting is enabled, then do_exit() writes data to an
> accounting file.  If that file has FS_SYNC_FL set, then this write
> occurs synchronously and can misbehave if PF_MEMALLOC is set.
> 
> For example, if the accounting file is located on an XFS filesystem,
> then a WARN_ON_ONCE() in iomap_do_writepage() is triggered and the data
> doesn't get written when it should.  Or if the accounting file is
> located on an ext4 filesystem without a journal, then a WARN_ON_ONCE()
> in ext4_write_inode() is triggered and the inode doesn't get written.
> 
> Fix this in xfsaild() by using the helper functions to save and restore
> PF_MEMALLOC.
> 
> This can be reproduced as follows in the kvm-xfstests test appliance
> modified to add the 'acct' Debian package, and with kvm-xfstests's
> recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:
> 
>         mkfs.xfs -f /dev/vdb
>         mount /vdb
>         touch /vdb/file
>         chattr +S /vdb/file
>         accton /vdb/file
>         mkfs.xfs -f /dev/vdc
>         mount /vdc
>         umount /vdc
> 
> It causes:
> 	WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
> 	CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
> 	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
> 	RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
> 	[...]
> 	Call Trace:
> 	 write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
> 	 iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
> 	 xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
> 	 do_writepages+0x41/0xe0 mm/page-writeback.c:2344
> 	 __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
> 	 file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
> 	 xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
> 	 generic_write_sync include/linux/fs.h:2867 [inline]
> 	 xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
> 	 call_write_iter include/linux/fs.h:1901 [inline]
> 	 new_sync_write+0x130/0x1d0 fs/read_write.c:483
> 	 __kernel_write+0x54/0xe0 fs/read_write.c:515
> 	 do_acct_process+0x122/0x170 kernel/acct.c:522
> 	 slow_acct_process kernel/acct.c:581 [inline]
> 	 acct_process+0x1d4/0x27c kernel/acct.c:607
> 	 do_exit+0x83d/0xbc0 kernel/exit.c:791
> 	 kthread+0xf1/0x140 kernel/kthread.c:257
> 	 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352
> 
> This bug was originally reported by syzbot at
> https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.
> 
> Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Seems reasonable to me, will give it a spin...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
> 
> v3: updated commit message again, this time to take into account the bug
>     also being reproducible when the accounting file is located on XFS.
> 
> v2: include more details in the commit message.
> 
>  fs/xfs/xfs_trans_ail.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 00cc5b8734be8..3bc570c90ad97 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -529,8 +529,9 @@ xfsaild(
>  {
>  	struct xfs_ail	*ailp = data;
>  	long		tout = 0;	/* milliseconds */
> +	unsigned int	noreclaim_flag;
>  
> -	current->flags |= PF_MEMALLOC;
> +	noreclaim_flag = memalloc_noreclaim_save();
>  	set_freezable();
>  
>  	while (1) {
> @@ -601,6 +602,7 @@ xfsaild(
>  		tout = xfsaild_push(ailp);
>  	}
>  
> +	memalloc_noreclaim_restore(noreclaim_flag);
>  	return 0;
>  }
>  
> -- 
> 2.25.1
>
Christoph Hellwig March 11, 2020, 6:34 a.m. UTC | #2
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 00cc5b8734be8..3bc570c90ad97 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -529,8 +529,9 @@  xfsaild(
 {
 	struct xfs_ail	*ailp = data;
 	long		tout = 0;	/* milliseconds */
+	unsigned int	noreclaim_flag;
 
-	current->flags |= PF_MEMALLOC;
+	noreclaim_flag = memalloc_noreclaim_save();
 	set_freezable();
 
 	while (1) {
@@ -601,6 +602,7 @@  xfsaild(
 		tout = xfsaild_push(ailp);
 	}
 
+	memalloc_noreclaim_restore(noreclaim_flag);
 	return 0;
 }