diff mbox series

[3/3] ext4: Use generic_quota_read()

Message ID 20220605143815.2330891-4-willy@infradead.org
State Rejected
Headers show
Series Cache quota files in the page cache | expand

Commit Message

Matthew Wilcox (Oracle) June 5, 2022, 2:38 p.m. UTC
The comment about the page cache is rather stale; the buffer cache will
read into the page cache if the buffer isn't present, and the page cache
will not take any locks if the page is present.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/ext4/super.c | 81 ++++++++++++-------------------------------------
 1 file changed, 20 insertions(+), 61 deletions(-)

Comments

Jan Kara June 6, 2022, 8:38 a.m. UTC | #1
On Sun 05-06-22 15:38:15, Matthew Wilcox (Oracle) wrote:
> The comment about the page cache is rather stale; the buffer cache will
> read into the page cache if the buffer isn't present, and the page cache
> will not take any locks if the page is present.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

This will not work for couple of reasons, see below. BTW, I don't think the
comment about page cache was stale (but lacking details I admit ;). As far
as I remember (and it was really many years ago - definitely pre-git era)
the problem was (mainly on the write side) that before current state of the
code we were using calls like vfs_read() / vfs_write() to get quota
information and that was indeed prone to deadlocks.

> @@ -6924,20 +6882,21 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
>  		return -EIO;
>  	}
>  
> -	do {
> -		bh = ext4_bread(handle, inode, blk,
> -				EXT4_GET_BLOCKS_CREATE |
> -				EXT4_GET_BLOCKS_METADATA_NOFAIL);
> -	} while (PTR_ERR(bh) == -ENOSPC &&
> -		 ext4_should_retry_alloc(inode->i_sb, &retries));
> -	if (IS_ERR(bh))
> -		return PTR_ERR(bh);
> -	if (!bh)
> +	folio = read_mapping_folio(inode->i_mapping, off / PAGE_SIZE, NULL);
> +	if (IS_ERR(folio))
> +		return PTR_ERR(folio);
> +	head = folio_buffers(folio);
> +	if (!head)
> +		head = alloc_page_buffers(&folio->page, sb->s_blocksize, false);
> +	if (!head)
>  		goto out;
> +	bh = head;
> +	while ((bh_offset(bh) + sb->s_blocksize) <= (off % PAGE_SIZE))
> +		bh = bh->b_this_page;

We miss proper handling of blocks that are currently beyond i_size
(we are extending the quota file), plus we also miss any mapping of buffers
to appropriate disk blocks here...

It could be all fixed by replicating what we do in ext4_write_begin() but
I'm not quite convinced using inode's page cache is really worth it...

								Honza

>  	BUFFER_TRACE(bh, "get write access");
>  	err = ext4_journal_get_write_access(handle, sb, bh, EXT4_JTR_NONE);
>  	if (err) {
> -		brelse(bh);
> +		folio_put(folio);
>  		return err;
>  	}
>  	lock_buffer(bh);
> @@ -6945,14 +6904,14 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
>  	flush_dcache_page(bh->b_page);
>  	unlock_buffer(bh);
>  	err = ext4_handle_dirty_metadata(handle, NULL, bh);
> -	brelse(bh);
>  out:
> +	folio_put(folio);
> +	if (err)
> +		return err;
>  	if (inode->i_size < off + len) {
>  		i_size_write(inode, off + len);
>  		EXT4_I(inode)->i_disksize = inode->i_size;
> -		err2 = ext4_mark_inode_dirty(handle, inode);
> -		if (unlikely(err2 && !err))
> -			err = err2;
> +		err = ext4_mark_inode_dirty(handle, inode);
>  	}
>  	return err ? err : len;
>  }
> -- 
> 2.35.1
>
Matthew Wilcox (Oracle) June 8, 2022, 1:42 a.m. UTC | #2
On Mon, Jun 06, 2022 at 10:38:14AM +0200, Jan Kara wrote:
> On Sun 05-06-22 15:38:15, Matthew Wilcox (Oracle) wrote:
> > The comment about the page cache is rather stale; the buffer cache will
> > read into the page cache if the buffer isn't present, and the page cache
> > will not take any locks if the page is present.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> This will not work for couple of reasons, see below. BTW, I don't think the
> comment about page cache was stale (but lacking details I admit ;). As far
> as I remember (and it was really many years ago - definitely pre-git era)
> the problem was (mainly on the write side) that before current state of the
> code we were using calls like vfs_read() / vfs_write() to get quota
> information and that was indeed prone to deadlocks.

Ah yes, vfs_write() might indeed be prone to deadlocks.  Particularly
if we're doing it under the dq_mutex and any memory allocation might
have recursed into reclaim ;-)

I actually found the commit in linux-fullhistory.  Changelog for
context:

commit b72debd66a6ed
Author: Jan Kara <jack@suse.cz>
Date:   Mon Jan 3 04:12:24 2005 -0800

    [PATCH] Fix of quota deadlock on pagelock: quota core

    The four patches in this series fix deadlocks with quotas of pagelock (the
    problem was lock inversion on PageLock and transaction start - quota code
    needed to first start a transaction and then write the data which subsequent
ly
    needed acquisition of PageLock while the standard ordering - PageLock first
    and transaction start later - was used e.g.  by pdflush).  They implement a
    new way of quota access to disk: Every filesystem that would like to impleme
nt
    quotas now has to provide quota_read() and quota_write() functions.  These
    functions must obey quota lock ordering (in particular they should not take
    PageLock inside a transaction).

    The first patch implements the changes in the quota core, the other three
    patches implement needed functions in ext2, ext3 and reiserfs.  The patch for
    reiserfs also fixes several other lock inversion problems (similar as ext3
    had) and implements the journaled quota functionality (which comes almost for
    free after the locking fixes...).
    
    The quota core patch makes quota support in other filesystems (except XFS
    which implements everything on its own ;)) unfunctional (quotaon() will refuse
    to turn on quotas on them).  When the patches get reasonable wide testing and
    it will seem that no major changes will be needed I can make fixes also for
    the other filesystems (JFS, UDF, UFS).
    
    This patch:
    
    The patch implements the new way of quota io in the quota core.  Every
    filesystem wanting to support quotas has to provide functions quota_read()
    and quota_write() obeying quota locking rules.  As the writes and reads
    bypass the pagecache there is some ugly stuff ensuring that userspace can
    see all the data after quotaoff() (or Q_SYNC quotactl).  In future I plan
    to make quota files inaccessible from userspace (with the exception of
    quotacheck(8) which will take care about the cache flushing and such stuff
    itself) so that this synchronization stuff can be removed...
    
    The rewrite of the quota core. Quota uses the filesystem read() and write()
    functions no more to avoid possible deadlocks on PageLock. From now on every
    filesystem supporting quotas must provide functions quota_read() and
    quota_write() which obey the quota locking rules (e.g. they cannot acquire the
    PageLock).
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

> > @@ -6924,20 +6882,21 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
> >  		return -EIO;
> >  	}
> >  
> > -	do {
> > -		bh = ext4_bread(handle, inode, blk,
> > -				EXT4_GET_BLOCKS_CREATE |
> > -				EXT4_GET_BLOCKS_METADATA_NOFAIL);
> > -	} while (PTR_ERR(bh) == -ENOSPC &&
> > -		 ext4_should_retry_alloc(inode->i_sb, &retries));
> > -	if (IS_ERR(bh))
> > -		return PTR_ERR(bh);
> > -	if (!bh)
> > +	folio = read_mapping_folio(inode->i_mapping, off / PAGE_SIZE, NULL);
> > +	if (IS_ERR(folio))
> > +		return PTR_ERR(folio);
> > +	head = folio_buffers(folio);
> > +	if (!head)
> > +		head = alloc_page_buffers(&folio->page, sb->s_blocksize, false);
> > +	if (!head)
> >  		goto out;
> > +	bh = head;
> > +	while ((bh_offset(bh) + sb->s_blocksize) <= (off % PAGE_SIZE))
> > +		bh = bh->b_this_page;
> 
> We miss proper handling of blocks that are currently beyond i_size
> (we are extending the quota file), plus we also miss any mapping of buffers
> to appropriate disk blocks here...
> 
> It could be all fixed by replicating what we do in ext4_write_begin() but
> I'm not quite convinced using inode's page cache is really worth it...

Ah, yes, write_begin.  Of course that's what I should have used.

I'm looking at this from the point of view of removing buffer_heads
where possible.  Of course, it's not possible for ext4 while the journal
relies on buffer_heads, but if we can steer filesystems away from using
sb_bread() (or equivalents), I think that's a good thing.
Jan Kara June 8, 2022, 2:21 p.m. UTC | #3
On Wed 08-06-22 02:42:21, Matthew Wilcox wrote:
> On Mon, Jun 06, 2022 at 10:38:14AM +0200, Jan Kara wrote:
> > On Sun 05-06-22 15:38:15, Matthew Wilcox (Oracle) wrote:
> > > The comment about the page cache is rather stale; the buffer cache will
> > > read into the page cache if the buffer isn't present, and the page cache
> > > will not take any locks if the page is present.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > 
> > This will not work for couple of reasons, see below. BTW, I don't think the
> > comment about page cache was stale (but lacking details I admit ;). As far
> > as I remember (and it was really many years ago - definitely pre-git era)
> > the problem was (mainly on the write side) that before current state of the
> > code we were using calls like vfs_read() / vfs_write() to get quota
> > information and that was indeed prone to deadlocks.
> 
> Ah yes, vfs_write() might indeed be prone to deadlocks.  Particularly
> if we're doing it under the dq_mutex and any memory allocation might
> have recursed into reclaim ;-)
> 
> I actually found the commit in linux-fullhistory.  Changelog for
> context:
> 
> commit b72debd66a6ed
> Author: Jan Kara <jack@suse.cz>
> Date:   Mon Jan 3 04:12:24 2005 -0800
> 
>     [PATCH] Fix of quota deadlock on pagelock: quota core

\o/ to you history searching skills :)

> > > @@ -6924,20 +6882,21 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
> > >  		return -EIO;
> > >  	}
> > >  
> > > -	do {
> > > -		bh = ext4_bread(handle, inode, blk,
> > > -				EXT4_GET_BLOCKS_CREATE |
> > > -				EXT4_GET_BLOCKS_METADATA_NOFAIL);
> > > -	} while (PTR_ERR(bh) == -ENOSPC &&
> > > -		 ext4_should_retry_alloc(inode->i_sb, &retries));
> > > -	if (IS_ERR(bh))
> > > -		return PTR_ERR(bh);
> > > -	if (!bh)
> > > +	folio = read_mapping_folio(inode->i_mapping, off / PAGE_SIZE, NULL);
> > > +	if (IS_ERR(folio))
> > > +		return PTR_ERR(folio);
> > > +	head = folio_buffers(folio);
> > > +	if (!head)
> > > +		head = alloc_page_buffers(&folio->page, sb->s_blocksize, false);
> > > +	if (!head)
> > >  		goto out;
> > > +	bh = head;
> > > +	while ((bh_offset(bh) + sb->s_blocksize) <= (off % PAGE_SIZE))
> > > +		bh = bh->b_this_page;
> > 
> > We miss proper handling of blocks that are currently beyond i_size
> > (we are extending the quota file), plus we also miss any mapping of buffers
> > to appropriate disk blocks here...
> > 
> > It could be all fixed by replicating what we do in ext4_write_begin() but
> > I'm not quite convinced using inode's page cache is really worth it...
> 
> Ah, yes, write_begin.  Of course that's what I should have used.
> 
> I'm looking at this from the point of view of removing buffer_heads
> where possible.  Of course, it's not possible for ext4 while the journal
> relies on buffer_heads, but if we can steer filesystems away from using
> sb_bread() (or equivalents), I think that's a good thing.

Well, ext4 uses sb_bread() (sb_getblk()) for all its metadata so quota
code, which is rather well localized, is the least of your worries I'm
afraid ;).

								Honza
kernel test robot June 13, 2022, 7:59 a.m. UTC | #4
Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: fa964903692268d3913cdaf489f80db849a3e928 ("[PATCH 3/3] ext4: Use generic_quota_read()")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/Cache-quota-files-in-the-page-cache/20220606-021629
base: https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev
patch link: https://lore.kernel.org/linux-ext4/20220605143815.2330891-4-willy@infradead.org

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20220604
with following parameters:

	disk: 1HDD
	fs: btrfs
	test: fs-03
	ucode: 0xec

test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
test-url: http://linux-test-project.github.io/


on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz with 32G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang@intel.com>


[ 366.801777][ T5247] WARNING: CPU: 5 PID: 5247 at fs/ext4/inode.c:3228 ext4_invalidate_folio (fs/ext4/inode.c:3228) 
[  366.811124][ T5247] Modules linked in: loop binfmt_misc dm_mod btrfs blake2b_generic xor raid6_pq intel_rapl_msr intel_rapl_common zstd_compress libcrc32c i915 sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg x86_pkg_temp_thermal intel_gtt drm_buddy intel_powerclamp ipmi_devintf ipmi_msghandler coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_dp_helper wmi_bmof mei_wdt ttm rapl intel_cstate drm_kms_helper ahci libahci syscopyarea sysfillrect i2c_designware_platform intel_uncore mei_me libata sysimgblt i2c_designware_core mei idma64 fb_sys_fops wmi video intel_pmc_core acpi_pad drm fuse ip_tables
[  366.869509][ T5247] CPU: 5 PID: 5247 Comm: umount Tainted: G          I       5.18.0-rc5-00028-gfa9649036922 #1
[  366.879632][ T5247] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[ 366.887739][ T5247] RIP: 0010:ext4_invalidate_folio (fs/ext4/inode.c:3228) 
[ 366.893597][ T5247] Code: e8 f5 12 c7 ff 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c1 00 00 00 48 8b 03 a9 00 00 01 00 74 02 <0f> 0b 5b 4c 89 ea 4c 89 e6 48 89 ef 5d 41 5c 41 5d e9 3a 4a df ff
All code
========
   0:	e8 f5 12 c7 ff       	callq  0xffffffffffc712fa
   5:	48 89 da             	mov    %rbx,%rdx
   8:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
   f:	fc ff df 
  12:	48 c1 ea 03          	shr    $0x3,%rdx
  16:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
  1a:	0f 85 c1 00 00 00    	jne    0xe1
  20:	48 8b 03             	mov    (%rbx),%rax
  23:	a9 00 00 01 00       	test   $0x10000,%eax
  28:	74 02                	je     0x2c
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	5b                   	pop    %rbx
  2d:	4c 89 ea             	mov    %r13,%rdx
  30:	4c 89 e6             	mov    %r12,%rsi
  33:	48 89 ef             	mov    %rbp,%rdi
  36:	5d                   	pop    %rbp
  37:	41 5c                	pop    %r12
  39:	41 5d                	pop    %r13
  3b:	e9 3a 4a df ff       	jmpq   0xffffffffffdf4a7a

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	5b                   	pop    %rbx
   3:	4c 89 ea             	mov    %r13,%rdx
   6:	4c 89 e6             	mov    %r12,%rsi
   9:	48 89 ef             	mov    %rbp,%rdi
   c:	5d                   	pop    %rbp
   d:	41 5c                	pop    %r12
   f:	41 5d                	pop    %r13
  11:	e9 3a 4a df ff       	jmpq   0xffffffffffdf4a50
[  366.913124][ T5247] RSP: 0018:ffffc90000e3fab0 EFLAGS: 00010206
[  366.919089][ T5247] RAX: 000000000011601b RBX: ffff8887f2c18dc8 RCX: ffffffff81c7828b
[  366.926940][ T5247] RDX: 1ffff110fe5831b9 RSI: 0000000000000008 RDI: ffff8887f2c18dc8
[  366.934793][ T5247] RBP: ffffea00072cd700 R08: 0000000000000000 R09: ffff8887f2c18dcf
[  366.942641][ T5247] R10: ffffed10fe5831b9 R11: 0000000000000001 R12: 0000000000000000
[  366.950492][ T5247] R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000000001
[  366.958343][ T5247] FS:  00007f5852a94840(0000) GS:ffff8886aaa80000(0000) knlGS:0000000000000000
[  366.967151][ T5247] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  366.973607][ T5247] CR2: 0000561684be9559 CR3: 00000007f60fc006 CR4: 00000000003706e0
[  366.981457][ T5247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  366.989303][ T5247] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  366.997156][ T5247] Call Trace:
[  367.000322][ T5247]  <TASK>
[ 367.003130][ T5247] truncate_cleanup_folio (mm/truncate.c:159 mm/truncate.c:179) 
[ 367.008372][ T5247] truncate_inode_pages_range (include/linux/pagevec.h:119 (discriminator 3) mm/truncate.c:368 (discriminator 3)) 
[ 367.013965][ T5247] ? truncate_inode_partial_folio (mm/truncate.c:332) 
[ 367.019900][ T5247] ? jbd2_journal_set_features (fs/jbd2/journal.c:682) 
[ 367.025580][ T5247] ? _raw_write_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qrwlock.h:94 include/linux/rwlock_api_smp.h:210 kernel/locking/spinlock.c:300) 
[ 367.030300][ T5247] ? prepare_to_swait_exclusive (kernel/sched/wait.c:414) 
[ 367.036082][ T5247] ? inode_to_bdi (mm/backing-dev.c:995 mm/backing-dev.c:985) 
[ 367.040540][ T5247] ? filemap_check_errors (arch/x86/include/asm/bitops.h:207 include/asm-generic/bitops/instrumented-non-atomic.h:135 mm/filemap.c:351) 
[ 367.045712][ T5247] ? watch_queue_set_size (kernel/watch_queue.c:219) 
[ 367.050864][ T5247] ? filemap_fdatawait_keep_errors (mm/filemap.c:669) 
[ 367.056715][ T5247] ? __cond_resched (kernel/sched/core.c:8177) 
[ 367.061262][ T5247] ? down_write (arch/x86/include/asm/atomic64_64.h:34 include/linux/atomic/atomic-long.h:41 include/linux/atomic/atomic-instrumented.h:1280 kernel/locking/rwsem.c:138 kernel/locking/rwsem.c:255 kernel/locking/rwsem.c:1258 kernel/locking/rwsem.c:1268 kernel/locking/rwsem.c:1515) 
[ 367.065562][ T5247] ? down_write_killable (kernel/locking/rwsem.c:1512) 
[ 367.070715][ T5247] ? v2_free_file_info (fs/quota/quota_v2.c:384) 
[ 367.075530][ T5247] ? kfree (mm/slub.c:1754 mm/slub.c:3510 mm/slub.c:4552) 
[ 367.079385][ T5247] dquot_disable (fs/quota/dquot.c:2355) 
[ 367.083850][ T5247] ext4_quota_off (fs/ext4/super.c:6831) 
[ 367.088314][ T5247] ext4_put_super (fs/ext4/super.c:1173 fs/ext4/super.c:1218) 
[ 367.092779][ T5247] generic_shutdown_super (fs/super.c:464) 
[ 367.098029][ T5247] kill_block_super (fs/super.c:1395) 
[ 367.102680][ T5247] deactivate_locked_super (fs/super.c:339) 
[ 367.107923][ T5247] cleanup_mnt (fs/namespace.c:138 fs/namespace.c:1187) 
[ 367.112210][ T5247] ? path_umount (fs/namespace.c:1808) 
[ 367.116682][ T5247] task_work_run (kernel/task_work.c:166 (discriminator 1)) 
[ 367.121091][ T5247] exit_to_user_mode_loop (include/linux/resume_user_mode.h:49 kernel/entry/common.c:169) 
[ 367.126338][ T5247] exit_to_user_mode_prepare (kernel/entry/common.c:201) 
[ 367.131756][ T5247] syscall_exit_to_user_mode (arch/x86/include/asm/jump_label.h:27 include/linux/context_tracking_state.h:31 include/linux/context_tracking.h:40 kernel/entry/common.c:132 kernel/entry/common.c:296) 
[ 367.137112][ T5247] do_syscall_64 (arch/x86/entry/common.c:87) 
[ 367.141398][ T5247] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115) 
[  367.147174][ T5247] RIP: 0033:0x7f5852cd3e27
[ 367.151460][ T5247] Code: 00 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 00 0c 00 f7 d8 64 89 01 48
All code
========
   0:	00 0c 00             	add    %cl,(%rax,%rax,1)
   3:	f7 d8                	neg    %eax
   5:	64 89 01             	mov    %eax,%fs:(%rcx)
   8:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
   c:	c3                   	retq   
   d:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  13:	31 f6                	xor    %esi,%esi
  15:	e9 09 00 00 00       	jmpq   0x23
  1a:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  21:	00 00 
  23:	b8 a6 00 00 00       	mov    $0xa6,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
  30:	73 01                	jae    0x33
  32:	c3                   	retq   
  33:	48 8b 0d 39 00 0c 00 	mov    0xc0039(%rip),%rcx        # 0xc0073
  3a:	f7 d8                	neg    %eax
  3c:	64 89 01             	mov    %eax,%fs:(%rcx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
   6:	73 01                	jae    0x9
   8:	c3                   	retq   
   9:	48 8b 0d 39 00 0c 00 	mov    0xc0039(%rip),%rcx        # 0xc0049
  10:	f7 d8                	neg    %eax
  12:	64 89 01             	mov    %eax,%fs:(%rcx)
  15:	48                   	rex.W


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
diff mbox series

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 450c918d68fc..1780649ed224 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1506,8 +1506,6 @@  static int ext4_mark_dquot_dirty(struct dquot *dquot);
 static int ext4_write_info(struct super_block *sb, int type);
 static int ext4_quota_on(struct super_block *sb, int type, int format_id,
 			 const struct path *path);
-static ssize_t ext4_quota_read(struct super_block *sb, int type, char *data,
-			       size_t len, loff_t off);
 static ssize_t ext4_quota_write(struct super_block *sb, int type,
 				const char *data, size_t len, loff_t off);
 static int ext4_quota_enable(struct super_block *sb, int type, int format_id,
@@ -1535,7 +1533,7 @@  static const struct dquot_operations ext4_quota_operations = {
 static const struct quotactl_ops ext4_qctl_operations = {
 	.quota_on	= ext4_quota_on,
 	.quota_off	= ext4_quota_off,
-	.quota_sync	= dquot_quota_sync,
+	.quota_sync	= generic_quota_sync,
 	.get_state	= dquot_get_state,
 	.set_info	= dquot_set_dqinfo,
 	.get_dqblk	= dquot_get_dqblk,
@@ -1559,7 +1557,7 @@  static const struct super_operations ext4_sops = {
 	.statfs		= ext4_statfs,
 	.show_options	= ext4_show_options,
 #ifdef CONFIG_QUOTA
-	.quota_read	= ext4_quota_read,
+	.quota_read	= generic_quota_read,
 	.quota_write	= ext4_quota_write,
 	.get_dquots	= ext4_get_dquots,
 #endif
@@ -6856,55 +6854,15 @@  static int ext4_quota_off(struct super_block *sb, int type)
 	return dquot_quota_off(sb, type);
 }
 
-/* Read data from quotafile - avoid pagecache and such because we cannot afford
- * acquiring the locks... As quota files are never truncated and quota code
- * itself serializes the operations (and no one else should touch the files)
- * we don't have to be afraid of races */
-static ssize_t ext4_quota_read(struct super_block *sb, int type, char *data,
-			       size_t len, loff_t off)
-{
-	struct inode *inode = sb_dqopt(sb)->files[type];
-	ext4_lblk_t blk = off >> EXT4_BLOCK_SIZE_BITS(sb);
-	int offset = off & (sb->s_blocksize - 1);
-	int tocopy;
-	size_t toread;
-	struct buffer_head *bh;
-	loff_t i_size = i_size_read(inode);
-
-	if (off > i_size)
-		return 0;
-	if (off+len > i_size)
-		len = i_size-off;
-	toread = len;
-	while (toread > 0) {
-		tocopy = sb->s_blocksize - offset < toread ?
-				sb->s_blocksize - offset : toread;
-		bh = ext4_bread(NULL, inode, blk, 0);
-		if (IS_ERR(bh))
-			return PTR_ERR(bh);
-		if (!bh)	/* A hole? */
-			memset(data, 0, tocopy);
-		else
-			memcpy(data, bh->b_data+offset, tocopy);
-		brelse(bh);
-		offset = 0;
-		toread -= tocopy;
-		data += tocopy;
-		blk++;
-	}
-	return len;
-}
-
 /* Write to quotafile (we know the transaction is already started and has
  * enough credits) */
 static ssize_t ext4_quota_write(struct super_block *sb, int type,
 				const char *data, size_t len, loff_t off)
 {
 	struct inode *inode = sb_dqopt(sb)->files[type];
-	ext4_lblk_t blk = off >> EXT4_BLOCK_SIZE_BITS(sb);
-	int err = 0, err2 = 0, offset = off & (sb->s_blocksize - 1);
-	int retries = 0;
-	struct buffer_head *bh;
+	int err = 0, offset = off & (sb->s_blocksize - 1);
+	struct buffer_head *bh, *head;
+	struct folio *folio;
 	handle_t *handle = journal_current_handle();
 
 	if (!handle) {
@@ -6924,20 +6882,21 @@  static ssize_t ext4_quota_write(struct super_block *sb, int type,
 		return -EIO;
 	}
 
-	do {
-		bh = ext4_bread(handle, inode, blk,
-				EXT4_GET_BLOCKS_CREATE |
-				EXT4_GET_BLOCKS_METADATA_NOFAIL);
-	} while (PTR_ERR(bh) == -ENOSPC &&
-		 ext4_should_retry_alloc(inode->i_sb, &retries));
-	if (IS_ERR(bh))
-		return PTR_ERR(bh);
-	if (!bh)
+	folio = read_mapping_folio(inode->i_mapping, off / PAGE_SIZE, NULL);
+	if (IS_ERR(folio))
+		return PTR_ERR(folio);
+	head = folio_buffers(folio);
+	if (!head)
+		head = alloc_page_buffers(&folio->page, sb->s_blocksize, false);
+	if (!head)
 		goto out;
+	bh = head;
+	while ((bh_offset(bh) + sb->s_blocksize) <= (off % PAGE_SIZE))
+		bh = bh->b_this_page;
 	BUFFER_TRACE(bh, "get write access");
 	err = ext4_journal_get_write_access(handle, sb, bh, EXT4_JTR_NONE);
 	if (err) {
-		brelse(bh);
+		folio_put(folio);
 		return err;
 	}
 	lock_buffer(bh);
@@ -6945,14 +6904,14 @@  static ssize_t ext4_quota_write(struct super_block *sb, int type,
 	flush_dcache_page(bh->b_page);
 	unlock_buffer(bh);
 	err = ext4_handle_dirty_metadata(handle, NULL, bh);
-	brelse(bh);
 out:
+	folio_put(folio);
+	if (err)
+		return err;
 	if (inode->i_size < off + len) {
 		i_size_write(inode, off + len);
 		EXT4_I(inode)->i_disksize = inode->i_size;
-		err2 = ext4_mark_inode_dirty(handle, inode);
-		if (unlikely(err2 && !err))
-			err = err2;
+		err = ext4_mark_inode_dirty(handle, inode);
 	}
 	return err ? err : len;
 }