[0/1,RESEND] ext4: fix lazy initialization next schedule time computation in more granular unit

Message ID 20210902164412.9994-1-shaoyi@amazon.com

Message

Shaoying Xu Sept. 2, 2021, 4:44 p.m. UTC
Description
===========
Ext4 computes the next schedule time for lazy initialization by using jiffies
to measure how long one request to zero out an inode table takes. This makes
the wait time effectively dependent on CONFIG_HZ, which is undesirable. As a
result, we have observed fairly long initialization delays on server systems
with 100 HZ. Therefore, we propose using a more granular unit to calculate the
next schedule time.
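
The CONFIG_HZ dependence described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the code
in fs/ext4/super.c: the helper names (tick_count, coarse_wait_ms, fine_wait_ms),
the multiplier argument, and the timestamps are hypothetical and chosen for
illustration. It compares measuring a request in whole ticks and then scaling it
against scaling the elapsed nanoseconds first and converting to ticks last.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Value of a tick counter (a stand-in for jiffies) at time t_ns. */
static uint64_t tick_count(uint64_t t_ns, unsigned int hz)
{
    assert(hz > 0);
    return t_ns / (NSEC_PER_SEC / hz);
}

/* Convert a tick count back to milliseconds for comparison. */
static uint64_t ticks_to_ms(uint64_t ticks, unsigned int hz)
{
    assert(hz > 0);
    return ticks * 1000ULL / hz;
}

/* Tick-based scheme: take the elapsed time in whole ticks, then scale it. */
static uint64_t coarse_wait_ms(uint64_t start_ns, uint64_t end_ns,
                               unsigned int hz, unsigned int mult)
{
    uint64_t elapsed_ticks = tick_count(end_ns, hz) - tick_count(start_ns, hz);

    return ticks_to_ms(elapsed_ticks * mult, hz);
}

/* Finer scheme: scale the elapsed nanoseconds first, convert to ticks last. */
static uint64_t fine_wait_ms(uint64_t start_ns, uint64_t end_ns,
                             unsigned int hz, unsigned int mult)
{
    uint64_t wait_ns = (end_ns - start_ns) * mult;

    return ticks_to_ms(tick_count(wait_ns, hz), hz);
}
```

For a hypothetical 2 ms request running from t = 19 ms to t = 21 ms with an
assumed multiplier of 10, coarse_wait_ms() yields 100 ms at 100 HZ and 40 ms at
250 HZ, because the request is charged one whole tick in both cases and the tick
lengths differ. fine_wait_ms() yields 20 ms for both.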

Test
====
Tested the patch in stable kernel 5.10 with FS volumes of 2T and 3T on EC2
instances. Before the fix, instances with 250 HZ finished the lazy
initialization about 2.4x faster than instances with 100 HZ.
After the fix, both finished in approximately the same time.

Patch
=====
Shaoying Xu (1):
  ext4: fix lazy initialization next schedule time computation in more
    granular unit

 fs/ext4/super.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Comments

Shaoying Xu Sept. 20, 2021, 7:56 p.m. UTC | #1
Here is more context and testing detail:

This issue was originally identified on Amazon Linux 2 with kernel 5.10, where
CONFIG_HZ is 250 on x86_64 and 100 on arm64. It can be reproduced by launching
EC2 instances c5.2xlarge (x86_64) and c6g.2xlarge (arm64), then measuring how
long the ext4lazyinit thread takes to finish after mounting the ext4 FS.

w/o fix in kernel 5.10 (time for ext4lazyinit to finish)
|----------------+-------------+------------|
| ext4 FS volume | c6g.2xlarge | c5.2xlarge |
|----------------+-------------+------------|
| 2T             | 1842 secs   | 743 secs   |
|----------------+-------------+------------|
| 3T             | 2690 secs   | 1110 secs  |
|----------------+-------------+------------|

w/ fix in kernel 5.10 (time for ext4lazyinit to finish)
|----------------+-------------+------------|
| ext4 FS volume | c6g.2xlarge | c5.2xlarge |
|----------------+-------------+------------|
| 2T             | 660 secs    | 544 secs   |
|----------------+-------------+------------|
| 3T             | 1053 secs   | 932 secs   |
|----------------+-------------+------------|
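
The gap between the two instance types can be checked directly from the numbers
in the tables above. The following is a minimal sketch; the struct and function
names are hypothetical, and the data is copied from the tables (c6g.2xlarge is
the 100 HZ arm64 instance, c5.2xlarge is the 250 HZ x86_64 instance).

```c
#include <assert.h>

/* Measured ext4lazyinit completion times in seconds, copied from the tables. */
struct lazyinit_result {
    const char *volume;
    unsigned int hz100_secs; /* c6g.2xlarge, arm64, CONFIG_HZ=100 */
    unsigned int hz250_secs; /* c5.2xlarge, x86_64, CONFIG_HZ=250 */
};

static const struct lazyinit_result without_fix[] = {
    { "2T", 1842, 743 },
    { "3T", 2690, 1110 },
};

static const struct lazyinit_result with_fix[] = {
    { "2T", 660, 544 },
    { "3T", 1053, 932 },
};

/* How many times longer the 100 HZ instance took than the 250 HZ one. */
static double slowdown(const struct lazyinit_result *r)
{
    assert(r->hz250_secs > 0);
    return (double)r->hz100_secs / (double)r->hz250_secs;
}
```

Calling slowdown() on each row gives roughly 2.48 and 2.42 without the fix, and
roughly 1.21 and 1.13 with the fix.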

Theodore Ts'o Oct. 7, 2021, 2:21 p.m. UTC | #2
On Thu, 2 Sep 2021 16:44:11 +0000, Shaoying Xu wrote:
> Description
> ===========
> Ext4 FS has inappropriate implementations on the next schedule time calculation
> that use jiffies to measure the time for one request to zero out inode table. This
> actually makes the wait time effectively dependent on CONFIG_HZ, which is
> undesirable. We have observed on server systems with 100HZ some fairly long delays
> in initialization as a result. Therefore, we propose to use more granular unit to
> calculate the next schedule time.
> 
> [...]

Applied, thanks!

[1/1] ext4: fix lazy initialization next schedule time computation in more granular unit
      commit: 3782027982881d2c1105ffe058aecb69cc780dfa

Best regards,