Message ID | 20230718131052.283350-3-libaokun1@huawei.com |
---|---|
State | Superseded |
Headers | show |
Series | ext4: fix some ext4_lblk_t overflow issues | expand |
Baokun Li <libaokun1@huawei.com> writes: > When we calculate the end position of ext4_free_extent, this position may > be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if > ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the > computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not > the first case of adjusting the best extent, that is, new_bex_end > 0, the > following BUG_ON will be triggered: Nice spotting. > > ========================================================= > kernel BUG at fs/ext4/mballoc.c:5116! > invalid opcode: 0000 [#1] PREEMPT SMP PTI > CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279 > RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430 > Call Trace: > <TASK> > ext4_mb_use_best_found+0x203/0x2f0 > ext4_mb_try_best_found+0x163/0x240 > ext4_mb_regular_allocator+0x158/0x1550 > ext4_mb_new_blocks+0x86a/0xe10 > ext4_ext_map_blocks+0xb0c/0x13a0 > ext4_map_blocks+0x2cd/0x8f0 > ext4_iomap_begin+0x27b/0x400 > iomap_iter+0x222/0x3d0 > __iomap_dio_rw+0x243/0xcb0 > iomap_dio_rw+0x16/0x80 > ========================================================= > > A simple reproducer demonstrating the problem: > > mkfs.ext4 -F /dev/sda -b 4096 100M > mount /dev/sda /tmp/test > fallocate -l1M /tmp/test/tmp > fallocate -l10M /tmp/test/file > fallocate -i -o 1M -l16777203M /tmp/test/file > fsstress -d /tmp/test -l 0 -n 100000 -p 8 & > sleep 10 && killall -9 fsstress > rm -f /tmp/test/tmp > xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192" Could you please also add it into xfstests? I think it's a nice test which can check the boundary conditions for start and end of data types used in mballoc. I think it should even work if you don't do fsstress but instead just fallocate some remaining space in filesystem, so that when you open and try to write it to 0th offset, if automatically hits this error in ext4_mb_new_inode_pa(). > > We declare new_bex_start and new_bex_end as correct types and use fex_end() > to avoid the problems caused by the ext4_lblk_t overflow above. > > Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") > Signed-off-by: Baokun Li <libaokun1@huawei.com> > --- > fs/ext4/mballoc.c | 11 +++++------ > 1 file changed, 5 insertions(+), 6 deletions(-) > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index eb7f5d35ef96..2090e5e7ba58 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -5076,8 +5076,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) > pa = ac->ac_pa; > > if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { > - int new_bex_start; > - int new_bex_end; > + ext4_lblk_t new_bex_start; > + loff_t new_bex_end; > > /* we can't allocate as much as normalizer wants. > * so, found space must get proper lstart > @@ -5096,8 +5096,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) > * still cover original start > * 3. Else, keep the best ex at start of original request. > */ > - new_bex_end = ac->ac_g_ex.fe_logical + > - EXT4_C2B(sbi, ac->ac_orig_goal_len); > + new_bex_end = fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len); > new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); > if (ac->ac_o_ex.fe_logical >= new_bex_start) > goto adjust_bex; > @@ -5117,8 +5116,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) > > BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); > BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); > - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + > - EXT4_C2B(sbi, ac->ac_orig_goal_len))); Ok so the right hand becomes 0 (because then end can go upto 1<<32 which will be 0 for unsigned int). And the left (new_bex_end) might be negative due to some operations above as I see it. And comparing an int with unsigned int, it will promote new_bex_end to unsigned int which will make it's value too large and will hit the bug_on. I would like to carefully review all such paths. I will soon review and get back. > + BUG_ON(new_bex_end > > + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); I am not sure whether using fex_end or pa_end is any helpful. I think we can just typecast if needed and keep it simple rather than adding helpers functions for addition operation. (because of the fact that fex_end() can take a third parameter which sometimes you pass as NULL. Hence it doesn't look clean, IMO) > } > > pa->pa_lstart = ac->ac_b_ex.fe_logical; > -- > 2.31.1 -ritesh
On 2023/7/20 20:44, Ritesh Harjani (IBM) wrote: > Baokun Li <libaokun1@huawei.com> writes: > >> When we calculate the end position of ext4_free_extent, this position may >> be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if >> ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the >> computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not >> the first case of adjusting the best extent, that is, new_bex_end > 0, the >> following BUG_ON will be triggered: > Nice spotting. > >> ========================================================= >> kernel BUG at fs/ext4/mballoc.c:5116! >> invalid opcode: 0000 [#1] PREEMPT SMP PTI >> CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279 >> RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430 >> Call Trace: >> <TASK> >> ext4_mb_use_best_found+0x203/0x2f0 >> ext4_mb_try_best_found+0x163/0x240 >> ext4_mb_regular_allocator+0x158/0x1550 >> ext4_mb_new_blocks+0x86a/0xe10 >> ext4_ext_map_blocks+0xb0c/0x13a0 >> ext4_map_blocks+0x2cd/0x8f0 >> ext4_iomap_begin+0x27b/0x400 >> iomap_iter+0x222/0x3d0 >> __iomap_dio_rw+0x243/0xcb0 >> iomap_dio_rw+0x16/0x80 >> ========================================================= >> >> A simple reproducer demonstrating the problem: >> >> mkfs.ext4 -F /dev/sda -b 4096 100M >> mount /dev/sda /tmp/test >> fallocate -l1M /tmp/test/tmp >> fallocate -l10M /tmp/test/file >> fallocate -i -o 1M -l16777203M /tmp/test/file >> fsstress -d /tmp/test -l 0 -n 100000 -p 8 & >> sleep 10 && killall -9 fsstress >> rm -f /tmp/test/tmp >> xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192" > > Could you please also add it into xfstests? Sure!I'll try to push this test case to xfstests. > I think it's a nice test which can check the boundary conditions for > start and end of data types used in mballoc. I think it should even work > if you don't do fsstress but instead just fallocate some remaining space > in filesystem, so that when you open and try to write it to 0th offset, > if automatically hits this error in ext4_mb_new_inode_pa(). Yes, the fsstress here is just to fill up the remaining space on the disk. > >> We declare new_bex_start and new_bex_end as correct types and use fex_end() >> to avoid the problems caused by the ext4_lblk_t overflow above. >> >> Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") >> Signed-off-by: Baokun Li <libaokun1@huawei.com> >> --- >> fs/ext4/mballoc.c | 11 +++++------ >> 1 file changed, 5 insertions(+), 6 deletions(-) >> >> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c >> index eb7f5d35ef96..2090e5e7ba58 100644 >> --- a/fs/ext4/mballoc.c >> +++ b/fs/ext4/mballoc.c >> @@ -5076,8 +5076,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >> pa = ac->ac_pa; >> >> if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { >> - int new_bex_start; >> - int new_bex_end; >> + ext4_lblk_t new_bex_start; >> + loff_t new_bex_end; >> >> /* we can't allocate as much as normalizer wants. >> * so, found space must get proper lstart >> @@ -5096,8 +5096,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >> * still cover original start >> * 3. Else, keep the best ex at start of original request. >> */ >> - new_bex_end = ac->ac_g_ex.fe_logical + >> - EXT4_C2B(sbi, ac->ac_orig_goal_len); >> + new_bex_end = fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len); >> new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); >> if (ac->ac_o_ex.fe_logical >= new_bex_start) >> goto adjust_bex; >> @@ -5117,8 +5116,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >> >> BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); >> BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); >> - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + >> - EXT4_C2B(sbi, ac->ac_orig_goal_len))); > Ok so the right hand becomes 0 (because then end can go upto 1<<32 which > will be 0 for unsigned int). And the left (new_bex_end) might be > negative due to some operations above as I see it. > And comparing an int with unsigned int, it will promote new_bex_end to > unsigned int which will make it's value too large and will hit the > bug_on. Exactly! > > I would like to carefully review all such paths. I will soon review and > get back. Okay, thank you very much for your careful review. The 2nd and 3rd cases of adjusting the best extent are impossible to overflow, so only the first case is converted here. > > >> + BUG_ON(new_bex_end > >> + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); > I am not sure whether using fex_end or pa_end is any helpful. > I think we can just typecast if needed and keep it simple rather > than adding helpers functions for addition operation. > (because of the fact that fex_end() can take a third parameter which > sometimes you pass as NULL. Hence it doesn't look clean, IMO) I added the helper functions here for two reasons: 1. restricting the type of the return value. 2. This avoids the ugly line breaks in most cases. The fex_end() indeed doesn't look as clean as the pa_end(), because we might use the start of the free extent plus some other length to get a new end, like right in ext4_mb_new_inode_pa(), which makes me have to add another extra length argument, but I think it's worth it, and even with the addition of a parameter that will probably be unused, it still looks a lot shorter than the original code. >> } >> >> pa->pa_lstart = ac->ac_b_ex.fe_logical; >> -- >> 2.31.1 > -ritesh Thanks again!
Baokun Li <libaokun1@huawei.com> writes: > When we calculate the end position of ext4_free_extent, this position may > be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if > ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the > computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not > the first case of adjusting the best extent, that is, new_bex_end > 0, the > following BUG_ON will be triggered: > > ========================================================= > kernel BUG at fs/ext4/mballoc.c:5116! > invalid opcode: 0000 [#1] PREEMPT SMP PTI > CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279 > RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430 > Call Trace: > <TASK> > ext4_mb_use_best_found+0x203/0x2f0 > ext4_mb_try_best_found+0x163/0x240 > ext4_mb_regular_allocator+0x158/0x1550 > ext4_mb_new_blocks+0x86a/0xe10 > ext4_ext_map_blocks+0xb0c/0x13a0 > ext4_map_blocks+0x2cd/0x8f0 > ext4_iomap_begin+0x27b/0x400 > iomap_iter+0x222/0x3d0 > __iomap_dio_rw+0x243/0xcb0 > iomap_dio_rw+0x16/0x80 > ========================================================= > > A simple reproducer demonstrating the problem: > > mkfs.ext4 -F /dev/sda -b 4096 100M > mount /dev/sda /tmp/test > fallocate -l1M /tmp/test/tmp > fallocate -l10M /tmp/test/file > fallocate -i -o 1M -l16777203M /tmp/test/file > fsstress -d /tmp/test -l 0 -n 100000 -p 8 & > sleep 10 && killall -9 fsstress > rm -f /tmp/test/tmp > xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192" > > We declare new_bex_start and new_bex_end as correct types and use fex_end() > to avoid the problems caused by the ext4_lblk_t overflow above. > > Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") > Signed-off-by: Baokun Li <libaokun1@huawei.com> > --- > fs/ext4/mballoc.c | 11 +++++------ > 1 file changed, 5 insertions(+), 6 deletions(-) > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index eb7f5d35ef96..2090e5e7ba58 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -5076,8 +5076,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) > pa = ac->ac_pa; > > if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { > - int new_bex_start; > - int new_bex_end; > + ext4_lblk_t new_bex_start; > + loff_t new_bex_end; > > /* we can't allocate as much as normalizer wants. > * so, found space must get proper lstart > @@ -5096,8 +5096,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) > * still cover original start > * 3. Else, keep the best ex at start of original request. > */ > - new_bex_end = ac->ac_g_ex.fe_logical + > - EXT4_C2B(sbi, ac->ac_orig_goal_len); > + new_bex_end = fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len); > new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); > if (ac->ac_o_ex.fe_logical >= new_bex_start) > goto adjust_bex; new_bex_end = ac->ac_g_ex.fe_logical + EXT4_C2B(sbi, ac->ac_orig_goal_len); new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); if (ac->ac_o_ex.fe_logical >= new_bex_start) goto adjust_bex; new_bex_start = ac->ac_g_ex.fe_logical; new_bex_end = new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); if (ac->ac_o_ex.fe_logical < new_bex_end) goto adjust_bex; new_bex_start = ac->ac_o_ex.fe_logical; new_bex_end = new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); I think it will be safer if we typecast all above new_bex_end calculations. I understand that it might never happen here that fe_logical + best found extent length will make it overflow. Simply because we only enter here when ac_b_ex.fe_len < ac_orig_goal_len. But I think if we change any logic tomorrow or refactor any part of code, it will be much safer if we simply keep the typecast in place so that we avoid any future tripping caused by arithmatic overflow here. Also I am not finding fex_end() or pa_end() helpers very intuitive way of doing it. I feel typecasting in place is much simpler and reads better. -ritesh > @@ -5117,8 +5116,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) > > BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); > BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); > - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + > - EXT4_C2B(sbi, ac->ac_orig_goal_len))); > + BUG_ON(new_bex_end > > + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); > } > > pa->pa_lstart = ac->ac_b_ex.fe_logical; > -- > 2.31.1
Baokun Li <libaokun1@huawei.com> writes: > On 2023/7/20 20:44, Ritesh Harjani (IBM) wrote: >> Baokun Li <libaokun1@huawei.com> writes: >> >>> When we calculate the end position of ext4_free_extent, this position may >>> be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if >>> ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the >>> computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not >>> the first case of adjusting the best extent, that is, new_bex_end > 0, the >>> following BUG_ON will be triggered: >> Nice spotting. >> >>> ========================================================= >>> kernel BUG at fs/ext4/mballoc.c:5116! >>> invalid opcode: 0000 [#1] PREEMPT SMP PTI >>> CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279 >>> RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430 >>> Call Trace: >>> <TASK> >>> ext4_mb_use_best_found+0x203/0x2f0 >>> ext4_mb_try_best_found+0x163/0x240 >>> ext4_mb_regular_allocator+0x158/0x1550 >>> ext4_mb_new_blocks+0x86a/0xe10 >>> ext4_ext_map_blocks+0xb0c/0x13a0 >>> ext4_map_blocks+0x2cd/0x8f0 >>> ext4_iomap_begin+0x27b/0x400 >>> iomap_iter+0x222/0x3d0 >>> __iomap_dio_rw+0x243/0xcb0 >>> iomap_dio_rw+0x16/0x80 >>> ========================================================= >>> >>> A simple reproducer demonstrating the problem: >>> >>> mkfs.ext4 -F /dev/sda -b 4096 100M >>> mount /dev/sda /tmp/test >>> fallocate -l1M /tmp/test/tmp >>> fallocate -l10M /tmp/test/file >>> fallocate -i -o 1M -l16777203M /tmp/test/file >>> fsstress -d /tmp/test -l 0 -n 100000 -p 8 & >>> sleep 10 && killall -9 fsstress >>> rm -f /tmp/test/tmp >>> xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192" >> >> Could you please also add it into xfstests? > Sure!I'll try to push this test case to xfstests. Thanks that would be great! >> I think it's a nice test which can check the boundary conditions for >> start and end of data types used in mballoc. I think it should even work >> if you don't do fsstress but instead just fallocate some remaining space >> in filesystem, so that when you open and try to write it to 0th offset, >> if automatically hits this error in ext4_mb_new_inode_pa(). > Yes, the fsstress here is just to fill up the remaining space on the disk. >> >>> We declare new_bex_start and new_bex_end as correct types and use fex_end() >>> to avoid the problems caused by the ext4_lblk_t overflow above. >>> >>> Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") >>> Signed-off-by: Baokun Li <libaokun1@huawei.com> >>> --- >>> fs/ext4/mballoc.c | 11 +++++------ >>> 1 file changed, 5 insertions(+), 6 deletions(-) >>> >>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c >>> index eb7f5d35ef96..2090e5e7ba58 100644 >>> --- a/fs/ext4/mballoc.c >>> +++ b/fs/ext4/mballoc.c >>> @@ -5076,8 +5076,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >>> pa = ac->ac_pa; >>> >>> if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { >>> - int new_bex_start; >>> - int new_bex_end; >>> + ext4_lblk_t new_bex_start; >>> + loff_t new_bex_end; >>> >>> /* we can't allocate as much as normalizer wants. >>> * so, found space must get proper lstart >>> @@ -5096,8 +5096,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >>> * still cover original start >>> * 3. Else, keep the best ex at start of original request. >>> */ >>> - new_bex_end = ac->ac_g_ex.fe_logical + >>> - EXT4_C2B(sbi, ac->ac_orig_goal_len); >>> + new_bex_end = fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len); >>> new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); >>> if (ac->ac_o_ex.fe_logical >= new_bex_start) >>> goto adjust_bex; >>> @@ -5117,8 +5116,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >>> >>> BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); >>> BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); >>> - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + >>> - EXT4_C2B(sbi, ac->ac_orig_goal_len))); >> Ok so the right hand becomes 0 (because then end can go upto 1<<32 which >> will be 0 for unsigned int). And the left (new_bex_end) might be >> negative due to some operations above as I see it. >> And comparing an int with unsigned int, it will promote new_bex_end to >> unsigned int which will make it's value too large and will hit the >> bug_on. > Exactly! >> >> I would like to carefully review all such paths. I will soon review and >> get back. > Okay, thank you very much for your careful review. > The 2nd and 3rd cases of adjusting the best extent are impossible to > overflow, > so only the first case is converted here. I noticed them too during review. I think it would be safe to make the changes in other two places as well such that in future we never trip over such overlooked overflow bugs. >> >> >>> + BUG_ON(new_bex_end > >>> + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); >> I am not sure whether using fex_end or pa_end is any helpful. >> I think we can just typecast if needed and keep it simple rather >> than adding helpers functions for addition operation. >> (because of the fact that fex_end() can take a third parameter which >> sometimes you pass as NULL. Hence it doesn't look clean, IMO) > I added the helper functions here for two reasons: > 1. restricting the type of the return value. > 2. This avoids the ugly line breaks in most cases. > > The fex_end() indeed doesn't look as clean as the pa_end(), because we > might use > the start of the free extent plus some other length to get a new end, > like right in > ext4_mb_new_inode_pa(), which makes me have to add another extra length > argument, but I think it's worth it, and even with the addition of a > parameter > that will probably be unused, it still looks a lot shorter than the > original code. IMO, we don't need pa_end() and fex_end() at all. In several places in ext4 we always have taken care by directly typecasting to avoid overflows. Also it reads much simpler rather to typecast in place than having a helper function which is also not very elegant due to a third parameter. Hence I think we should drop those helpers. Thanks once again for catching the overflows and coming up with a easy reproducer. I am surprised that this bug was never caught with LTP, fstests, smatch static checker. How did you find it? :) -ritesh
On 2023/7/21 3:30, Ritesh Harjani (IBM) wrote: > >>> I would like to carefully review all such paths. I will soon review and >>> get back. >> Okay, thank you very much for your careful review. >> The 2nd and 3rd cases of adjusting the best extent are impossible to >> overflow, >> so only the first case is converted here. > I noticed them too during review. I think it would be safe to make the > changes in other two places as well such that in future we never > trip over such overlooked overflow bugs. > >>> >>>> + BUG_ON(new_bex_end > >>>> + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); >>> I am not sure whether using fex_end or pa_end is any helpful. >>> I think we can just typecast if needed and keep it simple rather >>> than adding helpers functions for addition operation. >>> (because of the fact that fex_end() can take a third parameter which >>> sometimes you pass as NULL. Hence it doesn't look clean, IMO) >> I added the helper functions here for two reasons: >> 1. restricting the type of the return value. >> 2. This avoids the ugly line breaks in most cases. >> >> The fex_end() indeed doesn't look as clean as the pa_end(), because we >> might use >> the start of the free extent plus some other length to get a new end, >> like right in >> ext4_mb_new_inode_pa(), which makes me have to add another extra length >> argument, but I think it's worth it, and even with the addition of a >> parameter >> that will probably be unused, it still looks a lot shorter than the >> original code. > IMO, we don't need pa_end() and fex_end() at all. In several places in > ext4 we always have taken care by directly typecasting to avoid > overflows. Also it reads much simpler rather to typecast in place than > having a helper function which is also not very elegant due to a third > parameter. Hence I think we should drop those helpers. I still think helper is useful, but my previous thinking is problematic. I shouldn't make fex_end() adapt to ext4_mb_new_inode_pa(), but should make ext4_mb_new_inode_pa() adapt to fex_end(). After dropping the third argument of fex_end(), modify the ext4_mb_new_inode_pa() function as follows: diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index a2475b8c9fb5..7492ba9062f4 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -5072,8 +5072,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa = ac->ac_pa; if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { - int new_bex_start; - int new_bex_end; + struct ext4_free_extent ex = { + .fe_logical = ac->ac_g_ex.fe_logical; + .fe_len = ac->ac_orig_goal_len; + } + loff_t orig_goal_end = fex_end(sbi, &ex); /* we can't allocate as much as normalizer wants. * so, found space must get proper lstart @@ -5092,29 +5095,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) * still cover original start * 3. Else, keep the best ex at start of original request. */ - new_bex_end = ac->ac_g_ex.fe_logical + - EXT4_C2B(sbi, ac->ac_orig_goal_len); - new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - if (ac->ac_o_ex.fe_logical >= new_bex_start) - goto adjust_bex; + ex.fe_len = ac->ac_b_ex.fe_len; - new_bex_start = ac->ac_g_ex.fe_logical; - new_bex_end = - new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); - if (ac->ac_o_ex.fe_logical < new_bex_end) + ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len); + if (ac->ac_o_ex.fe_logical >= ex.fe_logical) goto adjust_bex; - new_bex_start = ac->ac_o_ex.fe_logical; - new_bex_end = - new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); + ex.fe_logical = ac->ac_g_ex.fe_logical; + if (ac->ac_o_ex.fe_logical < fex_end(sbi, &ex)) + goto adjust_bex; + ex.fe_logical = ac->ac_o_ex.fe_logical; adjust_bex: - ac->ac_b_ex.fe_logical = new_bex_start; + ac->ac_b_ex.fe_logical = ex.fe_logical; BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + - EXT4_C2B(sbi, ac->ac_orig_goal_len))); + BUG_ON(fex_end(sbi, &ex) > orig_goal_end); } pa->pa_lstart = ac->ac_b_ex.fe_logical; What do you think of this modification? > Thanks once again for catching the overflows and coming up with a > easy reproducer. I am surprised that this bug was never caught with LTP, > fstests, smatch static checker. > How did you find it? :) > > -ritesh This problem is found in the internal test. Thank you for your review!
On 2023/7/21 3:07, Ritesh Harjani (IBM) wrote: > Baokun Li <libaokun1@huawei.com> writes: > >> When we calculate the end position of ext4_free_extent, this position may >> be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if >> ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the >> computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not >> the first case of adjusting the best extent, that is, new_bex_end > 0, the >> following BUG_ON will be triggered: >> >> ========================================================= >> kernel BUG at fs/ext4/mballoc.c:5116! >> invalid opcode: 0000 [#1] PREEMPT SMP PTI >> CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279 >> RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430 >> Call Trace: >> <TASK> >> ext4_mb_use_best_found+0x203/0x2f0 >> ext4_mb_try_best_found+0x163/0x240 >> ext4_mb_regular_allocator+0x158/0x1550 >> ext4_mb_new_blocks+0x86a/0xe10 >> ext4_ext_map_blocks+0xb0c/0x13a0 >> ext4_map_blocks+0x2cd/0x8f0 >> ext4_iomap_begin+0x27b/0x400 >> iomap_iter+0x222/0x3d0 >> __iomap_dio_rw+0x243/0xcb0 >> iomap_dio_rw+0x16/0x80 >> ========================================================= >> >> A simple reproducer demonstrating the problem: >> >> mkfs.ext4 -F /dev/sda -b 4096 100M >> mount /dev/sda /tmp/test >> fallocate -l1M /tmp/test/tmp >> fallocate -l10M /tmp/test/file >> fallocate -i -o 1M -l16777203M /tmp/test/file >> fsstress -d /tmp/test -l 0 -n 100000 -p 8 & >> sleep 10 && killall -9 fsstress >> rm -f /tmp/test/tmp >> xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192" >> >> We declare new_bex_start and new_bex_end as correct types and use fex_end() >> to avoid the problems caused by the ext4_lblk_t overflow above. >> >> Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") >> Signed-off-by: Baokun Li <libaokun1@huawei.com> >> --- >> fs/ext4/mballoc.c | 11 +++++------ >> 1 file changed, 5 insertions(+), 6 deletions(-) >> >> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c >> index eb7f5d35ef96..2090e5e7ba58 100644 >> --- a/fs/ext4/mballoc.c >> +++ b/fs/ext4/mballoc.c >> @@ -5076,8 +5076,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >> pa = ac->ac_pa; >> >> if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { >> - int new_bex_start; >> - int new_bex_end; >> + ext4_lblk_t new_bex_start; >> + loff_t new_bex_end; >> >> /* we can't allocate as much as normalizer wants. >> * so, found space must get proper lstart >> @@ -5096,8 +5096,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >> * still cover original start >> * 3. Else, keep the best ex at start of original request. >> */ >> - new_bex_end = ac->ac_g_ex.fe_logical + >> - EXT4_C2B(sbi, ac->ac_orig_goal_len); >> + new_bex_end = fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len); >> new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); >> if (ac->ac_o_ex.fe_logical >= new_bex_start) >> goto adjust_bex; > new_bex_end = ac->ac_g_ex.fe_logical + > EXT4_C2B(sbi, ac->ac_orig_goal_len); > new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); > if (ac->ac_o_ex.fe_logical >= new_bex_start) > goto adjust_bex; > > new_bex_start = ac->ac_g_ex.fe_logical; > new_bex_end = > new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); > if (ac->ac_o_ex.fe_logical < new_bex_end) > goto adjust_bex; > > new_bex_start = ac->ac_o_ex.fe_logical; > new_bex_end = > new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); > > I think it will be safer if we typecast all above new_bex_end > calculations. I understand that it might never happen here that > fe_logical + best found extent length will make it overflow. > Simply because we only enter here when ac_b_ex.fe_len < > ac_orig_goal_len. But I think if we change any logic tomorrow or > refactor any part of code, it will be much safer if we simply keep the > typecast in place so that we avoid any future tripping caused by > arithmatic overflow here. > > Also I am not finding fex_end() or pa_end() helpers very intuitive way > of doing it. I feel typecasting in place is much simpler and reads better. > > -ritesh I remove the third parameter of fex_end() and adjust the code in ext4_mb_new_inode_pa() as follows: struct ext4_free_extent ex = { .fe_logical = ac->ac_g_ex.fe_logical, .fe_len = ac->ac_orig_goal_len, }; loff_t orig_goal_end = fex_end(sbi, &ex); ex.fe_len = ac->ac_b_ex.fe_len; ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len); if (ac->ac_o_ex.fe_logical >= ex.fe_logical) goto adjust_bex; ex.fe_logical = ac->ac_g_ex.fe_logical; if (ac->ac_o_ex.fe_logical < fex_end(sbi, &ex)) goto adjust_bex; ex.fe_logical = ac->ac_o_ex.fe_logical; adjust_bex: ac->ac_b_ex.fe_logical = ex.fe_logical; BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); BUG_ON(fex_end(sbi, &ex) > orig_goal_end); In this way, all new_bex_end calculations do not have overflow problems, and the code looks much more neat than before. 😊 >> @@ -5117,8 +5116,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) >> >> BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); >> BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); >> - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + >> - EXT4_C2B(sbi, ac->ac_orig_goal_len))); >> + BUG_ON(new_bex_end > >> + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); >> } >> >> pa->pa_lstart = ac->ac_b_ex.fe_logical; >> -- >> 2.31.1 Thanks!
Baokun Li <libaokun1@huawei.com> writes: > On 2023/7/21 3:30, Ritesh Harjani (IBM) wrote: >> >>>> I would like to carefully review all such paths. I will soon review and >>>> get back. >>> Okay, thank you very much for your careful review. >>> The 2nd and 3rd cases of adjusting the best extent are impossible to >>> overflow, >>> so only the first case is converted here. >> I noticed them too during review. I think it would be safe to make the >> changes in other two places as well such that in future we never >> trip over such overlooked overflow bugs. >> >>>> >>>>> + BUG_ON(new_bex_end > >>>>> + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); >>>> I am not sure whether using fex_end or pa_end is any helpful. >>>> I think we can just typecast if needed and keep it simple rather >>>> than adding helpers functions for addition operation. >>>> (because of the fact that fex_end() can take a third parameter which >>>> sometimes you pass as NULL. Hence it doesn't look clean, IMO) >>> I added the helper functions here for two reasons: >>> 1. restricting the type of the return value. >>> 2. This avoids the ugly line breaks in most cases. >>> >>> The fex_end() indeed doesn't look as clean as the pa_end(), because we >>> might use >>> the start of the free extent plus some other length to get a new end, >>> like right in >>> ext4_mb_new_inode_pa(), which makes me have to add another extra length >>> argument, but I think it's worth it, and even with the addition of a >>> parameter >>> that will probably be unused, it still looks a lot shorter than the >>> original code. >> IMO, we don't need pa_end() and fex_end() at all. In several places in >> ext4 we always have taken care by directly typecasting to avoid >> overflows. Also it reads much simpler rather to typecast in place than >> having a helper function which is also not very elegant due to a third >> parameter. Hence I think we should drop those helpers. > I still think helper is useful, but my previous thinking is problematic. > I shouldn't > make fex_end() adapt to ext4_mb_new_inode_pa(), but should make > ext4_mb_new_inode_pa() adapt to fex_end(). After dropping the third argument > of fex_end(), modify the ext4_mb_new_inode_pa() function as follows: No. I think we are overly complicating it by doing this. IMHO we don't need fex_end and pa_end at all here. With fex_end, pa_end, we are passing pointers, updating it's members before and after sending it to these functions, dereferencing them within those helpers. e.g. with your change it will look like <snip> struct ext4_free_extent ex = { .fe_logical = ac->ac_g_ex.fe_logical; .fe_len = ac->ac_orig_goal_len; } loff_t orig_goal_end = fex_end(sbi, &ex); ex.fe_len = ac->ac_b_ex.fe_len; ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len); if (ac->ac_o_ex.fe_logical >= ex.fe_logical) goto adjust_bex; </snip> In above snip we introduced ex variable, updated it with orig_goal_len, then called fex_end() to get orig_goal_end, then we again updated ex.fe_len, but this time we didn't call fex_end instead we needed it for getting ex.fe_logical. All of this is not needed at all. e.g. we should use just use (loff_t) wherever it was missed in the code. <snip> ext4_lblk_t new_bex_start; loff_t new_bex_end; new_bex_end = (loff_t)ac->ac_g_ex.fe_logical + EXT4_C2B(sbi, ac->ac_orig_goal_len); new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); if (ac->ac_o_ex.fe_logical >= new_bex_start) goto adjust_bex; </snip> In this we just update (loff_t) and it reads better in my opinion then using ex, fex_end() etc. In my opinion we should simply drop the helpers. It should be obvious that while calculating logical end block for an inode in ext4 by doing lstart + len, we should use loff_t. Since it can be 1 more than the last block which a u32 can hold. So wherever such calculations are made we should ensure the data type of left hand operand should be loff_t and we should typecast the right hand side operands such that there should not be any overflow happening. We do at several places in ext4 already (also while doing left-shifting (loff_t)map.m_lblk). Doing this with helpers, IMO is not useful as we also saw above. > > > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c > index a2475b8c9fb5..7492ba9062f4 100644 > --- a/fs/ext4/mballoc.c > +++ b/fs/ext4/mballoc.c > @@ -5072,8 +5072,11 @@ ext4_mb_new_inode_pa(struct > ext4_allocation_context *ac) > pa = ac->ac_pa; > > if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { > - int new_bex_start; > - int new_bex_end; > + struct ext4_free_extent ex = { > + .fe_logical = ac->ac_g_ex.fe_logical; > + .fe_len = ac->ac_orig_goal_len; > + } > + loff_t orig_goal_end = fex_end(sbi, &ex); > > /* we can't allocate as much as normalizer wants. > * so, found space must get proper lstart > @@ -5092,29 +5095,23 @@ ext4_mb_new_inode_pa(struct > ext4_allocation_context *ac) > * still cover original start > * 3. Else, keep the best ex at start of original request. > */ > - new_bex_end = ac->ac_g_ex.fe_logical + > - EXT4_C2B(sbi, ac->ac_orig_goal_len); > - new_bex_start = new_bex_end - EXT4_C2B(sbi, > ac->ac_b_ex.fe_len); > - if (ac->ac_o_ex.fe_logical >= new_bex_start) > - goto adjust_bex; > + ex.fe_len = ac->ac_b_ex.fe_len; > > - new_bex_start = ac->ac_g_ex.fe_logical; > - new_bex_end = > - new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); > - if (ac->ac_o_ex.fe_logical < new_bex_end) > + ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len); > + if (ac->ac_o_ex.fe_logical >= ex.fe_logical) > goto adjust_bex; > > - new_bex_start = ac->ac_o_ex.fe_logical; > - new_bex_end = > - new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); > + ex.fe_logical = ac->ac_g_ex.fe_logical; > + if (ac->ac_o_ex.fe_logical < fex_end(sbi, &ex)) > + goto adjust_bex; > > + ex.fe_logical = ac->ac_o_ex.fe_logical; > adjust_bex: > - ac->ac_b_ex.fe_logical = new_bex_start; > + ac->ac_b_ex.fe_logical = ex.fe_logical; > > BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); > BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); > - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + > - EXT4_C2B(sbi, ac->ac_orig_goal_len))); > + BUG_ON(fex_end(sbi, &ex) > orig_goal_end); > } > > pa->pa_lstart = ac->ac_b_ex.fe_logical; > > > What do you think of this modification? > >> Thanks once again for catching the overflows and coming up with a >> easy reproducer. I am surprised that this bug was never caught with LTP, >> fstests, smatch static checker. >> How did you find it? :) >> >> -ritesh > This problem is found in the internal test. Ok. > > Thank you for your review! Thanks for working on the fix. -ritesh
On 2023/7/21 16:24, Ritesh Harjani (IBM) wrote: > Baokun Li <libaokun1@huawei.com> writes: > >> On 2023/7/21 3:30, Ritesh Harjani (IBM) wrote: >>>>> I would like to carefully review all such paths. I will soon review and >>>>> get back. >>>> Okay, thank you very much for your careful review. >>>> The 2nd and 3rd cases of adjusting the best extent are impossible to >>>> overflow, >>>> so only the first case is converted here. >>> I noticed them too during review. I think it would be safe to make the >>> changes in other two places as well such that in future we never >>> trip over such overlooked overflow bugs. >>> >>>>>> + BUG_ON(new_bex_end > >>>>>> + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); >>>>> I am not sure whether using fex_end or pa_end is any helpful. >>>>> I think we can just typecast if needed and keep it simple rather >>>>> than adding helpers functions for addition operation. >>>>> (because of the fact that fex_end() can take a third parameter which >>>>> sometimes you pass as NULL. Hence it doesn't look clean, IMO) >>>> I added the helper functions here for two reasons: >>>> 1. restricting the type of the return value. >>>> 2. This avoids the ugly line breaks in most cases. >>>> >>>> The fex_end() indeed doesn't look as clean as the pa_end(), because we >>>> might use >>>> the start of the free extent plus some other length to get a new end, >>>> like right in >>>> ext4_mb_new_inode_pa(), which makes me have to add another extra length >>>> argument, but I think it's worth it, and even with the addition of a >>>> parameter >>>> that will probably be unused, it still looks a lot shorter than the >>>> original code. >>> IMO, we don't need pa_end() and fex_end() at all. In several places in >>> ext4 we always have taken care by directly typecasting to avoid >>> overflows. Also it reads much simpler rather to typecast in place than >>> having a helper function which is also not very elegant due to a third >>> parameter. Hence I think we should drop those helpers. >> I still think helper is useful, but my previous thinking is problematic. >> I shouldn't >> make fex_end() adapt to ext4_mb_new_inode_pa(), but should make >> ext4_mb_new_inode_pa() adapt to fex_end(). After dropping the third argument >> of fex_end(), modify the ext4_mb_new_inode_pa() function as follows: > No. I think we are overly complicating it by doing this. IMHO we don't need > fex_end and pa_end at all here. With fex_end, pa_end, we are passing pointers, > updating it's members before and after sending it to these functions, > dereferencing them within those helpers. > > e.g. with your change it will look like > <snip> > struct ext4_free_extent ex = { > .fe_logical = ac->ac_g_ex.fe_logical; > .fe_len = ac->ac_orig_goal_len; > } > > loff_t orig_goal_end = fex_end(sbi, &ex); > ex.fe_len = ac->ac_b_ex.fe_len; > ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len); > if (ac->ac_o_ex.fe_logical >= ex.fe_logical) > goto adjust_bex; > </snip> > > In above snip we introduced ex variable, updated it with > orig_goal_len, then called fex_end() to get orig_goal_end, then we again > updated ex.fe_len, but this time we didn't call fex_end instead we > needed it for getting ex.fe_logical. All of this is not needed at all. > > e.g. we should use just use (loff_t) wherever it was missed in the code. > <snip> > ext4_lblk_t new_bex_start; > loff_t new_bex_end; > > new_bex_end = (loff_t)ac->ac_g_ex.fe_logical + > EXT4_C2B(sbi, ac->ac_orig_goal_len); > new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); > if (ac->ac_o_ex.fe_logical >= new_bex_start) > goto adjust_bex; > </snip> > > > In this we just update (loff_t) and it reads better in my opinion then > using ex, fex_end() etc. > > In my opinion we should simply drop the helpers. It should be obvious > that while calculating logical end block for an inode in ext4 by doing > lstart + len, we should use loff_t. > Since it can be 1 more than the last block which a u32 can hold. > So wherever such calculations are made we should ensure the data > type of left hand operand should be loff_t and we should typecast the > right hand side operands such that there should not be any overflow > happening. We do at several places in ext4 already (also while doing > left-shifting (loff_t)map.m_lblk). > > Doing this with helpers, IMO is not useful as we also saw above. I still think it is necessary to add a helper to make the code more concise. Ted, do you think the fex_end() helper function is needed here? I think we might need your advice to end this discussion. 😅 >> >> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c >> index a2475b8c9fb5..7492ba9062f4 100644 >> --- a/fs/ext4/mballoc.c >> +++ b/fs/ext4/mballoc.c >> @@ -5072,8 +5072,11 @@ ext4_mb_new_inode_pa(struct >> ext4_allocation_context *ac) >> pa = ac->ac_pa; >> >> if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { >> - int new_bex_start; >> - int new_bex_end; >> + struct ext4_free_extent ex = { >> + .fe_logical = ac->ac_g_ex.fe_logical; >> + .fe_len = ac->ac_orig_goal_len; >> + } >> + loff_t orig_goal_end = fex_end(sbi, &ex); >> >> /* we can't allocate as much as normalizer wants. >> * so, found space must get proper lstart >> @@ -5092,29 +5095,23 @@ ext4_mb_new_inode_pa(struct >> ext4_allocation_context *ac) >> * still cover original start >> * 3. Else, keep the best ex at start of original request. >> */ >> - new_bex_end = ac->ac_g_ex.fe_logical + >> - EXT4_C2B(sbi, ac->ac_orig_goal_len); >> - new_bex_start = new_bex_end - EXT4_C2B(sbi, >> ac->ac_b_ex.fe_len); >> - if (ac->ac_o_ex.fe_logical >= new_bex_start) >> - goto adjust_bex; >> + ex.fe_len = ac->ac_b_ex.fe_len; >> >> - new_bex_start = ac->ac_g_ex.fe_logical; >> - new_bex_end = >> - new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); >> - if (ac->ac_o_ex.fe_logical < new_bex_end) >> + ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len); >> + if (ac->ac_o_ex.fe_logical >= ex.fe_logical) >> goto adjust_bex; >> >> - new_bex_start = ac->ac_o_ex.fe_logical; >> - new_bex_end = >> - new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len); >> + ex.fe_logical = ac->ac_g_ex.fe_logical; >> + if (ac->ac_o_ex.fe_logical < fex_end(sbi, &ex)) >> + goto adjust_bex; >> >> + ex.fe_logical = ac->ac_o_ex.fe_logical; >> adjust_bex: >> - ac->ac_b_ex.fe_logical = new_bex_start; >> + ac->ac_b_ex.fe_logical = ex.fe_logical; >> >> BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); >> BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); >> - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + >> - EXT4_C2B(sbi, ac->ac_orig_goal_len))); >> + BUG_ON(fex_end(sbi, &ex) > orig_goal_end); >> } >> >> pa->pa_lstart = ac->ac_b_ex.fe_logical; Thanks!
On Fri, Jul 21, 2023 at 05:13:13PM +0800, Baokun Li wrote: > > Doing this with helpers, IMO is not useful as we also saw above. > > I still think it is necessary to add a helper to make the code more concise. > > Ted, do you think the fex_end() helper function is needed here? > > I think we might need your advice to end this discussion. 😅 Having helper functions doesn't bother me all _that_ much --- so long as they are named appropriately. The readibility issues start with the fact that the helper function uses loff_t as opposed to ext4_lblk_t, and because someone looking at fex_end() would need additional thinking to figure out what it did. If renamed it to be fex_logical_end() and made it take an ext4_lblk_t, I think it would be better. The bigger complaint that I have, although it's not the fault of your patch, is the use of "ext4_free_extent" all over ext4/mballoc.c when it's much more often used for allocating blocks. So the name of the structure and the "fex" in "fex_end" or "fex_logical_end" is also confusing. Hmm... how about naming helper function extent_logial_end()? And at some point we might want to do a global search and replace changing ext4_free_extent to something like ext4_mballoc_extent, and changing the structure element names. Perhaps: fe_logical ---> ex_lblk fe_start ---> ex_cluster_start fe_group ---> ex_group fe_len ---> ex_cluster_len This is addressing problems where "start" can mean the starting physical block, the starting logical block, or the starting cluster relative to the beginning of the block group. There is also documentation which is just wrong. For example: /** * ext4_trim_all_free -- function to trim all free space in alloc. group * @sb: super block for file system * @group: group to be trimmed * @start: first group block to examine * @max: last group block to examine start and max should be "first group cluster to examine" and "last group cluster to examine", respectively. The bottom line is that there are much larger opportunities to make the code more maintainable than worrying about two new helper functions. :-) Cheers, - Ted
On 2023/7/22 1:15, Theodore Ts'o wrote: > On Fri, Jul 21, 2023 at 05:13:13PM +0800, Baokun Li wrote: >>> Doing this with helpers, IMO is not useful as we also saw above. >> I still think it is necessary to add a helper to make the code more concise. >> >> Ted, do you think the fex_end() helper function is needed here? >> >> I think we might need your advice to end this discussion. 😅 > Having helper functions doesn't bother me all _that_ much --- so long > as they are named appropriately. The readibility issues start with > the fact that the helper function uses loff_t as opposed to > ext4_lblk_t, and because someone looking at fex_end() would need > additional thinking to figure out what it did. If renamed it to be > fex_logical_end() and made it take an ext4_lblk_t, I think it would be > better. Yes, naming is one of the most difficult things. The reason the helper function uses loff_t instead of ext4_lblk_t is because when we compute the extent logical end or pa end, we may exceed the maximum length of ext4_lblk_t and get an overflowed result, which can lead to the four issues fixed in the patch set. Perhaps I should add some comments to explain these. In other words, add this helper function and limit the return value of the function to loff_t is precisely to remind people that such overflow problems exist. As mentioned by ritesh, ext4 has many places to directly perform type conversion in place. However, when we modify the kernel or review code, we will still ignore that the current code may have overflow problems, as in the commit fixed by this patch. If we have a helper function fex_end(), we can avoid potential overflow problems by using it directly when we make changes or by questioning why we didn't use a simpler helper function when reviewing the code, rather than just adding a (loff_t) when we spot the problem and saying, "Oh, the problem is perfectly solved!" 😀 The current problem can be solved in any way, the key is how to prevent similar problems in the future? > The bigger complaint that I have, although it's not the fault of your > patch, is the use of "ext4_free_extent" all over ext4/mballoc.c when > it's much more often used for allocating blocks. So the name of the > structure and the "fex" in "fex_end" or "fex_logical_end" is also > confusing. > > Hmm... how about naming helper function extent_logial_end()? Great! Thank you for naming the helper function. > > And at some point we might want to do a global search and replace > changing ext4_free_extent to something like ext4_mballoc_extent, and > changing the structure element names. Perhaps: > > fe_logical ---> ex_lblk > fe_start ---> ex_cluster_start > fe_group ---> ex_group > fe_len ---> ex_cluster_len > > This is addressing problems where "start" can mean the starting > physical block, the starting logical block, or the starting cluster > relative to the beginning of the block group. > > There is also documentation which is just wrong. For example: > > /** > * ext4_trim_all_free -- function to trim all free space in alloc. group > * @sb: super block for file system > * @group: group to be trimmed > * @start: first group block to examine > * @max: last group block to examine > > start and max should be "first group cluster to examine" and "last > group cluster to examine", respectively. Yes, it is also very important to distinguish between mballoc_extent and free_extent. I will try to rename ext4_free_extent to ext4_mballoc_extent and fix some comment errors in another patch set that does some performance optimizations using "free extent". > > The bottom line is that there are much larger opportunities to make > the code more maintainable than worrying about two new helper > functions. :-) > > Cheers, > > - Ted Yes, making code more maintainable is always the goal. Thank you for your patient explanation! Cheers!
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index eb7f5d35ef96..2090e5e7ba58 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -5076,8 +5076,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) pa = ac->ac_pa; if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) { - int new_bex_start; - int new_bex_end; + ext4_lblk_t new_bex_start; + loff_t new_bex_end; /* we can't allocate as much as normalizer wants. * so, found space must get proper lstart @@ -5096,8 +5096,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) * still cover original start * 3. Else, keep the best ex at start of original request. */ - new_bex_end = ac->ac_g_ex.fe_logical + - EXT4_C2B(sbi, ac->ac_orig_goal_len); + new_bex_end = fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len); new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len); if (ac->ac_o_ex.fe_logical >= new_bex_start) goto adjust_bex; @@ -5117,8 +5116,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac) BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical); BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len); - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical + - EXT4_C2B(sbi, ac->ac_orig_goal_len))); + BUG_ON(new_bex_end > + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len)); } pa->pa_lstart = ac->ac_b_ex.fe_logical;
When we calculate the end position of ext4_free_extent, this position may be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not the first case of adjusting the best extent, that is, new_bex_end > 0, the following BUG_ON will be triggered: ========================================================= kernel BUG at fs/ext4/mballoc.c:5116! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279 RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430 Call Trace: <TASK> ext4_mb_use_best_found+0x203/0x2f0 ext4_mb_try_best_found+0x163/0x240 ext4_mb_regular_allocator+0x158/0x1550 ext4_mb_new_blocks+0x86a/0xe10 ext4_ext_map_blocks+0xb0c/0x13a0 ext4_map_blocks+0x2cd/0x8f0 ext4_iomap_begin+0x27b/0x400 iomap_iter+0x222/0x3d0 __iomap_dio_rw+0x243/0xcb0 iomap_dio_rw+0x16/0x80 ========================================================= A simple reproducer demonstrating the problem: mkfs.ext4 -F /dev/sda -b 4096 100M mount /dev/sda /tmp/test fallocate -l1M /tmp/test/tmp fallocate -l10M /tmp/test/file fallocate -i -o 1M -l16777203M /tmp/test/file fsstress -d /tmp/test -l 0 -n 100000 -p 8 & sleep 10 && killall -9 fsstress rm -f /tmp/test/tmp xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192" We declare new_bex_start and new_bex_end as correct types and use fex_end() to avoid the problems caused by the ext4_lblk_t overflow above. Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") Signed-off-by: Baokun Li <libaokun1@huawei.com> --- fs/ext4/mballoc.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)