| Message ID | 20260514062955.1183976-5-yi.zhang@huaweicloud.com |
|---|---|
| State | Superseded |
| Headers | show |
| Series | iomap: trivial fixes for ext4 conversion | expand |
On Wed, May 13, 2026 at 11:35 PM Zhang Yi <yi.zhang@huaweicloud.com> wrote: > > From: Zhang Yi <yi.zhang@huawei.com> > > ifs_set_range_dirty() and ifs_set_range_uptodate() compute last_blk > as (off + len - 1) >> i_blkbits. When off is 0 and len is 0, the > unsigned subtraction underflows to SIZE_MAX, producing a huge > last_blk and nr_blks value that causes bitmap_set() to write far > beyond the ifs->state allocation. > > Regarding ifs_set_range_uptodate(), it is temporarily safe because len > cannot be passed in as 0. However, for ifs_set_range_dirty() this is > reachable from __iomap_write_end(): when copy_folio_from_iter_atomic() > returns 0 (e.g. user buffer fault) and the folio is already uptodate, > the guard at the top of __iomap_write_end() does not trigger because > !folio_test_uptodate() is false, and iomap_set_range_dirty() is called > with copied == 0. Is the Fixes: 4ce02c679722 ("iomap: Add per-block dirty state tracking to improve performance") tag needed for this? > > Add a !len guard to both functions before the computation, so that a > zero-length range is a no-op. > > Signed-off-by: Zhang Yi <yi.zhang@huawei.com> > --- > fs/iomap/buffered-io.c | 23 +++++++++++++++-------- > 1 file changed, 15 insertions(+), 8 deletions(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index 27ab33edbdee..6fe5f7e998fd 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -67,11 +67,14 @@ static bool ifs_set_range_uptodate(struct folio *folio, > struct iomap_folio_state *ifs, size_t off, size_t len) > { > struct inode *inode = folio->mapping->host; > - unsigned int first_blk = off >> inode->i_blkbits; > - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; > - unsigned int nr_blks = last_blk - first_blk + 1; > + unsigned int first_blk, last_blk; > > - bitmap_set(ifs->state, first_blk, nr_blks); > + if (!len) > + return true; I think both callers of ifs_set_range_uptodate() use the return value to decide whether to mark the folio uptodate or not - does this still need to return ifs_is_fully_uptodate(folio, ifs) instead of always true? Thanks, Joanne > + > + first_blk = off >> inode->i_blkbits; > + last_blk = (off + len - 1) >> inode->i_blkbits; > + bitmap_set(ifs->state, first_blk, last_blk - first_blk + 1); > return ifs_is_fully_uptodate(folio, ifs); > }
On Thu, May 14, 2026 at 02:29:55PM +0800, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@huawei.com> > > ifs_set_range_dirty() and ifs_set_range_uptodate() compute last_blk > as (off + len - 1) >> i_blkbits. When off is 0 and len is 0, the > unsigned subtraction underflows to SIZE_MAX, producing a huge > last_blk and nr_blks value that causes bitmap_set() to write far > beyond the ifs->state allocation. > > Regarding ifs_set_range_uptodate(), it is temporarily safe because len > cannot be passed in as 0. However, for ifs_set_range_dirty() this is > reachable from __iomap_write_end(): when copy_folio_from_iter_atomic() > returns 0 (e.g. user buffer fault) and the folio is already uptodate, > the guard at the top of __iomap_write_end() does not trigger because > !folio_test_uptodate() is false, and iomap_set_range_dirty() is called > with copied == 0. > > Add a !len guard to both functions before the computation, so that a > zero-length range is a no-op. > > Signed-off-by: Zhang Yi <yi.zhang@huawei.com> > --- > fs/iomap/buffered-io.c | 23 +++++++++++++++-------- > 1 file changed, 15 insertions(+), 8 deletions(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index 27ab33edbdee..6fe5f7e998fd 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -67,11 +67,14 @@ static bool ifs_set_range_uptodate(struct folio *folio, > struct iomap_folio_state *ifs, size_t off, size_t len) > { > struct inode *inode = folio->mapping->host; > - unsigned int first_blk = off >> inode->i_blkbits; > - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; > - unsigned int nr_blks = last_blk - first_blk + 1; > + unsigned int first_blk, last_blk; > > - bitmap_set(ifs->state, first_blk, nr_blks); > + if (!len) > + return true; > + > + first_blk = off >> inode->i_blkbits; > + last_blk = (off + len - 1) >> inode->i_blkbits; > + bitmap_set(ifs->state, first_blk, last_blk - first_blk + 1); > return ifs_is_fully_uptodate(folio, ifs); > } > > @@ -203,13 +206,17 @@ static void ifs_set_range_dirty(struct folio *folio, > { > struct inode *inode = folio->mapping->host; > unsigned int blks_per_folio = i_blocks_per_folio(inode, folio); > - unsigned int first_blk = (off >> inode->i_blkbits); > - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; > - unsigned int nr_blks = last_blk - first_blk + 1; > + unsigned int first_blk, last_blk; > unsigned long flags; > > + if (!len) > + return; > + > + first_blk = off >> inode->i_blkbits; > + last_blk = (off + len - 1) >> inode->i_blkbits; > spin_lock_irqsave(&ifs->state_lock, flags); > - bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks); > + bitmap_set(ifs->state, first_blk + blks_per_folio, > + last_blk - first_blk + 1); I'm curious about the inconsistency in the computations between ifs_clear_range_dirty and ifs_set_range_dirty now. In the function that clears dirty bits, off/len are rounded inwards: unsigned int first_blk = round_up(off, i_blocksize(inode)) >> inode->i_blkbits; unsigned int last_blk = (off + len) >> inode->i_blkbits; unsigned long flags; if (first_blk >= last_blk) return; spin_lock_irqsave(&ifs->state_lock, flags); bitmap_clear(ifs->state, first_blk + blks_per_folio, last_blk - first_blk); but here we're still rounding outwards: if (!len) return; first_blk = off >> inode->i_blkbits; last_blk = (off + len - 1) >> inode->i_blkbits; spin_lock_irqsave(&ifs->state_lock, flags); bitmap_set(ifs->state, first_blk + blks_per_folio, last_blk - first_blk + 1); That doesn't quite sound right to me without an explanation in the code, which currently lacks one. I *think* the reason for the discrepancy is that if we want to dirty part of an fsblock, we need to mark the whole block dirty in the ifs so that all the blocks get written out; but when we're clearing dirty bits, we want to leave an fsblock dirty if we only wrote back part of that fsblock. Does that sound right? --D
On 5/15/2026 2:10 AM, Darrick J. Wong wrote: > On Thu, May 14, 2026 at 02:29:55PM +0800, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@huawei.com> >> >> ifs_set_range_dirty() and ifs_set_range_uptodate() compute last_blk >> as (off + len - 1) >> i_blkbits. When off is 0 and len is 0, the >> unsigned subtraction underflows to SIZE_MAX, producing a huge >> last_blk and nr_blks value that causes bitmap_set() to write far >> beyond the ifs->state allocation. >> >> Regarding ifs_set_range_uptodate(), it is temporarily safe because len >> cannot be passed in as 0. However, for ifs_set_range_dirty() this is >> reachable from __iomap_write_end(): when copy_folio_from_iter_atomic() >> returns 0 (e.g. user buffer fault) and the folio is already uptodate, >> the guard at the top of __iomap_write_end() does not trigger because >> !folio_test_uptodate() is false, and iomap_set_range_dirty() is called >> with copied == 0. >> >> Add a !len guard to both functions before the computation, so that a >> zero-length range is a no-op. >> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> >> --- >> fs/iomap/buffered-io.c | 23 +++++++++++++++-------- >> 1 file changed, 15 insertions(+), 8 deletions(-) >> >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c >> index 27ab33edbdee..6fe5f7e998fd 100644 >> --- a/fs/iomap/buffered-io.c >> +++ b/fs/iomap/buffered-io.c >> @@ -67,11 +67,14 @@ static bool ifs_set_range_uptodate(struct folio *folio, >> struct iomap_folio_state *ifs, size_t off, size_t len) >> { >> struct inode *inode = folio->mapping->host; >> - unsigned int first_blk = off >> inode->i_blkbits; >> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; >> - unsigned int nr_blks = last_blk - first_blk + 1; >> + unsigned int first_blk, last_blk; >> >> - bitmap_set(ifs->state, first_blk, nr_blks); >> + if (!len) >> + return true; >> + >> + first_blk = off >> inode->i_blkbits; >> + last_blk = (off + len - 1) >> inode->i_blkbits; >> + bitmap_set(ifs->state, first_blk, last_blk - first_blk + 1); >> return ifs_is_fully_uptodate(folio, ifs); >> } >> >> @@ -203,13 +206,17 @@ static void ifs_set_range_dirty(struct folio *folio, >> { >> struct inode *inode = folio->mapping->host; >> unsigned int blks_per_folio = i_blocks_per_folio(inode, folio); >> - unsigned int first_blk = (off >> inode->i_blkbits); >> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; >> - unsigned int nr_blks = last_blk - first_blk + 1; >> + unsigned int first_blk, last_blk; >> unsigned long flags; >> >> + if (!len) >> + return; >> + >> + first_blk = off >> inode->i_blkbits; >> + last_blk = (off + len - 1) >> inode->i_blkbits; >> spin_lock_irqsave(&ifs->state_lock, flags); >> - bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks); >> + bitmap_set(ifs->state, first_blk + blks_per_folio, >> + last_blk - first_blk + 1); > > I'm curious about the inconsistency in the computations between > ifs_clear_range_dirty and ifs_set_range_dirty now. In the function that > clears dirty bits, off/len are rounded inwards: > > unsigned int first_blk = round_up(off, i_blocksize(inode)) >> > inode->i_blkbits; > unsigned int last_blk = (off + len) >> inode->i_blkbits; > unsigned long flags; > > if (first_blk >= last_blk) > return; > > spin_lock_irqsave(&ifs->state_lock, flags); > bitmap_clear(ifs->state, first_blk + blks_per_folio, > last_blk - first_blk); > > but here we're still rounding outwards: > > if (!len) > return; > > first_blk = off >> inode->i_blkbits; > last_blk = (off + len - 1) >> inode->i_blkbits; > spin_lock_irqsave(&ifs->state_lock, flags); > bitmap_set(ifs->state, first_blk + blks_per_folio, > last_blk - first_blk + 1); > > That doesn't quite sound right to me without an explanation in the code, > which currently lacks one. I *think* the reason for the discrepancy is > that if we want to dirty part of an fsblock, we need to mark the whole > block dirty in the ifs so that all the blocks get written out; but when > we're clearing dirty bits, we want to leave an fsblock dirty if we only > wrote back part of that fsblock. Does that sound right? > > --D > Yes, that's right. The primary purpose of clearing the dirty bit is for invalidating a partial folio(e.g., when punching hole). Only when a full fsblock has been invalidated can its corresponding dirty bit be cleared. Write-back operations never seem to write back only part of a fsblock. I can add comments for them in my next iteration. Thanks, Yi.
On 5/14/2026 11:08 PM, Joanne Koong wrote: > On Wed, May 13, 2026 at 11:35 PM Zhang Yi <yi.zhang@huaweicloud.com> wrote: >> >> From: Zhang Yi <yi.zhang@huawei.com> >> >> ifs_set_range_dirty() and ifs_set_range_uptodate() compute last_blk >> as (off + len - 1) >> i_blkbits. When off is 0 and len is 0, the >> unsigned subtraction underflows to SIZE_MAX, producing a huge >> last_blk and nr_blks value that causes bitmap_set() to write far >> beyond the ifs->state allocation. >> >> Regarding ifs_set_range_uptodate(), it is temporarily safe because len >> cannot be passed in as 0. However, for ifs_set_range_dirty() this is >> reachable from __iomap_write_end(): when copy_folio_from_iter_atomic() >> returns 0 (e.g. user buffer fault) and the folio is already uptodate, >> the guard at the top of __iomap_write_end() does not trigger because >> !folio_test_uptodate() is false, and iomap_set_range_dirty() is called >> with copied == 0. > > Is the Fixes: 4ce02c679722 ("iomap: Add per-block dirty state > tracking to improve performance") tag needed for this? > >> >> Add a !len guard to both functions before the computation, so that a >> zero-length range is a no-op. >> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> >> --- >> fs/iomap/buffered-io.c | 23 +++++++++++++++-------- >> 1 file changed, 15 insertions(+), 8 deletions(-) >> >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c >> index 27ab33edbdee..6fe5f7e998fd 100644 >> --- a/fs/iomap/buffered-io.c >> +++ b/fs/iomap/buffered-io.c >> @@ -67,11 +67,14 @@ static bool ifs_set_range_uptodate(struct folio *folio, >> struct iomap_folio_state *ifs, size_t off, size_t len) >> { >> struct inode *inode = folio->mapping->host; >> - unsigned int first_blk = off >> inode->i_blkbits; >> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; >> - unsigned int nr_blks = last_blk - first_blk + 1; >> + unsigned int first_blk, last_blk; >> >> - bitmap_set(ifs->state, first_blk, nr_blks); >> + if (!len) >> + return true; > > I think both callers of ifs_set_range_uptodate() use the return value > to decide whether to mark the folio uptodate or not - does this still > need to return ifs_is_fully_uptodate(folio, ifs) instead of always > true? > Yeah, you are right! I missed that, will fix. Thanks, Yi. > Thanks, > Joanne > >> + >> + first_blk = off >> inode->i_blkbits; >> + last_blk = (off + len - 1) >> inode->i_blkbits; >> + bitmap_set(ifs->state, first_blk, last_blk - first_blk + 1); >> return ifs_is_fully_uptodate(folio, ifs); >> }
On Fri, May 15, 2026 at 09:50:08AM +0800, Zhang Yi wrote: > On 5/15/2026 2:10 AM, Darrick J. Wong wrote: > > On Thu, May 14, 2026 at 02:29:55PM +0800, Zhang Yi wrote: > >> From: Zhang Yi <yi.zhang@huawei.com> > >> > >> ifs_set_range_dirty() and ifs_set_range_uptodate() compute last_blk > >> as (off + len - 1) >> i_blkbits. When off is 0 and len is 0, the > >> unsigned subtraction underflows to SIZE_MAX, producing a huge > >> last_blk and nr_blks value that causes bitmap_set() to write far > >> beyond the ifs->state allocation. > >> > >> Regarding ifs_set_range_uptodate(), it is temporarily safe because len > >> cannot be passed in as 0. However, for ifs_set_range_dirty() this is > >> reachable from __iomap_write_end(): when copy_folio_from_iter_atomic() > >> returns 0 (e.g. user buffer fault) and the folio is already uptodate, > >> the guard at the top of __iomap_write_end() does not trigger because > >> !folio_test_uptodate() is false, and iomap_set_range_dirty() is called > >> with copied == 0. > >> > >> Add a !len guard to both functions before the computation, so that a > >> zero-length range is a no-op. > >> > >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> > >> --- > >> fs/iomap/buffered-io.c | 23 +++++++++++++++-------- > >> 1 file changed, 15 insertions(+), 8 deletions(-) > >> > >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > >> index 27ab33edbdee..6fe5f7e998fd 100644 > >> --- a/fs/iomap/buffered-io.c > >> +++ b/fs/iomap/buffered-io.c > >> @@ -67,11 +67,14 @@ static bool ifs_set_range_uptodate(struct folio *folio, > >> struct iomap_folio_state *ifs, size_t off, size_t len) > >> { > >> struct inode *inode = folio->mapping->host; > >> - unsigned int first_blk = off >> inode->i_blkbits; > >> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; > >> - unsigned int nr_blks = last_blk - first_blk + 1; > >> + unsigned int first_blk, last_blk; > >> > >> - bitmap_set(ifs->state, first_blk, nr_blks); > >> + if (!len) > >> + return true; > >> + > >> + first_blk = off >> inode->i_blkbits; > >> + last_blk = (off + len - 1) >> inode->i_blkbits; > >> + bitmap_set(ifs->state, first_blk, last_blk - first_blk + 1); > >> return ifs_is_fully_uptodate(folio, ifs); > >> } > >> > >> @@ -203,13 +206,17 @@ static void ifs_set_range_dirty(struct folio *folio, > >> { > >> struct inode *inode = folio->mapping->host; > >> unsigned int blks_per_folio = i_blocks_per_folio(inode, folio); > >> - unsigned int first_blk = (off >> inode->i_blkbits); > >> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; > >> - unsigned int nr_blks = last_blk - first_blk + 1; > >> + unsigned int first_blk, last_blk; > >> unsigned long flags; > >> > >> + if (!len) > >> + return; > >> + > >> + first_blk = off >> inode->i_blkbits; > >> + last_blk = (off + len - 1) >> inode->i_blkbits; > >> spin_lock_irqsave(&ifs->state_lock, flags); > >> - bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks); > >> + bitmap_set(ifs->state, first_blk + blks_per_folio, > >> + last_blk - first_blk + 1); > > > > I'm curious about the inconsistency in the computations between > > ifs_clear_range_dirty and ifs_set_range_dirty now. In the function that > > clears dirty bits, off/len are rounded inwards: > > > > unsigned int first_blk = round_up(off, i_blocksize(inode)) >> > > inode->i_blkbits; > > unsigned int last_blk = (off + len) >> inode->i_blkbits; > > unsigned long flags; > > > > if (first_blk >= last_blk) > > return; > > > > spin_lock_irqsave(&ifs->state_lock, flags); > > bitmap_clear(ifs->state, first_blk + blks_per_folio, > > last_blk - first_blk); > > > > but here we're still rounding outwards: > > > > if (!len) > > return; > > > > first_blk = off >> inode->i_blkbits; > > last_blk = (off + len - 1) >> inode->i_blkbits; > > spin_lock_irqsave(&ifs->state_lock, flags); > > bitmap_set(ifs->state, first_blk + blks_per_folio, > > last_blk - first_blk + 1); > > > > That doesn't quite sound right to me without an explanation in the code, > > which currently lacks one. I *think* the reason for the discrepancy is > > that if we want to dirty part of an fsblock, we need to mark the whole > > block dirty in the ifs so that all the blocks get written out; but when > > we're clearing dirty bits, we want to leave an fsblock dirty if we only > > wrote back part of that fsblock. Does that sound right? > > > > --D > > > > Yes, that's right. The primary purpose of clearing the dirty bit is for > invalidating a partial folio(e.g., when punching hole). Only when a full > fsblock has been invalidated can its corresponding dirty bit be cleared. > Write-back operations never seem to write back only part of a fsblock. > > I can add comments for them in my next iteration. Sounds good, thank you. (Does this patch also need a fixes tag like Joanne asked?) --D > Thanks, > Yi. > > >
On 5/16/2026 1:30 AM, Darrick J. Wong wrote: > On Fri, May 15, 2026 at 09:50:08AM +0800, Zhang Yi wrote: >> On 5/15/2026 2:10 AM, Darrick J. Wong wrote: >>> On Thu, May 14, 2026 at 02:29:55PM +0800, Zhang Yi wrote: >>>> From: Zhang Yi <yi.zhang@huawei.com> >>>> >>>> ifs_set_range_dirty() and ifs_set_range_uptodate() compute last_blk >>>> as (off + len - 1) >> i_blkbits. When off is 0 and len is 0, the >>>> unsigned subtraction underflows to SIZE_MAX, producing a huge >>>> last_blk and nr_blks value that causes bitmap_set() to write far >>>> beyond the ifs->state allocation. >>>> >>>> Regarding ifs_set_range_uptodate(), it is temporarily safe because len >>>> cannot be passed in as 0. However, for ifs_set_range_dirty() this is >>>> reachable from __iomap_write_end(): when copy_folio_from_iter_atomic() >>>> returns 0 (e.g. user buffer fault) and the folio is already uptodate, >>>> the guard at the top of __iomap_write_end() does not trigger because >>>> !folio_test_uptodate() is false, and iomap_set_range_dirty() is called >>>> with copied == 0. >>>> >>>> Add a !len guard to both functions before the computation, so that a >>>> zero-length range is a no-op. >>>> >>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> >>>> --- >>>> fs/iomap/buffered-io.c | 23 +++++++++++++++-------- >>>> 1 file changed, 15 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c >>>> index 27ab33edbdee..6fe5f7e998fd 100644 >>>> --- a/fs/iomap/buffered-io.c >>>> +++ b/fs/iomap/buffered-io.c >>>> @@ -67,11 +67,14 @@ static bool ifs_set_range_uptodate(struct folio *folio, >>>> struct iomap_folio_state *ifs, size_t off, size_t len) >>>> { >>>> struct inode *inode = folio->mapping->host; >>>> - unsigned int first_blk = off >> inode->i_blkbits; >>>> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; >>>> - unsigned int nr_blks = last_blk - first_blk + 1; >>>> + unsigned int first_blk, last_blk; >>>> >>>> - bitmap_set(ifs->state, first_blk, nr_blks); >>>> + if (!len) >>>> + return true; >>>> + >>>> + first_blk = off >> inode->i_blkbits; >>>> + last_blk = (off + len - 1) >> inode->i_blkbits; >>>> + bitmap_set(ifs->state, first_blk, last_blk - first_blk + 1); >>>> return ifs_is_fully_uptodate(folio, ifs); >>>> } >>>> >>>> @@ -203,13 +206,17 @@ static void ifs_set_range_dirty(struct folio *folio, >>>> { >>>> struct inode *inode = folio->mapping->host; >>>> unsigned int blks_per_folio = i_blocks_per_folio(inode, folio); >>>> - unsigned int first_blk = (off >> inode->i_blkbits); >>>> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; >>>> - unsigned int nr_blks = last_blk - first_blk + 1; >>>> + unsigned int first_blk, last_blk; >>>> unsigned long flags; >>>> >>>> + if (!len) >>>> + return; >>>> + >>>> + first_blk = off >> inode->i_blkbits; >>>> + last_blk = (off + len - 1) >> inode->i_blkbits; >>>> spin_lock_irqsave(&ifs->state_lock, flags); >>>> - bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks); >>>> + bitmap_set(ifs->state, first_blk + blks_per_folio, >>>> + last_blk - first_blk + 1); >>> >>> I'm curious about the inconsistency in the computations between >>> ifs_clear_range_dirty and ifs_set_range_dirty now. In the function that >>> clears dirty bits, off/len are rounded inwards: >>> >>> unsigned int first_blk = round_up(off, i_blocksize(inode)) >> >>> inode->i_blkbits; >>> unsigned int last_blk = (off + len) >> inode->i_blkbits; >>> unsigned long flags; >>> >>> if (first_blk >= last_blk) >>> return; >>> >>> spin_lock_irqsave(&ifs->state_lock, flags); >>> bitmap_clear(ifs->state, first_blk + blks_per_folio, >>> last_blk - first_blk); >>> >>> but here we're still rounding outwards: >>> >>> if (!len) >>> return; >>> >>> first_blk = off >> inode->i_blkbits; >>> last_blk = (off + len - 1) >> inode->i_blkbits; >>> spin_lock_irqsave(&ifs->state_lock, flags); >>> bitmap_set(ifs->state, first_blk + blks_per_folio, >>> last_blk - first_blk + 1); >>> >>> That doesn't quite sound right to me without an explanation in the code, >>> which currently lacks one. I *think* the reason for the discrepancy is >>> that if we want to dirty part of an fsblock, we need to mark the whole >>> block dirty in the ifs so that all the blocks get written out; but when >>> we're clearing dirty bits, we want to leave an fsblock dirty if we only >>> wrote back part of that fsblock. Does that sound right? >>> >>> --D >>> >> >> Yes, that's right. The primary purpose of clearing the dirty bit is for >> invalidating a partial folio(e.g., when punching hole). Only when a full >> fsblock has been invalidated can its corresponding dirty bit be cleared. >> Write-back operations never seem to write back only part of a fsblock. >> >> I can add comments for them in my next iteration. > > Sounds good, thank you. > > (Does this patch also need a fixes tag like Joanne asked?) Yeah, I will add it as well. Thanks, Yi. > > --D > >> Thanks, >> Yi. >> >> >>
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 27ab33edbdee..6fe5f7e998fd 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -67,11 +67,14 @@ static bool ifs_set_range_uptodate(struct folio *folio, struct iomap_folio_state *ifs, size_t off, size_t len) { struct inode *inode = folio->mapping->host; - unsigned int first_blk = off >> inode->i_blkbits; - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; - unsigned int nr_blks = last_blk - first_blk + 1; + unsigned int first_blk, last_blk; - bitmap_set(ifs->state, first_blk, nr_blks); + if (!len) + return true; + + first_blk = off >> inode->i_blkbits; + last_blk = (off + len - 1) >> inode->i_blkbits; + bitmap_set(ifs->state, first_blk, last_blk - first_blk + 1); return ifs_is_fully_uptodate(folio, ifs); } @@ -203,13 +206,17 @@ static void ifs_set_range_dirty(struct folio *folio, { struct inode *inode = folio->mapping->host; unsigned int blks_per_folio = i_blocks_per_folio(inode, folio); - unsigned int first_blk = (off >> inode->i_blkbits); - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits; - unsigned int nr_blks = last_blk - first_blk + 1; + unsigned int first_blk, last_blk; unsigned long flags; + if (!len) + return; + + first_blk = off >> inode->i_blkbits; + last_blk = (off + len - 1) >> inode->i_blkbits; spin_lock_irqsave(&ifs->state_lock, flags); - bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks); + bitmap_set(ifs->state, first_blk + blks_per_folio, + last_blk - first_blk + 1); spin_unlock_irqrestore(&ifs->state_lock, flags); }