mbox series

[RFC,v2,0/2] ext4: Add multi-fsblock atomic write support using bigalloc

Message ID cover.1745987268.git.ritesh.list@gmail.com
Headers show
Series ext4: Add multi-fsblock atomic write support using bigalloc | expand

Message

Ritesh Harjani (IBM) April 30, 2025, 5:20 a.m. UTC
This is still an early preview (RFC v2) of multi-fsblock atomic write. Since the
core design of the feature looks ready, wanted to post this for some early
feedback. We will break this into more smaller and meaningful patches in later
revision. However to simplify the review of the core design changes, this
version is limited to just two patches. Individual patches might have more
details in the commit msg.

Note: This overall needs more careful review (other than the core design) which
I will be doing in parallel. However it would be helpful if one can provide any
feedback on the core design changes. Specially around ext4_iomap_alloc()
changes, ->end_io() changes and a new get block flag
EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS.

v1 -> v2:
==========
1. Handled review comments from Ojaswin to optimize the ext4_map_block() calls
   in ext4_iomap_alloc().
2. Fixed the journal credits calculation for both:
	- during block allocation in ext4_iomap_alloc()
	- during dio completion in ->end_io callback.
   Earlier we were starting multiple txns in ->end_io callback for unwritten to
   written conversion. But since in case of atomic writes, we want a single jbd2
   txn, hence made the necessary changes there.

TODOs:
======
1. although this survived quick tests and some custom fio tests for atomic
   write, but I still think we need to add more tests for the corner cases.
   Will work on adding such corner test cases to xfstests.
2. Many fsx related tests in quick group were failing with bigalloc. But those
   were failing even without these patches. Need further investigation.
3. Finish more thorough testing of the series.
4. Refactoring + Cleanups and a more careful review of the overall patch series
   before sending v3.


Ritesh Harjani (IBM) (2):
  ext4: Add multi-fsblock atomic write support with bigalloc
  ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS

 fs/ext4/ext4.h    |   9 ++
 fs/ext4/extents.c |  69 ++++++++++++++
 fs/ext4/file.c    |   7 +-
 fs/ext4/inode.c   | 231 ++++++++++++++++++++++++++++++++++++++++++----
 fs/ext4/super.c   |   8 +-
 5 files changed, 302 insertions(+), 22 deletions(-)

--
2.49.0

Comments

John Garry May 8, 2025, 12:01 p.m. UTC | #1
On 30/04/2025 06:20, Ritesh Harjani (IBM) wrote:
> This is still an early preview (RFC v2) of multi-fsblock atomic write. Since the
> core design of the feature looks ready, wanted to post this for some early
> feedback. We will break this into more smaller and meaningful patches in later
> revision. However to simplify the review of the core design changes, this
> version is limited to just two patches. Individual patches might have more
> details in the commit msg.
> 
> Note: This overall needs more careful review (other than the core design) which
> I will be doing in parallel. However it would be helpful if one can provide any
> feedback on the core design changes. Specially around ext4_iomap_alloc()
> changes, ->end_io() changes and a new get block flag
> EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS.

I gave this a try and it looks ok, specifically atomic writing mixed 
mappings.

I'll try to look closer that the implementation details. But I do note 
that you use blkdev_issue_zeroout() to pre-zero any unwritten range 
which is being atomically written.
Ritesh Harjani (IBM) May 8, 2025, 2:35 p.m. UTC | #2
John Garry <john.g.garry@oracle.com> writes:

> On 30/04/2025 06:20, Ritesh Harjani (IBM) wrote:
>> This is still an early preview (RFC v2) of multi-fsblock atomic write. Since the
>> core design of the feature looks ready, wanted to post this for some early
>> feedback. We will break this into more smaller and meaningful patches in later
>> revision. However to simplify the review of the core design changes, this
>> version is limited to just two patches. Individual patches might have more
>> details in the commit msg.
>> 
>> Note: This overall needs more careful review (other than the core design) which
>> I will be doing in parallel. However it would be helpful if one can provide any
>> feedback on the core design changes. Specially around ext4_iomap_alloc()
>> changes, ->end_io() changes and a new get block flag
>> EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS.
>
> I gave this a try and it looks ok, specifically atomic writing mixed 
> mappings.
>

Thanks John for taking a look.

> I'll try to look closer that the implementation details.

We are in the process of sending v3 (hopefully by tonight) which is an
improved version w.r.t error handling, journal credits and few other
changes. Although nothing has changed w.r.t the design aspect.

> But I do note 
> that you use blkdev_issue_zeroout() to pre-zero any unwritten range 
> which is being atomically written.

Yes, that is how internally ext4_map_blocks() with
EXT4_GET_BLOCKS_CREATE_ZERO will return us the allocated blocks. During
block allocation, on mixed mapping range, we ensure that the entire range
becomes a contiguous mapped extent before starting any data writes.
That means calling ext4_map_blocks() in a loop with
EXT4_GET_BLOCKS_CREATE_ZERO, so that it can zero out any unwritten
extents in the requested region.
I assume writing over a mixed mapping region is not a performance
critical path. 

Do you forsee any problems with the approach (since you said "But I do note...")?

-ritesh
Darrick J. Wong May 8, 2025, 3:19 p.m. UTC | #3
On Thu, May 08, 2025 at 08:05:27PM +0530, Ritesh Harjani wrote:
> John Garry <john.g.garry@oracle.com> writes:
> 
> > On 30/04/2025 06:20, Ritesh Harjani (IBM) wrote:
> >> This is still an early preview (RFC v2) of multi-fsblock atomic write. Since the
> >> core design of the feature looks ready, wanted to post this for some early
> >> feedback. We will break this into more smaller and meaningful patches in later
> >> revision. However to simplify the review of the core design changes, this
> >> version is limited to just two patches. Individual patches might have more
> >> details in the commit msg.
> >> 
> >> Note: This overall needs more careful review (other than the core design) which
> >> I will be doing in parallel. However it would be helpful if one can provide any
> >> feedback on the core design changes. Specially around ext4_iomap_alloc()
> >> changes, ->end_io() changes and a new get block flag
> >> EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS.
> >
> > I gave this a try and it looks ok, specifically atomic writing mixed 
> > mappings.
> >
> 
> Thanks John for taking a look.
> 
> > I'll try to look closer that the implementation details.
> 
> We are in the process of sending v3 (hopefully by tonight) which is an
> improved version w.r.t error handling, journal credits and few other
> changes. Although nothing has changed w.r.t the design aspect.
> 
> > But I do note 
> > that you use blkdev_issue_zeroout() to pre-zero any unwritten range 
> > which is being atomically written.
> 
> Yes, that is how internally ext4_map_blocks() with
> EXT4_GET_BLOCKS_CREATE_ZERO will return us the allocated blocks. During
> block allocation, on mixed mapping range, we ensure that the entire range
> becomes a contiguous mapped extent before starting any data writes.
> That means calling ext4_map_blocks() in a loop with
> EXT4_GET_BLOCKS_CREATE_ZERO, so that it can zero out any unwritten
> extents in the requested region.
> I assume writing over a mixed mapping region is not a performance
> critical path. 
> 
> Do you forsee any problems with the approach (since you said "But I do note...")?

It's a little dumb to write zeroes just so you can atomicwrite a block.
However, ext4 lacks an out of place write handler, so I don't think
there's much else that can be done easily.

--D

> -ritesh