Patchwork [v6] ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate

login
register
mail settings
Submitter Theodore Ts'o
Date Feb. 22, 2014, 5:09 p.m.
Message ID <20140222170930.GE26637@thunk.org>
Download mbox | patch
Permalink /patch/323206/
State Superseded
Headers show

Comments

Theodore Ts'o - Feb. 22, 2014, 5:09 p.m.
On Fri, Feb 21, 2014 at 12:07:41AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
> 
> This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4.
> 
> The semantics of this flag are following:
> 1) It collapses the range lying between offset and length by removing any data
>    blocks which are present in this range and than updates all the logical
>    offsets of extents beyond "offset + len" to nullify the hole created by
>    removing blocks. In short, it does not leave a hole.
> 2) It should be used exclusively. No other fallocate flag in combination.
> 3) Offset and length supplied to fallocate should be fs block size aligned
>    in case of xfs and ext4.
> 4) Collaspe range does not work beyond i_size.
> 
> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> Tested-by: Dongsu Park <dongsu.park@profitbricks.com>

In terms of how to get this upstream, it looks like if we do something
like this, we can let this patch go via the ext4 tree and we don't
need to worry about whether the vfs level changes have gone in our
not.



> +	ret = ext4_es_remove_extent(inode, punch_start,
> +				    EXT_MAX_BLOCKS - punch_start - 1);
> +	if (ret) {
> +		up_write(&EXT4_I(inode)->i_data_sem);
> +		goto out_stop;
> +	}

Doing this at first is probably a bad idea; you should do this at the
end, and then completely invalidate the es cache for that inode.  That
way, the right thing happens if you get an error in the middle
releasing the boxes and shifting the extents:

> +
> +	ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1);
> +	if (ret) {
> +		up_write(&EXT4_I(inode)->i_data_sem);
> +		goto out_stop;
> +	}
> +
> +	ret = ext4_ext_shift_extents(inode, handle, punch_stop,
> +				     punch_stop - punch_start);
> +	if (ret) {
> +		up_write(&EXT4_I(inode)->i_data_sem);
> +		goto out_stop;
> +	}

The fact that you are doing these two as two separate steps is
dangerous; what if you've already released the blocks in
ext4_ext_remove_space(), and ext4_ext_shift_extents() fails in the
middle of the processing?  That will leave the file system
inconsistent, which would be bad.

Making sure that the right happens if there is a failure in the middle
of the operation is Really Important....

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner - Feb. 23, 2014, 9:43 p.m.
On Sat, Feb 22, 2014 at 12:09:30PM -0500, Theodore Ts'o wrote:
> On Fri, Feb 21, 2014 at 12:07:41AM +0900, Namjae Jeon wrote:
> > From: Namjae Jeon <namjae.jeon@samsung.com>
> > 
> > This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4.
> > 
> > The semantics of this flag are following:
> > 1) It collapses the range lying between offset and length by removing any data
> >    blocks which are present in this range and than updates all the logical
> >    offsets of extents beyond "offset + len" to nullify the hole created by
> >    removing blocks. In short, it does not leave a hole.
> > 2) It should be used exclusively. No other fallocate flag in combination.
> > 3) Offset and length supplied to fallocate should be fs block size aligned
> >    in case of xfs and ext4.
> > 4) Collaspe range does not work beyond i_size.
> > 
> > Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> > Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> > Tested-by: Dongsu Park <dongsu.park@profitbricks.com>
> 
> In terms of how to get this upstream, it looks like if we do something
> like this, we can let this patch go via the ext4 tree and we don't
> need to worry about whether the vfs level changes have gone in our
> not.
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ad13359..d7a78ed 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -46,6 +46,10 @@
>  
>  #include <trace/events/ext4.h>
>  
> +#ifndef FALLOC_FL_COLLAPSE_RANGE
> +#define FALLOC_FL_COLLAPSE_RANGE	0x08
> +#endif
> +
>  /*
>   * used by extent splitting.
>   */

You're more than welcome to do that, Ted. Just wait until we get the
VFS part into the XFS tree first ;)

Cheers,

Dave.
NamJae Jeon - Feb. 24, 2014, 1:22 a.m.
2014-02-23 2:09 GMT+09:00, Theodore Ts'o <tytso@mit.edu>:
> On Fri, Feb 21, 2014 at 12:07:41AM +0900, Namjae Jeon wrote:
>> From: Namjae Jeon <namjae.jeon@samsung.com>
>>
>> This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4.
>>
>> The semantics of this flag are following:
>> 1) It collapses the range lying between offset and length by removing any
>> data
>>    blocks which are present in this range and than updates all the
>> logical
>>    offsets of extents beyond "offset + len" to nullify the hole created
>> by
>>    removing blocks. In short, it does not leave a hole.
>> 2) It should be used exclusively. No other fallocate flag in combination.
>> 3) Offset and length supplied to fallocate should be fs block size
>> aligned
>>    in case of xfs and ext4.
>> 4) Collaspe range does not work beyond i_size.
>>
>> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
>> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
>> Tested-by: Dongsu Park <dongsu.park@profitbricks.com>
>
> In terms of how to get this upstream, it looks like if we do something
> like this, we can let this patch go via the ext4 tree and we don't
> need to worry about whether the vfs level changes have gone in our
> not.
Okay, Dave already answered.
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ad13359..d7a78ed 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -46,6 +46,10 @@
>
>  #include <trace/events/ext4.h>
>
> +#ifndef FALLOC_FL_COLLAPSE_RANGE
> +#define FALLOC_FL_COLLAPSE_RANGE	0x08
> +#endif
> +
>  /*
>   * used by extent splitting.
>   */
>
>
>> +	ret = ext4_es_remove_extent(inode, punch_start,
>> +				    EXT_MAX_BLOCKS - punch_start - 1);
>> +	if (ret) {
>> +		up_write(&EXT4_I(inode)->i_data_sem);
>> +		goto out_stop;
>> +	}
>
> Doing this at first is probably a bad idea; you should do this at the
> end, and then completely invalidate the es cache for that inode.  That
> way, the right thing happens if you get an error in the middle
> releasing the boxes and shifting the extents:
Okay, I see.

>
>> +
>> +	ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1);
>> +	if (ret) {
>> +		up_write(&EXT4_I(inode)->i_data_sem);
>> +		goto out_stop;
>> +	}
>> +
>> +	ret = ext4_ext_shift_extents(inode, handle, punch_stop,
>> +				     punch_stop - punch_start);
>> +	if (ret) {
>> +		up_write(&EXT4_I(inode)->i_data_sem);
>> +		goto out_stop;
>> +	}
>
Hi Ted.
> The fact that you are doing these two as two separate steps is
> dangerous; what if you've already released the blocks in
> ext4_ext_remove_space(), and ext4_ext_shift_extents() fails in the
> middle of the processing?  That will leave the file system
> inconsistent, which would be bad.
If there is error in the middle of extent shifting, the hole will
present between the last shifted extent and the extent at which error
happen so this will be consistent state.
IMHO even if there is error in between the shift, filesystem will be
in consistent state.
Am I missing something?
>
> Making sure that the right happens if there is a failure in the middle
> of the operation is Really Important....
>
> 					- Ted
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Feb. 26, 2014, 4:48 p.m.
On Mon, Feb 24, 2014 at 10:22:10AM +0900, Namjae Jeon wrote:
> >> +	ret = ext4_es_remove_extent(inode, punch_start,
> >> +				    EXT_MAX_BLOCKS - punch_start - 1);
> >> +	if (ret) {
> >> +		up_write(&EXT4_I(inode)->i_data_sem);
> >> +		goto out_stop;
> >> +	}
> >
> > Doing this at first is probably a bad idea; you should do this at the
> > end, and then completely invalidate the es cache for that inode.  That
> > way, the right thing happens if you get an error in the middle
> > releasing the boxes and shifting the extents:
> Okay, I see.

Actually, thinking about this some more, we do want to do this first,
since if we error out, we do need to make sure the extent cache is
flushed.

> If there is error in the middle of extent shifting, the hole will
> present between the last shifted extent and the extent at which error
> happen so this will be consistent state.
> IMHO even if there is error in between the shift, filesystem will be
> in consistent state.
> Am I missing something?

No, I was wrong about that; you're right.  The file will be in an
inconsistent statement, which will probably be highly confusing for
the application, but the file system will be fine.

So I withdraw my complaints.  I'll do a bit more testing, but so far
the patch looks fine to me.  Thanks for your reply and your work!

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
NamJae Jeon - Feb. 27, 2014, 4:03 a.m.
2014-02-27 1:48 GMT+09:00, Theodore Ts'o <tytso@mit.edu>:
> On Mon, Feb 24, 2014 at 10:22:10AM +0900, Namjae Jeon wrote:
>> >> +	ret = ext4_es_remove_extent(inode, punch_start,
>> >> +				    EXT_MAX_BLOCKS - punch_start - 1);
>> >> +	if (ret) {
>> >> +		up_write(&EXT4_I(inode)->i_data_sem);
>> >> +		goto out_stop;
>> >> +	}
>> >
>> > Doing this at first is probably a bad idea; you should do this at the
>> > end, and then completely invalidate the es cache for that inode.  That
>> > way, the right thing happens if you get an error in the middle
>> > releasing the boxes and shifting the extents:
>> Okay, I see.
>
> Actually, thinking about this some more, we do want to do this first,
> since if we error out, we do need to make sure the extent cache is
> flushed.
Okay.

>
>> If there is error in the middle of extent shifting, the hole will
>> present between the last shifted extent and the extent at which error
>> happen so this will be consistent state.
>> IMHO even if there is error in between the shift, filesystem will be
>> in consistent state.
>> Am I missing something?
>
> No, I was wrong about that; you're right.  The file will be in an
> inconsistent statement, which will probably be highly confusing for
> the application, but the file system will be fine.
>
> So I withdraw my complaints.  I'll do a bit more testing, but so far
> the patch looks fine to me.  Thanks for your reply and your work!
Thanks for your review! I will fix these include Hugh's review points.
>
> 					- Ted
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ad13359..d7a78ed 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -46,6 +46,10 @@ 
 
 #include <trace/events/ext4.h>
 
+#ifndef FALLOC_FL_COLLAPSE_RANGE
+#define FALLOC_FL_COLLAPSE_RANGE	0x08
+#endif
+
 /*
  * used by extent splitting.
  */