diff mbox

[RESEND,1/7] fs: add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

Message ID 1381090366-2727-1-git-send-email-linkinjeon@gmail.com
State Superseded, archived
Headers show

Commit Message

Namjae Jeon Oct. 6, 2013, 8:12 p.m. UTC
From: Namjae Jeon <namjae.jeon@samsung.com>

Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate.
updated detailed semantics in comments.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
---
 fs/open.c                   |   24 +++++++++++++++++++++---
 include/uapi/linux/falloc.h |   17 +++++++++++++++++
 2 files changed, 38 insertions(+), 3 deletions(-)

Comments

Dave Chinner Oct. 9, 2013, 10:46 p.m. UTC | #1
On Mon, Oct 07, 2013 at 05:12:46AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
> 
> Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate.
> updated detailed semantics in comments.
> 
> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
> ---
>  fs/open.c                   |   24 +++++++++++++++++++++---
>  include/uapi/linux/falloc.h |   17 +++++++++++++++++
>  2 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 7931f76..85d243a 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -225,12 +225,14 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  {
>  	struct inode *inode = file_inode(file);
>  	long ret;
> +	unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
>  
>  	if (offset < 0 || len <= 0)
>  		return -EINVAL;
>  
>  	/* Return error if mode is not supported */
> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> +		     FALLOC_FL_COLLAPSE_RANGE))
>  		return -EOPNOTSUPP;
>  
>  	/* Punch hole must have keep size set */
> @@ -241,8 +243,12 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	if (!(file->f_mode & FMODE_WRITE))
>  		return -EBADF;
>  
> -	/* It's not possible punch hole on append only file */
> -	if (mode & FALLOC_FL_PUNCH_HOLE && IS_APPEND(inode))
> +	/*
> +	 * It's not possible to punch hole or perform collapse range
> +	 * on append only file
> +	 */
> +	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE)
> +	    && IS_APPEND(inode))
>  		return -EPERM;
>  
>  	if (IS_IMMUTABLE(inode))
> @@ -270,6 +276,18 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
>  		return -EFBIG;
>  
> +	/*
> +	 * Collapse range works only on fs block size aligned offsets.
> +	 * Check if collapse range is contained within (aligned)i_size.
> +	 * Collapse range can only be used exclusively.
> +	 */
> +	if ((mode & FALLOC_FL_COLLAPSE_RANGE) &&
> +	    (offset & blksize_mask || len & blksize_mask ||
> +	     mode & ~FALLOC_FL_COLLAPSE_RANGE ||
> +	     (offset + len >
> +	      round_up(i_size_read(inode), (blksize_mask + 1)))))
> +		return -EINVAL;

There's lots of individual checks here. Let's separate them out
logically. Firstly, "Collapse range can only be used exclusively" is
a mode parameter check, and so should be done directly after
validating the mode only contains known commands. i.e. in the first
hunk above.

Secondly, "Collapse range works only on fs block size aligned
offsets" is an implementation constraint, not an API constraint.
i.e. There is no reason why a filesystem can't implement byte range
granularity for this operation, it just may not be efficient for all
fielsystems and so they don't choose to implement byte range
granularity. Further, filesystems might have different allocation
constraints than the filesystem block size (think bigalloc on ext4,
per-file extent size hints for XFS), and these generally aren't
reflected in inode->i_blkbits. In these cases, the granularity of
the collapse operation can only be determined by the filesystem
itself, not this high level code.

Hence I think the granularity check should be put into a helper
function that the filesystem's ->fallocate() method calls if it can
only support fs block aligned operations. That allows each
filesystem to determine it's own constraints on a per-operation
basis.

All that remains here is the "within file size"
check, and that doesn't need to be rounded up to block size to check
if it is valid. If the range given overlaps the end of file in any
way, then it is a truncate operation....


> diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> index 990c4cc..9614b72 100644
> --- a/include/uapi/linux/falloc.h
> +++ b/include/uapi/linux/falloc.h
> @@ -4,6 +4,23 @@
>  #define FALLOC_FL_KEEP_SIZE	0x01 /* default is extend size */
>  #define FALLOC_FL_PUNCH_HOLE	0x02 /* de-allocates range */
>  #define FALLOC_FL_NO_HIDE_STALE	0x04 /* reserved codepoint */
> +/*
> + * FALLOC_FL_COLLAPSE_RANGE:
> + * This flag works in 2 steps.
> + * Firstly, it deallocates any data blocks present between [offset, offset+len)
> + * This step is same as punch hole and leaves a hole in the place from where
> + * the blocks are removed.
> + * Next, it eliminates the hole created by moving data blocks into it.
> + * For extent based file systems, we achieve this functionality simply by
> + * updating the starting logical offset of each extent which appears beyond
> + * the hole. As this flag works on blocks of filesystem, the offset and len
> + * provided to fallocate should be aligned with block size of filesystem.

Hmmm - you're describing an implementation, not the API. i.e. what
you need to describe is the functionality users are provided with by
the flag and it's usage constraints, not how filesystems need to
implement it. Something like:

"FALLOC_FL_COLLAPSE_RANGE is used to remove a range of a file
without leaving a hole in the file. The contents of the file beyond
the range being removed is appended to the start offset of the range
being removed (i.e. the hole that was punched is "collapsed"),
resulting in a file layout that looks like the range that was
removed never existed. As suchm collapsing a range of a file changes
the size of the file, reducing it by the same length of the range
that has been removed by the operation.

Different filesystems may implement different limitations on the
granularity of the operation. Most will limit operations to
filesystem block size boundaries, but this boundary may be larger or
smaller depending on the filesystem and/or the configuration of the
filesystem or file.

Attempting to collapse a range that crosses the end of the file is
considered an illegal operation - just use ftruncate(2) if you need
to collapse a range that crosses EOF."

> +#define FALLOC_FL_COLLAPSE_RANGE	0x08 /* it does not leave a hole */

With the large descriptive comment, there is no need for the
appended "/* it does not leave a hole */" comment.

Cheers,

Dave.
Namjae Jeon Oct. 10, 2013, 6:56 a.m. UTC | #2
Hi Dave.

> >
> > +     /*
> > +      * Collapse range works only on fs block size aligned offsets.
> > +      * Check if collapse range is contained within (aligned)i_size.
> > +      * Collapse range can only be used exclusively.
> > +      */
> > +     if ((mode & FALLOC_FL_COLLAPSE_RANGE) &&
> > +         (offset & blksize_mask || len & blksize_mask ||
> > +          mode & ~FALLOC_FL_COLLAPSE_RANGE ||
> > +          (offset + len >
> > +           round_up(i_size_read(inode), (blksize_mask + 1)))))
> > +             return -EINVAL;
>
> There's lots of individual checks here. Let's separate them out
> logically. Firstly, "Collapse range can only be used exclusively" is
> a mode parameter check, and so should be done directly after
> validating the mode only contains known commands. i.e. in the first
> hunk above.
>
> Secondly, "Collapse range works only on fs block size aligned
> offsets" is an implementation constraint, not an API constraint.
> i.e. There is no reason why a filesystem can't implement byte range
> granularity for this operation, it just may not be efficient for all
> fielsystems and so they don't choose to implement byte range
> granularity. Further, filesystems might have different allocation
> constraints than the filesystem block size (think bigalloc on ext4,
> per-file extent size hints for XFS), and these generally aren't
> reflected in inode->i_blkbits. In these cases, the granularity of
> the collapse operation can only be determined by the filesystem
> itself, not this high level code.
>
> Hence I think the granularity check should be put into a helper
> function that the filesystem's ->fallocate() method calls if it can
> only support fs block aligned operations. That allows each
> filesystem to determine it's own constraints on a per-operation
> basis.
>
> All that remains here is the "within file size"
> check, and that doesn't need to be rounded up to block size to check
> if it is valid. If the range given overlaps the end of file in any
> way, then it is a truncate operation....

Okay, I will update your points on next version patch.
>
>
> > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> > index 990c4cc..9614b72 100644
> > --- a/include/uapi/linux/falloc.h
> > +++ b/include/uapi/linux/falloc.h
> > @@ -4,6 +4,23 @@
> >  #define FALLOC_FL_KEEP_SIZE  0x01 /* default is extend size */
> >  #define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */
> >  #define FALLOC_FL_NO_HIDE_STALE      0x04 /* reserved codepoint */
> > +/*
> > + * FALLOC_FL_COLLAPSE_RANGE:
> > + * This flag works in 2 steps.
> > + * Firstly, it deallocates any data blocks present between [offset,
> > offset+len)
> > + * This step is same as punch hole and leaves a hole in the place from
> > where
> > + * the blocks are removed.
> > + * Next, it eliminates the hole created by moving data blocks into it.
> > + * For extent based file systems, we achieve this functionality simply
> > by
> > + * updating the starting logical offset of each extent which appears
> > beyond
> > + * the hole. As this flag works on blocks of filesystem, the offset and
> > len
> > + * provided to fallocate should be aligned with block size of
> > filesystem.
>
> Hmmm - you're describing an implementation, not the API. i.e. what
> you need to describe is the functionality users are provided with by
> the flag and it's usage constraints, not how filesystems need to
> implement it. Something like:
Okay, I will reference your shared description.

>
> "FALLOC_FL_COLLAPSE_RANGE is used to remove a range of a file
> without leaving a hole in the file. The contents of the file beyond
> the range being removed is appended to the start offset of the range
> being removed (i.e. the hole that was punched is "collapsed"),
> resulting in a file layout that looks like the range that was
> removed never existed. As suchm collapsing a range of a file changes
> the size of the file, reducing it by the same length of the range
> that has been removed by the operation.
>
> Different filesystems may implement different limitations on the
> granularity of the operation. Most will limit operations to
> filesystem block size boundaries, but this boundary may be larger or
> smaller depending on the filesystem and/or the configuration of the
> filesystem or file.
>
> Attempting to collapse a range that crosses the end of the file is
> considered an illegal operation - just use ftruncate(2) if you need
> to collapse a range that crosses EOF."
>
> > +#define FALLOC_FL_COLLAPSE_RANGE     0x08 /* it does not leave a hole
> > */
>
> With the large descriptive comment, there is no need for the
> appended "/* it does not leave a hole */" comment.
Okay. will remove.

Thanks for review.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/open.c b/fs/open.c
index 7931f76..85d243a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -225,12 +225,14 @@  int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
 	long ret;
+	unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
 
 	if (offset < 0 || len <= 0)
 		return -EINVAL;
 
 	/* Return error if mode is not supported */
-	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
+	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
+		     FALLOC_FL_COLLAPSE_RANGE))
 		return -EOPNOTSUPP;
 
 	/* Punch hole must have keep size set */
@@ -241,8 +243,12 @@  int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (!(file->f_mode & FMODE_WRITE))
 		return -EBADF;
 
-	/* It's not possible punch hole on append only file */
-	if (mode & FALLOC_FL_PUNCH_HOLE && IS_APPEND(inode))
+	/*
+	 * It's not possible to punch hole or perform collapse range
+	 * on append only file
+	 */
+	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE)
+	    && IS_APPEND(inode))
 		return -EPERM;
 
 	if (IS_IMMUTABLE(inode))
@@ -270,6 +276,18 @@  int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
 		return -EFBIG;
 
+	/*
+	 * Collapse range works only on fs block size aligned offsets.
+	 * Check if collapse range is contained within (aligned)i_size.
+	 * Collapse range can only be used exclusively.
+	 */
+	if ((mode & FALLOC_FL_COLLAPSE_RANGE) &&
+	    (offset & blksize_mask || len & blksize_mask ||
+	     mode & ~FALLOC_FL_COLLAPSE_RANGE ||
+	     (offset + len >
+	      round_up(i_size_read(inode), (blksize_mask + 1)))))
+		return -EINVAL;
+
 	if (!file->f_op->fallocate)
 		return -EOPNOTSUPP;
 
diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
index 990c4cc..9614b72 100644
--- a/include/uapi/linux/falloc.h
+++ b/include/uapi/linux/falloc.h
@@ -4,6 +4,23 @@ 
 #define FALLOC_FL_KEEP_SIZE	0x01 /* default is extend size */
 #define FALLOC_FL_PUNCH_HOLE	0x02 /* de-allocates range */
 #define FALLOC_FL_NO_HIDE_STALE	0x04 /* reserved codepoint */
+/*
+ * FALLOC_FL_COLLAPSE_RANGE:
+ * This flag works in 2 steps.
+ * Firstly, it deallocates any data blocks present between [offset, offset+len)
+ * This step is same as punch hole and leaves a hole in the place from where
+ * the blocks are removed.
+ * Next, it eliminates the hole created by moving data blocks into it.
+ * For extent based file systems, we achieve this functionality simply by
+ * updating the starting logical offset of each extent which appears beyond
+ * the hole. As this flag works on blocks of filesystem, the offset and len
+ * provided to fallocate should be aligned with block size of filesystem.
+ * The semantics of this flag are:
+ * 1) It should be used exclusively. No other fallocate flag in combination.
+ * 2) Offset and len supplied to fallocate should be aligned with block size.
+ * 3) (offset + len) could not be greater than file size.
+ */
+#define FALLOC_FL_COLLAPSE_RANGE	0x08 /* it does not leave a hole */
 
 
 #endif /* _UAPI_FALLOC_H_ */