diff mbox

[2/2] ext3: Add FITRIM handle for ext3

Message ID 1290425358-16253-2-git-send-email-lczerner@redhat.com
State Not Applicable, archived
Headers show

Commit Message

Lukas Czerner Nov. 22, 2010, 11:29 a.m. UTC
It takes fstrim_range structure as an argument. fstrim_range is definec in
the include/linux/fs.h.

After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext3/ioctl.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

Comments

Greg Freemyer Nov. 22, 2010, 4:13 p.m. UTC | #1
On Mon, Nov 22, 2010 at 6:29 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> It takes fstrim_range structure as an argument. fstrim_range is definec in
> the include/linux/fs.h.
>
> After the FITRIM is done, the number of actually discarded Bytes is stored
> in fstrim_range.len to give the user better insight on how much storage
> space has been really released for wear-leveling.
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>

That's a misleading description in my opinion, from my understanding
this is more accurate:

===
After the FITRIM is done, the number of bytes passed from the
filesystem down the block stack to the device for potential discard is
stored in fstrim_range.len.  This number is a maximum discard amount
from the storage device's perspective, because FITRIM called repeated
will keep sending the same sectors for discard repeatedly.
fstrim_range.len will report the same potential discard bytes each
time, but only sectors which had been written to between the discards
would actually be discarded by the storage device.  Further, the
kernel block layer reserves the right to adjust the discard ranges to
fit raid stripe geometry, non-trim capable devices in a LVM setup,
etc.  These reductions would not be reflected in fstrim_range.len.

As 2.6.37, the kernel block layer does not fully support discard and
as such will simply ignore all discard requests sent to volumes
created by device mapper or mdraid.  This is done in a silent way, so
these failures to discard are also not reflected in fstrim_range.len.

Thus fstrim_range.len can give the user better insight on how much
storage space has potentially been released for wear-leveling, but it
needs to be one of only one criteria the userspace tools take into
account when trying to optimize calls to FITRIM.
===

Obviously, I'd like to also see that also in API documentation for
FITRIM.  (And correct me if I'm wrong about device mapper / mdraid.
I'd love to be wrong about that statement..)

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Nov. 22, 2010, 5:42 p.m. UTC | #2
On Mon 22-11-10 12:29:18, Lukas Czerner wrote:
> It takes fstrim_range structure as an argument. fstrim_range is definec in
> the include/linux/fs.h.
> 
> After the FITRIM is done, the number of actually discarded Bytes is stored
> in fstrim_range.len to give the user better insight on how much storage
> space has been really released for wear-leveling.
  Umm, why do we have to do this when FITRIM is already handled in
fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no?

									Honza
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
>  fs/ext3/ioctl.c |   22 ++++++++++++++++++++++
>  1 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
> index 8897481..fc080dd 100644
> --- a/fs/ext3/ioctl.c
> +++ b/fs/ext3/ioctl.c
> @@ -276,7 +276,29 @@ group_add_out:
>  		mnt_drop_write(filp->f_path.mnt);
>  		return err;
>  	}
> +	case FITRIM: {
>  
> +		struct super_block *sb = inode->i_sb;
> +		struct fstrim_range range;
> +		int ret = 0;
> +
> +		if (!capable(CAP_SYS_ADMIN))
> +			return -EPERM;
> +
> +		if (copy_from_user(&range, (struct fstrim_range *)arg,
> +				   sizeof(range)))
> +			return -EFAULT;
> +
> +		ret = ext3_trim_fs(sb, &range);
> +		if (ret < 0)
> +			return ret;
> +
> +		if (copy_to_user((struct fstrim_range *)arg, &range,
> +				 sizeof(range)))
> +			return -EFAULT;
> +
> +		return 0;
> +	}
>  
>  	default:
>  		return -ENOTTY;
> -- 
> 1.7.2.3
>
Lukas Czerner Nov. 23, 2010, 10:32 a.m. UTC | #3
On Mon, 22 Nov 2010, Jan Kara wrote:

> On Mon 22-11-10 12:29:18, Lukas Czerner wrote:
> > It takes fstrim_range structure as an argument. fstrim_range is definec in
> > the include/linux/fs.h.
> > 
> > After the FITRIM is done, the number of actually discarded Bytes is stored
> > in fstrim_range.len to give the user better insight on how much storage
> > space has been really released for wear-leveling.
>   Umm, why do we have to do this when FITRIM is already handled in
> fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no?

Hi,

see upstream commits:
93bb41f4f8b89ac8b4d0a734bc59634cb0a29a89
e681c047e47c0abe67bf95857f23814372793cb0

unfortunately people was not happy with generic ioctl interface for that
purpose, there were concerned that it is no common enough to be included
in core vfs and there is no need for new super operation since each
filesystem can easily setup its own ioctl handling.

When I say people I need to clarify that it was mainly Christoph Hellwig
who had objections against the former implementation and I must say that
he had a point.

-Lukas


> 
> 									Honza
> > 
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > ---
> >  fs/ext3/ioctl.c |   22 ++++++++++++++++++++++
> >  1 files changed, 22 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
> > index 8897481..fc080dd 100644
> > --- a/fs/ext3/ioctl.c
> > +++ b/fs/ext3/ioctl.c
> > @@ -276,7 +276,29 @@ group_add_out:
> >  		mnt_drop_write(filp->f_path.mnt);
> >  		return err;
> >  	}
> > +	case FITRIM: {
> >  
> > +		struct super_block *sb = inode->i_sb;
> > +		struct fstrim_range range;
> > +		int ret = 0;
> > +
> > +		if (!capable(CAP_SYS_ADMIN))
> > +			return -EPERM;
> > +
> > +		if (copy_from_user(&range, (struct fstrim_range *)arg,
> > +				   sizeof(range)))
> > +			return -EFAULT;
> > +
> > +		ret = ext3_trim_fs(sb, &range);
> > +		if (ret < 0)
> > +			return ret;
> > +
> > +		if (copy_to_user((struct fstrim_range *)arg, &range,
> > +				 sizeof(range)))
> > +			return -EFAULT;
> > +
> > +		return 0;
> > +	}
> >  
> >  	default:
> >  		return -ENOTTY;
> > -- 
> > 1.7.2.3
> > 
>
Lukas Czerner Nov. 23, 2010, 11:06 a.m. UTC | #4
On Mon, 22 Nov 2010, Greg Freemyer wrote:

> On Mon, Nov 22, 2010 at 6:29 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> > It takes fstrim_range structure as an argument. fstrim_range is definec in
> > the include/linux/fs.h.
> >
> > After the FITRIM is done, the number of actually discarded Bytes is stored
> > in fstrim_range.len to give the user better insight on how much storage
> > space has been really released for wear-leveling.
> >
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> 
> That's a misleading description in my opinion, from my understanding
> this is more accurate:
> 
> ===
> After the FITRIM is done, the number of bytes passed from the
> filesystem down the block stack to the device for potential discard is
> stored in fstrim_range.len.  This number is a maximum discard amount
> from the storage device's perspective, because FITRIM called repeated
> will keep sending the same sectors for discard repeatedly.
> fstrim_range.len will report the same potential discard bytes each
> time, but only sectors which had been written to between the discards
> would actually be discarded by the storage device.  Further, the
> kernel block layer reserves the right to adjust the discard ranges to
> fit raid stripe geometry, non-trim capable devices in a LVM setup,
> etc.  These reductions would not be reflected in fstrim_range.len.
> 
> As 2.6.37, the kernel block layer does not fully support discard and
> as such will simply ignore all discard requests sent to volumes
> created by device mapper or mdraid.  This is done in a silent way, so
> these failures to discard are also not reflected in fstrim_range.len.

I think this is not entirely true, because we have discard support for
dm linear and stripe:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5ae89a8720c28caf35c4e53711d77df2856c404e
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7b76ec11fec40203836b488496d2df082d5b2022

(Adding Mike Snitzer into cc)

> 
> Thus fstrim_range.len can give the user better insight on how much
> storage space has potentially been released for wear-leveling, but it
> needs to be one of only one criteria the userspace tools take into
> account when trying to optimize calls to FITRIM.
> ===
> 
> Obviously, I'd like to also see that also in API documentation for
> FITRIM.  (And correct me if I'm wrong about device mapper / mdraid.
> I'd love to be wrong about that statement..)
> 
> Greg
> 

Greg, thank you for this, aside the dm thing, this is definitely a lot
better explanation and it would be nice to include this into commit
message (it is too late for ext4:).

Jan, do you want me to repost this with new commit message or you can
add it yourself ?

Thanks!

-Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 8897481..fc080dd 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -276,7 +276,29 @@  group_add_out:
 		mnt_drop_write(filp->f_path.mnt);
 		return err;
 	}
+	case FITRIM: {
 
+		struct super_block *sb = inode->i_sb;
+		struct fstrim_range range;
+		int ret = 0;
+
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+
+		if (copy_from_user(&range, (struct fstrim_range *)arg,
+				   sizeof(range)))
+			return -EFAULT;
+
+		ret = ext3_trim_fs(sb, &range);
+		if (ret < 0)
+			return ret;
+
+		if (copy_to_user((struct fstrim_range *)arg, &range,
+				 sizeof(range)))
+			return -EFAULT;
+
+		return 0;
+	}
 
 	default:
 		return -ENOTTY;