Message ID | 1290425358-16253-2-git-send-email-lczerner@redhat.com |
---|---|
State | Not Applicable, archived |
Headers | show |
On Mon, Nov 22, 2010 at 6:29 AM, Lukas Czerner <lczerner@redhat.com> wrote: > It takes fstrim_range structure as an argument. fstrim_range is definec in > the include/linux/fs.h. > > After the FITRIM is done, the number of actually discarded Bytes is stored > in fstrim_range.len to give the user better insight on how much storage > space has been really released for wear-leveling. > > Signed-off-by: Lukas Czerner <lczerner@redhat.com> That's a misleading description in my opinion, from my understanding this is more accurate: === After the FITRIM is done, the number of bytes passed from the filesystem down the block stack to the device for potential discard is stored in fstrim_range.len. This number is a maximum discard amount from the storage device's perspective, because FITRIM called repeated will keep sending the same sectors for discard repeatedly. fstrim_range.len will report the same potential discard bytes each time, but only sectors which had been written to between the discards would actually be discarded by the storage device. Further, the kernel block layer reserves the right to adjust the discard ranges to fit raid stripe geometry, non-trim capable devices in a LVM setup, etc. These reductions would not be reflected in fstrim_range.len. As 2.6.37, the kernel block layer does not fully support discard and as such will simply ignore all discard requests sent to volumes created by device mapper or mdraid. This is done in a silent way, so these failures to discard are also not reflected in fstrim_range.len. Thus fstrim_range.len can give the user better insight on how much storage space has potentially been released for wear-leveling, but it needs to be one of only one criteria the userspace tools take into account when trying to optimize calls to FITRIM. === Obviously, I'd like to also see that also in API documentation for FITRIM. (And correct me if I'm wrong about device mapper / mdraid. I'd love to be wrong about that statement..) Greg -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon 22-11-10 12:29:18, Lukas Czerner wrote: > It takes fstrim_range structure as an argument. fstrim_range is definec in > the include/linux/fs.h. > > After the FITRIM is done, the number of actually discarded Bytes is stored > in fstrim_range.len to give the user better insight on how much storage > space has been really released for wear-leveling. Umm, why do we have to do this when FITRIM is already handled in fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no? Honza > > Signed-off-by: Lukas Czerner <lczerner@redhat.com> > --- > fs/ext3/ioctl.c | 22 ++++++++++++++++++++++ > 1 files changed, 22 insertions(+), 0 deletions(-) > > diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c > index 8897481..fc080dd 100644 > --- a/fs/ext3/ioctl.c > +++ b/fs/ext3/ioctl.c > @@ -276,7 +276,29 @@ group_add_out: > mnt_drop_write(filp->f_path.mnt); > return err; > } > + case FITRIM: { > > + struct super_block *sb = inode->i_sb; > + struct fstrim_range range; > + int ret = 0; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + > + if (copy_from_user(&range, (struct fstrim_range *)arg, > + sizeof(range))) > + return -EFAULT; > + > + ret = ext3_trim_fs(sb, &range); > + if (ret < 0) > + return ret; > + > + if (copy_to_user((struct fstrim_range *)arg, &range, > + sizeof(range))) > + return -EFAULT; > + > + return 0; > + } > > default: > return -ENOTTY; > -- > 1.7.2.3 >
On Mon, 22 Nov 2010, Jan Kara wrote: > On Mon 22-11-10 12:29:18, Lukas Czerner wrote: > > It takes fstrim_range structure as an argument. fstrim_range is definec in > > the include/linux/fs.h. > > > > After the FITRIM is done, the number of actually discarded Bytes is stored > > in fstrim_range.len to give the user better insight on how much storage > > space has been really released for wear-leveling. > Umm, why do we have to do this when FITRIM is already handled in > fs/ioctl.c? I'd expect us to just provide .trim_fs ioctl, no? Hi, see upstream commits: 93bb41f4f8b89ac8b4d0a734bc59634cb0a29a89 e681c047e47c0abe67bf95857f23814372793cb0 unfortunately people was not happy with generic ioctl interface for that purpose, there were concerned that it is no common enough to be included in core vfs and there is no need for new super operation since each filesystem can easily setup its own ioctl handling. When I say people I need to clarify that it was mainly Christoph Hellwig who had objections against the former implementation and I must say that he had a point. -Lukas > > Honza > > > > Signed-off-by: Lukas Czerner <lczerner@redhat.com> > > --- > > fs/ext3/ioctl.c | 22 ++++++++++++++++++++++ > > 1 files changed, 22 insertions(+), 0 deletions(-) > > > > diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c > > index 8897481..fc080dd 100644 > > --- a/fs/ext3/ioctl.c > > +++ b/fs/ext3/ioctl.c > > @@ -276,7 +276,29 @@ group_add_out: > > mnt_drop_write(filp->f_path.mnt); > > return err; > > } > > + case FITRIM: { > > > > + struct super_block *sb = inode->i_sb; > > + struct fstrim_range range; > > + int ret = 0; > > + > > + if (!capable(CAP_SYS_ADMIN)) > > + return -EPERM; > > + > > + if (copy_from_user(&range, (struct fstrim_range *)arg, > > + sizeof(range))) > > + return -EFAULT; > > + > > + ret = ext3_trim_fs(sb, &range); > > + if (ret < 0) > > + return ret; > > + > > + if (copy_to_user((struct fstrim_range *)arg, &range, > > + sizeof(range))) > > + return -EFAULT; > > + > > + return 0; > > + } > > > > default: > > return -ENOTTY; > > -- > > 1.7.2.3 > > >
On Mon, 22 Nov 2010, Greg Freemyer wrote: > On Mon, Nov 22, 2010 at 6:29 AM, Lukas Czerner <lczerner@redhat.com> wrote: > > It takes fstrim_range structure as an argument. fstrim_range is definec in > > the include/linux/fs.h. > > > > After the FITRIM is done, the number of actually discarded Bytes is stored > > in fstrim_range.len to give the user better insight on how much storage > > space has been really released for wear-leveling. > > > > Signed-off-by: Lukas Czerner <lczerner@redhat.com> > > That's a misleading description in my opinion, from my understanding > this is more accurate: > > === > After the FITRIM is done, the number of bytes passed from the > filesystem down the block stack to the device for potential discard is > stored in fstrim_range.len. This number is a maximum discard amount > from the storage device's perspective, because FITRIM called repeated > will keep sending the same sectors for discard repeatedly. > fstrim_range.len will report the same potential discard bytes each > time, but only sectors which had been written to between the discards > would actually be discarded by the storage device. Further, the > kernel block layer reserves the right to adjust the discard ranges to > fit raid stripe geometry, non-trim capable devices in a LVM setup, > etc. These reductions would not be reflected in fstrim_range.len. > > As 2.6.37, the kernel block layer does not fully support discard and > as such will simply ignore all discard requests sent to volumes > created by device mapper or mdraid. This is done in a silent way, so > these failures to discard are also not reflected in fstrim_range.len. I think this is not entirely true, because we have discard support for dm linear and stripe: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5ae89a8720c28caf35c4e53711d77df2856c404e http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7b76ec11fec40203836b488496d2df082d5b2022 (Adding Mike Snitzer into cc) > > Thus fstrim_range.len can give the user better insight on how much > storage space has potentially been released for wear-leveling, but it > needs to be one of only one criteria the userspace tools take into > account when trying to optimize calls to FITRIM. > === > > Obviously, I'd like to also see that also in API documentation for > FITRIM. (And correct me if I'm wrong about device mapper / mdraid. > I'd love to be wrong about that statement..) > > Greg > Greg, thank you for this, aside the dm thing, this is definitely a lot better explanation and it would be nice to include this into commit message (it is too late for ext4:). Jan, do you want me to repost this with new commit message or you can add it yourself ? Thanks! -Lukas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c index 8897481..fc080dd 100644 --- a/fs/ext3/ioctl.c +++ b/fs/ext3/ioctl.c @@ -276,7 +276,29 @@ group_add_out: mnt_drop_write(filp->f_path.mnt); return err; } + case FITRIM: { + struct super_block *sb = inode->i_sb; + struct fstrim_range range; + int ret = 0; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&range, (struct fstrim_range *)arg, + sizeof(range))) + return -EFAULT; + + ret = ext3_trim_fs(sb, &range); + if (ret < 0) + return ret; + + if (copy_to_user((struct fstrim_range *)arg, &range, + sizeof(range))) + return -EFAULT; + + return 0; + } default: return -ENOTTY;
It takes fstrim_range structure as an argument. fstrim_range is definec in the include/linux/fs.h. After the FITRIM is done, the number of actually discarded Bytes is stored in fstrim_range.len to give the user better insight on how much storage space has been really released for wear-leveling. Signed-off-by: Lukas Czerner <lczerner@redhat.com> --- fs/ext3/ioctl.c | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-)