Message ID | 1340319857-28198-3-git-send-email-mdroth@linux.vnet.ibm.com |
---|---|
State | New |
> FITRIM is a mounted filesystem feature to discard (or "trim") blocks which
> are not in use by the filesystem. This is useful for solid-state drives
> (SSDs) and thinly-provisioned storage. Provide access to the feature
> from the host so that filesystems can be trimmed periodically or before
> migration.

Why can't we use the block layer for this? AHCI (I think) already has support, others could be added with some coordination.

That way existing operating systems with current filesystems will DTRT when needed.
On Fri, Jun 22, 2012 at 10:48:56AM -0700, Chris Wedgwood wrote:
> > FITRIM is a mounted filesystem feature to discard (or "trim") blocks which
> > are not in use by the filesystem. This is useful for solid-state drives
> > (SSDs) and thinly-provisioned storage. Provide access to the feature
> > from the host so that filesystems can be trimmed periodically or before
> > migration.
>
> Why can't we use the block layer for this? AHCI (I think) already
> has support, others could be added with some coordination.

I'm not sure I understand; wouldn't the filesystem need to be involved at some level? How can the block layer differentiate lazily discarded data blocks from ones that are still in use without the aid of the filesystem?

>
> That way existing operating systems with current filesystems will DTRT
> when needed.
>
> I'm not sure I understand, wouldn't the filesystem need to be involved
> at some level? How can the block layer differentiate lazily discarded data
> blocks from ones that are still in use without the aid of the
> filesystem?

It might be me that doesn't understand.

Yes, the filesystem is involved. Current Linux filesystems can trim on demand or using fstrim. That mechanism seems to me like it should suffice, if exposed, in the most common cases.
On Fri, Jun 22, 2012 at 08:38:51PM -0700, Chris Wedgwood wrote:
> > I'm not sure I understand, wouldn't the filesystem need to be involved
> > at some level? How can the block layer differentiate lazily discarded data
> > blocks from ones that are still in use without the aid of the
> > filesystem?
>
> It might be me that doesn't understand.
>
> Yes, the filesystem is involved. Current Linux filesystems can trim
> on demand or using fstrim. That mechanism seems to me like it should
> suffice, if exposed, in the most common cases.
>

You mean the "discard" mount option? I don't think that's generally enabled by default due to the performance impact on bare metal. A periodic, filesystem-wide call (like the FITRIM ioctl) is probably the better approach in those cases as well.

I'm not sure what the penalty of auto-discard would be in the case of VMs, but there would be at least some additional overhead there due to the hole-punching operations run by the host. The only use case I can think of where it's useful in that context is reducing the size of the images before we do some work with them on the host, in which case an explicit FITRIM ioctl beforehand makes the most sense IMO.
On Fri, Jun 22, 2012 at 10:48:56AM -0700, Chris Wedgwood wrote:
> > FITRIM is a mounted filesystem feature to discard (or "trim") blocks which
> > are not in use by the filesystem. This is useful for solid-state drives
> > (SSDs) and thinly-provisioned storage. Provide access to the feature
> > from the host so that filesystems can be trimmed periodically or before
> > migration.
>
> Why can't we use the block layer for this? AHCI (I think) already
> has support, others could be added with some coordination.
>
> That way existing operating systems with current filesystems will DTRT
> when needed.

This command calls the FITRIM ioctl in the guest, which tells the filesystems to issue a discard for all currently unused space. At that point it enters the block layer.
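For readers who want to see the guest-side call in isolation, here is a minimal sketch of trimming a single mounted filesystem with the FITRIM ioctl. It is not code from the patch: the helper names whole_fs_range(), trim_unsupported() and trim_mount_point() are hypothetical, plain open() stands in for the agent's qemu_open(), and it assumes a Linux guest whose headers define FITRIM. The ioctl itself typically needs root privileges.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */

/* Build a request covering the whole filesystem; min_len is only a hint. */
static struct fstrim_range whole_fs_range(uint64_t min_len)
{
    struct fstrim_range r;

    memset(&r, 0, sizeof(r));
    r.start = 0;
    r.len = UINT64_MAX;     /* up to the end of the filesystem */
    r.minlen = min_len;     /* free ranges smaller than this may be skipped */
    return r;
}

/* ENOTTY and EOPNOTSUPP mean "FITRIM is not available here", not a failure. */
static int trim_unsupported(int err)
{
    return err == ENOTTY || err == EOPNOTSUPP;
}

/*
 * Trim one mounted filesystem (hypothetical helper).
 * Returns 0 on success, 1 if the filesystem does not support FITRIM,
 * and -1 on any other error, with errno preserved for the caller.
 */
static int trim_mount_point(const char *mount_dir, uint64_t min_len)
{
    struct fstrim_range r = whole_fs_range(min_len);
    int saved_errno;
    int fd = open(mount_dir, O_RDONLY);

    if (fd == -1) {
        return -1;
    }
    if (ioctl(fd, FITRIM, &r) == -1) {
        saved_errno = errno;
        close(fd);          /* close() may clobber errno, so restore it */
        errno = saved_errno;
        return trim_unsupported(saved_errno) ? 1 : -1;
    }
    close(fd);
    return 0;
}
```

A caller would walk the mount list and invoke trim_mount_point() once per entry, treating a return value of 1 as "skip this filesystem" rather than as an error.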
diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json
index d4055d2..d955cf1 100644
--- a/qapi-schema-guest.json
+++ b/qapi-schema-guest.json
@@ -351,6 +351,26 @@
   'returns': 'int' }
 
 ##
+# @guest-fstrim:
+#
+# Discard (or "trim") blocks which are not in use by the filesystem.
+#
+# @minimum:
+#       Minimum contiguous free range to discard, in bytes. Free ranges
+#       smaller than this may be ignored (this is a hint and the guest
+#       may not respect it). By increasing this value, the fstrim
+#       operation will complete more quickly for filesystems with badly
+#       fragmented free space, although not all blocks will be discarded.
+#       The default value is zero, meaning "discard every free block".
+#
+# Returns: Nothing.
+#
+# Since: 1.2
+##
+{ 'command': 'guest-fstrim',
+  'data': { '*minimum': 'int' } }
+
+##
 # @guest-suspend-disk
 #
 # Suspend guest to disk.
diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index b1a7ce6..ce90421 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -38,9 +38,12 @@ extern char **environ;
 #include <sys/socket.h>
 #include <net/if.h>
 
-#if defined(__linux__) && defined(FIFREEZE)
+#ifdef FIFREEZE
 #define CONFIG_FSFREEZE
 #endif
+#ifdef FITRIM
+#define CONFIG_FSTRIM
+#endif
 #endif
 
 void qmp_guest_shutdown(bool has_mode, const char *mode, Error **err)
@@ -312,8 +315,7 @@ static void guest_file_init(void)
 /* linux-specific implementations. avoid this if at all possible. */
 #if defined(__linux__)
 
-#if defined(CONFIG_FSFREEZE)
-
+#if defined(CONFIG_FSFREEZE) || defined(CONFIG_FSTRIM)
 typedef struct FsMount {
     char *dirname;
     char *devtype;
@@ -378,6 +380,9 @@ static int build_fs_mount_list(FsMountList *mounts)
 
     return 0;
 }
+#endif
+
+#if defined(CONFIG_FSFREEZE)
 
 /*
  * Return status of freeze/thaw
@@ -525,6 +530,65 @@ static void guest_fsfreeze_cleanup(void)
 }
 #endif /* CONFIG_FSFREEZE */
 
+#if defined(CONFIG_FSTRIM)
+/*
+ * Walk list of mounted file systems in the guest, and trim them.
+ */
+void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err)
+{
+    int ret = 0;
+    FsMountList mounts;
+    struct FsMount *mount;
+    int fd;
+    char err_msg[512];
+    struct fstrim_range r = {
+        .start = 0,
+        .len = -1,
+        .minlen = has_minimum ? minimum : 0,
+    };
+
+    slog("guest-fstrim called");
+
+    QTAILQ_INIT(&mounts);
+    ret = build_fs_mount_list(&mounts);
+    if (ret < 0) {
+        return;
+    }
+
+    QTAILQ_FOREACH(mount, &mounts, next) {
+        fd = qemu_open(mount->dirname, O_RDONLY);
+        if (fd == -1) {
+            sprintf(err_msg, "failed to open %s, %s", mount->dirname,
+                    strerror(errno));
+            error_set(err, QERR_QGA_COMMAND_FAILED, err_msg);
+            goto error;
+        }
+
+        /* We try to cull filesystems we know won't work in advance, but other
+         * filesystems may not implement fstrim for less obvious reasons. These
+         * will report EOPNOTSUPP; we simply ignore these errors. Any other
+         * error means an unexpected error, so return it in those cases. In
+         * some other cases ENOTTY will be reported (e.g. CD-ROMs).
+         */
+        ret = ioctl(fd, FITRIM, &r);
+        if (ret == -1) {
+            if (errno != ENOTTY && errno != EOPNOTSUPP) {
+                sprintf(err_msg, "failed to trim %s, %s",
+                        mount->dirname, strerror(errno));
+                error_set(err, QERR_QGA_COMMAND_FAILED, err_msg);
+                close(fd);
+                goto error;
+            }
+        }
+        close(fd);
+    }
+
+error:
+    free_fs_mount_list(&mounts);
+}
+#endif /* CONFIG_FSTRIM */
+
+
 #define LINUX_SYS_STATE_FILE "/sys/power/state"
 #define SUSPEND_SUPPORTED 0
 #define SUSPEND_NOT_SUPPORTED 1
@@ -918,7 +982,15 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
 
     return 0;
 }
+#endif /* CONFIG_FSFREEZE */
+
+#if !defined(CONFIG_FSTRIM)
+void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err)
+{
+    error_set(err, QERR_UNSUPPORTED);
 
+    return;
+}
 #endif
 
 /* register init/cleanup routines for stateful command groups */
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index eb8d140..54bc546 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -173,6 +173,17 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
     return 0;
 }
 
+/*
+ * Walk list of mounted file systems in the guest, and discard unused
+ * areas.
+ */
+void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err)
+{
+    error_set(err, QERR_UNSUPPORTED);
+
+    return;
+}
+
 typedef enum {
     GUEST_SUSPEND_MODE_DISK,
     GUEST_SUSPEND_MODE_RAM