[2/2] qemu-ga: add guest-fstrim command

Message ID 1340319857-28198-3-git-send-email-mdroth@linux.vnet.ibm.com
State New

Commit Message

Michael Roth June 21, 2012, 11:04 p.m. UTC
From: Paolo Bonzini <pbonzini@redhat.com>

FITRIM is a mounted filesystem feature to discard (or "trim") blocks which
are not in use by the filesystem. This is useful for solid-state drives
(SSDs) and thinly-provisioned storage.  Provide access to the feature
from the host so that filesystems can be trimmed periodically or before
migration.

Here is an example using scsi_debug:

    # modprobe scsi_debug lbpu=1 lbpws=1
    # sg_vpd -p0xb2 /dev/sdb
    Logical block provisioning VPD page (SBC):
      Unmap command supported (LBPU): 1
      Write same (16) with unmap bit supported (LBWS): 1
      Write same (10) with unmap bit supported (LBWS10): 0
    # mke2fs /dev/sdb
    # cat /sys/bus/pseudo/drivers/scsi_debug/map
    1-616,16257-16383
    # mount /dev/sdb /run/media/pbonzini/test
    # dd if=/dev/zero of=/run/media/pbonzini/test/file
    # cat map
    1-616,645-1588,1599-4026,4029-16383
    # rm /run/media/pbonzini/test/file
    # ./qemu-ga /dev/fd/0
    {"execute":"guest-fstrim"}
    {"return": {}}
    # cat map
    1-612

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 qapi-schema-guest.json |   20 ++++++++++++
 qga/commands-posix.c   |   78 ++++++++++++++++++++++++++++++++++++++++++++++--
 qga/commands-win32.c   |   11 +++++++
 3 files changed, 106 insertions(+), 3 deletions(-)
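
For reference, the example session above sends guest-fstrim with no
arguments. The schema in this patch also accepts an optional "minimum",
which would travel in the usual QMP "arguments" object. A minimal sketch
of building such a request on the host side follows; the helper name
format_fstrim_request is hypothetical and is not part of this patch or of
QEMU.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of QEMU): write a guest-fstrim request
 * into buf.  When has_minimum is zero the optional argument is omitted,
 * which per the schema means "discard every free block".
 * Returns 0 on success, -1 if buf is too small or formatting failed.
 */
int format_fstrim_request(char *buf, size_t len, int has_minimum,
                          int64_t minimum)
{
    int n;

    if (has_minimum) {
        n = snprintf(buf, len,
                     "{\"execute\":\"guest-fstrim\","
                     "\"arguments\":{\"minimum\":%" PRId64 "}}",
                     minimum);
    } else {
        n = snprintf(buf, len, "{\"execute\":\"guest-fstrim\"}");
    }

    /* snprintf returns the length it wanted; >= len means truncation */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string can be written to whatever channel qemu-ga is
listening on, as in the /dev/fd/0 session shown in the commit message.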

Comments

Chris Wedgwood June 22, 2012, 5:48 p.m. UTC | #1
> FITRIM is a mounted filesystem feature to discard (or "trim") blocks which
> are not in use by the filesystem. This is useful for solid-state drives
> (SSDs) and thinly-provisioned storage.  Provide access to the feature
> from the host so that filesystems can be trimmed periodically or before
> migration.

Why can't we use the block layer for this?   AHCI (I think) already
has support, others could be added with some coordination.

That way existing operating systems with current filesystems will DTRT
when needed.

Michael Roth June 22, 2012, 9:12 p.m. UTC | #2
On Fri, Jun 22, 2012 at 10:48:56AM -0700, Chris Wedgwood wrote:
> > FITRIM is a mounted filesystem feature to discard (or "trim") blocks which
> > are not in use by the filesystem. This is useful for solid-state drives
> > (SSDs) and thinly-provisioned storage.  Provide access to the feature
> > from the host so that filesystems can be trimmed periodically or before
> > migration.
> 
> Why can't we use the block layer for this?   AHCI (I think) already
> has support, others could be added with some coordination.

I'm not sure I understand, wouldn't the filesystem need to be involved
at some level? How can the block layer differentiate lazily discarded data
blocks from ones that are still in use without the aid of the
filesystem?

> 
> That way existing operating systems with current filesystems will DTRT
> when needed.
>

Chris Wedgwood June 23, 2012, 3:38 a.m. UTC | #3
> I'm not sure I understand, wouldn't the filesystem need to be involved
> at some level? How can the block layer differentiate lazily discarded data
> blocks from ones that are still in use without the aid of the
> filesystem?

It might be me that doesn't understand.

Yes, the filesystem is involved.  Current Linux filesystems can trim
on demand or using fstrim.  That mechanism, if exposed, seems like it
should suffice in the most common cases.

Michael Roth June 23, 2012, 8:54 p.m. UTC | #4
On Fri, Jun 22, 2012 at 08:38:51PM -0700, Chris Wedgwood wrote:
> > I'm not sure I understand, wouldn't the filesystem need to be involved
> > at some level? How can the block layer differentiate lazily discarded data
> > blocks from ones that are still in use without the aid of the
> > filesystem?
> 
> It might be me that doesn't understand.
> 
> Yes, the filesystem is involved.  Current Linux filesystems can trim
> on demand or using fstrim.  That mechanism, if exposed, seems like it
> should suffice in the most common cases.
> 

You mean the "discard" mount option? I don't think that's generally enabled by
default due to the performance impact on bare metal. A periodic,
filesystem-wide call (like the FITRIM ioctl) is probably the better approach in
those cases as well.

I'm not sure what the penalty of auto-discard would be in the case of VMs, but
there would be at least some additional overhead there due to the hole punching
operations run by the host, and the only use case I can think of where it's
useful in that context is for reducing the size of the images before we do some
work with them on the host, in which case an explicit FITRIM ioctl beforehand
makes the most sense IMO.

Christoph Hellwig June 24, 2012, 4:33 p.m. UTC | #5
On Fri, Jun 22, 2012 at 10:48:56AM -0700, Chris Wedgwood wrote:
> > FITRIM is a mounted filesystem feature to discard (or "trim") blocks which
> > are not in use by the filesystem. This is useful for solid-state drives
> > (SSDs) and thinly-provisioned storage.  Provide access to the feature
> > from the host so that filesystems can be trimmed periodically or before
> > migration.
> 
> Why can't we use the block layer for this?   AHCI (I think) already
> has support, others could be added with some coordination.
> 
> That way existing operating systems with current filesystems will DTRT
> when needed.

This command calls the FITRIM ioctl in the guest, which tells the
filesystems to issue a discard for all currently unused space.  At that
point it enters the block layer.
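
To make that concrete, here is a minimal sketch of the guest-side call
for a single mount point, assuming a Linux guest whose headers define
FITRIM. The function name trim_mount and its return convention are made
up for illustration; the patch below does the equivalent inside
qmp_guest_fstrim() for every entry in the mount list.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */

/*
 * Hypothetical helper: ask the filesystem mounted at dir to discard its
 * free space.  Returns 0 on success, 1 if the filesystem does not
 * support trimming (ENOTTY or EOPNOTSUPP, which the patch also ignores),
 * and -1 on any other error with errno set.  On success, *trimmed (if
 * not NULL) receives the byte count the kernel reports as trimmed.
 */
int trim_mount(const char *dir, uint64_t minlen, uint64_t *trimmed)
{
    struct fstrim_range r = {
        .start = 0,
        .len = UINT64_MAX,   /* cover the whole filesystem */
        .minlen = minlen,    /* skip free extents smaller than this */
    };
    int fd, ret, saved_errno;

    fd = open(dir, O_RDONLY);
    if (fd == -1) {
        return -1;
    }

    ret = ioctl(fd, FITRIM, &r);
    saved_errno = errno;     /* close() may clobber errno */
    close(fd);

    if (ret == -1) {
        if (saved_errno == ENOTTY || saved_errno == EOPNOTSUPP) {
            return 1;
        }
        errno = saved_errno;
        return -1;
    }

    if (trimmed) {
        *trimmed = r.len;    /* kernel writes back the trimmed length */
    }
    return 0;
}
```

FITRIM requires CAP_SYS_ADMIN in the guest, so an unprivileged caller
will get -1 here even on a filesystem that supports trimming.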

Patch

diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json
index d4055d2..d955cf1 100644
--- a/qapi-schema-guest.json
+++ b/qapi-schema-guest.json
@@ -351,6 +351,26 @@ 
   'returns': 'int' }
 
 ##
+# @guest-fstrim:
+#
+# Discard (or "trim") blocks which are not in use by the filesystem.
+#
+# @minimum:
+#       Minimum contiguous free range to discard, in bytes. Free ranges
+#       smaller than this may be ignored (this is a hint and the guest
+#       may not respect it).  By increasing this value, the fstrim
+#       operation will complete more quickly for filesystems with badly
+#       fragmented free space, although not all blocks will be discarded.
+#       The default value is zero, meaning "discard every free block".
+#
+# Returns: Nothing.
+#
+# Since: 1.2
+##
+{ 'command': 'guest-fstrim',
+  'data': { '*minimum': 'int' } }
+
+##
 # @guest-suspend-disk
 #
 # Suspend guest to disk.
diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index b1a7ce6..ce90421 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -38,9 +38,12 @@  extern char **environ;
 #include <sys/socket.h>
 #include <net/if.h>
 
-#if defined(__linux__) && defined(FIFREEZE)
+#ifdef FIFREEZE
 #define CONFIG_FSFREEZE
 #endif
+#ifdef FITRIM
+#define CONFIG_FSTRIM
+#endif
 #endif
 
 void qmp_guest_shutdown(bool has_mode, const char *mode, Error **err)
@@ -312,8 +315,7 @@  static void guest_file_init(void)
 /* linux-specific implementations. avoid this if at all possible. */
 #if defined(__linux__)
 
-#if defined(CONFIG_FSFREEZE)
-
+#if defined(CONFIG_FSFREEZE) || defined(CONFIG_FSTRIM)
 typedef struct FsMount {
     char *dirname;
     char *devtype;
@@ -378,6 +380,9 @@  static int build_fs_mount_list(FsMountList *mounts)
 
     return 0;
 }
+#endif
+
+#if defined(CONFIG_FSFREEZE)
 
 /*
  * Return status of freeze/thaw
@@ -525,6 +530,65 @@  static void guest_fsfreeze_cleanup(void)
 }
 #endif /* CONFIG_FSFREEZE */
 
+#if defined(CONFIG_FSTRIM)
+/*
+ * Walk list of mounted file systems in the guest, and trim them.
+ */
+void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err)
+{
+    int ret = 0;
+    FsMountList mounts;
+    struct FsMount *mount;
+    int fd;
+    char err_msg[512];
+    struct fstrim_range r = {
+        .start = 0,
+        .len = -1,
+        .minlen = has_minimum ? minimum : 0,
+    };
+
+    slog("guest-fstrim called");
+
+    QTAILQ_INIT(&mounts);
+    ret = build_fs_mount_list(&mounts);
+    if (ret < 0) {
+        return;
+    }
+
+    QTAILQ_FOREACH(mount, &mounts, next) {
+        fd = qemu_open(mount->dirname, O_RDONLY);
+        if (fd == -1) {
+            snprintf(err_msg, sizeof(err_msg), "failed to open %s, %s",
+                     mount->dirname, strerror(errno));
+            error_set(err, QERR_QGA_COMMAND_FAILED, err_msg);
+            goto error;
+        }
+
+        /* We try to cull filesystems we know won't work in advance, but other
+         * filesystems may not implement fstrim for less obvious reasons.  These
+         * will report EOPNOTSUPP; we simply ignore these errors.  Any other
+         * error means an unexpected error, so return it in those cases.  In
+         * some other cases ENOTTY will be reported (e.g. CD-ROMs).
+         */
+        ret = ioctl(fd, FITRIM, &r);
+        if (ret == -1) {
+            if (errno != ENOTTY && errno != EOPNOTSUPP) {
+                snprintf(err_msg, sizeof(err_msg), "failed to trim %s, %s",
+                        mount->dirname, strerror(errno));
+                error_set(err, QERR_QGA_COMMAND_FAILED, err_msg);
+                close(fd);
+                goto error;
+            }
+        }
+        close(fd);
+    }
+
+error:
+    free_fs_mount_list(&mounts);
+}
+#endif /* CONFIG_FSTRIM */
+
+
 #define LINUX_SYS_STATE_FILE "/sys/power/state"
 #define SUSPEND_SUPPORTED 0
 #define SUSPEND_NOT_SUPPORTED 1
@@ -918,7 +982,15 @@  int64_t qmp_guest_fsfreeze_thaw(Error **err)
 
     return 0;
 }
+#endif /* CONFIG_FSFREEZE */
+
+#if !defined(CONFIG_FSTRIM)
+void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err)
+{
+    error_set(err, QERR_UNSUPPORTED);
 
+    return;
+}
 #endif
 
 /* register init/cleanup routines for stateful command groups */
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index eb8d140..54bc546 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -173,6 +173,17 @@  int64_t qmp_guest_fsfreeze_thaw(Error **err)
     return 0;
 }
 
+/*
+ * Walk list of mounted file systems in the guest, and discard unused
+ * areas.
+ */
+void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err)
+{
+    error_set(err, QERR_UNSUPPORTED);
+
+    return;
+}
+
 typedef enum {
     GUEST_SUSPEND_MODE_DISK,
     GUEST_SUSPEND_MODE_RAM