qemu-img: Implement 'diff' operation.

Message ID 1337262266-32227-1-git-send-email-rjones@redhat.com
State New

Commit Message

Richard W.M. Jones May 17, 2012, 1:44 p.m. UTC
From: "Richard W.M. Jones" <rjones@redhat.com>

This produces a qcow2 file which is the difference between
two disk images.  i.e., if:

  original.img - is a disk image (in any format)
  modified.img - is a modified version of original.img

then:

  qemu-img diff -b original.img modified.img diff.qcow2

creates 'diff.qcow2' which contains just the differences.  Note that
'diff.qcow2' has 'original.img' set as the backing file.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Cc: Matthew Booth <mbooth@redhat.com>
Cc: Pablo Iranzo Gómez <Pablo.Iranzo@redhat.com>
Cc: Tomas Von Veschler <tvvcox@redhat.com>
---
 qemu-img-cmds.hx |    6 +++
 qemu-img.c       |  150 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-img.texi    |   17 +++++++
 3 files changed, 173 insertions(+)

Comments

Peter Maydell May 17, 2012, 1:52 p.m. UTC | #1
On 17 May 2012 14:44, Richard W.M. Jones <rjones@redhat.com> wrote:
> From: "Richard W.M. Jones" <rjones@redhat.com>
>
> This produces a qcow2 file which is the difference between
> two disk images.  i.e., if:
>
>  original.img - is a disk image (in any format)
>  modified.img - is a modified version of original.img
>
> then:
>
>  qemu-img diff -b original.img modified.img diff.qcow2
>
> creates 'diff.qcow2' which contains just the differences.  Note that
> 'diff.qcow2' has 'original.img' set as the backing file.

Any chance of some more detailed explanation in the docs patch
about what this actually means and why it's useful? I spent
several minutes going "huh, does it even mean anything to
calculate the difference between two binary disk images?"
before realising that it's the presence of the backing file
that makes it actually make sense...

(maybe I'm just dense :-))

-- PMM

Eric Blake May 17, 2012, 1:57 p.m. UTC | #2
On 05/17/2012 07:44 AM, Richard W.M. Jones wrote:
> From: "Richard W.M. Jones" <rjones@redhat.com>
> 
> This produces a qcow2 file which is the difference between
> two disk images.  i.e., if:
> 
>   original.img - is a disk image (in any format)
>   modified.img - is a modified version of original.img
> 
> then:
> 
>   qemu-img diff -b original.img modified.img diff.qcow2
> 
> creates 'diff.qcow2' which contains just the differences.  Note that
> 'diff.qcow2' has 'original.img' set as the backing file.

Sounds useful!

>  
> +DEF("diff", img_diff,
> +    "diff [-f fmt] [-F backing_fmt] [-O output_fmt] -b backing_file filename output_filename")

Just so I'm clear: -f is for filename (the file with the modifications
being extracted), -F is for backing_file (the file that serves as the
base of the diff), and -O is for output_filename.

> +STEXI
> +@item rebase [-f @var{fmt}] [-F @var{backing_fmt}] [-O @var{output_fmt}] -b @var{backing_file} @var{filename} @var{output_filename}

s/rebase/diff/

We also need to support -o options for the output_filename, so that we
can expose other qcow2 attributes while creating the diff.  For example,
encryption, cluster_size, and preallocation all come to mind.


> +
> +    bdrv_get_geometry(bs_original, &num_sectors);
> +    bdrv_get_geometry(bs_modified, &modified_num_sectors);
> +    if (num_sectors != modified_num_sectors) {
> +        error_report("Number of sectors in backing and source must be the same");
> +        goto out2;
> +    }

Why are you requiring equality?  I can see the usefulness of doing a
diff where the modified file is larger than the original (basically, the
diff was created by extending the original file to something larger).
Prohibiting a modified file smaller than the original makes sense, so I
think the check should be num_sectors > modified_num_sectors, not !=.
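
A minimal sketch of the relaxed check suggested here, kept separate
from the patch itself. The helper names diff_sizes_ok and
diff_output_bytes, and the SECTOR_SIZE constant, are hypothetical and
exist only for illustration; SECTOR_SIZE is assumed to match
BDRV_SECTOR_SIZE (512 bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumption: same value as BDRV_SECTOR_SIZE */

/* Hypothetical helper: accept a modified image that is the same size as,
 * or larger than, the backing image; reject one that is smaller. */
static bool diff_sizes_ok(uint64_t backing_sectors, uint64_t modified_sectors)
{
    return backing_sectors <= modified_sectors;
}

/* Hypothetical helper: the output's virtual size follows the modified
 * image, so a guest disk that was grown is not truncated. */
static uint64_t diff_output_bytes(uint64_t modified_sectors)
{
    /* Guard against overflow when converting sectors to bytes. */
    assert(modified_sectors <= UINT64_MAX / SECTOR_SIZE);
    return modified_sectors * SECTOR_SIZE;
}
```

With this shape, the error path would be taken only when
diff_sizes_ok() returns false, that is, when the backing image has more
sectors than the modified one.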

> +
> +    /* Output image. */
> +    if (fmt_out == NULL || fmt_out[0] == '\0') {
> +        fmt_out = "qcow2";
> +    }
> +    ret = bdrv_img_create(out, fmt_out,
> +                          /* original file becomes the new backing file */
> +                          original, fmt_original,
> +                          NULL, num_sectors * BDRV_SECTOR_SIZE, BDRV_O_FLAGS);

If you allow a modified larger than backing, then this should be
modified_num_sectors, not num_sectors.


> +++ b/qemu-img.texi
> @@ -114,6 +114,23 @@ created as a copy on write image of the specified base image; the
>  @var{backing_file} should have the same content as the input's base image,
>  however the path, image format, etc may differ.
>  
> +@item diff [-f @var{fmt}] [-F @var{backing_fmt}] [-O @var{output_fmt}] -b @var{backing_file} @var{filename} @var{output_filename}
> +
> +Create a new file (@var{output_filename}) which contains the
> +differences between @var{backing_file} and @var{filename}.
> +
> +The @var{backing_file} and @var{filename} must have the same
> +virtual disk size, but may be in different formats.

Again, I think this is overly tight.

> +
> +@var{output_file} will have @var{backing_file} set as its backing
> +file.  The format of @var{output_file} must be one that supports
> +backing files (currently @code{qcow2} is the default and only
> +permitted output format).

Why doesn't qed just work out of the box?

Eric Blake May 17, 2012, 1:58 p.m. UTC | #3
On 05/17/2012 07:52 AM, Peter Maydell wrote:
> On 17 May 2012 14:44, Richard W.M. Jones <rjones@redhat.com> wrote:
>> From: "Richard W.M. Jones" <rjones@redhat.com>
>>
>> This produces a qcow2 file which is the difference between
>> two disk images.  i.e., if:
>>
>>  original.img - is a disk image (in any format)
>>  modified.img - is a modified version of original.img
>>
>> then:
>>
>>  qemu-img diff -b original.img modified.img diff.qcow2
>>
>> creates 'diff.qcow2' which contains just the differences.  Note that
>> 'diff.qcow2' has 'original.img' set as the backing file.
> 
> Any chance of some more detailed explanation in the docs patch
> about what this actually means and why it's useful? I spent
> several minutes going "huh, does it even mean anything to
> calculate the difference between two binary disk images?"
> before realising that it's the presence of the backing file
> that makes it actually make sense...

Even something as simple as:

Useful for converting a monolithic image back into a thin image on top
of a common base.

Richard W.M. Jones May 17, 2012, 2:01 p.m. UTC | #4
On Thu, May 17, 2012 at 02:52:56PM +0100, Peter Maydell wrote:
> On 17 May 2012 14:44, Richard W.M. Jones <rjones@redhat.com> wrote:
> > From: "Richard W.M. Jones" <rjones@redhat.com>
> >
> > This produces a qcow2 file which is the difference between
> > two disk images.  i.e., if:
> >
> >  original.img - is a disk image (in any format)
> >  modified.img - is a modified version of original.img
> >
> > then:
> >
> >  qemu-img diff -b original.img modified.img diff.qcow2
> >
> > creates 'diff.qcow2' which contains just the differences.  Note that
> > 'diff.qcow2' has 'original.img' set as the backing file.
> 
> Any chance of some more detailed explanation in the docs patch
> about what this actually means and why it's useful? I spent
> several minutes going "huh, does it even mean anything to
> calculate the difference between two binary disk images?"
> before realising that it's the presence of the backing file
> that makes it actually make sense...

Well I'll say first of all that I was asked to implement this by a
colleague.  Personally, I'm far more organized than this, and I always
use snapshots and backing files if I want to create an efficient COW
from a base template :-)

However my colleague has got himself into a situation where he has
copied (ie. "cp" or equivalent) a guest several times from a template,
and these guests have been running independently.  He now wants to
conserve disk space by turning this situation back into one where he
has one backing file + several COW copies.

To do this, he can (with this patch) do:

  qemu-img diff -b base.img the_copied_guest.img guest.qcow2
  rm the_copied_guest.img

'guest.qcow2' will (in theory at least) be much smaller than the
copied guests he has right now.
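
Why the output can be smaller is easiest to see with a simplified,
in-memory model of the diff pass. This is only a sketch, not the code
in the patch: the function diff_sectors, the `allocated` map and the
SECTOR_SIZE constant are hypothetical names, both "images" are plain
byte buffers, and comparison is done one sector at a time. The patch
itself compares runs of sectors with compare_sectors() and writes them
with bdrv_write(), and a real qcow2 file allocates space per cluster
rather than per sector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512 /* assumption: same value as BDRV_SECTOR_SIZE */

/*
 * Hypothetical in-memory model: for every sector where `modified` differs
 * from `original`, copy the modified data into `out` and flag the sector in
 * `allocated`.  Unflagged sectors would be served by the backing file.
 * Returns the number of sectors that had to be stored.
 */
static size_t diff_sectors(const uint8_t *original, const uint8_t *modified,
                           size_t nb_sectors, uint8_t *out,
                           uint8_t *allocated)
{
    size_t stored = 0;

    assert(original && modified && out && allocated);

    for (size_t i = 0; i < nb_sectors; i++) {
        const size_t off = i * SECTOR_SIZE;

        if (memcmp(original + off, modified + off, SECTOR_SIZE) != 0) {
            memcpy(out + off, modified + off, SECTOR_SIZE);
            allocated[i] = 1;
            stored++;
        } else {
            allocated[i] = 0;
        }
    }
    return stored;
}
```

If a copied guest differs from the template in only a small fraction
of its sectors, only that fraction is stored; everything else is read
through to the backing file.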

Does that make sense?

[BTW I'm still working on this.  There are a few spelling mistakes and
it needs a lot more testing.  This patch is just for comment at the
moment.]

Rich.

Richard W.M. Jones May 17, 2012, 2:58 p.m. UTC | #5
On Thu, May 17, 2012 at 07:57:31AM -0600, Eric Blake wrote:
[...]

I just posted a v2 patch which fixes everything you mentioned except
the case of resizing the disk, which I need to think about a bit more.

Rich.

Patch

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 49dce7c..01a9246 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -33,6 +33,12 @@  STEXI
 @item convert [-c] [-p] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
+DEF("diff", img_diff,
+    "diff [-f fmt] [-F backing_fmt] [-O output_fmt] -b backing_file filename output_filename")
+STEXI
+@item rebase [-f @var{fmt}] [-F @var{backing_fmt}] [-O @var{output_fmt}] -b @var{backing_file} @var{filename} @var{output_filename}
+ETEXI
+
 DEF("info", img_info,
     "info [-f fmt] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index c8a70ff..6e3fe2a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1533,6 +1533,156 @@  out:
     return 0;
 }
 
+static int img_diff(int argc, char **argv)
+{
+    /* qemu-img diff -b original modified out */
+    BlockDriverState *bs_original, *bs_modified, *bs_out;
+    const char *fmt_original, *original,
+        *fmt_modified, *modified,
+        *fmt_out, *out;
+    int c, ret = 0;
+    uint64_t num_sectors, modified_num_sectors;
+    uint64_t sector;
+    int n;
+    uint8_t *buf_original;
+    uint8_t *buf_modified;
+
+    /* Parse commandline parameters */
+    fmt_original = NULL;
+    fmt_modified = NULL;
+    fmt_out = NULL;
+    original = NULL;
+    for(;;) {
+        c = getopt(argc, argv, "hf:F:b:O:");
+        if (c == -1) {
+            break;
+        }
+        switch(c) {
+        case '?':
+        case 'h':
+            help();
+            return 0;
+        case 'f':
+            fmt_modified = optarg;
+            break;
+        case 'F':
+            fmt_original = optarg;
+            break;
+        case 'b':
+            original = optarg;
+            break;
+        case 'O':
+            fmt_out = optarg;
+            break;
+        }
+    }
+
+    if (original == NULL) {
+        error_report("The -b (backing filename) option must be supplied");
+        return 1;
+    }
+
+    if (argc - optind != 2) {
+        error_report("The input and output filenames must be supplied");
+        return 1;
+    }
+    modified = argv[optind++];
+    out = argv[optind++];
+
+    /* Open the input images. */
+    bs_original = bdrv_new_open(original, fmt_original, BDRV_O_FLAGS);
+    if (!bs_original) {
+        return 1;
+    }
+
+    bs_modified = bdrv_new_open(modified, fmt_modified, BDRV_O_FLAGS);
+    if (!bs_modified) {
+        return 1;
+    }
+
+    bdrv_get_geometry(bs_original, &num_sectors);
+    bdrv_get_geometry(bs_modified, &modified_num_sectors);
+    if (num_sectors != modified_num_sectors) {
+        error_report("Number of sectors in backing and source must be the same");
+        goto out2;
+    }
+
+    /* Output image. */
+    if (fmt_out == NULL || fmt_out[0] == '\0') {
+        fmt_out = "qcow2";
+    }
+    ret = bdrv_img_create(out, fmt_out,
+                          /* original file becomes the new backing file */
+                          original, fmt_original,
+                          NULL, num_sectors * BDRV_SECTOR_SIZE, BDRV_O_FLAGS);
+    if (ret != 0) {
+        goto out2;
+    }
+    bs_out = bdrv_new_open(out, fmt_out, BDRV_O_RDWR);
+
+    buf_original = qemu_blockalign(bs_original, IO_BUF_SIZE);
+    buf_modified = qemu_blockalign(bs_modified, IO_BUF_SIZE);
+
+    for (sector = 0; sector < num_sectors; sector += n) {
+        /* How many sectors can we handle with the next read? */
+        if (sector + (IO_BUF_SIZE / BDRV_SECTOR_SIZE) <= num_sectors) {
+            n = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
+        } else {
+            n = num_sectors - sector;
+        }
+
+        /* Read input files and compare. */
+        ret = bdrv_read(bs_original, sector, buf_original, n);
+        if (ret < 0) {
+            error_report("error while reading from backing file");
+            goto out;
+        }
+
+        ret = bdrv_read(bs_modified, sector, buf_modified, n);
+        if (ret < 0) {
+            error_report("error while reading from input file");
+            goto out;
+        }
+
+        /* If they differ, we need to write to the differences file. */
+        uint64_t written = 0;
+
+        while (written < n) {
+            int pnum;
+
+            if (compare_sectors(buf_original + written * BDRV_SECTOR_SIZE,
+                                buf_modified + written * BDRV_SECTOR_SIZE,
+                                n - written, &pnum)) {
+                ret = bdrv_write(bs_out, sector + written,
+                                 buf_modified + written * BDRV_SECTOR_SIZE,
+                                 pnum);
+                if (ret < 0) {
+                    error_report("Error while writing to output file: %s",
+                                 strerror(-ret));
+                    goto out;
+                }
+            }
+
+            written += pnum;
+        }
+    }
+
+    qemu_vfree(buf_original);
+    qemu_vfree(buf_modified);
+
+ out:
+    /* Cleanup */
+    bdrv_delete(bs_out);
+ out2:
+    bdrv_delete(bs_original);
+    bdrv_delete(bs_modified);
+
+    if (ret) {
+        return 1;
+    }
+    return 0;
+}
+
 static int img_resize(int argc, char **argv)
 {
     int c, ret, relative;
diff --git a/qemu-img.texi b/qemu-img.texi
index b2ca3a5..e1a123b 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -114,6 +114,23 @@  created as a copy on write image of the specified base image; the
 @var{backing_file} should have the same content as the input's base image,
 however the path, image format, etc may differ.
 
+@item diff [-f @var{fmt}] [-F @var{backing_fmt}] [-O @var{output_fmt}] -b @var{backing_file} @var{filename} @var{output_filename}
+
+Create a new file (@var{output_filename}) which contains the
+differences between @var{backing_file} and @var{filename}.
+
+The @var{backing_file} and @var{filename} must have the same
+virtual disk size, but may be in different formats.
+
+@var{output_file} will have @var{backing_file} set as its backing
+file.  The format of @var{output_file} must be one that supports
+backing files (currently @code{qcow2} is the default and only
+permitted output format).
+
+Typical usage is:
+
+@code{qemu-img diff -b original.img modified.img diff.qcow2}
+
 @item info [-f @var{fmt}] @var{filename}
 
 Give information about the disk image @var{filename}. Use it in