Message ID | 1337262266-32227-1-git-send-email-rjones@redhat.com |
---|---|
State | New |
On 17 May 2012 14:44, Richard W.M. Jones <rjones@redhat.com> wrote:
> From: "Richard W.M. Jones" <rjones@redhat.com>
>
> This produces a qcow2 file which is the different between
> two disk images. ie, if:
>
>   original.img - is a disk image (in any format)
>   modified.img - is a modified version of original.img
>
> then:
>
>   qemu-img diff -b original.img modified.img diff.qcow2
>
> creates 'diff.qcow2' which contains just the differences. Note that
> 'diff.qcow2' has 'original.img' set as the backing file.

Any chance of some more detailed explanation in the docs patch
about what this actually means and why it's useful? I spent
several minutes going "huh, does it even mean anything to
calculate the difference between two binary disk images?"
before realising that it's the presence of the backing file
that makes it actually make sense...

(maybe I'm just dense :-))

-- PMM

On 05/17/2012 07:44 AM, Richard W.M. Jones wrote:
> From: "Richard W.M. Jones" <rjones@redhat.com>
>
> This produces a qcow2 file which is the different between
> two disk images. ie, if:
>
>   original.img - is a disk image (in any format)
>   modified.img - is a modified version of original.img
>
> then:
>
>   qemu-img diff -b original.img modified.img diff.qcow2
>
> creates 'diff.qcow2' which contains just the differences. Note that
> 'diff.qcow2' has 'original.img' set as the backing file.

Sounds useful!

>
> +DEF("diff", img_diff,
> +    "diff [-f fmt] [-F backing_fmt] [-O output_fmt] -b backing_file filename output_filename")

Just so I'm clear: -f is for filename (the file with the modifications
being extracted), -F is for backing_file (the file that serves as the
base of the diff), and -O is for output_filename.

> +STEXI
> +@item rebase [-f @var{fmt}] [-F @var{backing_fmt}] [-O @var{output_fmt}] -b @var{backing_file} @var{filename} @var{output_filename}

s/rebase/diff/

We also need to support -o options for the output_filename, so that we
can expose other qcow2 attributes while creating the diff. For example,
encryption, cluster_size, and preallocation all come to mind.

> +
> +    bdrv_get_geometry(bs_original, &num_sectors);
> +    bdrv_get_geometry(bs_modified, &modified_num_sectors);
> +    if (num_sectors != modified_num_sectors) {
> +        error_report("Number of sectors in backing and source must be the same");
> +        goto out2;
> +    }

Why are you requiring equality? I can see the usefulness of doing a
diff where the modified file is larger than the original (basically,
the diff was created by extending the original file to something
larger). Prohibiting a modified file smaller than the original makes
sense, so I think this should be >, not !=.

> +
> +    /* Output image. */
> +    if (fmt_out == NULL || fmt_out[0] == '\0') {
> +        fmt_out = "qcow2";
> +    }
> +    ret = bdrv_img_create(out, fmt_out,
> +                          /* original file becomes the new backing file */
> +                          original, fmt_original,
> +                          NULL, num_sectors * BDRV_SECTOR_SIZE, BDRV_O_FLAGS);

If you allow a modified larger than backing, then this should be
modified_num_sectors, not num_sectors.

> +++ b/qemu-img.texi
> @@ -114,6 +114,23 @@ created as a copy on write image of the specified base image; the
>  @var{backing_file} should have the same content as the input's base image,
>  however the path, image format, etc may differ.
>
> +@item diff [-f @var{fmt}] [-F @var{backing_fmt}] [-O @var{output_fmt}] -b @var{backing_file} @var{filename} @var{output_filename}
> +
> +Create a new file (@var{output_filename}) which contains the
> +differences between @var{backing_file} and @var{filename}.
> +
> +The @var{backing_file} and @var{filename} must have the same
> +virtual disk size, but may be in different formats.

Again, I think this is overly tight.

> +
> +@var{output_file} will have @var{backing_file} set as its backing
> +file. The format of @var{output_file} must be one that supports
> +backing files (currently @code{qcow2} is the default and only
> +permitted output format).

Why doesn't qed just work out of the box?

On 05/17/2012 07:52 AM, Peter Maydell wrote:
> On 17 May 2012 14:44, Richard W.M. Jones <rjones@redhat.com> wrote:
>> From: "Richard W.M. Jones" <rjones@redhat.com>
>>
>> This produces a qcow2 file which is the different between
>> two disk images. ie, if:
>>
>>   original.img - is a disk image (in any format)
>>   modified.img - is a modified version of original.img
>>
>> then:
>>
>>   qemu-img diff -b original.img modified.img diff.qcow2
>>
>> creates 'diff.qcow2' which contains just the differences. Note that
>> 'diff.qcow2' has 'original.img' set as the backing file.
>
> Any chance of some more detailed explanation in the docs patch
> about what this actually means and why it's useful? I spent
> several minutes going "huh, does it even mean anything to
> calculate the difference between two binary disk images?"
> before realising that it's the presence of the backing file
> that makes it actually make sense...

Even something as simple as:

Useful for converting a monolithic image back into a thin image on top
of a common base.

On Thu, May 17, 2012 at 02:52:56PM +0100, Peter Maydell wrote:
> On 17 May 2012 14:44, Richard W.M. Jones <rjones@redhat.com> wrote:
> > From: "Richard W.M. Jones" <rjones@redhat.com>
> >
> > This produces a qcow2 file which is the different between
> > two disk images. ie, if:
> >
> >   original.img - is a disk image (in any format)
> >   modified.img - is a modified version of original.img
> >
> > then:
> >
> >   qemu-img diff -b original.img modified.img diff.qcow2
> >
> > creates 'diff.qcow2' which contains just the differences. Note that
> > 'diff.qcow2' has 'original.img' set as the backing file.
>
> Any chance of some more detailed explanation in the docs patch
> about what this actually means and why it's useful? I spent
> several minutes going "huh, does it even mean anything to
> calculate the difference between two binary disk images?"
> before realising that it's the presence of the backing file
> that makes it actually make sense...

Well I'll say first of all that I was asked to implement this by a
colleague. Personally, I'm far more organized than this, and I always
use snapshots and backing files if I want to create an efficient COW
from a base template :-)

However my colleague has got himself into a situation where he has
copied (ie. "cp" or equivalent) a guest several times from a template,
and these guests have been running independently. He now wants to
conserve disk space by turning this situation back into one where he
has one backing file + several COW copies.

To do this, he can (with this patch) do:

  qemu-img diff -b base.img the_copied_guest.img guest.qcow2
  rm the_copied_guest.img

'guest.qcow2' will (in theory at least) be much smaller than the
copied guests he has right now.

Does that make sense?

[BTW I'm still working on this. There are a few spelling mistakes and
it needs a lot more testing. This patch is just for comment at the
moment.]

Rich.

On Thu, May 17, 2012 at 07:57:31AM -0600, Eric Blake wrote:
[...]

I just posted a v2 patch which fixes everything you mentioned except
the case of resizing the disk, which I need to think about a bit more.

Rich.

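The resizing case left open above could be handled along the following lines. This is only a sketch under stated assumptions, not code from the patch or from v2: `check_sizes` and `read_original_padded` are hypothetical helpers, `SECTOR_SIZE` stands in for `BDRV_SECTOR_SIZE`, the images are modelled as in-memory buffers, and it assumes that an overlay larger than its backing file reads as zeros past the end of the backing file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512 /* stand-in for BDRV_SECTOR_SIZE */

/*
 * Hypothetical size check: accept a modified image that is the same size
 * as, or larger than, the original; reject only a smaller one.
 * Returns 0 if acceptable, -1 otherwise.
 */
int check_sizes(uint64_t original_sectors, uint64_t modified_sectors)
{
    return modified_sectors < original_sectors ? -1 : 0;
}

/*
 * Hypothetical read helper: fill buf with n sectors of the original image
 * starting at `sector`. Sectors past the end of the original are returned
 * as zeros (assumption: this is what the overlay would see there).
 */
void read_original_padded(const uint8_t *original, uint64_t original_sectors,
                          uint64_t sector, int n, uint8_t *buf)
{
    assert(n >= 0);
    memset(buf, 0, (size_t)n * SECTOR_SIZE);
    if (sector < original_sectors) {
        uint64_t avail = original_sectors - sector;
        uint64_t take = avail < (uint64_t)n ? avail : (uint64_t)n;
        memcpy(buf, original + sector * SECTOR_SIZE,
               (size_t)take * SECTOR_SIZE);
    }
}
```

With helpers like these, the comparison loop would run up to the modified image's sector count, and any all-zero sectors in the extended region would compare equal and stay unallocated in the output.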
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 49dce7c..01a9246 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -33,6 +33,12 @@ STEXI
 @item convert [-c] [-p] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
+DEF("diff", img_diff,
+    "diff [-f fmt] [-F backing_fmt] [-O output_fmt] -b backing_file filename output_filename")
+STEXI
+@item rebase [-f @var{fmt}] [-F @var{backing_fmt}] [-O @var{output_fmt}] -b @var{backing_file} @var{filename} @var{output_filename}
+ETEXI
+
 DEF("info", img_info,
     "info [-f fmt] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index c8a70ff..6e3fe2a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1533,6 +1533,156 @@ out:
     return 0;
 }
 
+static int img_diff(int argc, char **argv)
+{
+    /* qemu-img diff -b original modified out */
+    BlockDriverState *bs_original, *bs_modified, *bs_out;
+    const char *fmt_original, *original,
+        *fmt_modified, *modified,
+        *fmt_out, *out;
+    int c, ret = 0;
+    uint64_t num_sectors, modified_num_sectors;
+    uint64_t sector;
+    int n;
+    uint8_t *buf_original;
+    uint8_t *buf_modified;
+
+    /* Parse commandline parameters */
+    fmt_original = NULL;
+    fmt_modified = NULL;
+    fmt_out = NULL;
+    original = NULL;
+    for(;;) {
+        c = getopt(argc, argv, "hf:F:b:O:");
+        if (c == -1) {
+            break;
+        }
+        switch(c) {
+        case '?':
+        case 'h':
+            help();
+            return 0;
+        case 'f':
+            fmt_modified = optarg;
+            break;
+        case 'F':
+            fmt_original = optarg;
+            break;
+        case 'b':
+            original = optarg;
+            break;
+        case 'O':
+            fmt_out = optarg;
+            break;
+        }
+    }
+
+    if (original == NULL) {
+        error_report("The -b (backing filename) option must be supplied");
+        return 1;
+    }
+
+    if (argc - optind != 2) {
+        error_report("The input and output filenames must be supplied");
+        return 1;
+    }
+    modified = argv[optind++];
+    out = argv[optind++];
+
+    /* Open the input images. */
+    bs_original = bdrv_new_open(original, fmt_original, BDRV_O_FLAGS);
+    if (!bs_original) {
+        return 1;
+    }
+
+    bs_modified = bdrv_new_open(modified, fmt_modified, BDRV_O_FLAGS);
+    if (!bs_modified) {
+        return 1;
+    }
+
+    bdrv_get_geometry(bs_original, &num_sectors);
+    bdrv_get_geometry(bs_modified, &modified_num_sectors);
+    if (num_sectors != modified_num_sectors) {
+        error_report("Number of sectors in backing and source must be the same");
+        goto out2;
+    }
+
+    /* Output image. */
+    if (fmt_out == NULL || fmt_out[0] == '\0') {
+        fmt_out = "qcow2";
+    }
+    ret = bdrv_img_create(out, fmt_out,
+                          /* original file becomes the new backing file */
+                          original, fmt_original,
+                          NULL, num_sectors * BDRV_SECTOR_SIZE, BDRV_O_FLAGS);
+    if (ret != 0) {
+        goto out2;
+    }
+    bs_out = bdrv_new_open(out, fmt_out, BDRV_O_RDWR);
+
+    buf_original = qemu_blockalign(bs_original, IO_BUF_SIZE);
+    buf_modified = qemu_blockalign(bs_modified, IO_BUF_SIZE);
+
+    for (sector = 0; sector < num_sectors; sector += n) {
+        /* How many sectors can we handle with the next read? */
+        if (sector + (IO_BUF_SIZE / BDRV_SECTOR_SIZE) <= num_sectors) {
+            n = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
+        } else {
+            n = num_sectors - sector;
+        }
+
+        /* Read input files and compare. */
+        ret = bdrv_read(bs_original, sector, buf_original, n);
+        if (ret < 0) {
+            error_report("error while reading from backing file");
+            goto out;
+        }
+
+        ret = bdrv_read(bs_modified, sector, buf_modified, n);
+        if (ret < 0) {
+            error_report("error while reading from input file");
+            goto out;
+        }
+
+        /* If they differ, we need to write to the differences file. */
+        uint64_t written = 0;
+
+        while (written < n) {
+            int pnum;
+
+            if (compare_sectors(buf_original + written * BDRV_SECTOR_SIZE,
+                                buf_modified + written * BDRV_SECTOR_SIZE,
+                                n - written, &pnum)) {
+                ret = bdrv_write(bs_out, sector + written,
+                                 buf_modified + written * BDRV_SECTOR_SIZE,
+                                 pnum);
+                if (ret < 0) {
+                    error_report("Error while writing to output file: %s",
+                                 strerror(-ret));
+                    goto out;
+                }
+            }
+
+            written += pnum;
+        }
+    }
+
+    qemu_vfree(buf_original);
+    qemu_vfree(buf_modified);
+
+ out:
+    /* Cleanup */
+    bdrv_delete(bs_out);
+ out2:
+    bdrv_delete(bs_original);
+    bdrv_delete(bs_modified);
+
+    if (ret) {
+        return 1;
+    }
+    return 0;
+}
+
 static int img_resize(int argc, char **argv)
 {
     int c, ret, relative;
diff --git a/qemu-img.texi b/qemu-img.texi
index b2ca3a5..e1a123b 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -114,6 +114,23 @@ created as a copy on write image of the specified base image; the
 @var{backing_file} should have the same content as the input's base image,
 however the path, image format, etc may differ.
 
+@item diff [-f @var{fmt}] [-F @var{backing_fmt}] [-O @var{output_fmt}] -b @var{backing_file} @var{filename} @var{output_filename}
+
+Create a new file (@var{output_filename}) which contains the
+differences between @var{backing_file} and @var{filename}.
+
+The @var{backing_file} and @var{filename} must have the same
+virtual disk size, but may be in different formats.
+
+@var{output_file} will have @var{backing_file} set as its backing
+file. The format of @var{output_file} must be one that supports
+backing files (currently @code{qcow2} is the default and only
+permitted output format).
+
+Typical usage is:
+
+@code{qemu-img diff -b original.img modified.img diff.qcow2}
+
 @item info [-f @var{fmt}] @var{filename}
 
 Give information about the disk image @var{filename}. Use it in
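
The inner loop of `img_diff` relies on `compare_sectors` reporting whether the first sector of a range differs and how many following sectors share that status. The standalone sketch below illustrates that contract on in-memory buffers. It is an assumption-laden re-implementation for illustration only: `compare_sectors_sketch` and `count_sectors_to_write` are hypothetical names, `SECTOR_SIZE` stands in for `BDRV_SECTOR_SIZE`, and the real helper in qemu-img.c may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512 /* stand-in for BDRV_SECTOR_SIZE */

/*
 * Hypothetical stand-in for compare_sectors(): returns 1 if the first
 * sector of a and b differs, 0 if it is identical. *pnum is set to the
 * number of consecutive sectors (starting at the first) that share that
 * same differs/identical status.
 */
int compare_sectors_sketch(const uint8_t *a, const uint8_t *b,
                           int n, int *pnum)
{
    int i, differs;

    if (n <= 0) {
        *pnum = 0;
        return 0;
    }
    differs = memcmp(a, b, SECTOR_SIZE) != 0;
    for (i = 1; i < n; i++) {
        a += SECTOR_SIZE;
        b += SECTOR_SIZE;
        if ((memcmp(a, b, SECTOR_SIZE) != 0) != differs) {
            break;
        }
    }
    *pnum = i;
    return differs;
}

/*
 * Walk n sectors the way the patch's while loop does and count how many
 * sectors would be written to the output image (only the differing runs).
 */
int count_sectors_to_write(const uint8_t *orig, const uint8_t *mod, int n)
{
    int done = 0, to_write = 0;

    while (done < n) {
        int pnum;

        if (compare_sectors_sketch(orig + (size_t)done * SECTOR_SIZE,
                                   mod + (size_t)done * SECTOR_SIZE,
                                   n - done, &pnum)) {
            to_write += pnum; /* the patch calls bdrv_write() here */
        }
        done += pnum; /* pnum >= 1 whenever n - done > 0, so this ends */
    }
    return to_write;
}
```

Because identical runs are skipped rather than written, the clusters covering them stay unallocated in the output image and fall through to the backing file, which is what keeps the resulting file small.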