Patchwork [1/5] block: Virtual Bridges VERDE GOW disk image format documentation

Submitter Leonardo E. Reiter
Date March 8, 2012, 10:13 p.m.
Message ID <CA+BHXkKWf59_PpLZ1-vqvU7C=H33XNFN3e4ONV9hrU1064kedw@mail.gmail.com>
Permalink /patch/145668/
State New

Comments

Leonardo E. Reiter - March 8, 2012, 10:13 p.m.
commit 4b7b36f3776247c92615073b6fa0880d0a1ea1fb
Author: Leonardo E. Reiter <lreiter@vbridges.com>
Date:   Thu Mar 8 15:50:55 2012 -0600

    Documentation for Virtual Bridges GOW version 1, 2, and 3 disk image
formats.  Includes products consuming these disk image formats, basic
overview of how they work, and example use case for remote disk image
synchronization.

    Signed-off-by: Leonardo E. Reiter <lreiter@vbridges.com>

+way.  This assumes that the consumer is using the virtual desktop image in COW
+mode, with no changes to the image file committed back to it.
+In this scenario, it is easy to do block-level delta downloads from the master
+image that can even be interrupted and resumed from partial downloads.
+The consumer need only to copy blocks from the master image file that have
+a newer version number in the image's header than the local block.
+
+License
+=======
+Virtual Bridges GOW functionality is licensed under BSD-style terms that
+are identical to how most QEMU source files are licensed, including vl.c.
+Please check the respective source files for the comment header with this
+license stated explicitly.
+
+Copyright (C) 1984-2012 Virtual Bridges, Inc.  All Rights Reserved.
+
+
Stefan Hajnoczi - March 9, 2012, 10:50 a.m.
On Thu, Mar 8, 2012 at 10:13 PM, Leonardo E. Reiter
<lreiter@vbridges.com> wrote:
> commit 4b7b36f3776247c92615073b6fa0880d0a1ea1fb
> Author: Leonardo E. Reiter <lreiter@vbridges.com>
> Date:   Thu Mar 8 15:50:55 2012 -0600
>
>     Documentation for Virtual Bridges GOW version 1, 2, and 3 disk image
> formats.  Includes products consuming these disk image formats, basic
> overview of how they work, and example use case for remote disk image
> synchronization.
>
>     Signed-off-by: Leonardo E. Reiter <lreiter@vbridges.com>
>
> diff --git a/docs/vb-gow.txt b/docs/vb-gow.txt
> new file mode 100644
> index 0000000..e2ba64b
> --- /dev/null
> +++ b/docs/vb-gow.txt
> @@ -0,0 +1,71 @@
> +Virtual Bridges GOW Disk Image formats
> +======================================
> +GOW has 3 versions that covers the following products:
> + v1: (circa 2006) Win4Lin Pro, Win4BSD, Win4Solaris
> + v2: (circa 2008) Virtual Bridges VERDE,
> + IBM Virtual Desktop for Smart Business
> + v3: (circa 2009) Virtual Bridges VERDE,
> + IBM Virtual Desktop for Smart Business
> +
> +Current versions of VERDE and IBM Virtual Desktop for Smart Business use both
> +versions 2 and 3 in virtual machines to store user images and gold images,
> +respectively.  Older versions of VERDE (prior to 2009) used only version 2.
> +
> +What is GOW?
> +============
> +GOW stands for "Grow on Write", which is a very simple disk image format that
> +grows by 64KB blocks when data is added to disk images.  Data added to existing
> +allocated areas do not result in growth of course.  There is no compression
> +other than that explicitly available by not having to allocate unused blocks
> +in order to handle data beyond them.  A simple header at the beginning of the
> +image file maps logical blocks (from the guest's perspective) to file offsets
> +(from the host's perspective).
> +This image format is optimized for product virtual desktop use cases only, and
> +has been in mainstream use since 2006.  Both Virtual Bridges and IBM sell
> +new versions of this technology worldwide at the time of this writing.
> +GOW implementation memory maps the allocation map in order to reduce additional
> +file-level system calls during normal operation.  It supports both read-only
> +and read-write images.

The mmap(2) approach doesn't support QEMU's "protocol" concept where
an image format block driver is independent of the underlying storage
(host file system, NBD, HTTP, etc).  In QEMU block layer terminology
NBD, HTTP, and the host file system block drivers are "protocols" in
that they give access to data.  It's not possible to mmap(2) over NBD
or HTTP.

(I'm doing a linear code review, so perhaps your later patches avoid
using mmap.  But at this point I wanted to comment on this.)
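
For illustration, a storage-independent lookup of one allocation map entry
could look roughly like the sketch below.  It is not taken from the patch
series: gow_read_fn, gow_read_map_entry, struct mem_image, the map_start
offset, and the host-endian 32-bit entry layout are all assumptions made for
the example.  The in-memory backend only stands in for whatever the
underlying storage happens to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical storage callback: read len bytes at offset; 0 on success, -1 on error. */
typedef int (*gow_read_fn)(void *opaque, uint64_t offset, void *buf, size_t len);

/*
 * Fetch the map entry for one logical block through the callback instead of
 * dereferencing an mmap()ed header.  map_start (byte offset of the map inside
 * the image) and the host-endian uint32_t entry are assumptions.
 */
static int gow_read_map_entry(gow_read_fn read_cb, void *opaque,
                              uint64_t map_start, uint64_t block_index,
                              uint32_t *entry)
{
    assert(read_cb != NULL && entry != NULL);
    return read_cb(opaque, map_start + block_index * sizeof(uint32_t),
                   entry, sizeof(*entry));
}

/* Example backend: a memory buffer standing in for a file, NBD, or HTTP source. */
struct mem_image {
    const uint8_t *data;
    size_t size;
};

static int mem_read(void *opaque, uint64_t offset, void *buf, size_t len)
{
    const struct mem_image *img = opaque;

    /* Out-of-range reads are reported to the caller as an ordinary error. */
    if (offset > img->size || len > img->size - offset) {
        return -1;
    }
    memcpy(buf, img->data + offset, len);
    return 0;
}
```

Because every access goes through a call that returns a status, a failed read
surfaces as a return value the driver can act on.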

> +
> +Differences Between Versions
> +============================
> +Version 1 supports disk images of up to 64GB logical size, with an approximate
> +4MB header overhead.
> +Version 2 supports disk images of up to 256GB logical size, with an approximate
> +16MB header overhead.  It also aligned file offsets of logical sectors to 512-
> +byte boundaries (starting at the first such boundary following the header).
> +Version 3 supports disk images of up to 256GB logical size, with an approximate
> +32MB header overhead.  It is identical to GOW2 except that it also tracks
> +block-level version numbers, incrementing (once-per-session) them on changes
> +or new allocation.  This allows for very easy delta size calculations when
> +synchronizing images with external tools (see below).
> +File offsets in headers are expressed in 64KB blocks.  Block 0 starts
> +immediately after the header for each version.  In GOW2 and GOW3, block 0
> +is actually aligned to a 512 byte boundary beyond the header.  Using 64KB
> +blocks allows the use of 32-bit unsigned integers in the header itself, rather
> +than 64-bit integers, to store offsets even for large files.  This cuts the
> +header size requirement in half while adding only a minimal shift overhead to
> +offset calculations.

This is a good overview.  It would be nice to see a structure-level
specification of the file format on disk, but given this explanation
it doesn't seem critical unless you wish to do that.
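
As a sanity check on the quoted figures, the header overheads follow from one
32-bit entry per 64KB block.  The sketch below is only arithmetic under
stated assumptions: the helper names are hypothetical, any fixed header
fields are ignored, and the idea that GOW3 stores a second 32-bit table of
version numbers is inferred from the 32MB figure rather than confirmed by
the patch.

```c
#include <assert.h>
#include <stdint.h>

#define GOW_BLOCK_SHIFT 16                             /* 64KB blocks */
#define GOW_BLOCK_SIZE  (UINT64_C(1) << GOW_BLOCK_SHIFT)

/* Number of 64KB blocks needed to cover a logical disk size. */
static uint64_t gow_block_count(uint64_t logical_bytes)
{
    return (logical_bytes + GOW_BLOCK_SIZE - 1) >> GOW_BLOCK_SHIFT;
}

/*
 * Bytes used by per-block tables with one 32-bit entry per block.
 * tables == 1: offset map only (GOW1, GOW2).
 * tables == 2: offset map plus version numbers (GOW3, assumed layout).
 */
static uint64_t gow_table_bytes(uint64_t logical_bytes, unsigned tables)
{
    assert(tables == 1 || tables == 2);
    return gow_block_count(logical_bytes) * sizeof(uint32_t) * tables;
}

/* Round a size up to the next 512-byte boundary (GOW2/GOW3 data alignment). */
static uint64_t gow_align512(uint64_t n)
{
    return (n + 511) & ~UINT64_C(511);
}
```

With these helpers, 64GB gives 4MB of tables, 256GB gives 16MB with one
table and 32MB with two, which matches the overview.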

> +Advanced Use Case: Synchronizing Remote Images from Master
> +==========================================================
> +One of the technologies VERDE provides is "decentralized" virtual desktops.
> +This means a gold master image living in the data center can be cached on
> +either local desktop(s) or local server(s) to use offline or in a decentralized
> +way.  This assumes that the consumer is using the virtual desktop image in COW
> +mode, with no changes to the image file committed back to it.
> +In this scenario, it is easy to do block-level delta downloads from the master
> +image that can even be interrupted and resumed from partial downloads.
> +The consumer need only to copy blocks from the master image file that have
> +a newer version number in the image's header than the local block.

This sounds cool and reminds me of the image streaming code that was
added upstream recently, although GOW takes a different approach, using
block version numbers instead of relying purely on allocation
information.
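
The version-number comparison described in the quoted section can be sketched
as below.  This is a minimal illustration, not code from the patch:
gow_delta_blocks is a hypothetical name, the version tables are assumed to
be plain arrays of 32-bit counters already loaded from both headers, and
counter wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Find blocks whose master version is newer than the local one.
 * Up to out_cap block indices are stored in out (out may be NULL to count
 * only).  Returns the total number of stale blocks, so the delta size in
 * bytes is the return value times 64KB.
 */
static size_t gow_delta_blocks(const uint32_t *master_ver,
                               const uint32_t *local_ver,
                               size_t nblocks,
                               size_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(master_ver != NULL && local_ver != NULL);
    for (size_t i = 0; i < nblocks; i++) {
        if (master_ver[i] > local_ver[i]) {
            if (out != NULL && n < out_cap) {
                out[n] = i;
            }
            n++;
        }
    }
    return n;
}
```

If the local version number is updated only after a block has been copied,
rerunning the same comparison after an interruption yields just the blocks
that are still missing, which is what makes the download resumable.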

> +License
> +=======
> +Virtual Bridges GOW functionality is licensed under BSD-style terms that
> +are identical to how most QEMU source files are licensed, including vl.c.
> +Please check the respective source files for the comment header with this
> +license stated explicitly.
> +
> +Copyright (C) 1984-2012 Virtual Bridges, Inc.  All Rights Reserved.

This has been raised in similar situations in the past: you have BSD
licensed this but then say "All Rights Reserved".  What does that
mean?  You have just given rights to distribute, modify, etc through
the BSD license so I'm not sure it makes sense to reserve all rights.
Your copyright is fine, but you cannot restrict rights; that would
conflict with QEMU's license (which overall is GPL).

Stefan

Leonardo E. Reiter - March 11, 2012, 9:03 p.m.
On Fri, Mar 9, 2012 at 4:50 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> The mmap(2) approach doesn't support QEMU's "protocol" concept where
> an image format block driver is independent of the underlying storage
> (host file system, NBD, HTTP, etc).  In QEMU block layer terminology
> NBD, HTTP, and the host file system block drivers are "protocols" in
> that they give access to data.  It's not possible to mmap(2) over NBD
> or HTTP.
>
> (I'm doing a linear code review, so perhaps your later patches avoid
> using mmap.  But at this point I wanted to comment on this.)
>
Indeed, mmap() is used in the code.  It is unfortunate that it cannot be
used.  It's a really high performance way to achieve what we want here, and
very safe for the use-case.  Of course the only medium we support in the
product that uses this is a filesystem, so I see your point.  I'll see about
using some different mechanism.

>
>
> This is a good overview.  It would be nice to see a structure-level
> specification of the file format on disk, but given this explanation
> it doesn't seem critical unless you wish to do that.
>
Thanks - I'd rather not.  The format is actually quite obvious from the
code.  It's very simple and doesn't involve any sort of clustering, etc.
There's not much more than the overview that is not quickly understood
from the code itself (even the .h file).

> This has been raised in similar situations in the past: you have BSD
> licensed this but then say "All Rights Reserved".  What does that
> mean?  You have just given rights to distribute, modify, etc through
> the BSD license so I'm not sure it makes sense to reserve all rights.
> Your copyright is fine but you cannot restrict rights, that would
> conflict with QEMU's license (which overall is GPL).
>
I'm happy to hack off the "All Rights Reserved".  Our main goal is to get
this accepted upstream.  We provide value to customers with our knowledge
and our higher level frameworks, not with this disk image format by itself.
Also as far as image formats go, as you can see, it's pretty trivial.  We
chose BSD because 1) QEMU was all BSD a few years back when we originated
this, and 2) it plays nice with both open source and closed source.  If
someone wants to take this and do what they want with it, that's fine with
me (and my company).  We have been shipping these patches for years with
our commercial product so it's not new to the market.

I have to post a v2 of the patches anyway - I'll make sure to hack off the
"All Rights Reserved" clause in those.

Thanks for your time on this,

- Leo

>
> Stefan
>

Christoph Hellwig - March 24, 2012, 3:43 p.m.
On Sun, Mar 11, 2012 at 04:03:01PM -0500, Leonardo E. Reiter wrote:
> indeed mmap() is used in the code.  This is unfortunate that it cannot be
> used.  It's a really high performance way to achieve what we want here, and
> very safe for the use-case.  Of course the only medium we support in the
> product that uses this is filesystem, so I see your point.  I'll see about
> using some different mechanism.

Using shared writable mmaps for disk I/O is never a safe approach, as
there is no sane way to handle errors.

Patch

diff --git a/docs/vb-gow.txt b/docs/vb-gow.txt
new file mode 100644
index 0000000..e2ba64b
--- /dev/null
+++ b/docs/vb-gow.txt
@@ -0,0 +1,71 @@ 
+Virtual Bridges GOW Disk Image formats
+======================================
+GOW has 3 versions that covers the following products:
+ v1: (circa 2006) Win4Lin Pro, Win4BSD, Win4Solaris
+ v2: (circa 2008) Virtual Bridges VERDE,
+ IBM Virtual Desktop for Smart Business
+ v3: (circa 2009) Virtual Bridges VERDE,
+ IBM Virtual Desktop for Smart Business
+
+Current versions of VERDE and IBM Virtual Desktop for Smart Business use both
+versions 2 and 3 in virtual machines to store user images and gold images,
+respectively.  Older versions of VERDE (prior to 2009) used only version 2.
+
+What is GOW?
+============
+GOW stands for "Grow on Write", which is a very simple disk image format that
+grows by 64KB blocks when data is added to disk images.  Data added to existing
+allocated areas do not result in growth of course.  There is no compression
+other than that explicitly available by not having to allocate unused blocks
+in order to handle data beyond them.  A simple header at the beginning of the
+image file maps logical blocks (from the guest's perspective) to file offsets
+(from the host's perspective).
+This image format is optimized for product virtual desktop use cases only, and
+has been in mainstream use since 2006.  Both Virtual Bridges and IBM sell
+new versions of this technology worldwide at the time of this writing.
+GOW implementation memory maps the allocation map in order to reduce additional
+file-level system calls during normal operation.  It supports both read-only
+and read-write images.
+
+Differences Between Versions
+============================
+Version 1 supports disk images of up to 64GB logical size, with an approximate
+4MB header overhead.
+Version 2 supports disk images of up to 256GB logical size, with an approximate
+16MB header overhead.  It also aligned file offsets of logical sectors to 512-
+byte boundaries (starting at the first such boundary following the header).
+Version 3 supports disk images of up to 256GB logical size, with an approximate
+32MB header overhead.  It is identical to GOW2 except that it also tracks
+block-level version numbers, incrementing (once-per-session) them on changes
+or new allocation.  This allows for very easy delta size calculations when
+synchronizing images with external tools (see below).
+File offsets in headers are expressed in 64KB blocks.  Block 0 starts
+immediately after the header for each version.  In GOW2 and GOW3, block 0
+is actually aligned to a 512 byte boundary beyond the header.  Using 64KB
+blocks allows the use of 32-bit unsigned integers in the header itself, rather
+than 64-bit integers, to store offsets even for large files.  This cuts the
+header size requirement in half while adding only a minimal shift overhead to
+offset calculations.
+
+Advanced Use Case: Synchronizing Remote Images from Master
+==========================================================
+One of the technologies VERDE provides is "decentralized" virtual desktops.
+This means a gold master image living in the data center can be cached on
+either local desktop(s) or local server(s) to use offline or in a decentralized