Patchwork Add qcow2 documentation

Submitter Kevin Wolf
Date March 8, 2011, 11:47 a.m.
Message ID <1299584839-5688-1-git-send-email-kwolf@redhat.com>
Permalink /patch/85970/
State New

Comments

Kevin Wolf - March 8, 2011, 11:47 a.m.
This adds a description of the qcow2 file format to the docs/ directory.
Besides documenting what's there, which is never wrong, the document should
provide a good basis for the discussion of format extensions (called "qcow3"
in previous discussions).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 docs/specs/qcow2.txt |  228 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 228 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/qcow2.txt
Stefan Hajnoczi - March 8, 2011, 1:13 p.m.
On Tue, Mar 8, 2011 at 11:47 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> +         20 - 23:   cluster_bits
> +                    Number of bits that are used for addressing an offset
> +                    within a cluster (1 << cluster_bits is the cluster size)
> +
> +         24 - 31:   size
> +                    Virtual disk size in bytes

Any constraints on these two fields that should be mentioned?

Stefan
Kevin Wolf - March 8, 2011, 1:31 p.m.
On 08.03.2011 14:13, Stefan Hajnoczi wrote:
> On Tue, Mar 8, 2011 at 11:47 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> +         20 - 23:   cluster_bits
>> +                    Number of bits that are used for addressing an offset
>> +                    within a cluster (1 << cluster_bits is the cluster size)
>> +
>> +         24 - 31:   size
>> +                    Virtual disk size in bytes
> 
> Any constraints on these two fields that should be mentioned?

For the size not that I'm aware of.

For cluster_bits, qemu restricts it to 512 <= cluster_size <= 2 MB. I
think we should add 512 as a lower limit; anything smaller doesn't make
sense and would take away bits that we want to use for flags.

The 2 MB are more of an implementation limitation. Would you mention it
here? The format shouldn't have any problem with larger sizes.

Kevin
Stefan Hajnoczi - March 8, 2011, 1:48 p.m.
On Tue, Mar 8, 2011 at 1:31 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 08.03.2011 14:13, Stefan Hajnoczi wrote:
>> On Tue, Mar 8, 2011 at 11:47 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> +         20 - 23:   cluster_bits
>>> +                    Number of bits that are used for addressing an offset
>>> +                    within a cluster (1 << cluster_bits is the cluster size)
>>> +
>>> +         24 - 31:   size
>>> +                    Virtual disk size in bytes
>>
>> Any constraints on these two fields that should be mentioned?
>
> For the size not that I'm aware of.
>
> For cluster_bits qemu restricts it to 512 <= cluster_size <= 2 MB. I
> think we should add 512 as a lower limit, anything smaller doesn't make
> sense and steals us bits that we want to use for flags.
>
> The 2 MB are more of an implementation limitation. Would you mention it
> here? The format shouldn't have any problem with larger sizes.

It could be mentioned as an explicit implementation limit so that a
third party implementing qcow2 support from scratch doesn't use the
format in ways that won't work with QEMU's implementation.

Stefan
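
A third-party implementation could make the two limits discussed above explicit with a check like the one below. This is a minimal Python sketch, not QEMU code: the constant and function names are hypothetical, the lower bound of 9 bits (512 bytes) is the limit proposed in this thread, and the upper bound of 21 bits (2 MB) is the QEMU implementation limit rather than a property of the format.

```python
# Hypothetical names; the values come from the discussion in this thread.
MIN_CLUSTER_BITS = 9        # 512-byte clusters: proposed lower limit for the format
QEMU_MAX_CLUSTER_BITS = 21  # 2 MB clusters: limit of QEMU's implementation only


def check_cluster_bits(cluster_bits, enforce_qemu_limit=True):
    """Return the cluster size in bytes, or raise ValueError if out of range."""
    if cluster_bits < MIN_CLUSTER_BITS:
        raise ValueError("cluster_bits %d is below the 512-byte minimum" % cluster_bits)
    if enforce_qemu_limit and cluster_bits > QEMU_MAX_CLUSTER_BITS:
        raise ValueError("cluster_bits %d exceeds QEMU's 2 MB limit" % cluster_bits)
    # 1 << cluster_bits is the cluster size, as stated in the header description
    return 1 << cluster_bits
```

Keeping the upper bound behind a flag reflects the point made above: larger clusters should be valid in the format, but an image using them may not work with QEMU.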
Dushyant Bansal - March 9, 2011, 6:08 p.m.
On Tuesday 08 March 2011 05:17 PM, Kevin Wolf wrote:
> +Given a offset into the virtual disk, the offset into the image file can be
> +obtained as follows:
> +
> +    l2_entries = (cluster_size / sizeof(uint64_t))
> +
> +    l2_index = (offset / cluster_size) % l2_entries
> +    l1_index = (offset / cluster_size) / l2_entries
> +
> +    l2_table = load_cluster(l1_table[l1_index]);
> +    cluster_offset = refcount_block[l2_index];
>    
It should be cluster_offset = l2_table[l2_index];
Right?

--
Dushyant
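
With the correction above applied, the guest-to-host translation for normal (uncompressed) clusters can be sketched as follows. This is an illustrative Python sketch under stated assumptions: `guest_to_host_offset`, `load_l2_table` and `OFFSET_MASK` are hypothetical names, the tables are already converted to host-order integers, and an offset field of 0 is assumed to mean "not allocated", which the quoted text does not state explicitly.

```python
OFFSET_MASK = 0x00FFFFFFFFFFFE00  # bits 9-55 of an L1 entry or a normal L2 entry
COMPRESSED_FLAG = 1 << 62         # bit 62 of an L2 entry


def guest_to_host_offset(offset, cluster_bits, l1_table, load_l2_table):
    """Map a guest offset to an image file offset, or None if unallocated.

    l1_table is a list of 64-bit L1 entries; load_l2_table(host_offset) is a
    caller-supplied function returning the L2 table stored at that offset.
    """
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // 8  # sizeof(uint64_t)

    cluster_index = offset // cluster_size
    l2_index = cluster_index % l2_entries
    l1_index = cluster_index // l2_entries

    l2_table_offset = l1_table[l1_index] & OFFSET_MASK
    if l2_table_offset == 0:
        return None  # assumption: no L2 table allocated for this range

    l2_table = load_l2_table(l2_table_offset)
    entry = l2_table[l2_index]  # the corrected line: index the L2 table
    if entry & COMPRESSED_FLAG:
        raise NotImplementedError("compressed clusters are not handled here")

    cluster_offset = entry & OFFSET_MASK
    if cluster_offset == 0:
        return None  # assumption: cluster not allocated
    return cluster_offset + (offset % cluster_size)
```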
Stefan Hajnoczi - March 9, 2011, 9:46 p.m.
On Wed, Mar 9, 2011 at 6:08 PM, Dushyant Bansal
<cs5070214@cse.iitd.ac.in> wrote:
> On Tuesday 08 March 2011 05:17 PM, Kevin Wolf wrote:
>> +    l2_entries = (cluster_size / sizeof(uint64_t))
>> +
>> +    l2_index = (offset / cluster_size) % l2_entries
>> +    l1_index = (offset / cluster_size) / l2_entries
>> +
>> +    l2_table = load_cluster(l1_table[l1_index]);
>> +    cluster_offset = refcount_block[l2_index];
>>
>
> It should be cluster_offset = l2_table[l2_index];
> Right?

Good catch.

Kevin, besides what Dushyant found it looks good.

Stefan
Kevin Wolf - March 10, 2011, 8:12 a.m.
On 09.03.2011 19:08, Dushyant Bansal wrote:
> On Tuesday 08 March 2011 05:17 PM, Kevin Wolf wrote:
>> +Given a offset into the virtual disk, the offset into the image file can be
>> +obtained as follows:
>> +
>> +    l2_entries = (cluster_size / sizeof(uint64_t))
>> +
>> +    l2_index = (offset / cluster_size) % l2_entries
>> +    l1_index = (offset / cluster_size) / l2_entries
>> +
>> +    l2_table = load_cluster(l1_table[l1_index]);
>> +    cluster_offset = refcount_block[l2_index];
>>    
> It should be cluster_offset = l2_table[l2_index];
> Right?

Correct. Thanks for catching this.

Kevin

Patch

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
new file mode 100644
index 0000000..0e7bcda
--- /dev/null
+++ b/docs/specs/qcow2.txt
@@ -0,0 +1,228 @@ 
+== Clusters ==
+
+A qcow2 image file is organized in units of constant size, which are called
+(host) clusters. A cluster is the unit in which all allocations are done,
+both for actual guest data and for image metadata.
+
+Likewise, the virtual disk as seen by the guest is divided into (guest)
+clusters of the same size.
+
+
+== Header ==
+
+The first cluster of a qcow2 image contains the file header:
+
+    Byte  0 -  3:   magic
+                    QCOW magic string ("QFI\xfb")
+
+          4 -  7:   version
+                    Version number (only valid value is 2)
+
+          8 - 15:   backing_file_offset
+                    Offset into the image file at which the backing file name
+                    is stored (NB: The string is not null terminated). 0 if the
+                    image doesn't have a backing file.
+
+         16 - 19:   backing_file_size
+                    Length of the backing file name in bytes. Must not be
+                    longer than 1023 bytes. Undefined if the image doesn't have
+                    a backing file.
+
+         20 - 23:   cluster_bits
+                    Number of bits that are used for addressing an offset
+                    within a cluster (1 << cluster_bits is the cluster size)
+
+         24 - 31:   size
+                    Virtual disk size in bytes
+
+         32 - 35:   crypt_method
+                    0 for no encryption
+                    1 for AES encryption
+
+         36 - 39:   l1_size
+                    Number of entries in the active L1 table
+
+         40 - 47:   l1_table_offset
+                    Offset into the image file at which the active L1 table
+                    starts. Must be aligned to a cluster boundary.
+
+         48 - 55:   refcount_table_offset
+                    Offset into the image file at which the refcount table
+                    starts. Must be aligned to a cluster boundary.
+
+         56 - 59:   refcount_table_clusters
+                    Number of clusters that the refcount table occupies
+
+         60 - 63:   nb_snapshots
+                    Number of snapshots contained in the image
+
+         64 - 71:   snapshots_offset
+                    Offset into the image file at which the snapshot table
+                    starts. Must be aligned to a cluster boundary.
+
+All numbers in qcow2 are stored in Big Endian byte order.
+
+
+== Host cluster management ==
+
+qcow2 manages the allocation of host clusters by maintaining a reference count
+for each host cluster. A refcount of 0 means that the cluster is free, 1 means
+that it is used, and >= 2 means that it is used and any write access must
+perform a COW (copy on write) operation.
+
+The refcounts are managed in a two-level table. The first level is called
+refcount table and has a variable size (which is stored in the header). The
+refcount table can cover multiple clusters, however it needs to be contiguous
+in the image file.
+
+It contains pointers to the second level structures which are called refcount
+blocks and are exactly one cluster in size.
+
+Given an offset into the image file, the refcount of its cluster can be obtained
+as follows:
+
+    refcount_block_entries = (cluster_size / sizeof(uint16_t))
+
+    refcount_block_index = (offset / cluster_size) % refcount_block_entries
+    refcount_table_index = (offset / cluster_size) / refcount_block_entries
+
+    refcount_block = load_cluster(refcount_table[refcount_table_index]);
+    return refcount_block[refcount_block_index];
+
+Refcount table entry:
+
+    Bit  0 -  8:    Reserved (set to 0)
+
+         9 - 63:    Bits 9-63 of the offset into the image file at which the
+                    refcount block starts. Must be aligned to a cluster
+                    boundary.
+
+                    If this is 0, the corresponding refcount block has not yet
+                    been allocated. All refcounts managed by this refcount block
+                    are 0.
+
+Refcount block entry:
+
+    Bit  0 - 15:    Reference count of the cluster
+
+
+== Cluster mapping ==
+
+Just as for refcounts, qcow2 uses a two-level structure for the mapping of
+guest clusters to host clusters. They are called L1 and L2 tables.
+
+The L1 table has a variable size (stored in the header) and may use multiple
+clusters, however it must be contiguous in the image file. L2 tables are
+exactly one cluster in size.
+
+Given an offset into the virtual disk, the offset into the image file can be
+obtained as follows:
+
+    l2_entries = (cluster_size / sizeof(uint64_t))
+
+    l2_index = (offset / cluster_size) % l2_entries
+    l1_index = (offset / cluster_size) / l2_entries
+
+    l2_table = load_cluster(l1_table[l1_index]);
+    cluster_offset = l2_table[l2_index];
+
+    return cluster_offset + (offset % cluster_size)
+
+L1 table entry:
+
+    Bit  0 -  8:    Reserved (set to 0)
+
+         9 - 55:    Bits 9-55 of the offset into the image file at which the L2
+                    table starts. Must be aligned to a cluster boundary.
+
+        56 - 62:    Reserved (set to 0)
+
+             63:    0 for an L2 table that is unused or requires COW, 1 if its
+                    refcount is exactly one. This information is only accurate
+                    in the active L1 table.
+
+L2 table entry (for normal clusters):
+
+    Bit  0 -  8:    Reserved (set to 0)
+
+         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
+                    cluster boundary.
+
+        56 - 61:    Reserved (set to 0)
+
+             62:    0 (this cluster is not compressed)
+
+             63:    0 for a cluster that is unused or requires COW, 1 if its
+                    refcount is exactly one. This information is only accurate
+                    in L2 tables that are reachable from the active L1
+                    table.
+
+L2 table entry (for compressed clusters; x = 62 - (cluster_bits - 8)):
+
+    Bit  0 -  x:    Host cluster offset. This is usually _not_ aligned to a
+                    cluster boundary!
+
+       x+1 - 61:    Compressed size of the images in sectors of 512 bytes
+
+             62:    1 (this cluster is compressed using zlib)
+
+             63:    0 for a cluster that is unused or requires COW, 1 if its
+                    refcount is exactly one. This information is only accurate
+                    in L2 tables that are reachable from the active L1
+                    table.
+
+
+== Snapshots ==
+
+qcow2 supports internal snapshots. Their basic principle of operation is to
+switch the active L1 table, so that a different set of host clusters is
+exposed to the guest.
+
+When creating a snapshot, the L1 table should be copied and the refcount of all
+L2 tables and clusters reachable from this L1 table must be increased, so that
+a write causes a COW and isn't visible in other snapshots.
+
+When loading a snapshot, bit 63 of all entries in the new active L1 table and
+all L2 tables referenced by it must be reconstructed from the refcount table
+as it doesn't need to be accurate in inactive L1 tables.
+
+A directory of all snapshots is stored in the snapshot table, a contiguous area
+in the image file, whose starting offset and length are given by the header
+fields snapshots_offset and nb_snapshots. The entries of the snapshot table
+have variable length, depending on the length of ID, name and extra data.
+
+Snapshot table entry:
+
+    Byte 0 -  7:    Offset into the image file at which the L1 table for the
+                    snapshot starts. Must be aligned to a cluster boundary.
+
+         8 - 11:    Number of entries in the L1 table of the snapshot
+
+        12 - 13:    Length of the unique ID string describing the snapshot
+
+        14 - 15:    Length of the name of the snapshot
+
+        16 - 19:    Time at which the snapshot was taken in seconds since the
+                    Epoch
+
+        20 - 23:    Subsecond part of the time at which the snapshot was taken
+                    in nanoseconds
+
+        24 - 31:    Time that the guest was running until the snapshot was
+                    taken in nanoseconds
+
+        32 - 35:    Size of the VM state in bytes. 0 if no VM state is saved.
+                    If there is VM state, it starts at the first cluster
+                    described by the first L1 table entry that doesn't describe a
+                    regular guest cluster (i.e. VM state is stored like guest
+                    disk content, except that it is stored at offsets that are
+                    larger than the virtual disk presented to the guest)
+
+        36 - 39:    Size of extra data in the table entry (used for future
+                    extensions of the format)
+
+        variable:   Extra data for future extensions. Must be ignored.
+
+        variable:   Unique ID string for the snapshot (not null terminated)
+
+        variable:   Name of the snapshot (not null terminated)
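
The header layout in the patch above maps directly onto a fixed 72-byte big-endian record. The following is a minimal Python sketch of a reader, assuming only the field offsets and sizes listed in the Header section; `HEADER_STRUCT`, `Qcow2Header` and `parse_header` are hypothetical names, not part of QEMU.

```python
import struct
from collections import namedtuple

QCOW2_MAGIC = b"QFI\xfb"

# ">" selects big-endian byte order without padding, matching
# "All numbers in qcow2 are stored in Big Endian byte order".
# I = 4-byte unsigned field, Q = 8-byte unsigned field.
HEADER_STRUCT = struct.Struct(">4sIQIIQIIQQIIQ")

Qcow2Header = namedtuple("Qcow2Header", [
    "magic",                    # bytes  0 -  3
    "version",                  # bytes  4 -  7
    "backing_file_offset",      # bytes  8 - 15
    "backing_file_size",        # bytes 16 - 19
    "cluster_bits",             # bytes 20 - 23
    "size",                     # bytes 24 - 31
    "crypt_method",             # bytes 32 - 35
    "l1_size",                  # bytes 36 - 39
    "l1_table_offset",          # bytes 40 - 47
    "refcount_table_offset",    # bytes 48 - 55
    "refcount_table_clusters",  # bytes 56 - 59
    "nb_snapshots",             # bytes 60 - 63
    "snapshots_offset",         # bytes 64 - 71
])


def parse_header(data):
    """Decode the first 72 bytes of an image into a Qcow2Header."""
    if len(data) < HEADER_STRUCT.size:
        raise ValueError("need at least %d bytes" % HEADER_STRUCT.size)
    header = Qcow2Header._make(HEADER_STRUCT.unpack_from(data))
    if header.magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image (bad magic)")
    if header.version != 2:
        raise ValueError("unsupported version %d" % header.version)
    return header
```

A caller would typically read the first 72 bytes of the image file in binary mode and pass them to `parse_header`; the cluster size is then `1 << header.cluster_bits`.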