Message ID | 1299584839-5688-1-git-send-email-kwolf@redhat.com |
---|---|
State | New |
On Tue, Mar 8, 2011 at 11:47 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> +         20 - 23:   cluster_bits
> +                    Number of bits that are used for addressing an offset
> +                    within a cluster (1 << cluster_bits is the cluster size)
> +
> +         24 - 31:   size
> +                    Virtual disk size in bytes

Any constraints on these two fields that should be mentioned?

Stefan
On 08.03.2011 14:13, Stefan Hajnoczi wrote:
> On Tue, Mar 8, 2011 at 11:47 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> +         20 - 23:   cluster_bits
>> +                    Number of bits that are used for addressing an offset
>> +                    within a cluster (1 << cluster_bits is the cluster size)
>> +
>> +         24 - 31:   size
>> +                    Virtual disk size in bytes
>
> Any constraints on these two fields that should be mentioned?

For the size not that I'm aware of.

For cluster_bits qemu restricts it to 512 <= cluster_size <= 2 MB. I
think we should add 512 as a lower limit, anything smaller doesn't make
sense and steals us bits that we want to use for flags.

The 2 MB are more of an implementation limitation. Would you mention it
here? The format shouldn't have any problem with larger sizes.

Kevin
On Tue, Mar 8, 2011 at 1:31 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 08.03.2011 14:13, Stefan Hajnoczi wrote:
>> On Tue, Mar 8, 2011 at 11:47 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> +         20 - 23:   cluster_bits
>>> +                    Number of bits that are used for addressing an offset
>>> +                    within a cluster (1 << cluster_bits is the cluster size)
>>> +
>>> +         24 - 31:   size
>>> +                    Virtual disk size in bytes
>>
>> Any constraints on these two fields that should be mentioned?
>
> For the size not that I'm aware of.
>
> For cluster_bits qemu restricts it to 512 <= cluster_size <= 2 MB. I
> think we should add 512 as a lower limit, anything smaller doesn't make
> sense and steals us bits that we want to use for flags.
>
> The 2 MB are more of an implementation limitation. Would you mention it
> here? The format shouldn't have any problem with larger sizes.

It could be mentioned as an explicit implementation limit so that a
third party implementing qcow2 support from scratch doesn't use the
format in ways that won't work with QEMU's implementation.

Stefan
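The header layout and the cluster size limits discussed above can be checked with a small parser. This is a minimal C sketch, not QEMU's code: `QCow2Header`, `parse_header`, `be32`/`be64` and the idea of parsing from an in-memory 72-byte buffer are hypothetical choices for illustration. The 512-byte lower bound is the limit Kevin proposes for the format, while the 2 MB upper bound is the QEMU implementation limit he mentions.

```c
#include <stdint.h>
#include <string.h>

#define QCOW2_HEADER_SIZE 72
#define QCOW2_MAGIC 0x514649fbu /* "QFI\xfb" read as a big-endian uint32 */

/* Header fields at the byte offsets listed in the patch (hypothetical type). */
typedef struct {
    uint32_t magic, version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size, cluster_bits;
    uint64_t size;
    uint32_t crypt_method, l1_size;
    uint64_t l1_table_offset, refcount_table_offset;
    uint32_t refcount_table_clusters, nb_snapshots;
    uint64_t snapshots_offset;
} QCow2Header;

/* All numbers in qcow2 are stored big-endian. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const uint8_t *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Fills *h from buf and returns 0, or returns -1 if the header is rejected. */
static int parse_header(const uint8_t *buf, size_t len, QCow2Header *h)
{
    if (len < QCOW2_HEADER_SIZE)
        return -1;
    h->magic                   = be32(buf + 0);
    h->version                 = be32(buf + 4);
    h->backing_file_offset     = be64(buf + 8);
    h->backing_file_size       = be32(buf + 16);
    h->cluster_bits            = be32(buf + 20);
    h->size                    = be64(buf + 24);
    h->crypt_method            = be32(buf + 32);
    h->l1_size                 = be32(buf + 36);
    h->l1_table_offset         = be64(buf + 40);
    h->refcount_table_offset   = be64(buf + 48);
    h->refcount_table_clusters = be32(buf + 56);
    h->nb_snapshots            = be32(buf + 60);
    h->snapshots_offset        = be64(buf + 64);

    if (h->magic != QCOW2_MAGIC || h->version != 2)
        return -1;
    /* 512 bytes (proposed format minimum) to 2 MB (QEMU limit) */
    if (h->cluster_bits < 9 || h->cluster_bits > 21)
        return -1;
    if (h->backing_file_offset != 0 && h->backing_file_size > 1023)
        return -1;

    /* These three tables must start on a cluster boundary. */
    uint64_t mask = (1ULL << h->cluster_bits) - 1;
    if ((h->l1_table_offset & mask) || (h->refcount_table_offset & mask) ||
        (h->snapshots_offset & mask))
        return -1;
    return 0;
}
```

Keeping the 2 MB check follows Stefan's suggestion of treating it as an explicit implementation limit; a tool that only cares about what the format allows could drop it.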
On Tuesday 08 March 2011 05:17 PM, Kevin Wolf wrote:
> [...]
> +Given a offset into the virtual disk, the offset into the image file can be
> +obtained as follows:
> +
> +    l2_entries = (cluster_size / sizeof(uint64_t))
> +
> +    l2_index = (offset / cluster_size) % l2_entries
> +    l1_index = (offset / cluster_size) / l2_entries
> +
> +    l2_table = load_cluster(l1_table[l1_index]);
> +    cluster_offset = refcount_block[l2_index];
>

It should be cluster_offset = l2_table[l2_index];
Right?

--
Dushyant
On Wed, Mar 9, 2011 at 6:08 PM, Dushyant Bansal <cs5070214@cse.iitd.ac.in> wrote:
> On Tuesday 08 March 2011 05:17 PM, Kevin Wolf wrote:
>> +    l2_entries = (cluster_size / sizeof(uint64_t))
>> +
>> +    l2_index = (offset / cluster_size) % l2_entries
>> +    l1_index = (offset / cluster_size) / l2_entries
>> +
>> +    l2_table = load_cluster(l1_table[l1_index]);
>> +    cluster_offset = refcount_block[l2_index];
>>
>
> It should be cluster_offset = l2_table[l2_index];
> Right?

Good catch.

Kevin, besides what Dushyant found it looks good.

Stefan
On 09.03.2011 19:08, Dushyant Bansal wrote:
> On Tuesday 08 March 2011 05:17 PM, Kevin Wolf wrote:
>> [...]
>> +Given a offset into the virtual disk, the offset into the image file can be
>> +obtained as follows:
>> +
>> +    l2_entries = (cluster_size / sizeof(uint64_t))
>> +
>> +    l2_index = (offset / cluster_size) % l2_entries
>> +    l1_index = (offset / cluster_size) / l2_entries
>> +
>> +    l2_table = load_cluster(l1_table[l1_index]);
>> +    cluster_offset = refcount_block[l2_index];
>>
> It should be cluster_offset = l2_table[l2_index];
> Right?

Correct. Thanks for catching this.

Kevin
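With Dushyant's correction applied, the guest-to-host lookup from the "Cluster mapping" section can be sketched in C as below. This is an illustrative sketch only: the `Image` struct, `guest_to_host` and `read_be64` are hypothetical, the whole image file is assumed to be in memory, and the active L1 table is assumed to be already converted to host byte order. Unlike the pseudocode, it masks each entry to bits 9-55 before using it as an offset, so the flag in bit 63 does not leak into the address, and it does not handle compressed clusters.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits 9-55 of L1 and L2 entries hold a cluster-aligned offset. */
#define QCOW2_OFFSET_MASK 0x00fffffffffffe00ULL
#define QCOW2_COMPRESSED  (1ULL << 62)
#define QCOW2_ERROR       UINT64_MAX

/* Hypothetical in-memory view of an image, for illustration only. */
typedef struct {
    const uint8_t *data;      /* the whole image file */
    size_t len;               /* image file length in bytes */
    unsigned cluster_bits;
    const uint64_t *l1_table; /* active L1 table, already in host byte order */
    uint32_t l1_size;         /* header field l1_size */
} Image;

static uint64_t read_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Returns the host offset for a guest offset, 0 if the cluster is not
 * allocated, or QCOW2_ERROR for out-of-range, corrupt or compressed entries. */
static uint64_t guest_to_host(const Image *img, uint64_t offset)
{
    uint64_t cluster_size = 1ULL << img->cluster_bits;
    uint64_t l2_entries = cluster_size / sizeof(uint64_t);

    uint64_t l2_index = (offset / cluster_size) % l2_entries;
    uint64_t l1_index = (offset / cluster_size) / l2_entries;
    if (l1_index >= img->l1_size)
        return QCOW2_ERROR;

    uint64_t l2_offset = img->l1_table[l1_index] & QCOW2_OFFSET_MASK;
    if (l2_offset == 0)
        return 0;                 /* no L2 table allocated yet */
    if (l2_offset > img->len || img->len - l2_offset < cluster_size)
        return QCOW2_ERROR;       /* L2 table lies outside the file */

    /* Read the entry from the L2 table, not from a refcount block. */
    uint64_t l2_entry =
        read_be64(img->data + l2_offset + l2_index * sizeof(uint64_t));
    if (l2_entry & QCOW2_COMPRESSED)
        return QCOW2_ERROR;       /* compressed clusters are not handled here */

    uint64_t cluster_offset = l2_entry & QCOW2_OFFSET_MASK;
    if (cluster_offset == 0)
        return 0;                 /* guest cluster not allocated */
    return cluster_offset + (offset % cluster_size);
}
```

Returning 0 for an unallocated cluster is unambiguous because host offset 0 always holds the header and can never be a guest data cluster; a caller would then read from the backing file or return zeroes.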
diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
new file mode 100644
index 0000000..0e7bcda
--- /dev/null
+++ b/docs/specs/qcow2.txt
@@ -0,0 +1,228 @@
+== Clusters ==
+
+A qcow2 image file is organized in units of constant size, which are called
+(host) clusters. A cluster is the unit in which all allocations are done,
+both for actual guest data and for image metadata.
+
+Likewise, the virtual disk as seen by the guest is divided into (guest)
+clusters of the same size.
+
+
+== Header ==
+
+The first cluster of a qcow2 image contains the file header:
+
+    Byte  0 -  3:   magic
+                    QCOW magic string ("QFI\xfb")
+
+          4 -  7:   version
+                    Version number (only valid value is 2)
+
+          8 - 15:   backing_file_offset
+                    Offset into the image file at which the backing file name
+                    is stored (NB: The string is not null terminated). 0 if the
+                    image doesn't have a backing file.
+
+         16 - 19:   backing_file_size
+                    Length of the backing file name in bytes. Must not be
+                    longer than 1023 bytes. Undefined if the image doesn't have
+                    a backing file.
+
+         20 - 23:   cluster_bits
+                    Number of bits that are used for addressing an offset
+                    within a cluster (1 << cluster_bits is the cluster size)
+
+         24 - 31:   size
+                    Virtual disk size in bytes
+
+         32 - 35:   crypt_method
+                    0 for no encryption
+                    1 for AES encryption
+
+         36 - 39:   l1_size
+                    Number of entries in the active L1 table
+
+         40 - 47:   l1_table_offset
+                    Offset into the image file at which the active L1 table
+                    starts. Must be aligned to a cluster boundary.
+
+         48 - 55:   refcount_table_offset
+                    Offset into the image file at which the refcount table
+                    starts. Must be aligned to a cluster boundary.
+
+         56 - 59:   refcount_table_clusters
+                    Number of clusters that the refcount table occupies
+
+         60 - 63:   nb_snapshots
+                    Number of snapshots contained in the image
+
+         64 - 71:   snapshots_offset
+                    Offset into the image file at which the snapshot table
+                    starts. Must be aligned to a cluster boundary.
+
+All numbers in qcow2 are stored in Big Endian byte order.
+
+
+== Host cluster management ==
+
+qcow2 manages the allocation of host clusters by maintaining a reference count
+for each host cluster. A refcount of 0 means that the cluster is free, 1 means
+that it is used, and >= 2 means that it is used and any write access must
+perform a COW (copy on write) operation.
+
+The refcounts are managed in a two-level table. The first level is called the
+refcount table and has a variable size (which is stored in the header). The
+refcount table can cover multiple clusters, however it needs to be contiguous
+in the image file.
+
+It contains pointers to the second level structures which are called refcount
+blocks and are exactly one cluster in size.
+
+Given an offset into the image file, the refcount of its cluster can be
+obtained as follows:
+
+    refcount_block_entries = (cluster_size / sizeof(uint16_t))
+
+    refcount_block_index = (offset / cluster_size) % refcount_block_entries
+    refcount_table_index = (offset / cluster_size) / refcount_block_entries
+
+    refcount_block = load_cluster(refcount_table[refcount_table_index]);
+    return refcount_block[refcount_block_index];
+
+Refcount table entry:
+
+    Bit  0 -  8:    Reserved (set to 0)
+
+         9 - 63:    Bits 9-63 of the offset into the image file at which the
+                    refcount block starts. Must be aligned to a cluster
+                    boundary.
+
+                    If this is 0, the corresponding refcount block has not yet
+                    been allocated. All refcounts managed by this refcount block
+                    are 0.
+
+Refcount block entry:
+
+    Bit  0 - 15:    Reference count of the cluster
+
+
+== Cluster mapping ==
+
+Just as for refcounts, qcow2 uses a two-level structure for the mapping of
+guest clusters to host clusters. They are called L1 and L2 tables.
+
+The L1 table has a variable size (stored in the header) and may use multiple
+clusters, however it must be contiguous in the image file. L2 tables are
+exactly one cluster in size.
+
+Given an offset into the virtual disk, the offset into the image file can be
+obtained as follows:
+
+    l2_entries = (cluster_size / sizeof(uint64_t))
+
+    l2_index = (offset / cluster_size) % l2_entries
+    l1_index = (offset / cluster_size) / l2_entries
+
+    l2_table = load_cluster(l1_table[l1_index]);
+    cluster_offset = l2_table[l2_index];
+
+    return cluster_offset + (offset % cluster_size)
+
+L1 table entry:
+
+    Bit  0 -  8:    Reserved (set to 0)
+
+         9 - 55:    Bits 9-55 of the offset into the image file at which the L2
+                    table starts. Must be aligned to a cluster boundary.
+
+        56 - 62:    Reserved (set to 0)
+
+             63:    0 for an L2 table that is unused or requires COW, 1 if its
+                    refcount is exactly one. This information is only accurate
+                    in the active L1 table.
+
+L2 table entry (for normal clusters):
+
+    Bit  0 -  8:    Reserved (set to 0)
+
+         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
+                    cluster boundary.
+
+        56 - 61:    Reserved (set to 0)
+
+             62:    0 (this cluster is not compressed)
+
+             63:    0 for a cluster that is unused or requires COW, 1 if its
+                    refcount is exactly one. This information is only accurate
+                    in L2 tables that are reachable from the active L1
+                    table.
+
+L2 table entry (for compressed clusters; x = 62 - (cluster_bits - 8)):
+
+    Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
+                    cluster boundary!
+
+         x - 61:    Number of 512-byte sectors used by the compressed data, minus 1
+
+             62:    1 (this cluster is compressed using zlib)
+
+             63:    0 for a cluster that is unused or requires COW, 1 if its
+                    refcount is exactly one. This information is only accurate
+                    in L2 tables that are reachable from the active L1
+                    table.
+
+
+== Snapshots ==
+
+qcow2 supports internal snapshots. Their basic principle of operation is to
+switch the active L1 table, so that a different set of host clusters is
+exposed to the guest.
+
+When creating a snapshot, the L1 table should be copied and the refcount of all
+L2 tables and clusters reachable from this L1 table must be increased, so that
+a write causes a COW and isn't visible in other snapshots.
+
+When loading a snapshot, bit 63 of all entries in the new active L1 table and
+all L2 tables referenced by it must be reconstructed from the refcount table
+as it doesn't need to be accurate in inactive L1 tables.
+
+A directory of all snapshots is stored in the snapshot table, a contiguous area
+in the image file, whose starting offset and length are given by the header
+fields snapshots_offset and nb_snapshots. The entries of the snapshot table
+have variable length, depending on the length of ID, name and extra data.
+
+Snapshot table entry:
+
+    Byte 0 -  7:    Offset into the image file at which the L1 table for the
+                    snapshot starts. Must be aligned to a cluster boundary.
+
+         8 - 11:    Number of entries in the L1 table of the snapshot
+
+        12 - 13:    Length of the unique ID string describing the snapshot
+
+        14 - 15:    Length of the name of the snapshot
+
+        16 - 19:    Time at which the snapshot was taken in seconds since the
+                    Epoch
+
+        20 - 23:    Subsecond part of the time at which the snapshot was taken
+                    in nanoseconds
+
+        24 - 31:    Time that the guest was running until the snapshot was
+                    taken in nanoseconds
+
+        32 - 35:    Size of the VM state in bytes. 0 if no VM state is saved.
+                    If there is VM state, it starts at the first cluster
+                    described by the first L1 table entry that doesn't describe
+                    a regular guest cluster (i.e. VM state is stored like guest
+                    disk content, except that it is stored at offsets that are
+                    larger than the virtual disk presented to the guest)
+
+        36 - 39:    Size of extra data in the table entry (used for future
+                    extensions of the format)
+
+        variable:   Extra data for future extensions. Must be ignored.
+
+        variable:   Unique ID string for the snapshot (not null terminated)
+
+        variable:   Name of the snapshot (not null terminated)
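The refcount lookup in the "Host cluster management" section derives both indices from the number of 16-bit entries in one refcount block (`refcount_block_entries`). A minimal C sketch of that lookup, under stated assumptions: `RefImage` and `get_refcount` are hypothetical names, the whole image is held in memory, the refcount table is already in host byte order, and its entry count is taken to be refcount_table_clusters * cluster_size / 8 because each refcount table entry is 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits 9-63 of a refcount table entry hold the refcount block offset. */
#define REFTABLE_OFFSET_MASK 0xfffffffffffffe00ULL

/* Hypothetical in-memory view of an image, for illustration only. */
typedef struct {
    const uint8_t *data;             /* the whole image file */
    size_t len;                      /* image file length in bytes */
    unsigned cluster_bits;
    const uint64_t *refcount_table;  /* already in host byte order */
    uint64_t refcount_table_entries; /* refcount_table_clusters * cluster_size / 8 */
} RefImage;

/* Returns the refcount of the host cluster containing 'offset', or -1 on error. */
static int32_t get_refcount(const RefImage *img, uint64_t offset)
{
    uint64_t cluster_size = 1ULL << img->cluster_bits;
    uint64_t refcount_block_entries = cluster_size / sizeof(uint16_t);
    uint64_t cluster_index = offset / cluster_size;

    /* Both indices are derived from refcount_block_entries. */
    uint64_t refcount_block_index = cluster_index % refcount_block_entries;
    uint64_t refcount_table_index = cluster_index / refcount_block_entries;
    if (refcount_table_index >= img->refcount_table_entries)
        return -1;

    uint64_t block_offset =
        img->refcount_table[refcount_table_index] & REFTABLE_OFFSET_MASK;
    if (block_offset == 0)
        return 0;   /* block not allocated: all refcounts it covers are 0 */
    if (block_offset > img->len || img->len - block_offset < cluster_size)
        return -1;  /* refcount block lies outside the file */

    /* Each refcount is a 16-bit big-endian value. */
    const uint8_t *p = img->data + block_offset + refcount_block_index * 2;
    return (int32_t)((p[0] << 8) | p[1]);
}
```

With 512-byte clusters one refcount block covers 256 host clusters, so the offset of cluster 300 falls into the second refcount table entry at index 44 of that block.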
This adds a description of the qcow2 file format to the docs/ directory.
Besides documenting what's there, which is never wrong, the document should
provide a good basis for the discussion of format extensions (called "qcow3"
in previous discussions)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 docs/specs/qcow2.txt |  228 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 228 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/qcow2.txt
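The two L2 entry layouts described in the patch can be told apart by bit 62. The sketch below decodes both; `L2Entry` and `decode_l2_entry` are hypothetical names. For compressed clusters it assumes that, with x = 62 - (cluster_bits - 8), the host offset occupies bits 0 to x-1 and bits x to 61 hold the number of 512-byte sectors used by the compressed data minus one. As far as we can tell this matches how QEMU decodes these entries (its `csize_shift` and `csize_mask` fields), but treat it as an assumption rather than a statement of the format.

```c
#include <stdint.h>

#define L2E_OFFSET_MASK 0x00fffffffffffe00ULL /* bits 9-55, normal clusters */
#define L2E_COMPRESSED  (1ULL << 62)
#define L2E_COPIED      (1ULL << 63)

typedef struct {
    int compressed;       /* bit 62 */
    int copied;           /* bit 63: refcount is exactly one */
    uint64_t host_offset; /* byte offset into the image file */
    uint64_t nb_sectors;  /* compressed clusters: 512-byte sectors to read */
} L2Entry;

static L2Entry decode_l2_entry(uint64_t entry, unsigned cluster_bits)
{
    L2Entry e = {0};
    e.copied = (entry & L2E_COPIED) != 0;
    e.compressed = (entry & L2E_COMPRESSED) != 0;
    if (!e.compressed) {
        e.host_offset = entry & L2E_OFFSET_MASK;
        return e;
    }

    unsigned x = 62 - (cluster_bits - 8);                 /* first size bit */
    uint64_t size_mask = (1ULL << (cluster_bits - 8)) - 1; /* bits x .. 61 */
    e.host_offset = entry & ((1ULL << x) - 1);            /* bits 0 .. x-1 */
    /* Stored value counts the sectors after the one holding host_offset. */
    e.nb_sectors = ((entry >> x) & size_mask) + 1;
    return e;
}
```

For example, with 64 KB clusters (cluster_bits = 16) x is 54, so the offset gets 54 bits and the sector count 8 bits, which is enough to describe compressed data spanning up to 256 sectors.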