
[v3,1/3] qcow2: Document some maximum size constraints

Message ID 20181113230319.1008531-2-eblake@redhat.com
State New

Commit Message

Eric Blake Nov. 13, 2018, 11:03 p.m. UTC
Although off_t permits up to 63 bits (8EB) of file offsets, in
practice, we're going to hit other limits first.  Document some
of those limits in the qcow2 spec, and how choice of cluster size
can influence some of the limits.

While we cannot map any virtual cluster to any address higher than
64 PB (56 bits) (due to the current L1/L2 field encoding stopping
at bit 55), the refcount table can currently be sized larger.  For
comparison, ext4 with 4k blocks caps files at 16PB.

Another interesting limit: for compressed clusters, the L2 layout
requires an ever-smaller maximum host offset as cluster size gets
larger, down to a 512 TB maximum with 2M clusters.

Signed-off-by: Eric Blake <eblake@redhat.com>

--
v8: don't try and limit refcount (R-b dropped)
v5: even more wording tweaks
v4: more wording tweaks
v3: new patch
---
 docs/interop/qcow2.txt | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

Comments

Alberto Garcia Nov. 15, 2018, 3:17 p.m. UTC | #1
On Wed 14 Nov 2018 12:03:17 AM CET, Eric Blake wrote:
> @@ -427,7 +451,9 @@ Standard Cluster Descriptor:
>  Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
>
>      Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
> -                    cluster or sector boundary!
> +                    cluster or sector boundary!  If cluster_bits is
> +                    small enough that this field includes bits beyond
> +                    55, those upper bits must be set to 0.

While I think that it's good to have a 56 bits upper limit for both
compressed and uncompressed clusters, I'm wondering: is it theoretically
possible to have data clusters above 64PB if they're all compressed?

Berto
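
For reference, the field width in the quoted descriptor can be tabulated directly from the formula x = 62 - (cluster_bits - 8). The following is only a sketch; the helper name and the chosen sample cluster sizes are illustrative and do not come from qemu:

```python
def compressed_offset_bits(cluster_bits: int) -> int:
    """Width x of the host-offset field in a compressed cluster descriptor,
    per the formula in the quoted spec text (hypothetical helper name)."""
    return 62 - (cluster_bits - 8)


# Bits 0..55 are the 56 bits that the L1/L2 encoding can address.
ADDRESSABLE_BITS = 56

for cluster_bits in (9, 13, 14, 16, 21):
    x = compressed_offset_bits(cluster_bits)
    # Bits of the field at positions 56 and above, which the patch
    # requires to be set to 0.
    beyond_55 = max(0, x - ADDRESSABLE_BITS)
    print(f"cluster_bits={cluster_bits:2d}  x={x:2d}  bits beyond 55: {beyond_55}")
```

Under this formula the field reaches past bit 55 only for cluster_bits of 13 or less (8k clusters and smaller), and shrinks to 49 bits (512 TB) at cluster_bits = 21.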
Eric Blake Nov. 15, 2018, 4:24 p.m. UTC | #2
On 11/15/18 9:17 AM, Alberto Garcia wrote:
> On Wed 14 Nov 2018 12:03:17 AM CET, Eric Blake wrote:
>> @@ -427,7 +451,9 @@ Standard Cluster Descriptor:
>>   Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
>>
>>       Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
>> -                    cluster or sector boundary!
>> +                    cluster or sector boundary!  If cluster_bits is
>> +                    small enough that this field includes bits beyond
>> +                    55, those upper bits must be set to 0.
> 
> While I think that it's good to have a 56 bits upper limit for both
> compressed and uncompressed clusters, I'm wondering: is it theoretically
> possible to have data clusters above 64PB if they're all compressed?

The question is only applicable for cluster sizes of 8k and smaller. 
With an 8k cluster image and the qcow2.h limit of a 32MiB L1 table (4096 
clusters, each of which holds 1024 L2 entries, and each L2 table holds 
1024 cluster entries), you can have up to 4k * 1k * 1k * 8k == 32T of 
guest size.  You'd need a LOT of metadata (for example, over 2000 
internal snapshots) before the host file would reach 64PB to even need 
to store compressed clusters at a host offset that large.  At the same 
time, qemu would limit you to an 8MiB refcount table (1024 clusters, 
each of which holds 1024 refblocks, which in turn hold a default of 4096 
refcounts, but with a refcount_order of 0 could hold 64k refcounts), 
which results in qemu allowing your maximum host offset to be 1k * 1k * 
64k * 8k == 512T, which means qemu will refuse to generate or open such 
an image in the first place.  So if you have an image that tries to 
store a compressed data cluster above host offset 64PB, qemu is unable 
to process that image.
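
The arithmetic above can be re-checked with a short script. This is a sketch only: the 32MiB L1 table and 8MiB refcount table limits are the figures quoted in this message, the 8-byte entry size is the qcow2 table entry size, and the function names are made up for illustration:

```python
KiB, MiB = 1 << 10, 1 << 20
ENTRY_SIZE = 8  # bytes per L1, L2 and refcount table entry


def max_guest_size(cluster_size: int, l1_table_bytes: int = 32 * MiB) -> int:
    """Largest guest size reachable through a fully populated L1 table."""
    l1_entries = l1_table_bytes // ENTRY_SIZE
    l2_entries = cluster_size // ENTRY_SIZE  # entries per L2 table
    return l1_entries * l2_entries * cluster_size


def max_host_offset(cluster_size: int, refcount_order: int,
                    reftable_bytes: int = 8 * MiB) -> int:
    """Largest host offset that a full refcount table can cover."""
    refblocks = reftable_bytes // ENTRY_SIZE
    refcount_bits = 1 << refcount_order
    refcounts_per_block = cluster_size * 8 // refcount_bits
    return refblocks * refcounts_per_block * cluster_size


cluster = 8 * KiB
guest = max_guest_size(cluster)                      # 2**45, i.e. 32T
host = max_host_offset(cluster, refcount_order=0)    # 2**49, i.e. 512T
copies_to_reach_64pb = (1 << 56) // guest            # 2048 full-size copies
print(guest.bit_length() - 1, host.bit_length() - 1, copies_to_reach_64pb)
```

With these assumptions the refcount table limit (49 bits) is reached well before the 56-bit host offset limit.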

But your question does mean this other part of my patch:

 >
 >            24 - 31:   size
 > -                    Virtual disk size in bytes
 > +                    Virtual disk size in bytes.
 > +
 > +                    Note: with a 2 MB cluster size, the maximum
 > +                    virtual size is 2 EB (61 bits) for a fully sparse
 > +                    file; however, L1/L2 table layouts limit an image
 > +                    to no more than 64 PB (56 bits) of populated
 > +                    clusters, and an image may hit other limits first
 > +                    (such as a file system's maximum size).  With a
 > +                    512 byte cluster size, the maximum virtual size
 > +                    drops to 128 GB (37 bits).

is misleading.  Elsewhere, we mention for cluster_bits:

                     Note: qemu as of today has an implementation limit of 2 MB
                     as the maximum cluster size and won't be able to open images
                     with larger cluster sizes.

and looking at the code in qcow2.h, the 2EB limit on maximum virtual 
size is NOT an inherent limit in the qcow2 file format, but rather a 
result of qemu's implementation refusing to size the L1 table larger 
than 32MiB.  If you allow a larger L1 table, you can get to larger 
virtual addresses.  So I need to fix this patch [again] to add in 
wording about this being a qemu limit.
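
To illustrate how the maximum virtual size depends on the L1 table limit rather than on the file format, here is a rough sketch. The 32MiB figure is the qemu limit mentioned above; the 64MiB value is purely hypothetical, and the function name is invented for this example:

```python
MiB = 1 << 20


def max_virtual_size_bits(cluster_bits: int, l1_table_bytes: int) -> int:
    """log2 of the largest virtual size addressable with the given L1 size."""
    l1_entries = l1_table_bytes // 8           # 8-byte L1 entries
    l2_entries = (1 << cluster_bits) // 8      # 8-byte L2 entries per table
    size = (l1_entries * l2_entries) << cluster_bits
    return size.bit_length() - 1


# qemu's current 32MiB L1 limit: 37 bits (128 GB) with 512-byte clusters,
# 61 bits (2 EB) with 2 MB clusters.
print(max_virtual_size_bits(9, 32 * MiB), max_virtual_size_bits(21, 32 * MiB))

# A hypothetical implementation allowing a 64MiB L1 table would reach
# 62 bits with 2 MB clusters.
print(max_virtual_size_bits(21, 64 * MiB))
```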

Patch

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 845d40a086d..89faf7b99f3 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -40,7 +40,16 @@  The first cluster of a qcow2 image contains the file header:
                     with larger cluster sizes.

          24 - 31:   size
-                    Virtual disk size in bytes
+                    Virtual disk size in bytes.
+
+                    Note: with a 2 MB cluster size, the maximum
+                    virtual size is 2 EB (61 bits) for a fully sparse
+                    file; however, L1/L2 table layouts limit an image
+                    to no more than 64 PB (56 bits) of populated
+                    clusters, and an image may hit other limits first
+                    (such as a file system's maximum size).  With a
+                    512 byte cluster size, the maximum virtual size
+                    drops to 128 GB (37 bits).

          32 - 35:   crypt_method
                     0 for no encryption
@@ -326,6 +335,11 @@  in the image file.
 It contains pointers to the second level structures which are called refcount
 blocks and are exactly one cluster in size.

+Although the refcount table can reserve clusters past 64 PB (56 bits)
+(assuming the underlying protocol can even be sized that large), note
+that some qcow2 metadata such as L1/L2 tables must point to clusters
+prior to that point.
+
 Given an offset into the image file, the refcount of its cluster can be
 obtained as follows:

@@ -365,6 +379,16 @@  The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
 exactly one cluster in size.

+The L1 and L2 tables have implications on the maximum virtual file
+size; a larger cluster size is required for the guest to have access
+to more space.  Furthermore, a virtual cluster must currently map to a
+host offset below 64 PB (56 bits) (although this limit could be
+relaxed by putting reserved bits into use).  Additionally, as cluster
+size increases, the maximum host offset for a compressed cluster is
+reduced (a 2M cluster size requires compressed clusters to reside
+below 512 TB (49 bits), and this limit cannot be relaxed without an
+incompatible layout change).
+
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:

@@ -427,7 +451,9 @@  Standard Cluster Descriptor:
 Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):

     Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
-                    cluster or sector boundary!
+                    cluster or sector boundary!  If cluster_bits is
+                    small enough that this field includes bits beyond
+                    55, those upper bits must be set to 0.

          x - 61:    Number of additional 512-byte sectors used for the
                     compressed data, beyond the sector containing the offset