diff mbox

[V4,01/10] specs/qcow2: add compress format extension

Message ID 1500560441-5670-2-git-send-email-pl@kamp.de
State New
Headers show

Commit Message

Peter Lieven July 20, 2017, 2:20 p.m. UTC
Signed-off-by: Peter Lieven <pl@kamp.de>
---
 docs/interop/qcow2.txt | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

Comments

Eric Blake July 20, 2017, 3:52 p.m. UTC | #1
On 07/20/2017 09:20 AM, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  docs/interop/qcow2.txt | 43 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 42 insertions(+), 1 deletion(-)
> 

> +== Compress format extension ==
> +
> +The compress format extension is an optional header extension. It provides
> +the ability to specify the compress algorithm and compress parameters
> +that are used for compressed clusters. This new header MUST be present if
> +the incompatible-feature bit "compress format bit" is set and MUST be absent
> +otherwise.
> +
> +Note: Ommitting the incompatible "Compress format bit" results in the usage

s/Ommitting/Omitting/

> +of zlib compression with default compression level (default before QEMU 2.10).
> +However, this old default has a smaller compression window size which results in
> +lower compression ratio and slightly worse compression speed compared to
> +explicity specifying zlib compression with default compression level in the

s/explicity/explicitly/

> +compress format extension.

If window size affects decompression, then we absolutely must specify
what window size is in use, both for the old-style compression (bug that
we haven't mentioned window size in the past), and for the new-style.
That is, if you have to tell zlib at decompression time what size window
was used at compression, then our choices for window size should be
mentioned in this spec.  Furthermore, if window size matters, it sounds
like it should be a zlib-specific tunable.  I really would like to be
able to document that having the extension header present with
parameters XYZ is the same as omitting the extension header (but that is
only doable if window size is a tunable parameter).

I haven't yet checked your code implementation to see where you are
setting window sizes, to know if window size is something that should be
a tunable in the file format.
Peter Lieven July 20, 2017, 4:26 p.m. UTC | #2
Am 20.07.2017 um 17:52 schrieb Eric Blake:
> On 07/20/2017 09:20 AM, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  docs/interop/qcow2.txt | 43 ++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 42 insertions(+), 1 deletion(-)
>>
>> +== Compress format extension ==
>> +
>> +The compress format extension is an optional header extension. It provides
>> +the ability to specify the compress algorithm and compress parameters
>> +that are used for compressed clusters. This new header MUST be present if
>> +the incompatible-feature bit "compress format bit" is set and MUST be absent
>> +otherwise.
>> +
>> +Note: Ommitting the incompatible "Compress format bit" results in the usage
> s/Ommitting/Omitting/
>
>> +of zlib compression with default compression level (default before QEMU 2.10).
>> +However, this old default has a smaller compression window size which results in
>> +lower compression ratio and slightly worse compression speed compared to
>> +explicity specifying zlib compression with default compression level in the
> s/explicity/explicitly/
>
>> +compress format extension.
> If window size affects decompression, then we absolutely must specify
> what window size is in use, both for the old-style compression (bug that
> we haven't mentioned window size in the past), and for the new-style.
> That is, if you have to tell zlib at decompression time what size window
> was used at compression, then our choices for window size should be
> mentioned in this spec.  Furthermore, if window size matters, it sounds
> like it should be a zlib-specific tunable.  I really would like to be
> able to document that having the extension header present with
> parameters XYZ is the same as omitting the extension header (but that is
> only doable if window size is a tunable parameter).
>
> I haven't yet checked your code implementation to see where you are
> setting window sizes, to know if window size is something that should be
> a tunable in the file format.
>
You don't have to know the window size that was used. The docs of zlib

just say, that the windowSize specified at inflate must be greater or equal

to the one used at deflate. So if we change the code to use 15 we can safely

decompress old clusters that where compressed with 12.


Peter
Eric Blake July 20, 2017, 6:31 p.m. UTC | #3
On 07/20/2017 11:26 AM, Peter Lieven wrote:
>> I haven't yet checked your code implementation to see where you are
>> setting window sizes, to know if window size is something that should be
>> a tunable in the file format.
>>
> You don't have to know the window size that was used. The docs of zlib
> 
> just say, that the windowSize specified at inflate must be greater or equal
> 
> to the one used at deflate. So if we change the code to use 15 we can safely
> 
> decompress old clusters that where compressed with 12.

Then you DO have to know the window size.  If I write my own qcow2
parser, and don't know what size you picked on compression, then how do
I know that the size I pick is >= your size during my decompression?

Even if you DON'T want size to be a tunable (and I'm on the fence on
that one), you STILL need the spec to state that for maximum
compatibility, when the incompatible feature is off, implementations
MUST use window size 12 for compression; and that when the compression
extension is used, implementations MUST use window size 15.

Or, by having window size be a tunable in the extension header allows
implementations to record window size, then you allow implemenations to
explicitly KNOW what window size they must meet or exceed.
Peter Lieven July 20, 2017, 6:59 p.m. UTC | #4
Am 20.07.2017 um 20:31 schrieb Eric Blake:
> On 07/20/2017 11:26 AM, Peter Lieven wrote:
>>> I haven't yet checked your code implementation to see where you are
>>> setting window sizes, to know if window size is something that should be
>>> a tunable in the file format.
>>>
>> You don't have to know the window size that was used. The docs of zlib
>>
>> just say, that the windowSize specified at inflate must be greater or equal
>>
>> to the one used at deflate. So if we change the code to use 15 we can safely
>>
>> decompress old clusters that where compressed with 12.
> Then you DO have to know the window size.  If I write my own qcow2
> parser, and don't know what size you picked on compression, then how do
> I know that the size I pick is >= your size during my decompression?
>
> Even if you DON'T want size to be a tunable (and I'm on the fence on
> that one), you STILL need the spec to state that for maximum
> compatibility, when the incompatible feature is off, implementations
> MUST use window size 12 for compression; and that when the compression
> extension is used, implementations MUST use window size 15.
>
> Or, by having window size be a tunable in the extension header allows
> implementations to record window size, then you allow implemenations to
> explicitly KNOW what window size they must meet or exceed.
>

But they still have to now what values to use if the header is absent.


Bottom line, you are right. Its cleaner to have that value in the header and

default it to zlib, level 0, windowBits 12 if the header is absent and on the other

side don't write the header if these values are used.


I checked the spec again expect for the windowBits nothing else is passed as

a parameter to inflateinit2.


Here is what is written about the windowBits provided to inflateInit2:

" The windowBits parameter is the base two logarithm of the maximum window size (the size of the history buffer). It should be in the range 8..15 for this version of the library. The default value is 15 if inflateInit is used instead. windowBits must be greater than or equal to the windowBitsvalue provided to deflateInit2() while compressing, or it must be equal to 15 if deflateInit2() was not used. If a compressed stream with a larger window size is given as input, inflate() will return with the error code Z_DATA_ERROR instead of trying to allocate a larger window."


I checked through the parameters of some compression algorithms. Most obviously have a positive interger for a compression level. And at least some have not a window as zlib, but a dictionary size which is also a power of two. So it might be reasonable to have that second

uint8_t also in the default header for a window/dictionary size exponent.


So it would propose to update the header like this:


    Byte  0 - 13:  compress_format_name
              14:  compress_level (uint8_t)
              15:  compress_window_exp (uint8_t)


Peter
diff mbox

Patch

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index d7fdb1f..ef5abb2 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -86,7 +86,12 @@  in the description of a field.
                                 be written to (unless for regaining
                                 consistency).
 
-                    Bits 2-63:  Reserved (set to 0)
+                    Bit 2:      Compress format bit.  If and only if this bit
+                                is set then the compress format extension
+                                MUST be present and MUST be parsed and checked
+                                for compatibility.
+
+                    Bits 3-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -137,6 +142,7 @@  be stored. Each extension has a structure like the following:
                         0x6803f857 - Feature name table
                         0x23852875 - Bitmaps extension
                         0x0537be77 - Full disk encryption header pointer
+                        0xC03183A3 - Compress format extension
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -311,6 +317,41 @@  The algorithms used for encryption vary depending on the method
    in the LUKS header, with the physical disk sector as the
    input tweak.
 
+
+== Compress format extension ==
+
+The compress format extension is an optional header extension. It provides
+the ability to specify the compress algorithm and compress parameters
+that are used for compressed clusters. This new header MUST be present if
+the incompatible-feature bit "compress format bit" is set and MUST be absent
+otherwise.
+
+Note: Ommitting the incompatible "Compress format bit" results in the usage
+of zlib compression with default compression level (default before QEMU 2.10).
+However, this old default has a smaller compression window size which results in
+lower compression ratio and slightly worse compression speed compared to
+explicity specifying zlib compression with default compression level in the
+compress format extension.
+
+The fields of the compress format extension are:
+
+    Byte  0 - 14:  compress_format_name (padded with zeros, but not
+                   necessarily null terminated if it has full length).
+                   Valid compression format names currently are:
+
+                   zlib: Standard zlib deflate compression
+
+              15:  compress_level (uint8_t)
+
+                   0 = default compress level (valid for all formats, default)
+
+                   Additional valid compression levels for zlib compression:
+
+                   All values between 1 and 9. 1 gives best speed, 9 gives best
+                   compression. The default compression level is defined by zlib
+                   and currently defaults to 6.
+
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count