Patchwork [RFC,V6,01/33] qcow2: Add deduplication to the qcow2 specification.

login
register
mail settings
Submitter Benoît Canet
Date Feb. 6, 2013, 12:31 p.m.
Message ID <1360153926-9492-2-git-send-email-benoit@irqsave.net>
Download mbox | patch
Permalink /patch/218618/
State New
Headers show

Comments

Benoît Canet - Feb. 6, 2013, 12:31 p.m.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 docs/specs/qcow2.txt |  105 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 103 insertions(+), 2 deletions(-)
Stefan Hajnoczi - Feb. 6, 2013, 4:07 p.m.
On Wed, Feb 06, 2013 at 01:31:34PM +0100, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  docs/specs/qcow2.txt |  105 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 103 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 36a559d..8e52de1 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -80,7 +80,12 @@ in the description of a field.
>                                  tables to repair refcounts before accessing the
>                                  image.
>  
> -                    Bits 1-63:  Reserved (set to 0)
> +                    Bit 1:      Deduplication bit.  If this bit is set then
> +                                deduplication is used on this image.
> +                                L2 tables size 64KB is different from
> +                                cluster size 4KB.

Does that mean L2 table size is fixed at 64KB when dedup is enabled with
no way for the user to set it?

I guess this is okay but please document it explicitly.

> +
> +                    Bits 2-63:  Reserved (set to 0)
>  
>           80 -  87:  compatible_features
>                      Bitmask of compatible features. An implementation can
> @@ -116,6 +121,7 @@ be stored. Each extension has a structure like the following:
>                          0x00000000 - End of the header extension area
>                          0xE2792ACA - Backing file format name
>                          0x6803f857 - Feature name table
> +                        0xCD8E819B - Deduplication
>                          other      - Unknown header extension, can be safely
>                                       ignored
>  
> @@ -159,6 +165,101 @@ the header extension data. Each entry look like this:
>                      terminated if it has full length)
>  
>  
> +== Deduplication ==
> +
> +The deduplication extension contains information concerning deduplication.
> +
> +    Byte   0 - 7:   Offset of the RAM deduplication table (RAM lookup)
> +
> +          8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
> +                    pointers
> +
> +              12:   Hash algo enum field
> +                        0: SHA-256
> +                        1: SHA3
> +                        2: SKEIN-256
> +
> +              13:   Dedup strategies bitmap
> +                        0: RAM based hash lookup (always set to 1 for now)
> +                        1: Disk based hash lookup
> +                        2: Deduplication running if set to 1

I don't understand the purpose of bit 2.  Please document it further.

> +
> +        14 - 69:    Set to zero and reserved for future use
> +
> +Disk based lookup structure will be described in a future QCOW2 specification.

In that case please leave out the disk based lookup bit for now.  Add it
later when you submit the disk based hash lookup patches.

> +
> +== Deduplication table (RAM method) ==
> +
> +The deduplication table maps a physical offset to a data hash and
> +logical offset. It is used to permanently store the information to
> +do the deduplication. It is loaded at startup into a RAM based representation
> +used to do the lookups.
> +
> +The deduplication table contains 64-bit offsets to the level 2 deduplication
> +table blocks.
> +Each entry of these blocks contains a 32-byte SHA256 hash followed by the

Refine this to explain how Skein and SHA3 fit into the structure?

> +64-bit logical offset of the first encountered cluster having this hash.

I'm not sure if "first encountered" is always correct.  After the first
encountered logical cluster changes contents, what happens?

> +== Deduplication table schematic (RAM method) ==
> +
> +0       l1_dedup_index                                              Size
> +              |
> +|--------------------------------------------------------------------|
> +|             |                                                      |
> +|             |        L1 Deduplication table                        |
> +|             |                                                      |
> +|--------------------------------------------------------------------|
> +              |
> +              |
> +              |
> +0             |           l2_dedup_block_entries
> +              |
> +|---------------------------------|
> +|                                 |
> +|    L2 deduplication block       |
> +|                                 |
> +|                 l2_dedup_index  |
> +|---------------------------------|
> +                         |
> +         0               |              40
> +                         |
> +         |-------------------------------|
> +         |                               |
> +         |    Deduplication table entry  |
> +         |                               |
> +         |-------------------------------|
> +
> +
> +== Deduplication table entry description (RAM method) ==
> +
> +Each L2 deduplication table entry has the following structure:
> +
> +    Byte  0 - 31:   hash of data cluster
> +
> +         32 - 39:   Logical offset of first encountered block having
> +                    this hash
> +
> +== Deduplication table arithmetics (RAM method) ==
> +
> +cluster_size = 4096
> +dedup_block_size = 65536 * 5
> +l2_size = 65536 * 16 (16 factor is from the smaller cluster_size)
> +refcount_order must be >= 4
> +
> +Entries in the deduplication table are ordered by physical cluster index.
> +
> +The number of entries in an l2 deduplication table block is :
> +l2_dedup_block_entries = FLOOR(dedup_block_size / (32 + 8))
> +
> +The index in the level 1 deduplication table is :
> +l1_dedup_index = physical_cluster_index / l2_block_cluster_entries
> +
> +The index in the level 2 deduplication table is:
> +l2_dedup_index = physical_cluster_index % l2_block_cluster_entries
> +
> +The 16 remaining bytes in each l2 deduplication blocks are set to zero and
> +reserved for a future usage.
> +
>  == Host cluster management ==
>  
>  qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -211,7 +312,7 @@ guest clusters to host clusters. They are called L1 and L2 table.
>  
>  The L1 table has a variable size (stored in the header) and may use multiple
>  clusters, however it must be contiguous in the image file. L2 tables are
> -exactly one cluster in size.
> +exactly one cluster in size excepted for the deduplication case.

Please either define their size for the dedup case or refer to where in
the spec the value can be found.
Eric Blake - Feb. 6, 2013, 4:31 p.m.
On 02/06/2013 05:31 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  docs/specs/qcow2.txt |  105 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 103 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 36a559d..8e52de1 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -80,7 +80,12 @@ in the description of a field.
>                                  tables to repair refcounts before accessing the
>                                  image.
>  
> -                    Bits 1-63:  Reserved (set to 0)
> +                    Bit 1:      Deduplication bit.  If this bit is set then
> +                                deduplication is used on this image.
> +                                L2 tables size 64KB is different from
> +                                cluster size 4KB.

If you are going to enforce other aspects of the file when deduplication
is enabled, such as forcing a cluster size of 4kb and forcing the use of
a 16-bit refcount on each cluster, then spell it out here.

> +
> +              13:   Dedup strategies bitmap
> +                        0: RAM based hash lookup (always set to 1 for now)
> +                        1: Disk based hash lookup
> +                        2: Deduplication running if set to 1
> +
> +        14 - 69:    Set to zero and reserved for future use
> +
> +Disk based lookup structure will be described in a future QCOW2 specification.

It looks weird to define bit 0 and 2 as the only valid bits, while
reserving bit 1 but stating it cannot be used.  Also, in the future, can
you use both RAM and disk lookup simultaneously, or are you forced to
choose only one or the other?  Depending on how you answer that, I'd
recommend one of these two approaches:

Mutually exclusive:

Now:
13: Dedup flags
    0: Deduplication is running if set to 1
Future patch:
13: Dedup flags
    0: Deduplication is running if set to 1
    1: hash lookup mode: 0 for RAM, 1 for disk

Both supported at once:

Now:
13: Dedup flags
    0: Deduplication is running if set to 1
    1: use RAM lookup (always set to 1 for now)
Future patch:
13: Dedup flags
    0: Deduplication is running if set to 1
    1: use RAM lookup if set to 1
    2: use disk lookup if set to 1
    Bits 1 and 2 cannot both be 0.

> +== Deduplication table arithmetics (RAM method) ==
> +
> +cluster_size = 4096
> +dedup_block_size = 65536 * 5
> +l2_size = 65536 * 16 (16 factor is from the smaller cluster_size)
> +refcount_order must be >= 4

Make restrictions on cluster_size and refcount_order more explicit at
the point where you introduce the deduplication feature bit.

> +
> +The 16 remaining bytes in each l2 deduplication blocks are set to zero and

s/blocks/block/

>  
>  The L1 table has a variable size (stored in the header) and may use multiple
>  clusters, however it must be contiguous in the image file. L2 tables are
> -exactly one cluster in size.
> +exactly one cluster in size excepted for the deduplication case.

s/size excepted/size; except/
s/case./case, where clusters are fixed at 4k and the L2 table is fixed
at 64k./

Patch

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..8e52de1 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@  in the description of a field.
                                 tables to repair refcounts before accessing the
                                 image.
 
-                    Bits 1-63:  Reserved (set to 0)
+                    Bit 1:      Deduplication bit.  If this bit is set then
+                                deduplication is used on this image.
+                                L2 tables size 64KB is different from
+                                cluster size 4KB.
+
+                    Bits 2-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -116,6 +121,7 @@  be stored. Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0xCD8E819B - Deduplication
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -159,6 +165,101 @@  the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Deduplication ==
+
+The deduplication extension contains information concerning deduplication.
+
+    Byte   0 - 7:   Offset of the RAM deduplication table (RAM lookup)
+
+          8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
+                    pointers
+
+              12:   Hash algo enum field
+                        0: SHA-256
+                        1: SHA3
+                        2: SKEIN-256
+
+              13:   Dedup strategies bitmap
+                        0: RAM based hash lookup (always set to 1 for now)
+                        1: Disk based hash lookup
+                        2: Deduplication running if set to 1
+
+        14 - 69:    Set to zero and reserved for future use
+
+Disk based lookup structure will be described in a future QCOW2 specification.
+
+== Deduplication table (RAM method) ==
+
+The deduplication table maps a physical offset to a data hash and
+logical offset. It is used to permanently store the information to
+do the deduplication. It is loaded at startup into a RAM based representation
+used to do the lookups.
+
+The deduplication table contains 64-bit offsets to the level 2 deduplication
+table blocks.
+Each entry of these blocks contains a 32-byte SHA256 hash followed by the
+64-bit logical offset of the first encountered cluster having this hash.
+
+== Deduplication table schematic (RAM method) ==
+
+0       l1_dedup_index                                              Size
+              |
+|--------------------------------------------------------------------|
+|             |                                                      |
+|             |        L1 Deduplication table                        |
+|             |                                                      |
+|--------------------------------------------------------------------|
+              |
+              |
+              |
+0             |           l2_dedup_block_entries
+              |
+|---------------------------------|
+|                                 |
+|    L2 deduplication block       |
+|                                 |
+|                 l2_dedup_index  |
+|---------------------------------|
+                         |
+         0               |              40
+                         |
+         |-------------------------------|
+         |                               |
+         |    Deduplication table entry  |
+         |                               |
+         |-------------------------------|
+
+
+== Deduplication table entry description (RAM method) ==
+
+Each L2 deduplication table entry has the following structure:
+
+    Byte  0 - 31:   hash of data cluster
+
+         32 - 39:   Logical offset of first encountered block having
+                    this hash
+
+== Deduplication table arithmetics (RAM method) ==
+
+cluster_size = 4096
+dedup_block_size = 65536 * 5
+l2_size = 65536 * 16 (16 factor is from the smaller cluster_size)
+refcount_order must be >= 4
+
+Entries in the deduplication table are ordered by physical cluster index.
+
+The number of entries in an l2 deduplication table block is :
+l2_dedup_block_entries = FLOOR(dedup_block_size / (32 + 8))
+
+The index in the level 1 deduplication table is :
+l1_dedup_index = physical_cluster_index / l2_block_cluster_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_block_cluster_entries
+
+The 16 remaining bytes in each l2 deduplication blocks are set to zero and
+reserved for a future usage.
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -211,7 +312,7 @@  guest clusters to host clusters. They are called L1 and L2 table.
 
 The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
-exactly one cluster in size.
+exactly one cluster in size excepted for the deduplication case.
 
 Given a offset into the virtual disk, the offset into the image file can be
 obtained as follows: