[RFC,V5,01/62] qcow2: Add deduplication to the qcow2 specification.

Message ID 1358351321-4891-2-git-send-email-benoit@irqsave.net
State New

Commit Message

Benoît Canet Jan. 16, 2013, 3:47 p.m. UTC
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 docs/specs/qcow2.txt |  104 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 2 deletions(-)

Comments

Eric Blake Jan. 16, 2013, 4:43 p.m. UTC | #1
On 01/16/2013 08:47 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  docs/specs/qcow2.txt |  104 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 102 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 36a559d..d5f8072 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -80,7 +80,12 @@ in the description of a field.
>                                  tables to repair refcounts before accessing the
>                                  image.
>  
> -                    Bits 1-63:  Reserved (set to 0)
> +                    Bit 1:      Deduplication bit.  If this bit is set then
> +                                deduplication is used on this image.

If this bit is set, you probably want to require that the deduplication
header extension is present.

> +                                L2 tables size 64KB is different from
> +                                cluster size 4KB.

I'm still not sure what this sentence means.  Remember, cluster size of
normal disk data is configurable; are you stating that if dedup is in
effect, then the cluster size MUST be fixed at 4k (or in other words,
that header offsets 20-23 [cluster_bits] must be exactly 12)?  And my
understanding is that with dedup, there are now two L1 and L2 tables -
the normal tables to get at the actual logical data, and the dedup
tables for getting at the hashes.  Are you stating that both L2 tables
are 64k, or that just the dedup L2 is 64k?

>  
> +== Deduplication ==
> +
> +The deduplication extension contains information concerning deduplication.

Just as I suggested above that setting the deduplication feature bit
should require this extension to be present, I would probably also
require here that this extension not be present unless the
deduplication feature bit is set.

> +
> +    Byte   0 - 7:   Offset of the RAM deduplication table (RAM lookup)
> +
> +          8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
> +                    pointers
> +
> +              12:   Hash algo enum field
> +                        0: SHA-256
> +                        1: SHA3
> +                        2: SKEIN-256
> +
> +              13:   Dedup strategies bitmap
> +                        0: RAM based hash lookup (always set to 1 for now)
> +                        1: Disk based hash lookup

Are these two bits mutually exclusive, or can they both be used at once?

> +                        2: Deduplication running if set to 1
> +
> +        14 - 69:    Set to zero and reserved for future use
> +
> +Disk based lookup structure will be described in a future QCOW2 specification.

If so, it may be better to document in this revision of the file that
the disk-based hash lookup strategy bit must always be 0 for now.
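
To make the consistency rules suggested so far concrete, here is a rough
sketch of a checker, not a proposed implementation. It assumes the
70-byte extension layout from the patch, big-endian fields as in the
rest of qcow2, and that strategy "0", "1" and "2" are bit numbers
counted from the least significant bit. The function and constant names
are made up for illustration.

```python
import struct

DEDUP_FEATURE_BIT = 1 << 1      # incompatible_features bit 1, per the patch
DEDUP_EXT_FORMAT = ">QIBB56s"   # offset, L1 entries, algo, strategies, reserved
HASH_ALGOS = {0: "SHA-256", 1: "SHA3", 2: "SKEIN-256"}

# Assumption: strategy numbers in the patch are bit positions, LSB first.
STRATEGY_RAM = 1 << 0
STRATEGY_DISK = 1 << 1
STRATEGY_RUNNING = 1 << 2


def check_dedup(incompatible_features, ext_data):
    """Return a list of problems; ext_data is None if the extension is absent."""
    problems = []
    bit_set = bool(incompatible_features & DEDUP_FEATURE_BIT)
    if bit_set and ext_data is None:
        problems.append("dedup bit set but extension missing")
    if not bit_set and ext_data is not None:
        problems.append("extension present but dedup bit clear")
    if ext_data is None:
        return problems
    if len(ext_data) != struct.calcsize(DEDUP_EXT_FORMAT):
        problems.append("unexpected extension length")
        return problems
    _offset, _l1_entries, algo, strategies, reserved = struct.unpack(
        DEDUP_EXT_FORMAT, ext_data)
    if algo not in HASH_ALGOS:
        problems.append("unknown hash algorithm")
    if not strategies & STRATEGY_RAM:
        problems.append("RAM lookup bit must be set")
    if strategies & STRATEGY_DISK:
        problems.append("disk lookup bit must be 0 for now")
    if any(reserved):
        problems.append("reserved bytes must be zero")
    return problems
```

Whether both strategy bits may ever be set together is exactly the open
question above; the sketch simply rejects the disk bit until the spec
says otherwise.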

> +
> +== Deduplication table (RAM method) ==
> +

>  == Host cluster management ==
>  
>  qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -211,7 +311,7 @@ guest clusters to host clusters. They are called L1 and L2 table.
>  
>  The L1 table has a variable size (stored in the header) and may use multiple
>  clusters, however it must be contiguous in the image file. L2 tables are
> -exactly one cluster in size.
> +exactly one cluster in size excepted for the deduplication case.

s/excepted/except/ - and again, is this for all L2 tables, or just the
dedup L2 tables?
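
For reference when answering the size questions, the "Deduplication
table arithmetics" section of the patch can be sketched as below. This
is only an illustration under assumptions: the helper names are
invented, and the divisor in the two index formulas is taken to be the
per-block entry count (l2_dedup_block_entries).

```python
ENTRY_SIZE = 32 + 8  # 32-byte hash followed by a 64-bit logical offset


def entries_per_block(dedup_block_size):
    """l2_dedup_block_entries = FLOOR(dedup_block_size / (32 + 8))."""
    return dedup_block_size // ENTRY_SIZE


def leftover_bytes(dedup_block_size):
    """Bytes at the end of a block that cannot hold a whole entry."""
    return dedup_block_size % ENTRY_SIZE


def dedup_indexes(physical_cluster_index, dedup_block_size):
    """Return (l1_dedup_index, l2_dedup_index) for a physical cluster."""
    n = entries_per_block(dedup_block_size)
    return physical_cluster_index // n, physical_cluster_index % n


print(entries_per_block(65536 * 5), leftover_bytes(65536 * 5))  # 8192 0
print(entries_per_block(65536), leftover_bytes(65536))          # 1638 16
print(dedup_indexes(10000, 65536 * 5))                          # (1, 1808)
```

Under this reading a block of 65536 * 5 bytes divides evenly into 40-byte
entries, while a 65536-byte block leaves 16 bytes over, so it may be
worth confirming which block size the "16 remaining bytes" sentence
refers to.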

Patch

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..d5f8072 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@  in the description of a field.
                                 tables to repair refcounts before accessing the
                                 image.
 
-                    Bits 1-63:  Reserved (set to 0)
+                    Bit 1:      Deduplication bit.  If this bit is set then
+                                deduplication is used on this image.
+                                L2 tables size 64KB is different from
+                                cluster size 4KB.
+
+                    Bits 2-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -116,6 +121,7 @@  be stored. Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0xCD8E819B - Deduplication
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -159,6 +165,100 @@  the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Deduplication ==
+
+The deduplication extension contains information concerning deduplication.
+
+    Byte   0 - 7:   Offset of the RAM deduplication table (RAM lookup)
+
+          8 - 11:   Size of the RAM deduplication table = number of L1 64-bit
+                    pointers
+
+              12:   Hash algo enum field
+                        0: SHA-256
+                        1: SHA3
+                        2: SKEIN-256
+
+              13:   Dedup strategies bitmap
+                        0: RAM based hash lookup (always set to 1 for now)
+                        1: Disk based hash lookup
+                        2: Deduplication running if set to 1
+
+        14 - 69:    Set to zero and reserved for future use
+
+Disk based lookup structure will be described in a future QCOW2 specification.
+
+== Deduplication table (RAM method) ==
+
+The deduplication table maps a physical offset to a data hash and
+logical offset. It is used to permanently store the information to
+do the deduplication. It is loaded at startup into a RAM based representation
+used to do the lookups.
+
+The deduplication table contains 64-bit offsets to the level 2 deduplication
+table blocks.
+Each entry of these blocks contains a 32-byte SHA256 hash followed by the
+64-bit logical offset of the first encountered cluster having this hash.
+
+== Deduplication table schematic (RAM method) ==
+
+0       l1_dedup_index                                              Size
+              |
+|--------------------------------------------------------------------|
+|             |                                                      |
+|             |        L1 Deduplication table                        |
+|             |                                                      |
+|--------------------------------------------------------------------|
+              |
+              |
+              |
+0             |           l2_dedup_block_entries
+              |
+|---------------------------------|
+|                                 |
+|    L2 deduplication block       |
+|                                 |
+|                 l2_dedup_index  |
+|---------------------------------|
+                         |
+         0               |              40
+                         |
+         |-------------------------------|
+         |                               |
+         |    Deduplication table entry  |
+         |                               |
+         |-------------------------------|
+
+
+== Deduplication table entry description (RAM method) ==
+
+Each L2 deduplication table entry has the following structure:
+
+    Byte  0 - 31:   hash of data cluster
+
+         32 - 39:   Logical offset of first encountered block having
+                    this hash
+
+== Deduplication table arithmetics (RAM method) ==
+
+cluster_size = 4096
+dedup_block_size = 65536 * 5
+l2_size = 65536 * 16 (16 factor is from the smaller cluster_size)
+
+Entries in the deduplication table are ordered by physical cluster index.
+
+The number of entries in an l2 deduplication table block is :
+l2_dedup_block_entries = FLOOR(dedup_block_size / (32 + 8))
+
+The index in the level 1 deduplication table is :
+l1_dedup_index = physical_cluster_index / l2_block_cluster_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_block_cluster_entries
+
+The 16 remaining bytes in each l2 deduplication blocks are set to zero and
+reserved for a future usage.
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -211,7 +311,7 @@  guest clusters to host clusters. They are called L1 and L2 table.
 
 The L1 table has a variable size (stored in the header) and may use multiple
 clusters, however it must be contiguous in the image file. L2 tables are
-exactly one cluster in size.
+exactly one cluster in size excepted for the deduplication case.
 
 Given a offset into the virtual disk, the offset into the image file can be
 obtained as follows: