diff mbox

[RFC,V3,01/24] qcow2: Add deduplication to the qcow2 specification.

Message ID 1353935123-24199-2-git-send-email-benoit@irqsave.net
State New
Headers show

Commit Message

Benoît Canet Nov. 26, 2012, 1:05 p.m. UTC
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 docs/specs/qcow2.txt |   33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

Comments

Stefan Hajnoczi Dec. 11, 2012, 11:28 a.m. UTC | #1
On Mon, Nov 26, 2012 at 02:05:00PM +0100, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  docs/specs/qcow2.txt |   33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 36a559d..16eafd7 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -80,7 +80,10 @@ in the description of a field.
>                                  tables to repair refcounts before accessing the
>                                  image.
>  
> -                    Bits 1-63:  Reserved (set to 0)
> +                    Bit 1:      Deduplication bit.  If this bit is set then
> +                                deduplication is used on this image.
> +
> +                    Bits 2-63:  Reserved (set to 0)
>  
>           80 -  87:  compatible_features
>                      Bitmask of compatible features. An implementation can

This bit prevents programs that don't support dedup from opening the
image file.  What are the restrictions really - can a program without
dedup support read the file?  Can it write to the file (invalidating the
dedup table)?

> @@ -116,6 +119,7 @@ be stored. Each extension has a structure like the following:
>                          0x00000000 - End of the header extension area
>                          0xE2792ACA - Backing file format name
>                          0x6803f857 - Feature name table
> +                        0xCD8E819B - Deduplication
>                          other      - Unknown header extension, can be safely
>                                       ignored
>  
> @@ -159,6 +163,33 @@ the header extension data. Each entry look like this:
>                      terminated if it has full length)
>  
>  
> +== Deduplication ==
> +
> +The deduplication extension contains the offset and size of the deduplication
> +table.
> +
> +    Byte   0 - 7:   Offset
> +
> +          8 - 11:   Size

Units?

> +
> +== Deduplication table ==

Before going into the layout please summarize the point of this table:

The deduplication table maps a physical offset to a data hash and
logical offset.  ...

> +The deduplication table contains 64 bits offsets to the level 2 deduplication
> +table clusters.
> +Each entry of these clusters contains a 32 bytes SHA256 hash followed by the
> +64 bits logical offset of the first encountered block having this hash.

At this point a diagram showing L1, L2, and dedup table entry would
help.

Or perhaps the entry structure can be presented like other structures in
this spec to reduce the amount of English description and use a more
formal reference:

Each L2 deduplication table entry has the following structure:

    Byte  0 - 31:   SHA256 hash of data cluster

         32 - 39:   Logical offset of first encountered block having
                    this hash

> +Entries in the deduplication table are orderered by physical cluster index.
> +
> +The number of entries in an l2 deduplication table cluster is :
> +l2_dedup_cluster_entries = cluster_size / (32 + 8)
> +
> +The index in the level 1 deduplication table is :
> +l1_dedup_index = physical_cluster_index / l2_dedup_cluster_entries
> +
> +The index in the level 2 deduplication table is:
> +l2_dedup_index = physical_cluster_index % l2_dedup_cluster_entries
> +
>  == Host cluster management ==
>  
>  qcow2 manages the allocation of host clusters by maintaining a reference count
> -- 
> 1.7.10.4
> 
>
Stefan Hajnoczi Dec. 11, 2012, 11:32 a.m. UTC | #2
On Mon, Nov 26, 2012 at 02:05:00PM +0100, Benoît Canet wrote:
> +== Deduplication table ==
> +
> +The deduplication table contains 64 bits offsets to the level 2 deduplication
> +table clusters.
> +Each entry of these clusters contains a 32 bytes SHA256 hash followed by the
> +64 bits logical offset of the first encountered block having this hash.

Can you foresee the need to use a different hash algorithm in the future
and should we add a hash_algo enum field to the dedup QCOW2 header
extension?

Stefan
Eric Blake Dec. 11, 2012, 11:03 p.m. UTC | #3
On 11/26/2012 06:05 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
>  docs/specs/qcow2.txt |   33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 

In addition to Stefan's comments,

> @@ -159,6 +163,33 @@ the header extension data. Each entry look like this:
>                      terminated if it has full length)
>  
>  
> +== Deduplication ==
> +
> +The deduplication extension contains the offset and size of the deduplication
> +table.
> +
> +    Byte   0 - 7:   Offset
> +
> +          8 - 11:   Size
> +
> +== Deduplication table ==
> +
> +The deduplication table contains 64 bits offsets to the level 2 deduplication

s/64 bits/64-bit/

> +table clusters.
> +Each entry of these clusters contains a 32 bytes SHA256 hash followed by the

s/32 bytes/32-byte/

> +64 bits logical offset of the first encountered block having this hash.

s/64 bits/64-bit/

> +
> +Entries in the deduplication table are orderered by physical cluster index.

s/orderered/ordered/

> +
> +The number of entries in an l2 deduplication table cluster is :
> +l2_dedup_cluster_entries = cluster_size / (32 + 8)

32+8 is not a power of two; what happens to the tail bytes at the end of
a cluster of entries?  If you define them to be 0 now, you can use them
for possible extensions later.

> +
> +The index in the level 1 deduplication table is :
> +l1_dedup_index = physical_cluster_index / l2_dedup_cluster_entries
> +
> +The index in the level 2 deduplication table is:
> +l2_dedup_index = physical_cluster_index % l2_dedup_cluster_entries
> +
>  == Host cluster management ==
>  
>  qcow2 manages the allocation of host clusters by maintaining a reference count
>
Benoît Canet Dec. 12, 2012, 3:57 p.m. UTC | #4
> Can you foresee the need to use a different hash algorithm in the future
> and should we add a hash_algo enum field to the dedup QCOW2 header
> extension?

Yes I foresee the future use of faster hash function like SHA3 or Skein.

I also think an alternate deduplication mechanism where lookups are done
on disk in order to be able to deduplicate very large volume could be added.

What would be the cleanest way to store this in the header extension ?
bitmaps or two char fields ?

Benoît
Benoît Canet Dec. 12, 2012, 3:59 p.m. UTC | #5
> 32+8 is not a power of two; what happens to the tail bytes at the end of
> a cluster of entries?  If you define them to be 0 now, you can use them
> for possible extensions later.
They are currently unused. That's a nice idea.

Benoît
Stefan Hajnoczi Dec. 18, 2012, 1:38 p.m. UTC | #6
On Wed, Dec 12, 2012 at 04:57:38PM +0100, Benoît Canet wrote:
> > Can you foresee the need to use a different hash algorithm in the future
> > and should we add a hash_algo enum field to the dedup QCOW2 header
> > extension?
> 
> Yes I foresee the future use of faster hash function like SHA3 or Skein.
> 
> I also think an alternate deduplication mechanism where lookups are done
> on disk in order to be able to deduplicate very large volume could be added.
> 
> What would be the cleanest way to store this in the header extension ?
> bitmaps or two char fields ?

The header extension could have a uint8_t hash_algo field (and 3
reserved bytes that can be used in the future).

0 - SHA256
1 - Skein
...

Stefan
diff mbox

Patch

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..16eafd7 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,10 @@  in the description of a field.
                                 tables to repair refcounts before accessing the
                                 image.
 
-                    Bits 1-63:  Reserved (set to 0)
+                    Bit 1:      Deduplication bit.  If this bit is set then
+                                deduplication is used on this image.
+
+                    Bits 2-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -116,6 +119,7 @@  be stored. Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0xCD8E819B - Deduplication
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -159,6 +163,33 @@  the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Deduplication ==
+
+The deduplication extension contains the offset and size of the deduplication
+table.
+
+    Byte   0 - 7:   Offset
+
+          8 - 11:   Size
+
+== Deduplication table ==
+
+The deduplication table contains 64 bits offsets to the level 2 deduplication
+table clusters.
+Each entry of these clusters contains a 32 bytes SHA256 hash followed by the
+64 bits logical offset of the first encountered block having this hash.
+
+Entries in the deduplication table are orderered by physical cluster index.
+
+The number of entries in an l2 deduplication table cluster is :
+l2_dedup_cluster_entries = cluster_size / (32 + 8)
+
+The index in the level 1 deduplication table is :
+l1_dedup_index = physical_cluster_index / l2_dedup_cluster_entries
+
+The index in the level 2 deduplication table is:
+l2_dedup_index = physical_cluster_index % l2_dedup_cluster_entries
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count