Patchwork [v14,03/13] Add XBZRLE documentation

login
register
mail settings
Submitter Orit Wasserman
Date July 3, 2012, 1:52 p.m.
Message ID <1341323574-23206-4-git-send-email-owasserm@redhat.com>
Download mbox | patch
Permalink /patch/168813/
State New
Headers show

Comments

Orit Wasserman - July 3, 2012, 1:52 p.m.
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 docs/xbzrle.txt |  133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 133 insertions(+), 0 deletions(-)
 create mode 100644 docs/xbzrle.txt
Eric Blake - July 3, 2012, 7:45 p.m.
On 07/03/2012 07:52 AM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  docs/xbzrle.txt |  133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 133 insertions(+), 0 deletions(-)
>  create mode 100644 docs/xbzrle.txt

> +
> +Example
> +old buffer:
> +1000 zeros
> +05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 00 00 6b 00 6d
> +3072 zeros

That's only 4092 bytes, but a page is typically 4096.  I think you meant
3076 trailing zeros.

> +
> +new buffer:
> +1000 zeros
> +01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69
> +3702 zeros

That's 4722 bytes; looks like a transposition error, and given the above
comment, I still think you want 3076 trailing zeros.

> +
> +encoded buffer:
> +
> +encoded length 29
> +e8 07 18 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69 00 00
> +00 80 18

Looks better, but still not quite accurate.  Per the spec, you always
end on a nzrun, which means you should not have transmitted the last two
bytes (80 18 => 3072, which is the length of your trailing zrun).  I
also find it weak that your example never shows an unchanged byte on
anything other than 00; having at least one non-zero unchanged byte may
make it more obvious in the encoded example that it is the xor
difference that determines a zrun vs. nzrun, and not the old or new
buffer contents.

Also, reading that encoding says you have 1000 zrun, then 24 bytes of
nzrun starting with a leading 00, but based on your old and new buffer,
the only way to have a leading 00 in your nzrun is if you count a zrun
of 999 bytes.  Are you sure you didn't test with a buffer of 1001
leading zeros, then 20 bytes of interest, then 3073 trailing zeros?

Given your old and new buffer as-is, and my assumption that you are
compressing a page of exactly 4096 bytes (3076 trailing zeros after your
20-byte window of interesting data), I see at least the following four
valid encodings, listed in increasing difficulty of computation:

8-byte long at a time; all chunks that differ in at least one byte are
treated as at least 8 bytes of nzrun, transmit 27 bytes
e8 07 18 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69 00
00 00 00

4 bytes at a time; all chunks that differ in at least one byte are
treated as at least 4 bytes of nzrun, transmit 23 bytes
e8 07 14 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69

list every single zrun (except trailing), even if only one byte long,
transmit 24 bytes
e8 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 02 01 67 01 01 69

avoid a 1-byte zrun, transmit 23 bytes
e8 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 02 03 67 00 69
Orit Wasserman - July 4, 2012, 8:29 a.m.
On 07/03/2012 10:45 PM, Eric Blake wrote:
> On 07/03/2012 07:52 AM, Orit Wasserman wrote:
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>> ---
>>  docs/xbzrle.txt |  133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 133 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/xbzrle.txt
> 
>> +
>> +Example
>> +old buffer:
>> +1000 zeros
>> +05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 00 00 6b 00 6d
>> +3072 zeros
> 
> That's only 4092 bytes, but a page is typically 4096.  I think you meant
> 3076 trailing zeros.
> 
>> +
>> +new buffer:
>> +1000 zeros
>> +01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69
>> +3702 zeros
> 
> That's 4722 bytes; looks like a transposition error, and given the above
> comment, I still think you want 3076 trailing zeros.
> 
>> +
>> +encoded buffer:
>> +
>> +encoded length 29
>> +e8 07 18 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69 00 00
>> +00 80 18
> 
> Looks better, but still not quite accurate.  Per the spec, you always
> end on a nzrun, which means you should not have transmitted the last two
> bytes (80 18 => 3072, which is the length of your trailing zrun).  I
> also find it weak that your example never shows an unchanged byte on
> anything other than 00; having at least one non-zero unchanged byte may
> make it more obvious in the encoded example that it is the xor
> difference that determines a zrun vs. nzrun, and not the old or new
> buffer contents.
> 
> Also, reading that encoding says you have 1000 zrun, then 24 bytes of
> nzrun starting with a leading 00, but based on your old and new buffer,
> the only way to have a leading 00 in your nzrun is if you count a zrun
> of 999 bytes.  Are you sure you didn't test with a buffer of 1001
> leading zeros, then 20 bytes of interest, then 3073 trailing zeros?
yes you are right 1001 zeros ...
I fix it .
> 
> Given your old and new buffer as-is, and my assumption that you are
> compressing a page of exactly 4096 bytes (3076 trailing zeros after your
> 20-byte window of interesting data), I see at least the following four
> valid encodings, listed in increasing difficulty of computation:
> 
> 8-byte long at a time; all chunks that differ in at least one byte are
> treated as at least 8 bytes of nzrun, transmit 27 bytes
> e8 07 18 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69 00
> 00 00 00
> 
> 4 bytes at a time; all chunks that differ in at least one byte are
> treated as at least 4 bytes of nzrun, transmit 23 bytes
> e8 07 14 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69
> 
> list every single zrun (except trailing), even if only one byte long,
> transmit 24 bytes
> e8 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 02 01 67 01 01 69
> 
> avoid a 1-byte zrun, transmit 23 bytes
> e8 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 02 03 67 00 69
>

Patch

diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
new file mode 100644
index 0000000..f00d63f
--- /dev/null
+++ b/docs/xbzrle.txt
@@ -0,0 +1,133 @@ 
+XBZRLE (Xor Based Zero Run Length Encoding)
+===========================================
+
+Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
+of VM downtime and the total live-migration time of Virtual machines.
+It is particularly useful for virtual machines running memory write intensive
+workloads that are typical of large enterprise applications such as SAP ERP
+Systems, and generally speaking for any application that uses a sparse memory
+update pattern.
+
+Instead of sending the changed guest memory page this solution will send a
+compressed version of the updates, thus reducing the amount of data sent during
+live migration.
+In order to be able to calculate the update, the previous memory pages need to
+be stored on the source. Those pages are stored in a dedicated cache
+(hash table) and are
+accessed by their address.
+The larger the cache size the better the chances are that the page has already
+been stored in the cache.
+A small cache size will result in high cache miss rate.
+Cache size can be changed before and during migration.
+
+Format
+=======
+
+The compression format performs a XOR between the previous and current content
+of the page, where zero represents an unchanged value.
+The page data delta is represented by zero and non zero runs.
+A zero run is represented by its length (in bytes).
+A non zero run is represented by its length (in bytes) and the new data.
+The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+
+There can be more than one valid encoding, the sender may send a longer encoding
+for the benefit of reducing computation cost.
+
+page = zrun nzrun
+       | zrun nzrun page
+
+zrun = length
+
+nzrun = length byte...
+
+length = uleb128 encoded integer
+
+On the sender side XBZRLE is used as a compact delta encoding of page updates,
+retrieving the old page content from the cache (default size of 512 MB). The
+receiving side uses the existing page's content and XBZRLE to decode the new
+page's content.
+
+This work was originally based on research results published
+VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
+Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
+Additionally the delta encoder XBRLE was improved further using the XBZRLE
+instead.
+
+XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
+ideal for in-line, real-time encoding such as is needed for live-migration.
+
+Example
+old buffer:
+1000 zeros
+05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 00 00 6b 00 6d
+3072 zeros
+
+new buffer:
+1000 zeros
+01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69
+3702 zeros
+
+encoded buffer:
+
+encoded length 29
+e8 07 18 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00 00 67 00 69 00 00
+00 80 18
+
+Migration Capabilities
+======================
+In order to use XBZRLE the destination QEMU version should be able to
+decode the new format.
+Adding a new migration capabilities command that will allow external management
+to query for it support.
+A typical use for the destination
+    {qemu} info migrate_capabilities
+    {qemu} xbzrle, ...
+
+In order to enable capabilities for future live migration,
+a new command migrate_set_parameter is introduced:
+    {qemu} migrate_set_parameter xbzrle
+
+Usage
+======
+
+1. Activate xbzrle
+2. Set the XBZRLE cache size - the cache size is in MBytes and should be a
+power of 2. The cache default value is 64MBytes.
+3. start outgoing migration
+
+A typical usage scenario:
+    {qemu} migrate_set_parameter xbzrle
+    {qemu} migrate_set_cachesize 256m
+    {qemu} migrate -d tcp:destination.host:4444
+    {qemu} info migrate
+    ...
+    transferred ram-duplicate: A kbytes
+    transferred ram-normal: B kbytes
+    transferred ram-xbrle: C kbytes
+    overflow ram-xbrle: D pages
+    cache-miss ram-xbrle: E pages
+
+cache-miss: the number of cache misses to date - high cache-miss rate
+indicates that the cache size is set too low.
+overflow: the number of overflows in the decoding which where the delta could
+not be compressed. This can happen if the changes in the pages are too large
+or there are many short changes; for example, changing every second byte (half a
+page).
+
+Testing: Testing indicated that live migration with XBZRLE was completed in 110
+seconds, whereas without it would not be able to complete.
+
+A simple synthetic memory r/w load generator:
+..    include <stdlib.h>
+..    include <stdio.h>
+..    int main()
+..    {
+..        char *buf = (char *) calloc(4096, 4096);
+..        while (1) {
+..            int i;
+..            for (i = 0; i < 4096 * 4; i++) {
+..                buf[i * 4096 / 4]++;
+..            }
+..            printf(".");
+..        }
+..    }