[v13,03/13] Add XBZRLE documentation

Message ID 1340793261-11400-4-git-send-email-owasserm@redhat.com
State New

Commit Message

Orit Wasserman June 27, 2012, 10:34 a.m. UTC
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 docs/xbzrle.txt |  142 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 142 insertions(+), 0 deletions(-)
 create mode 100644 docs/xbzrle.txt

Comments

Eric Blake June 27, 2012, 3:53 p.m. UTC | #1
On 06/27/2012 04:34 AM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  docs/xbzrle.txt |  142 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 142 insertions(+), 0 deletions(-)
>  create mode 100644 docs/xbzrle.txt
> 

> +Format
> +=======
> +
> +The compression format preforms a XOR between the previous and current content

s/preforms/performs/

> +of the page, where zero represents an unchanged value.
> +The page data delta is represented by zero and non zero runs.
> +A zero run is represented by its length (in bytes).
> +A non zero run is represented by its length (in bytes) and the new data.
> +The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)
> +

Maybe mention that there is more than one valid encoding, and that the
sender may send a longer encoding if the computation cost of determining
the shortest representation is not worthwhile (to make it clear that we
may or may not decide to optimize the case of one unchanged byte
splitting two nzruns or being inlined into a single nzrun).
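To illustrate the point about multiple valid encodings, here is a minimal sketch of one simple sender strategy, assuming the grammar from the quoted Format section. It ends every nzrun at the first unchanged byte and never inlines unchanged bytes, so its output is valid but not always the shortest. The names put_len and xbzrle_encode_naive are hypothetical and are not taken from the patch series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write n as ULEB128; returns bytes written, or 0 if out is too small. */
size_t put_len(uint32_t n, uint8_t *out, size_t out_len)
{
    size_t len = 0;
    do {
        uint8_t byte = n & 0x7f;            /* low 7 bits */
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;                   /* more bytes follow */
        }
        if (len >= out_len) {
            return 0;
        }
        out[len++] = byte;
    } while (n != 0);
    return len;
}

/*
 * Hypothetical naive encoder: alternate zrun / nzrun, splitting on every
 * unchanged byte. Returns the encoded length, 0 if the page is unchanged,
 * or -1 if the output buffer is too small.
 */
long xbzrle_encode_naive(const uint8_t *old, const uint8_t *cur,
                         size_t page_len, uint8_t *out, size_t out_len)
{
    size_t i = 0, d = 0;

    while (i < page_len) {
        size_t start = i, used;

        /* zrun: count bytes whose XOR is zero */
        while (i < page_len && old[i] == cur[i]) {
            i++;
        }
        if (i == page_len) {
            break;                          /* trailing zrun is not sent */
        }
        used = put_len((uint32_t)(i - start), out + d, out_len - d);
        if (used == 0) {
            return -1;
        }
        d += used;

        /* nzrun: count changed bytes, then emit length and new data */
        start = i;
        while (i < page_len && old[i] != cur[i]) {
            i++;
        }
        used = put_len((uint32_t)(i - start), out + d, out_len - d);
        if (used == 0 || i - start > out_len - d - used) {
            return -1;
        }
        d += used;
        memcpy(out + d, cur + start, i - start);
        d += i - start;
    }
    return (long)d;
}
```

For the 4096-byte old and new buffers suggested later in this reply, this strategy produces 20 bytes, two more than the 18-byte encoding that inlines the isolated unchanged bytes into a single nzrun.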

> +On the sender side XBZRLE is used as a compact delta encoding of page updates,
> +retrieving the old page content from the cache (default size of 512 MB). The
> +receiving side uses the existing page's content and XBZRLE to decode the new
> +page's content.
> +
> +This is a more compact way to store the deltas than the previous version.

What previous version?  Oh, you mentioned it in the next paragraph (the
XBRLE algorithm used by Benoit, Svard, Tordsson, and Elmroth); although
I had been assuming you were comparing it to the previous way that qemu
sent changed pages (that is, uncompressed).  I think you could either
delete this one-liner paragraph with no loss in information, or use this
more generic wording instead:

This typically results in a more compact representation of a changed page.

> +
> +Example
> +new buffer:
> +1100 zeros
> +1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f

> 
> +old buffer:
> +1100 zeros
> +5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22

> 
> +encoded buffer:
> +
> +encoded length 118
> +
> +e8 7 70 0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c

Huh?  If I'm doing my math right, this starts with a zero run of 1100
bytes, which is 0x44c, encoded as 0xcc 0x08 (that is,
((0x44c&0x7f)|0x80) for the first byte, and (0x44c>>7) for the second
byte).  But you listed the zero run as 0xe8 0x07 (which decodes to
0x3e8, or 1000).
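The arithmetic above can be checked with a small ULEB128 encoder and decoder. This is a minimal sketch limited to 32-bit values; the function names uleb128_encode and uleb128_decode are illustrative and are not claimed to match anything in the patch series.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode n as ULEB128 into out and return the number of bytes written.
 * out must have room for 5 bytes, enough for any 32-bit value.
 */
size_t uleb128_encode(uint32_t n, uint8_t *out)
{
    size_t len = 0;
    do {
        uint8_t byte = n & 0x7f;    /* low 7 bits go out first */
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;           /* continuation bit: more bytes follow */
        }
        out[len++] = byte;
    } while (n != 0);
    return len;
}

/*
 * Decode a ULEB128 value from in (at most in_len bytes) into *n.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than 5 bytes.
 */
size_t uleb128_decode(const uint8_t *in, size_t in_len, uint32_t *n)
{
    uint32_t value = 0;
    size_t i;

    for (i = 0; i < in_len && i < 5; i++) {
        value |= (uint32_t)(in[i] & 0x7f) << (7 * i);
        if (!(in[i] & 0x80)) {      /* last byte has the top bit clear */
            *n = value;
            return i + 1;
        }
    }
    return 0;
}
```

With these helpers, encoding 1100 yields the two bytes cc 08, and decoding e8 07 yields 1000.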

Also, I would list old buffer before new.  And I'd pad out the buffer so
that all bytes are two-digits (use leading zero) and therefore columns
are aligned.  You can also use a shorter string to still get the point
across.  Maybe something like:

old buffer, 4096 bytes:
1100 zeros
01 02 03 04 05 06 00 08 09 0a 0b 0c
2984 zeros

new buffer, 4096 bytes:
1100 zeros
02 03 04 04 05 06 07 08 00 00 09 0a 00 0b 0c 0d
2980 zeros

one possible encoded buffer, 18 bytes:
cc 08 03 02 03 04 03 0a 07 08 00 00 09 0a 00 0b 0c 0d
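The suggested encoding can be checked mechanically by applying it to the old buffer. The following is a minimal sketch of a receiver, assuming the zrun/nzrun grammar from the quoted Format section and a page that already holds the old content. The names read_len and xbzrle_apply are hypothetical and not taken from the patch series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one ULEB128 length; returns bytes consumed, 0 on malformed input. */
size_t read_len(const uint8_t *in, size_t in_len, uint32_t *n)
{
    uint32_t value = 0;
    size_t i;

    for (i = 0; i < in_len && i < 5; i++) {
        value |= (uint32_t)(in[i] & 0x7f) << (7 * i);
        if (!(in[i] & 0x80)) {
            *n = value;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Hypothetical decoder: apply the encoded delta to page in place.
 * Returns 0 on success, -1 if the encoding is malformed or would run
 * past the end of the page.
 */
int xbzrle_apply(const uint8_t *enc, size_t enc_len,
                 uint8_t *page, size_t page_len)
{
    size_t in = 0, pos = 0;

    while (in < enc_len) {
        uint32_t run;
        size_t used;

        /* zrun: these bytes are unchanged, so just skip over them */
        used = read_len(enc + in, enc_len - in, &run);
        if (used == 0 || run > page_len - pos) {
            return -1;
        }
        in += used;
        pos += run;

        /* nzrun: length followed by that many bytes of new data */
        used = read_len(enc + in, enc_len - in, &run);
        if (used == 0) {
            return -1;
        }
        in += used;
        if (run > page_len - pos || run > enc_len - in) {
            return -1;
        }
        memcpy(page + pos, enc + in, run);
        in += run;
        pos += run;
    }
    return 0;
}
```

Applied to the 4096-byte old buffer above, the 18-byte encoding reproduces the new buffer. The trailing 2980 unchanged bytes need no final zrun because decoding simply stops when the input is exhausted.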
Patch

diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
new file mode 100644
index 0000000..d29b0a7
--- /dev/null
+++ b/docs/xbzrle.txt
@@ -0,0 +1,142 @@ 
+XBZRLE (Xor Based Zero Run Length Encoding)
+===========================================
+
+Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
+of VM downtime and the total live-migration time of virtual machines.
+It is particularly useful for virtual machines running memory write intensive
+workloads that are typical of large enterprise applications such as SAP ERP
+Systems, and generally speaking for any application that uses a sparse memory
+update pattern.
+
+Instead of sending the changed guest memory page this solution will send a
+compressed version of the updates, thus reducing the amount of data sent during
+live migration.
+In order to be able to calculate the update, the previous memory pages need to
+be stored on the source. Those pages are stored in a dedicated cache
+(hash table) and are
+accessed by their address.
+The larger the cache size the better the chances are that the page has already
+been stored in the cache.
+A small cache size will result in a high cache miss rate.
+Cache size can be changed before and during migration.
+
+Format
+=======
+
+The compression format performs an XOR between the previous and current content
+of the page, where zero represents an unchanged value.
+The page data delta is represented by zero and non zero runs.
+A zero run is represented by its length (in bytes).
+A non zero run is represented by its length (in bytes) and the new data.
+The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).
+
+page = zrun nzrun
+       | zrun nzrun page
+
+zrun = length
+
+nzrun = length byte...
+
+length = uleb128 encoded integer
+
+On the sender side XBZRLE is used as a compact delta encoding of page updates,
+retrieving the old page content from the cache (default size of 512 MB). The
+receiving side uses the existing page's content and XBZRLE to decode the new
+page's content.
+
+This typically results in a more compact representation of a changed page.
+
+This work was originally based on research results published at
+VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
+Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
+XBZRLE is a further improvement on the XBRLE delta encoder used
+in that work.
+
+XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it
+ideal for in-line, real-time encoding such as is needed for live-migration.
+
+Example
+new buffer:
+1100 zeros
+1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
+20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39
+3a 3b 3c 3d 3e 3f 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53
+54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 0 0 67 0 69
+2984 zeros
+
+old buffer:
+1100 zeros
+5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22
+23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39 3a 3b 3c
+3d 3e 3f 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56
+57 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 65 66 67 68 0 0 6b 0 6d
+2984 zeros
+
+encoded buffer:
+
+encoded length 118
+
+e8 7 70 0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c
+1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36
+37 38 39 3a 3b 3c 3d 3e 3f 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50
+51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 0 0 67 0 69 0 0 0 0
+0 0 a8 17
+
+Migration Capabilities
+======================
+In order to use XBZRLE the destination QEMU version should be able to
+decode the new format.
+A new migration capabilities command allows external management
+to query for its support.
+A typical use for the destination:
+    {qemu} info migrate_capabilities
+    {qemu} xbzrle, ...
+
+In order to enable capabilities for future live migration,
+a new command migrate_set_parameter is introduced:
+    {qemu} migrate_set_parameter xbzrle
+
+Usage
+======
+
+1. Activate xbzrle
+2. Set the XBZRLE cache size - the cache size is in MBytes and should be a
+power of 2. The cache default value is 64MBytes.
+3. Start outgoing migration
+
+A typical usage scenario:
+    {qemu} migrate_set_parameter xbzrle
+    {qemu} migrate_set_cachesize 256m
+    {qemu} migrate -d tcp:destination.host:4444
+    {qemu} info migrate
+    ...
+    transferred ram-duplicate: A kbytes
+    transferred ram-normal: B kbytes
+    transferred ram-xbrle: C kbytes
+    overflow ram-xbrle: D pages
+    cache-miss ram-xbrle: E pages
+
+cache-miss: the number of cache misses to date - a high cache-miss rate
+indicates that the cache size is set too low.
+overflow: the number of overflows during encoding, where the delta could
+not be compressed. This can happen if the changes in the pages are too large
+or there are many short changes; for example, changing every second byte (half a
+page).
+
+Testing: Testing indicated that live migration with XBZRLE completed in 110
+seconds, whereas without it the migration was not able to complete.
+
+A simple synthetic memory r/w load generator:
+..    #include <stdlib.h>
+..    #include <stdio.h>
+..    int main(void)
+..    {
+..        char *buf = (char *) calloc(4096, 4096);
+..        while (1) {
+..            int i;
+..            for (i = 0; i < 4096 * 4; i++) {
+..                buf[i * 4096 / 4]++;
+..            }
+..            printf(".");
+..        }
+..    }