From patchwork Wed Jun 27 10:34:11 2012
X-Patchwork-Submitter: Orit Wasserman
X-Patchwork-Id: 167604
From: Orit Wasserman
To: qemu-devel@nongnu.org
Date: Wed, 27 Jun 2012 13:34:11 +0300
Message-Id: <1340793261-11400-4-git-send-email-owasserm@redhat.com>
In-Reply-To: 
<1340793261-11400-1-git-send-email-owasserm@redhat.com>
References: <1340793261-11400-1-git-send-email-owasserm@redhat.com>
Cc: peter.maydell@linaro.org, aliguori@us.ibm.com, quintela@redhat.com, stefanha@gmail.com, mdroth@linux.vnet.ibm.com, blauwirbel@gmail.com, Orit Wasserman, chegu_vinod@hp.com, avi@redhat.com, pbonzini@redhat.com, eblake@redhat.com
Subject: [Qemu-devel] [PATCH v13 03/13] Add XBZRLE documentation

Signed-off-by: Orit Wasserman
---
 docs/xbzrle.txt |  142 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 142 insertions(+), 0 deletions(-)
 create mode 100644 docs/xbzrle.txt

diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
new file mode 100644
index 0000000..d29b0a7
--- /dev/null
+++ b/docs/xbzrle.txt
@@ -0,0 +1,142 @@
+XBZRLE (Xor Based Zero Run Length Encoding)
+===========================================
+
+Using XBZRLE (Xor Based Zero Run Length Encoding) reduces VM downtime and the
+total live-migration time of virtual machines.
+It is particularly useful for virtual machines running memory-write-intensive
+workloads that are typical of large enterprise applications such as SAP ERP
+systems, and generally for any application with a sparse memory update
+pattern.
+
+Instead of sending each changed guest memory page in full, this solution sends
+a compressed version of the updates, reducing the amount of data sent during
+live migration.
+To calculate the update, the previous memory pages need to be stored on the
+source.
+Those pages are stored in a dedicated cache (a hash table) and are accessed
+by their address.
+The larger the cache, the more likely it is that a page has already been
+stored in it; a small cache results in a high cache miss rate.
+The cache size can be changed before and during migration.
+
+Format
+======
+
+The compression format performs an XOR between the previous and the current
+content of the page, so that a zero byte represents an unchanged value.
+The page data delta is represented by zero and non-zero runs.
+A zero run is represented by its length (in bytes).
+A non-zero run is represented by its length (in bytes) and the new data.
+Run lengths are encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).
+
+page = zrun
+       | zrun nzrun
+       | zrun nzrun page
+
+zrun = length
+
+nzrun = length byte...
+
+length = uleb128 encoded integer
+
+On the sender side, XBZRLE is used as a compact delta encoding of page
+updates, retrieving the old page content from the cache (default size of
+64 MB). The receiving side uses the existing page's content and XBZRLE to
+decode the new page's content.
+
+This work was originally based on research results published at VEE 2011:
+"Evaluation of Delta Compression Techniques for Efficient Live Migration of
+Large Virtual Machines" by Svard, Hudzia, Tordsson and Elmroth. The XBRLE
+delta encoder used there was further improved into XBZRLE, which stores the
+deltas more compactly.
+
+XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making
+it well suited to the in-line, real-time encoding that live migration needs.
+
+Example
+=======
+
+new buffer:
+1001 zeros
+1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
+20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39
+3a 3b 3c 3d 3e 3f 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53
+54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 0 0 67 0 69
+2990 zeros
+
+old buffer:
+1001 zeros
+5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22
+23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39 3a 3b 3c
+3d 3e 3f 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56
+57 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 65 66 67 68 0 0 6b 0 6d
+2990 zeros
+
+encoded buffer:
+
+encoded length 117
+
+e8 7 70 0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c
+1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36
+37 38 39 3a 3b 3c 3d 3e 3f 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50
+51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 0 0 67 0 69 0 0 0 0
+0 0 a8 17
+
+Here e8 7 is the ULEB128 encoding of 1000 and a8 17 that of 2984, so the
+output is a zero run over offsets 0-999, a non-zero run of 0x70 (112) bytes
+over offsets 1000-1111, and a final zero run over offsets 1112-4095 of the
+4096-byte page. Both run boundaries fall on multiples of 8, so the non-zero
+run also carries a few unchanged bytes (offsets 1000, 1101-1102, 1104 and
+1106-1111).
+
+Migration Capabilities
+======================
+To use XBZRLE, the destination QEMU must be able to decode the new format.
+A new migration capabilities command allows external management to query
+for this support.
+A typical use on the destination:
+    {qemu} info migrate_capabilities
+    {qemu} xbzrle, ...
+
+To enable a capability for subsequent live migrations, a new command,
+migrate_set_parameter, is introduced:
+    {qemu} migrate_set_parameter xbzrle
+
+Usage
+=====
+
+1. Activate XBZRLE.
+2. Set the XBZRLE cache size. The cache size is in MBytes and should be a
+   power of 2; the default is 64 MBytes.
+3. Start the outgoing migration.
+
+A typical usage scenario:
+    {qemu} migrate_set_parameter xbzrle
+    {qemu} migrate_set_cachesize 256m
+    {qemu} migrate -d tcp:destination.host:4444
+    {qemu} info migrate
+    ...
+    transferred ram-duplicate: A kbytes
+    transferred ram-normal: B kbytes
+    transferred ram-xbrle: C kbytes
+    overflow ram-xbrle: D pages
+    cache-miss ram-xbrle: E pages
+
+cache-miss: the number of cache misses to date; a high cache-miss rate
+indicates that the cache size is set too low.
+overflow: the number of overflows in the encoding, where the delta could not
+be compressed. This can happen if the changes in the pages are too large or
+if there are many short changes; for example, when every second byte of a
+page changes.
+
+Testing: live migration with XBZRLE completed in 110 seconds, whereas without
+XBZRLE the migration was unable to complete.
+
+A simple synthetic memory r/w load generator:
+.. #include <stdlib.h>
+.. #include <stdio.h>
+.. int main(void)
+.. {
+..     char *buf = (char *) calloc(4096, 4096);  /* 16 MiB, zeroed */
+..     if (buf == NULL)
+..         return 1;
+..     while (1) {
+..         int i;
+..         /* increment one byte per KiB: 4 writes in each 4 KiB page */
+..         for (i = 0; i < 4096 * 4; i++) {
+..             buf[i * 4096 / 4]++;
+..         }
+..         printf(".");
+..         fflush(stdout);
+..     }
+.. }