From patchwork Sun Jul 29 09:42:55 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Orit Wasserman
X-Patchwork-Id: 173907
From: Orit Wasserman
To: qemu-devel@nongnu.org
Date: Sun, 29 Jul 2012 12:42:55 +0300
Message-Id: <1343554983-4195-4-git-send-email-owasserm@redhat.com>
In-Reply-To:
 <1343554983-4195-1-git-send-email-owasserm@redhat.com>
References: <1343554983-4195-1-git-send-email-owasserm@redhat.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-Received-From: 209.132.183.28
Cc: peter.maydell@linaro.org, aliguori@us.ibm.com, quintela@redhat.com,
 stefanha@gmail.com, mdroth@linux.vnet.ibm.com, lcapitulino@redhat.com,
 blauwirbel@gmail.com, Orit Wasserman, chegu_vinod@hp.com, avi@redhat.com,
 pbonzini@redhat.com, eblake@redhat.com
Subject: [Qemu-devel] [PATCH 03/11] Add XBZRLE documentation
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.14
Precedence: list
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Signed-off-by: Orit Wasserman
---
 docs/xbzrle.txt |  136 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 136 insertions(+), 0 deletions(-)
 create mode 100644 docs/xbzrle.txt

diff --git a/docs/xbzrle.txt b/docs/xbzrle.txt
new file mode 100644
index 0000000..f70e851
--- /dev/null
+++ b/docs/xbzrle.txt
@@ -0,0 +1,136 @@
+XBZRLE (Xor Based Zero Run Length Encoding)
+===========================================
+
+Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
+of VM downtime and the total live-migration time of Virtual machines.
+It is particularly useful for virtual machines running memory write intensive
+workloads that are typical of large enterprise applications such as SAP ERP
+Systems, and generally speaking for any application that uses a sparse memory
+update pattern.
+
+Instead of sending the changed guest memory page this solution will send a
+compressed version of the updates, thus reducing the amount of data sent during
+live migration.
+In order to be able to calculate the update, the previous memory pages need to
+be stored on the source. Those pages are stored in a dedicated cache
+(hash table) and are
+accessed by their address.
+The larger the cache size the better the chances are that the page has already
+been stored in the cache.
+A small cache size will result in a high cache miss rate.
+The cache size can be changed before and during migration.
+
+Format
+======
+
+The compression format performs an XOR between the previous and current content
+of the page, where zero represents an unchanged value.
+The page data delta is represented by zero and non-zero runs.
+A zero run is represented by its length (in bytes).
+A non-zero run is represented by its length (in bytes) and the new data.
+The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)
+
+There can be more than one valid encoding; the sender may send a longer encoding
+for the benefit of reducing computation cost.
+
+page = zrun nzrun
+       | zrun nzrun page
+
+zrun = length
+
+nzrun = length byte...
+
+length = uleb128 encoded integer
+
+On the sender side XBZRLE is used as a compact delta encoding of page updates,
+retrieving the old page content from the cache (default size of 64 MB). The
+receiving side uses the existing page's content and XBZRLE to decode the new
+page's content.
+
+This work was originally based on research results published at
+VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
+Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
+Additionally, the XBRLE delta encoder was improved further by using XBZRLE
+instead.
+
+XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
+ideal for in-line, real-time encoding such as is needed for live migration.
+
+Example
+old buffer:
+1001 zeros
+05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
+3074 zeros
+
+new buffer:
+1001 zeros
+01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
+3074 zeros
+
+encoded buffer:
+
+encoded length 24
+e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69
+
+Migration Capabilities
+======================
+In order to use XBZRLE the destination QEMU version should be able to
+decode the new format.
+A new migration capabilities command allows external management to query
+whether the destination supports it.
+A typical use on the destination:
+    {qemu} info migrate_capabilities
+    {qemu} xbzrle, ...
+
+In order to enable capabilities for future live migration,
+a new command migrate_set_parameter is introduced:
+    {qemu} migrate_set_parameter xbzrle
+
+Usage
+=====
+
+1. Activate xbzrle
+2. Set the XBZRLE cache size - the cache size is in MBytes and should be a
+power of 2. The cache default value is 64 MBytes.
+3. Start the outgoing migration
+
+A typical usage scenario:
+On the incoming QEMU:
+    {qemu} migrate_set_parameter xbzrle on
+On the outgoing QEMU:
+    {qemu} migrate_set_parameter xbzrle on
+    {qemu} migrate_set_cachesize 256m
+    {qemu} migrate -d tcp:destination.host:4444
+    {qemu} info migrate
+    ...
+    cache size: 67108864 bytes
+    transferred ram-duplicate: A kbytes
+    transferred ram-normal: B kbytes
+    transferred ram-xbrle: C kbytes
+    overflow ram-xbrle: D pages
+    cache-miss ram-xbrle: E pages
+
+cache-miss: the number of cache misses to date - a high cache-miss rate
+indicates that the cache size is set too low.
+overflow: the number of overflows, i.e. pages for which the delta could
+not be compressed. This can happen if the changes in the pages are too large
+or there are many short changes; for example, changing every second byte (half a
+page).
+
+Testing: live migration with XBZRLE completed in 110 seconds, whereas
+without it the migration was not able to complete.
+
+A simple synthetic memory r/w load generator:
+.. #include <stdlib.h>
+.. #include <stdio.h>
+.. int main()
+.. {
+..     char *buf = (char *) calloc(4096, 4096);  /* 16 MB, zero filled */
+..     while (1) {
+..         int i;
+..         for (i = 0; i < 4096 * 4; i++) {
+..             buf[i * 4096 / 4]++;  /* dirty one byte every 1024 bytes */
+..         }
+..         printf(".");
+..     }
+.. }
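
The Format and Example sections of docs/xbzrle.txt can be illustrated with a
small standalone encoder and decoder. This is a minimal sketch and not the
implementation from this series: the names uleb128_put, uleb128_get,
xbzrle_encode_sketch and xbzrle_decode_sketch are hypothetical, the scan is
byte by byte with no attempt at speed, and it always emits the shortest
run-by-run encoding. As the Format section states, a non-zero run carries the
new data itself, so the XOR is only used to find which bytes changed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write n as ULEB128 (7 bits per byte, low group first,
 * high bit set on every byte but the last). Returns bytes written, 0 if full. */
static size_t uleb128_put(uint8_t *dst, size_t avail, uint32_t n)
{
    size_t len = 0;
    do {
        uint8_t b = n & 0x7f;
        n >>= 7;
        if (n) {
            b |= 0x80;
        }
        if (len == avail) {
            return 0;
        }
        dst[len++] = b;
    } while (n);
    return len;
}

/* Hypothetical helper: read a ULEB128 value. Returns bytes consumed, 0 on error. */
static size_t uleb128_get(const uint8_t *src, size_t avail, uint32_t *n)
{
    uint32_t v = 0;
    unsigned shift = 0;
    size_t len = 0;
    while (len < avail && shift < 32) {
        uint8_t b = src[len++];
        v |= (uint32_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) {
            *n = v;
            return len;
        }
        shift += 7;
    }
    return 0;
}

/* Encode new_page against old_page as a sequence of "zrun nzrun" pairs.
 * A trailing unchanged region is not emitted. Returns the encoded length,
 * or -1 if the result does not fit in dst (an overflow). */
static long xbzrle_encode_sketch(const uint8_t *old_page,
                                 const uint8_t *new_page, size_t page_len,
                                 uint8_t *dst, size_t dst_len)
{
    size_t i = 0, out = 0;
    while (i < page_len) {
        size_t zstart = i;
        while (i < page_len && old_page[i] == new_page[i]) {
            i++;
        }
        if (i == page_len) {
            break;                       /* rest of the page is unchanged */
        }
        size_t zlen = i - zstart;
        size_t nzstart = i;
        while (i < page_len && old_page[i] != new_page[i]) {
            i++;
        }
        size_t nzlen = i - nzstart;

        size_t n = uleb128_put(dst + out, dst_len - out, (uint32_t)zlen);
        if (!n) {
            return -1;
        }
        out += n;
        n = uleb128_put(dst + out, dst_len - out, (uint32_t)nzlen);
        if (!n || dst_len - out - n < nzlen) {
            return -1;
        }
        out += n;
        memcpy(dst + out, new_page + nzstart, nzlen);
        out += nzlen;
    }
    return (long)out;
}

/* Apply an encoded delta in place to a buffer that holds the old page.
 * Returns 0 on success, -1 on malformed input. */
static int xbzrle_decode_sketch(const uint8_t *src, size_t src_len,
                                uint8_t *page, size_t page_len)
{
    size_t in = 0, pos = 0;
    while (in < src_len) {
        uint32_t run;
        size_t n = uleb128_get(src + in, src_len - in, &run);
        if (!n || run > page_len - pos) {
            return -1;
        }
        in += n;
        pos += run;                      /* zrun: these bytes stay as they are */
        n = uleb128_get(src + in, src_len - in, &run);
        if (!n) {
            return -1;
        }
        in += n;
        if (run > page_len - pos || run > src_len - in) {
            return -1;
        }
        memcpy(page + pos, src + in, run);   /* nzrun: copy the new bytes */
        in += run;
        pos += run;
    }
    return 0;
}
```

Given the old and new buffers from the Example section (1001 zero bytes, the
21 listed bytes, 3074 zero bytes) and a 4096-byte page, the encoder sketch
should yield the same 24 bytes shown under "encoded buffer", and decoding
those bytes over the old buffer should reproduce the new one.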