From patchwork Mon Feb 11 22:49:52 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: mrhines@linux.vnet.ibm.com
X-Patchwork-Id: 219699
From: "Michael R. Hines"
To: qemu-devel@nongnu.org
Cc: aliguori@us.ibm.com, abali@us.ibm.com, "Michael R. Hines", gokul@us.ibm.com
Date: Mon, 11 Feb 2013 17:49:52 -0500
Message-Id: <1360622997-26904-1-git-send-email-mrhines@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.10.4
Subject: [Qemu-devel] [RFC PATCH RDMA support v2: 1/6] add openfabrics RDMA libraries, configure options to build

From: "Michael R. Hines"

This patchset introduces RDMA-based live migration to QEMU. A copy of
this documentation is located online:

http://wiki.qemu.org/Features/RDMALiveMigration

DESIGN:
==========

1. In order to provide maximum cross-device compatibility, we use the
   librdmacm library, which abstracts out the RDMA capabilities of each
   individual type of RDMA device, including InfiniBand, iWARP, and
   RoCE. This patch has been tested on both RoCE and InfiniBand devices
   from Mellanox.

2. A new file named "migration-rdma.c" contains the core code required
   to perform librdmacm connection establishment and the actual RDMA
   data transfer.

3. Files "arch_init.c" and "savevm.c" have been modified to transfer
   the VM's memory in the standard live migration path using RDMA
   instead of TCP.

4. The original logic for migration of devices and protocol
   synchronization does not change; that still happens over TCP, as it
   normally does, at the same time as the RDMA memory transfer.

5. Currently, the XBZRLE capability and the detection of zero pages
   (dup_page()) significantly slow down the empirical throughput
   observed when RDMA is activated, so the code path skips these
   capabilities when RDMA is enabled. Hopefully, we can stop doing this
   in the future and come up with a way to preserve these capabilities
   simultaneously with the use of RDMA.

PERFORMANCE:
============

Using a 40 Gbps InfiniBand link performing a worst-case stress test,
the average worst-case throughput is:

1. RDMA throughput with
   $ stress --vm-bytes 1024M --vm 1 --vm-keep
   Approximately 26 Gbps

2. TCP throughput with
   $ stress --vm-bytes 1024M --vm 1 --vm-keep
   Approximately 8 Gbps (using IPoIB, IP over InfiniBand)

Average downtime (stop time) ranges between 28 and 33 milliseconds.

An *exhaustive* paper (2010) with additional performance details is
linked on the QEMU wiki:

http://wiki.qemu.org/Features/RDMALiveMigration

USAGE:
==========

Complete instructions for compiling and running with RDMA are also
available on the wiki (probably too much for a cover letter).

Signed-off-by: Michael R. Hines
---
 Makefile.objs |    1 +
 configure     |   25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/Makefile.objs b/Makefile.objs
index 68eb0ce..38767cc 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -57,6 +57,7 @@ common-obj-$(CONFIG_POSIX) += os-posix.o
 common-obj-$(CONFIG_LINUX) += fsdev/
 
 common-obj-y += migration.o migration-tcp.o
+common-obj-$(CONFIG_RDMA) += migration-rdma.o
 common-obj-y += qemu-char.o #aio.o
 common-obj-y += block-migration.o
 common-obj-y += page_cache.o
diff --git a/configure b/configure
index b7635e4..893935f 100755
--- a/configure
+++ b/configure
@@ -170,6 +170,7 @@ xfs=""
 
 vhost_net="no"
 kvm="no"
+rdma="no"
 gprof="no"
 debug_tcg="no"
 debug="no"
@@ -897,6 +898,10 @@ for opt do
   ;;
   --enable-virtio-blk-data-plane) virtio_blk_data_plane="yes"
   ;;
+  --enable-rdma) rdma="yes"
+  ;;
+  --disable-rdma) rdma="no"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1087,6 +1092,8 @@ echo "  --enable-bluez           enable bluez stack connectivity"
 echo "  --disable-slirp          disable SLIRP userspace network connectivity"
 echo "  --disable-kvm            disable KVM acceleration support"
 echo "  --enable-kvm             enable KVM acceleration support"
+echo "  --disable-rdma           disable RDMA-based migration support"
+echo "  --enable-rdma            enable RDMA-based migration support"
 echo "  --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)"
 echo "  --disable-nptl           disable usermode NPTL support"
 echo "  --enable-nptl            enable usermode NPTL support"
@@ -1718,6 +1725,18 @@ EOF
   libs_softmmu="$sdl_libs $libs_softmmu"
 fi
 
+if test "$rdma" = "yes" ; then
+  cat > $TMPC <<EOF
+int main(void) { return 0; }
+EOF
+  rdma_libs="-lrdmacm"
+  if ! compile_prog "" "$rdma_libs" ; then
+      feature_not_found "rdma"
+  fi
+
+fi
+
 ##########################################
 # VNC TLS/WS detection
 if test "$vnc" = "yes" -a \( "$vnc_tls" != "no" -o "$vnc_ws" != "no" \) ; then
@@ -3318,6 +3337,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
+echo "RDMA support      $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
@@ -4278,6 +4298,11 @@ if [ "$pixman" = "internal" ]; then
   echo "config-host.h: subdir-pixman" >> $config_host_mak
 fi
 
+if test "$rdma" = "yes" ; then
+  echo "CONFIG_RDMA=y" >> $config_host_mak
+  echo "LIBS+=$rdma_libs" >> $config_host_mak
+fi
+
 # build tree in object directory in case the source is not in the current directory
 DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32"
 DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas"
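
For anyone who wants to try the option handling in isolation, here is a
minimal sketch (not part of the patch) of how the --enable-rdma and
--disable-rdma flags from the configure hunks map to the CONFIG_RDMA
and LIBS lines that get appended to $config_host_mak. The helper names
parse_rdma_opts and emit_rdma_config are hypothetical. The compile_prog
probe is left out on purpose so the sketch runs without librdmacm
installed, and rdma_libs is set up front here rather than inside the
probe as the patch does.

```shell
#!/bin/sh
# Sketch only: mirrors the option handling this patch adds to configure.
# parse_rdma_opts and emit_rdma_config are hypothetical helper names.

rdma="no"              # default, as in the patch
rdma_libs="-lrdmacm"   # library the patch links against (assumed set up front)

# Scan the argument list; the last --enable-rdma/--disable-rdma wins.
# Unlike the real configure, unknown options are silently ignored here.
parse_rdma_opts() {
    for opt in "$@"; do
        case "$opt" in
            --enable-rdma)  rdma="yes" ;;
            --disable-rdma) rdma="no" ;;
        esac
    done
}

# Print the lines the patch appends to $config_host_mak when RDMA is on.
emit_rdma_config() {
    if test "$rdma" = "yes"; then
        echo "CONFIG_RDMA=y"
        echo "LIBS+=$rdma_libs"
    fi
}

parse_rdma_opts "$@"
emit_rdma_config
```

Saved under any file name and run with --enable-rdma, the script prints
the CONFIG_RDMA=y and LIBS+=-lrdmacm lines; with no flag, or with
--disable-rdma given last, it prints nothing.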