From patchwork Tue Oct 30 08:33:01 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Isaku Yamahata
X-Patchwork-Id: 195411
From: Isaku Yamahata
To: qemu-devel@nongnu.org, kvm@vger.kernel.org
Cc: benoit.hudzia@gmail.com, aarcange@redhat.com, aliguori@us.ibm.com,
 quintela@redhat.com, stefanha@gmail.com, t.hirofuchi@aist.go.jp,
 dlaor@redhat.com, satoshi.itoh@aist.go.jp, mdroth@linux.vnet.ibm.com,
 yoshikawa.takuya@oss.ntt.co.jp, owasserm@redhat.com, avi@redhat.com,
 pbonzini@redhat.com, chegu_vinod@hp.com
Date: Tue, 30 Oct 2012 17:33:01 +0900
Message-Id: <6075db39631f70e981ff0f64e37bbebb3544a930.1351582535.git.yamahata@valinux.co.jp>
X-Mailer: git-send-email 1.7.10.4
Subject: [Qemu-devel] [PATCH v3 25/35] postcopy: implement outgoing part of
 postcopy live migration

This patch implements the outgoing part of postcopy live migration.

Signed-off-by: Isaku Yamahata
---
Changes v2 -> v3:
- modify savevm_ram_handlers instead of if (postcopy)
- code simplification

Changes v1 -> v2:
- fix parameter to qemu_fdopen()
- handle QEMU_UMEM_REQ_EOC properly: previously a QEMU_UMEM_REQ_EOC
  request received in PO_STATE_ALL_PAGES_SENT was ignored
- flush on-demand page unconditionally
- improve postcopy_outgoing_ram_save_live and postcopy_outgoing_begin()
- use qemu_fopen_fd
- use memory api instead of obsolete api
- segv in postcopy_outgoing_check_all_ram_sent()
- catch up qapi change
---
 arch_init.c          |   22 ++-
 migration-exec.c     |    4 +
 migration-fd.c       |   17 ++
 migration-postcopy.c |  423 ++++++++++++++++++++++++++++++++++++++++++++++++++
 migration-tcp.c      |    6 +-
 migration-unix.c     |   26 +++-
 migration.c          |   32 +++-
 migration.h          |   18 +++
 savevm.c             |   35 ++++-
 sysemu.h             |    2 +-
 10 files changed, 572 insertions(+), 13 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d82316d..d95ce7b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -189,7 +189,6 @@ static struct {
     .cache = NULL,
 };
 
-
 int64_t xbzrle_cache_resize(int64_t new_size)
 {
     if (XBZRLE.cache != NULL) {
@@ -591,6 +590,7 @@ static void reset_ram_globals(void)
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
     RAMBlock *block;
+    const MigrationParams *params = &migrate_get_current()->params;
 
     migration_bitmap_init();
     qemu_mutex_lock_ramlist();
@@ -610,8 +610,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
         acct_clear();
     }
 
-    memory_global_dirty_log_start();
-    migration_bitmap_sync();
+    if (!params->postcopy) {
+        memory_global_dirty_log_start();
+        migration_bitmap_sync();
+    }
 
     qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
@@ -916,7 +918,21 @@ done:
     return ret;
 }
 
+static void ram_save_set_params(const MigrationParams *params, void *opaque)
+{
+    if (params->postcopy) {
+        savevm_ram_handlers.save_live_iterate =
+            postcopy_outgoing_ram_save_iterate;
+        savevm_ram_handlers.save_live_complete =
+            postcopy_outgoing_ram_save_complete;
+    } else {
+        savevm_ram_handlers.save_live_iterate = ram_save_iterate;
+        savevm_ram_handlers.save_live_complete = ram_save_complete;
+    }
+}
+
 SaveVMHandlers savevm_ram_handlers = {
+    .set_params = ram_save_set_params,
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
     .save_live_complete = ram_save_complete,
diff --git a/migration-exec.c b/migration-exec.c
index 95e9779..10bbecf 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const char *command)
 {
     FILE *f;
 
+    if (s->params.postcopy) {
+        return -ENOSYS;
+    }
+
     f = popen(command, "w");
     if (f == NULL) {
         DPRINTF("Unable to popen exec target\n");
diff --git a/migration-fd.c b/migration-fd.c
index 8384975..f68fa28 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const char *fdname)
     s->write = fd_write;
     s->close = fd_close;
 
+    if (s->params.postcopy) {
+        int flags = fcntl(s->fd, F_GETFL);
+        if ((flags & O_ACCMODE) != O_RDWR) {
+            goto err_after_open;
+        }
+
+        s->fd_read = dup(s->fd);
+        if (s->fd_read == -1) {
+            goto err_after_open;
+        }
+        s->file_read = qemu_fopen_fd(s->fd_read, "rb");
+        if (s->file_read == NULL) {
+            close(s->fd_read);
+            goto err_after_open;
+        }
+    }
+
     migrate_fd_connect(s);
     return 0;
 
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 0809ffa..399e233 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -167,6 +167,107 @@ static void postcopy_incoming_send_req(QEMUFile *f,
     }
 }
 
+static int postcopy_outgoing_recv_req_idstr(QEMUFile *f,
+                                            struct qemu_umem_req *req,
+                                            size_t *offset)
+{
+    int ret;
+
+    req->len = qemu_peek_byte(f, *offset);
+    *offset += 1;
+    if (req->len == 0) {
+        return -EAGAIN;
+    }
+    req->idstr = g_malloc((int)req->len + 1);
+    ret = qemu_peek_buffer(f, (uint8_t*)req->idstr, req->len, *offset);
+    *offset += ret;
+    if (ret != req->len) {
+        g_free(req->idstr);
+        req->idstr = NULL;
+        return -EAGAIN;
+    }
+    req->idstr[req->len] = 0;
+    return 0;
+}
+
+static int postcopy_outgoing_recv_req_pgoffs(QEMUFile *f,
+                                             struct qemu_umem_req *req,
+                                             size_t *offset)
+{
+    int ret;
+    uint32_t be32;
+    uint32_t i;
+
+    ret = qemu_peek_buffer(f, (uint8_t*)&be32, sizeof(be32), *offset);
+    *offset += sizeof(be32);
+    if (ret != sizeof(be32)) {
+        return -EAGAIN;
+    }
+
+    req->nr = be32_to_cpu(be32);
+    req->pgoffs = g_new(uint64_t, req->nr);
+    for (i = 0; i < req->nr; i++) {
+        uint64_t be64;
+        ret = qemu_peek_buffer(f, (uint8_t*)&be64, sizeof(be64), *offset);
+        *offset += sizeof(be64);
+        if (ret != sizeof(be64)) {
+            g_free(req->pgoffs);
+            req->pgoffs = NULL;
+            return -EAGAIN;
+        }
+        req->pgoffs[i] = be64_to_cpu(be64);
+    }
+    return 0;
+}
+
+static int postcopy_outgoing_recv_req(QEMUFile *f, struct qemu_umem_req *req)
+{
+    int size;
+    int ret;
+    size_t offset = 0;
+
+    size = qemu_peek_buffer(f, (uint8_t*)&req->cmd, 1, offset);
+    if (size <= 0) {
+        return -EAGAIN;
+    }
+    offset += 1;
+
+    switch (req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+    case QEMU_UMEM_REQ_EOC:
+        /* nothing */
+        break;
+    case QEMU_UMEM_REQ_PAGE:
+        ret = postcopy_outgoing_recv_req_idstr(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        ret = postcopy_outgoing_recv_req_pgoffs(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        break;
+    case QEMU_UMEM_REQ_PAGE_CONT:
+        ret = postcopy_outgoing_recv_req_pgoffs(f, req, &offset);
+        if (ret < 0) {
+            return ret;
+        }
+        break;
+    default:
+        abort();
+        break;
+    }
+    qemu_file_skip(f, offset);
+    DPRINTF("cmd %d\n", req->cmd);
+    return 0;
+}
+
+static void postcopy_outgoing_free_req(struct qemu_umem_req *req)
+{
+    g_free(req->idstr);
+    g_free(req->pgoffs);
+}
+
 /***************************************************************************
  * QEMU_VM_POSTCOPY section subtype
  */
@@ -174,6 +275,328 @@ static void postcopy_incoming_send_req(QEMUFile *f,
 #define QEMU_VM_POSTCOPY_SECTION_FULL 1
 
 /***************************************************************************
+ * outgoing part
+ */
+
+enum POState {
+    PO_STATE_ERROR_RECEIVE,
+    PO_STATE_ACTIVE,
+    PO_STATE_EOC_RECEIVED,
+    PO_STATE_ALL_PAGES_SENT,
+    PO_STATE_COMPLETED,
+};
+typedef enum POState POState;
+
+struct PostcopyOutgoingState {
+    POState state;
+    QEMUFile *mig_read;
+    int fd_read;
+    RAMBlock *last_block_read;
+
+    QEMUFile *mig_buffered_write;
+    MigrationState *ms;
+};
+
+int postcopy_outgoing_create_read_socket(MigrationState *s)
+{
+    if (!s->params.postcopy) {
+        return 0;
+    }
+
+    s->fd_read = dup(s->fd);
+    if (s->fd_read == -1) {
+        int ret = -errno;
+        perror("dup");
+        return ret;
+    }
+    s->file_read = qemu_fopen_socket(s->fd_read);
+    if (s->file_read == NULL) {
+        return -EINVAL;
+    }
+    return 0;
+}
+
+void postcopy_outgoing_state_begin(QEMUFile *f)
+{
+    uint64_t options = 0;
+    qemu_put_ubyte(f, QEMU_VM_POSTCOPY_INIT);
+    qemu_put_be32(f, sizeof(options));
+    qemu_put_be64(f, options);
+}
+
+void postcopy_outgoing_state_complete(
+    QEMUFile *f, const uint8_t *buffer, size_t buffer_size)
+{
+    qemu_put_ubyte(f, QEMU_VM_POSTCOPY_SECTION_FULL);
+    qemu_put_be32(f, buffer_size);
+    qemu_put_buffer(f, buffer, buffer_size);
+}
+
+int postcopy_outgoing_ram_save_iterate(QEMUFile *f, void *opaque)
+{
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    return 1;
+}
+
+int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque)
+{
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    return 0;
+}
+
+/*
+ * return value
+ *   0: continue postcopy mode
+ * > 0: completed postcopy mode.
+ * < 0: error
+ */
+static int postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
+                                        const struct qemu_umem_req *req,
+                                        bool *written)
+{
+    int i;
+    RAMBlock *block;
+
+    DPRINTF("cmd %d state %d\n", req->cmd, s->state);
+    switch(req->cmd) {
+    case QEMU_UMEM_REQ_INIT:
+        /* nothing */
+        break;
+    case QEMU_UMEM_REQ_EOC:
+        /* tell to finish migration. */
+        if (s->state == PO_STATE_ALL_PAGES_SENT) {
+            s->state = PO_STATE_COMPLETED;
+            DPRINTF("-> PO_STATE_COMPLETED\n");
+        } else {
+            s->state = PO_STATE_EOC_RECEIVED;
+            DPRINTF("-> PO_STATE_EOC_RECEIVED\n");
+        }
+        return 1;
+    case QEMU_UMEM_REQ_PAGE:
+        DPRINTF("idstr: %s\n", req->idstr);
+        block = ram_find_block(req->idstr, strlen(req->idstr));
+        if (block == NULL) {
+            return -EINVAL;
+        }
+        s->last_block_read = block;
+        /* fall through */
+    case QEMU_UMEM_REQ_PAGE_CONT:
+        DPRINTF("nr %d\n", req->nr);
+        if (s->mig_buffered_write == NULL) {
+            assert(s->state == PO_STATE_ALL_PAGES_SENT);
+            break;
+        }
+        for (i = 0; i < req->nr; i++) {
+            DPRINTF("offs[%d] 0x%"PRIx64"\n", i, req->pgoffs[i]);
+            int ret = ram_save_page(s->mig_buffered_write, s->last_block_read,
+                                    req->pgoffs[i] << TARGET_PAGE_BITS, false);
+            if (ret > 0) {
+                *written = true;
+            }
+        }
+        break;
+    default:
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static void postcopy_outgoing_close_mig_read(PostcopyOutgoingState *s)
+{
+    if (s->mig_read != NULL) {
+        qemu_set_fd_handler(s->fd_read, NULL, NULL, NULL);
+        qemu_fclose(s->mig_read);
+        s->mig_read = NULL;
+        fd_close(&s->fd_read);
+
+        s->ms->file_read = NULL;
+        s->ms->fd_read = -1;
+    }
+}
+
+static void postcopy_outgoing_completed(PostcopyOutgoingState *s)
+{
+    postcopy_outgoing_close_mig_read(s);
+    s->ms->postcopy = NULL;
+    g_free(s);
+}
+
+static void postcopy_outgoing_recv_handler(void *opaque)
+{
+    PostcopyOutgoingState *s = opaque;
+    bool written = false;
+    int ret = 0;
+
+    assert(s->state == PO_STATE_ACTIVE ||
+           s->state == PO_STATE_ALL_PAGES_SENT);
+
+    do {
+        struct qemu_umem_req req = {.idstr = NULL,
+                                    .pgoffs = NULL};
+
+        ret = postcopy_outgoing_recv_req(s->mig_read, &req);
+        if (ret < 0) {
+            if (ret == -EAGAIN) {
+                ret = 0;
+            }
+            break;
+        }
+
+        /* Even when s->state == PO_STATE_ALL_PAGES_SENT,
+           some request can be received like QEMU_UMEM_REQ_EOC */
+        ret = postcopy_outgoing_handle_req(s, &req, &written);
+        postcopy_outgoing_free_req(&req);
+    } while (ret == 0);
+
+    /*
+     * flush buffered_file.
+     * Although mig_write is rate-limited buffered file, those written pages
+     * are requested on demand by the destination. So forcibly push
+     * those pages ignoring rate limiting
+     */
+    if (written) {
+        qemu_buffered_file_drain(s->mig_buffered_write);
+    }
+
+    if (ret < 0) {
+        switch (s->state) {
+        case PO_STATE_ACTIVE:
+            s->state = PO_STATE_ERROR_RECEIVE;
+            DPRINTF("-> PO_STATE_ERROR_RECEIVE\n");
+            break;
+        case PO_STATE_ALL_PAGES_SENT:
+            s->state = PO_STATE_COMPLETED;
+            DPRINTF("-> PO_STATE_ALL_PAGES_SENT\n");
+            break;
+        default:
+            abort();
+        }
+    }
+    if (s->state == PO_STATE_ERROR_RECEIVE || s->state == PO_STATE_COMPLETED) {
+        postcopy_outgoing_close_mig_read(s);
+    }
+    if (s->state == PO_STATE_COMPLETED) {
+        DPRINTF("PO_STATE_COMPLETED\n");
+        MigrationState *ms = s->ms;
+        postcopy_outgoing_completed(s);
+        migrate_fd_completed(ms);
+    }
+}
+
+PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *ms)
+{
+    PostcopyOutgoingState *s = g_new(PostcopyOutgoingState, 1);
+    DPRINTF("outgoing begin\n");
+    qemu_buffered_file_drain(ms->file);
+
+    s->ms = ms;
+    s->state = PO_STATE_ACTIVE;
+    s->fd_read = ms->fd_read;
+    s->mig_read = ms->file_read;
+    s->mig_buffered_write = ms->file;
+
+    /* Make sure all dirty bits are set */
+    memory_global_dirty_log_stop();
+    migration_bitmap_init();
+
+    qemu_set_fd_handler(s->fd_read,
+                        &postcopy_outgoing_recv_handler, NULL, s);
+    postcopy_outgoing_recv_handler(s);
+    return s;
+}
+
+static void postcopy_outgoing_ram_all_sent(QEMUFile *f,
+                                           PostcopyOutgoingState *s)
+{
+    assert(s->state == PO_STATE_ACTIVE);
+
+    s->state = PO_STATE_ALL_PAGES_SENT;
+    /* tell incoming side that all pages are sent */
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+    qemu_buffered_file_drain(f);
+    DPRINTF("sent RAM_SAVE_FLAG_EOS\n");
+    migrate_fd_cleanup(s->ms);
+
+    /* Later migrate_fd_complete() will be called which calls
+     * migrate_fd_cleanup() again. So dummy file is created
+     * for qemu monitor to keep working.
+     */
+    s->ms->file = qemu_fopen_ops(NULL, NULL, NULL, NULL, NULL,
+                                 NULL, NULL);
+    s->mig_buffered_write = NULL;
+
+    migration_bitmap_free();
+}
+
+int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy)
+{
+    PostcopyOutgoingState *s = postcopy;
+#define MAX_WAIT 50 /* stolen from ram_save_iterate() */
+    double t0;
+    int i;
+
+    assert(s->state == PO_STATE_ACTIVE ||
+           s->state == PO_STATE_EOC_RECEIVED ||
+           s->state == PO_STATE_ERROR_RECEIVE);
+
+    switch (s->state) {
+    case PO_STATE_ACTIVE:
+        /* nothing. processed below */
+        break;
+    case PO_STATE_EOC_RECEIVED:
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+        s->state = PO_STATE_COMPLETED;
+        postcopy_outgoing_completed(s);
+        DPRINTF("PO_STATE_COMPLETED\n");
+        return 1;
+    case PO_STATE_ERROR_RECEIVE:
+        postcopy_outgoing_completed(s);
+        DPRINTF("PO_STATE_ERROR_RECEIVE\n");
+        return -1;
+    default:
+        abort();
+    }
+
+    DPRINTF("outgoing background state: %d\n", s->state);
+    i = 0;
+    t0 = qemu_get_clock_ns(rt_clock);
+    while (qemu_file_rate_limit(f) == 0) {
+        int nfds = -1;
+        fd_set readfds;
+        struct timeval timeout = {.tv_sec = 0, .tv_usec = 0};
+        int ret;
+
+        if (ram_save_block(f, false) == 0) { /* no more blocks */
+            DPRINTF("outgoing background all sent\n");
+            assert(s->state == PO_STATE_ACTIVE);
+            postcopy_outgoing_ram_all_sent(f, s);
+            return 0;
+        }
+
+        FD_ZERO(&readfds);
+        set_fd(s->fd_read, &readfds, &nfds);
+        ret = select(nfds + 1, &readfds, NULL, NULL, &timeout);
+        if (ret >= 0 && FD_ISSET(s->fd_read, &readfds)) {
+            /* page request is pending */
+            DPRINTF("pending request\n");
+            break;
+        }
+
+        /* stolen from ram_save_iterate() */
+        if ((i & 63) == 0) {
+            int64_t t1 = (qemu_get_clock_ns(rt_clock) - t0) / 1000000;
+            if (t1 > MAX_WAIT) {
+                DPRINTF("too long %"PRIu64"\n", t1);
+                break;
+            }
+        }
+        i++;
+    }
+
+    return 0;
+}
+
+/***************************************************************************
  * incoming part
  */
 
diff --git a/migration-tcp.c b/migration-tcp.c
index 69c655d..506246e 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -64,7 +64,11 @@ static void tcp_wait_for_connect(int fd, void *opaque)
     } else {
         DPRINTF("migrate connect success\n");
         s->fd = fd;
-        migrate_fd_connect(s);
+        if (postcopy_outgoing_create_read_socket(s) < 0) {
+            migrate_fd_error(s);
+        } else {
+            migrate_fd_connect(s);
+        }
     }
 }
 
diff --git a/migration-unix.c b/migration-unix.c
index d4e2431..7fc4906 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -71,12 +71,20 @@ static void unix_wait_for_connect(void *opaque)
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 
-    if (val == 0)
+    if (val == 0) {
+        ret = postcopy_outgoing_create_read_socket(s);
+        if (ret < 0) {
+            goto error_out;
+        }
         migrate_fd_connect(s);
-    else {
+    } else {
         DPRINTF("error connecting %d\n", val);
-        migrate_fd_error(s);
+        goto error_out;
     }
+    return;
+
+error_out:
+    migrate_fd_error(s);
 }
 
 int unix_start_outgoing_migration(MigrationState *s, const char *path)
@@ -111,11 +119,19 @@ int unix_start_outgoing_migration(MigrationState *s, const char *path)
 
     if (ret < 0) {
         DPRINTF("connect failed\n");
-        migrate_fd_error(s);
-        return ret;
+        goto error_out;
+    }
+
+    ret = postcopy_outgoing_create_read_socket(s);
+    if (ret < 0) {
+        goto error_out;
     }
     migrate_fd_connect(s);
     return 0;
+
+error_out:
+    migrate_fd_error(s);
+    return ret;
 }
 
 static void unix_accept_incoming_migration(void *opaque)
diff --git a/migration.c b/migration.c
index 8bb6073..85f8f71 100644
--- a/migration.c
+++ b/migration.c
@@ -41,6 +41,11 @@ enum {
     MIG_STATE_COMPLETED,
 };
 
+enum {
+    MIG_SUBSTATE_PRECOPY,
+    MIG_SUBSTATE_POSTCOPY,
+};
+
 #define MAX_THROTTLE (32 << 20) /* Migration speed throttling */
 
 /* Migration XBZRLE default cache size */
@@ -328,6 +333,17 @@ void migrate_fd_put_ready(MigrationState *s)
         return;
     }
 
+    if (s->substate == MIG_SUBSTATE_POSTCOPY) {
+        /* PRINTF("postcopy background\n"); */
+        ret = postcopy_outgoing_ram_save_background(s->file, s->postcopy);
+        if (ret > 0) {
+            migrate_fd_completed(s);
+        } else if (ret < 0) {
+            migrate_fd_error(s);
+        }
+        return;
+    }
+
     DPRINTF("iterate\n");
     ret = qemu_savevm_state_iterate(s->file);
     if (ret < 0) {
@@ -341,7 +357,20 @@ void migrate_fd_put_ready(MigrationState *s)
         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
         vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 
-        if (qemu_savevm_state_complete(s->file) < 0) {
+        if (s->params.postcopy) {
+            if (qemu_savevm_state_complete(s->file, &s->params) < 0) {
+                migrate_fd_error(s);
+                if (old_vm_running) {
+                    vm_start();
+                }
+                return;
+            }
+            s->substate = MIG_SUBSTATE_POSTCOPY;
+            s->postcopy = postcopy_outgoing_begin(s);
+            return;
+        }
+
+        if (qemu_savevm_state_complete(s->file, &s->params) < 0) {
             migrate_fd_error(s);
         } else {
             migrate_fd_completed(s);
@@ -431,6 +460,7 @@ void migrate_fd_connect(MigrationState *s)
     int ret;
 
     s->state = MIG_STATE_ACTIVE;
+    s->substate = MIG_SUBSTATE_PRECOPY;
     s->file = qemu_fopen_ops_buffered(s);
 
     DPRINTF("beginning savevm\n");
diff --git a/migration.h b/migration.h
index b21df18..9b3c03b 100644
--- a/migration.h
+++ b/migration.h
@@ -28,6 +28,7 @@ struct MigrationParams {
 };
 
 typedef struct MigrationState MigrationState;
+typedef struct PostcopyOutgoingState PostcopyOutgoingState;
 
 struct MigrationState
 {
@@ -46,6 +47,12 @@ struct MigrationState
     int64_t dirty_pages_rate;
     bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
     int64_t xbzrle_cache_size;
+
+    /* for postcopy */
+    int substate;               /* precopy or postcopy */
+    int fd_read;
+    QEMUFile *file_read;        /* connection from the destination */
+    PostcopyOutgoingState *postcopy;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -135,6 +142,17 @@ int64_t migrate_xbzrle_cache_size(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+/* For outgoing postcopy */
+int postcopy_outgoing_create_read_socket(MigrationState *s);
+void postcopy_outgoing_state_begin(QEMUFile *f);
+void postcopy_outgoing_state_complete(
+    QEMUFile *f, const uint8_t *buffer, size_t buffer_size);
+int postcopy_outgoing_ram_save_iterate(QEMUFile *f, void *opaque);
+int postcopy_outgoing_ram_save_complete(QEMUFile *f, void *opaque);
+
+PostcopyOutgoingState *postcopy_outgoing_begin(MigrationState *s);
+int postcopy_outgoing_ram_save_background(QEMUFile *f, void *postcopy);
+
 /* For incoming postcopy */
 extern bool incoming_postcopy;
 
diff --git a/savevm.c b/savevm.c
index 04b03cf..675f9a5 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1647,6 +1647,12 @@ int qemu_savevm_state_begin(QEMUFile *f,
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
     qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 
+    if (params->postcopy) {
+        /* tell this is postcopy */
+        qemu_put_byte(f, QEMU_VM_POSTCOPY);
+        postcopy_outgoing_state_begin(f);
+    }
+
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
 
@@ -1734,8 +1740,10 @@ int qemu_savevm_state_iterate(QEMUFile *f)
     return ret;
 }
 
-int qemu_savevm_state_complete(QEMUFile *f)
+int qemu_savevm_state_complete(QEMUFile *f, const MigrationParams *params)
 {
+    QEMUFile *orig_f = NULL;
+    QEMUFileBuf *buf_file = NULL;
     SaveStateEntry *se;
     int ret;
 
@@ -1762,6 +1770,20 @@ int qemu_savevm_state_complete(QEMUFile *f)
         }
     }
 
+    if (params->postcopy) {
+        /* VMStateDescription:pre/post_load and
+         * cpu_synchronize_all_post_init() may fault on guest RAM.
+         * (MSR_KVM_WALL_CLOCK, MSR_KVM_SYSTEM_TIME)
+         * postcopy threads needs to be created before the fault.
+         *
+         * This is hacky, but it's because size of section/state structure
+         * can't be easily determined without actual loading.
+         */
+        orig_f = f;
+        buf_file = qemu_fopen_buf_write();
+        f = buf_file->file;
+    }
+
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
 
@@ -1787,6 +1809,15 @@ int qemu_savevm_state_complete(QEMUFile *f)
 
     qemu_put_byte(f, QEMU_VM_EOF);
 
+    if (params->postcopy) {
+        qemu_fflush(f);
+        qemu_put_byte(orig_f, QEMU_VM_POSTCOPY);
+        postcopy_outgoing_state_complete(
+            orig_f, buf_file->buffer, buf_file->buffer_size);
+        qemu_fclose(f);
+        f = orig_f;
+    }
+
     return qemu_file_get_error(f);
 }
 
@@ -1825,7 +1856,7 @@ static int qemu_savevm_state(QEMUFile *f)
             goto out;
     } while (ret == 0);
 
-    ret = qemu_savevm_state_complete(f);
+    ret = qemu_savevm_state_complete(f, &params);
 
 out:
     if (ret == 0) {
diff --git a/sysemu.h b/sysemu.h
index 0c39a3a..f1129e7 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -81,7 +81,7 @@ bool qemu_savevm_state_blocked(Error **errp);
 int qemu_savevm_state_begin(QEMUFile *f,
                             const MigrationParams *params);
 int qemu_savevm_state_iterate(QEMUFile *f);
-int qemu_savevm_state_complete(QEMUFile *f);
+int qemu_savevm_state_complete(QEMUFile *f, const MigrationParams *params);
 void qemu_savevm_state_cancel(QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
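
Note for reviewers (not part of the patch): postcopy_outgoing_recv_req() above
peeks at the stream and only consumes bytes once a whole request is available,
returning -EAGAIN otherwise. The standalone sketch below illustrates that
pattern on a plain byte buffer, using the page-request layout the patch reads
(command byte, one-byte idstr length, idstr, big-endian 32-bit count, then
big-endian 64-bit page offsets). It is only an illustration under assumptions:
the demo_* names and the DEMO_REQ_PAGE value are hypothetical, the real
QEMU_UMEM_REQ_* constants are defined elsewhere in the series, and the QEMUFile
peek API is replaced by a flat buffer. As in the patch, a zero idstr length is
treated as "not enough data yet".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical command value, for illustration only; the real
 * QEMU_UMEM_REQ_* constants are not part of this patch. */
#define DEMO_REQ_PAGE 2

struct demo_req {
    uint8_t cmd;
    uint8_t len;       /* length of idstr, without the trailing NUL */
    char *idstr;       /* NUL-terminated RAM block name */
    uint32_t nr;       /* number of page offsets */
    uint64_t *pgoffs;  /* page offsets, host byte order */
};

static uint32_t demo_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t demo_load_be64(const uint8_t *p)
{
    return ((uint64_t)demo_load_be32(p) << 32) | demo_load_be32(p + 4);
}

static void demo_free_req(struct demo_req *req)
{
    free(req->idstr);
    free(req->pgoffs);
    req->idstr = NULL;
    req->pgoffs = NULL;
}

/*
 * Parse one page request from buf[0..size).
 * Returns the number of bytes consumed, -EAGAIN if the buffer does not yet
 * hold a complete request, -EINVAL for any other command byte, or -ENOMEM.
 * Nothing is allocated unless the whole request is present.
 */
static long demo_parse_page_req(const uint8_t *buf, size_t size,
                                struct demo_req *req)
{
    size_t off = 0;
    size_t id_off;
    uint32_t i;

    memset(req, 0, sizeof(*req));
    if (size - off < 1) {
        return -EAGAIN;
    }
    req->cmd = buf[off++];
    if (req->cmd != DEMO_REQ_PAGE) {
        return -EINVAL;
    }

    if (size - off < 1) {
        return -EAGAIN;
    }
    req->len = buf[off++];
    if (req->len == 0 || size - off < req->len) {
        return -EAGAIN;
    }
    id_off = off;
    off += req->len;

    if (size - off < 4) {
        return -EAGAIN;
    }
    req->nr = demo_load_be32(buf + off);
    off += 4;
    if ((size - off) / 8 < req->nr) {
        return -EAGAIN;
    }

    /* The whole request is present: allocate and copy. */
    req->idstr = malloc((size_t)req->len + 1);
    req->pgoffs = calloc(req->nr ? req->nr : 1, sizeof(uint64_t));
    if (req->idstr == NULL || req->pgoffs == NULL) {
        demo_free_req(req);
        return -ENOMEM;
    }
    memcpy(req->idstr, buf + id_off, req->len);
    req->idstr[req->len] = '\0';
    for (i = 0; i < req->nr; i++) {
        req->pgoffs[i] = demo_load_be64(buf + off);
        off += 8;
    }
    return (long)off;
}
```

A caller would retry demo_parse_page_req() with the same buffer once more bytes
have arrived, and advance its read position by the returned count only on
success, which mirrors the qemu_peek_buffer()/qemu_file_skip() split in the
patch.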