From patchwork Fri Mar 9 09:15:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883522 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMRy6dRkz9sWt for ; Fri, 9 Mar 2018 20:25:34 +1100 (AEDT) Received: from localhost ([::1]:43994 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEHM-0002pI-Ms for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:25:32 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44779) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE7w-0003S6-B9 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:53 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE7v-0005bZ-CX for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:48 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59236 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE7v-0005bM-7h for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:47 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D6D7B406802E; Fri, 9 Mar 2018 09:15:46 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1C36C215CDA7; Fri, 9 Mar 2018 09:15:43 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:13 +0800 Message-Id: <20180309091535.13315-2-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:15:46 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:15:46 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 01/23] migration: let incoming side use thread context X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Laurent Vivier , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" The old incoming migration is running in main thread and default gcontext. With the new qio_channel_add_watch_full() we can now let it run in the thread's own gcontext (if there is one). Currently this patch does nothing alone. But when any of the incoming migration is run in another iothread (e.g., the upcoming migrate-recover command), this patch will bind the incoming logic to the iothread instead of the main thread (which may already get page faulted and hanged). RDMA is not considered for now since it's not even using the QIO watch framework at all. CC: Juan Quintela CC: Dr. David Alan Gilbert CC: Laurent Vivier Signed-off-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert --- migration/exec.c | 9 ++++----- migration/fd.c | 9 ++++----- migration/socket.c | 10 +++++----- 3 files changed, 13 insertions(+), 15 deletions(-) diff --git a/migration/exec.c b/migration/exec.c index 0bc5a427dd..9d0f82f1f0 100644 --- a/migration/exec.c +++ b/migration/exec.c @@ -65,9 +65,8 @@ void exec_start_incoming_migration(const char *command, Error **errp) } qio_channel_set_name(ioc, "migration-exec-incoming"); - qio_channel_add_watch(ioc, - G_IO_IN, - exec_accept_incoming_migration, - NULL, - NULL); + qio_channel_add_watch_full(ioc, G_IO_IN, + exec_accept_incoming_migration, + NULL, NULL, + g_main_context_get_thread_default()); } diff --git a/migration/fd.c b/migration/fd.c index cd06182d1e..9a380bbbc4 100644 --- a/migration/fd.c +++ b/migration/fd.c @@ -66,9 +66,8 @@ void fd_start_incoming_migration(const char *infd, Error **errp) } qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming"); - qio_channel_add_watch(ioc, - G_IO_IN, - fd_accept_incoming_migration, - NULL, - NULL); + qio_channel_add_watch_full(ioc, G_IO_IN, + fd_accept_incoming_migration, + NULL, NULL, + g_main_context_get_thread_default()); } diff --git a/migration/socket.c b/migration/socket.c index 8a93fb1af5..91071b06ef 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -174,11 +174,11 @@ static void socket_start_incoming_migration(SocketAddress *saddr, return; } - qio_channel_add_watch(QIO_CHANNEL(listen_ioc), - G_IO_IN, - socket_accept_incoming_migration, - listen_ioc, - (GDestroyNotify)object_unref); + qio_channel_add_watch_full(QIO_CHANNEL(listen_ioc), G_IO_IN, + socket_accept_incoming_migration, + listen_ioc, + (GDestroyNotify)object_unref, + g_main_context_get_thread_default()); } void tcp_start_incoming_migration(const char *host_port, Error **errp) From patchwork Fri Mar 9 09:15:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883533 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMZ66nvyz9sbv for ; Fri, 9 Mar 2018 20:30:54 +1100 (AEDT) Received: from localhost ([::1]:44028 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEMW-0007q9-NZ for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:30:52 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44812) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE81-0003XO-B0 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:54 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE7y-0005dy-CS for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:53 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:45384 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE7y-0005di-8G for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:50 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D6AF5EBFE5; Fri, 9 Mar 2018 09:15:49 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 638CE215CDA7; Fri, 9 Mar 2018 09:15:47 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:14 +0800 Message-Id: <20180309091535.13315-3-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 09 Mar 2018 09:15:49 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 09 Mar 2018 09:15:49 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 02/23] migration: new postcopy-pause state X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Introducing a new state "postcopy-paused", which can be used when the postcopy migration is paused. It is targeted for postcopy network failure recovery. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 2 ++ qapi/migration.json | 5 ++++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/migration/migration.c b/migration/migration.c index 9007ad83ea..f456f0d3f2 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -558,6 +558,7 @@ static bool migration_is_setup_or_active(int state) switch (state) { case MIGRATION_STATUS_ACTIVE: case MIGRATION_STATUS_POSTCOPY_ACTIVE: + case MIGRATION_STATUS_POSTCOPY_PAUSED: case MIGRATION_STATUS_SETUP: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: @@ -637,6 +638,7 @@ MigrationInfo *qmp_query_migrate(Error **errp) case MIGRATION_STATUS_POSTCOPY_ACTIVE: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: + case MIGRATION_STATUS_POSTCOPY_PAUSED: /* TODO add some postcopy stats */ info->has_status = true; info->has_total_time = true; diff --git a/qapi/migration.json b/qapi/migration.json index 7f465a1902..6142b065a1 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -89,6 +89,8 @@ # # @postcopy-active: like active, but now in postcopy mode. (since 2.5) # +# @postcopy-paused: during postcopy but paused. (since 2.12) +# # @completed: migration is finished. # # @failed: some error occurred during migration process. @@ -106,7 +108,8 @@ ## { 'enum': 'MigrationStatus', 'data': [ 'none', 'setup', 'cancelling', 'cancelled', - 'active', 'postcopy-active', 'completed', 'failed', 'colo', + 'active', 'postcopy-active', 'postcopy-paused', + 'completed', 'failed', 'colo', 'pre-switchover', 'device' ] } ## From patchwork Fri Mar 9 09:15:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883536 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMdv5zqqz9sdd for ; Fri, 9 Mar 2018 20:34:10 +1100 (AEDT) Received: from localhost ([::1]:44044 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEPg-0001wy-5b for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:34:08 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44828) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE82-0003YO-HJ for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:55 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE81-0005f0-D7 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:54 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:45388 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE81-0005ee-7d for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:53 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D67A8EBFE3; Fri, 9 Mar 2018 09:15:52 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 625BB215CDA7; Fri, 9 Mar 2018 09:15:50 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:15 +0800 Message-Id: <20180309091535.13315-4-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 09 Mar 2018 09:15:52 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 09 Mar 2018 09:15:52 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 03/23] migration: implement "postcopy-pause" src logic X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Now when network down for postcopy, the source side will not fail the migration. Instead we convert the status into this new paused state, and we will try to wait for a rescue in the future. If a recovery is detected, migration_thread() will reset its local variables to prepare for that. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++--- migration/migration.h | 3 ++ migration/trace-events | 1 + 3 files changed, 97 insertions(+), 6 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index f456f0d3f2..84afed174d 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2174,6 +2174,80 @@ bool migrate_colo_enabled(void) return s->enabled_capabilities[MIGRATION_CAPABILITY_X_COLO]; } +typedef enum MigThrError { + /* No error detected */ + MIG_THR_ERR_NONE = 0, + /* Detected error, but resumed successfully */ + MIG_THR_ERR_RECOVERED = 1, + /* Detected fatal error, need to exit */ + MIG_THR_ERR_FATAL = 2, +} MigThrError; + +/* + * We don't return until we are in a safe state to continue current + * postcopy migration. Returns MIG_THR_ERR_RECOVERED if recovered, or + * MIG_THR_ERR_FATAL if unrecovery failure happened. + */ +static MigThrError postcopy_pause(MigrationState *s) +{ + assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); + migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, + MIGRATION_STATUS_POSTCOPY_PAUSED); + + /* Current channel is possibly broken. Release it. */ + assert(s->to_dst_file); + qemu_file_shutdown(s->to_dst_file); + qemu_fclose(s->to_dst_file); + s->to_dst_file = NULL; + + error_report("Detected IO failure for postcopy. " + "Migration paused."); + + /* + * We wait until things fixed up. Then someone will setup the + * status back for us. + */ + while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { + qemu_sem_wait(&s->postcopy_pause_sem); + } + + trace_postcopy_pause_continued(); + + return MIG_THR_ERR_RECOVERED; +} + +static MigThrError migration_detect_error(MigrationState *s) +{ + int ret; + + /* Try to detect any file errors */ + ret = qemu_file_get_error(s->to_dst_file); + + if (!ret) { + /* Everything is fine */ + return MIG_THR_ERR_NONE; + } + + if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) { + /* + * For postcopy, we allow the network to be down for a + * while. After that, it can be continued by a + * recovery phase. + */ + return postcopy_pause(s); + } else { + /* + * For precopy (or postcopy with error outside IO), we fail + * with no time. + */ + migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED); + trace_migration_thread_file_err(); + + /* Time to stop the migration, now. */ + return MIG_THR_ERR_FATAL; + } +} + static void migration_calculate_complete(MigrationState *s) { uint64_t bytes = qemu_ftell(s->to_dst_file); @@ -2330,6 +2404,7 @@ static void *migration_thread(void *opaque) { MigrationState *s = opaque; int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST); + MigThrError thr_error; rcu_register_thread(); @@ -2379,13 +2454,22 @@ static void *migration_thread(void *opaque) } } - if (qemu_file_get_error(s->to_dst_file)) { - if (migration_is_setup_or_active(s->state)) { - migrate_set_state(&s->state, s->state, - MIGRATION_STATUS_FAILED); - } - trace_migration_thread_file_err(); + /* + * Try to detect any kind of failures, and see whether we + * should stop the migration now. + */ + thr_error = migration_detect_error(s); + if (thr_error == MIG_THR_ERR_FATAL) { + /* Stop migration */ break; + } else if (thr_error == MIG_THR_ERR_RECOVERED) { + /* + * Just recovered from a e.g. network failure, reset all + * the local variables. This is important to avoid + * breaking transferred_bytes and bandwidth calculation + */ + s->iteration_start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + s->iteration_initial_bytes = 0; } current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); @@ -2543,6 +2627,7 @@ static void migration_instance_finalize(Object *obj) g_free(params->tls_hostname); g_free(params->tls_creds); qemu_sem_destroy(&ms->pause_sem); + qemu_sem_destroy(&ms->postcopy_pause_sem); } static void migration_instance_init(Object *obj) @@ -2571,6 +2656,8 @@ static void migration_instance_init(Object *obj) params->has_x_multifd_channels = true; params->has_x_multifd_page_count = true; params->has_xbzrle_cache_size = true; + + qemu_sem_init(&ms->postcopy_pause_sem, 0); } /* diff --git a/migration/migration.h b/migration/migration.h index 08c5d2ded1..5774b25358 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -178,6 +178,9 @@ struct MigrationState bool send_configuration; /* Whether we send section footer during migration */ bool send_section_footer; + + /* Needed by postcopy-pause state */ + QemuSemaphore postcopy_pause_sem; }; void migrate_set_state(int *state, int old_state, int new_state); diff --git a/migration/trace-events b/migration/trace-events index 93961dea16..7afbb3e791 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -99,6 +99,7 @@ migration_thread_setup_complete(void) "" open_return_path_on_source(void) "" open_return_path_on_source_continue(void) "" postcopy_start(void) "" +postcopy_pause_continued(void) "" postcopy_start_set_run(void) "" source_return_path_thread_bad_end(void) "" source_return_path_thread_end(void) "" From patchwork Fri Mar 9 09:15:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883516 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMFx42cGz9sWt for ; Fri, 9 Mar 2018 20:16:53 +1100 (AEDT) Received: from localhost ([::1]:43942 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8x-0003ce-84 for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:16:51 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44854) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE85-0003aa-Py for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:59 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE84-0005g6-Cs for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:57 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60692 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE84-0005fr-7A for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:56 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D70938182D30; Fri, 9 Mar 2018 09:15:55 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 62803215CDA7; Fri, 9 Mar 2018 09:15:53 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:16 +0800 Message-Id: <20180309091535.13315-5-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:15:55 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:15:55 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 04/23] migration: allow dst vm pause on postcopy X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" When there is IO error on the incoming channel (e.g., network down), instead of bailing out immediately, we allow the dst vm to switch to the new POSTCOPY_PAUSE state. Currently it is still simple - it waits the new semaphore, until someone poke it for another attempt. One note is that here on ram loading thread we cannot detect the POSTCOPY_ACTIVE state, but we need to detect the more specific POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all the device states. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 1 + migration/migration.h | 3 +++ migration/savevm.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++-- migration/trace-events | 2 ++ 4 files changed, 67 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 84afed174d..91a4ebba8f 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -157,6 +157,7 @@ MigrationIncomingState *migration_incoming_get_current(void) memset(&mis_current, 0, sizeof(MigrationIncomingState)); qemu_mutex_init(&mis_current.rp_mutex); qemu_event_init(&mis_current.main_thread_load_event, false); + qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0); once = true; } return &mis_current; diff --git a/migration/migration.h b/migration/migration.h index 5774b25358..9a2de4ba9b 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -61,6 +61,9 @@ struct MigrationIncomingState { /* The coroutine we should enter (back) after failover */ Coroutine *migration_incoming_co; QemuSemaphore colo_incoming_sem; + + /* notify PAUSED postcopy incoming migrations to try to continue */ + QemuSemaphore postcopy_pause_sem_dst; }; MigrationIncomingState *migration_incoming_get_current(void); diff --git a/migration/savevm.c b/migration/savevm.c index 358c5b51e2..babcff4844 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1549,8 +1549,8 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis, */ static void *postcopy_ram_listen_thread(void *opaque) { - QEMUFile *f = opaque; MigrationIncomingState *mis = migration_incoming_get_current(); + QEMUFile *f = mis->from_src_file; int load_res; migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, @@ -1564,6 +1564,14 @@ static void *postcopy_ram_listen_thread(void *opaque) */ qemu_file_set_blocking(f, true); load_res = qemu_loadvm_state_main(f, mis); + + /* + * This is tricky, but, mis->from_src_file can change after it + * returns, when postcopy recovery happened. In the future, we may + * want a wrapper for the QEMUFile handle. + */ + f = mis->from_src_file; + /* And non-blocking again so we don't block in any cleanup */ qemu_file_set_blocking(f, false); @@ -1646,7 +1654,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis) /* Start up the listening thread and wait for it to signal ready */ qemu_sem_init(&mis->listen_thread_sem, 0); qemu_thread_create(&mis->listen_thread, "postcopy/listen", - postcopy_ram_listen_thread, mis->from_src_file, + postcopy_ram_listen_thread, NULL, QEMU_THREAD_DETACHED); qemu_sem_wait(&mis->listen_thread_sem); qemu_sem_destroy(&mis->listen_thread_sem); @@ -2031,11 +2039,44 @@ void qemu_loadvm_state_cleanup(void) } } +/* Return true if we should continue the migration, or false. */ +static bool postcopy_pause_incoming(MigrationIncomingState *mis) +{ + trace_postcopy_pause_incoming(); + + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, + MIGRATION_STATUS_POSTCOPY_PAUSED); + + assert(mis->from_src_file); + qemu_file_shutdown(mis->from_src_file); + qemu_fclose(mis->from_src_file); + mis->from_src_file = NULL; + + assert(mis->to_src_file); + qemu_file_shutdown(mis->to_src_file); + qemu_mutex_lock(&mis->rp_mutex); + qemu_fclose(mis->to_src_file); + mis->to_src_file = NULL; + qemu_mutex_unlock(&mis->rp_mutex); + + error_report("Detected IO failure for postcopy. " + "Migration paused."); + + while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { + qemu_sem_wait(&mis->postcopy_pause_sem_dst); + } + + trace_postcopy_pause_incoming_continued(); + + return true; +} + static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) { uint8_t section_type; int ret = 0; +retry: while (true) { section_type = qemu_get_byte(f); @@ -2080,6 +2121,24 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) out: if (ret < 0) { qemu_file_set_error(f, ret); + + /* + * Detect whether it is: + * + * 1. postcopy running (after receiving all device data, which + * must be in POSTCOPY_INCOMING_RUNNING state. Note that + * POSTCOPY_INCOMING_LISTENING is still not enough, it's + * still receiving device states). + * 2. network failure (-EIO) + * + * If so, we try to wait for a recovery. + */ + if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING && + ret == -EIO && postcopy_pause_incoming(mis)) { + /* Reset f to point to the newly created channel */ + f = mis->from_src_file; + goto retry; + } } return ret; } diff --git a/migration/trace-events b/migration/trace-events index 7afbb3e791..8685a62c98 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -100,6 +100,8 @@ open_return_path_on_source(void) "" open_return_path_on_source_continue(void) "" postcopy_start(void) "" postcopy_pause_continued(void) "" +postcopy_pause_incoming(void) "" +postcopy_pause_incoming_continued(void) "" postcopy_start_set_run(void) "" source_return_path_thread_bad_end(void) "" source_return_path_thread_end(void) "" From patchwork Fri Mar 9 09:15:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883542 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMll6VrNz9s8r for ; Fri, 9 Mar 2018 20:39:14 +1100 (AEDT) Received: from localhost ([::1]:44082 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEUZ-0006Qn-KG for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:39:11 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44901) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8C-0003gG-CK for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE87-0005ha-Dc for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:04 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:45392 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE87-0005gs-7g for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:15:59 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DAA25EBFE3; Fri, 9 Mar 2018 09:15:58 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 64D81215CDA7; Fri, 9 Mar 2018 09:15:56 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:17 +0800 Message-Id: <20180309091535.13315-6-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 09 Mar 2018 09:15:58 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 09 Mar 2018 09:15:58 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 05/23] migration: allow src return path to pause X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Let the thread pause for network issues. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 35 +++++++++++++++++++++++++++++++++-- migration/migration.h | 1 + migration/trace-events | 2 ++ 3 files changed, 36 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 91a4ebba8f..b764f3b804 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1708,6 +1708,18 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, } } +/* Return true to retry, false to quit */ +static bool postcopy_pause_return_path_thread(MigrationState *s) +{ + trace_postcopy_pause_return_path(); + + qemu_sem_wait(&s->postcopy_pause_rp_sem); + + trace_postcopy_pause_return_path_continued(); + + return true; +} + /* * Handles messages sent on the return path towards the source VM * @@ -1724,6 +1736,8 @@ static void *source_return_path_thread(void *opaque) int res; trace_source_return_path_thread_entry(); + +retry: while (!ms->rp_state.error && !qemu_file_get_error(rp) && migration_is_setup_or_active(ms->state)) { trace_source_return_path_thread_loop_top(); @@ -1815,13 +1829,28 @@ static void *source_return_path_thread(void *opaque) break; } } - if (qemu_file_get_error(rp)) { + +out: + res = qemu_file_get_error(rp); + if (res) { + if (res == -EIO) { + /* + * Maybe there is something we can do: it looks like a + * network down issue, and we pause for a recovery. + */ + if (postcopy_pause_return_path_thread(ms)) { + /* Reload rp, reset the rest */ + rp = ms->rp_state.from_dst_file; + ms->rp_state.error = false; + goto retry; + } + } + trace_source_return_path_thread_bad_end(); mark_source_rp_bad(ms); } trace_source_return_path_thread_end(); -out: ms->rp_state.from_dst_file = NULL; qemu_fclose(rp); return NULL; @@ -2629,6 +2658,7 @@ static void migration_instance_finalize(Object *obj) g_free(params->tls_creds); qemu_sem_destroy(&ms->pause_sem); qemu_sem_destroy(&ms->postcopy_pause_sem); + qemu_sem_destroy(&ms->postcopy_pause_rp_sem); } static void migration_instance_init(Object *obj) @@ -2659,6 +2689,7 @@ static void migration_instance_init(Object *obj) params->has_xbzrle_cache_size = true; qemu_sem_init(&ms->postcopy_pause_sem, 0); + qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); } /* diff --git a/migration/migration.h b/migration/migration.h index 9a2de4ba9b..1a18d2e142 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -184,6 +184,7 @@ struct MigrationState /* Needed by postcopy-pause state */ QemuSemaphore postcopy_pause_sem; + QemuSemaphore postcopy_pause_rp_sem; }; void migrate_set_state(int *state, int old_state, int new_state); diff --git a/migration/trace-events b/migration/trace-events index 8685a62c98..ca17a70222 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -99,6 +99,8 @@ migration_thread_setup_complete(void) "" open_return_path_on_source(void) "" open_return_path_on_source_continue(void) "" postcopy_start(void) "" +postcopy_pause_return_path(void) "" +postcopy_pause_return_path_continued(void) "" postcopy_pause_continued(void) "" postcopy_pause_incoming(void) "" postcopy_pause_incoming_continued(void) "" From patchwork Fri Mar 9 09:15:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883521 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMQc23XPz9sXr for ; Fri, 9 Mar 2018 20:24:24 +1100 (AEDT) Received: from localhost ([::1]:43987 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEGE-0001gY-2D for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:24:22 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44902) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8C-0003gH-CM for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8A-0005mU-D3 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:04 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60700 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8A-0005ln-6r for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:02 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DBA358182D30; Fri, 9 Mar 2018 09:16:01 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 66484215CDA7; Fri, 9 Mar 2018 09:15:59 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:18 +0800 Message-Id: <20180309091535.13315-7-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:01 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:01 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 06/23] migration: allow fault thread to pause X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Allows the fault thread to stop handling page faults temporarily. When network failure happened (and if we expect a recovery afterwards), we should not allow the fault thread to continue sending things to source, instead, it should halt for a while until the connection is rebuilt. When the dest main thread noticed the failure, it kicks the fault thread to switch to pause state. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 1 + migration/migration.h | 1 + migration/postcopy-ram.c | 50 ++++++++++++++++++++++++++++++++++++++++++++---- migration/savevm.c | 3 +++ migration/trace-events | 2 ++ 5 files changed, 53 insertions(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index b764f3b804..2d97dd742a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -158,6 +158,7 @@ MigrationIncomingState *migration_incoming_get_current(void) qemu_mutex_init(&mis_current.rp_mutex); qemu_event_init(&mis_current.main_thread_load_event, false); qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0); + qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0); once = true; } return &mis_current; diff --git a/migration/migration.h b/migration/migration.h index 1a18d2e142..92517fd6e3 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -64,6 +64,7 @@ struct MigrationIncomingState { /* notify PAUSED postcopy incoming migrations to try to continue */ QemuSemaphore postcopy_pause_sem_dst; + QemuSemaphore postcopy_pause_sem_fault; }; MigrationIncomingState *migration_incoming_get_current(void); diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 032abfbf1a..31c290c884 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -485,6 +485,17 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr, return 0; } +static bool postcopy_pause_fault_thread(MigrationIncomingState *mis) +{ + trace_postcopy_pause_fault_thread(); + + qemu_sem_wait(&mis->postcopy_pause_sem_fault); + + trace_postcopy_pause_fault_thread_continued(); + + return true; +} + /* * Handle faults detected by the USERFAULT markings */ @@ -535,6 +546,22 @@ static void *postcopy_ram_fault_thread(void *opaque) } } + if (!mis->to_src_file) { + /* + * Possibly someone tells us that the return path is + * broken already using the event. We should hold until + * the channel is rebuilt. + */ + if (postcopy_pause_fault_thread(mis)) { + last_rb = NULL; + /* Continue to read the userfaultfd */ + } else { + error_report("%s: paused but don't allow to continue", + __func__); + break; + } + } + ret = read(mis->userfault_fd, &msg, sizeof(msg)); if (ret != sizeof(msg)) { if (errno == EAGAIN) { @@ -574,18 +601,33 @@ static void *postcopy_ram_fault_thread(void *opaque) qemu_ram_get_idstr(rb), rb_offset); +retry: /* * Send the request to the source - we want to request one * of our host page sizes (which is >= TPS) */ if (rb != last_rb) { last_rb = rb; - migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb), - rb_offset, qemu_ram_pagesize(rb)); + ret = migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb), + rb_offset, qemu_ram_pagesize(rb)); } else { /* Save some space */ - migrate_send_rp_req_pages(mis, NULL, - rb_offset, qemu_ram_pagesize(rb)); + ret = migrate_send_rp_req_pages(mis, NULL, + rb_offset, qemu_ram_pagesize(rb)); + } + + if (ret) { + /* May be network failure, try to wait for recovery */ + if (ret == -EIO && postcopy_pause_fault_thread(mis)) { + /* We got reconnected somehow, try to continue */ + last_rb = NULL; + goto retry; + } else { + /* This is a unavoidable fault */ + error_report("%s: migrate_send_rp_req_pages() get %d", + __func__, ret); + break; + } } } trace_postcopy_ram_fault_thread_exit(); diff --git a/migration/savevm.c b/migration/savevm.c index babcff4844..93e6a13477 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2059,6 +2059,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis) mis->to_src_file = NULL; qemu_mutex_unlock(&mis->rp_mutex); + /* Notify the fault thread for the invalidated file handle */ + postcopy_fault_thread_notify(mis); + error_report("Detected IO failure for postcopy. " "Migration paused."); diff --git a/migration/trace-events b/migration/trace-events index ca17a70222..06a919a6e3 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -101,6 +101,8 @@ open_return_path_on_source_continue(void) "" postcopy_start(void) "" postcopy_pause_return_path(void) "" postcopy_pause_return_path_continued(void) "" +postcopy_pause_fault_thread(void) "" +postcopy_pause_fault_thread_continued(void) "" postcopy_pause_continued(void) "" postcopy_pause_incoming(void) "" postcopy_pause_incoming_continued(void) "" From patchwork Fri Mar 9 09:15:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883518 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMLN0T4nz9sXr for ; Fri, 9 Mar 2018 20:20:44 +1100 (AEDT) Received: from localhost ([::1]:43961 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euECf-0006mk-5S for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:20:41 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44924) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8E-0003iK-Ff for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:10 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8D-0005s6-Br for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:06 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:40160 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8D-0005rJ-7O for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:05 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DBF43D1437; Fri, 9 Mar 2018 09:16:04 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 66DD7215CDA7; Fri, 9 Mar 2018 09:16:02 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:19 +0800 Message-Id: <20180309091535.13315-8-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Fri, 09 Mar 2018 09:16:04 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Fri, 09 Mar 2018 09:16:04 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 07/23] qmp: hmp: add migrate "resume" option X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" It will be used when we want to resume one paused migration. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- hmp-commands.hx | 7 ++++--- hmp.c | 4 +++- migration/migration.c | 2 +- qapi/migration.json | 5 ++++- 4 files changed, 12 insertions(+), 6 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 964eb515cf..defba474d6 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -895,13 +895,14 @@ ETEXI { .name = "migrate", - .args_type = "detach:-d,blk:-b,inc:-i,uri:s", - .params = "[-d] [-b] [-i] uri", + .args_type = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s", + .params = "[-d] [-b] [-i] [-r] uri", .help = "migrate to URI (using -d to not wait for completion)" "\n\t\t\t -b for migration without shared storage with" " full copy of disk\n\t\t\t -i for migration without " "shared storage with incremental copy of disk " - "(base image shared between src and destination)", + "(base image shared between src and destination)" + "\n\t\t\t -r to resume a paused migration", .cmd = hmp_migrate, }, diff --git a/hmp.c b/hmp.c index 016cb5c4f1..4ac4fbcb74 100644 --- a/hmp.c +++ b/hmp.c @@ -1904,10 +1904,12 @@ void hmp_migrate(Monitor *mon, const QDict *qdict) bool detach = qdict_get_try_bool(qdict, "detach", false); bool blk = qdict_get_try_bool(qdict, "blk", false); bool inc = qdict_get_try_bool(qdict, "inc", false); + bool resume = qdict_get_try_bool(qdict, "resume", false); const char *uri = qdict_get_str(qdict, "uri"); Error *err = NULL; - qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, &err); + qmp_migrate(uri, !!blk, blk, !!inc, inc, + false, false, true, resume, &err); if (err) { hmp_handle_error(mon, &err); return; diff --git a/migration/migration.c b/migration/migration.c index 2d97dd742a..d34653fb51 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1361,7 +1361,7 @@ bool migration_is_blocked(Error **errp) void qmp_migrate(const char *uri, bool has_blk, bool blk, bool has_inc, bool inc, bool has_detach, bool detach, - Error **errp) + bool has_resume, bool resume, Error **errp) { Error *local_err = NULL; MigrationState *s = migrate_get_current(); diff --git a/qapi/migration.json b/qapi/migration.json index 6142b065a1..11e0736f67 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1014,6 +1014,8 @@ # @detach: this argument exists only for compatibility reasons and # is ignored by QEMU # +# @resume: resume one paused migration, default "off". (since 2.12) +# # Returns: nothing on success # # Since: 0.14.0 @@ -1035,7 +1037,8 @@ # ## { 'command': 'migrate', - 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } } + 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', + '*detach': 'bool', '*resume': 'bool' } } ## # @migrate-incoming: From patchwork Fri Mar 9 09:15:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883544 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMnZ1fZKz9s8r for ; Fri, 9 Mar 2018 20:40:50 +1100 (AEDT) Received: from localhost ([::1]:44095 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEW8-0007pR-46 for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:40:48 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44952) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8H-0003lX-Pe for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:11 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8G-0005xP-Fd for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:09 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59250 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8G-0005xB-9k for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:08 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DDF6F4040859; Fri, 9 Mar 2018 09:16:07 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6680D215CDA7; Fri, 9 Mar 2018 09:16:05 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:20 +0800 Message-Id: <20180309091535.13315-9-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:07 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:07 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 08/23] migration: rebuild channel on source X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" This patch detects the "resume" flag of migration command, rebuild the channels only if the flag is set. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 91 +++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 70 insertions(+), 21 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index d34653fb51..0b31468552 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1359,49 +1359,75 @@ bool migration_is_blocked(Error **errp) return false; } -void qmp_migrate(const char *uri, bool has_blk, bool blk, - bool has_inc, bool inc, bool has_detach, bool detach, - bool has_resume, bool resume, Error **errp) +/* Returns true if continue to migrate, or false if error detected */ +static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc, + bool resume, Error **errp) { Error *local_err = NULL; - MigrationState *s = migrate_get_current(); - const char *p; + + if (resume) { + if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) { + error_setg(errp, "Cannot resume if there is no " + "paused migration"); + return false; + } + /* This is a resume, skip init status */ + return true; + } if (migration_is_setup_or_active(s->state) || s->state == MIGRATION_STATUS_CANCELLING || s->state == MIGRATION_STATUS_COLO) { error_setg(errp, QERR_MIGRATION_ACTIVE); - return; + return false; } + if (runstate_check(RUN_STATE_INMIGRATE)) { error_setg(errp, "Guest is waiting for an incoming migration"); - return; + return false; } if (migration_is_blocked(errp)) { - return; + return false; } - if ((has_blk && blk) || (has_inc && inc)) { + if (blk || blk_inc) { if (migrate_use_block() || migrate_use_block_incremental()) { error_setg(errp, "Command options are incompatible with " "current migration capabilities"); - return; + return false; } migrate_set_block_enabled(true, &local_err); if (local_err) { error_propagate(errp, local_err); - return; + return false; } s->must_remove_block_options = true; } - if (has_inc && inc) { + if (blk_inc) { migrate_set_block_incremental(s, true); } migrate_init(s); + return true; +} + +void qmp_migrate(const char *uri, bool has_blk, bool blk, + bool has_inc, bool inc, bool has_detach, bool detach, + bool has_resume, bool resume, Error **errp) +{ + Error *local_err = NULL; + MigrationState *s = migrate_get_current(); + const char *p; + + if (!migrate_prepare(s, has_blk && blk, has_inc && inc, + has_resume && resume, errp)) { + /* Error detected, put into errp */ + return; + } + if (strstart(uri, "tcp:", &p)) { tcp_start_outgoing_migration(s, p, &local_err); #ifdef CONFIG_RDMA @@ -1857,7 +1883,8 @@ out: return NULL; } -static int open_return_path_on_source(MigrationState *ms) +static int open_return_path_on_source(MigrationState *ms, + bool create_thread) { ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file); @@ -1866,6 +1893,12 @@ static int open_return_path_on_source(MigrationState *ms) } trace_open_return_path_on_source(); + + if (!create_thread) { + /* We're done */ + return 0; + } + qemu_thread_create(&ms->rp_state.rp_thread, "return path", source_return_path_thread, ms, QEMU_THREAD_JOINABLE); @@ -2522,6 +2555,9 @@ static void *migration_thread(void *opaque) void migrate_fd_connect(MigrationState *s, Error *error_in) { + int64_t rate_limit; + bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED; + s->expected_downtime = s->parameters.downtime_limit; s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s); if (error_in) { @@ -2530,12 +2566,21 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) return; } - qemu_file_set_blocking(s->to_dst_file, true); - qemu_file_set_rate_limit(s->to_dst_file, - s->parameters.max_bandwidth / XFER_LIMIT_RATIO); + if (resume) { + /* This is a resumed migration */ + rate_limit = INT64_MAX; + } else { + /* This is a fresh new migration */ + rate_limit = s->parameters.max_bandwidth / XFER_LIMIT_RATIO; + s->expected_downtime = s->parameters.downtime_limit; + s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s); - /* Notify before starting migration thread */ - notifier_list_notify(&migration_state_notifiers, s); + /* Notify before starting migration thread */ + notifier_list_notify(&migration_state_notifiers, s); + } + + qemu_file_set_rate_limit(s->to_dst_file, rate_limit); + qemu_file_set_blocking(s->to_dst_file, true); /* * Open the return path. For postcopy, it is used exclusively. For @@ -2543,15 +2588,19 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) * QEMU uses the return path. */ if (migrate_postcopy_ram() || migrate_use_return_path()) { - if (open_return_path_on_source(s)) { + if (open_return_path_on_source(s, !resume)) { error_report("Unable to open return-path for postcopy"); - migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, - MIGRATION_STATUS_FAILED); + migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED); migrate_fd_cleanup(s); return; } } + if (resume) { + /* TODO: do the resume logic */ + return; + } + if (multifd_save_setup() != 0) { migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_FAILED); From patchwork Fri Mar 9 09:15:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883545 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMqF2Q3qz9sc3 for ; Fri, 9 Mar 2018 20:42:17 +1100 (AEDT) Received: from localhost ([::1]:44109 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEXX-0000OC-8G for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:42:15 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44979) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8O-0003s8-KP for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8J-0005yH-EK for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:16 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:45400 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8J-0005y3-8Y for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:11 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DE9B9EBFE3; Fri, 9 Mar 2018 09:16:10 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 68DB4215CDA7; Fri, 9 Mar 2018 09:16:08 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:21 +0800 Message-Id: <20180309091535.13315-10-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 09 Mar 2018 09:16:10 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 09 Mar 2018 09:16:10 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 09/23] migration: new state "postcopy-recover" X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Introducing new migration state "postcopy-recover". If a migration procedure is paused and the connection is rebuilt afterward successfully, we'll switch the source VM state from "postcopy-paused" to the new state "postcopy-recover", then we'll do the resume logic in the migration thread (along with the return path thread). This patch only do the state switch on source side. Another following up patch will handle the state switching on destination side using the same status bit. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 76 ++++++++++++++++++++++++++++++++++++++------------- qapi/migration.json | 4 ++- 2 files changed, 60 insertions(+), 20 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 0b31468552..322d5ad79b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -561,6 +561,7 @@ static bool migration_is_setup_or_active(int state) case MIGRATION_STATUS_ACTIVE: case MIGRATION_STATUS_POSTCOPY_ACTIVE: case MIGRATION_STATUS_POSTCOPY_PAUSED: + case MIGRATION_STATUS_POSTCOPY_RECOVER: case MIGRATION_STATUS_SETUP: case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: @@ -641,6 +642,7 @@ MigrationInfo *qmp_query_migrate(Error **errp) case MIGRATION_STATUS_PRE_SWITCHOVER: case MIGRATION_STATUS_DEVICE: case MIGRATION_STATUS_POSTCOPY_PAUSED: + case MIGRATION_STATUS_POSTCOPY_RECOVER: /* TODO add some postcopy stats */ info->has_status = true; info->has_total_time = true; @@ -2247,6 +2249,13 @@ typedef enum MigThrError { MIG_THR_ERR_FATAL = 2, } MigThrError; +/* Return zero if success, or <0 for error */ +static int postcopy_do_resume(MigrationState *s) +{ + /* TODO: do the resume logic */ + return 0; +} + /* * We don't return until we are in a safe state to continue current * postcopy migration. Returns MIG_THR_ERR_RECOVERED if recovered, or @@ -2255,29 +2264,55 @@ typedef enum MigThrError { static MigThrError postcopy_pause(MigrationState *s) { assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE); - migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, - MIGRATION_STATUS_POSTCOPY_PAUSED); - /* Current channel is possibly broken. Release it. */ - assert(s->to_dst_file); - qemu_file_shutdown(s->to_dst_file); - qemu_fclose(s->to_dst_file); - s->to_dst_file = NULL; + while (true) { + migrate_set_state(&s->state, s->state, + MIGRATION_STATUS_POSTCOPY_PAUSED); - error_report("Detected IO failure for postcopy. " - "Migration paused."); + /* Current channel is possibly broken. Release it. */ + assert(s->to_dst_file); + qemu_file_shutdown(s->to_dst_file); + qemu_fclose(s->to_dst_file); + s->to_dst_file = NULL; - /* - * We wait until things fixed up. Then someone will setup the - * status back for us. - */ - while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { - qemu_sem_wait(&s->postcopy_pause_sem); - } + error_report("Detected IO failure for postcopy. " + "Migration paused."); - trace_postcopy_pause_continued(); + /* + * We wait until things fixed up. Then someone will setup the + * status back for us. + */ + while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { + qemu_sem_wait(&s->postcopy_pause_sem); + } + + if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { + /* Woken up by a recover procedure. Give it a shot */ - return MIG_THR_ERR_RECOVERED; + /* + * Firstly, let's wake up the return path now, with a new + * return path channel. + */ + qemu_sem_post(&s->postcopy_pause_rp_sem); + + /* Do the resume logic */ + if (postcopy_do_resume(s) == 0) { + /* Let's continue! */ + trace_postcopy_pause_continued(); + return MIG_THR_ERR_RECOVERED; + } else { + /* + * Something wrong happened during the recovery, let's + * pause again. Pause is always better than throwing + * data away. + */ + continue; + } + } else { + /* This is not right... Time to quit. */ + return MIG_THR_ERR_FATAL; + } + } } static MigThrError migration_detect_error(MigrationState *s) @@ -2597,7 +2632,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) } if (resume) { - /* TODO: do the resume logic */ + /* Wakeup the main migration thread to do the recovery */ + migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED, + MIGRATION_STATUS_POSTCOPY_RECOVER); + qemu_sem_post(&s->postcopy_pause_sem); return; } diff --git a/qapi/migration.json b/qapi/migration.json index 11e0736f67..152fb8e51e 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -91,6 +91,8 @@ # # @postcopy-paused: during postcopy but paused. (since 2.12) # +# @postcopy-recover: trying to recover from a paused postcopy. (since 2.11) +# # @completed: migration is finished. # # @failed: some error occurred during migration process. @@ -109,7 +111,7 @@ { 'enum': 'MigrationStatus', 'data': [ 'none', 'setup', 'cancelling', 'cancelled', 'active', 'postcopy-active', 'postcopy-paused', - 'completed', 'failed', 'colo', + 'postcopy-recover', 'completed', 'failed', 'colo', 'pre-switchover', 'device' ] } ## From patchwork Fri Mar 9 09:15:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883520 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMPf6Xmnz9sWt for ; Fri, 9 Mar 2018 20:23:33 +1100 (AEDT) Received: from localhost ([::1]:43982 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEFN-0000oV-Lu for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:23:29 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44978) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8O-0003s6-KL for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8M-0005zB-DX for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:16 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60702 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8M-0005z0-8M for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:14 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DEE5C8182D30; Fri, 9 Mar 2018 09:16:13 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6A826215CDAC; Fri, 9 Mar 2018 09:16:11 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:22 +0800 Message-Id: <20180309091535.13315-11-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:13 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:13 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 10/23] migration: wakeup dst ram-load-thread for recover X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" On the destination side, we cannot wake up all the threads when we got reconnected. The first thing to do is to wake up the main load thread, so that we can continue to receive valid messages from source again and reply when needed. At this point, we switch the destination VM state from postcopy-paused back to postcopy-recover. Now we are finally ready to do the resume logic. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 322d5ad79b..bbeb156a39 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -429,8 +429,34 @@ static void migration_incoming_process(void) void migration_fd_process_incoming(QEMUFile *f) { - migration_incoming_setup(f); - migration_incoming_process(); + MigrationIncomingState *mis = migration_incoming_get_current(); + + if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { + /* Resumed from a paused postcopy migration */ + + mis->from_src_file = f; + /* Postcopy has standalone thread to do vm load */ + qemu_file_set_blocking(f, true); + + /* Re-configure the return path */ + mis->to_src_file = qemu_file_get_return_path(f); + + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED, + MIGRATION_STATUS_POSTCOPY_RECOVER); + + /* + * Here, we only wake up the main loading thread (while the + * fault thread will still be waiting), so that we can receive + * commands from source now, and answer it if needed. The + * fault thread will be woken up afterwards until we are sure + * that source is ready to reply to page requests. + */ + qemu_sem_post(&mis->postcopy_pause_sem_dst); + } else { + /* New incoming migration */ + migration_incoming_setup(f); + migration_incoming_process(); + } } void migration_ioc_process_incoming(QIOChannel *ioc) From patchwork Fri Mar 9 09:15:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883546 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMrY6BxPz9sc3 for ; Fri, 9 Mar 2018 20:43:25 +1100 (AEDT) Received: from localhost ([::1]:44114 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEYd-00011g-P1 for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:43:23 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45002) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8Q-0003u1-NI for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:20 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8P-00060K-EV for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:18 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59396 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8P-000604-85 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:17 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E0055402291E; Fri, 9 Mar 2018 09:16:16 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 69DA3215CDA7; Fri, 9 Mar 2018 09:16:14 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:23 +0800 Message-Id: <20180309091535.13315-12-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Fri, 09 Mar 2018 09:16:16 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Fri, 09 Mar 2018 09:16:16 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 11/23] migration: new cmd MIG_CMD_RECV_BITMAP X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Add a new vm command MIG_CMD_RECV_BITMAP to request received bitmap for one ramblock. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/savevm.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++ migration/savevm.h | 1 + migration/trace-events | 2 ++ 3 files changed, 64 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index 93e6a13477..5836c961c9 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -80,6 +80,7 @@ enum qemu_vm_cmd { were previously sent during precopy but are dirty. */ MIG_CMD_PACKAGED, /* Send a wrapped stream within this stream */ + MIG_CMD_RECV_BITMAP, /* Request for recved bitmap on dst */ MIG_CMD_MAX }; @@ -97,6 +98,7 @@ static struct mig_cmd_args { [MIG_CMD_POSTCOPY_RAM_DISCARD] = { .len = -1, .name = "POSTCOPY_RAM_DISCARD" }, [MIG_CMD_PACKAGED] = { .len = 4, .name = "PACKAGED" }, + [MIG_CMD_RECV_BITMAP] = { .len = -1, .name = "RECV_BITMAP" }, [MIG_CMD_MAX] = { .len = -1, .name = "MAX" }, }; @@ -955,6 +957,19 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f) qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL); } +void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name) +{ + size_t len; + char buf[256]; + + trace_savevm_send_recv_bitmap(block_name); + + buf[0] = len = strlen(block_name); + memcpy(buf + 1, block_name, len); + + qemu_savevm_command_send(f, MIG_CMD_RECV_BITMAP, len + 1, (uint8_t *)buf); +} + bool qemu_savevm_state_blocked(Error **errp) { SaveStateEntry *se; @@ -1777,6 +1792,49 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis) return ret; } +/* + * Handle request that source requests for recved_bitmap on + * destination. Payload format: + * + * len (1 byte) + ramblock_name (<255 bytes) + */ +static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis, + uint16_t len) +{ + QEMUFile *file = mis->from_src_file; + RAMBlock *rb; + char block_name[256]; + size_t cnt; + + cnt = qemu_get_counted_string(file, block_name); + if (!cnt) { + error_report("%s: failed to read block name", __func__); + return -EINVAL; + } + + /* Validate before using the data */ + if (qemu_file_get_error(file)) { + return qemu_file_get_error(file); + } + + if (len != cnt + 1) { + error_report("%s: invalid payload length (%d)", __func__, len); + return -EINVAL; + } + + rb = qemu_ram_block_by_name(block_name); + if (!rb) { + error_report("%s: block '%s' not found", __func__, block_name); + return -EINVAL; + } + + /* TODO: send the bitmap back to source */ + + trace_loadvm_handle_recv_bitmap(block_name); + + return 0; +} + /* * Process an incoming 'QEMU_VM_COMMAND' * 0 just a normal return @@ -1850,6 +1908,9 @@ static int loadvm_process_command(QEMUFile *f) case MIG_CMD_POSTCOPY_RAM_DISCARD: return loadvm_postcopy_ram_handle_discard(mis, len); + + case MIG_CMD_RECV_BITMAP: + return loadvm_handle_recv_bitmap(mis, len); } return 0; diff --git a/migration/savevm.h b/migration/savevm.h index 295c4a1f2c..8126b1cc14 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len); void qemu_savevm_send_postcopy_advise(QEMUFile *f); void qemu_savevm_send_postcopy_listen(QEMUFile *f); void qemu_savevm_send_postcopy_run(QEMUFile *f); +void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name); void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name, uint16_t len, diff --git a/migration/trace-events b/migration/trace-events index 06a919a6e3..62b27fbf11 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -12,6 +12,7 @@ loadvm_state_cleanup(void) "" loadvm_handle_cmd_packaged(unsigned int length) "%u" loadvm_handle_cmd_packaged_main(int ret) "%d" loadvm_handle_cmd_packaged_received(int ret) "%d" +loadvm_handle_recv_bitmap(char *s) "%s" loadvm_postcopy_handle_advise(void) "" loadvm_postcopy_handle_listen(void) "" loadvm_postcopy_handle_run(void) "" @@ -34,6 +35,7 @@ savevm_send_open_return_path(void) "" savevm_send_ping(uint32_t val) "0x%x" savevm_send_postcopy_listen(void) "" savevm_send_postcopy_run(void) "" +savevm_send_recv_bitmap(char *name) "%s" savevm_state_setup(void) "" savevm_state_header(void) "" savevm_state_iterate(void) "" From patchwork Fri Mar 9 09:15:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883525 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMTd6S1zz9sY9 for ; Fri, 9 Mar 2018 20:27:01 +1100 (AEDT) Received: from localhost ([::1]:44010 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEIl-0004Ci-72 for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:26:59 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45026) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8U-0003xJ-8I for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8S-00061Z-F0 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:22 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60706 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8S-00061P-9Z for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:20 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E3CA58182D30; Fri, 9 Mar 2018 09:16:19 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6C6F3215CDA7; Fri, 9 Mar 2018 09:16:17 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:24 +0800 Message-Id: <20180309091535.13315-13-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:19 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:19 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 12/23] migration: new message MIG_RP_MSG_RECV_BITMAP X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send received bitmap of ramblock back to source. This is the reply message of MIG_CMD_RECV_BITMAP, it contains not only the header (including the ramblock name), and it was appended with the whole ramblock received bitmap on the destination side. When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP), it parses it, convert it to the dirty bitmap by inverting the bits. One thing to mention is that, when we send the recv bitmap, we are doing these things in extra: - converting the bitmap to little endian, to support when hosts are using different endianess on src/dst. - do proper alignment for 8 bytes, to support when hosts are using different word size (32/64 bits) on src/dst. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 68 +++++++++++++++++++++++ migration/migration.h | 2 + migration/ram.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++ migration/ram.h | 3 ++ migration/savevm.c | 2 +- migration/trace-events | 3 ++ 6 files changed, 221 insertions(+), 1 deletion(-) diff --git a/migration/migration.c b/migration/migration.c index bbeb156a39..759176a292 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -95,6 +95,7 @@ enum mig_rp_message_type { MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */ MIG_RP_MSG_REQ_PAGES, /* data (start: be64, len: be32) */ + MIG_RP_MSG_RECV_BITMAP, /* send recved_bitmap back to source */ MIG_RP_MSG_MAX }; @@ -508,6 +509,45 @@ void migrate_send_rp_pong(MigrationIncomingState *mis, migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf); } +void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis, + char *block_name) +{ + char buf[512]; + int len; + int64_t res; + + /* + * First, we send the header part. It contains only the len of + * idstr, and the idstr itself. + */ + len = strlen(block_name); + buf[0] = len; + memcpy(buf + 1, block_name, len); + + if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { + error_report("%s: MSG_RP_RECV_BITMAP only used for recovery", + __func__); + return; + } + + migrate_send_rp_message(mis, MIG_RP_MSG_RECV_BITMAP, len + 1, buf); + + /* + * Next, we dump the received bitmap to the stream. + * + * TODO: currently we are safe since we are the only one that is + * using the to_src_file handle (fault thread is still paused), + * and it's ok even not taking the mutex. However the best way is + * to take the lock before sending the message header, and release + * the lock after sending the bitmap. + */ + qemu_mutex_lock(&mis->rp_mutex); + res = ramblock_recv_bitmap_send(mis->to_src_file, block_name); + qemu_mutex_unlock(&mis->rp_mutex); + + trace_migrate_send_rp_recv_bitmap(block_name, res); +} + MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp) { MigrationCapabilityStatusList *head = NULL; @@ -1731,6 +1771,7 @@ static struct rp_cmd_args { [MIG_RP_MSG_PONG] = { .len = 4, .name = "PONG" }, [MIG_RP_MSG_REQ_PAGES] = { .len = 12, .name = "REQ_PAGES" }, [MIG_RP_MSG_REQ_PAGES_ID] = { .len = -1, .name = "REQ_PAGES_ID" }, + [MIG_RP_MSG_RECV_BITMAP] = { .len = -1, .name = "RECV_BITMAP" }, [MIG_RP_MSG_MAX] = { .len = -1, .name = "MAX" }, }; @@ -1775,6 +1816,19 @@ static bool postcopy_pause_return_path_thread(MigrationState *s) return true; } +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name) +{ + RAMBlock *block = qemu_ram_block_by_name(block_name); + + if (!block) { + error_report("%s: invalid block name '%s'", __func__, block_name); + return -EINVAL; + } + + /* Fetch the received bitmap and refresh the dirty bitmap */ + return ram_dirty_bitmap_reload(s, block); +} + /* * Handles messages sent on the return path towards the source VM * @@ -1880,6 +1934,20 @@ retry: migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len); break; + case MIG_RP_MSG_RECV_BITMAP: + if (header_len < 1) { + error_report("%s: missing block name", __func__); + mark_source_rp_bad(ms); + goto out; + } + /* Format: len (1B) + idstr (<255B). This ends the idstr. */ + buf[buf[0] + 1] = '\0'; + if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) { + mark_source_rp_bad(ms); + goto out; + } + break; + default: break; } diff --git a/migration/migration.h b/migration/migration.h index 92517fd6e3..88cf393c8c 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -241,5 +241,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis, uint32_t value); int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname, ram_addr_t start, size_t len); +void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis, + char *block_name); #endif diff --git a/migration/ram.c b/migration/ram.c index 3b6c077964..a897c250cb 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -182,6 +182,70 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, nr); } +#define RAMBLOCK_RECV_BITMAP_ENDING (0x0123456789abcdefULL) + +/* + * Format: bitmap_size (8 bytes) + whole_bitmap (N bytes). + * + * Returns >0 if success with sent bytes, or <0 if error. + */ +int64_t ramblock_recv_bitmap_send(QEMUFile *file, + const char *block_name) +{ + RAMBlock *block = qemu_ram_block_by_name(block_name); + unsigned long *le_bitmap, nbits; + uint64_t size; + + if (!block) { + error_report("%s: invalid block name: %s", __func__, block_name); + return -1; + } + + nbits = block->used_length >> TARGET_PAGE_BITS; + + /* + * Make sure the tmp bitmap buffer is big enough, e.g., on 32bit + * machines we may need 4 more bytes for padding (see below + * comment). So extend it a bit before hand. + */ + le_bitmap = bitmap_new(nbits + BITS_PER_LONG); + + /* + * Always use little endian when sending the bitmap. This is + * required that when source and destination VMs are not using the + * same endianess. (Note: big endian won't work.) + */ + bitmap_to_le(le_bitmap, block->receivedmap, nbits); + + /* Size of the bitmap, in bytes */ + size = nbits / 8; + + /* + * size is always aligned to 8 bytes for 64bit machines, but it + * may not be true for 32bit machines. We need this padding to + * make sure the migration can survive even between 32bit and + * 64bit machines. + */ + size = ROUND_UP(size, 8); + + qemu_put_be64(file, size); + qemu_put_buffer(file, (const uint8_t *)le_bitmap, size); + /* + * Mark as an end, in case the middle part is screwed up due to + * some "misterious" reason. + */ + qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING); + qemu_fflush(file); + + free(le_bitmap); + + if (qemu_file_get_error(file)) { + return qemu_file_get_error(file); + } + + return size + sizeof(size); +} + /* * An outstanding page request, on the source, having been received * and queued @@ -2996,6 +3060,86 @@ static bool ram_has_postcopy(void *opaque) return migrate_postcopy_ram(); } +/* + * Read the received bitmap, revert it as the initial dirty bitmap. + * This is only used when the postcopy migration is paused but wants + * to resume from a middle point. + */ +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) +{ + int ret = -EINVAL; + QEMUFile *file = s->rp_state.from_dst_file; + unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS; + uint64_t local_size = nbits / 8; + uint64_t size, end_mark; + + trace_ram_dirty_bitmap_reload_begin(block->idstr); + + if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { + error_report("%s: incorrect state %s", __func__, + MigrationStatus_str(s->state)); + return -EINVAL; + } + + /* + * Note: see comments in ramblock_recv_bitmap_send() on why we + * need the endianess convertion, and the paddings. + */ + local_size = ROUND_UP(local_size, 8); + + /* Add paddings */ + le_bitmap = bitmap_new(nbits + BITS_PER_LONG); + + size = qemu_get_be64(file); + + /* The size of the bitmap should match with our ramblock */ + if (size != local_size) { + error_report("%s: ramblock '%s' bitmap size mismatch " + "(0x%"PRIx64" != 0x%"PRIx64")", __func__, + block->idstr, size, local_size); + ret = -EINVAL; + goto out; + } + + size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size); + end_mark = qemu_get_be64(file); + + ret = qemu_file_get_error(file); + if (ret || size != local_size) { + error_report("%s: read bitmap failed for ramblock '%s': %d" + " (size 0x%"PRIx64", got: 0x%"PRIx64")", + __func__, block->idstr, ret, local_size, size); + ret = -EIO; + goto out; + } + + if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) { + error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIu64, + __func__, block->idstr, end_mark); + ret = -EINVAL; + goto out; + } + + /* + * Endianess convertion. We are during postcopy (though paused). + * The dirty bitmap won't change. We can directly modify it. + */ + bitmap_from_le(block->bmap, le_bitmap, nbits); + + /* + * What we received is "received bitmap". Revert it as the initial + * dirty bitmap for this ramblock. + */ + bitmap_complement(block->bmap, block->bmap, nbits); + + trace_ram_dirty_bitmap_reload_complete(block->idstr); + + ret = 0; +out: + free(le_bitmap); + return ret; +} + static SaveVMHandlers savevm_ram_handlers = { .save_setup = ram_save_setup, .save_live_iterate = ram_save_iterate, diff --git a/migration/ram.h b/migration/ram.h index 53f0021c51..556ce60100 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -62,5 +62,8 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size); int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr); void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr); void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr); +int64_t ramblock_recv_bitmap_send(QEMUFile *file, + const char *block_name); +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb); #endif diff --git a/migration/savevm.c b/migration/savevm.c index 5836c961c9..1a7b4c7caf 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1828,7 +1828,7 @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis, return -EINVAL; } - /* TODO: send the bitmap back to source */ + migrate_send_rp_recv_bitmap(mis, block_name); trace_loadvm_handle_recv_bitmap(block_name); diff --git a/migration/trace-events b/migration/trace-events index 62b27fbf11..f451251ad1 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -79,6 +79,8 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x" ram_postcopy_send_discard_bitmap(void) "" ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p" ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx" +ram_dirty_bitmap_reload_begin(char *str) "%s" +ram_dirty_bitmap_reload_complete(char *str) "%s" # migration/migration.c await_return_path_close_on_source_close(void) "" @@ -90,6 +92,7 @@ migrate_fd_cancel(void) "" migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx" migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")" migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d" +migrate_send_rp_recv_bitmap(char *name, int64_t size) "block '%s' size 0x%"PRIi64 migration_completion_file_err(void) "" migration_completion_postcopy_end(void) "" migration_completion_postcopy_end_after_complete(void) "" From patchwork Fri Mar 9 09:15:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883524 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMSf2Q5Fz9sXr for ; Fri, 9 Mar 2018 20:26:10 +1100 (AEDT) Received: from localhost ([::1]:44006 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEHw-0003Qr-3x for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:26:08 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45081) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8Z-00041v-DT for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:28 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8V-00062h-GH for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:27 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59256 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8V-00062Z-B6 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:23 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E617A4040859; Fri, 9 Mar 2018 09:16:22 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6F352215CDA7; Fri, 9 Mar 2018 09:16:20 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:25 +0800 Message-Id: <20180309091535.13315-14-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:22 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:22 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 13/23] migration: new cmd MIG_CMD_POSTCOPY_RESUME X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Introducing this new command to be sent when the source VM is ready to resume the paused migration. What the destination does here is basically release the fault thread to continue service page faults. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/savevm.c | 35 +++++++++++++++++++++++++++++++++++ migration/savevm.h | 1 + migration/trace-events | 2 ++ 3 files changed, 38 insertions(+) diff --git a/migration/savevm.c b/migration/savevm.c index 1a7b4c7caf..c8d2f7e956 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -79,6 +79,7 @@ enum qemu_vm_cmd { MIG_CMD_POSTCOPY_RAM_DISCARD, /* A list of pages to discard that were previously sent during precopy but are dirty. */ + MIG_CMD_POSTCOPY_RESUME, /* resume postcopy on dest */ MIG_CMD_PACKAGED, /* Send a wrapped stream within this stream */ MIG_CMD_RECV_BITMAP, /* Request for recved bitmap on dst */ MIG_CMD_MAX @@ -97,6 +98,7 @@ static struct mig_cmd_args { [MIG_CMD_POSTCOPY_RUN] = { .len = 0, .name = "POSTCOPY_RUN" }, [MIG_CMD_POSTCOPY_RAM_DISCARD] = { .len = -1, .name = "POSTCOPY_RAM_DISCARD" }, + [MIG_CMD_POSTCOPY_RESUME] = { .len = 0, .name = "POSTCOPY_RESUME" }, [MIG_CMD_PACKAGED] = { .len = 4, .name = "PACKAGED" }, [MIG_CMD_RECV_BITMAP] = { .len = -1, .name = "RECV_BITMAP" }, [MIG_CMD_MAX] = { .len = -1, .name = "MAX" }, @@ -957,6 +959,12 @@ void qemu_savevm_send_postcopy_run(QEMUFile *f) qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL); } +void qemu_savevm_send_postcopy_resume(QEMUFile *f) +{ + trace_savevm_send_postcopy_resume(); + qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RESUME, 0, NULL); +} + void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name) { size_t len; @@ -1744,6 +1752,30 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis) return LOADVM_QUIT; } +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) +{ + if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { + error_report("%s: illegal resume received", __func__); + /* Don't fail the load, only for this. */ + return 0; + } + + /* + * This means source VM is ready to resume the postcopy migration. + * It's time to switch state and release the fault thread to + * continue service page faults. + */ + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER, + MIGRATION_STATUS_POSTCOPY_ACTIVE); + qemu_sem_post(&mis->postcopy_pause_sem_fault); + + trace_loadvm_postcopy_handle_resume(); + + /* TODO: Tell source that "we are ready" */ + + return 0; +} + /** * Immediately following this command is a blob of data containing an embedded * chunk of migration stream; read it and load it. @@ -1909,6 +1941,9 @@ static int loadvm_process_command(QEMUFile *f) case MIG_CMD_POSTCOPY_RAM_DISCARD: return loadvm_postcopy_ram_handle_discard(mis, len); + case MIG_CMD_POSTCOPY_RESUME: + return loadvm_postcopy_handle_resume(mis); + case MIG_CMD_RECV_BITMAP: return loadvm_handle_recv_bitmap(mis, len); } diff --git a/migration/savevm.h b/migration/savevm.h index 8126b1cc14..a5f3879191 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -46,6 +46,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len); void qemu_savevm_send_postcopy_advise(QEMUFile *f); void qemu_savevm_send_postcopy_listen(QEMUFile *f); void qemu_savevm_send_postcopy_run(QEMUFile *f); +void qemu_savevm_send_postcopy_resume(QEMUFile *f); void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name); void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name, diff --git a/migration/trace-events b/migration/trace-events index f451251ad1..d323abb75a 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -18,6 +18,7 @@ loadvm_postcopy_handle_listen(void) "" loadvm_postcopy_handle_run(void) "" loadvm_postcopy_handle_run_cpu_sync(void) "" loadvm_postcopy_handle_run_vmstart(void) "" +loadvm_postcopy_handle_resume(void) "" loadvm_postcopy_ram_handle_discard(void) "" loadvm_postcopy_ram_handle_discard_end(void) "" loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud" @@ -35,6 +36,7 @@ savevm_send_open_return_path(void) "" savevm_send_ping(uint32_t val) "0x%x" savevm_send_postcopy_listen(void) "" savevm_send_postcopy_run(void) "" +savevm_send_postcopy_resume(void) "" savevm_send_recv_bitmap(char *name) "%s" savevm_state_setup(void) "" savevm_state_header(void) "" From patchwork Fri Mar 9 09:15:26 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883517 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMKc5tQjz9sWt for ; Fri, 9 Mar 2018 20:20:04 +1100 (AEDT) Received: from localhost ([::1]:43958 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEC2-0006G7-H0 for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:20:02 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45085) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8Z-00042A-Ih for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:28 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8Y-000645-DK for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:27 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:40166 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8Y-00063i-8I for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:26 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E51E3D1437; Fri, 9 Mar 2018 09:16:25 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 71EAB215CDA7; Fri, 9 Mar 2018 09:16:23 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:26 +0800 Message-Id: <20180309091535.13315-15-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Fri, 09 Mar 2018 09:16:25 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Fri, 09 Mar 2018 09:16:25 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 14/23] migration: new message MIG_RP_MSG_RESUME_ACK X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Creating new message to reply for MIG_CMD_POSTCOPY_RESUME. One uint32_t is used as payload to let the source know whether destination is ready to continue the migration. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 37 +++++++++++++++++++++++++++++++++++++ migration/migration.h | 3 +++ migration/savevm.c | 3 ++- migration/trace-events | 1 + 4 files changed, 43 insertions(+), 1 deletion(-) diff --git a/migration/migration.c b/migration/migration.c index 759176a292..f2aca30438 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -96,6 +96,7 @@ enum mig_rp_message_type { MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */ MIG_RP_MSG_REQ_PAGES, /* data (start: be64, len: be32) */ MIG_RP_MSG_RECV_BITMAP, /* send recved_bitmap back to source */ + MIG_RP_MSG_RESUME_ACK, /* tell source that we are ready to resume */ MIG_RP_MSG_MAX }; @@ -548,6 +549,14 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis, trace_migrate_send_rp_recv_bitmap(block_name, res); } +void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value) +{ + uint32_t buf; + + buf = cpu_to_be32(value); + migrate_send_rp_message(mis, MIG_RP_MSG_RESUME_ACK, sizeof(buf), &buf); +} + MigrationCapabilityStatusList *qmp_query_migrate_capabilities(Error **errp) { MigrationCapabilityStatusList *head = NULL; @@ -1772,6 +1781,7 @@ static struct rp_cmd_args { [MIG_RP_MSG_REQ_PAGES] = { .len = 12, .name = "REQ_PAGES" }, [MIG_RP_MSG_REQ_PAGES_ID] = { .len = -1, .name = "REQ_PAGES_ID" }, [MIG_RP_MSG_RECV_BITMAP] = { .len = -1, .name = "RECV_BITMAP" }, + [MIG_RP_MSG_RESUME_ACK] = { .len = 4, .name = "RESUME_ACK" }, [MIG_RP_MSG_MAX] = { .len = -1, .name = "MAX" }, }; @@ -1829,6 +1839,25 @@ static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name) return ram_dirty_bitmap_reload(s, block); } +static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value) +{ + trace_source_return_path_thread_resume_ack(value); + + if (value != MIGRATION_RESUME_ACK_VALUE) { + error_report("%s: illegal resume_ack value %"PRIu32, + __func__, value); + return -1; + } + + /* Now both sides are active. */ + migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER, + MIGRATION_STATUS_POSTCOPY_ACTIVE); + + /* TODO: notify send thread that time to continue send pages */ + + return 0; +} + /* * Handles messages sent on the return path towards the source VM * @@ -1948,6 +1977,14 @@ retry: } break; + case MIG_RP_MSG_RESUME_ACK: + tmp32 = ldl_be_p(buf); + if (migrate_handle_rp_resume_ack(ms, tmp32)) { + mark_source_rp_bad(ms); + goto out; + } + break; + default: break; } diff --git a/migration/migration.h b/migration/migration.h index 88cf393c8c..dca16307c2 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -22,6 +22,8 @@ #include "hw/qdev.h" #include "io/channel.h" +#define MIGRATION_RESUME_ACK_VALUE (1) + /* State for the incoming migration */ struct MigrationIncomingState { QEMUFile *from_src_file; @@ -243,5 +245,6 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname, ram_addr_t start, size_t len); void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis, char *block_name); +void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value); #endif diff --git a/migration/savevm.c b/migration/savevm.c index c8d2f7e956..333fb09953 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1771,7 +1771,8 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) trace_loadvm_postcopy_handle_resume(); - /* TODO: Tell source that "we are ready" */ + /* Tell source that "we are ready" */ + migrate_send_rp_resume_ack(mis, MIGRATION_RESUME_ACK_VALUE); return 0; } diff --git a/migration/trace-events b/migration/trace-events index d323abb75a..7422a395da 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -120,6 +120,7 @@ source_return_path_thread_entry(void) "" source_return_path_thread_loop_top(void) "" source_return_path_thread_pong(uint32_t val) "0x%x" source_return_path_thread_shut(uint32_t val) "0x%x" +source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32 migrate_global_state_post_load(const char *state) "loaded state: %s" migrate_global_state_pre_save(const char *state) "saved state: %s" migration_thread_low_pending(uint64_t pending) "%" PRIu64 From patchwork Fri Mar 9 09:15:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883532 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMXk43j0z9sbv for ; Fri, 9 Mar 2018 20:29:42 +1100 (AEDT) Received: from localhost ([::1]:44022 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euELM-0006gr-AW for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:29:40 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45111) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8c-00043L-SA for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:31 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8b-00066M-OZ for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:30 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:49760 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8b-00065X-Il for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:29 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 352D6402277B; Fri, 9 Mar 2018 09:16:29 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 71247215CDA7; Fri, 9 Mar 2018 09:16:26 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:27 +0800 Message-Id: <20180309091535.13315-16-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Fri, 09 Mar 2018 09:16:29 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Fri, 09 Mar 2018 09:16:29 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 15/23] migration: introduce SaveVMHandlers.resume_prepare X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" This is hook function to be called when a postcopy migration wants to resume from a failure. For each module, it should provide its own recovery logic before we switch to the postcopy-active state. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- include/migration/register.h | 2 ++ migration/migration.c | 20 +++++++++++++++++++- migration/savevm.c | 25 +++++++++++++++++++++++++ migration/savevm.h | 1 + migration/trace-events | 1 + 5 files changed, 48 insertions(+), 1 deletion(-) diff --git a/include/migration/register.h b/include/migration/register.h index f4f7bdc177..128124f008 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -42,6 +42,8 @@ typedef struct SaveVMHandlers { LoadStateHandler *load_state; int (*load_setup)(QEMUFile *f, void *opaque); int (*load_cleanup)(void *opaque); + /* Called when postcopy migration wants to resume from failure */ + int (*resume_prepare)(MigrationState *s, void *opaque); } SaveVMHandlers; int register_savevm_live(DeviceState *dev, diff --git a/migration/migration.c b/migration/migration.c index f2aca30438..a2b98afb4d 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2383,7 +2383,25 @@ typedef enum MigThrError { /* Return zero if success, or <0 for error */ static int postcopy_do_resume(MigrationState *s) { - /* TODO: do the resume logic */ + int ret; + + /* + * Call all the resume_prepare() hooks, so that modules can be + * ready for the migration resume. + */ + ret = qemu_savevm_state_resume_prepare(s); + if (ret) { + error_report("%s: resume_prepare() failure detected: %d", + __func__, ret); + return ret; + } + + /* + * TODO: handshake with dest using MIG_CMD_RESUME, + * MIG_RP_MSG_RESUME_ACK, then switch source state to + * "postcopy-active" + */ + return 0; } diff --git a/migration/savevm.c b/migration/savevm.c index 333fb09953..7f4899ae9c 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1030,6 +1030,31 @@ void qemu_savevm_state_setup(QEMUFile *f) } } +int qemu_savevm_state_resume_prepare(MigrationState *s) +{ + SaveStateEntry *se; + int ret; + + trace_savevm_state_resume_prepare(); + + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + if (!se->ops || !se->ops->resume_prepare) { + continue; + } + if (se->ops && se->ops->is_active) { + if (!se->ops->is_active(se->opaque)) { + continue; + } + } + ret = se->ops->resume_prepare(s, se->opaque); + if (ret < 0) { + return ret; + } + } + + return 0; +} + /* * this function has three return values: * negative: there was one error, and we have -errno. diff --git a/migration/savevm.h b/migration/savevm.h index a5f3879191..3193f04cca 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -31,6 +31,7 @@ bool qemu_savevm_state_blocked(Error **errp); void qemu_savevm_state_setup(QEMUFile *f); +int qemu_savevm_state_resume_prepare(MigrationState *s); void qemu_savevm_state_header(QEMUFile *f); int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy); void qemu_savevm_state_cleanup(void); diff --git a/migration/trace-events b/migration/trace-events index 7422a395da..fe46b2c6c5 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -39,6 +39,7 @@ savevm_send_postcopy_run(void) "" savevm_send_postcopy_resume(void) "" savevm_send_recv_bitmap(char *name) "%s" savevm_state_setup(void) "" +savevm_state_resume_prepare(void) "" savevm_state_header(void) "" savevm_state_iterate(void) "" savevm_state_cleanup(void) "" From patchwork Fri Mar 9 09:15:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883519 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMNy1ZTGz9sWt for ; Fri, 9 Mar 2018 20:22:58 +1100 (AEDT) Received: from localhost ([::1]:43979 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEEp-0000KX-W8 for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:22:56 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45130) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8f-000463-TF for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8e-0006BE-O7 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:33 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:49762 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8e-0006Ac-IW for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:32 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3556C402277B; Fri, 9 Mar 2018 09:16:32 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id B516A215CDA7; Fri, 9 Mar 2018 09:16:29 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:28 +0800 Message-Id: <20180309091535.13315-17-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Fri, 09 Mar 2018 09:16:32 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Fri, 09 Mar 2018 09:16:32 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 16/23] migration: synchronize dirty bitmap for resume X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" This patch implements the first part of core RAM resume logic for postcopy. ram_resume_prepare() is provided for the work. When the migration is interrupted by network failure, the dirty bitmap on the source side will be meaningless, because even the dirty bit is cleared, it is still possible that the sent page was lost along the way to destination. Here instead of continue the migration with the old dirty bitmap on source, we ask the destination side to send back its received bitmap, then invert it to be our initial dirty bitmap. The source side send thread will issue the MIG_CMD_RECV_BITMAP requests, once per ramblock, to ask for the received bitmap. On destination side, MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap. Data will be received on the return-path thread of source, and the main migration thread will be notified when all the ramblock bitmaps are synchronized. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 2 ++ migration/migration.h | 1 + migration/ram.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ migration/trace-events | 4 ++++ 4 files changed, 54 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index a2b98afb4d..71e4195422 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2896,6 +2896,7 @@ static void migration_instance_finalize(Object *obj) qemu_sem_destroy(&ms->pause_sem); qemu_sem_destroy(&ms->postcopy_pause_sem); qemu_sem_destroy(&ms->postcopy_pause_rp_sem); + qemu_sem_destroy(&ms->rp_state.rp_sem); } static void migration_instance_init(Object *obj) @@ -2927,6 +2928,7 @@ static void migration_instance_init(Object *obj) qemu_sem_init(&ms->postcopy_pause_sem, 0); qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); + qemu_sem_init(&ms->rp_state.rp_sem, 0); } /* diff --git a/migration/migration.h b/migration/migration.h index dca16307c2..8f64b43584 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -119,6 +119,7 @@ struct MigrationState QEMUFile *from_dst_file; QemuThread rp_thread; bool error; + QemuSemaphore rp_sem; } rp_state; double mbps; diff --git a/migration/ram.c b/migration/ram.c index a897c250cb..196b1ba876 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -51,6 +51,7 @@ #include "qemu/rcu_queue.h" #include "migration/colo.h" #include "migration/block.h" +#include "savevm.h" /***********************************************************/ /* ram save/restore */ @@ -3060,6 +3061,38 @@ static bool ram_has_postcopy(void *opaque) return migrate_postcopy_ram(); } +/* Sync all the dirty bitmap with destination VM. */ +static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) +{ + RAMBlock *block; + QEMUFile *file = s->to_dst_file; + int ramblock_count = 0; + + trace_ram_dirty_bitmap_sync_start(); + + RAMBLOCK_FOREACH(block) { + qemu_savevm_send_recv_bitmap(file, block->idstr); + trace_ram_dirty_bitmap_request(block->idstr); + ramblock_count++; + } + + trace_ram_dirty_bitmap_sync_wait(); + + /* Wait until all the ramblocks' dirty bitmap synced */ + while (ramblock_count--) { + qemu_sem_wait(&s->rp_state.rp_sem); + } + + trace_ram_dirty_bitmap_sync_complete(); + + return 0; +} + +static void ram_dirty_bitmap_reload_notify(MigrationState *s) +{ + qemu_sem_post(&s->rp_state.rp_sem); +} + /* * Read the received bitmap, revert it as the initial dirty bitmap. * This is only used when the postcopy migration is paused but wants @@ -3134,12 +3167,25 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) trace_ram_dirty_bitmap_reload_complete(block->idstr); + /* + * We succeeded to sync bitmap for current ramblock. If this is + * the last one to sync, we need to notify the main send thread. + */ + ram_dirty_bitmap_reload_notify(s); + ret = 0; out: free(le_bitmap); return ret; } +static int ram_resume_prepare(MigrationState *s, void *opaque) +{ + RAMState *rs = *(RAMState **)opaque; + + return ram_dirty_bitmap_sync_all(s, rs); +} + static SaveVMHandlers savevm_ram_handlers = { .save_setup = ram_save_setup, .save_live_iterate = ram_save_iterate, @@ -3151,6 +3197,7 @@ static SaveVMHandlers savevm_ram_handlers = { .save_cleanup = ram_save_cleanup, .load_setup = ram_load_setup, .load_cleanup = ram_load_cleanup, + .resume_prepare = ram_resume_prepare, }; void ram_mig_init(void) diff --git a/migration/trace-events b/migration/trace-events index fe46b2c6c5..45b1d89217 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -82,8 +82,12 @@ ram_load_postcopy_loop(uint64_t addr, int flags) "@%" PRIx64 " %x" ram_postcopy_send_discard_bitmap(void) "" ram_save_page(const char *rbname, uint64_t offset, void *host) "%s: offset: 0x%" PRIx64 " host: %p" ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: 0x%zx len: 0x%zx" +ram_dirty_bitmap_request(char *str) "%s" ram_dirty_bitmap_reload_begin(char *str) "%s" ram_dirty_bitmap_reload_complete(char *str) "%s" +ram_dirty_bitmap_sync_start(void) "" +ram_dirty_bitmap_sync_wait(void) "" +ram_dirty_bitmap_sync_complete(void) "" # migration/migration.c await_return_path_close_on_source_close(void) "" From patchwork Fri Mar 9 09:15:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883523 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMS35jFHz9sWt for ; Fri, 9 Mar 2018 20:25:39 +1100 (AEDT) Received: from localhost ([::1]:43996 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEHR-0002tb-4h for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:25:37 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45153) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8k-0004B0-Uj for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8h-0006Ef-Nw for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:38 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59404 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8h-0006E3-Iz for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:35 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3BCC74023112; Fri, 9 Mar 2018 09:16:35 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id B4AFA215CDA7; Fri, 9 Mar 2018 09:16:32 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:29 +0800 Message-Id: <20180309091535.13315-18-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Fri, 09 Mar 2018 09:16:35 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Fri, 09 Mar 2018 09:16:35 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 17/23] migration: setup ramstate for resume X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" After we updated the dirty bitmaps of ramblocks, we also need to update the critical fields in RAMState to make sure it is ready for a resume. Signed-off-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert --- migration/ram.c | 45 ++++++++++++++++++++++++++++++++++++++++++++- migration/trace-events | 1 + 2 files changed, 45 insertions(+), 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index 196b1ba876..5cd9f43728 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2254,6 +2254,41 @@ static int ram_init_all(RAMState **rsp) return 0; } +static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out) +{ + RAMBlock *block; + uint64_t pages = 0; + + /* + * Postcopy is not using xbzrle/compression, so no need for that. + * Also, since source are already halted, we don't need to care + * about dirty page logging as well. + */ + + RAMBLOCK_FOREACH(block) { + pages += bitmap_count_one(block->bmap, + block->used_length >> TARGET_PAGE_BITS); + } + + /* This may not be aligned with current bitmaps. Recalculate. */ + rs->migration_dirty_pages = pages; + + rs->last_seen_block = NULL; + rs->last_sent_block = NULL; + rs->last_page = 0; + rs->last_version = ram_list.version; + /* + * Disable the bulk stage, otherwise we'll resend the whole RAM no + * matter what we have sent. + */ + rs->ram_bulk_stage = false; + + /* Update RAMState cache of output QEMUFile */ + rs->f = out; + + trace_ram_state_resume_prepare(pages); +} + /* * Each of ram_save_setup, ram_save_iterate and ram_save_complete has * long-running RCU critical section. When rcu-reclaims in the code @@ -3182,8 +3217,16 @@ out: static int ram_resume_prepare(MigrationState *s, void *opaque) { RAMState *rs = *(RAMState **)opaque; + int ret; - return ram_dirty_bitmap_sync_all(s, rs); + ret = ram_dirty_bitmap_sync_all(s, rs); + if (ret) { + return ret; + } + + ram_state_resume_prepare(rs, s->to_dst_file); + + return 0; } static SaveVMHandlers savevm_ram_handlers = { diff --git a/migration/trace-events b/migration/trace-events index 45b1d89217..f5913ff51c 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -88,6 +88,7 @@ ram_dirty_bitmap_reload_complete(char *str) "%s" ram_dirty_bitmap_sync_start(void) "" ram_dirty_bitmap_sync_wait(void) "" ram_dirty_bitmap_sync_complete(void) "" +ram_state_resume_prepare(long v) "%ld" # migration/migration.c await_return_path_close_on_source_close(void) "" From patchwork Fri Mar 9 09:15:30 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883530 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMW56Plpz9sbv for ; Fri, 9 Mar 2018 20:28:17 +1100 (AEDT) Received: from localhost ([::1]:44015 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEJz-0005Jp-MO for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:28:15 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45163) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8l-0004Bj-Jb for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8k-0006Fa-Ob for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:39 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60714 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8k-0006FS-KG for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:38 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 45AD68182D30; Fri, 9 Mar 2018 09:16:38 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id BD0E2215CDA7; Fri, 9 Mar 2018 09:16:35 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:30 +0800 Message-Id: <20180309091535.13315-19-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:38 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:38 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 18/23] migration: final handshake for the resume X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Finish the last step to do the final handshake for the recovery. First source sends one MIG_CMD_RESUME to dst, telling that source is ready to resume. Then, dest replies with MIG_RP_MSG_RESUME_ACK to source, telling that dest is ready to resume (after switch to postcopy-active state). When source received the RESUME_ACK, it switches its state to postcopy-active, and finally the recovery is completed. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 71e4195422..17578e2731 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1853,7 +1853,8 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value) migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER, MIGRATION_STATUS_POSTCOPY_ACTIVE); - /* TODO: notify send thread that time to continue send pages */ + /* Notify send thread that time to continue send pages */ + qemu_sem_post(&s->rp_state.rp_sem); return 0; } @@ -2380,6 +2381,21 @@ typedef enum MigThrError { MIG_THR_ERR_FATAL = 2, } MigThrError; +static int postcopy_resume_handshake(MigrationState *s) +{ + qemu_savevm_send_postcopy_resume(s->to_dst_file); + + while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { + qemu_sem_wait(&s->rp_state.rp_sem); + } + + if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { + return 0; + } + + return -1; +} + /* Return zero if success, or <0 for error */ static int postcopy_do_resume(MigrationState *s) { @@ -2397,10 +2413,14 @@ static int postcopy_do_resume(MigrationState *s) } /* - * TODO: handshake with dest using MIG_CMD_RESUME, - * MIG_RP_MSG_RESUME_ACK, then switch source state to - * "postcopy-active" + * Last handshake with destination on the resume (destination will + * switch to postcopy-active afterwards) */ + ret = postcopy_resume_handshake(s); + if (ret) { + error_report("%s: handshake failed: %d", __func__, ret); + return ret; + } return 0; } From patchwork Fri Mar 9 09:15:31 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883547 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMsw60Rnz9scC for ; Fri, 9 Mar 2018 20:44:35 +1100 (AEDT) Received: from localhost ([::1]:44120 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEZk-0001pY-FQ for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:44:32 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45188) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8o-0004FB-QD for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:45 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8n-0006HC-Ti for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:42 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59260 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8n-0006H1-OR for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:41 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6A1B64040859; Fri, 9 Mar 2018 09:16:41 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id C5E72215CDA7; Fri, 9 Mar 2018 09:16:38 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:31 +0800 Message-Id: <20180309091535.13315-20-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:41 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:41 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 19/23] migration: init dst in migration_object_init too X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Though we may not need it, now we init both the src/dst migration objects in migration_object_init() so that even incoming migration object would be thread safe (it was not). Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- migration/migration.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 17578e2731..af00cfe04f 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -106,6 +106,7 @@ enum mig_rp_message_type { dynamic creation of migration */ static MigrationState *current_migration; +static MigrationIncomingState *current_incoming; static bool migration_object_check(MigrationState *ms, Error **errp); static int migration_maybe_pause(MigrationState *s, @@ -121,6 +122,18 @@ void migration_object_init(void) assert(!current_migration); current_migration = MIGRATION_OBJ(object_new(TYPE_MIGRATION)); + /* + * Init the migrate incoming object as well no matter whether + * we'll use it or not. + */ + assert(!current_incoming); + current_incoming = g_new0(MigrationIncomingState, 1); + current_incoming->state = MIGRATION_STATUS_NONE; + qemu_mutex_init(¤t_incoming->rp_mutex); + qemu_event_init(¤t_incoming->main_thread_load_event, false); + qemu_sem_init(¤t_incoming->postcopy_pause_sem_dst, 0); + qemu_sem_init(¤t_incoming->postcopy_pause_sem_fault, 0); + if (!migration_object_check(current_migration, &err)) { error_report_err(err); exit(1); @@ -151,19 +164,8 @@ MigrationState *migrate_get_current(void) MigrationIncomingState *migration_incoming_get_current(void) { - static bool once; - static MigrationIncomingState mis_current; - - if (!once) { - mis_current.state = MIGRATION_STATUS_NONE; - memset(&mis_current, 0, sizeof(MigrationIncomingState)); - qemu_mutex_init(&mis_current.rp_mutex); - qemu_event_init(&mis_current.main_thread_load_event, false); - qemu_sem_init(&mis_current.postcopy_pause_sem_dst, 0); - qemu_sem_init(&mis_current.postcopy_pause_sem_fault, 0); - once = true; - } - return &mis_current; + assert(current_incoming); + return current_incoming; } void migration_incoming_state_destroy(void) From patchwork Fri Mar 9 09:15:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883531 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMWl4zNBz9sdd for ; Fri, 9 Mar 2018 20:28:51 +1100 (AEDT) Received: from localhost ([::1]:44018 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEKX-0005pe-BN for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:28:49 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45248) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8v-0004ME-TC for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:50 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8q-0006Ip-V2 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:49 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59410 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8q-0006IB-QY for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:44 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 75BBF4023112; Fri, 9 Mar 2018 09:16:44 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id EA150215CDA7; Fri, 9 Mar 2018 09:16:41 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:32 +0800 Message-Id: <20180309091535.13315-21-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Fri, 09 Mar 2018 09:16:44 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Fri, 09 Mar 2018 09:16:44 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 20/23] qmp/migration: new command migrate-recover X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" The first allow-oob=true command. It's used on destination side when the postcopy migration is paused and ready for a recovery. After execution, a new migration channel will be established for postcopy to continue. Signed-off-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert --- migration/migration.c | 24 ++++++++++++++++++++++++ migration/migration.h | 1 + migration/savevm.c | 3 +++ qapi/migration.json | 20 ++++++++++++++++++++ 4 files changed, 48 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index af00cfe04f..180552329c 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1424,6 +1424,30 @@ void qmp_migrate_incoming(const char *uri, Error **errp) once = false; } +void qmp_migrate_recover(const char *uri, Error **errp) +{ + MigrationIncomingState *mis = migration_incoming_get_current(); + + if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) { + error_setg(errp, "Migrate recover can only be run " + "when postcopy is paused."); + return; + } + + if (atomic_cmpxchg(&mis->postcopy_recover_triggered, + false, true) == true) { + error_setg(errp, "Migrate recovery is triggered already"); + return; + } + + /* + * Note that this call will never start a real migration; it will + * only re-setup the migration stream and poke existing migration + * to continue using that newly established channel. + */ + qemu_start_incoming_migration(uri, errp); +} + bool migration_is_blocked(Error **errp) { if (qemu_savevm_state_blocked(errp)) { diff --git a/migration/migration.h b/migration/migration.h index 8f64b43584..c549859cc3 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -65,6 +65,7 @@ struct MigrationIncomingState { QemuSemaphore colo_incoming_sem; /* notify PAUSED postcopy incoming migrations to try to continue */ + bool postcopy_recover_triggered; QemuSemaphore postcopy_pause_sem_dst; QemuSemaphore postcopy_pause_sem_fault; }; diff --git a/migration/savevm.c b/migration/savevm.c index 7f4899ae9c..3e660dfca6 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2166,6 +2166,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis) { trace_postcopy_pause_incoming(); + /* Clear the triggered bit to allow one recovery */ + mis->postcopy_recover_triggered = false; + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, MIGRATION_STATUS_POSTCOPY_PAUSED); diff --git a/qapi/migration.json b/qapi/migration.json index 152fb8e51e..451d8c572b 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1174,3 +1174,23 @@ # Since: 2.9 ## { 'command': 'xen-colo-do-checkpoint' } + +## +# @migrate-recover: +# +# Provide a recovery migration stream URI. +# +# @uri: the URI to be used for the recovery of migration stream. +# +# Returns: nothing. +# +# Example: +# +# -> { "execute": "migrate-recover", +# "arguments": { "uri": "tcp:192.168.1.200:12345" } } +# <- { "return": {} } +# +# Since: 2.12 +## +{ 'command': 'migrate-recover', 'data': { 'uri': 'str' }, + 'allow-oob': true } From patchwork Fri Mar 9 09:15:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883534 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMZ93kByz9sdd for ; Fri, 9 Mar 2018 20:30:57 +1100 (AEDT) Received: from localhost ([::1]:44029 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEMZ-0007qy-7L for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:30:55 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45246) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8v-0004MD-So for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:50 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8u-0006O9-DM for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:49 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59264 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8u-0006N1-9o for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:48 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E6E424040859; Fri, 9 Mar 2018 09:16:47 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 00B92215CDA7; Fri, 9 Mar 2018 09:16:44 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:33 +0800 Message-Id: <20180309091535.13315-22-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:47 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:47 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 21/23] hmp/migration: add migrate_recover command X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Sister command to migrate-recover in QMP. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- hmp-commands.hx | 13 +++++++++++++ hmp.c | 10 ++++++++++ hmp.h | 1 + 3 files changed, 24 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index defba474d6..6758c1f6ed 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -955,7 +955,20 @@ STEXI @findex migrate_incoming Continue an incoming migration using the @var{uri} (that has the same syntax as the -incoming option). +ETEXI + { + .name = "migrate_recover", + .args_type = "uri:s", + .params = "uri", + .help = "Continue a paused incoming postcopy migration", + .cmd = hmp_migrate_recover, + }, + +STEXI +@item migrate_recover @var{uri} +@findex migrate_recover +Continue a paused incoming postcopy migration using the @var{uri}. ETEXI { diff --git a/hmp.c b/hmp.c index 4ac4fbcb74..0e189e157b 100644 --- a/hmp.c +++ b/hmp.c @@ -1502,6 +1502,16 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, &err); } +void hmp_migrate_recover(Monitor *mon, const QDict *qdict) +{ + Error *err = NULL; + const char *uri = qdict_get_str(qdict, "uri"); + + qmp_migrate_recover(uri, &err); + + hmp_handle_error(mon, &err); +} + /* Kept for backwards compatibility */ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict) { diff --git a/hmp.h b/hmp.h index b89733876d..5cf02a169a 100644 --- a/hmp.h +++ b/hmp.h @@ -68,6 +68,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict); void hmp_migrate_cancel(Monitor *mon, const QDict *qdict); void hmp_migrate_continue(Monitor *mon, const QDict *qdict); void hmp_migrate_incoming(Monitor *mon, const QDict *qdict); +void hmp_migrate_recover(Monitor *mon, const QDict *qdict); void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict); void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict); void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict); From patchwork Fri Mar 9 09:15:34 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883535 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMb06Yphz9sbv for ; Fri, 9 Mar 2018 20:31:40 +1100 (AEDT) Received: from localhost ([::1]:44035 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euENG-0008Pw-KV for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:31:38 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45277) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE8y-0004P6-SR for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:53 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE8x-0006So-PK for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:52 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60718 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE8x-0006Se-Kl for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:51 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4ACFD8151D49; Fri, 9 Mar 2018 09:16:51 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id 74923215CDA7; Fri, 9 Mar 2018 09:16:48 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:34 +0800 Message-Id: <20180309091535.13315-23-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:51 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Fri, 09 Mar 2018 09:16:51 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 22/23] migration/qmp: add command migrate-pause X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" It pauses an ongoing migration. Currently it only supports postcopy. Note that this command will work on either side of the migration. Basically when we trigger this on one side, it'll interrupt the other side as well since the other side will get notified on the disconnect event. However, it's still possible that the other side is not notified, for example, when the network is totally broken, or due to some firewall configuration changes. In that case, we will also need to run the same command on the other side so both sides will go into the paused state. Signed-off-by: Peter Xu --- migration/migration.c | 27 +++++++++++++++++++++++++++ qapi/migration.json | 16 ++++++++++++++++ 2 files changed, 43 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 180552329c..f31fcbb0d5 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1448,6 +1448,33 @@ void qmp_migrate_recover(const char *uri, Error **errp) qemu_start_incoming_migration(uri, errp); } +void qmp_migrate_pause(Error **errp) +{ + MigrationState *ms = migrate_get_current(); + MigrationIncomingState *mis = migration_incoming_get_current(); + int ret; + + if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { + /* Source side, during postcopy */ + ret = qemu_file_shutdown(ms->to_dst_file); + if (ret) { + error_setg(errp, "Failed to pause source migration"); + } + return; + } + + if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { + ret = qemu_file_shutdown(mis->from_src_file); + if (ret) { + error_setg(errp, "Failed to pause destination migration"); + } + return; + } + + error_setg(errp, "migrate-pause is currently only supported " + "during postcopy-active state"); +} + bool migration_is_blocked(Error **errp) { if (qemu_savevm_state_blocked(errp)) { diff --git a/qapi/migration.json b/qapi/migration.json index 451d8c572b..b8827d5ace 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1194,3 +1194,19 @@ ## { 'command': 'migrate-recover', 'data': { 'uri': 'str' }, 'allow-oob': true } + +## +# @migrate-pause: +# +# Pause a migration. Currently it only supports postcopy. +# +# Returns: nothing. +# +# Example: +# +# -> { "execute": "migrate-pause" } +# <- { "return": {} } +# +# Since: 2.12 +## +{ 'command': 'migrate-pause', 'allow-oob': true } From patchwork Fri Mar 9 09:15:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 883537 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zyMdv5zwyz9sdy for ; Fri, 9 Mar 2018 20:34:10 +1100 (AEDT) Received: from localhost ([::1]:44043 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euEPe-0001wj-CC for incoming@patchwork.ozlabs.org; Fri, 09 Mar 2018 04:34:06 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45333) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1euE96-0004Vz-3t for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:17:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1euE91-0006UC-5q for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:17:00 -0500 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:59272 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1euE91-0006U4-15 for qemu-devel@nongnu.org; Fri, 09 Mar 2018 04:16:55 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A8DAA4040859; Fri, 9 Mar 2018 09:16:54 +0000 (UTC) Received: from xz-mi.redhat.com (ovpn-12-92.pek2.redhat.com [10.72.12.92]) by smtp.corp.redhat.com (Postfix) with ESMTP id D41CE215CDA7; Fri, 9 Mar 2018 09:16:51 +0000 (UTC) From: Peter Xu To: qemu-devel@nongnu.org Date: Fri, 9 Mar 2018 17:15:35 +0800 Message-Id: <20180309091535.13315-24-peterx@redhat.com> In-Reply-To: <20180309091535.13315-1-peterx@redhat.com> References: <20180309091535.13315-1-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:54 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 09 Mar 2018 09:16:54 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v7 23/23] migration/hmp: add migrate_pause command X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andrea Arcangeli , Juan Quintela , Alexey Perevalov , peterx@redhat.com, "Dr . David Alan Gilbert" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Wrapper for QMP command "migrate-pause". Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Peter Xu --- hmp-commands.hx | 14 ++++++++++++++ hmp.c | 9 +++++++++ hmp.h | 1 + 3 files changed, 24 insertions(+) diff --git a/hmp-commands.hx b/hmp-commands.hx index 6758c1f6ed..8775ac3467 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -969,6 +969,20 @@ STEXI @item migrate_recover @var{uri} @findex migrate_recover Continue a paused incoming postcopy migration using the @var{uri}. +ETEXI + + { + .name = "migrate_pause", + .args_type = "", + .params = "", + .help = "Pause an ongoing migration (postcopy-only)", + .cmd = hmp_migrate_pause, + }, + +STEXI +@item migrate_pause +@findex migrate_pause +Pause an ongoing migration. Currently it only supports postcopy. ETEXI { diff --git a/hmp.c b/hmp.c index 0e189e157b..b1685d1673 100644 --- a/hmp.c +++ b/hmp.c @@ -1512,6 +1512,15 @@ void hmp_migrate_recover(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, &err); } +void hmp_migrate_pause(Monitor *mon, const QDict *qdict) +{ + Error *err = NULL; + + qmp_migrate_pause(&err); + + hmp_handle_error(mon, &err); +} + /* Kept for backwards compatibility */ void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict) { diff --git a/hmp.h b/hmp.h index 5cf02a169a..0d82a620d6 100644 --- a/hmp.h +++ b/hmp.h @@ -69,6 +69,7 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict); void hmp_migrate_continue(Monitor *mon, const QDict *qdict); void hmp_migrate_incoming(Monitor *mon, const QDict *qdict); void hmp_migrate_recover(Monitor *mon, const QDict *qdict); +void hmp_migrate_pause(Monitor *mon, const QDict *qdict); void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict); void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict); void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);