{"id":807516,"url":"http://patchwork.ozlabs.org/api/1.0/patches/807516/?format=json","project":{"id":14,"url":"http://patchwork.ozlabs.org/api/1.0/projects/14/?format=json","name":"QEMU Development","link_name":"qemu-devel","list_id":"qemu-devel.nongnu.org","list_email":"qemu-devel@nongnu.org","web_url":"","scm_url":"","webscm_url":""},"msgid":"<1504081950-2528-12-git-send-email-peterx@redhat.com>","date":"2017-08-30T08:32:08","name":"[RFC,v2,11/33] migration: allow src return path to pause","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"85dfd1cf41f5448b581af2272baa0661a6b9a931","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/1.0/people/67717/?format=json","name":"Peter Xu","email":"peterx@redhat.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/qemu-devel/patch/1504081950-2528-12-git-send-email-peterx@redhat.com/mbox/","series":[{"id":552,"url":"http://patchwork.ozlabs.org/api/1.0/series/552/?format=json","date":"2017-08-30T08:31:59","name":"Migration: postcopy failure recovery","version":2,"mbox":"http://patchwork.ozlabs.org/series/552/mbox/"}],"check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/807516/checks/","tags":{},"headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xhzpF5VRsz9t2Q\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 30 Aug 2017 18:54:29 +1000 (AEST)","from localhost ([::1]:49109 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dmylX-0000GD-F0\n\tfor incoming@patchwork.ozlabs.org; Wed, 30 Aug 2017 04:54:27 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:34169)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dmyRH-0007KF-ML\n\tfor qemu-devel@nongnu.org; Wed, 30 Aug 2017 04:33:32 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dmyRC-0003MP-QA\n\tfor qemu-devel@nongnu.org; Wed, 30 Aug 2017 04:33:31 -0400","from mx1.redhat.com ([209.132.183.28]:48818)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1dmyRC-0003Lz-Hn\n\tfor qemu-devel@nongnu.org; Wed, 30 Aug 2017 04:33:26 -0400","from smtp.corp.redhat.com\n\t(int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 854894A6EE;\n\tWed, 30 Aug 2017 08:33:25 +0000 (UTC)","from pxdev.xzpeter.org.com (dhcp-14-103.nay.redhat.com\n\t[10.66.14.103])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 994DC871DD;\n\tWed, 30 Aug 2017 08:33:22 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 854894A6EE","From":"Peter Xu <peterx@redhat.com>","To":"qemu-devel@nongnu.org","Date":"Wed, 30 Aug 2017 16:32:08 +0800","Message-Id":"<1504081950-2528-12-git-send-email-peterx@redhat.com>","In-Reply-To":"<1504081950-2528-1-git-send-email-peterx@redhat.com>","References":"<1504081950-2528-1-git-send-email-peterx@redhat.com>","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.15","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.38]);\n\tWed, 30 Aug 2017 08:33:25 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"[Qemu-devel] [RFC v2 11/33] migration: allow src return path to\n\tpause","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Laurent Vivier <lvivier@redhat.com>,\n\tAndrea Arcangeli <aarcange@redhat.com>, \n\tJuan Quintela <quintela@redhat.com>,\n\tAlexey Perevalov <a.perevalov@samsung.com>, peterx@redhat.com,\n\t\"Dr . David Alan Gilbert\" <dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"},"content":"Let the thread pause for network issues.\n\nReviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>\nSigned-off-by: Peter Xu <peterx@redhat.com>\n---\n migration/migration.c  | 35 +++++++++++++++++++++++++++++++++--\n migration/migration.h  |  1 +\n migration/trace-events |  2 ++\n 3 files changed, 36 insertions(+), 2 deletions(-)","diff":"diff --git a/migration/migration.c b/migration/migration.c\nindex 80de212..b3cd8be 100644\n--- a/migration/migration.c\n+++ b/migration/migration.c\n@@ -996,6 +996,7 @@ static void migrate_fd_cleanup(void *opaque)\n     block_cleanup_parameters(s);\n \n     qemu_sem_destroy(&s->postcopy_pause_sem);\n+    qemu_sem_destroy(&s->postcopy_pause_rp_sem);\n }\n \n void migrate_fd_error(MigrationState *s, const Error *error)\n@@ -1140,6 +1141,7 @@ MigrationState *migrate_init(void)\n     error_free(s->error);\n     s->error = NULL;\n     qemu_sem_init(&s->postcopy_pause_sem, 0);\n+    qemu_sem_init(&s->postcopy_pause_rp_sem, 0);\n \n     migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);\n \n@@ -1527,6 +1529,18 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,\n     }\n }\n \n+/* Return true to retry, false to quit */\n+static bool postcopy_pause_return_path_thread(MigrationState *s)\n+{\n+    trace_postcopy_pause_return_path();\n+\n+    qemu_sem_wait(&s->postcopy_pause_rp_sem);\n+\n+    trace_postcopy_pause_return_path_continued();\n+\n+    return true;\n+}\n+\n /*\n  * Handles messages sent on the return path towards the source VM\n  *\n@@ -1543,6 +1557,8 @@ static void *source_return_path_thread(void *opaque)\n     int res;\n \n     trace_source_return_path_thread_entry();\n+\n+retry:\n     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&\n            migration_is_setup_or_active(ms->state)) {\n         trace_source_return_path_thread_loop_top();\n@@ -1634,13 +1650,28 @@ static void *source_return_path_thread(void *opaque)\n             break;\n         }\n     }\n-    if (qemu_file_get_error(rp)) {\n+\n+out:\n+    res = qemu_file_get_error(rp);\n+    if (res) {\n+        if (res == -EIO) {\n+            /*\n+             * Maybe there is something we can do: it looks like a\n+             * network down issue, and we pause for a recovery.\n+             */\n+            if (postcopy_pause_return_path_thread(ms)) {\n+                /* Reload rp, reset the rest */\n+                rp = ms->rp_state.from_dst_file;\n+                ms->rp_state.error = false;\n+                goto retry;\n+            }\n+        }\n+\n         trace_source_return_path_thread_bad_end();\n         mark_source_rp_bad(ms);\n     }\n \n     trace_source_return_path_thread_end();\n-out:\n     ms->rp_state.from_dst_file = NULL;\n     qemu_fclose(rp);\n     return NULL;\ndiff --git a/migration/migration.h b/migration/migration.h\nindex c423682..323d88d 100644\n--- a/migration/migration.h\n+++ b/migration/migration.h\n@@ -155,6 +155,7 @@ struct MigrationState\n \n     /* Needed by postcopy-pause state */\n     QemuSemaphore postcopy_pause_sem;\n+    QemuSemaphore postcopy_pause_rp_sem;\n };\n \n void migrate_set_state(int *state, int old_state, int new_state);\ndiff --git a/migration/trace-events b/migration/trace-events\nindex 7764c6f..1a83f60 100644\n--- a/migration/trace-events\n+++ b/migration/trace-events\n@@ -98,6 +98,8 @@ migration_thread_setup_complete(void) \"\"\n open_return_path_on_source(void) \"\"\n open_return_path_on_source_continue(void) \"\"\n postcopy_start(void) \"\"\n+postcopy_pause_return_path(void) \"\"\n+postcopy_pause_return_path_continued(void) \"\"\n postcopy_pause_continued(void) \"\"\n postcopy_pause_incoming(void) \"\"\n postcopy_pause_incoming_continued(void) \"\"\n","prefixes":["RFC","v2","11/33"]}