{"id":2233214,"url":"http://patchwork.ozlabs.org/api/1.1/patches/2233214/?format=json","web_url":"http://patchwork.ozlabs.org/project/qemu-devel/patch/20260505202640.1011006-2-peterx@redhat.com/","project":{"id":14,"url":"http://patchwork.ozlabs.org/api/1.1/projects/14/?format=json","name":"QEMU Development","link_name":"qemu-devel","list_id":"qemu-devel.nongnu.org","list_email":"qemu-devel@nongnu.org","web_url":"","scm_url":"","webscm_url":""},"msgid":"<20260505202640.1011006-2-peterx@redhat.com>","date":"2026-05-05T20:26:18","name":"[PULL,01/23] migration: Fix blocking in POSTCOPY_DEVICE during package load","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"3ce377ff7b07ba5a13ad54bba32a3ec7b20da7bf","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/1.1/people/67717/?format=json","name":"Peter Xu","email":"peterx@redhat.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/qemu-devel/patch/20260505202640.1011006-2-peterx@redhat.com/mbox/","series":[{"id":502897,"url":"http://patchwork.ozlabs.org/api/1.1/series/502897/?format=json","web_url":"http://patchwork.ozlabs.org/project/qemu-devel/list/?series=502897","date":"2026-05-05T20:26:17","name":"[PULL,01/23] migration: Fix blocking in POSTCOPY_DEVICE during package load","version":1,"mbox":"http://patchwork.ozlabs.org/series/502897/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/2233214/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2233214/checks/","tags":{},"headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=RY2cdt8G;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com 
header.a=rsa-sha256\n header.s=google header.b=mG2GN8G6;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g998f0QlKz1yJV\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 06 May 2026 06:29:06 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-devel-bounces@nongnu.org>)\n\tid 1wKMM1-0004Mh-Cn; Tue, 05 May 2026 16:26:53 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <peterx@redhat.com>) id 1wKMLz-0004MG-Oz\n for qemu-devel@nongnu.org; Tue, 05 May 2026 16:26:51 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.129.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <peterx@redhat.com>) id 1wKMLx-0002ZV-9e\n for qemu-devel@nongnu.org; Tue, 05 May 2026 16:26:51 -0400","from mail-qv1-f72.google.com (mail-qv1-f72.google.com\n [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-607-xlykXANvMiWEnLvXX42xqA-1; Tue, 05 May 2026 16:26:46 -0400","by mail-qv1-f72.google.com with SMTP id\n 6a1803df08f44-8b552610488so80551246d6.0\n for <qemu-devel@nongnu.org>; Tue, 05 May 2026 13:26:46 -0700 (PDT)","from x1.com ([142.189.10.167]) by smtp.gmail.com with ESMTPSA id\n 6a1803df08f44-8b53c6b8123sm155283806d6.35.2026.05.05.13.26.44\n (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n Tue, 05 May 2026 13:26:44 -0700 
(PDT)"],"DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1778012808;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=cuOw5wi626TK+F1PhtvnlgO8EB6xWCZ2vQsoxWAJJRo=;\n b=RY2cdt8G37nzNk4U+kKXf7kRWmpz5eV3026N2D/Ed8kyYbgG5vKdmCLpHQulTXmdfPApNs\n DOYvpYftSO43WBQOS9UuV5keth+sNoJ+qAxwuHU/izzJhvAK3VA2QVGw8xYelgUEccejEc\n TtMhA/jWyEut2A+uhEKPcb2GwaNMa2U=","v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=redhat.com; s=google; t=1778012806; x=1778617606; darn=nongnu.org;\n h=content-transfer-encoding:mime-version:references:in-reply-to\n :message-id:date:subject:cc:to:from:from:to:cc:subject:date\n :message-id:reply-to;\n bh=cuOw5wi626TK+F1PhtvnlgO8EB6xWCZ2vQsoxWAJJRo=;\n b=mG2GN8G6WX9raPPUTRsh5YP+Cov9NaUiZoufpWhFXcLN8quN/RM4jwSv58GTv6iCTW\n WZe929D0852XLWDo7ZC+BuwkMZHHBKrqFkCgdT0zPo9L5ASJuGpc1RC6a7fsoGQG8IQZ\n 0QVDmL4kPtt+LmK6uUJVtat0YVl0mbKnPP1X+9O+TQ/udqJD/3Ympg6+jWjEwA7mfbt+\n U3iB9rwhX3TL2bjGxTTXGMGiNYtx6ZNXsXbt4hGrVcCJXKOJSifmCyLlvx1Lk0rdTtfq\n 2DhIJFCqnH/yjH81gQZ6sg76gaGDBQ24sSFRaLj5xhIOUeHmp7Iy2YzutibZY8zUxIv5\n jH3g=="],"X-MC-Unique":"xlykXANvMiWEnLvXX42xqA-1","X-Mimecast-MFC-AGG-ID":"xlykXANvMiWEnLvXX42xqA_1778012806","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1778012806; x=1778617606;\n h=content-transfer-encoding:mime-version:references:in-reply-to\n :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from\n :to:cc:subject:date:message-id:reply-to;\n bh=cuOw5wi626TK+F1PhtvnlgO8EB6xWCZ2vQsoxWAJJRo=;\n b=HNTmnXpQWJn3UgiesMqEg8N+sF95ZoNYTaZ3rgZ0OKAZRJfe2gm8YkIX8zK+NMrciQ\n 1imDLa9a3DoK93Vm4oV+ZnYO0S4iYpmSaIlOSU3SKCxLmzN/CXbrZFpmaPxz2/fex+m4\n 5xXDKk6jb+/9RM4+g8D5+fRdXhFgItMQiPXJPEnEA9AVgta4wz48qnOO3m67oZ/LDK+/\n 2VPMoq0oiedbIzXq6cRxClkc5I9u1RCMORjfWYSloE6jLNezTB4S9kzrIlbRVfkw543T\n 
JZEdG79YqusUCa1cwPAnLu5uaymNnQfE+F4++ZNLRJcrk8p3V9Ws5pXDT3OlADn7y6wG\n MhhQ==","X-Gm-Message-State":"AOJu0Yw0da5O33hlR9T6mdYcqOckZ/6kdA7sziz7rVq4Z8XTbnv+OBph\n M2xB3DpHBnkkTiXuJoPZMp8CIiqRyJfi9BPoLKQtcP+jpGMDUxp0Z61jPHs9Sj2EBOAZKTPBOCO\n eO2XelpjLR5gjgsjxvrVlMDCvrrkQKDcpdpgbqSQo/RRkC+yBbTjzFegDPcoWJBSFzldYf3utfU\n HlmQugHg8+kD3B2a1PodUbi0YRytv98V6FSjmzPw==","X-Gm-Gg":"AeBDiesiXnZEmA2w0V0IcRhIl7g5ZDuSTTo24QvIO2THcaIHspqRshdnh58vT/fDBpp\n pJx356LYnMWi5k2Tct4xbDC7mwE8e7q9H1J4mTR0q0aMvVBbiddQ5tsMiMkoTbJCFNQWETvInFD\n /lyGU1F0syUu+s7cWXjFFgxobEc5j78mtgiagupVzTP5NIS+kqNJ/cyxDk9uG0AqUkrat8rf0O4\n jeQ7JBOpxEthercy+ub4vb0P+tp020KAL3vo2tUOGPxItr4fjh0y/esJBO6mVZQxRif7zGWI7/L\n GhZEkeb8u1sY4U7ddesY7hVk0PMVALMhXc335bfIQVUabPm0AcK/vNe3YENsIPCwM9MmwnCWXwS\n 6MTDW6qy27OynJYvYDCgMJ6tRjXgXYY34ewg+5/pPo83Mtw5rmov7gUA=","X-Received":["by 2002:a05:6214:19c6:b0:8ae:62aa:665a with SMTP id\n 6a1803df08f44-8bc442dbca6mr4401546d6.28.1778012805746;\n Tue, 05 May 2026 13:26:45 -0700 (PDT)","by 2002:a05:6214:19c6:b0:8ae:62aa:665a with SMTP id\n 6a1803df08f44-8bc442dbca6mr4400786d6.28.1778012805050;\n Tue, 05 May 2026 13:26:45 -0700 (PDT)"],"From":"Peter Xu <peterx@redhat.com>","To":"qemu-devel@nongnu.org","Cc":"Fabiano Rosas <farosas@suse.de>, Paolo Bonzini <pbonzini@redhat.com>,\n Peter Xu <peterx@redhat.com>, Pranav Tyagi <prtyagi@redhat.com>,\n Juraj Marcin <jmarcin@redhat.com>","Subject":"[PULL 01/23] migration: Fix blocking in POSTCOPY_DEVICE during\n package load","Date":"Tue,  5 May 2026 16:26:18 -0400","Message-ID":"<20260505202640.1011006-2-peterx@redhat.com>","X-Mailer":"git-send-email 2.53.0","In-Reply-To":"<20260505202640.1011006-1-peterx@redhat.com>","References":"<20260505202640.1011006-1-peterx@redhat.com>","MIME-Version":"1.0","Content-Transfer-Encoding":"8bit","Received-SPF":"pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-24","X-Spam_score":"-2.5","X-Spam_bar":"--","X-Spam_report":"(-2.5 / 5.0 
requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.443,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"qemu development <qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-devel>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org"},"content":"From: Pranav Tyagi <prtyagi@redhat.com>\n\nThe package_loaded event is not set in case MIG_RP_MSG_PONG does not\narrive on the source from the destination in the return path thread. The\nmigration thread would then be blocked waiting for package_loaded event\nindefinitely in POSTCOPY_DEVICE state. However, in such a condition the\nsource VM can safely resume as the destination has not yet started. The\npong message can get lost in case of a network failure or destination\ncrash before sending the pong.\n\nThis patch removes the package_loaded event and uses rp_sem, instead of\nkicking multiple events. The error is detected in case of network\nfailure or destination crash and rp_sem is set in the out path of the\nreturn path thread. This will kick the migration thread out from a\ncondition of indefinitely waiting for rp_sem. 
The migration thread then\nfails early and breaks from the migration loop to resume the vm on the\nsource side.\n\nFixes: 7b842fe354c6 (\"migration: Introduce POSTCOPY_DEVICE state\")\nSigned-off-by: Pranav Tyagi <prtyagi@redhat.com>\nReviewed-by: Juraj Marcin <jmarcin@redhat.com>\nReviewed-by: Peter Xu <peterx@redhat.com>\nLink: https://lore.kernel.org/r/20260423094438.43556-1-prtyagi@redhat.com\nSigned-off-by: Peter Xu <peterx@redhat.com>\n---\n migration/migration.h |  1 -\n migration/migration.c | 48 ++++++++++++++++++++++++++++---------------\n 2 files changed, 31 insertions(+), 18 deletions(-)","diff":"diff --git a/migration/migration.h b/migration/migration.h\nindex b6888daced..9081e6a612 100644\n--- a/migration/migration.h\n+++ b/migration/migration.h\n@@ -512,7 +512,6 @@ struct MigrationState {\n     bool rdma_migration;\n \n     bool postcopy_package_loaded;\n-    QemuEvent postcopy_package_loaded_event;\n \n     GSource *hup_source;\n \ndiff --git a/migration/migration.c b/migration/migration.c\nindex 5c9aaa6e58..6e4988a590 100644\n--- a/migration/migration.c\n+++ b/migration/migration.c\n@@ -1661,7 +1661,6 @@ int migrate_init(MigrationState *s, Error **errp)\n     migration_reset_vfio_bytes_transferred();\n \n     s->postcopy_package_loaded = false;\n-    qemu_event_reset(&s->postcopy_package_loaded_event);\n \n     return 0;\n }\n@@ -2317,7 +2316,7 @@ static void *source_return_path_thread(void *opaque)\n             if (tmp32 == QEMU_VM_PING_PACKAGED_LOADED) {\n                 trace_source_return_path_thread_postcopy_package_loaded();\n                 ms->postcopy_package_loaded = true;\n-                qemu_event_set(&ms->postcopy_package_loaded_event);\n+                migration_rp_kick(ms);\n             }\n             break;\n \n@@ -2388,16 +2387,21 @@ out:\n         trace_source_return_path_thread_bad_end();\n     }\n \n-    if (ms->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {\n+    if (ms->state == MIGRATION_STATUS_POSTCOPY_RECOVER ||\n+   
     ms->state == MIGRATION_STATUS_POSTCOPY_DEVICE) {\n         /*\n-         * this will be extremely unlikely: that we got yet another network\n-         * issue during recovering of the 1st network failure.. during this\n-         * period the main migration thread can be waiting on rp_sem for\n-         * this thread to sync with the other side.\n+         * The migration thread can get stuck waiting for rp_sem if the\n+         * return path fails to sync with the destination. This handles\n+         * two specific cases:\n          *\n-         * When this happens, explicitly kick the migration thread out of\n-         * RECOVER stage and back to PAUSED, so the admin can try\n-         * everything again.\n+         * POSTCOPY_RECOVER: A failure occurs during a recovery attempt.\n+         * We kick the migration thread back to PAUSED so the admin can\n+         * retry.\n+         *\n+         * POSTCOPY_DEVICE: The MIG_RP_MSG_PONG is lost due to a\n+         * network failure or destination crash. We kick the migration\n+         * thread out of its wait so it can fail the migration and safely\n+         * resume the VM on the source.\n          */\n         migration_rp_kick(ms);\n     }\n@@ -3226,12 +3230,24 @@ static MigIterateState migration_iteration_run(MigrationState *s)\n         if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE &&\n             (s->postcopy_package_loaded || complete_ready)) {\n             /*\n-             * If package has been loaded, the event is set and we will\n-             * immediatelly transition to POSTCOPY_ACTIVE. 
If we are ready for\n-             * completion, we need to wait for destination to load the postcopy\n-             * package before actually completing.\n+             * We will immediately transition to POSTCOPY_ACTIVE.\n+             * If we are ready for completion, we need to wait for\n+             * destination to load the postcopy package before actually\n+             * completing.\n              */\n-            qemu_event_wait(&s->postcopy_package_loaded_event);\n+            while (!s->postcopy_package_loaded) {\n+                if (migration_rp_wait(s)) {\n+                    /*\n+                     * Error happened. Migration thread was stuck waiting in\n+                     * POSTCOPY_DEVICE for rp_sem which was never set.\n+                     */\n+                    migrate_set_state(&s->state,\n+                                    MIGRATION_STATUS_POSTCOPY_DEVICE,\n+                                    MIGRATION_STATUS_FAILING);\n+                    return MIG_ITERATE_BREAK;\n+                }\n+            }\n+            /* Acknowledgement received from the destination */\n             migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,\n                               MIGRATION_STATUS_POSTCOPY_ACTIVE);\n         }\n@@ -3863,7 +3879,6 @@ static void migration_instance_finalize(Object *obj)\n     qemu_sem_destroy(&ms->rp_state.rp_pong_acks);\n     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);\n     error_free(ms->error);\n-    qemu_event_destroy(&ms->postcopy_package_loaded_event);\n }\n \n static void migration_instance_init(Object *obj)\n@@ -3885,7 +3900,6 @@ static void migration_instance_init(Object *obj)\n     qemu_sem_init(&ms->wait_unplug_sem, 0);\n     qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);\n     qemu_mutex_init(&ms->qemu_file_lock);\n-    qemu_event_init(&ms->postcopy_package_loaded_event, 0);\n }\n \n /*\n","prefixes":["PULL","01/23"]}