[{"id":3680636,"web_url":"http://patchwork.ozlabs.org/comment/3680636/","msgid":"<87wlxzdsv9.fsf@suse.de>","list_archive_url":null,"date":"2026-04-22T13:31:06","subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","submitter":{"id":85343,"url":"http://patchwork.ozlabs.org/api/people/85343/","name":"Fabiano Rosas","email":"farosas@suse.de"},"content":"Peter Xu <peterx@redhat.com> writes:\n\n> Marc-André reported an issue on QEMU crash when retrying a cancelled\n> migration during early setup phase, see \"Link:\" for more information, and\n> also easy way to reproduce.\n>\n> This patch is a replacement of the prior fix proposed by not only switching\n> to migration_cleanup(), but also fixing it from CPR side, so that we track\n> hup_source properly to know if src QEMU is waiting or the HUP signal.\n>\n> To put it simple: this chunk of special casing in migration_cancel() should\n> not affect normal migration, but only cpr-transfer migration to cover the\n> small window when the src QEMU is waiting for a HUP signal on cpr\n> channel (so that src QEMU can continue the migration on the main channel).\n>\n> To achieve that, we'll also need to remember to detach the hup_source\n> whenenver invoked: after that point, we should always be able to cleanup\n> the migration.\n>\n> It's not a generic operation to explicitly detach a gsource from its\n> context while in its dispatch() function.  But it should be safe, because\n> gsource disptch() will only happen with a boosted refcount for the\n> dispatcher so that the gsource will not be freed until the callback\n> completes. 
It's also safe to return G_SOURCE_REMOVE after the gsource is\n> detached, as glib will simply ignore the G_SOURCE_REMOVE.\n>\n> One can refer to the latest 2.86.5 glib code in g_main_dispatch() for that:\n>\n> https://github.com/GNOME/glib/blob/2.86.5/glib/gmain.c#L3592\n>\n> While at it, add a bunch of assertions to make sure nothing surprises us.\n>\n> After this patch is applied, the 2nd migration will not crash QEMU; instead\n> it'll be in CANCELLING until the socket connection times out (it will take\n> ~2min on my Fedora default kernel).  During this process no 2nd migration\n> will be allowed, and after it times out migration can be restarted.\n>\n> It's because so far we don't have control over socket_connect_outgoing(),\n> or anything yet managed by a task executed in qio_task_run_in_thread().\n> Speeding up the cancellation is left for the future.\n>\n> I also tested cpr-transfer by only providing the cpr channel, not the main\n> channel (with -incoming defer), kicking off migration on source, then cancelling it\n> on source directly without providing the main channel.  
It keeps working.\n>\n> I wanted to add an unit test for that but it'll need to refactor current\n> cpr-transfer tests first; let's leave it for later.\n>\n> Link: https://lore.kernel.org/r/20260417184742.293061-1-marcandre.lureau@redhat.com\n> Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>\n> Signed-off-by: Peter Xu <peterx@redhat.com>\n> ---\n>  include/migration/cpr.h  |  1 +\n>  migration/migration.h    |  5 +++++\n>  migration/cpr-transfer.c | 10 ++++++++++\n>  migration/migration.c    | 31 +++++++++++++++++++++++--------\n>  4 files changed, 39 insertions(+), 8 deletions(-)\n>\n> diff --git a/include/migration/cpr.h b/include/migration/cpr.h\n> index 5850fd1788..ebf09a2f0a 100644\n> --- a/include/migration/cpr.h\n> +++ b/include/migration/cpr.h\n> @@ -57,6 +57,7 @@ QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);\n>  void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n>                                  void *opaque);\n>  void cpr_transfer_source_destroy(MigrationState *s);\n> +bool cpr_transfer_source_active(MigrationState *s);\n>  \n>  void cpr_exec_init(void);\n>  QEMUFile *cpr_exec_output(Error **errp);\n> diff --git a/migration/migration.h b/migration/migration.h\n> index b6888daced..2bc2787480 100644\n> --- a/migration/migration.h\n> +++ b/migration/migration.h\n> @@ -514,6 +514,11 @@ struct MigrationState {\n>      bool postcopy_package_loaded;\n>      QemuEvent postcopy_package_loaded_event;\n>  \n> +    /*\n> +     * When set, it means cpr-transfer is waiting for the HUP signal from\n> +     * destination to continue the 2nd step of migration via the main\n> +     * channel.\n> +     */\n>      GSource *hup_source;\n>  \n>      /*\n> diff --git a/migration/cpr-transfer.c b/migration/cpr-transfer.c\n> index 61d5c9dce2..9defe7bad7 100644\n> --- a/migration/cpr-transfer.c\n> +++ b/migration/cpr-transfer.c\n> @@ -6,6 +6,7 @@\n>   */\n>  \n>  #include \"qemu/osdep.h\"\n> +#include 
\"qemu/main-loop.h\"\n>  #include \"qapi/clone-visitor.h\"\n>  #include \"qapi/error.h\"\n>  #include \"qapi/qapi-visit-migration.h\"\n> @@ -79,6 +80,7 @@ QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp)\n>  void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n>                                  void *opaque)\n>  {\n> +    assert(bql_locked());\n>      s->hup_source = qio_channel_create_watch(cpr_state_ioc(), G_IO_HUP);\n\nBefore I review the patch in detail, let me just make a high level\ncomment here:\n\nI wonder if we should have a \"register\" in the iochannel layer for the\nseveral watches we create. So we could at migration_cancel time call\ng_clear_handle/g_source_destroy on all at the same time. The management\nof these source ids is getting too particular.\n\nI'm seeing that exec, fd and file migration all ignore the id returned\nby channel-watch.c. In the case of exec this is causing\nqio_channel_command_finalize() to be skipped, leaving the exec'ed\ncommand process behind! 
So what I did as an experiment was to register\nwatches like this:\n\n  static inline void migration_watch_data(void) {};\n  \n  qio_channel_add_watch_full(ioc, G_IO_IN, exec_accept_incoming_migration,\n                             migration_watch_data, NULL,\n                             g_main_context_get_thread_default());\n\nand at migrate_cancel():\n\n  while(g_source_remove_by_user_data(migration_watch_data));\n\nIt feels to me that a wrapper around this, or even a hashtable of \"func\nptr->GSource\" or \"str->GSource id\" would allow us to call a (say)\nqio_channel_clear_watches() and avoid having to do the management in\neach of the clients.\n\nWe already have stuff like IOWatchPoll which kind of already wraps the\nwatch creation in a sense.\n\n>      g_source_set_callback(s->hup_source,\n>                            (GSourceFunc)func,\n> @@ -89,9 +91,17 @@ void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n>  \n>  void cpr_transfer_source_destroy(MigrationState *s)\n>  {\n> +    assert(bql_locked());\n>      if (s->hup_source) {\n>          g_source_destroy(s->hup_source);\n>          g_source_unref(s->hup_source);\n>          s->hup_source = NULL;\n>      }\n>  }\n> +\n> +bool cpr_transfer_source_active(MigrationState *s)\n> +{\n> +    /* Whenever the HUP gsource is available, it's active. */\n> +    assert(bql_locked());\n> +    return s->hup_source;\n> +}\n> diff --git a/migration/migration.c b/migration/migration.c\n> index 5c9aaa6e58..58c1e56766 100644\n> --- a/migration/migration.c\n> +++ b/migration/migration.c\n> @@ -1469,14 +1469,19 @@ void migration_cancel(void)\n>      }\n>  \n>      /*\n> -     * If migration_connect_outgoing has not been called, then there\n> -     * is no path that will complete the cancellation. 
Do it now.\n> -     */\n> -    if (setup && !s->to_dst_file) {\n> -        migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,\n> -                          MIGRATION_STATUS_CANCELLED);\n> -        cpr_state_close();\n> -        cpr_transfer_source_destroy(s);\n> +     * This is cpr-transfer specific processing.\n> +     *\n> +     * If this is true, it means cpr-transfer migration is waiting for the\n> +     * destination to send HUP event on CPR channel to continue the next\n> +     * phase.  If so, do the cleanup proactively to avoid get stuck in\n> +     * CANCELLING state.\n> +     */\n> +    if (cpr_transfer_source_active(s)) {\n> +        assert(migrate_mode() == MIG_MODE_CPR_TRANSFER);\n> +        assert(setup && !s->to_dst_file);\n> +        migration_cleanup(s);\n> +        /* Now all things should have been released */\n> +        assert(!cpr_transfer_source_active(s));\n>      }\n>  }\n>  \n> @@ -2009,12 +2014,22 @@ static gboolean migration_connect_outgoing_cb(QIOChannel *channel,\n>      MigrationState *s = migrate_get_current();\n>      Error *local_err = NULL;\n>  \n> +    /*\n> +     * Detach and release the GSource right after use.  
We rely on this to\n> +     * detect this small cpr-transfer window of \"waiting for HUP event\".\n> +     */\n> +    cpr_transfer_source_destroy(s);\n> +\n>      migration_connect_outgoing(s, opaque, &local_err);\n>  \n>      if (local_err) {\n>          migration_connect_error_propagate(s, local_err);\n>      }\n>  \n> +    /*\n> +     * This is redundant as we do cpr_transfer_source_destroy() at the\n> +     * entry, but it's benign; glib will just skip the detach.\n> +     */\n>      return G_SOURCE_REMOVE;\n>  }","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=suse.de header.i=@suse.de header.a=rsa-sha256\n header.s=susede2_rsa header.b=LBLH4baB;\n\tdkim=pass header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=xIH7z70t;\n\tdkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de\n header.a=rsa-sha256 header.s=susede2_rsa header.b=LBLH4baB;\n\tdkim=neutral header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=xIH7z70t;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)","smtp-out1.suse.de;\n dkim=pass header.d=suse.de header.s=susede2_rsa header.b=LBLH4baB;\n dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=xIH7z70t"],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g10X70J6wz1y2d\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 22 Apr 2026 
23:32:37 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-devel-bounces@nongnu.org>)\n\tid 1wFXgT-0007Ch-Vg; Wed, 22 Apr 2026 09:32:06 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <farosas@suse.de>) id 1wFXfq-0007AP-AW\n for qemu-devel@nongnu.org; Wed, 22 Apr 2026 09:31:35 -0400","from smtp-out1.suse.de ([195.135.223.130])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)\n (Exim 4.90_1) (envelope-from <farosas@suse.de>) id 1wFXfl-0000Ky-Tj\n for qemu-devel@nongnu.org; Wed, 22 Apr 2026 09:31:25 -0400","from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org\n [IPv6:2a07:de40:b281:104:10:150:64:97])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest\n SHA256)\n (No client certificate requested)\n by smtp-out1.suse.de (Postfix) with ESMTPS id 097116A81C;\n Wed, 22 Apr 2026 13:31:09 +0000 (UTC)","from imap1.dmz-prg2.suse.org (localhost [127.0.0.1])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest\n SHA256)\n (No client certificate requested)\n by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 9B6B5593AF;\n Wed, 22 Apr 2026 13:31:08 +0000 (UTC)","from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167])\n by imap1.dmz-prg2.suse.org with ESMTPSA id vRHJGpzN6GmpKgAAD6G6ig\n (envelope-from <farosas@suse.de>); Wed, 22 Apr 2026 13:31:08 +0000"],"DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1776864669;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n 
bh=EsY66juZdA4eO/bL7Unwm/4BXmuMiyZqtBauxvGQ0b8=;\n b=LBLH4baBO1ZOMzXPK6MzhGSgQTcepgPZ13jSja98kYX4aUNCHhqySRZtDjP2/OTbSCY/Ql\n kEmMd9TGAJpu5MH9aUQ/6EXPCySt5Heso90oGZnpUFAyyHOW4JU36rTFYackb7klaG3XO9\n hbtUPJC52N8VGThOe8Juu0nCyFhi0io=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1776864669;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=EsY66juZdA4eO/bL7Unwm/4BXmuMiyZqtBauxvGQ0b8=;\n b=xIH7z70tGPIctK21eGKn6u8fAOauJRs/nZxJfwE4lhtlhwsAgS2ekvh8Q1UCIhKhMqgm8U\n WIyXI87BOdM/i2DQ==","v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1776864669;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=EsY66juZdA4eO/bL7Unwm/4BXmuMiyZqtBauxvGQ0b8=;\n b=LBLH4baBO1ZOMzXPK6MzhGSgQTcepgPZ13jSja98kYX4aUNCHhqySRZtDjP2/OTbSCY/Ql\n kEmMd9TGAJpu5MH9aUQ/6EXPCySt5Heso90oGZnpUFAyyHOW4JU36rTFYackb7klaG3XO9\n hbtUPJC52N8VGThOe8Juu0nCyFhi0io=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1776864669;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=EsY66juZdA4eO/bL7Unwm/4BXmuMiyZqtBauxvGQ0b8=;\n b=xIH7z70tGPIctK21eGKn6u8fAOauJRs/nZxJfwE4lhtlhwsAgS2ekvh8Q1UCIhKhMqgm8U\n WIyXI87BOdM/i2DQ=="],"From":"Fabiano Rosas <farosas@suse.de>","To":"Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org","Cc":"Peter Xu <peterx@redhat.com>, Prasad Pandit <ppandit@redhat.com>,\n Ben Chaney <bchaney@akamai.com>, Juraj Marcin <jmarcin@redhat.com>,\n Mark Kanda <mark.kanda@oracle.com>, Pranav Tyagi 
<prtyagi@redhat.com>,\n\t=?utf-8?q?Marc-Andr=C3=A9?= Lureau <marcandre.lureau@redhat.com>,\n\t=?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= <berrange@redhat.com>","Subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","In-Reply-To":"<20260421175820.302795-1-peterx@redhat.com>","References":"<20260421175820.302795-1-peterx@redhat.com>","Date":"Wed, 22 Apr 2026 10:31:06 -0300","Message-ID":"<87wlxzdsv9.fsf@suse.de>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Transfer-Encoding":"quoted-printable","X-Rspamd-Action":"no action","X-Rspamd-Server":"rspamd2.dmz-prg2.suse.org","X-Spamd-Result":"default: False [-4.51 / 50.00]; BAYES_HAM(-3.00)[100.00%];\n NEURAL_HAM_LONG(-1.00)[-1.000];\n NEURAL_HAM_SHORT(-0.20)[-1.000];\n R_DKIM_ALLOW(-0.20)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];\n MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[];\n RCVD_VIA_SMTP_AUTH(0.00)[]; ARC_NA(0.00)[];\n MISSING_XM_UA(0.00)[]; MIME_TRACE(0.00)[0:+];\n FUZZY_RATELIMITED(0.00)[rspamd.com];\n SPAMHAUS_XBL(0.00)[2a07:de40:b281:104:10:150:64:97:from];\n TO_DN_SOME(0.00)[]; RCVD_TLS_ALL(0.00)[];\n RCPT_COUNT_SEVEN(0.00)[10];\n RBL_SPAMHAUS_BLOCKED_OPENRESOLVER(0.00)[2a07:de40:b281:104:10:150:64:97:from];\n RCVD_COUNT_TWO(0.00)[2]; FROM_EQ_ENVFROM(0.00)[];\n FROM_HAS_DN(0.00)[];\n RECEIVED_SPAMHAUS_BLOCKED_OPENRESOLVER(0.00)[2a07:de40:b281:106:10:150:64:167:received];\n DBL_BLOCKED_OPENRESOLVER(0.00)[imap1.dmz-prg2.suse.org:helo,imap1.dmz-prg2.suse.org:rdns,suse.de:dkim,suse.de:mid];\n DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];\n TO_MATCH_ENVRCPT_ALL(0.00)[]; MID_RHS_MATCH_FROM(0.00)[];\n DKIM_TRACE(0.00)[suse.de:+]","X-Rspamd-Queue-Id":"097116A81C","X-Spam-Score":"-4.51","Received-SPF":"pass client-ip=195.135.223.130; envelope-from=farosas@suse.de;\n helo=smtp-out1.suse.de","X-Spam_score_int":"-43","X-Spam_score":"-4.4","X-Spam_bar":"----","X-Spam_report":"(-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n 
DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001,\n SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"qemu development <qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-devel>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3680649,"web_url":"http://patchwork.ozlabs.org/comment/3680649/","msgid":"<87tst3dqv9.fsf@suse.de>","list_archive_url":null,"date":"2026-04-22T14:14:18","subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","submitter":{"id":85343,"url":"http://patchwork.ozlabs.org/api/people/85343/","name":"Fabiano Rosas","email":"farosas@suse.de"},"content":"Fabiano Rosas <farosas@suse.de> writes:\n\n> Peter Xu <peterx@redhat.com> writes:\n>\n>> Marc-André reported an issue on QEMU crash when retrying a cancelled\n>> migration during early setup phase, see \"Link:\" for more information, and\n>> also easy way to reproduce.\n>>\n>> This patch is a replacement of the prior fix proposed by not only switching\n>> to migration_cleanup(), but also fixing it from CPR side, so that we track\n>> hup_source properly to know if src QEMU is waiting or the HUP signal.\n>>\n>> To put it simple: this chunk of special casing in migration_cancel() should\n>> not affect normal migration, but only cpr-transfer migration to cover the\n>> small window when the 
src QEMU is waiting for a HUP signal on cpr\n>> channel (so that src QEMU can continue the migration on the main channel).\n>>\n>> To achieve that, we'll also need to remember to detach the hup_source\n>> whenever invoked: after that point, we should always be able to clean up\n>> the migration.\n>>\n>> It's not a generic operation to explicitly detach a gsource from its\n>> context while in its dispatch() function.  But it should be safe, because\n>> gsource dispatch() will only happen with a boosted refcount for the\n>> dispatcher so that the gsource will not be freed until the callback\n>> completes. It's also safe to return G_SOURCE_REMOVE after the gsource is\n>> detached, as glib will simply ignore the G_SOURCE_REMOVE.\n>>\n>> One can refer to the latest 2.86.5 glib code in g_main_dispatch() for that:\n>>\n>> https://github.com/GNOME/glib/blob/2.86.5/glib/gmain.c#L3592\n>>\n>> While at it, add a bunch of assertions to make sure nothing surprises us.\n>>\n>> After this patch is applied, the 2nd migration will not crash QEMU; instead\n>> it'll be in CANCELLING until the socket connection times out (it will take\n>> ~2min on my Fedora default kernel).  During this process no 2nd migration\n>> will be allowed, and after it times out migration can be restarted.\n>>\n>> It's because so far we don't have control over socket_connect_outgoing(),\n>> or anything yet managed by a task executed in qio_task_run_in_thread().\n>> Speeding up the cancellation is left for the future.\n>>\n>> I also tested cpr-transfer by only providing the cpr channel, not the main\n>> channel (with -incoming defer), kicking off migration on source, then cancelling it\n>> on source directly without providing the main channel.  
It keeps working.\n>>\n>> I wanted to add an unit test for that but it'll need to refactor current\n>> cpr-transfer tests first; let's leave it for later.\n>>\n>> Link: https://lore.kernel.org/r/20260417184742.293061-1-marcandre.lureau@redhat.com\n>> Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>\n>> Signed-off-by: Peter Xu <peterx@redhat.com>\n>> ---\n>>  include/migration/cpr.h  |  1 +\n>>  migration/migration.h    |  5 +++++\n>>  migration/cpr-transfer.c | 10 ++++++++++\n>>  migration/migration.c    | 31 +++++++++++++++++++++++--------\n>>  4 files changed, 39 insertions(+), 8 deletions(-)\n>>\n>> diff --git a/include/migration/cpr.h b/include/migration/cpr.h\n>> index 5850fd1788..ebf09a2f0a 100644\n>> --- a/include/migration/cpr.h\n>> +++ b/include/migration/cpr.h\n>> @@ -57,6 +57,7 @@ QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);\n>>  void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n>>                                  void *opaque);\n>>  void cpr_transfer_source_destroy(MigrationState *s);\n>> +bool cpr_transfer_source_active(MigrationState *s);\n>>  \n>>  void cpr_exec_init(void);\n>>  QEMUFile *cpr_exec_output(Error **errp);\n>> diff --git a/migration/migration.h b/migration/migration.h\n>> index b6888daced..2bc2787480 100644\n>> --- a/migration/migration.h\n>> +++ b/migration/migration.h\n>> @@ -514,6 +514,11 @@ struct MigrationState {\n>>      bool postcopy_package_loaded;\n>>      QemuEvent postcopy_package_loaded_event;\n>>  \n>> +    /*\n>> +     * When set, it means cpr-transfer is waiting for the HUP signal from\n>> +     * destination to continue the 2nd step of migration via the main\n>> +     * channel.\n>> +     */\n>>      GSource *hup_source;\n>>  \n>>      /*\n>> diff --git a/migration/cpr-transfer.c b/migration/cpr-transfer.c\n>> index 61d5c9dce2..9defe7bad7 100644\n>> --- a/migration/cpr-transfer.c\n>> +++ b/migration/cpr-transfer.c\n>> @@ -6,6 +6,7 @@\n>>   */\n>>  \n>>  #include 
\"qemu/osdep.h\"\n>> +#include \"qemu/main-loop.h\"\n>>  #include \"qapi/clone-visitor.h\"\n>>  #include \"qapi/error.h\"\n>>  #include \"qapi/qapi-visit-migration.h\"\n>> @@ -79,6 +80,7 @@ QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp)\n>>  void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n>>                                  void *opaque)\n>>  {\n>> +    assert(bql_locked());\n>>      s->hup_source = qio_channel_create_watch(cpr_state_ioc(), G_IO_HUP);\n>\n> Before I review the patch in detail, let me just make a high level\n> comment here:\n>\n> I wonder if we should have a \"register\" in the iochannel layer for the\n> several watches we create. So we could at migration_cancel time call\n> g_clear_handle/g_source_destroy on all at the same time. The management\n> of these source ids is getting too particular.\n>\n> I'm seeing that exec, fd and file migration all ignore the id returned\n> by channel-watch.c. In the case of exec this is causing\n> qio_channel_command_finalize() to be skipped, leaving the exec'ed\n> command process behind! 
So what I did as an experiment was to register\n> watches like this:\n>\n>   static inline void migration_watch_data(void) {};\n\nOops, not inline.","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=suse.de header.i=@suse.de header.a=rsa-sha256\n header.s=susede2_rsa header.b=BKm/6Xb+;\n\tdkim=pass header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=LqlBLSy2;\n\tdkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de\n header.a=rsa-sha256 header.s=susede2_rsa header.b=BKm/6Xb+;\n\tdkim=neutral header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=LqlBLSy2;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)","smtp-out1.suse.de;\n\tnone"],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g11TJ2GVTz1y2d\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 23 Apr 2026 00:15:16 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-devel-bounces@nongnu.org>)\n\tid 1wFYLo-0000Pm-KX; Wed, 22 Apr 2026 10:14:49 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <farosas@suse.de>) id 1wFYLZ-0000Mk-IJ\n for qemu-devel@nongnu.org; Wed, 22 Apr 2026 10:14:34 -0400","from smtp-out1.suse.de 
([2a07:de40:b251:101:10:150:64:1])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)\n (Exim 4.90_1) (envelope-from <farosas@suse.de>) id 1wFYLX-0005nx-5Z\n for qemu-devel@nongnu.org; Wed, 22 Apr 2026 10:14:33 -0400","from imap1.dmz-prg2.suse.org (unknown [10.150.64.97])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest\n SHA256)\n (No client certificate requested)\n by smtp-out1.suse.de (Postfix) with ESMTPS id BC8DA6A867;\n Wed, 22 Apr 2026 14:14:24 +0000 (UTC)","from imap1.dmz-prg2.suse.org (localhost [127.0.0.1])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest\n SHA256)\n (No client certificate requested)\n by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 5B6B6593AF;\n Wed, 22 Apr 2026 14:14:24 +0000 (UTC)","from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167])\n by imap1.dmz-prg2.suse.org with ESMTPSA id RMBXC8DX6GmnVQAAD6G6ig\n (envelope-from <farosas@suse.de>); Wed, 22 Apr 2026 14:14:24 +0000"],"DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1776867264;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=eqIi/8jAYlWi7ulS+8v7FNHOxMMTw7izcQd2wChIVMs=;\n b=BKm/6Xb+1F02QVnqRck+x0E5qnSkF/D98q5Rk0Ig+rkdTZK9MWT1B2+kyHsFnR3IbUq/BT\n //ZYfKbbuMjSd2c+jgJ0UTch1+yGHCWnT2h4f9fQsAyfC4qy0lTOpesxf28mpQVC4P3TFy\n MZzliL5LlBOm/gzfn8YxtVn9HRGQJgU=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1776867264;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n 
in-reply-to:in-reply-to:references:references;\n bh=eqIi/8jAYlWi7ulS+8v7FNHOxMMTw7izcQd2wChIVMs=;\n b=LqlBLSy2gyiEZEXU+XmO7iYKWcUCMqW/nCFaWL4Xvus7w1q62z3+vBD5GWDVLEUbCd87Rk\n /J5azShvJF2rSnDg==","v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1776867264;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=eqIi/8jAYlWi7ulS+8v7FNHOxMMTw7izcQd2wChIVMs=;\n b=BKm/6Xb+1F02QVnqRck+x0E5qnSkF/D98q5Rk0Ig+rkdTZK9MWT1B2+kyHsFnR3IbUq/BT\n //ZYfKbbuMjSd2c+jgJ0UTch1+yGHCWnT2h4f9fQsAyfC4qy0lTOpesxf28mpQVC4P3TFy\n MZzliL5LlBOm/gzfn8YxtVn9HRGQJgU=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1776867264;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=eqIi/8jAYlWi7ulS+8v7FNHOxMMTw7izcQd2wChIVMs=;\n b=LqlBLSy2gyiEZEXU+XmO7iYKWcUCMqW/nCFaWL4Xvus7w1q62z3+vBD5GWDVLEUbCd87Rk\n /J5azShvJF2rSnDg=="],"From":"Fabiano Rosas <farosas@suse.de>","To":"Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org","Cc":"Peter Xu <peterx@redhat.com>, Prasad Pandit <ppandit@redhat.com>,\n Ben Chaney <bchaney@akamai.com>, Juraj Marcin <jmarcin@redhat.com>,\n Mark Kanda <mark.kanda@oracle.com>, Pranav Tyagi <prtyagi@redhat.com>,\n\t=?utf-8?q?Marc-Andr=C3=A9?= Lureau <marcandre.lureau@redhat.com>,\n\t=?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= <berrange@redhat.com>","Subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","In-Reply-To":"<87wlxzdsv9.fsf@suse.de>","References":"<20260421175820.302795-1-peterx@redhat.com>\n <87wlxzdsv9.fsf@suse.de>","Date":"Wed, 22 Apr 2026 11:14:18 -0300","Message-ID":"<87tst3dqv9.fsf@suse.de>","MIME-Version":"1.0","Content-Type":"text/plain; 
charset=utf-8","Content-Transfer-Encoding":"quoted-printable","X-Spamd-Result":"default: False [-4.30 / 50.00]; BAYES_HAM(-3.00)[100.00%];\n NEURAL_HAM_LONG(-1.00)[-1.000];\n NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain];\n RCVD_VIA_SMTP_AUTH(0.00)[]; ARC_NA(0.00)[];\n RCVD_TLS_ALL(0.00)[]; MISSING_XM_UA(0.00)[];\n FUZZY_RATELIMITED(0.00)[rspamd.com]; MIME_TRACE(0.00)[0:+];\n RCPT_COUNT_SEVEN(0.00)[10]; MID_RHS_MATCH_FROM(0.00)[];\n DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];\n FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];\n FROM_EQ_ENVFROM(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[];\n RCVD_COUNT_TWO(0.00)[2];\n DBL_BLOCKED_OPENRESOLVER(0.00)[imap1.dmz-prg2.suse.org:helo, suse.de:mid,\n suse.de:email]","X-Spam-Score":"-4.30","Received-SPF":"pass client-ip=2a07:de40:b251:101:10:150:64:1;\n envelope-from=farosas@suse.de; helo=smtp-out1.suse.de","X-Spam_score_int":"-20","X-Spam_score":"-2.1","X-Spam_bar":"--","X-Spam_report":"(-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,\n SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"qemu development <qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-devel>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n 
<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3680677,"web_url":"http://patchwork.ozlabs.org/comment/3680677/","msgid":"<aejqGMhhrfQENE1X@x1.local>","list_archive_url":null,"date":"2026-04-22T15:32:40","subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Wed, Apr 22, 2026 at 10:31:06AM -0300, Fabiano Rosas wrote:\n> Peter Xu <peterx@redhat.com> writes:\n> \n> > Marc-André reported an issue on QEMU crash when retrying a cancelled\n> > migration during early setup phase, see \"Link:\" for more information, and\n> > also easy way to reproduce.\n> >\n> > This patch is a replacement of the prior fix proposed by not only switching\n> > to migration_cleanup(), but also fixing it from CPR side, so that we track\n> > hup_source properly to know if src QEMU is waiting or the HUP signal.\n> >\n> > To put it simple: this chunk of special casing in migration_cancel() should\n> > not affect normal migration, but only cpr-transfer migration to cover the\n> > small window when the src QEMU is waiting for a HUP signal on cpr\n> > channel (so that src QEMU can continue the migration on the main channel).\n> >\n> > To achieve that, we'll also need to remember to detach the hup_source\n> > whenenver invoked: after that point, we should always be able to cleanup\n> > the migration.\n> >\n> > It's not a generic operation to explicitly detach a gsource from its\n> > context while in its dispatch() function.  But it should be safe, because\n> > gsource disptch() will only happen with a boosted refcount for the\n> > dispatcher so that the gsource will not be freed until the callback\n> > completes. 
It's also safe to return G_SOURCE_REMOVE after the gsource is\n> > detached, as glib will simply ignore the G_SOURCE_REMOVE.\n> >\n> > One can refer to latest 2.86.5 glib code in g_main_dispatch() for that:\n> >\n> > https://github.com/GNOME/glib/blob/2.86.5/glib/gmain.c#L3592\n> >\n> > When at this, add a bunch of assertions to make sure nothing surprises us.\n> >\n> > After this patch applied, the 2nd migration will not crash QEMU, instead\n> > it'll be in CANCELLING until the socket connection times out (it will take\n> > ~2min on my Fedora default kernel).  During this process no 2nd migration\n> > will be allowed, and after it timed out migration can be restarted.\n> >\n> > It's because so far we don't have control over socket_connect_outgoing(),\n> > or anything yet managed by a task executed in qio_task_run_in_thread().\n> > Speeding up the cancellation to be left for future.\n\n[1]\n\n> >\n> > I also tested cpr-transfer by only providing cpr channel not the main\n> > channel (with -incoming defer), kickoff migration on source, then cancel it\n> > on source directly without providing the main channel.  
It keeps working.\n> >\n> > I wanted to add an unit test for that but it'll need to refactor current\n> > cpr-transfer tests first; let's leave it for later.\n> >\n> > Link: https://lore.kernel.org/r/20260417184742.293061-1-marcandre.lureau@redhat.com\n> > Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>\n> > Signed-off-by: Peter Xu <peterx@redhat.com>\n> > ---\n> >  include/migration/cpr.h  |  1 +\n> >  migration/migration.h    |  5 +++++\n> >  migration/cpr-transfer.c | 10 ++++++++++\n> >  migration/migration.c    | 31 +++++++++++++++++++++++--------\n> >  4 files changed, 39 insertions(+), 8 deletions(-)\n> >\n> > diff --git a/include/migration/cpr.h b/include/migration/cpr.h\n> > index 5850fd1788..ebf09a2f0a 100644\n> > --- a/include/migration/cpr.h\n> > +++ b/include/migration/cpr.h\n> > @@ -57,6 +57,7 @@ QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);\n> >  void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n> >                                  void *opaque);\n> >  void cpr_transfer_source_destroy(MigrationState *s);\n> > +bool cpr_transfer_source_active(MigrationState *s);\n> >  \n> >  void cpr_exec_init(void);\n> >  QEMUFile *cpr_exec_output(Error **errp);\n> > diff --git a/migration/migration.h b/migration/migration.h\n> > index b6888daced..2bc2787480 100644\n> > --- a/migration/migration.h\n> > +++ b/migration/migration.h\n> > @@ -514,6 +514,11 @@ struct MigrationState {\n> >      bool postcopy_package_loaded;\n> >      QemuEvent postcopy_package_loaded_event;\n> >  \n> > +    /*\n> > +     * When set, it means cpr-transfer is waiting for the HUP signal from\n> > +     * destination to continue the 2nd step of migration via the main\n> > +     * channel.\n> > +     */\n> >      GSource *hup_source;\n> >  \n> >      /*\n> > diff --git a/migration/cpr-transfer.c b/migration/cpr-transfer.c\n> > index 61d5c9dce2..9defe7bad7 100644\n> > --- a/migration/cpr-transfer.c\n> > +++ b/migration/cpr-transfer.c\n> 
> @@ -6,6 +6,7 @@\n> >   */\n> >  \n> >  #include \"qemu/osdep.h\"\n> > +#include \"qemu/main-loop.h\"\n> >  #include \"qapi/clone-visitor.h\"\n> >  #include \"qapi/error.h\"\n> >  #include \"qapi/qapi-visit-migration.h\"\n> > @@ -79,6 +80,7 @@ QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp)\n> >  void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n> >                                  void *opaque)\n> >  {\n> > +    assert(bql_locked());\n> >      s->hup_source = qio_channel_create_watch(cpr_state_ioc(), G_IO_HUP);\n> \n> Before I review the patch in detail, let me just make a high level\n> comment here:\n> \n> I wonder if we should have a \"register\" in the iochannel layer for the\n> several watches we create. So we could at migration_cancel time call\n\nYes, we need to do this part of mgmt better either now or at some point..\n\nSaid that, just to mention: what you're discussing below seems to be about\ndest QEMU, not src QEMU.\n\nHere the \"unmanaged async tasks\" I mentioned above [1] is only about src\nQEMU.  It only applies to socket channels and socket_connect_outgoing().\nIt is slightly even trickier, IIUC, because instead of creating a watch\ndirectly, it creates a pthread and run the task there, then when the task\nfn() completes that thread will inject one event back to the main event\nloop.  See qio_task_thread_worker() and the @completion gsource.\n\nOne thing I can try to do is to work out fast cancellation for sockets here\non src side, so we don't need to wait for that 2min timeout..  But it would\nstill be nice to have this fix land first because it fixes a crash.. so it\nmay still be something on top but I can start look into.  
I actually don't\nknow how frequently users will suffer from the 2min timeout: normally when\nthe host:port isn't available we should just get disconnected fast.\n\nStarting from now, I'll only discuss the dest QEMU side (IOW, may not be\ndirectly relevant to this patch).\n\n> g_clear_handle/g_source_destroy on all at the same time. The management\n> of these source ids is getting too particular.\n> \n> I'm seeing that exec, fd and file migration all ignore the id returned\n> by channel-watch.c. In the case of exec this is causing\n> qio_channel_command_finalize() to be skipped, leaving the exec'ed\n> command process behind! So what I did as an experiment was to register\n\nAre we?\n\nWe're on the same page at least in that the gsources are not yet managed.\nBut I am not sure they're leaked.\n\nexec_connect_incoming() does qio_channel_add_watch_full(), which internally\nwill release the refcount of the IO watch gsource.  It means it'll be a\ndangling gsource on the main context.  So after the spawn of the new\nprocess, it will still be finalized properly?\n\n> watches like this:\n> \n>   static inline void migration_watch_data(void) {};\n>   \n>   qio_channel_add_watch_full(ioc, G_IO_IN, exec_accept_incoming_migration,\n>                              migration_watch_data, NULL,\n>                              g_main_context_get_thread_default());\n> \n> and at migrate_cancel():\n> \n>   while(g_source_remove_by_user_data(migration_watch_data));\n> \n> It feels to me that a wrapper around this, or even a hashtable of \"func\n> ptr->GSource\" or \"str->GSource id\" would allow us to call a (say)\n> qio_channel_clear_watches() and avoid having to do the management in\n> each of the clients.\n\nIn general, this whole idea sounds reasonable.  I just want to check with\nyou on which side of QEMU we're talking about.\n\nTo me, dest QEMU is less of a concern when it can easily be killed.  But\nstill it's good to be able to manage those.  
Src QEMU is more important\nfrom that perspective.\n\n> \n> We already have stuff like IOWatchPoll which kind of already wraps the\n> watch creation in a sense.\n> \n> >      g_source_set_callback(s->hup_source,\n> >                            (GSourceFunc)func,\n> > @@ -89,9 +91,17 @@ void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n> >  \n> >  void cpr_transfer_source_destroy(MigrationState *s)\n> >  {\n> > +    assert(bql_locked());\n> >      if (s->hup_source) {\n> >          g_source_destroy(s->hup_source);\n> >          g_source_unref(s->hup_source);\n> >          s->hup_source = NULL;\n> >      }\n> >  }\n> > +\n> > +bool cpr_transfer_source_active(MigrationState *s)\n> > +{\n> > +    /* Whenever the HUP gsource is available, it's active. */\n> > +    assert(bql_locked());\n> > +    return s->hup_source;\n> > +}\n> > diff --git a/migration/migration.c b/migration/migration.c\n> > index 5c9aaa6e58..58c1e56766 100644\n> > --- a/migration/migration.c\n> > +++ b/migration/migration.c\n> > @@ -1469,14 +1469,19 @@ void migration_cancel(void)\n> >      }\n> >  \n> >      /*\n> > -     * If migration_connect_outgoing has not been called, then there\n> > -     * is no path that will complete the cancellation. Do it now.\n> > -     */\n> > -    if (setup && !s->to_dst_file) {\n> > -        migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,\n> > -                          MIGRATION_STATUS_CANCELLED);\n> > -        cpr_state_close();\n> > -        cpr_transfer_source_destroy(s);\n> > +     * This is cpr-transfer specific processing.\n> > +     *\n> > +     * If this is true, it means cpr-transfer migration is waiting for the\n> > +     * destination to send HUP event on CPR channel to continue the next\n> > +     * phase.  
If so, do the cleanup proactively to avoid get stuck in\n> > +     * CANCELLING state.\n> > +     */\n> > +    if (cpr_transfer_source_active(s)) {\n> > +        assert(migrate_mode() == MIG_MODE_CPR_TRANSFER);\n> > +        assert(setup && !s->to_dst_file);\n> > +        migration_cleanup(s);\n> > +        /* Now all things should have been released */\n> > +        assert(!cpr_transfer_source_active(s));\n> >      }\n> >  }\n> >  \n> > @@ -2009,12 +2014,22 @@ static gboolean migration_connect_outgoing_cb(QIOChannel *channel,\n> >      MigrationState *s = migrate_get_current();\n> >      Error *local_err = NULL;\n> >  \n> > +    /*\n> > +     * Detach and release the GSource right after use.  We rely on this to\n> > +     * detect this small cpr-transfer window of \"waiting for HUP event\".\n> > +     */\n> > +    cpr_transfer_source_destroy(s);\n> > +\n> >      migration_connect_outgoing(s, opaque, &local_err);\n> >  \n> >      if (local_err) {\n> >          migration_connect_error_propagate(s, local_err);\n> >      }\n> >  \n> > +    /*\n> > +     * This is redundant as we do cpr_transfer_source_destroy() at the\n> > +     * entry, but it's benign; glib will just skip the detach.\n> > +     */\n> >      return G_SOURCE_REMOVE;\n> >  }\n>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=fzP54wts;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=google header.b=oITaPgLw;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n 
envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g13ND6dD0z1yCv\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 23 Apr 2026 01:41:00 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-devel-bounces@nongnu.org>)\n\tid 1wFZgM-00022v-Un; Wed, 22 Apr 2026 11:40:06 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <peterx@redhat.com>) id 1wFZgJ-0001yi-Qr\n for qemu-devel@nongnu.org; Wed, 22 Apr 2026 11:40:03 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.129.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <peterx@redhat.com>) id 1wFZgG-0000oA-PA\n for qemu-devel@nongnu.org; Wed, 22 Apr 2026 11:40:03 -0400","from mail-oi1-f199.google.com (mail-oi1-f199.google.com\n [209.85.167.199]) by relay.mimecast.com with ESMTP with STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-55-nOq3G7nPPVuurB9f7zObxQ-1; Wed, 22 Apr 2026 11:39:56 -0400","by mail-oi1-f199.google.com with SMTP id\n 5614622812f47-47018d3424fso7441943b6e.2\n for <qemu-devel@nongnu.org>; Wed, 22 Apr 2026 08:39:56 -0700 (PDT)","from x1.local ([142.189.10.167]) by smtp.gmail.com with ESMTPSA id\n 6a1803df08f44-8b02ac7d4e6sm132487036d6.20.2026.04.22.08.32.40\n (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n Wed, 22 Apr 2026 08:32:41 -0700 (PDT)"],"DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1776872398;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n 
to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=saU6kuDwma6n7Nmzao6ikScZYd76vvpOTyUjB/pj+pk=;\n b=fzP54wtsdcYHeWcgGUp7GNv+Ba/EdQjQadzT9bxKF1c6TkYbGec90knIG9+Hn8Lyrb70+r\n Ot+fD24XGXxYmXDPLHc8omPn0dGMs9L5FLJh1GRHl9//iz/SBojFi3uYaImirQxBHNK8Xe\n re1bTl8ZSthL3xSv+0v/fRz+N6RdNmg=","v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=redhat.com; s=google; t=1776872396; x=1777477196; darn=nongnu.org;\n h=in-reply-to:content-transfer-encoding:content-disposition\n :mime-version:references:message-id:subject:cc:to:from:date:from:to\n :cc:subject:date:message-id:reply-to;\n bh=saU6kuDwma6n7Nmzao6ikScZYd76vvpOTyUjB/pj+pk=;\n b=oITaPgLworJsis9IHmcuzD+1SrMnmTeKoot4+XbqWADARJ8auJB7wUIZFGrU7ksKdp\n RrVkN26UBf11d16U/f4NVd0c7BdCNChZFCEx5+aZF/9583p48tHt77wnZmqUroThmJaj\n gebsNR27yIedHkVtfUlKXBRfftLzDuqTYdMK1ugvK5AKIHUcxKszGtyOgiaFeQfjck+F\n anf7ViNukDQzczSX1mEKrKxyBSxceVqewhwtPIxfW3drD5AH14sl6ifFSaoih3av0eeJ\n +aUmYPMpcsPyzscvvm6hwViSzf8yCURuoO9SiCz7w5lnB31VYzLmaCdW4VvgiIFSLgfS\n 9XTg=="],"X-MC-Unique":"nOq3G7nPPVuurB9f7zObxQ-1","X-Mimecast-MFC-AGG-ID":"nOq3G7nPPVuurB9f7zObxQ_1776872396","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1776872396; x=1777477196;\n h=in-reply-to:content-transfer-encoding:content-disposition\n :mime-version:references:message-id:subject:cc:to:from:date:x-gm-gg\n :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;\n bh=saU6kuDwma6n7Nmzao6ikScZYd76vvpOTyUjB/pj+pk=;\n b=j47tNqcQV79dfqyN9VLOq/dN6ibi5kZMsIwWG533nt3K30qSaDXMmZtEzLABbQT6aG\n R0iAYucfYioWJNSWIo3lYcxd5Pm1Gf94B+nExw3pW5toKzboaKnzsdQVvB4II8//d1hc\n kw5MY+dJEFjZqlxFwQA8/nBD1MH6V/ozRBbK3Z4bxLnCIp7ZqO1XaMHu6QQH63jO40nK\n gaXEXHjzqsHR3zPIyxdqY3uA0aX2VgfyXAn4S+0ROIfxJAQ5B8NTHHq23iO9EuIb8Kz4\n M5fBkWkBln7ZmEGhkJ9FZhR8Y8d9SfRnLV+CPjVK6FeNb2+n0SRiqMSaXTCQKL/2YLcf\n 
urzw==","X-Gm-Message-State":"AOJu0YwqSsKSe1gHPHqYGAuk0C/TEUlAwCwzKkpgwR/xyGQP/RHfyNWK\n Ln98JqOZ7QwMUZAZpxpOm9cDwYnSZ1ZYSAN5s1RImddTQ1XLMbU6TG9san3zT8NNpa+3bIdpea1\n UPbzAktSMA91t7inKJAUuVVpcXnf8ClwWfmaI4G2nC++hBh9l82cx33q4","X-Gm-Gg":"AeBDieuhH9Vv5gALKP4PM3+LAoKJCRE75BZYgHzNtFrwj377UV9Vj+Y1sI+lAJHgCD9\n t+hNGq4KIz0pEXACz8CrXm8USc+nHyJm4Tc3AwLJDRB74bw6YvvREgDvW730W5oc9QKonvH6GsV\n 5qhBxAPCM361/CCyA9hl9/SRwFbNhovQDSCmTLwP3kEkEOlIYLLEX201q/emj803lw2Rphjmudi\n oqoKA6ivgT83IVJ+1R2WebxIGGMYN+SwWGjYUVjePTp5NcmqhCYLqVp4H+UeGoj5uXObxo6K2zv\n BHPvb14vISqW5UAP2qHLoG/8Anp6W81e2zzdKR50BDGffbjlYWG7iB//PGcEgH9f4gfoBVuQDbz\n nAu3/q/7Bom5uYhiDcaWVEAlUwQjO2LIq8u8Xz//EvnCH2/mAsHbTH1qtjA==","X-Received":["by 2002:a05:6820:1b10:b0:684:5041:9296 with SMTP id\n 006d021491bc7-69462de2716mr12367392eaf.10.1776871962891;\n Wed, 22 Apr 2026 08:32:42 -0700 (PDT)","by 2002:a05:6820:1b10:b0:684:5041:9296 with SMTP id\n 006d021491bc7-69462de2716mr12367341eaf.10.1776871962213;\n Wed, 22 Apr 2026 08:32:42 -0700 (PDT)"],"Date":"Wed, 22 Apr 2026 11:32:40 -0400","From":"Peter Xu <peterx@redhat.com>","To":"Fabiano Rosas <farosas@suse.de>","Cc":"qemu-devel@nongnu.org, Prasad Pandit <ppandit@redhat.com>,\n Ben Chaney <bchaney@akamai.com>, Juraj Marcin <jmarcin@redhat.com>,\n Mark Kanda <mark.kanda@oracle.com>, Pranav Tyagi <prtyagi@redhat.com>,\n\t=?utf-8?q?Marc-Andr=C3=A9?= Lureau <marcandre.lureau@redhat.com>, Daniel\n\t=?utf-8?b?UC4gQmVycmFuZ8Op?= <berrange@redhat.com>","Subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","Message-ID":"<aejqGMhhrfQENE1X@x1.local>","References":"<20260421175820.302795-1-peterx@redhat.com>\n <87wlxzdsv9.fsf@suse.de>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","Content-Transfer-Encoding":"8bit","In-Reply-To":"<87wlxzdsv9.fsf@suse.de>","Received-SPF":"pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com;\n 
helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-20","X-Spam_score":"-2.1","X-Spam_bar":"--","X-Spam_report":"(-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_PASS=-0.001,\n SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"qemu development <qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-devel>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3681185,"web_url":"http://patchwork.ozlabs.org/comment/3681185/","msgid":"<87qzo6ek7d.fsf@suse.de>","list_archive_url":null,"date":"2026-04-22T21:52:54","subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","submitter":{"id":85343,"url":"http://patchwork.ozlabs.org/api/people/85343/","name":"Fabiano Rosas","email":"farosas@suse.de"},"content":"Peter Xu <peterx@redhat.com> writes:\n\n> On Wed, Apr 22, 2026 at 10:31:06AM -0300, Fabiano Rosas wrote:\n>> Peter Xu <peterx@redhat.com> writes:\n>> \n>> > Marc-André reported an issue on QEMU crash when retrying a cancelled\n>> > migration during early setup phase, see \"Link:\" for more information, and\n>> > also easy way to reproduce.\n>> >\n>> > This patch is a replacement of the prior fix proposed by not only switching\n>> > to migration_cleanup(), but also fixing it from CPR side, 
so that we track\n>> > hup_source properly to know if src QEMU is waiting or the HUP signal.\n>> >\n>> > To put it simple: this chunk of special casing in migration_cancel() should\n>> > not affect normal migration, but only cpr-transfer migration to cover the\n>> > small window when the src QEMU is waiting for a HUP signal on cpr\n>> > channel (so that src QEMU can continue the migration on the main channel).\n>> >\n>> > To achieve that, we'll also need to remember to detach the hup_source\n>> > whenenver invoked: after that point, we should always be able to cleanup\n>> > the migration.\n>> >\n>> > It's not a generic operation to explicitly detach a gsource from its\n>> > context while in its dispatch() function.  But it should be safe, because\n>> > gsource disptch() will only happen with a boosted refcount for the\n>> > dispatcher so that the gsource will not be freed until the callback\n>> > completes. It's also safe to return G_SOURCE_REMOVE after the gsource is\n>> > detached, as glib will simply ignore the G_SOURCE_REMOVE.\n>> >\n>> > One can refer to latest 2.86.5 glib code in g_main_dispatch() for that:\n>> >\n>> > https://github.com/GNOME/glib/blob/2.86.5/glib/gmain.c#L3592\n>> >\n>> > When at this, add a bunch of assertions to make sure nothing surprises us.\n>> >\n>> > After this patch applied, the 2nd migration will not crash QEMU, instead\n>> > it'll be in CANCELLING until the socket connection times out (it will take\n>> > ~2min on my Fedora default kernel).  
During this process no 2nd migration\n>> > will be allowed, and after it timed out migration can be restarted.\n>> >\n>> > It's because so far we don't have control over socket_connect_outgoing(),\n>> > or anything yet managed by a task executed in qio_task_run_in_thread().\n>> > Speeding up the cancellation to be left for future.\n>\n> [1]\n>\n>> >\n>> > I also tested cpr-transfer by only providing cpr channel not the main\n>> > channel (with -incoming defer), kickoff migration on source, then cancel it\n>> > on source directly without providing the main channel.  It keeps working.\n>> >\n>> > I wanted to add an unit test for that but it'll need to refactor current\n>> > cpr-transfer tests first; let's leave it for later.\n>> >\n>> > Link: https://lore.kernel.org/r/20260417184742.293061-1-marcandre.lureau@redhat.com\n>> > Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>\n>> > Signed-off-by: Peter Xu <peterx@redhat.com>\n>> > ---\n>> >  include/migration/cpr.h  |  1 +\n>> >  migration/migration.h    |  5 +++++\n>> >  migration/cpr-transfer.c | 10 ++++++++++\n>> >  migration/migration.c    | 31 +++++++++++++++++++++++--------\n>> >  4 files changed, 39 insertions(+), 8 deletions(-)\n>> >\n>> > diff --git a/include/migration/cpr.h b/include/migration/cpr.h\n>> > index 5850fd1788..ebf09a2f0a 100644\n>> > --- a/include/migration/cpr.h\n>> > +++ b/include/migration/cpr.h\n>> > @@ -57,6 +57,7 @@ QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);\n>> >  void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n>> >                                  void *opaque);\n>> >  void cpr_transfer_source_destroy(MigrationState *s);\n>> > +bool cpr_transfer_source_active(MigrationState *s);\n>> >  \n>> >  void cpr_exec_init(void);\n>> >  QEMUFile *cpr_exec_output(Error **errp);\n>> > diff --git a/migration/migration.h b/migration/migration.h\n>> > index b6888daced..2bc2787480 100644\n>> > --- a/migration/migration.h\n>> > +++ 
b/migration/migration.h\n>> > @@ -514,6 +514,11 @@ struct MigrationState {\n>> >      bool postcopy_package_loaded;\n>> >      QemuEvent postcopy_package_loaded_event;\n>> >  \n>> > +    /*\n>> > +     * When set, it means cpr-transfer is waiting for the HUP signal from\n>> > +     * destination to continue the 2nd step of migration via the main\n>> > +     * channel.\n>> > +     */\n>> >      GSource *hup_source;\n>> >  \n>> >      /*\n>> > diff --git a/migration/cpr-transfer.c b/migration/cpr-transfer.c\n>> > index 61d5c9dce2..9defe7bad7 100644\n>> > --- a/migration/cpr-transfer.c\n>> > +++ b/migration/cpr-transfer.c\n>> > @@ -6,6 +6,7 @@\n>> >   */\n>> >  \n>> >  #include \"qemu/osdep.h\"\n>> > +#include \"qemu/main-loop.h\"\n>> >  #include \"qapi/clone-visitor.h\"\n>> >  #include \"qapi/error.h\"\n>> >  #include \"qapi/qapi-visit-migration.h\"\n>> > @@ -79,6 +80,7 @@ QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp)\n>> >  void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n>> >                                  void *opaque)\n>> >  {\n>> > +    assert(bql_locked());\n>> >      s->hup_source = qio_channel_create_watch(cpr_state_ioc(), G_IO_HUP);\n>> \n>> Before I review the patch in detail, let me just make a high level\n>> comment here:\n>> \n>> I wonder if we should have a \"register\" in the iochannel layer for the\n>> several watches we create. So we could at migration_cancel time call\n>\n> Yes, we need to do this part of mgmt better either now or at some point..\n>\n> Said that, just to mention: what you're discussing below seems to be about\n> dest QEMU, not src QEMU.\n>\n\nIndeed, I haven't looked closely into this patch yet.\n\n> Here the \"unmanaged async tasks\" I mentioned above [1] is only about src\n> QEMU.  
It only applies to socket channels and socket_connect_outgoing().\n> It is slightly even trickier, IIUC, because instead of creating a watch\n> directly, it creates a pthread and run the task there, then when the task\n> fn() completes that thread will inject one event back to the main event\n> loop.  See qio_task_thread_worker() and the @completion gsource.\n>\n> One thing I can try to do is to work out fast cancellation for sockets here\n> on src side, so we don't need to wait for that 2min timeout..  But it would\n> still be nice to have this fix land first because it fixes a crash.. so it\n> may still be something on top but I can start look into.  I actually don't\n> know how frequent users will suffer from the 2min timeout: normally when\n> the host:port isn't available we should just get disconnected fast.\n>\n> Starting from now, I'll only discuss about dest QEMU side (IOW, may not be\n> directly relevant to this patch).\n>\n>> g_clear_handle/g_source_destroy on all at the same time. The management\n>> of these source ids is getting too particular.\n>> \n>> I'm seeing that exec, fd and file migration all ignore the id returned\n>> by channel-watch.c. In the case of exec this is causing\n>> qio_channel_command_finalize() to be skipped, leaving the exec'ed\n>> command process behind! So what I did as an experiment was to register\n>\n> Are we?\n>\n\nYes.\n\n> We're on the same page at least on that the gsources are not yet managed.\n> But I am not sure they're leaked.\n>\n> exec_connect_incoming() does qio_channel_add_watch_full(), within itself it\n> will release the refcount of the IO watch gsource.  It means it'll be a\n> dangling gsource on the main context.  So after the spawn of the new\n> process, it will still be finalized properly?\n>\n\nWhen the migration_with_exec functional test fails [0], the test issues\nqmp-quit to both src and dst. 
The dst will exit before ever having\ndispatched the gsource and exec_accept_incoming_migration() is not\nexecuted at all. In that case the gsource and the ioc are never freed\nand the spawned process is left behind.\n\nThis one-liner triggers it for me:\n\nfor i in $(seq 1 1000); do \\\necho \"$i =============\"; \\\nmake -j$(nproc) check-func-quick || break; done; ps aux | grep socat\n\n[0] - due to the startup race as the AI overlords told us. Patch coming\nsoon!\n\n>> watches like this:\n>> \n>>   static inline void migration_watch_data(void) {};\n>>   \n>>   qio_channel_add_watch_full(ioc, G_IO_IN, exec_accept_incoming_migration,\n>>                              migration_watch_data, NULL,\n>>                              g_main_context_get_thread_default());\n>> \n>> and at migrate_cancel():\n>> \n>>   while(g_source_remove_by_user_data(migration_watch_data));\n>> \n>> It feels to me that a wrapper around this, or even a hashtable of \"func\n>> ptr->GSource\" or \"str->GSource id\" would allow us to call a (say)\n>> qio_channel_clear_watches() and avoid having to do the management in\n>> each of the clients.\n>\n> In general, this whole idea sounds reasonable.  I just want to check with\n> you on which side of QEMU we're talking about.\n>\n> To me, dest QEMU is less of a concern when it can easily be killed.  But\n> still it's good to be able to manage those.  Src QEMU is more important\n> from that perspective.\n>\n\nIt applies to both src and dst, whenever we have to add a watch. The\nnon-uniformity is error prone, I think: the source may be removed after it\ndispatches, if we return G_SOURCE_REMOVE, or it may never dispatch, in\nwhich case it needs to be explicitly removed.\n\nLooking at callers of qio_channel_add_watch[_full], all of them just\nstore the gsource \"tag\" and later remove it. 
Having the callers each\nimplement their own way of keeping track of the GSource pointer/id just\nto be able to free them later on seems unnecessary to me.\n\nThe caller could still decide when to destroy the source, but it could\nhave semantics more like: \"I'm done with all the sources\".\n\nOr just make it part of the iochannel finalize routine. A small list of\nsource ids per channel seems reasonable.\n\n>> \n>> We already have stuff like IOWatchPoll which kind of already wraps the\n>> watch creation in a sense.\n>> \n>> >      g_source_set_callback(s->hup_source,\n>> >                            (GSourceFunc)func,\n>> > @@ -89,9 +91,17 @@ void cpr_transfer_add_hup_watch(MigrationState *s, QIOChannelFunc func,\n>> >  \n>> >  void cpr_transfer_source_destroy(MigrationState *s)\n>> >  {\n>> > +    assert(bql_locked());\n>> >      if (s->hup_source) {\n>> >          g_source_destroy(s->hup_source);\n>> >          g_source_unref(s->hup_source);\n>> >          s->hup_source = NULL;\n>> >      }\n>> >  }\n>> > +\n>> > +bool cpr_transfer_source_active(MigrationState *s)\n>> > +{\n>> > +    /* Whenever the HUP gsource is available, it's active. */\n>> > +    assert(bql_locked());\n>> > +    return s->hup_source;\n>> > +}\n>> > diff --git a/migration/migration.c b/migration/migration.c\n>> > index 5c9aaa6e58..58c1e56766 100644\n>> > --- a/migration/migration.c\n>> > +++ b/migration/migration.c\n>> > @@ -1469,14 +1469,19 @@ void migration_cancel(void)\n>> >      }\n>> >  \n>> >      /*\n>> > -     * If migration_connect_outgoing has not been called, then there\n>> > -     * is no path that will complete the cancellation. 
Do it now.\n>> > -     */\n>> > -    if (setup && !s->to_dst_file) {\n>> > -        migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,\n>> > -                          MIGRATION_STATUS_CANCELLED);\n>> > -        cpr_state_close();\n>> > -        cpr_transfer_source_destroy(s);\n>> > +     * This is cpr-transfer specific processing.\n>> > +     *\n>> > +     * If this is true, it means cpr-transfer migration is waiting for the\n>> > +     * destination to send HUP event on CPR channel to continue the next\n>> > +     * phase.  If so, do the cleanup proactively to avoid get stuck in\n>> > +     * CANCELLING state.\n>> > +     */\n>> > +    if (cpr_transfer_source_active(s)) {\n>> > +        assert(migrate_mode() == MIG_MODE_CPR_TRANSFER);\n>> > +        assert(setup && !s->to_dst_file);\n>> > +        migration_cleanup(s);\n>> > +        /* Now all things should have been released */\n>> > +        assert(!cpr_transfer_source_active(s));\n>> >      }\n>> >  }\n>> >  \n>> > @@ -2009,12 +2014,22 @@ static gboolean migration_connect_outgoing_cb(QIOChannel *channel,\n>> >      MigrationState *s = migrate_get_current();\n>> >      Error *local_err = NULL;\n>> >  \n>> > +    /*\n>> > +     * Detach and release the GSource right after use.  
We rely on this to\n>> > +     * detect this small cpr-transfer window of \"waiting for HUP event\".\n>> > +     */\n>> > +    cpr_transfer_source_destroy(s);\n>> > +\n>> >      migration_connect_outgoing(s, opaque, &local_err);\n>> >  \n>> >      if (local_err) {\n>> >          migration_connect_error_propagate(s, local_err);\n>> >      }\n>> >  \n>> > +    /*\n>> > +     * This is redundant as we do cpr_transfer_source_destroy() at the\n>> > +     * entry, but it's benign; glib will just skip the detach.\n>> > +     */\n>> >      return G_SOURCE_REMOVE;\n>> >  }\n>>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=suse.de header.i=@suse.de header.a=rsa-sha256\n header.s=susede2_rsa header.b=a2p3dEJM;\n\tdkim=pass header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=MpqGZZf/;\n\tdkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de\n header.a=rsa-sha256 header.s=susede2_rsa header.b=a2p3dEJM;\n\tdkim=neutral header.d=suse.de header.i=@suse.de header.a=ed25519-sha256\n header.s=susede2_ed25519 header.b=MpqGZZf/;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)","smtp-out1.suse.de;\n dkim=pass header.d=suse.de header.s=susede2_rsa header.b=a2p3dEJM;\n dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=\"MpqGZZf/\""],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 
4g1CfL3HgNz1yGs\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 23 Apr 2026 07:53:44 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-devel-bounces@nongnu.org>)\n\tid 1wFfVS-0007h5-PZ; Wed, 22 Apr 2026 17:53:15 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <farosas@suse.de>) id 1wFfVK-0007fQ-Sj\n for qemu-devel@nongnu.org; Wed, 22 Apr 2026 17:53:09 -0400","from smtp-out1.suse.de ([2a07:de40:b251:101:10:150:64:1])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)\n (Exim 4.90_1) (envelope-from <farosas@suse.de>) id 1wFfVI-0002sE-Av\n for qemu-devel@nongnu.org; Wed, 22 Apr 2026 17:53:06 -0400","from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org\n [IPv6:2a07:de40:b281:104:10:150:64:97])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest\n SHA256)\n (No client certificate requested)\n by smtp-out1.suse.de (Postfix) with ESMTPS id 74F736A836;\n Wed, 22 Apr 2026 21:53:01 +0000 (UTC)","from imap1.dmz-prg2.suse.org (localhost [127.0.0.1])\n (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest\n SHA256)\n (No client certificate requested)\n by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 0CCE4593AF;\n Wed, 22 Apr 2026 21:53:00 +0000 (UTC)","from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167])\n by imap1.dmz-prg2.suse.org with ESMTPSA id civWMzxD6WlkEwAAD6G6ig\n (envelope-from <farosas@suse.de>); Wed, 22 Apr 2026 21:53:00 +0000"],"DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1776894781;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n 
content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=eubxRGA54r6/y+tI+BmxFOkbbk1e7POfXyp4I/+c0+c=;\n b=a2p3dEJMieRo+3KSM+jXs/+ET7leGVjHPy+jCCyyTtMr4uyGOEQgCI7MbitUVs6vxGqD6v\n Uzq/Ey9LLn9p+coKJhAtZ8r2vqgRtLLyOqfidy6lUg10UdlGBXNHhy4HUVQrRVGNJLbssq\n W9u57//TBzuG3XggUd2DOdbf13KrQKE=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1776894781;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=eubxRGA54r6/y+tI+BmxFOkbbk1e7POfXyp4I/+c0+c=;\n b=MpqGZZf/g3ir8QiRG+eS2VZndfCF2eYJSHOPDmToC2jR6b1Js9LYJfzsKIyMQQYkaCgqZG\n WWb3MxkhSH1XvmAg==","v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_rsa;\n t=1776894781;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=eubxRGA54r6/y+tI+BmxFOkbbk1e7POfXyp4I/+c0+c=;\n b=a2p3dEJMieRo+3KSM+jXs/+ET7leGVjHPy+jCCyyTtMr4uyGOEQgCI7MbitUVs6vxGqD6v\n Uzq/Ey9LLn9p+coKJhAtZ8r2vqgRtLLyOqfidy6lUg10UdlGBXNHhy4HUVQrRVGNJLbssq\n W9u57//TBzuG3XggUd2DOdbf13KrQKE=","v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;\n s=susede2_ed25519; t=1776894781;\n h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:\n mime-version:mime-version:content-type:content-type:\n content-transfer-encoding:content-transfer-encoding:\n in-reply-to:in-reply-to:references:references;\n bh=eubxRGA54r6/y+tI+BmxFOkbbk1e7POfXyp4I/+c0+c=;\n b=MpqGZZf/g3ir8QiRG+eS2VZndfCF2eYJSHOPDmToC2jR6b1Js9LYJfzsKIyMQQYkaCgqZG\n WWb3MxkhSH1XvmAg=="],"From":"Fabiano Rosas <farosas@suse.de>","To":"Peter Xu <peterx@redhat.com>","Cc":"qemu-devel@nongnu.org, Prasad Pandit <ppandit@redhat.com>,\n Ben Chaney <bchaney@akamai.com>, Juraj 
Marcin <jmarcin@redhat.com>,\n Mark Kanda <mark.kanda@oracle.com>, Pranav Tyagi <prtyagi@redhat.com>,\n\t=?utf-8?q?Marc-Andr=C3=A9?= Lureau <marcandre.lureau@redhat.com>,\n\t=?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= <berrange@redhat.com>","Subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","In-Reply-To":"<aejqGMhhrfQENE1X@x1.local>","References":"<20260421175820.302795-1-peterx@redhat.com>\n <87wlxzdsv9.fsf@suse.de> <aejqGMhhrfQENE1X@x1.local>","Date":"Wed, 22 Apr 2026 18:52:54 -0300","Message-ID":"<87qzo6ek7d.fsf@suse.de>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Transfer-Encoding":"quoted-printable","X-Spamd-Result":"default: False [-4.51 / 50.00]; BAYES_HAM(-3.00)[100.00%];\n NEURAL_HAM_LONG(-1.00)[-1.000];\n R_DKIM_ALLOW(-0.20)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];\n NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain];\n MX_GOOD(-0.01)[]; FUZZY_RATELIMITED(0.00)[rspamd.com];\n ARC_NA(0.00)[]; MISSING_XM_UA(0.00)[];\n RCVD_VIA_SMTP_AUTH(0.00)[]; MIME_TRACE(0.00)[0:+];\n RCPT_COUNT_SEVEN(0.00)[9]; RCVD_TLS_ALL(0.00)[];\n TO_DN_SOME(0.00)[]; FROM_EQ_ENVFROM(0.00)[];\n FROM_HAS_DN(0.00)[]; MID_RHS_MATCH_FROM(0.00)[];\n RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[];\n DBL_BLOCKED_OPENRESOLVER(0.00)[imap1.dmz-prg2.suse.org:helo,imap1.dmz-prg2.suse.org:rdns,suse.de:dkim,suse.de:mid];\n DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519];\n DKIM_TRACE(0.00)[suse.de:+]","X-Rspamd-Action":"no action","X-Spam-Score":"-4.51","X-Rspamd-Server":"rspamd1.dmz-prg2.suse.org","X-Rspamd-Queue-Id":"74F736A836","Received-SPF":"pass client-ip=2a07:de40:b251:101:10:150:64:1;\n envelope-from=farosas@suse.de; helo=smtp-out1.suse.de","X-Spam_score_int":"-20","X-Spam_score":"-2.1","X-Spam_bar":"--","X-Spam_report":"(-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,\n DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001,\n SPF_PASS=-0.001 autolearn=ham 
autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"qemu development <qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-devel>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}},{"id":3681533,"web_url":"http://patchwork.ozlabs.org/comment/3681533/","msgid":"<aeo56NITJuRd1FN5@x1.local>","list_archive_url":null,"date":"2026-04-23T15:25:28","subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Wed, Apr 22, 2026 at 06:52:54PM -0300, Fabiano Rosas wrote:\n> > We're on the same page at least on that the gsources are not yet managed.\n> > But I am not sure they're leaked.\n> >\n> > exec_connect_incoming() does qio_channel_add_watch_full(), within itself it\n> > will release the refcount of the IO watch gsource.  It means it'll be a\n> > dangling gsource on the main context.  So after the spawn of the new\n> > process, it will still be finalized properly?\n> >\n> \n> When the migration_with_exec functional test fails [0], the test issues\n> qmp-quit to both src and dst. The dst will exit before ever having\n> dispatched the gsource and exec_accept_incoming_migration() is not\n> executed at all. 
In that case the gsource and the ioc are never freed\n> and the spawned process is left behind.\n> \n> This oneliner triggers it for me:\n> \n> for i in $(seq 1 1000); do \\\n> echo \"$i =============\"; \\\n> make -j$(nproc) check-func-quick || break; done; ps aux | grep socat\n> \n> [0] - due to the startup race as the AI overlords told us. Patch coming\n> soon!\n\nOhhhh, so it's about some failure path, ok.\n\n> \n> >> watches like this:\n> >> \n> >>   static inline void migration_watch_data(void) {};\n> >>   \n> >>   qio_channel_add_watch_full(ioc, G_IO_IN, exec_accept_incoming_migration,\n> >>                              migration_watch_data, NULL,\n> >>                              g_main_context_get_thread_default());\n> >> \n> >> and at migrate_cancel():\n> >> \n> >>   while(g_source_remove_by_user_data(migration_watch_data));\n> >> \n> >> It feels to me that a wrapper around this, or even a hashtable of \"func\n> >> ptr->GSource\" or \"str->GSource id\" would allow us to call a (say)\n> >> qio_channel_clear_watches() and avoid having to do the management in\n> >> each of the clients.\n> >\n> > In general, this whole idea sounds reasonable.  I just want to check with\n> > you on which side of QEMU we're talking about.\n> >\n> > To me, dest QEMU is less of a concern when it can easily be killed.  But\n> > still it's good to be able to manage those.  Src QEMU is more important\n> > from that perspective.\n> >\n> \n> It applies to both src and dst. Whenever we have to add a watch. The\n> non-uniformity of having the source removed maybe after it dispatches if\n> we return G_SOURCE_REMOVE or maybe it doesn't dispatch and then it needs\n> to be explicitly removed is error prone I think.\n> \n> Looking at callers of qio_channel_add_watch[_full], all of them just\n> store the gsource \"tag\" and later remove it. 
Having the callers each\n> implement their own way of keeping track of the GSource pointer/id just\n> to be able to free them later on seems unnecessary to me.\n\nYes, and we also need to keep in mind that glib can reuse the ID right\nafter the gsource is removed from its context, afaik.\n\nOne example I picked at random: qio_channel_websock_handshake_io() stores\nthe ID in hs_io_tag, but it also needs to be very careful in its callback\nfunction: whenever qio_channel_websock_handshake_send() returns FALSE\n(which should really be G_SOURCE_REMOVE), it must reset hs_io_tag.\n\n> \n> The caller could still decide when to destroy the source, but it could\n> have semantics more like: \"I'm done with all the sources\".\n> \n> Or just make it part of the iochannel finalize routine. A small list of\n> source ids per channel seems reasonable.\n\nSounds good in general.\n\nOne thing to mention, though: IIUC even this won't fix the problem you\nhit above with the leftover spawned process and IOC, 
because IOC's\nfinalize() won't get called..","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=mimecast20190719 header.b=ZvRh/V8L;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256\n header.s=google header.b=AeXNT/O1;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org\n (client-ip=209.51.188.17; helo=lists1p.gnu.org;\n envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n receiver=patchwork.ozlabs.org)"],"Received":["from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17])\n\t(using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g1g1K75gdz1yCv\n\tfor <incoming@patchwork.ozlabs.org>; Fri, 24 Apr 2026 01:26:44 +1000 (AEST)","from localhost ([::1] helo=lists1p.gnu.org)\n\tby lists1p.gnu.org with esmtp (Exim 4.90_1)\n\t(envelope-from <qemu-devel-bounces@nongnu.org>)\n\tid 1wFvwL-0007rf-8u; Thu, 23 Apr 2026 11:26:07 -0400","from eggs.gnu.org ([2001:470:142:3::10])\n by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <peterx@redhat.com>) id 1wFvvv-0007pr-R3\n for qemu-devel@nongnu.org; Thu, 23 Apr 2026 11:25:39 -0400","from us-smtp-delivery-124.mimecast.com ([170.10.129.124])\n by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)\n (Exim 4.90_1) (envelope-from <peterx@redhat.com>) id 1wFvvs-0008IL-72\n for qemu-devel@nongnu.org; Thu, 23 Apr 2026 11:25:38 -0400","from mail-qv1-f70.google.com (mail-qv1-f70.google.com\n [209.85.219.70]) by relay.mimecast.com with ESMTP with 
STARTTLS\n (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id\n us-mta-193-ZP09SJO3MGGGYMtNhC_7_g-1; Thu, 23 Apr 2026 11:25:32 -0400","by mail-qv1-f70.google.com with SMTP id\n 6a1803df08f44-8abd6e281c0so187196356d6.1\n for <qemu-devel@nongnu.org>; Thu, 23 Apr 2026 08:25:32 -0700 (PDT)","from x1.local ([142.189.10.167]) by smtp.gmail.com with ESMTPSA id\n d75a77b69052e-50fc42c7fabsm41308371cf.9.2026.04.23.08.25.29\n (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n Thu, 23 Apr 2026 08:25:30 -0700 (PDT)"],"DKIM-Signature":["v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;\n s=mimecast20190719; t=1776957934;\n h=from:from:reply-to:subject:subject:date:date:message-id:message-id:\n to:to:cc:cc:mime-version:mime-version:content-type:content-type:\n in-reply-to:in-reply-to:references:references;\n bh=gvefTJu7dGp8xIrMloC+CQc+wFQkZXv7PVTkOxSx0DQ=;\n b=ZvRh/V8LOP9pX0g0j3yj28NFf7EccD3IfOWQgBftaMX9eWk80R1YvpU+Qxqj214FpQ2uuA\n DJwrswzuZQMwZZ1sPUqRxokgeeE2uYr896rcmztAKD/Ye/JQbaIjnw4/79IDnj55lL6t5i\n Pdxpu8R5Vmk8iakmik3dmz+9+gqpMe0=","v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=redhat.com; s=google; t=1776957932; x=1777562732; darn=nongnu.org;\n h=in-reply-to:content-disposition:mime-version:references:message-id\n :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to;\n bh=gvefTJu7dGp8xIrMloC+CQc+wFQkZXv7PVTkOxSx0DQ=;\n b=AeXNT/O1HY76idFtCHNQ13meku/302bvQyE9YQJur6c4DIGrrSbS8qGfhyjV3WZ/jd\n QmNA2b7sqA0K/3YMXzXqbv9pgus4i92TXAOQHR6M7VPDN5OrS7aDsxbMIEwnScSRFdew\n g9tvwTZpQeu0AC+hApApxrxhtmw3Sfboq3zwPudnQkUKc4rKEFGZRTwgSFExixINrset\n rUmbFA0BK38qQI4yrcW7z4D7MAOlWvUmlzY9d+pg/anMsb2nuSYXVcEvLLGRmUmqoIOT\n zQcUKm2HrLnG5REAaopC1CbnMOsc8+KkdHWe5ARSnaCxh9EIX0Qt0FEuktk3kF6zilTU\n cm3Q=="],"X-MC-Unique":"ZP09SJO3MGGGYMtNhC_7_g-1","X-Mimecast-MFC-AGG-ID":"ZP09SJO3MGGGYMtNhC_7_g_1776957932","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1776957932; x=1777562732;\n 
h=in-reply-to:content-disposition:mime-version:references:message-id\n :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc\n :subject:date:message-id:reply-to;\n bh=gvefTJu7dGp8xIrMloC+CQc+wFQkZXv7PVTkOxSx0DQ=;\n b=LA2/A+uAmJbiqan7/2/d5i+OBb/SrELAiiIdUmDw/sF2bmKCI9L0YFkAeuhVsofTyf\n K7N809C6TgEHmxWwQDtygecbF2GetLD0wuw+GJZGlOHVXkkAIdBsZxgcg5iVE56Vn/TT\n jFxolin/8lOOblXFRGBsJc0F8Csv1gZ5rzS9oDSMrjbr2qnmz9nNOgypD4rFM220Am1I\n UTpx4Nbn45NC0lnf8+azb4gVP0XPcFV58PDnF8//AQ70al7hZNG1u8pdomryknz3Pu8h\n w5NhrNh/jv5Siii3qOSAwSsN8srXuQPhq5GhAoGL1JDEbAbKMa8aFv4M5nOTtCIF/ueZ\n 3S9g==","X-Gm-Message-State":"AOJu0Yx3FAe6vBNMBBc17xP1bdmrtUGcO6mgDFlK90UMbMdsKIGncWTD\n PXDK3W5iRNg2+eqNmciP5e7yTEe2S3jbDUkeTrFc5eHEyTxGraud+lrN8/Ql+NF7N4AQ0zfWTq3\n QIlWYUXICqMBdIJhrbaSPj2d+cS5bKxKDRIkpYNQyiDsNP3MsK6QNc6zs","X-Gm-Gg":"AeBDiesQheRGqASELgr4pXsEPSmpXj9/ynFJ6LqrziZGZJSvjaKClx1c/GCgHjIVX7j\n sYnlC9tymAMCATVisPIMIulXAvNsy+CcjZ+k+qJpGCwxwCI5lZD9t4W3j503r3/wqrqi6RhTwu8\n iIKwTsATDAB7+aEbmwmlnIJRXSgSVvCHKYxmImgErdTC2Wg5xe983WhVNHhPd4o6Rv9u1IOmAhX\n j88hDXmk0eLSr/6JNsQXuSoyRmbOEMUY03fnRFuFdrqTOxWCHlpQ5x5DHNGdijWed6T+giVr4Ku\n exnEgF9nmXSfja6iMHAVx40En8CLmLjtrhPsWita3nNUKpY/WzpTXAe3l9yHg45sIWNcj1biLm5\n n7S4VGZ0k3BhxINA0q/lmMrv+PiOTZxO/AkSoqsazjBCYc1gT/z1khmg6OQ==","X-Received":["by 2002:ac8:5fd1:0:b0:50d:6b06:a453 with SMTP id\n d75a77b69052e-50e36a3f60bmr421425951cf.18.1776957931753;\n Thu, 23 Apr 2026 08:25:31 -0700 (PDT)","by 2002:ac8:5fd1:0:b0:50d:6b06:a453 with SMTP id\n d75a77b69052e-50e36a3f60bmr421425131cf.18.1776957930984;\n Thu, 23 Apr 2026 08:25:30 -0700 (PDT)"],"Date":"Thu, 23 Apr 2026 11:25:28 -0400","From":"Peter Xu <peterx@redhat.com>","To":"Fabiano Rosas <farosas@suse.de>","Cc":"qemu-devel@nongnu.org, Prasad Pandit <ppandit@redhat.com>,\n Ben Chaney <bchaney@akamai.com>, Juraj Marcin <jmarcin@redhat.com>,\n Mark Kanda <mark.kanda@oracle.com>, Pranav Tyagi <prtyagi@redhat.com>,\n\t=?utf-8?q?Marc-Andr=C3=A9?= Lureau <marcandre.lureau@redhat.com>, 
Daniel\n\t=?utf-8?b?UC4gQmVycmFuZ8Op?= <berrange@redhat.com>","Subject":"Re: [PATCH] migration: Fix crash on second migration when cancel\n early","Message-ID":"<aeo56NITJuRd1FN5@x1.local>","References":"<20260421175820.302795-1-peterx@redhat.com>\n <87wlxzdsv9.fsf@suse.de> <aejqGMhhrfQENE1X@x1.local>\n <87qzo6ek7d.fsf@suse.de>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<87qzo6ek7d.fsf@suse.de>","Received-SPF":"pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com;\n helo=us-smtp-delivery-124.mimecast.com","X-Spam_score_int":"-20","X-Spam_score":"-2.1","X-Spam_bar":"--","X-Spam_report":"(-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001,\n DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,\n RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001,\n SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no","X-Spam_action":"no action","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.29","Precedence":"list","List-Id":"qemu development <qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<https://lists.nongnu.org/archive/html/qemu-devel>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n <mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org"}}]