[{"id":1769016,"web_url":"http://patchwork.ozlabs.org/comment/1769016/","msgid":"<20170915080350.GS3617@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-15T08:03:50","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Fri, Sep 15, 2017 at 01:44:03PM +0800, Fam Zheng wrote:\n> bdrv_close_all() would abort() due to op blockers added by BMDS, clean\n> up migration states when main loop quits to avoid that.\n> \n> Signed-off-by: Fam Zheng <famz@redhat.com>\n> ---\n>  include/migration/misc.h | 1 +\n>  migration/migration.c    | 7 ++++++-\n>  vl.c                     | 3 +++\n>  3 files changed, 10 insertions(+), 1 deletion(-)\n> \n> diff --git a/include/migration/misc.h b/include/migration/misc.h\n> index c079b7771b..b9a26b0898 100644\n> --- a/include/migration/misc.h\n> +++ b/include/migration/misc.h\n> @@ -54,5 +54,6 @@ bool migration_has_failed(MigrationState *);\n>  /* ...and after the device transmission */\n>  bool migration_in_postcopy_after_devices(MigrationState *);\n>  void migration_global_dump(Monitor *mon);\n> +void migrate_cancel(void);\n>  \n>  #endif\n> diff --git a/migration/migration.c b/migration/migration.c\n> index 959e8ec88e..2c844945c7 100644\n> --- a/migration/migration.c\n> +++ b/migration/migration.c\n> @@ -1274,11 +1274,16 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,\n>      }\n>  }\n>  \n> -void qmp_migrate_cancel(Error **errp)\n> +void migrate_cancel(void)\n>  {\n>      migrate_fd_cancel(migrate_get_current());\n>  }\n>  \n> +void qmp_migrate_cancel(Error **errp)\n> +{\n> +    migrate_cancel();\n> +}\n> +\n\nNit: I would prefer just call migrate_fd_cancel() below, since I don't\nsee much point to introduce migrate_cancel() which only calls\nmigrate_fd_cancel()...\n\n>  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n>  {\n>      MigrationState *s = migrate_get_current();\n> diff --git a/vl.c b/vl.c\n> index fb1f05b937..abbe61f40b 100644\n> --- a/vl.c\n> +++ b/vl.c\n> @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n>  #include \"sysemu/blockdev.h\"\n>  #include \"hw/block/block.h\"\n>  #include \"migration/misc.h\"\n> +#include \"migration/savevm.h\"\n>  #include \"migration/snapshot.h\"\n>  #include \"migration/global_state.h\"\n>  #include \"sysemu/tpm.h\"\n> @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n>      iothread_stop_all();\n>  \n>      pause_all_vcpus();\n> +    migrate_cancel();\n\nIIUC this is an async cancel, so when reach here the migration thread\ncan still be alive.  Then...\n\n> +    qemu_savevm_state_cleanup();\n\n... Here calling qemu_savevm_state_cleanup() may be problematic if\nmigration thread has not yet quitted.\n\nI'm thinking whether we should make migrate_fd_cancel() wait until the\nmigration thread finishes (state change to CANCELLED).  Then the\nmigration thread will do the cleanup, and here we can avoid calling\nqemu_savevm_state_cleanup() as well.\n\n>      bdrv_close_all();\n>      res_free();\n>  \n> -- \n> 2.13.5\n>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtp502nHwz9sPr\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 15 Sep 2017 18:11:16 +1000 (AEST)","from localhost ([::1]:51860 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dsliU-0007u7-Gl\n\tfor incoming@patchwork.ozlabs.org; Fri, 15 Sep 2017 04:11:14 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:55295)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dslbb-0000SW-86\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:04:08 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dslbV-0001Tf-A4\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:04:07 -0400","from mx1.redhat.com ([209.132.183.28]:51322)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>)\n\tid 1dslbU-0001TB-UO; Fri, 15 Sep 2017 04:04:01 -0400","from smtp.corp.redhat.com\n\t(int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id AC2D3356EC;\n\tFri, 15 Sep 2017 08:03:59 +0000 (UTC)","from pxdev.xzpeter.org (ovpn-12-86.pek2.redhat.com [10.72.12.86])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id BA3686F100;\n\tFri, 15 Sep 2017 08:03:54 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com AC2D3356EC","Date":"Fri, 15 Sep 2017 16:03:50 +0800","From":"Peter Xu <peterx@redhat.com>","To":"Fam Zheng <famz@redhat.com>","Message-ID":"<20170915080350.GS3617@pxdev.xzpeter.org>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170915054404.19914-3-famz@redhat.com>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.13","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.30]);\n\tFri, 15 Sep 2017 08:03:59 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-stable@nongnu.org, Juan Quintela <quintela@redhat.com>,\n\tqemu-devel@nongnu.org, \"Dr. David Alan Gilbert\" <dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1769032,"web_url":"http://patchwork.ozlabs.org/comment/1769032/","msgid":"<20170915082026.GG17199@lemon>","list_archive_url":null,"date":"2017-09-15T08:20:26","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":24872,"url":"http://patchwork.ozlabs.org/api/people/24872/","name":"Fam Zheng","email":"famz@redhat.com"},"content":"On Fri, 09/15 16:03, Peter Xu wrote:\n> On Fri, Sep 15, 2017 at 01:44:03PM +0800, Fam Zheng wrote:\n> > bdrv_close_all() would abort() due to op blockers added by BMDS, clean\n> > up migration states when main loop quits to avoid that.\n> > \n> > Signed-off-by: Fam Zheng <famz@redhat.com>\n> > ---\n> >  include/migration/misc.h | 1 +\n> >  migration/migration.c    | 7 ++++++-\n> >  vl.c                     | 3 +++\n> >  3 files changed, 10 insertions(+), 1 deletion(-)\n> > \n> > diff --git a/include/migration/misc.h b/include/migration/misc.h\n> > index c079b7771b..b9a26b0898 100644\n> > --- a/include/migration/misc.h\n> > +++ b/include/migration/misc.h\n> > @@ -54,5 +54,6 @@ bool migration_has_failed(MigrationState *);\n> >  /* ...and after the device transmission */\n> >  bool migration_in_postcopy_after_devices(MigrationState *);\n> >  void migration_global_dump(Monitor *mon);\n> > +void migrate_cancel(void);\n> >  \n> >  #endif\n> > diff --git a/migration/migration.c b/migration/migration.c\n> > index 959e8ec88e..2c844945c7 100644\n> > --- a/migration/migration.c\n> > +++ b/migration/migration.c\n> > @@ -1274,11 +1274,16 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,\n> >      }\n> >  }\n> >  \n> > -void qmp_migrate_cancel(Error **errp)\n> > +void migrate_cancel(void)\n> >  {\n> >      migrate_fd_cancel(migrate_get_current());\n> >  }\n> >  \n> > +void qmp_migrate_cancel(Error **errp)\n> > +{\n> > +    migrate_cancel();\n> > +}\n> > +\n> \n> Nit: I would prefer just call migrate_fd_cancel() below, since I don't\n> see much point to introduce migrate_cancel() which only calls\n> migrate_fd_cancel()...\n\nmigrate_get_current() is a migration internal IMHO. But that can be moved to\nmigrate_fd_cancel() so the parameter is dropped.\n\n> \n> >  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n> >  {\n> >      MigrationState *s = migrate_get_current();\n> > diff --git a/vl.c b/vl.c\n> > index fb1f05b937..abbe61f40b 100644\n> > --- a/vl.c\n> > +++ b/vl.c\n> > @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n> >  #include \"sysemu/blockdev.h\"\n> >  #include \"hw/block/block.h\"\n> >  #include \"migration/misc.h\"\n> > +#include \"migration/savevm.h\"\n> >  #include \"migration/snapshot.h\"\n> >  #include \"migration/global_state.h\"\n> >  #include \"sysemu/tpm.h\"\n> > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n> >      iothread_stop_all();\n> >  \n> >      pause_all_vcpus();\n> > +    migrate_cancel();\n> \n> IIUC this is an async cancel, so when reach here the migration thread\n> can still be alive.  Then...\n> \n> > +    qemu_savevm_state_cleanup();\n> \n> ... Here calling qemu_savevm_state_cleanup() may be problematic if\n> migration thread has not yet quitted.\n> \n> I'm thinking whether we should make migrate_fd_cancel() wait until the\n> migration thread finishes (state change to CANCELLED).  Then the\n> migration thread will do the cleanup, and here we can avoid calling\n> qemu_savevm_state_cleanup() as well.\n\nBut if the migration thread is stuck and CANCELLED is never reached, we'll hang\nhere?\n\nFam","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=famz@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtpJW41Jfz9sBZ\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 15 Sep 2017 18:21:14 +1000 (AEST)","from localhost ([::1]:51897 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dsls5-0006Fu-2Y\n\tfor incoming@patchwork.ozlabs.org; Fri, 15 Sep 2017 04:21:09 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:33357)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <famz@redhat.com>) id 1dslra-0006Eo-V4\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:20:40 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <famz@redhat.com>) id 1dslrV-0007cl-Qm\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:20:38 -0400","from mx1.redhat.com ([209.132.183.28]:53834)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <famz@redhat.com>)\n\tid 1dslrV-0007bV-Gu; Fri, 15 Sep 2017 04:20:33 -0400","from smtp.corp.redhat.com\n\t(int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 92D4E356D7;\n\tFri, 15 Sep 2017 08:20:32 +0000 (UTC)","from localhost (ovpn-12-95.pek2.redhat.com [10.72.12.95])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id E433D78C29;\n\tFri, 15 Sep 2017 08:20:28 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 92D4E356D7","Date":"Fri, 15 Sep 2017 16:20:26 +0800","From":"Fam Zheng <famz@redhat.com>","To":"Peter Xu <peterx@redhat.com>","Message-ID":"<20170915082026.GG17199@lemon>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>\n\t<20170915080350.GS3617@pxdev.xzpeter.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170915080350.GS3617@pxdev.xzpeter.org>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.12","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.30]);\n\tFri, 15 Sep 2017 08:20:32 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, qemu-stable@nongnu.org,\n\t\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>,\n\tJuan Quintela <quintela@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1769044,"web_url":"http://patchwork.ozlabs.org/comment/1769044/","msgid":"<20170915083721.GT3617@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-15T08:37:21","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Fri, Sep 15, 2017 at 04:20:26PM +0800, Fam Zheng wrote:\n\n[...]\n\n> > >  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n> > >  {\n> > >      MigrationState *s = migrate_get_current();\n> > > diff --git a/vl.c b/vl.c\n> > > index fb1f05b937..abbe61f40b 100644\n> > > --- a/vl.c\n> > > +++ b/vl.c\n> > > @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n> > >  #include \"sysemu/blockdev.h\"\n> > >  #include \"hw/block/block.h\"\n> > >  #include \"migration/misc.h\"\n> > > +#include \"migration/savevm.h\"\n> > >  #include \"migration/snapshot.h\"\n> > >  #include \"migration/global_state.h\"\n> > >  #include \"sysemu/tpm.h\"\n> > > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n> > >      iothread_stop_all();\n> > >  \n> > >      pause_all_vcpus();\n> > > +    migrate_cancel();\n> > \n> > IIUC this is an async cancel, so when reach here the migration thread\n> > can still be alive.  Then...\n> > \n> > > +    qemu_savevm_state_cleanup();\n> > \n> > ... Here calling qemu_savevm_state_cleanup() may be problematic if\n> > migration thread has not yet quitted.\n> > \n> > I'm thinking whether we should make migrate_fd_cancel() wait until the\n> > migration thread finishes (state change to CANCELLED).  Then the\n> > migration thread will do the cleanup, and here we can avoid calling\n> > qemu_savevm_state_cleanup() as well.\n> \n> But if the migration thread is stuck and CANCELLED is never reached, we'll hang\n> here?\n\nMaybe we can add timeout. Anyway, if it gets stuck, I do see it a\nmigration bug, and in that case I'm not sure force return after\ntimeout would be anything wiser.\n\nDave/Juan may have better idea though.","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx01.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx01.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtpgr11lTz9sPr\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 15 Sep 2017 18:38:00 +1000 (AEST)","from localhost ([::1]:51941 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dsm8M-0005fO-97\n\tfor incoming@patchwork.ozlabs.org; Fri, 15 Sep 2017 04:37:58 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:37207)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dsm7w-0005Yp-AH\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:37:33 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dsm7t-0000t5-8B\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:37:32 -0400","from mx1.redhat.com ([209.132.183.28]:58526)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>)\n\tid 1dsm7t-0000sg-1Q; Fri, 15 Sep 2017 04:37:29 -0400","from smtp.corp.redhat.com\n\t(int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id E7ADB81DF3;\n\tFri, 15 Sep 2017 08:37:27 +0000 (UTC)","from pxdev.xzpeter.org (ovpn-12-86.pek2.redhat.com [10.72.12.86])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 2B18D17A90;\n\tFri, 15 Sep 2017 08:37:23 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com E7ADB81DF3","Date":"Fri, 15 Sep 2017 16:37:21 +0800","From":"Peter Xu <peterx@redhat.com>","To":"Fam Zheng <famz@redhat.com>","Message-ID":"<20170915083721.GT3617@pxdev.xzpeter.org>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>\n\t<20170915080350.GS3617@pxdev.xzpeter.org>\n\t<20170915082026.GG17199@lemon>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170915082026.GG17199@lemon>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.13","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.25]);\n\tFri, 15 Sep 2017 08:37:28 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, qemu-stable@nongnu.org,\n\t\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>,\n\tJuan Quintela <quintela@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1769054,"web_url":"http://patchwork.ozlabs.org/comment/1769054/","msgid":"<20170915085247.GH17199@lemon>","list_archive_url":null,"date":"2017-09-15T08:52:47","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":24872,"url":"http://patchwork.ozlabs.org/api/people/24872/","name":"Fam Zheng","email":"famz@redhat.com"},"content":"On Fri, 09/15 09:42, Dr. David Alan Gilbert wrote:\n> * Fam Zheng (famz@redhat.com) wrote:\n> > On Fri, 09/15 16:03, Peter Xu wrote:\n> > > On Fri, Sep 15, 2017 at 01:44:03PM +0800, Fam Zheng wrote:\n> > > > bdrv_close_all() would abort() due to op blockers added by BMDS, clean\n> > > > up migration states when main loop quits to avoid that.\n> > > > \n> > > > Signed-off-by: Fam Zheng <famz@redhat.com>\n> > > > ---\n> > > >  include/migration/misc.h | 1 +\n> > > >  migration/migration.c    | 7 ++++++-\n> > > >  vl.c                     | 3 +++\n> > > >  3 files changed, 10 insertions(+), 1 deletion(-)\n> > > > \n> > > > diff --git a/include/migration/misc.h b/include/migration/misc.h\n> > > > index c079b7771b..b9a26b0898 100644\n> > > > --- a/include/migration/misc.h\n> > > > +++ b/include/migration/misc.h\n> > > > @@ -54,5 +54,6 @@ bool migration_has_failed(MigrationState *);\n> > > >  /* ...and after the device transmission */\n> > > >  bool migration_in_postcopy_after_devices(MigrationState *);\n> > > >  void migration_global_dump(Monitor *mon);\n> > > > +void migrate_cancel(void);\n> > > >  \n> > > >  #endif\n> > > > diff --git a/migration/migration.c b/migration/migration.c\n> > > > index 959e8ec88e..2c844945c7 100644\n> > > > --- a/migration/migration.c\n> > > > +++ b/migration/migration.c\n> > > > @@ -1274,11 +1274,16 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,\n> > > >      }\n> > > >  }\n> > > >  \n> > > > -void qmp_migrate_cancel(Error **errp)\n> > > > +void migrate_cancel(void)\n> > > >  {\n> > > >      migrate_fd_cancel(migrate_get_current());\n> > > >  }\n> > > >  \n> > > > +void qmp_migrate_cancel(Error **errp)\n> > > > +{\n> > > > +    migrate_cancel();\n> > > > +}\n> > > > +\n> > > \n> > > Nit: I would prefer just call migrate_fd_cancel() below, since I don't\n> > > see much point to introduce migrate_cancel() which only calls\n> > > migrate_fd_cancel()...\n> > \n> > migrate_get_current() is a migration internal IMHO. But that can be moved to\n> > migrate_fd_cancel() so the parameter is dropped.\n> > \n> > > \n> > > >  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n> > > >  {\n> > > >      MigrationState *s = migrate_get_current();\n> > > > diff --git a/vl.c b/vl.c\n> > > > index fb1f05b937..abbe61f40b 100644\n> > > > --- a/vl.c\n> > > > +++ b/vl.c\n> > > > @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n> > > >  #include \"sysemu/blockdev.h\"\n> > > >  #include \"hw/block/block.h\"\n> > > >  #include \"migration/misc.h\"\n> > > > +#include \"migration/savevm.h\"\n> > > >  #include \"migration/snapshot.h\"\n> > > >  #include \"migration/global_state.h\"\n> > > >  #include \"sysemu/tpm.h\"\n> > > > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n> > > >      iothread_stop_all();\n> > > >  \n> > > >      pause_all_vcpus();\n> > > > +    migrate_cancel();\n> > > \n> > > IIUC this is an async cancel, so when reach here the migration thread\n> > > can still be alive.  Then...\n> > > \n> > > > +    qemu_savevm_state_cleanup();\n> > > \n> > > ... Here calling qemu_savevm_state_cleanup() may be problematic if\n> > > migration thread has not yet quitted.\n> > > \n> > > I'm thinking whether we should make migrate_fd_cancel() wait until the\n> > > migration thread finishes (state change to CANCELLED).  Then the\n> > > migration thread will do the cleanup, and here we can avoid calling\n> > > qemu_savevm_state_cleanup() as well.\n> > \n> > But if the migration thread is stuck and CANCELLED is never reached, we'll hang\n> > here?\n> \n> I'm not sure I see an easy fix; I agree with Peter that calling\n> migrate_cancel() followed by qemu_savevm_state_cleanup() is racy,\n> because the cancel just forces the state to CANCELLING before coming\n> back to you, and the migration thread asynchronously starts to\n> fail/cleanup.\n> \n> migrate_cancel() can forcibly unblock some cases because it calls\n> shutdown(2) on the network fd, but there are other ways for a migration\n> to hang.\n> \n> Having said that,  the migration thread does it's calls\n> to qemu_savevm_state_cleanup under the lock_iothread;\n> Do we have that lock at this point?\n\nYes we do.  Main loop releases the lock only during poll(), other parts of\nmain() all have the lock.\n\nFam","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx02.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx02.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=famz@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtq4X052Mz9sPr\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 15 Sep 2017 18:55:54 +1000 (AEST)","from localhost ([::1]:52020 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dsmPg-0005bh-G7\n\tfor incoming@patchwork.ozlabs.org; Fri, 15 Sep 2017 04:55:52 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:42872)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <famz@redhat.com>) id 1dsmMp-0003o4-Mr\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:52:56 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <famz@redhat.com>) id 1dsmMm-0004li-LK\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:52:55 -0400","from mx1.redhat.com ([209.132.183.28]:42534)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <famz@redhat.com>)\n\tid 1dsmMm-0004la-Ci; Fri, 15 Sep 2017 04:52:52 -0400","from smtp.corp.redhat.com\n\t(int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 596F3883C4;\n\tFri, 15 Sep 2017 08:52:51 +0000 (UTC)","from localhost (ovpn-12-95.pek2.redhat.com [10.72.12.95])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id D629B78C1F;\n\tFri, 15 Sep 2017 08:52:48 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 596F3883C4","Date":"Fri, 15 Sep 2017 16:52:47 +0800","From":"Fam Zheng <famz@redhat.com>","To":"\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>","Message-ID":"<20170915085247.GH17199@lemon>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>\n\t<20170915080350.GS3617@pxdev.xzpeter.org>\n\t<20170915082026.GG17199@lemon> <20170915084234.GA2170@work-vm>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170915084234.GA2170@work-vm>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.12","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.26]);\n\tFri, 15 Sep 2017 08:52:51 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, qemu-stable@nongnu.org,\n\tPeter Xu <peterx@redhat.com>, Juan Quintela <quintela@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1769055,"web_url":"http://patchwork.ozlabs.org/comment/1769055/","msgid":"<20170915084234.GA2170@work-vm>","list_archive_url":null,"date":"2017-09-15T08:42:35","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":48102,"url":"http://patchwork.ozlabs.org/api/people/48102/","name":"Dr. David Alan Gilbert","email":"dgilbert@redhat.com"},"content":"* Fam Zheng (famz@redhat.com) wrote:\n> On Fri, 09/15 16:03, Peter Xu wrote:\n> > On Fri, Sep 15, 2017 at 01:44:03PM +0800, Fam Zheng wrote:\n> > > bdrv_close_all() would abort() due to op blockers added by BMDS, clean\n> > > up migration states when main loop quits to avoid that.\n> > > \n> > > Signed-off-by: Fam Zheng <famz@redhat.com>\n> > > ---\n> > >  include/migration/misc.h | 1 +\n> > >  migration/migration.c    | 7 ++++++-\n> > >  vl.c                     | 3 +++\n> > >  3 files changed, 10 insertions(+), 1 deletion(-)\n> > > \n> > > diff --git a/include/migration/misc.h b/include/migration/misc.h\n> > > index c079b7771b..b9a26b0898 100644\n> > > --- a/include/migration/misc.h\n> > > +++ b/include/migration/misc.h\n> > > @@ -54,5 +54,6 @@ bool migration_has_failed(MigrationState *);\n> > >  /* ...and after the device transmission */\n> > >  bool migration_in_postcopy_after_devices(MigrationState *);\n> > >  void migration_global_dump(Monitor *mon);\n> > > +void migrate_cancel(void);\n> > >  \n> > >  #endif\n> > > diff --git a/migration/migration.c b/migration/migration.c\n> > > index 959e8ec88e..2c844945c7 100644\n> > > --- a/migration/migration.c\n> > > +++ b/migration/migration.c\n> > > @@ -1274,11 +1274,16 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,\n> > >      }\n> > >  }\n> > >  \n> > > -void qmp_migrate_cancel(Error **errp)\n> > > +void migrate_cancel(void)\n> > >  {\n> > >      migrate_fd_cancel(migrate_get_current());\n> > >  }\n> > >  \n> > > +void qmp_migrate_cancel(Error **errp)\n> > > +{\n> > > +    migrate_cancel();\n> > > +}\n> > > +\n> > \n> > Nit: I would prefer just call migrate_fd_cancel() below, since I don't\n> > see much point to introduce migrate_cancel() which only calls\n> > migrate_fd_cancel()...\n> \n> migrate_get_current() is a migration internal IMHO. But that can be moved to\n> migrate_fd_cancel() so the parameter is dropped.\n> \n> > \n> > >  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n> > >  {\n> > >      MigrationState *s = migrate_get_current();\n> > > diff --git a/vl.c b/vl.c\n> > > index fb1f05b937..abbe61f40b 100644\n> > > --- a/vl.c\n> > > +++ b/vl.c\n> > > @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n> > >  #include \"sysemu/blockdev.h\"\n> > >  #include \"hw/block/block.h\"\n> > >  #include \"migration/misc.h\"\n> > > +#include \"migration/savevm.h\"\n> > >  #include \"migration/snapshot.h\"\n> > >  #include \"migration/global_state.h\"\n> > >  #include \"sysemu/tpm.h\"\n> > > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n> > >      iothread_stop_all();\n> > >  \n> > >      pause_all_vcpus();\n> > > +    migrate_cancel();\n> > \n> > IIUC this is an async cancel, so when reach here the migration thread\n> > can still be alive.  Then...\n> > \n> > > +    qemu_savevm_state_cleanup();\n> > \n> > ... Here calling qemu_savevm_state_cleanup() may be problematic if\n> > migration thread has not yet quitted.\n> > \n> > I'm thinking whether we should make migrate_fd_cancel() wait until the\n> > migration thread finishes (state change to CANCELLED).  Then the\n> > migration thread will do the cleanup, and here we can avoid calling\n> > qemu_savevm_state_cleanup() as well.\n> \n> But if the migration thread is stuck and CANCELLED is never reached, we'll hang\n> here?\n\nI'm not sure I see an easy fix; I agree with Peter that calling\nmigrate_cancel() followed by qemu_savevm_state_cleanup() is racy,\nbecause the cancel just forces the state to CANCELLING before coming\nback to you, and the migration thread asynchronously starts to\nfail/cleanup.\n\nmigrate_cancel() can forcibly unblock some cases because it calls\nshutdown(2) on the network fd, but there are other ways for a migration\nto hang.\n\nHaving said that,  the migration thread does it's calls\nto qemu_savevm_state_cleanup under the lock_iothread;\nDo we have that lock at this point?\n\nDave\n\n> Fam\n--\nDr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx05.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx05.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=dgilbert@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtq5Q4mVHz9sRm\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 15 Sep 2017 18:56:42 +1000 (AEST)","from localhost ([::1]:52024 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dsmQS-0006LX-Oe\n\tfor incoming@patchwork.ozlabs.org; Fri, 15 Sep 2017 04:56:40 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:39121)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <dgilbert@redhat.com>) id 1dsmD1-0001UA-6w\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:42:48 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <dgilbert@redhat.com>) id 1dsmCy-0003U6-53\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 04:42:47 -0400","from mx1.redhat.com ([209.132.183.28]:47846)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <dgilbert@redhat.com>)\n\tid 1dsmCx-0003Sd-TQ; Fri, 15 Sep 2017 04:42:44 -0400","from smtp.corp.redhat.com\n\t(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id D99AB13AB2;\n\tFri, 15 Sep 2017 08:42:42 +0000 (UTC)","from work-vm (ovpn-117-180.ams2.redhat.com [10.36.117.180])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id E73C160462;\n\tFri, 15 Sep 2017 08:42:37 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com D99AB13AB2","Date":"Fri, 15 Sep 2017 09:42:35 +0100","From":"\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>","To":"Fam Zheng <famz@redhat.com>","Message-ID":"<20170915084234.GA2170@work-vm>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>\n\t<20170915080350.GS3617@pxdev.xzpeter.org>\n\t<20170915082026.GG17199@lemon>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170915082026.GG17199@lemon>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.11","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.29]);\n\tFri, 15 Sep 2017 08:42:43 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, qemu-stable@nongnu.org,\n\tPeter Xu <peterx@redhat.com>, Juan Quintela <quintela@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1769070,"web_url":"http://patchwork.ozlabs.org/comment/1769070/","msgid":"<20170915092242.GB2170@work-vm>","list_archive_url":null,"date":"2017-09-15T09:22:43","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":48102,"url":"http://patchwork.ozlabs.org/api/people/48102/","name":"Dr. David Alan Gilbert","email":"dgilbert@redhat.com"},"content":"* Fam Zheng (famz@redhat.com) wrote:\n> On Fri, 09/15 09:42, Dr. David Alan Gilbert wrote:\n> > * Fam Zheng (famz@redhat.com) wrote:\n> > > On Fri, 09/15 16:03, Peter Xu wrote:\n> > > > On Fri, Sep 15, 2017 at 01:44:03PM +0800, Fam Zheng wrote:\n> > > > > bdrv_close_all() would abort() due to op blockers added by BMDS, clean\n> > > > > up migration states when main loop quits to avoid that.\n> > > > > \n> > > > > Signed-off-by: Fam Zheng <famz@redhat.com>\n> > > > > ---\n> > > > >  include/migration/misc.h | 1 +\n> > > > >  migration/migration.c    | 7 ++++++-\n> > > > >  vl.c                     | 3 +++\n> > > > >  3 files changed, 10 insertions(+), 1 deletion(-)\n> > > > > \n> > > > > diff --git a/include/migration/misc.h b/include/migration/misc.h\n> > > > > index c079b7771b..b9a26b0898 100644\n> > > > > --- a/include/migration/misc.h\n> > > > > +++ b/include/migration/misc.h\n> > > > > @@ -54,5 +54,6 @@ bool migration_has_failed(MigrationState *);\n> > > > >  /* ...and after the device transmission */\n> > > > >  bool migration_in_postcopy_after_devices(MigrationState *);\n> > > > >  void migration_global_dump(Monitor *mon);\n> > > > > +void migrate_cancel(void);\n> > > > >  \n> > > > >  #endif\n> > > > > diff --git a/migration/migration.c b/migration/migration.c\n> > > > > index 959e8ec88e..2c844945c7 100644\n> > > > > --- a/migration/migration.c\n> > > > > +++ b/migration/migration.c\n> > > > > @@ -1274,11 +1274,16 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,\n> > > > >      }\n> > > > >  }\n> > > > >  \n> > > > > -void qmp_migrate_cancel(Error **errp)\n> > > > > +void migrate_cancel(void)\n> > > > >  {\n> > > > >      migrate_fd_cancel(migrate_get_current());\n> > > > >  }\n> > > > >  \n> > > > > +void qmp_migrate_cancel(Error **errp)\n> > > > > +{\n> > > > > +    migrate_cancel();\n> > > > > +}\n> > > > > +\n> > > > \n> > > > Nit: I would prefer just call migrate_fd_cancel() below, since I don't\n> > > > see much point to introduce migrate_cancel() which only calls\n> > > > migrate_fd_cancel()...\n> > > \n> > > migrate_get_current() is a migration internal IMHO. But that can be moved to\n> > > migrate_fd_cancel() so the parameter is dropped.\n> > > \n> > > > \n> > > > >  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n> > > > >  {\n> > > > >      MigrationState *s = migrate_get_current();\n> > > > > diff --git a/vl.c b/vl.c\n> > > > > index fb1f05b937..abbe61f40b 100644\n> > > > > --- a/vl.c\n> > > > > +++ b/vl.c\n> > > > > @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n> > > > >  #include \"sysemu/blockdev.h\"\n> > > > >  #include \"hw/block/block.h\"\n> > > > >  #include \"migration/misc.h\"\n> > > > > +#include \"migration/savevm.h\"\n> > > > >  #include \"migration/snapshot.h\"\n> > > > >  #include \"migration/global_state.h\"\n> > > > >  #include \"sysemu/tpm.h\"\n> > > > > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n> > > > >      iothread_stop_all();\n> > > > >  \n> > > > >      pause_all_vcpus();\n> > > > > +    migrate_cancel();\n> > > > \n> > > > IIUC this is an async cancel, so when reach here the migration thread\n> > > > can still be alive.  Then...\n> > > > \n> > > > > +    qemu_savevm_state_cleanup();\n> > > > \n> > > > ... Here calling qemu_savevm_state_cleanup() may be problematic if\n> > > > migration thread has not yet quitted.\n> > > > \n> > > > I'm thinking whether we should make migrate_fd_cancel() wait until the\n> > > > migration thread finishes (state change to CANCELLED).  Then the\n> > > > migration thread will do the cleanup, and here we can avoid calling\n> > > > qemu_savevm_state_cleanup() as well.\n> > > \n> > > But if the migration thread is stuck and CANCELLED is never reached, we'll hang\n> > > here?\n> > \n> > I'm not sure I see an easy fix; I agree with Peter that calling\n> > migrate_cancel() followed by qemu_savevm_state_cleanup() is racy,\n> > because the cancel just forces the state to CANCELLING before coming\n> > back to you, and the migration thread asynchronously starts to\n> > fail/cleanup.\n> > \n> > migrate_cancel() can forcibly unblock some cases because it calls\n> > shutdown(2) on the network fd, but there are other ways for a migration\n> > to hang.\n> > \n> > Having said that,  the migration thread does it's calls\n> > to qemu_savevm_state_cleanup under the lock_iothread;\n> > Do we have that lock at this point?\n> \n> Yes we do.  Main loop releases the lock only during poll(), other parts of\n> main() all have the lock.\n\nHaving said that though this is pretty confusing; because at this point\nwe're after main_loop has exited.\nI don't think the migration thread will exit without having taken the\niothread lock; so if we've got it at this point then the migration\nthread will never exit, and it will never call qemu_savevm_state_cleanup\nitself - so that race might not exist?\n\nHowever, assuming the migration is at an earlier point, it might be\ncalling one of the state handlers that you're cleaning up, and that's\nracy in individual devices.\n\nIf we have the lock I don't think we can wait for the migration to\ncomplete/cancel.   The transition from cancelling->cancelled happens\nin migrate_fd_cleanup() - and that's run of a bh which I assume don't\nwork any more at this point.\n\nPerhaps the answer is to move this to qemu_system_shutdown_request prior\nto the point where shutdown_requested is set?  At that point we've\nstill got the main loop, although hmm I'm not convinced if it's\nconsistent whether that's called with or without the lock held.\n\nDave\n\n> Fam\n--\nDr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=dgilbert@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtqhQ3FHJz9ryk\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 15 Sep 2017 19:23:33 +1000 (AEST)","from localhost ([::1]:52199 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dsmqQ-0007cV-Sk\n\tfor incoming@patchwork.ozlabs.org; Fri, 15 Sep 2017 05:23:30 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:55237)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <dgilbert@redhat.com>) id 1dsmpr-0007b0-L3\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 05:22:57 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <dgilbert@redhat.com>) id 1dsmpn-0000Wg-L8\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 05:22:55 -0400","from mx1.redhat.com ([209.132.183.28]:43772)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <dgilbert@redhat.com>)\n\tid 1dsmpn-0000WC-DU; Fri, 15 Sep 2017 05:22:51 -0400","from smtp.corp.redhat.com\n\t(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 73343356D7;\n\tFri, 15 Sep 2017 09:22:50 +0000 (UTC)","from work-vm (ovpn-117-180.ams2.redhat.com [10.36.117.180])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 7563E60462;\n\tFri, 15 Sep 2017 09:22:45 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 73343356D7","Date":"Fri, 15 Sep 2017 10:22:43 +0100","From":"\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>","To":"Fam Zheng <famz@redhat.com>","Message-ID":"<20170915092242.GB2170@work-vm>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>\n\t<20170915080350.GS3617@pxdev.xzpeter.org>\n\t<20170915082026.GG17199@lemon> <20170915084234.GA2170@work-vm>\n\t<20170915085247.GH17199@lemon>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170915085247.GH17199@lemon>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.11","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.30]);\n\tFri, 15 Sep 2017 09:22:50 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, qemu-stable@nongnu.org,\n\tPeter Xu <peterx@redhat.com>, Juan Quintela <quintela@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1769320,"web_url":"http://patchwork.ozlabs.org/comment/1769320/","msgid":"<48f385c6-629e-12c8-d039-737cc74f1345@redhat.com>","list_archive_url":null,"date":"2017-09-15T17:29:06","subject":"Re: [Qemu-devel] [PATCH 3/3] iotests: Add \"quit during block\n\tmigration\" case 195","submitter":{"id":6591,"url":"http://patchwork.ozlabs.org/api/people/6591/","name":"Eric Blake","email":"eblake@redhat.com"},"content":"On 09/15/2017 12:44 AM, Fam Zheng wrote:\n> Signed-off-by: Fam Zheng <famz@redhat.com>\n> ---\n>  tests/qemu-iotests/195     | 97 ++++++++++++++++++++++++++++++++++++++++++++++\n>  tests/qemu-iotests/195.out | 19 +++++++++\n>  tests/qemu-iotests/group   |  1 +\n>  3 files changed, 117 insertions(+)\n>  create mode 100755 tests/qemu-iotests/195\n>  create mode 100644 tests/qemu-iotests/195.out\n\n> +_cleanup()\n> +{\n> +    rm -f \"${MIG_SOCKET}\"\n> +    rm -f \"${TEST_IMG}.dest\"\n> +    _cleanup_test_img\n> +    _cleanup_qemu\n> +}\n> +trap \"_cleanup; exit \\$status\" 0 1 2 3 15\n\nSemantic conflict with Jeff's patch series that adds './check -s' that\npreserves files in the per-test temp directory.","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx10.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx10.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=eblake@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xv2TV5lwPz9sNc\n\tfor <incoming@patchwork.ozlabs.org>;\n\tSat, 16 Sep 2017 03:29:49 +1000 (AEST)","from localhost ([::1]:54393 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dsuR0-0006bx-HP\n\tfor incoming@patchwork.ozlabs.org; Fri, 15 Sep 2017 13:29:46 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:57753)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <eblake@redhat.com>) id 1dsuQb-0006bT-20\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 13:29:22 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <eblake@redhat.com>) id 1dsuQW-0002zD-HG\n\tfor qemu-devel@nongnu.org; Fri, 15 Sep 2017 13:29:21 -0400","from mx1.redhat.com ([209.132.183.28]:56198)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <eblake@redhat.com>)\n\tid 1dsuQW-0002vR-75; Fri, 15 Sep 2017 13:29:16 -0400","from smtp.corp.redhat.com\n\t(int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 0B65F2BEF6;\n\tFri, 15 Sep 2017 17:29:14 +0000 (UTC)","from [10.10.124.97] (ovpn-124-97.rdu2.redhat.com [10.10.124.97])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id D248C413C;\n\tFri, 15 Sep 2017 17:29:07 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 0B65F2BEF6","To":"Fam Zheng <famz@redhat.com>, qemu-devel@nongnu.org","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-4-famz@redhat.com>","From":"Eric Blake <eblake@redhat.com>","Openpgp":"url=http://people.redhat.com/eblake/eblake.gpg","Organization":"Red Hat, Inc.","Message-ID":"<48f385c6-629e-12c8-d039-737cc74f1345@redhat.com>","Date":"Fri, 15 Sep 2017 12:29:06 -0500","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170915054404.19914-4-famz@redhat.com>","Content-Type":"multipart/signed; micalg=pgp-sha256;\n\tprotocol=\"application/pgp-signature\";\n\tboundary=\"JtUd6pP0Q48DVrQH8Cn3qTJWDhvowsrRS\"","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.14","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.39]);\n\tFri, 15 Sep 2017 17:29:14 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","X-Content-Filtered-By":"Mailman/MimeDel 2.1.21","Subject":"Re: [Qemu-devel] [PATCH 3/3] iotests: Add \"quit during block\n\tmigration\" case 195","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Jeff Cody <jcody@redhat.com>, Juan Quintela <quintela@redhat.com>,\n\tqemu-stable@nongnu.org, peterx@redhat.com,\n\t\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1769907,"web_url":"http://patchwork.ozlabs.org/comment/1769907/","msgid":"<20170918073159.GZ3617@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-18T07:31:59","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Fri, Sep 15, 2017 at 10:22:43AM +0100, Dr. David Alan Gilbert wrote:\n> * Fam Zheng (famz@redhat.com) wrote:\n> > On Fri, 09/15 09:42, Dr. David Alan Gilbert wrote:\n> > > * Fam Zheng (famz@redhat.com) wrote:\n> > > > On Fri, 09/15 16:03, Peter Xu wrote:\n> > > > > On Fri, Sep 15, 2017 at 01:44:03PM +0800, Fam Zheng wrote:\n> > > > > > bdrv_close_all() would abort() due to op blockers added by BMDS, clean\n> > > > > > up migration states when main loop quits to avoid that.\n> > > > > > \n> > > > > > Signed-off-by: Fam Zheng <famz@redhat.com>\n> > > > > > ---\n> > > > > >  include/migration/misc.h | 1 +\n> > > > > >  migration/migration.c    | 7 ++++++-\n> > > > > >  vl.c                     | 3 +++\n> > > > > >  3 files changed, 10 insertions(+), 1 deletion(-)\n> > > > > > \n> > > > > > diff --git a/include/migration/misc.h b/include/migration/misc.h\n> > > > > > index c079b7771b..b9a26b0898 100644\n> > > > > > --- a/include/migration/misc.h\n> > > > > > +++ b/include/migration/misc.h\n> > > > > > @@ -54,5 +54,6 @@ bool migration_has_failed(MigrationState *);\n> > > > > >  /* ...and after the device transmission */\n> > > > > >  bool migration_in_postcopy_after_devices(MigrationState *);\n> > > > > >  void migration_global_dump(Monitor *mon);\n> > > > > > +void migrate_cancel(void);\n> > > > > >  \n> > > > > >  #endif\n> > > > > > diff --git a/migration/migration.c b/migration/migration.c\n> > > > > > index 959e8ec88e..2c844945c7 100644\n> > > > > > --- a/migration/migration.c\n> > > > > > +++ b/migration/migration.c\n> > > > > > @@ -1274,11 +1274,16 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,\n> > > > > >      }\n> > > > > >  }\n> > > > > >  \n> > > > > > -void qmp_migrate_cancel(Error **errp)\n> > > > > > +void migrate_cancel(void)\n> > > > > >  {\n> > > > > >      migrate_fd_cancel(migrate_get_current());\n> > > > > >  }\n> > > > > >  \n> > > > > > +void qmp_migrate_cancel(Error **errp)\n> > > > > > +{\n> > > > > > +    migrate_cancel();\n> > > > > > +}\n> > > > > > +\n> > > > > \n> > > > > Nit: I would prefer just call migrate_fd_cancel() below, since I don't\n> > > > > see much point to introduce migrate_cancel() which only calls\n> > > > > migrate_fd_cancel()...\n> > > > \n> > > > migrate_get_current() is a migration internal IMHO. But that can be moved to\n> > > > migrate_fd_cancel() so the parameter is dropped.\n> > > > \n> > > > > \n> > > > > >  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n> > > > > >  {\n> > > > > >      MigrationState *s = migrate_get_current();\n> > > > > > diff --git a/vl.c b/vl.c\n> > > > > > index fb1f05b937..abbe61f40b 100644\n> > > > > > --- a/vl.c\n> > > > > > +++ b/vl.c\n> > > > > > @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n> > > > > >  #include \"sysemu/blockdev.h\"\n> > > > > >  #include \"hw/block/block.h\"\n> > > > > >  #include \"migration/misc.h\"\n> > > > > > +#include \"migration/savevm.h\"\n> > > > > >  #include \"migration/snapshot.h\"\n> > > > > >  #include \"migration/global_state.h\"\n> > > > > >  #include \"sysemu/tpm.h\"\n> > > > > > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n> > > > > >      iothread_stop_all();\n> > > > > >  \n> > > > > >      pause_all_vcpus();\n> > > > > > +    migrate_cancel();\n> > > > > \n> > > > > IIUC this is an async cancel, so when reach here the migration thread\n> > > > > can still be alive.  Then...\n> > > > > \n> > > > > > +    qemu_savevm_state_cleanup();\n> > > > > \n> > > > > ... Here calling qemu_savevm_state_cleanup() may be problematic if\n> > > > > migration thread has not yet quitted.\n> > > > > \n> > > > > I'm thinking whether we should make migrate_fd_cancel() wait until the\n> > > > > migration thread finishes (state change to CANCELLED).  Then the\n> > > > > migration thread will do the cleanup, and here we can avoid calling\n> > > > > qemu_savevm_state_cleanup() as well.\n> > > > \n> > > > But if the migration thread is stuck and CANCELLED is never reached, we'll hang\n> > > > here?\n> > > \n> > > I'm not sure I see an easy fix; I agree with Peter that calling\n> > > migrate_cancel() followed by qemu_savevm_state_cleanup() is racy,\n> > > because the cancel just forces the state to CANCELLING before coming\n> > > back to you, and the migration thread asynchronously starts to\n> > > fail/cleanup.\n> > > \n> > > migrate_cancel() can forcibly unblock some cases because it calls\n> > > shutdown(2) on the network fd, but there are other ways for a migration\n> > > to hang.\n> > > \n> > > Having said that,  the migration thread does it's calls\n> > > to qemu_savevm_state_cleanup under the lock_iothread;\n> > > Do we have that lock at this point?\n> > \n> > Yes we do.  Main loop releases the lock only during poll(), other parts of\n> > main() all have the lock.\n> \n> Having said that though this is pretty confusing; because at this point\n> we're after main_loop has exited.\n> I don't think the migration thread will exit without having taken the\n> iothread lock; so if we've got it at this point then the migration\n> thread will never exit, and it will never call qemu_savevm_state_cleanup\n> itself - so that race might not exist?\n\nIs that because we are taking the BQL in migration_thread()?  Please\nsee my below questions...\n\n> \n> However, assuming the migration is at an earlier point, it might be\n> calling one of the state handlers that you're cleaning up, and that's\n> racy in individual devices.\n> \n> If we have the lock I don't think we can wait for the migration to\n> complete/cancel.   The transition from cancelling->cancelled happens\n> in migrate_fd_cleanup() - and that's run of a bh which I assume don't\n> work any more at this point.\n> \n> Perhaps the answer is to move this to qemu_system_shutdown_request prior\n> to the point where shutdown_requested is set?  At that point we've\n> still got the main loop, although hmm I'm not convinced if it's\n> consistent whether that's called with or without the lock held.\n\nI do think the migration cleanup part needs some cleanup itself... :)\n\nActually I have two questions here about BQL and migration:\n\n(1) I see that we took BQL in migration_thread(), but if the migration\n    thread is to be cancelled, could we avoid taking the lock for the\n    cancelling case right after we break from the big migration loop?\n    Since I don't see why we need it if we are after all going to\n    cancel the migration...  (though this one may need some other\n    cleanups to let it work I guess)\n\n(2) I see that we took BQL in migrate_fd_cleanup().  Could I ask why\n    we had that?  Can we remove it?","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx05.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx05.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xwd5J0S24z9ryQ\n\tfor <incoming@patchwork.ozlabs.org>;\n\tMon, 18 Sep 2017 17:32:51 +1000 (AEST)","from localhost ([::1]:35039 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dtqXw-0003V1-57\n\tfor incoming@patchwork.ozlabs.org; Mon, 18 Sep 2017 03:32:48 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:55491)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dtqXN-0003Ts-Np\n\tfor qemu-devel@nongnu.org; Mon, 18 Sep 2017 03:32:15 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dtqXJ-0004nu-E8\n\tfor qemu-devel@nongnu.org; Mon, 18 Sep 2017 03:32:13 -0400","from mx1.redhat.com ([209.132.183.28]:40338)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>)\n\tid 1dtqXJ-0004ni-52; Mon, 18 Sep 2017 03:32:09 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 53750FED4;\n\tMon, 18 Sep 2017 07:32:07 +0000 (UTC)","from pxdev.xzpeter.org (dhcp-15-224.nay.redhat.com [10.66.15.224])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 6B6515C541;\n\tMon, 18 Sep 2017 07:32:02 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 53750FED4","Date":"Mon, 18 Sep 2017 15:31:59 +0800","From":"Peter Xu <peterx@redhat.com>","To":"\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>","Message-ID":"<20170918073159.GZ3617@pxdev.xzpeter.org>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>\n\t<20170915080350.GS3617@pxdev.xzpeter.org>\n\t<20170915082026.GG17199@lemon> <20170915084234.GA2170@work-vm>\n\t<20170915085247.GH17199@lemon> <20170915092242.GB2170@work-vm>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170915092242.GB2170@work-vm>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.29]);\n\tMon, 18 Sep 2017 07:32:08 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, Fam Zheng <famz@redhat.com>,\n\tqemu-stable@nongnu.org, Juan Quintela <quintela@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1770042,"web_url":"http://patchwork.ozlabs.org/comment/1770042/","msgid":"<20170918100014.GB2581@work-vm>","list_archive_url":null,"date":"2017-09-18T10:00:15","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":48102,"url":"http://patchwork.ozlabs.org/api/people/48102/","name":"Dr. David Alan Gilbert","email":"dgilbert@redhat.com"},"content":"* Peter Xu (peterx@redhat.com) wrote:\n> On Fri, Sep 15, 2017 at 10:22:43AM +0100, Dr. David Alan Gilbert wrote:\n> > * Fam Zheng (famz@redhat.com) wrote:\n> > > On Fri, 09/15 09:42, Dr. David Alan Gilbert wrote:\n> > > > * Fam Zheng (famz@redhat.com) wrote:\n> > > > > On Fri, 09/15 16:03, Peter Xu wrote:\n> > > > > > On Fri, Sep 15, 2017 at 01:44:03PM +0800, Fam Zheng wrote:\n> > > > > > > bdrv_close_all() would abort() due to op blockers added by BMDS, clean\n> > > > > > > up migration states when main loop quits to avoid that.\n> > > > > > > \n> > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com>\n> > > > > > > ---\n> > > > > > >  include/migration/misc.h | 1 +\n> > > > > > >  migration/migration.c    | 7 ++++++-\n> > > > > > >  vl.c                     | 3 +++\n> > > > > > >  3 files changed, 10 insertions(+), 1 deletion(-)\n> > > > > > > \n> > > > > > > diff --git a/include/migration/misc.h b/include/migration/misc.h\n> > > > > > > index c079b7771b..b9a26b0898 100644\n> > > > > > > --- a/include/migration/misc.h\n> > > > > > > +++ b/include/migration/misc.h\n> > > > > > > @@ -54,5 +54,6 @@ bool migration_has_failed(MigrationState *);\n> > > > > > >  /* ...and after the device transmission */\n> > > > > > >  bool migration_in_postcopy_after_devices(MigrationState *);\n> > > > > > >  void migration_global_dump(Monitor *mon);\n> > > > > > > +void migrate_cancel(void);\n> > > > > > >  \n> > > > > > >  #endif\n> > > > > > > diff --git a/migration/migration.c b/migration/migration.c\n> > > > > > > index 959e8ec88e..2c844945c7 100644\n> > > > > > > --- a/migration/migration.c\n> > > > > > > +++ b/migration/migration.c\n> > > > > > > @@ -1274,11 +1274,16 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,\n> > > > > > >      }\n> > > > > > >  }\n> > > > > > >  \n> > > > > > > -void qmp_migrate_cancel(Error **errp)\n> > > > > > > +void migrate_cancel(void)\n> > > > > > >  {\n> > > > > > >      migrate_fd_cancel(migrate_get_current());\n> > > > > > >  }\n> > > > > > >  \n> > > > > > > +void qmp_migrate_cancel(Error **errp)\n> > > > > > > +{\n> > > > > > > +    migrate_cancel();\n> > > > > > > +}\n> > > > > > > +\n> > > > > > \n> > > > > > Nit: I would prefer just call migrate_fd_cancel() below, since I don't\n> > > > > > see much point to introduce migrate_cancel() which only calls\n> > > > > > migrate_fd_cancel()...\n> > > > > \n> > > > > migrate_get_current() is a migration internal IMHO. But that can be moved to\n> > > > > migrate_fd_cancel() so the parameter is dropped.\n> > > > > \n> > > > > > \n> > > > > > >  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n> > > > > > >  {\n> > > > > > >      MigrationState *s = migrate_get_current();\n> > > > > > > diff --git a/vl.c b/vl.c\n> > > > > > > index fb1f05b937..abbe61f40b 100644\n> > > > > > > --- a/vl.c\n> > > > > > > +++ b/vl.c\n> > > > > > > @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n> > > > > > >  #include \"sysemu/blockdev.h\"\n> > > > > > >  #include \"hw/block/block.h\"\n> > > > > > >  #include \"migration/misc.h\"\n> > > > > > > +#include \"migration/savevm.h\"\n> > > > > > >  #include \"migration/snapshot.h\"\n> > > > > > >  #include \"migration/global_state.h\"\n> > > > > > >  #include \"sysemu/tpm.h\"\n> > > > > > > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n> > > > > > >      iothread_stop_all();\n> > > > > > >  \n> > > > > > >      pause_all_vcpus();\n> > > > > > > +    migrate_cancel();\n> > > > > > \n> > > > > > IIUC this is an async cancel, so when reach here the migration thread\n> > > > > > can still be alive.  Then...\n> > > > > > \n> > > > > > > +    qemu_savevm_state_cleanup();\n> > > > > > \n> > > > > > ... Here calling qemu_savevm_state_cleanup() may be problematic if\n> > > > > > migration thread has not yet quitted.\n> > > > > > \n> > > > > > I'm thinking whether we should make migrate_fd_cancel() wait until the\n> > > > > > migration thread finishes (state change to CANCELLED).  Then the\n> > > > > > migration thread will do the cleanup, and here we can avoid calling\n> > > > > > qemu_savevm_state_cleanup() as well.\n> > > > > \n> > > > > But if the migration thread is stuck and CANCELLED is never reached, we'll hang\n> > > > > here?\n> > > > \n> > > > I'm not sure I see an easy fix; I agree with Peter that calling\n> > > > migrate_cancel() followed by qemu_savevm_state_cleanup() is racy,\n> > > > because the cancel just forces the state to CANCELLING before coming\n> > > > back to you, and the migration thread asynchronously starts to\n> > > > fail/cleanup.\n> > > > \n> > > > migrate_cancel() can forcibly unblock some cases because it calls\n> > > > shutdown(2) on the network fd, but there are other ways for a migration\n> > > > to hang.\n> > > > \n> > > > Having said that,  the migration thread does it's calls\n> > > > to qemu_savevm_state_cleanup under the lock_iothread;\n> > > > Do we have that lock at this point?\n> > > \n> > > Yes we do.  Main loop releases the lock only during poll(), other parts of\n> > > main() all have the lock.\n> > \n> > Having said that though this is pretty confusing; because at this point\n> > we're after main_loop has exited.\n> > I don't think the migration thread will exit without having taken the\n> > iothread lock; so if we've got it at this point then the migration\n> > thread will never exit, and it will never call qemu_savevm_state_cleanup\n> > itself - so that race might not exist?\n> \n> Is that because we are taking the BQL in migration_thread()?  Please\n> see my below questions...\n\nYes; there are a few places we take it.\n\n> > However, assuming the migration is at an earlier point, it might be\n> > calling one of the state handlers that you're cleaning up, and that's\n> > racy in individual devices.\n> > \n> > If we have the lock I don't think we can wait for the migration to\n> > complete/cancel.   The transition from cancelling->cancelled happens\n> > in migrate_fd_cleanup() - and that's run of a bh which I assume don't\n> > work any more at this point.\n> > \n> > Perhaps the answer is to move this to qemu_system_shutdown_request prior\n> > to the point where shutdown_requested is set?  At that point we've\n> > still got the main loop, although hmm I'm not convinced if it's\n> > consistent whether that's called with or without the lock held.\n> \n> I do think the migration cleanup part needs some cleanup itself... :)\n> \n> Actually I have two questions here about BQL and migration:\n> \n> (1) I see that we took BQL in migration_thread(), but if the migration\n>     thread is to be cancelled, could we avoid taking the lock for the\n>     cancelling case right after we break from the big migration loop?\n>     Since I don't see why we need it if we are after all going to\n>     cancel the migration...  (though this one may need some other\n>     cleanups to let it work I guess)\n\nIf you mean the lock just before 'The resource has been allocated...'\nno I don't think so; I'm not sure - I worry about what would happen if\nsomeone issued a migrate_cancel or another migrate command during that\ntime - but then that could happen just before the lock_iothread so I'm\nnot sure we're any worse off.\n\n\n> (2) I see that we took BQL in migrate_fd_cleanup().  Could I ask why\n>     we had that?  Can we remove it?\n\nNote migrate_fd_cleanup is called in a bh, so I think that means it's\nentered with the lock held and that just drops it while we wait\nfor the other thread.\n\nDave\n\n> -- \n> Peter Xu\n--\nDr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=dgilbert@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xwhNH3FY4z9s7g\n\tfor <incoming@patchwork.ozlabs.org>;\n\tMon, 18 Sep 2017 20:01:03 +1000 (AEST)","from localhost ([::1]:35523 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dtsrN-0006DV-Am\n\tfor incoming@patchwork.ozlabs.org; Mon, 18 Sep 2017 06:01:01 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:36942)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <dgilbert@redhat.com>) id 1dtsqv-0006D4-Fb\n\tfor qemu-devel@nongnu.org; Mon, 18 Sep 2017 06:00:39 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <dgilbert@redhat.com>) id 1dtsqp-0003vx-4m\n\tfor qemu-devel@nongnu.org; Mon, 18 Sep 2017 06:00:33 -0400","from mx1.redhat.com ([209.132.183.28]:48312)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <dgilbert@redhat.com>)\n\tid 1dtsqo-0003vU-Sf; Mon, 18 Sep 2017 06:00:27 -0400","from smtp.corp.redhat.com\n\t(int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id CEA7AC047B97;\n\tMon, 18 Sep 2017 10:00:25 +0000 (UTC)","from work-vm (ovpn-117-229.ams2.redhat.com [10.36.117.229])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 95E7F6060C;\n\tMon, 18 Sep 2017 10:00:17 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com CEA7AC047B97","Date":"Mon, 18 Sep 2017 11:00:15 +0100","From":"\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>","To":"Peter Xu <peterx@redhat.com>","Message-ID":"<20170918100014.GB2581@work-vm>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>\n\t<20170915080350.GS3617@pxdev.xzpeter.org>\n\t<20170915082026.GG17199@lemon> <20170915084234.GA2170@work-vm>\n\t<20170915085247.GH17199@lemon> <20170915092242.GB2170@work-vm>\n\t<20170918073159.GZ3617@pxdev.xzpeter.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170918073159.GZ3617@pxdev.xzpeter.org>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.13","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.31]);\n\tMon, 18 Sep 2017 10:00:25 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, Fam Zheng <famz@redhat.com>,\n\tqemu-stable@nongnu.org, Juan Quintela <quintela@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1770747,"web_url":"http://patchwork.ozlabs.org/comment/1770747/","msgid":"<20170919082656.GJ3617@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-19T08:26:56","subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Mon, Sep 18, 2017 at 11:00:15AM +0100, Dr. David Alan Gilbert wrote:\n> * Peter Xu (peterx@redhat.com) wrote:\n> > On Fri, Sep 15, 2017 at 10:22:43AM +0100, Dr. David Alan Gilbert wrote:\n> > > * Fam Zheng (famz@redhat.com) wrote:\n> > > > On Fri, 09/15 09:42, Dr. David Alan Gilbert wrote:\n> > > > > * Fam Zheng (famz@redhat.com) wrote:\n> > > > > > On Fri, 09/15 16:03, Peter Xu wrote:\n> > > > > > > On Fri, Sep 15, 2017 at 01:44:03PM +0800, Fam Zheng wrote:\n> > > > > > > > bdrv_close_all() would abort() due to op blockers added by BMDS, clean\n> > > > > > > > up migration states when main loop quits to avoid that.\n> > > > > > > > \n> > > > > > > > Signed-off-by: Fam Zheng <famz@redhat.com>\n> > > > > > > > ---\n> > > > > > > >  include/migration/misc.h | 1 +\n> > > > > > > >  migration/migration.c    | 7 ++++++-\n> > > > > > > >  vl.c                     | 3 +++\n> > > > > > > >  3 files changed, 10 insertions(+), 1 deletion(-)\n> > > > > > > > \n> > > > > > > > diff --git a/include/migration/misc.h b/include/migration/misc.h\n> > > > > > > > index c079b7771b..b9a26b0898 100644\n> > > > > > > > --- a/include/migration/misc.h\n> > > > > > > > +++ b/include/migration/misc.h\n> > > > > > > > @@ -54,5 +54,6 @@ bool migration_has_failed(MigrationState *);\n> > > > > > > >  /* ...and after the device transmission */\n> > > > > > > >  bool migration_in_postcopy_after_devices(MigrationState *);\n> > > > > > > >  void migration_global_dump(Monitor *mon);\n> > > > > > > > +void migrate_cancel(void);\n> > > > > > > >  \n> > > > > > > >  #endif\n> > > > > > > > diff --git a/migration/migration.c b/migration/migration.c\n> > > > > > > > index 959e8ec88e..2c844945c7 100644\n> > > > > > > > --- a/migration/migration.c\n> > > > > > > > +++ b/migration/migration.c\n> > > > > > > > @@ -1274,11 +1274,16 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,\n> > > > > > > >      }\n> > > > > > > >  }\n> > > > > > > >  \n> > > > > > > > -void qmp_migrate_cancel(Error **errp)\n> > > > > > > > +void migrate_cancel(void)\n> > > > > > > >  {\n> > > > > > > >      migrate_fd_cancel(migrate_get_current());\n> > > > > > > >  }\n> > > > > > > >  \n> > > > > > > > +void qmp_migrate_cancel(Error **errp)\n> > > > > > > > +{\n> > > > > > > > +    migrate_cancel();\n> > > > > > > > +}\n> > > > > > > > +\n> > > > > > > \n> > > > > > > Nit: I would prefer just call migrate_fd_cancel() below, since I don't\n> > > > > > > see much point to introduce migrate_cancel() which only calls\n> > > > > > > migrate_fd_cancel()...\n> > > > > > \n> > > > > > migrate_get_current() is a migration internal IMHO. But that can be moved to\n> > > > > > migrate_fd_cancel() so the parameter is dropped.\n> > > > > > \n> > > > > > > \n> > > > > > > >  void qmp_migrate_set_cache_size(int64_t value, Error **errp)\n> > > > > > > >  {\n> > > > > > > >      MigrationState *s = migrate_get_current();\n> > > > > > > > diff --git a/vl.c b/vl.c\n> > > > > > > > index fb1f05b937..abbe61f40b 100644\n> > > > > > > > --- a/vl.c\n> > > > > > > > +++ b/vl.c\n> > > > > > > > @@ -87,6 +87,7 @@ int main(int argc, char **argv)\n> > > > > > > >  #include \"sysemu/blockdev.h\"\n> > > > > > > >  #include \"hw/block/block.h\"\n> > > > > > > >  #include \"migration/misc.h\"\n> > > > > > > > +#include \"migration/savevm.h\"\n> > > > > > > >  #include \"migration/snapshot.h\"\n> > > > > > > >  #include \"migration/global_state.h\"\n> > > > > > > >  #include \"sysemu/tpm.h\"\n> > > > > > > > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)\n> > > > > > > >      iothread_stop_all();\n> > > > > > > >  \n> > > > > > > >      pause_all_vcpus();\n> > > > > > > > +    migrate_cancel();\n> > > > > > > \n> > > > > > > IIUC this is an async cancel, so when reach here the migration thread\n> > > > > > > can still be alive.  Then...\n> > > > > > > \n> > > > > > > > +    qemu_savevm_state_cleanup();\n> > > > > > > \n> > > > > > > ... Here calling qemu_savevm_state_cleanup() may be problematic if\n> > > > > > > migration thread has not yet quitted.\n> > > > > > > \n> > > > > > > I'm thinking whether we should make migrate_fd_cancel() wait until the\n> > > > > > > migration thread finishes (state change to CANCELLED).  Then the\n> > > > > > > migration thread will do the cleanup, and here we can avoid calling\n> > > > > > > qemu_savevm_state_cleanup() as well.\n> > > > > > \n> > > > > > But if the migration thread is stuck and CANCELLED is never reached, we'll hang\n> > > > > > here?\n> > > > > \n> > > > > I'm not sure I see an easy fix; I agree with Peter that calling\n> > > > > migrate_cancel() followed by qemu_savevm_state_cleanup() is racy,\n> > > > > because the cancel just forces the state to CANCELLING before coming\n> > > > > back to you, and the migration thread asynchronously starts to\n> > > > > fail/cleanup.\n> > > > > \n> > > > > migrate_cancel() can forcibly unblock some cases because it calls\n> > > > > shutdown(2) on the network fd, but there are other ways for a migration\n> > > > > to hang.\n> > > > > \n> > > > > Having said that,  the migration thread does it's calls\n> > > > > to qemu_savevm_state_cleanup under the lock_iothread;\n> > > > > Do we have that lock at this point?\n> > > > \n> > > > Yes we do.  Main loop releases the lock only during poll(), other parts of\n> > > > main() all have the lock.\n> > > \n> > > Having said that though this is pretty confusing; because at this point\n> > > we're after main_loop has exited.\n> > > I don't think the migration thread will exit without having taken the\n> > > iothread lock; so if we've got it at this point then the migration\n> > > thread will never exit, and it will never call qemu_savevm_state_cleanup\n> > > itself - so that race might not exist?\n> > \n> > Is that because we are taking the BQL in migration_thread()?  Please\n> > see my below questions...\n> \n> Yes; there are a few places we take it.\n> \n> > > However, assuming the migration is at an earlier point, it might be\n> > > calling one of the state handlers that you're cleaning up, and that's\n> > > racy in individual devices.\n> > > \n> > > If we have the lock I don't think we can wait for the migration to\n> > > complete/cancel.   The transition from cancelling->cancelled happens\n> > > in migrate_fd_cleanup() - and that's run of a bh which I assume don't\n> > > work any more at this point.\n> > > \n> > > Perhaps the answer is to move this to qemu_system_shutdown_request prior\n> > > to the point where shutdown_requested is set?  At that point we've\n> > > still got the main loop, although hmm I'm not convinced if it's\n> > > consistent whether that's called with or without the lock held.\n> > \n> > I do think the migration cleanup part needs some cleanup itself... :)\n> > \n> > Actually I have two questions here about BQL and migration:\n> > \n> > (1) I see that we took BQL in migration_thread(), but if the migration\n> >     thread is to be cancelled, could we avoid taking the lock for the\n> >     cancelling case right after we break from the big migration loop?\n> >     Since I don't see why we need it if we are after all going to\n> >     cancel the migration...  (though this one may need some other\n> >     cleanups to let it work I guess)\n> \n> If you mean the lock just before 'The resource has been allocated...'\n> no I don't think so; I'm not sure - I worry about what would happen if\n> someone issued a migrate_cancel or another migrate command during that\n> time - but then that could happen just before the lock_iothread so I'm\n> not sure we're any worse off.\n\nFor these cases, I think we should first check the migration status,\nthen we proceed.  I think that's what we have done in qmp_migrate(),\nhowever something we missed in qmp_migrate_cancel() (so I think we\nshould add it soon).\n\nBesides these cases, IMHO the BQL should be used for either vm_start()\nor runstate_set() below.  However again, I'm thinking whether we can\navoid taking the lock for \"cancelling\" case.  If my understanding is\ncorrect above, I would like to give it a shot.\n\n> \n> \n> > (2) I see that we took BQL in migrate_fd_cleanup().  Could I ask why\n> >     we had that?  Can we remove it?\n> \n> Note migrate_fd_cleanup is called in a bh, so I think that means it's\n> entered with the lock held and that just drops it while we wait\n> for the other thread.\n\nAh, sorry I obviously misread the code on the ordering of\nlock/unlock...  Thanks,","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxGFx63fXz9s7M\n\tfor <incoming@patchwork.ozlabs.org>;\n\tTue, 19 Sep 2017 18:27:33 +1000 (AEST)","from localhost ([::1]:40758 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1duDsR-0004FP-Vw\n\tfor incoming@patchwork.ozlabs.org; Tue, 19 Sep 2017 04:27:32 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:34726)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1duDs2-0004DR-I6\n\tfor qemu-devel@nongnu.org; Tue, 19 Sep 2017 04:27:08 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1duDrz-0004MK-CL\n\tfor qemu-devel@nongnu.org; Tue, 19 Sep 2017 04:27:06 -0400","from mx1.redhat.com ([209.132.183.28]:39673)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>)\n\tid 1duDrz-0004Ky-3A; Tue, 19 Sep 2017 04:27:03 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 28A412DDB9;\n\tTue, 19 Sep 2017 08:27:02 +0000 (UTC)","from pxdev.xzpeter.org (ovpn-12-66.pek2.redhat.com [10.72.12.66])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 2D6395C541;\n\tTue, 19 Sep 2017 08:26:57 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 28A412DDB9","Date":"Tue, 19 Sep 2017 16:26:56 +0800","From":"Peter Xu <peterx@redhat.com>","To":"\"Dr. David Alan Gilbert\" <dgilbert@redhat.com>","Message-ID":"<20170919082656.GJ3617@pxdev.xzpeter.org>","References":"<20170915054404.19914-1-famz@redhat.com>\n\t<20170915054404.19914-3-famz@redhat.com>\n\t<20170915080350.GS3617@pxdev.xzpeter.org>\n\t<20170915082026.GG17199@lemon> <20170915084234.GA2170@work-vm>\n\t<20170915085247.GH17199@lemon> <20170915092242.GB2170@work-vm>\n\t<20170918073159.GZ3617@pxdev.xzpeter.org>\n\t<20170918100014.GB2581@work-vm>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170918100014.GB2581@work-vm>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.38]);\n\tTue, 19 Sep 2017 08:27:02 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [PATCH 2/3] migration: Cancel migration at exit","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"qemu-devel@nongnu.org, Fam Zheng <famz@redhat.com>,\n\tqemu-stable@nongnu.org, Juan Quintela <quintela@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}}]