[{"id":1771340,"web_url":"http://patchwork.ozlabs.org/comment/1771340/","msgid":"<810ee791-fecd-c0b7-fefd-66a78c2de925@redhat.com>","list_archive_url":null,"date":"2017-09-19T19:59:37","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":6591,"url":"http://patchwork.ozlabs.org/api/people/6591/","name":"Eric Blake","email":"eblake@redhat.com"},"content":"On 09/14/2017 02:50 AM, Peter Xu wrote:\n> This is not a problem if we are only having one single loop thread like\n> before.  However, after per-monitor thread is introduced, this is not\n> true any more, and the race can happen.\n> \n> The race can be triggered with \"make check -j8\" sometimes:\n> \n>   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n>   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> \n> This patch keeps the reference for the watch object when creating in\n> io_add_watch_poll(), so that the object will never be released in the\n> context main loop, especially when the context loop is running in\n> another standalone thread.  Meanwhile, when we want to remove the watch\n> object, we always first detach the watch object from its owner context,\n> then we continue with the cleanup.\n> \n> Without this patch, calling io_remove_watch_poll() in main loop thread\n> is not thread-safe, since the other per-monitor thread may be modifying\n> the watch object at the same time.\n> \n> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>\n> Signed-off-by: Peter Xu <peterx@redhat.com>\n> ---\n\n> +     * Let's blame the glib bug mentioned in commit 2b3167 (again) for\n\nThat 6-char commit id may become ambiguous soon (it's still rare to see\nambiguity with an 8-char id, although I've seen it more in recent times\nthan in the past; and git itself has moved from a 7-char default\nabbreviation length in the 2.0 days to what is now a 10-char default\nabbreviation in 2.13.5).","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx04.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx04.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=eblake@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxYdH0Dg8z9s81\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 20 Sep 2017 06:00:18 +1000 (AEST)","from localhost ([::1]:45131 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1duOgp-0006Ue-J6\n\tfor incoming@patchwork.ozlabs.org; Tue, 19 Sep 2017 16:00:15 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:59212)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <eblake@redhat.com>) id 1duOgU-0006TD-9m\n\tfor qemu-devel@nongnu.org; Tue, 19 Sep 2017 15:59:55 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <eblake@redhat.com>) id 1duOgR-0006sW-6N\n\tfor qemu-devel@nongnu.org; Tue, 19 Sep 2017 15:59:54 -0400","from mx1.redhat.com ([209.132.183.28]:58074)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <eblake@redhat.com>) id 1duOgQ-0006r1-TE\n\tfor qemu-devel@nongnu.org; Tue, 19 Sep 2017 15:59:51 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 865387EA9A;\n\tTue, 19 Sep 2017 19:59:49 +0000 (UTC)","from [10.10.124.97] (ovpn-124-97.rdu2.redhat.com [10.10.124.97])\n\tby smtp.corp.redhat.com (Postfix) with ESMTP id 5E5095C88F;\n\tTue, 19 Sep 2017 19:59:39 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 865387EA9A","To":"Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>","From":"Eric Blake <eblake@redhat.com>","Openpgp":"url=http://people.redhat.com/eblake/eblake.gpg","Organization":"Red Hat, Inc.","Message-ID":"<810ee791-fecd-c0b7-fefd-66a78c2de925@redhat.com>","Date":"Tue, 19 Sep 2017 14:59:37 -0500","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<1505375436-28439-2-git-send-email-peterx@redhat.com>","Content-Type":"multipart/signed; micalg=pgp-sha256;\n\tprotocol=\"application/pgp-signature\";\n\tboundary=\"jfmbqRSkXpscpgX0pmvqWoWQfs9BeQrRG\"","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.28]);\n\tTue, 19 Sep 2017 19:59:49 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","X-Content-Filtered-By":"Mailman/MimeDel 2.1.21","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, \tStefan Hajnoczi <shajnocz@redhat.com>,\n\t=?utf-8?q?Marc-Andr=C3=A9_Lure?= =?utf-8?q?au?=\n\t<marcandre.lureau@gmail.com>, \tPaolo Bonzini <pbonzini@redhat.com>,\n\t\"Dr . David Alan Gilbert\" <dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771555,"web_url":"http://patchwork.ozlabs.org/comment/1771555/","msgid":"<20170920044412.GS3617@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-20T04:44:12","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Tue, Sep 19, 2017 at 02:59:37PM -0500, Eric Blake wrote:\n> On 09/14/2017 02:50 AM, Peter Xu wrote:\n> > This is not a problem if we are only having one single loop thread like\n> > before.  However, after per-monitor thread is introduced, this is not\n> > true any more, and the race can happen.\n> > \n> > The race can be triggered with \"make check -j8\" sometimes:\n> > \n> >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n> >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> > \n> > This patch keeps the reference for the watch object when creating in\n> > io_add_watch_poll(), so that the object will never be released in the\n> > context main loop, especially when the context loop is running in\n> > another standalone thread.  Meanwhile, when we want to remove the watch\n> > object, we always first detach the watch object from its owner context,\n> > then we continue with the cleanup.\n> > \n> > Without this patch, calling io_remove_watch_poll() in main loop thread\n> > is not thread-safe, since the other per-monitor thread may be modifying\n> > the watch object at the same time.\n> > \n> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>\n> > Signed-off-by: Peter Xu <peterx@redhat.com>\n> > ---\n> \n> > +     * Let's blame the glib bug mentioned in commit 2b3167 (again) for\n> \n> That 6-char commit id may become ambiguous soon (it's still rare to see\n> ambiguity with an 8-char id, although I've seen it more in recent times\n> than in the past; and git itself has moved from a 7-char default\n> abbreviation length in the 2.0 days to what is now a 10-char default\n> abbreviation in 2.13.5).\n\nThanks for noticing this.  I'll use 10 chars in next post.","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx09.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxnGY2kVmz9s4q\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 20 Sep 2017 14:44:51 +1000 (AEST)","from localhost ([::1]:46656 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1duWsR-0005x5-Vg\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 00:44:47 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:59079)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1duWs6-0005wp-Ox\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 00:44:28 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1duWs2-0003TJ-RT\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 00:44:26 -0400","from mx1.redhat.com ([209.132.183.28]:43556)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1duWs2-0003PU-J6\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 00:44:22 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 815FF4E4F3;\n\tWed, 20 Sep 2017 04:44:21 +0000 (UTC)","from pxdev.xzpeter.org (dhcp-15-224.nay.redhat.com [10.66.15.224])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 2875D5C541;\n\tWed, 20 Sep 2017 04:44:15 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 815FF4E4F3","Date":"Wed, 20 Sep 2017 12:44:12 +0800","From":"Peter Xu <peterx@redhat.com>","To":"Eric Blake <eblake@redhat.com>","Message-ID":"<20170920044412.GS3617@pxdev.xzpeter.org>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>\n\t<810ee791-fecd-c0b7-fefd-66a78c2de925@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<810ee791-fecd-c0b7-fefd-66a78c2de925@redhat.com>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.38]);\n\tWed, 20 Sep 2017 04:44:21 +0000 (UTC)","Content-Transfer-Encoding":"quoted-printable","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, Juan Quintela <quintela@redhat.com>,\n\tMarkus Armbruster <armbru@redhat.com>, qemu-devel@nongnu.org, Stefan\n\tHajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771657,"web_url":"http://patchwork.ozlabs.org/comment/1771657/","msgid":"<20170920075703.GA4053@redhat.com>","list_archive_url":null,"date":"2017-09-20T07:57:03","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":2694,"url":"http://patchwork.ozlabs.org/api/people/2694/","name":"Daniel P. Berrangé","email":"berrange@redhat.com"},"content":"On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:\n> This is not a problem if we are only having one single loop thread like\n> before.  However, after per-monitor thread is introduced, this is not\n> true any more, and the race can happen.\n> \n> The race can be triggered with \"make check -j8\" sometimes:\n> \n>   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n>   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> \n> This patch keeps the reference for the watch object when creating in\n> io_add_watch_poll(), so that the object will never be released in the\n> context main loop, especially when the context loop is running in\n> another standalone thread.  Meanwhile, when we want to remove the watch\n> object, we always first detach the watch object from its owner context,\n> then we continue with the cleanup.\n> \n> Without this patch, calling io_remove_watch_poll() in main loop thread\n> is not thread-safe, since the other per-monitor thread may be modifying\n> the watch object at the same time.\n\nThis doesn't feel right to me. Why is the main loop thread doing anything\nat all with the Chardev, if there is a per-monitor thread ? The Chardev\ncode isn't thread safe so it isn't safe to have two separate threads\naccessing the same Chardev. IOW, if we want a per-monitor thread, then\nwe must make sure the main thread never touches that monitor's chardev\nat all.  While your patch here might have avoided the assertion you\nmention above, I fear this is just papering over a fundamental problem\nthat still exists, that can only be solved by not letting the mainloop\ntouch the chardev at all.\n\n> \n> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>\n> Signed-off-by: Peter Xu <peterx@redhat.com>\n> ---\n>  chardev/char-io.c | 15 +++++++++++++--\n>  1 file changed, 13 insertions(+), 2 deletions(-)\n> \n> diff --git a/chardev/char-io.c b/chardev/char-io.c\n> index f810524..3828c20 100644\n> --- a/chardev/char-io.c\n> +++ b/chardev/char-io.c\n> @@ -122,7 +122,6 @@ GSource *io_add_watch_poll(Chardev *chr,\n>      g_free(name);\n>  \n>      g_source_attach(&iwp->parent, context);\n> -    g_source_unref(&iwp->parent);\n>      return (GSource *)iwp;\n>  }\n>  \n> @@ -131,12 +130,24 @@ static void io_remove_watch_poll(GSource *source)\n>      IOWatchPoll *iwp;\n>  \n>      iwp = io_watch_poll_from_source(source);\n> +\n> +    /*\n> +     * Here the order of destruction really matters.  We need to first\n> +     * detach the IOWatchPoll object from the context (which may still\n> +     * be running in another loop thread), only after that could we\n> +     * continue to operate on iwp->src, or there may be race condition\n> +     * between current thread and the context loop thread.\n> +     *\n> +     * Let's blame the glib bug mentioned in commit 2b3167 (again) for\n> +     * this extra complexity.\n> +     */\n> +    g_source_destroy(&iwp->parent);\n>      if (iwp->src) {\n>          g_source_destroy(iwp->src);\n>          g_source_unref(iwp->src);\n>          iwp->src = NULL;\n>      }\n> -    g_source_destroy(&iwp->parent);\n> +    g_source_unref(&iwp->parent);\n>  }\n>  \n>  void remove_fd_in_watch(Chardev *chr)\n> -- \n> 2.7.4\n> \n\nRegards,\nDaniel","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx01.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx01.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=berrange@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxsYB3FBHz9s8J\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 20 Sep 2017 17:57:46 +1000 (AEST)","from localhost ([::1]:47285 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1duZt8-0006jb-6h\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 03:57:42 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:43421)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <berrange@redhat.com>) id 1duZso-0006jU-Q3\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 03:57:24 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <berrange@redhat.com>) id 1duZsl-0003Ry-Ht\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 03:57:22 -0400","from mx1.redhat.com ([209.132.183.28]:39392)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <berrange@redhat.com>) id 1duZsl-0003NS-9E\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 03:57:19 -0400","from smtp.corp.redhat.com\n\t(int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 10CFB81DE3;\n\tWed, 20 Sep 2017 07:57:18 +0000 (UTC)","from redhat.com (unknown [10.33.36.25])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id E1DCE610B0;\n\tWed, 20 Sep 2017 07:57:06 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 10CFB81DE3","Date":"Wed, 20 Sep 2017 08:57:03 +0100","From":"\"Daniel P. Berrange\" <berrange@redhat.com>","To":"Peter Xu <peterx@redhat.com>","Message-ID":"<20170920075703.GA4053@redhat.com>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<1505375436-28439-2-git-send-email-peterx@redhat.com>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.12","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.25]);\n\tWed, 20 Sep 2017 07:57:18 +0000 (UTC)","Content-Transfer-Encoding":"quoted-printable","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Reply-To":"\"Daniel P. Berrange\" <berrange@redhat.com>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,\n\tStefan Hajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771863,"web_url":"http://patchwork.ozlabs.org/comment/1771863/","msgid":"<20170920104958.GA30661@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-20T10:49:58","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Wed, Sep 20, 2017 at 10:14:38AM +0100, Daniel P. Berrange wrote:\n> On Wed, Sep 20, 2017 at 05:09:26PM +0800, Peter Xu wrote:\n> > On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:\n> > > On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:\n> > > > This is not a problem if we are only having one single loop thread like\n> > > > before.  However, after per-monitor thread is introduced, this is not\n> > > > true any more, and the race can happen.\n> > > > \n> > > > The race can be triggered with \"make check -j8\" sometimes:\n> > > > \n> > > >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n> > > >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> > > > \n> > > > This patch keeps the reference for the watch object when creating in\n> > > > io_add_watch_poll(), so that the object will never be released in the\n> > > > context main loop, especially when the context loop is running in\n> > > > another standalone thread.  Meanwhile, when we want to remove the watch\n> > > > object, we always first detach the watch object from its owner context,\n> > > > then we continue with the cleanup.\n> > > > \n> > > > Without this patch, calling io_remove_watch_poll() in main loop thread\n> > > > is not thread-safe, since the other per-monitor thread may be modifying\n> > > > the watch object at the same time.\n> > > \n> > > This doesn't feel right to me. Why is the main loop thread doing anything\n> > > at all with the Chardev, if there is a per-monitor thread ? The Chardev\n> > > code isn't thread safe so it isn't safe to have two separate threads\n> > > accessing the same Chardev. IOW, if we want a per-monitor thread, then\n> > > we must make sure the main thread never touches that monitor's chardev\n> > > at all.  While your patch here might have avoided the assertion you\n> > > mention above, I fear this is just papering over a fundamental problem\n> > > that still exists, that can only be solved by not letting the mainloop\n> > > touch the chardev at all.\n> > \n> > The stack I encountered:\n> > \n> > #0  0x00007f658234c765 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54\n> > #1  0x00007f658234e36a in __GI_abort () at abort.c:89\n> > #2  0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c76345fce1 \"iwp->src == NULL\", file=file@entry=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=line@entry=91, function=function@entry=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:92\n> > #3  0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 \"iwp->src == NULL\", file=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=91, function=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:101\n> > #4  0x000055c7632c2be5 in io_watch_poll_finalize (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91\n> > #5  0x00007f65847bb859 in g_source_unref_internal () at /lib64/libglib-2.0.so.0\n> > #6  0x00007f65847bca29 in g_source_destroy_internal () at /lib64/libglib-2.0.so.0\n> > #7  0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:139\n> > #8  0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at /root/git/qemu/chardev/char-io.c:145\n> > #9  0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true)\n> >     at /root/git/qemu/chardev/char-fe.c:267\n> > #10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, del=false) at /root/git/qemu/chardev/char-fe.c:231\n> > #11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at /root/git/qemu/monitor.c:600\n> > #12 0x000055c762e340ec in monitor_cleanup () at /root/git/qemu/monitor.c:4346\n> > #13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889\n> > \n> > So it's destroying the CharBackend, but it'll then call\n> > qemu_chr_fe_set_handlers() which finally tries to remove the watch poll.\n> \n> Ok that code is broken - it must not call monitor_cleanup from the main\n> thread - it needs to be called from the monitor thread, unless it can\n> guarantee that the monitor thread has already exited, which seems unlikely\n\nThe problem is that not all monitors are parsed in the IO thread, but\nonly those with use_io_thr=true set.\n\nHow about I move the calls of monitor_data_destroy() into that monitor\nIO thread when use_io_thr=true?  And for the rest, I think they still\nneed to be destroyed in the main thread.\n\n> \n> Regards,\n> Daniel\n> -- \n> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|\n> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|\n> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx06.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xy1Gq6DfCz9s7c\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 20 Sep 2017 23:45:55 +1000 (AEST)","from localhost ([::1]:48174 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dufK5-0001Wu-TP\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 09:45:53 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:45455)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dues6-0003TC-VK\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:17:02 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1duert-0001JY-Qo\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:16:58 -0400","from mx1.redhat.com ([209.132.183.28]:47740)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1duert-0001IK-1n\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:16:45 -0400","from smtp.corp.redhat.com\n\t(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id DB8471F56A;\n\tWed, 20 Sep 2017 10:50:18 +0000 (UTC)","from pxdev.xzpeter.org (ovpn-12-43.pek2.redhat.com [10.72.12.43])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id EC6F66017B;\n\tWed, 20 Sep 2017 10:50:00 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com DB8471F56A","Date":"Wed, 20 Sep 2017 18:49:58 +0800","From":"Peter Xu <peterx@redhat.com>","To":"\"Daniel P. Berrange\" <berrange@redhat.com>","Message-ID":"<20170920104958.GA30661@pxdev.xzpeter.org>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>\n\t<20170920075703.GA4053@redhat.com>\n\t<20170920090926.GA31306@pxdev.xzpeter.org>\n\t<20170920091438.GB4053@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170920091438.GB4053@redhat.com>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.11","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.30]);\n\tWed, 20 Sep 2017 10:50:19 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,\n\tStefan Hajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771900,"web_url":"http://patchwork.ozlabs.org/comment/1771900/","msgid":"<20170920110309.GF4053@redhat.com>","list_archive_url":null,"date":"2017-09-20T11:03:09","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":2694,"url":"http://patchwork.ozlabs.org/api/people/2694/","name":"Daniel P. Berrangé","email":"berrange@redhat.com"},"content":"On Wed, Sep 20, 2017 at 06:49:58PM +0800, Peter Xu wrote:\n> On Wed, Sep 20, 2017 at 10:14:38AM +0100, Daniel P. Berrange wrote:\n> > On Wed, Sep 20, 2017 at 05:09:26PM +0800, Peter Xu wrote:\n> > > On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:\n> > > > On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:\n> > > > > This is not a problem if we are only having one single loop thread like\n> > > > > before.  However, after per-monitor thread is introduced, this is not\n> > > > > true any more, and the race can happen.\n> > > > > \n> > > > > The race can be triggered with \"make check -j8\" sometimes:\n> > > > > \n> > > > >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n> > > > >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> > > > > \n> > > > > This patch keeps the reference for the watch object when creating in\n> > > > > io_add_watch_poll(), so that the object will never be released in the\n> > > > > context main loop, especially when the context loop is running in\n> > > > > another standalone thread.  Meanwhile, when we want to remove the watch\n> > > > > object, we always first detach the watch object from its owner context,\n> > > > > then we continue with the cleanup.\n> > > > > \n> > > > > Without this patch, calling io_remove_watch_poll() in main loop thread\n> > > > > is not thread-safe, since the other per-monitor thread may be modifying\n> > > > > the watch object at the same time.\n> > > > \n> > > > This doesn't feel right to me. Why is the main loop thread doing anything\n> > > > at all with the Chardev, if there is a per-monitor thread ? The Chardev\n> > > > code isn't thread safe so it isn't safe to have two separate threads\n> > > > accessing the same Chardev. IOW, if we want a per-monitor thread, then\n> > > > we must make sure the main thread never touches that monitor's chardev\n> > > > at all.  While your patch here might have avoided the assertion you\n> > > > mention above, I fear this is just papering over a fundamental problem\n> > > > that still exists, that can only be solved by not letting the mainloop\n> > > > touch the chardev at all.\n> > > \n> > > The stack I encountered:\n> > > \n> > > #0  0x00007f658234c765 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54\n> > > #1  0x00007f658234e36a in __GI_abort () at abort.c:89\n> > > #2  0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c76345fce1 \"iwp->src == NULL\", file=file@entry=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=line@entry=91, function=function@entry=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:92\n> > > #3  0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 \"iwp->src == NULL\", file=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=91, function=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:101\n> > > #4  0x000055c7632c2be5 in io_watch_poll_finalize (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91\n> > > #5  0x00007f65847bb859 in g_source_unref_internal () at /lib64/libglib-2.0.so.0\n> > > #6  0x00007f65847bca29 in g_source_destroy_internal () at /lib64/libglib-2.0.so.0\n> > > #7  0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:139\n> > > #8  0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at /root/git/qemu/chardev/char-io.c:145\n> > > #9  0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true)\n> > >     at /root/git/qemu/chardev/char-fe.c:267\n> > > #10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, del=false) at /root/git/qemu/chardev/char-fe.c:231\n> > > #11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at /root/git/qemu/monitor.c:600\n> > > #12 0x000055c762e340ec in monitor_cleanup () at /root/git/qemu/monitor.c:4346\n> > > #13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889\n> > > \n> > > So it's destroying the CharBackend, but it'll then call\n> > > qemu_chr_fe_set_handlers() which finally tries to remove the watch poll.\n> > \n> > Ok that code is broken - it must not call monitor_cleanup from the main\n> > thread - it needs to be called from the monitor thread, unless it can\n> > guarantee that the monitor thread has already exited, which seems unlikely\n> \n> The problem is that not all monitors are parsed in the IO thread, but\n> only those with use_io_thr=true set.\n> \n> How about I move the calls of monitor_data_destroy() into that monitor\n> IO thread when use_io_thr=true?  And for the rest, I think they still\n> need to be destroyed in the main thread.\n\nI think having the monitor sometimes run in the main thread and sometimes\nrun in a background thread is a recipe for ongoing trouble, of which this\nproblem is just the first example that will hurt us. People will test\nbehaviour of a feature with one setup and then users will later run it in\na different setup and potentially experiance obscure bugs as a result.\nIOW, use_io_thr flag should not exist, and every monitor should be run\nunconditionally in the background thread from the point at which your\npatch series merges.\n\nRegards,\nDaniel","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=berrange@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xy1qz4tGfz9s82\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 21 Sep 2017 00:11:11 +1000 (AEST)","from localhost ([::1]:48331 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dufiX-0007eo-MS\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 10:11:09 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:37693)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <berrange@redhat.com>) id 1duf3M-0004N1-W7\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:28:38 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <berrange@redhat.com>) id 1duf3J-0005DZ-Mq\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:28:36 -0400","from mx1.redhat.com ([209.132.183.28]:57654)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <berrange@redhat.com>) id 1duf3J-0005Cp-DX\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:28:33 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 3C420C062EA5;\n\tWed, 20 Sep 2017 11:03:18 +0000 (UTC)","from redhat.com (unknown [10.33.36.25])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id CEF5F5C542;\n\tWed, 20 Sep 2017 11:03:12 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 3C420C062EA5","Date":"Wed, 20 Sep 2017 12:03:09 +0100","From":"\"Daniel P. Berrange\" <berrange@redhat.com>","To":"Peter Xu <peterx@redhat.com>","Message-ID":"<20170920110309.GF4053@redhat.com>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>\n\t<20170920075703.GA4053@redhat.com>\n\t<20170920090926.GA31306@pxdev.xzpeter.org>\n\t<20170920091438.GB4053@redhat.com>\n\t<20170920104958.GA30661@pxdev.xzpeter.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170920104958.GA30661@pxdev.xzpeter.org>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.31]);\n\tWed, 20 Sep 2017 11:03:18 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Reply-To":"\"Daniel P. Berrange\" <berrange@redhat.com>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,\n\tStefan Hajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771942,"web_url":"http://patchwork.ozlabs.org/comment/1771942/","msgid":"<20170920111849.GB30661@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-20T11:18:49","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Wed, Sep 20, 2017 at 12:03:09PM +0100, Daniel P. Berrange wrote:\n> On Wed, Sep 20, 2017 at 06:49:58PM +0800, Peter Xu wrote:\n> > On Wed, Sep 20, 2017 at 10:14:38AM +0100, Daniel P. Berrange wrote:\n> > > On Wed, Sep 20, 2017 at 05:09:26PM +0800, Peter Xu wrote:\n> > > > On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:\n> > > > > On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:\n> > > > > > This is not a problem if we are only having one single loop thread like\n> > > > > > before.  However, after per-monitor thread is introduced, this is not\n> > > > > > true any more, and the race can happen.\n> > > > > > \n> > > > > > The race can be triggered with \"make check -j8\" sometimes:\n> > > > > > \n> > > > > >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n> > > > > >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> > > > > > \n> > > > > > This patch keeps the reference for the watch object when creating in\n> > > > > > io_add_watch_poll(), so that the object will never be released in the\n> > > > > > context main loop, especially when the context loop is running in\n> > > > > > another standalone thread.  Meanwhile, when we want to remove the watch\n> > > > > > object, we always first detach the watch object from its owner context,\n> > > > > > then we continue with the cleanup.\n> > > > > > \n> > > > > > Without this patch, calling io_remove_watch_poll() in main loop thread\n> > > > > > is not thread-safe, since the other per-monitor thread may be modifying\n> > > > > > the watch object at the same time.\n> > > > > \n> > > > > This doesn't feel right to me. Why is the main loop thread doing anything\n> > > > > at all with the Chardev, if there is a per-monitor thread ? The Chardev\n> > > > > code isn't thread safe so it isn't safe to have two separate threads\n> > > > > accessing the same Chardev. IOW, if we want a per-monitor thread, then\n> > > > > we must make sure the main thread never touches that monitor's chardev\n> > > > > at all.  While your patch here might have avoided the assertion you\n> > > > > mention above, I fear this is just papering over a fundamental problem\n> > > > > that still exists, that can only be solved by not letting the mainloop\n> > > > > touch the chardev at all.\n> > > > \n> > > > The stack I encountered:\n> > > > \n> > > > #0  0x00007f658234c765 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54\n> > > > #1  0x00007f658234e36a in __GI_abort () at abort.c:89\n> > > > #2  0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c76345fce1 \"iwp->src == NULL\", file=file@entry=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=line@entry=91, function=function@entry=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:92\n> > > > #3  0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 \"iwp->src == NULL\", file=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=91, function=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:101\n> > > > #4  0x000055c7632c2be5 in io_watch_poll_finalize (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91\n> > > > #5  0x00007f65847bb859 in g_source_unref_internal () at /lib64/libglib-2.0.so.0\n> > > > #6  0x00007f65847bca29 in g_source_destroy_internal () at /lib64/libglib-2.0.so.0\n> > > > #7  0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:139\n> > > > #8  0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at /root/git/qemu/chardev/char-io.c:145\n> > > > #9  0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true)\n> > > >     at /root/git/qemu/chardev/char-fe.c:267\n> > > > #10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, del=false) at /root/git/qemu/chardev/char-fe.c:231\n> > > > #11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at /root/git/qemu/monitor.c:600\n> > > > #12 0x000055c762e340ec in monitor_cleanup () at /root/git/qemu/monitor.c:4346\n> > > > #13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889\n> > > > \n> > > > So it's destroying the CharBackend, but it'll then call\n> > > > qemu_chr_fe_set_handlers() which finally tries to remove the watch poll.\n> > > \n> > > Ok that code is broken - it must not call monitor_cleanup from the main\n> > > thread - it needs to be called from the monitor thread, unless it can\n> > > guarantee that the monitor thread has already exited, which seems unlikely\n> > \n> > The problem is that not all monitors are parsed in the IO thread, but\n> > only those with use_io_thr=true set.\n> > \n> > How about I move the calls of monitor_data_destroy() into that monitor\n> > IO thread when use_io_thr=true?  And for the rest, I think they still\n> > need to be destroyed in the main thread.\n> \n> I think having the monitor sometimes run in the main thread and sometimes\n> run in a background thread is a recipe for ongoing trouble, of which this\n> problem is just the first example that will hurt us. People will test\n> behaviour of a feature with one setup and then users will later run it in\n> a different setup and potentially experiance obscure bugs as a result.\n> IOW, use_io_thr flag should not exist, and every monitor should be run\n> unconditionally in the background thread from the point at which your\n> patch series merges.\n\nI agree with you that this may bring trouble in some aspect.  I just\ndon't know whether it'll bring more trouble if we move all the\nmonitor-related chardev IO into monitor thread.\n\nThe key is the muxed typed chardev.\n\nIf we don't have muxed typed chardev, I'll surely consider to use IO\nthread for all the monitors.\n\nHowever, the muxed chardevs can support e.g. one monitor plus a serial\nport. Can we just run the IO stuff in monitor thread even part of its\nfrontend is a serial port?  And also I'm not sure what would happen if\nit's a monitor plus something else I even don't aware of.\n\nAny nicer thoughts?  Thanks,","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx03.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx03.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xy2gN64Gnz9s4q\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 21 Sep 2017 00:48:48 +1000 (AEST)","from localhost ([::1]:48590 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dugIw-0000eS-Ps\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 10:48:46 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:47318)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dufKJ-0002O4-8U\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:46:09 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dufKD-0001SR-4o\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:46:07 -0400","from mx1.redhat.com ([209.132.183.28]:43734)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1dufKC-0001S5-RZ\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:46:01 -0400","from smtp.corp.redhat.com\n\t(int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 275AD7E43E;\n\tWed, 20 Sep 2017 11:18:59 +0000 (UTC)","from pxdev.xzpeter.org (ovpn-12-43.pek2.redhat.com [10.72.12.43])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 11162600CA;\n\tWed, 20 Sep 2017 11:18:52 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 275AD7E43E","Date":"Wed, 20 Sep 2017 19:18:49 +0800","From":"Peter Xu <peterx@redhat.com>","To":"\"Daniel P. Berrange\" <berrange@redhat.com>","Message-ID":"<20170920111849.GB30661@pxdev.xzpeter.org>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>\n\t<20170920075703.GA4053@redhat.com>\n\t<20170920090926.GA31306@pxdev.xzpeter.org>\n\t<20170920091438.GB4053@redhat.com>\n\t<20170920104958.GA30661@pxdev.xzpeter.org>\n\t<20170920110309.GF4053@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170920110309.GF4053@redhat.com>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.11","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.27]);\n\tWed, 20 Sep 2017 11:18:59 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,\n\tStefan Hajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771951,"web_url":"http://patchwork.ozlabs.org/comment/1771951/","msgid":"<20170920090926.GA31306@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-20T09:09:26","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:\n> On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:\n> > This is not a problem if we are only having one single loop thread like\n> > before.  However, after per-monitor thread is introduced, this is not\n> > true any more, and the race can happen.\n> > \n> > The race can be triggered with \"make check -j8\" sometimes:\n> > \n> >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n> >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> > \n> > This patch keeps the reference for the watch object when creating in\n> > io_add_watch_poll(), so that the object will never be released in the\n> > context main loop, especially when the context loop is running in\n> > another standalone thread.  Meanwhile, when we want to remove the watch\n> > object, we always first detach the watch object from its owner context,\n> > then we continue with the cleanup.\n> > \n> > Without this patch, calling io_remove_watch_poll() in main loop thread\n> > is not thread-safe, since the other per-monitor thread may be modifying\n> > the watch object at the same time.\n> \n> This doesn't feel right to me. Why is the main loop thread doing anything\n> at all with the Chardev, if there is a per-monitor thread ? The Chardev\n> code isn't thread safe so it isn't safe to have two separate threads\n> accessing the same Chardev. IOW, if we want a per-monitor thread, then\n> we must make sure the main thread never touches that monitor's chardev\n> at all.  While your patch here might have avoided the assertion you\n> mention above, I fear this is just papering over a fundamental problem\n> that still exists, that can only be solved by not letting the mainloop\n> touch the chardev at all.\n\nThe stack I encountered:\n\n#0  0x00007f658234c765 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54\n#1  0x00007f658234e36a in __GI_abort () at abort.c:89\n#2  0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c76345fce1 \"iwp->src == NULL\", file=file@entry=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=line@entry=91, function=function@entry=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:92\n#3  0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 \"iwp->src == NULL\", file=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=91, function=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:101\n#4  0x000055c7632c2be5 in io_watch_poll_finalize (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91\n#5  0x00007f65847bb859 in g_source_unref_internal () at /lib64/libglib-2.0.so.0\n#6  0x00007f65847bca29 in g_source_destroy_internal () at /lib64/libglib-2.0.so.0\n#7  0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:139\n#8  0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at /root/git/qemu/chardev/char-io.c:145\n#9  0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true)\n    at /root/git/qemu/chardev/char-fe.c:267\n#10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, del=false) at /root/git/qemu/chardev/char-fe.c:231\n#11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at /root/git/qemu/monitor.c:600\n#12 0x000055c762e340ec in monitor_cleanup () at /root/git/qemu/monitor.c:4346\n#13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889\n\nSo it's destroying the CharBackend, but it'll then call\nqemu_chr_fe_set_handlers() which finally tries to remove the watch poll.\n\nIf without current patch, I can still encounter the same crash when\ndoing \"make check -j8\".\n\n> \n> > \n> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>\n> > Signed-off-by: Peter Xu <peterx@redhat.com>\n> > ---\n> >  chardev/char-io.c | 15 +++++++++++++--\n> >  1 file changed, 13 insertions(+), 2 deletions(-)\n> > \n> > diff --git a/chardev/char-io.c b/chardev/char-io.c\n> > index f810524..3828c20 100644\n> > --- a/chardev/char-io.c\n> > +++ b/chardev/char-io.c\n> > @@ -122,7 +122,6 @@ GSource *io_add_watch_poll(Chardev *chr,\n> >      g_free(name);\n> >  \n> >      g_source_attach(&iwp->parent, context);\n> > -    g_source_unref(&iwp->parent);\n> >      return (GSource *)iwp;\n> >  }\n> >  \n> > @@ -131,12 +130,24 @@ static void io_remove_watch_poll(GSource *source)\n> >      IOWatchPoll *iwp;\n> >  \n> >      iwp = io_watch_poll_from_source(source);\n> > +\n> > +    /*\n> > +     * Here the order of destruction really matters.  We need to first\n> > +     * detach the IOWatchPoll object from the context (which may still\n> > +     * be running in another loop thread), only after that could we\n> > +     * continue to operate on iwp->src, or there may be race condition\n> > +     * between current thread and the context loop thread.\n> > +     *\n> > +     * Let's blame the glib bug mentioned in commit 2b3167 (again) for\n> > +     * this extra complexity.\n> > +     */\n> > +    g_source_destroy(&iwp->parent);\n> >      if (iwp->src) {\n> >          g_source_destroy(iwp->src);\n> >          g_source_unref(iwp->src);\n> >          iwp->src = NULL;\n> >      }\n> > -    g_source_destroy(&iwp->parent);\n> > +    g_source_unref(&iwp->parent);\n> >  }\n> >  \n> >  void remove_fd_in_watch(Chardev *chr)\n> > -- \n> > 2.7.4\n> > \n> \n> Regards,\n> Daniel\n> -- \n> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|\n> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|\n> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xy2sB6wZ5z9sNr\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 21 Sep 2017 00:57:18 +1000 (AEST)","from localhost ([::1]:48646 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dugRB-000296-3I\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 10:57:17 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:51154)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dufWP-0004si-99\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:58:38 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dufWL-0006qU-C4\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:58:37 -0400","from mx1.redhat.com ([209.132.183.28]:33764)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1dufWL-0006pw-1S\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:58:33 -0400","from smtp.corp.redhat.com\n\t(int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id A4CACC047B92;\n\tWed, 20 Sep 2017 09:09:33 +0000 (UTC)","from pxdev.xzpeter.org (dhcp-15-224.nay.redhat.com [10.66.15.224])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 6FDAD5D97F;\n\tWed, 20 Sep 2017 09:09:28 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com A4CACC047B92","Date":"Wed, 20 Sep 2017 17:09:26 +0800","From":"Peter Xu <peterx@redhat.com>","To":"\"Daniel P. Berrange\" <berrange@redhat.com>","Message-ID":"<20170920090926.GA31306@pxdev.xzpeter.org>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>\n\t<20170920075703.GA4053@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170920075703.GA4053@redhat.com>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.14","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.31]);\n\tWed, 20 Sep 2017 09:09:33 +0000 (UTC)","Content-Transfer-Encoding":"quoted-printable","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,\n\tStefan Hajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771964,"web_url":"http://patchwork.ozlabs.org/comment/1771964/","msgid":"<20170920112921.GG4053@redhat.com>","list_archive_url":null,"date":"2017-09-20T11:29:21","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":2694,"url":"http://patchwork.ozlabs.org/api/people/2694/","name":"Daniel P. Berrangé","email":"berrange@redhat.com"},"content":"On Wed, Sep 20, 2017 at 07:18:49PM +0800, Peter Xu wrote:\n> On Wed, Sep 20, 2017 at 12:03:09PM +0100, Daniel P. Berrange wrote:\n> > On Wed, Sep 20, 2017 at 06:49:58PM +0800, Peter Xu wrote:\n> > > On Wed, Sep 20, 2017 at 10:14:38AM +0100, Daniel P. Berrange wrote:\n> > > > On Wed, Sep 20, 2017 at 05:09:26PM +0800, Peter Xu wrote:\n> > > > > On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:\n> > > > > > On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:\n> > > > > > > This is not a problem if we are only having one single loop thread like\n> > > > > > > before.  However, after per-monitor thread is introduced, this is not\n> > > > > > > true any more, and the race can happen.\n> > > > > > > \n> > > > > > > The race can be triggered with \"make check -j8\" sometimes:\n> > > > > > > \n> > > > > > >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n> > > > > > >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> > > > > > > \n> > > > > > > This patch keeps the reference for the watch object when creating in\n> > > > > > > io_add_watch_poll(), so that the object will never be released in the\n> > > > > > > context main loop, especially when the context loop is running in\n> > > > > > > another standalone thread.  Meanwhile, when we want to remove the watch\n> > > > > > > object, we always first detach the watch object from its owner context,\n> > > > > > > then we continue with the cleanup.\n> > > > > > > \n> > > > > > > Without this patch, calling io_remove_watch_poll() in main loop thread\n> > > > > > > is not thread-safe, since the other per-monitor thread may be modifying\n> > > > > > > the watch object at the same time.\n> > > > > > \n> > > > > > This doesn't feel right to me. Why is the main loop thread doing anything\n> > > > > > at all with the Chardev, if there is a per-monitor thread ? The Chardev\n> > > > > > code isn't thread safe so it isn't safe to have two separate threads\n> > > > > > accessing the same Chardev. IOW, if we want a per-monitor thread, then\n> > > > > > we must make sure the main thread never touches that monitor's chardev\n> > > > > > at all.  While your patch here might have avoided the assertion you\n> > > > > > mention above, I fear this is just papering over a fundamental problem\n> > > > > > that still exists, that can only be solved by not letting the mainloop\n> > > > > > touch the chardev at all.\n> > > > > \n> > > > > The stack I encountered:\n> > > > > \n> > > > > #0  0x00007f658234c765 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54\n> > > > > #1  0x00007f658234e36a in __GI_abort () at abort.c:89\n> > > > > #2  0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c76345fce1 \"iwp->src == NULL\", file=file@entry=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=line@entry=91, function=function@entry=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:92\n> > > > > #3  0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 \"iwp->src == NULL\", file=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=91, function=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:101\n> > > > > #4  0x000055c7632c2be5 in io_watch_poll_finalize (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91\n> > > > > #5  0x00007f65847bb859 in g_source_unref_internal () at /lib64/libglib-2.0.so.0\n> > > > > #6  0x00007f65847bca29 in g_source_destroy_internal () at /lib64/libglib-2.0.so.0\n> > > > > #7  0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:139\n> > > > > #8  0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at /root/git/qemu/chardev/char-io.c:145\n> > > > > #9  0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true)\n> > > > >     at /root/git/qemu/chardev/char-fe.c:267\n> > > > > #10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, del=false) at /root/git/qemu/chardev/char-fe.c:231\n> > > > > #11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at /root/git/qemu/monitor.c:600\n> > > > > #12 0x000055c762e340ec in monitor_cleanup () at /root/git/qemu/monitor.c:4346\n> > > > > #13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889\n> > > > > \n> > > > > So it's destroying the CharBackend, but it'll then call\n> > > > > qemu_chr_fe_set_handlers() which finally tries to remove the watch poll.\n> > > > \n> > > > Ok that code is broken - it must not call monitor_cleanup from the main\n> > > > thread - it needs to be called from the monitor thread, unless it can\n> > > > guarantee that the monitor thread has already exited, which seems unlikely\n> > > \n> > > The problem is that not all monitors are parsed in the IO thread, but\n> > > only those with use_io_thr=true set.\n> > > \n> > > How about I move the calls of monitor_data_destroy() into that monitor\n> > > IO thread when use_io_thr=true?  And for the rest, I think they still\n> > > need to be destroyed in the main thread.\n> > \n> > I think having the monitor sometimes run in the main thread and sometimes\n> > run in a background thread is a recipe for ongoing trouble, of which this\n> > problem is just the first example that will hurt us. People will test\n> > behaviour of a feature with one setup and then users will later run it in\n> > a different setup and potentially experiance obscure bugs as a result.\n> > IOW, use_io_thr flag should not exist, and every monitor should be run\n> > unconditionally in the background thread from the point at which your\n> > patch series merges.\n> \n> I agree with you that this may bring trouble in some aspect.  I just\n> don't know whether it'll bring more trouble if we move all the\n> monitor-related chardev IO into monitor thread.\n> \n> The key is the muxed typed chardev.\n> \n> If we don't have muxed typed chardev, I'll surely consider to use IO\n> thread for all the monitors.\n> \n> However, the muxed chardevs can support e.g. one monitor plus a serial\n> port. Can we just run the IO stuff in monitor thread even part of its\n> frontend is a serial port?  And also I'm not sure what would happen if\n> it's a monitor plus something else I even don't aware of.\n\nUrgh, I forgot about the horrible mux chardev concept, that does rather\ncomplicate life - moving the guest device interaction to the monitor\nthread would be dubious.\n\nSo yeah, given that, it probably is simplest to change monitor_cleanup\nto skip destroy of monitors which have a background thread.\n\nRegards,\nDaniel","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx07.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=berrange@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xy33z6nkRz9sNr\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 21 Sep 2017 01:06:39 +1000 (AEST)","from localhost ([::1]:48707 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dugaD-0002yf-UV\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 11:06:38 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:51152)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <berrange@redhat.com>) id 1dufWP-0004sf-7V\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:58:38 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <berrange@redhat.com>) id 1dufWL-0006qG-8X\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:58:37 -0400","from mx1.redhat.com ([209.132.183.28]:33758)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <berrange@redhat.com>) id 1dufWK-0006pt-TM\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 09:58:33 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 960E8C01BC95;\n\tWed, 20 Sep 2017 11:29:32 +0000 (UTC)","from redhat.com (unknown [10.33.36.25])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id AFAB45C670;\n\tWed, 20 Sep 2017 11:29:25 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 960E8C01BC95","Date":"Wed, 20 Sep 2017 12:29:21 +0100","From":"\"Daniel P. Berrange\" <berrange@redhat.com>","To":"Peter Xu <peterx@redhat.com>","Message-ID":"<20170920112921.GG4053@redhat.com>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>\n\t<20170920075703.GA4053@redhat.com>\n\t<20170920090926.GA31306@pxdev.xzpeter.org>\n\t<20170920091438.GB4053@redhat.com>\n\t<20170920104958.GA30661@pxdev.xzpeter.org>\n\t<20170920110309.GF4053@redhat.com>\n\t<20170920111849.GB30661@pxdev.xzpeter.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170920111849.GB30661@pxdev.xzpeter.org>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.31]);\n\tWed, 20 Sep 2017 11:29:32 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Reply-To":"\"Daniel P. Berrange\" <berrange@redhat.com>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,\n\tStefan Hajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1771968,"web_url":"http://patchwork.ozlabs.org/comment/1771968/","msgid":"<20170920091438.GB4053@redhat.com>","list_archive_url":null,"date":"2017-09-20T09:14:38","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":2694,"url":"http://patchwork.ozlabs.org/api/people/2694/","name":"Daniel P. Berrangé","email":"berrange@redhat.com"},"content":"On Wed, Sep 20, 2017 at 05:09:26PM +0800, Peter Xu wrote:\n> On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:\n> > On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:\n> > > This is not a problem if we are only having one single loop thread like\n> > > before.  However, after per-monitor thread is introduced, this is not\n> > > true any more, and the race can happen.\n> > > \n> > > The race can be triggered with \"make check -j8\" sometimes:\n> > > \n> > >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n> > >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> > > \n> > > This patch keeps the reference for the watch object when creating in\n> > > io_add_watch_poll(), so that the object will never be released in the\n> > > context main loop, especially when the context loop is running in\n> > > another standalone thread.  Meanwhile, when we want to remove the watch\n> > > object, we always first detach the watch object from its owner context,\n> > > then we continue with the cleanup.\n> > > \n> > > Without this patch, calling io_remove_watch_poll() in main loop thread\n> > > is not thread-safe, since the other per-monitor thread may be modifying\n> > > the watch object at the same time.\n> > \n> > This doesn't feel right to me. Why is the main loop thread doing anything\n> > at all with the Chardev, if there is a per-monitor thread ? The Chardev\n> > code isn't thread safe so it isn't safe to have two separate threads\n> > accessing the same Chardev. IOW, if we want a per-monitor thread, then\n> > we must make sure the main thread never touches that monitor's chardev\n> > at all.  While your patch here might have avoided the assertion you\n> > mention above, I fear this is just papering over a fundamental problem\n> > that still exists, that can only be solved by not letting the mainloop\n> > touch the chardev at all.\n> \n> The stack I encountered:\n> \n> #0  0x00007f658234c765 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54\n> #1  0x00007f658234e36a in __GI_abort () at abort.c:89\n> #2  0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c76345fce1 \"iwp->src == NULL\", file=file@entry=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=line@entry=91, function=function@entry=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:92\n> #3  0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 \"iwp->src == NULL\", file=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=91, function=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:101\n> #4  0x000055c7632c2be5 in io_watch_poll_finalize (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91\n> #5  0x00007f65847bb859 in g_source_unref_internal () at /lib64/libglib-2.0.so.0\n> #6  0x00007f65847bca29 in g_source_destroy_internal () at /lib64/libglib-2.0.so.0\n> #7  0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:139\n> #8  0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at /root/git/qemu/chardev/char-io.c:145\n> #9  0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true)\n>     at /root/git/qemu/chardev/char-fe.c:267\n> #10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, del=false) at /root/git/qemu/chardev/char-fe.c:231\n> #11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at /root/git/qemu/monitor.c:600\n> #12 0x000055c762e340ec in monitor_cleanup () at /root/git/qemu/monitor.c:4346\n> #13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889\n> \n> So it's destroying the CharBackend, but it'll then call\n> qemu_chr_fe_set_handlers() which finally tries to remove the watch poll.\n\nOk that code is broken - it must not call monitor_cleanup from the main\nthread - it needs to be called from the monitor thread, unless it can\nguarantee that the monitor thread has already exited, which seems unlikely\n\nRegards,\nDaniel","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx04.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx04.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=berrange@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xy3BD6z7jz9s4q\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 21 Sep 2017 01:12:04 +1000 (AEST)","from localhost ([::1]:48742 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dugfS-0007ho-Sp\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 11:12:02 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:51835)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <berrange@redhat.com>) id 1dufYl-0006xs-2K\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 10:01:04 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <berrange@redhat.com>) id 1dufYh-0007lf-6I\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 10:01:03 -0400","from mx1.redhat.com ([209.132.183.28]:44302)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <berrange@redhat.com>) id 1dufYg-0007lR-Tj\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 10:00:59 -0400","from smtp.corp.redhat.com\n\t(int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id F3DEF7EA9E;\n\tWed, 20 Sep 2017 09:14:48 +0000 (UTC)","from redhat.com (unknown [10.33.36.25])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id C2C735D9CB;\n\tWed, 20 Sep 2017 09:14:40 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com F3DEF7EA9E","Date":"Wed, 20 Sep 2017 10:14:38 +0100","From":"\"Daniel P. Berrange\" <berrange@redhat.com>","To":"Peter Xu <peterx@redhat.com>","Message-ID":"<20170920091438.GB4053@redhat.com>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>\n\t<20170920075703.GA4053@redhat.com>\n\t<20170920090926.GA31306@pxdev.xzpeter.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170920090926.GA31306@pxdev.xzpeter.org>","User-Agent":"Mutt/1.8.3 (2017-05-23)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.14","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.28]);\n\tWed, 20 Sep 2017 09:14:49 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Reply-To":"\"Daniel P. Berrange\" <berrange@redhat.com>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,\n\tStefan Hajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1772395,"web_url":"http://patchwork.ozlabs.org/comment/1772395/","msgid":"<20170921034545.GC30661@pxdev.xzpeter.org>","list_archive_url":null,"date":"2017-09-21T03:45:45","subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Wed, Sep 20, 2017 at 12:29:21PM +0100, Daniel P. Berrange wrote:\n> On Wed, Sep 20, 2017 at 07:18:49PM +0800, Peter Xu wrote:\n> > On Wed, Sep 20, 2017 at 12:03:09PM +0100, Daniel P. Berrange wrote:\n> > > On Wed, Sep 20, 2017 at 06:49:58PM +0800, Peter Xu wrote:\n> > > > On Wed, Sep 20, 2017 at 10:14:38AM +0100, Daniel P. Berrange wrote:\n> > > > > On Wed, Sep 20, 2017 at 05:09:26PM +0800, Peter Xu wrote:\n> > > > > > On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:\n> > > > > > > On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:\n> > > > > > > > This is not a problem if we are only having one single loop thread like\n> > > > > > > > before.  However, after per-monitor thread is introduced, this is not\n> > > > > > > > true any more, and the race can happen.\n> > > > > > > > \n> > > > > > > > The race can be triggered with \"make check -j8\" sometimes:\n> > > > > > > > \n> > > > > > > >   qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:\n> > > > > > > >   io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.\n> > > > > > > > \n> > > > > > > > This patch keeps the reference for the watch object when creating in\n> > > > > > > > io_add_watch_poll(), so that the object will never be released in the\n> > > > > > > > context main loop, especially when the context loop is running in\n> > > > > > > > another standalone thread.  Meanwhile, when we want to remove the watch\n> > > > > > > > object, we always first detach the watch object from its owner context,\n> > > > > > > > then we continue with the cleanup.\n> > > > > > > > \n> > > > > > > > Without this patch, calling io_remove_watch_poll() in main loop thread\n> > > > > > > > is not thread-safe, since the other per-monitor thread may be modifying\n> > > > > > > > the watch object at the same time.\n> > > > > > > \n> > > > > > > This doesn't feel right to me. Why is the main loop thread doing anything\n> > > > > > > at all with the Chardev, if there is a per-monitor thread ? The Chardev\n> > > > > > > code isn't thread safe so it isn't safe to have two separate threads\n> > > > > > > accessing the same Chardev. IOW, if we want a per-monitor thread, then\n> > > > > > > we must make sure the main thread never touches that monitor's chardev\n> > > > > > > at all.  While your patch here might have avoided the assertion you\n> > > > > > > mention above, I fear this is just papering over a fundamental problem\n> > > > > > > that still exists, that can only be solved by not letting the mainloop\n> > > > > > > touch the chardev at all.\n> > > > > > \n> > > > > > The stack I encountered:\n> > > > > > \n> > > > > > #0  0x00007f658234c765 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54\n> > > > > > #1  0x00007f658234e36a in __GI_abort () at abort.c:89\n> > > > > > #2  0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c76345fce1 \"iwp->src == NULL\", file=file@entry=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=line@entry=91, function=function@entry=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:92\n> > > > > > #3  0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 \"iwp->src == NULL\", file=0x55c76345fcc0 \"/root/git/qemu/chardev/char-io.c\", line=91, function=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> \"io_watch_poll_finalize\") at assert.c:101\n> > > > > > #4  0x000055c7632c2be5 in io_watch_poll_finalize (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91\n> > > > > > #5  0x00007f65847bb859 in g_source_unref_internal () at /lib64/libglib-2.0.so.0\n> > > > > > #6  0x00007f65847bca29 in g_source_destroy_internal () at /lib64/libglib-2.0.so.0\n> > > > > > #7  0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:139\n> > > > > > #8  0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at /root/git/qemu/chardev/char-io.c:145\n> > > > > > #9  0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true)\n> > > > > >     at /root/git/qemu/chardev/char-fe.c:267\n> > > > > > #10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, del=false) at /root/git/qemu/chardev/char-fe.c:231\n> > > > > > #11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at /root/git/qemu/monitor.c:600\n> > > > > > #12 0x000055c762e340ec in monitor_cleanup () at /root/git/qemu/monitor.c:4346\n> > > > > > #13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889\n> > > > > > \n> > > > > > So it's destroying the CharBackend, but it'll then call\n> > > > > > qemu_chr_fe_set_handlers() which finally tries to remove the watch poll.\n> > > > > \n> > > > > Ok that code is broken - it must not call monitor_cleanup from the main\n> > > > > thread - it needs to be called from the monitor thread, unless it can\n> > > > > guarantee that the monitor thread has already exited, which seems unlikely\n> > > > \n> > > > The problem is that not all monitors are parsed in the IO thread, but\n> > > > only those with use_io_thr=true set.\n> > > > \n> > > > How about I move the calls of monitor_data_destroy() into that monitor\n> > > > IO thread when use_io_thr=true?  And for the rest, I think they still\n> > > > need to be destroyed in the main thread.\n> > > \n> > > I think having the monitor sometimes run in the main thread and sometimes\n> > > run in a background thread is a recipe for ongoing trouble, of which this\n> > > problem is just the first example that will hurt us. People will test\n> > > behaviour of a feature with one setup and then users will later run it in\n> > > a different setup and potentially experiance obscure bugs as a result.\n> > > IOW, use_io_thr flag should not exist, and every monitor should be run\n> > > unconditionally in the background thread from the point at which your\n> > > patch series merges.\n> > \n> > I agree with you that this may bring trouble in some aspect.  I just\n> > don't know whether it'll bring more trouble if we move all the\n> > monitor-related chardev IO into monitor thread.\n> > \n> > The key is the muxed typed chardev.\n> > \n> > If we don't have muxed typed chardev, I'll surely consider to use IO\n> > thread for all the monitors.\n> > \n> > However, the muxed chardevs can support e.g. one monitor plus a serial\n> > port. Can we just run the IO stuff in monitor thread even part of its\n> > frontend is a serial port?  And also I'm not sure what would happen if\n> > it's a monitor plus something else I even don't aware of.\n> \n> Urgh, I forgot about the horrible mux chardev concept, that does rather\n> complicate life - moving the guest device interaction to the monitor\n> thread would be dubious.\n> \n> So yeah, given that, it probably is simplest to change monitor_cleanup\n> to skip destroy of monitors which have a background thread.\n\nThanks.  I'll add one more patch to split the cleanups to make sure\nthey are done in their own thread.\n\nIf you won't disagree, I would still prefer to keep this patch -\nfirstly, it never hurts; secondly, in case one day we would like to\nmake the whole chardev thread-safe, then we won't need to dig that\nhole again.","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ext-mx05.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx05.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=peterx@redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xyMwd6DJnz9sNw\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 21 Sep 2017 13:46:25 +1000 (AEST)","from localhost ([::1]:51631 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1dusRU-00070m-1c\n\tfor incoming@patchwork.ozlabs.org; Wed, 20 Sep 2017 23:46:24 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:39175)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dusR6-00070b-DA\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 23:46:01 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1dusR1-0005DF-EJ\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 23:46:00 -0400","from mx1.redhat.com ([209.132.183.28]:54950)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1dusR1-0005Cm-56\n\tfor qemu-devel@nongnu.org; Wed, 20 Sep 2017 23:45:55 -0400","from smtp.corp.redhat.com\n\t(int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id F0F6715561;\n\tThu, 21 Sep 2017 03:45:53 +0000 (UTC)","from pxdev.xzpeter.org (dhcp-15-224.nay.redhat.com [10.66.15.224])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 0845060BF0;\n\tThu, 21 Sep 2017 03:45:47 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com F0F6715561","Date":"Thu, 21 Sep 2017 11:45:45 +0800","From":"Peter Xu <peterx@redhat.com>","To":"\"Daniel P. Berrange\" <berrange@redhat.com>","Message-ID":"<20170921034545.GC30661@pxdev.xzpeter.org>","References":"<1505375436-28439-1-git-send-email-peterx@redhat.com>\n\t<1505375436-28439-2-git-send-email-peterx@redhat.com>\n\t<20170920075703.GA4053@redhat.com>\n\t<20170920090926.GA31306@pxdev.xzpeter.org>\n\t<20170920091438.GB4053@redhat.com>\n\t<20170920104958.GA30661@pxdev.xzpeter.org>\n\t<20170920110309.GF4053@redhat.com>\n\t<20170920111849.GB30661@pxdev.xzpeter.org>\n\t<20170920112921.GG4053@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<20170920112921.GG4053@redhat.com>","User-Agent":"Mutt/1.5.24 (2015-08-30)","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.13","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.29]);\n\tThu, 21 Sep 2017 03:45:54 +0000 (UTC)","X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"209.132.183.28","Subject":"Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on\n\tIOWatchPoll","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Laurent Vivier <lvivier@redhat.com>, Fam Zheng <famz@redhat.com>, Juan\n\tQuintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>,\n\tmdroth@linux.vnet.ibm.com, qemu-devel@nongnu.org,\n\tStefan Hajnoczi <shajnocz@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9?=\n\tLureau <marcandre.lureau@gmail.com>,\n\tPaolo Bonzini <pbonzini@redhat.com>, \"Dr . David Alan Gilbert\"\n\t<dgilbert@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}}]