From patchwork Thu Aug 9 13:22:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fam Zheng X-Patchwork-Id: 955579 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41mTVf1nBbz9s0n for ; Thu, 9 Aug 2018 23:24:09 +1000 (AEST) Received: from localhost ([::1]:50839 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fnkv8-0007sx-3m for incoming@patchwork.ozlabs.org; Thu, 09 Aug 2018 09:24:06 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53223) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fnkuP-0007iw-NU for qemu-devel@nongnu.org; Thu, 09 Aug 2018 09:23:22 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fnkuO-0002U8-OT for qemu-devel@nongnu.org; Thu, 09 Aug 2018 09:23:21 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:35480 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fnkuM-0002P4-Ge; Thu, 09 Aug 2018 09:23:18 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0EFDB402332F; Thu, 9 Aug 2018 13:23:18 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6E5232314D; Thu, 9 Aug 2018 13:23:14 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 9 Aug 2018 21:22:58 +0800 Message-Id: <20180809132259.18402-2-famz@redhat.com> In-Reply-To: <20180809132259.18402-1-famz@redhat.com> References: <20180809132259.18402-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Thu, 09 Aug 2018 13:23:18 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Thu, 09 Aug 2018 13:23:18 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v3 1/2] aio-posix: Don't count ctx->notifier as progress when polling X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Fam Zheng , qemu-block@nongnu.org, Stefan Weil , qemu-stable@nongnu.org, Stefan Hajnoczi , pbonzini@redhat.com, lersek@redhat.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" The same logic exists in fd polling. This change is especially important to avoid busy loop once we limit aio_notify_accept() to blocking aio_poll(). Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng --- util/aio-posix.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/util/aio-posix.c b/util/aio-posix.c index 118bf5784b..b5c7f463aa 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -494,7 +494,8 @@ static bool run_poll_handlers_once(AioContext *ctx) QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) { if (!node->deleted && node->io_poll && aio_node_check(ctx, node->is_external) && - node->io_poll(node->opaque)) { + node->io_poll(node->opaque) && + node->opaque != &ctx->notifier) { progress = true; } From patchwork Thu Aug 9 13:22:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fam Zheng X-Patchwork-Id: 955580 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41mTVv5xWfz9s0n for ; Thu, 9 Aug 2018 23:24:23 +1000 (AEST) Received: from localhost ([::1]:50841 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fnkvN-00085w-FA for incoming@patchwork.ozlabs.org; Thu, 09 Aug 2018 09:24:21 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53299) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fnkuY-0007re-40 for qemu-devel@nongnu.org; Thu, 09 Aug 2018 09:23:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fnkuT-0002aT-UE for qemu-devel@nongnu.org; Thu, 09 Aug 2018 09:23:29 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:35486 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fnkuQ-0002Ws-QS; Thu, 09 Aug 2018 09:23:22 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5D9B7402332F; Thu, 9 Aug 2018 13:23:22 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id B92532314D; Thu, 9 Aug 2018 13:23:18 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Thu, 9 Aug 2018 21:22:59 +0800 Message-Id: <20180809132259.18402-3-famz@redhat.com> In-Reply-To: <20180809132259.18402-1-famz@redhat.com> References: <20180809132259.18402-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Thu, 09 Aug 2018 13:23:22 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Thu, 09 Aug 2018 13:23:22 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Fam Zheng , qemu-block@nongnu.org, Stefan Weil , qemu-stable@nongnu.org, Stefan Hajnoczi , pbonzini@redhat.com, lersek@redhat.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" An aio_notify() pairs with an aio_notify_accept(). The former should happen in the main thread or a vCPU thread, and the latter should be done in the IOThread. There is one rare case that the main thread or vCPU thread may "steal" the aio_notify() event just raised by itself, in bdrv_set_aio_context() [1]. The sequence is like this: main thread IO Thread =============================================================== bdrv_drained_begin() aio_disable_external(ctx) aio_poll(ctx, true) ctx->notify_me += 2 ... bdrv_drained_end() ... aio_notify() ... bdrv_set_aio_context() aio_poll(ctx, false) [1] aio_notify_accept(ctx) ppoll() /* Hang! */ [1] is problematic. It will clear the ctx->notifier event so that the blocked ppoll() will not return. (For the curious, this bug was noticed when booting a number of VMs simultaneously in RHV. One or two of the VMs will hit this race condition, making the VIRTIO device unresponsive to I/O commands. When it hangs, Seabios is busy waiting for a read request to complete (read MBR), right after initializing the virtio-blk-pci device, using 100% guest CPU. See also https://bugzilla.redhat.com/show_bug.cgi?id=1562750 for the original bug analysis.) aio_notify() only injects an event when ctx->notify_me is set, correspondingly aio_notify_accept() is only useful when ctx->notify_me _was_ set. Move the call to it into the "blocking" branch. This will effectively skip [1] and fix the hang. Furthermore, blocking aio_poll is only allowed on home thread (in_aio_context_home_thread), because otherwise two blocking aio_poll()'s can steal each other's ctx->notifier event and cause hanging just like described above. Cc: qemu-stable@nongnu.org Suggested-by: Paolo Bonzini Signed-off-by: Fam Zheng --- util/aio-posix.c | 4 ++-- util/aio-win32.c | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/util/aio-posix.c b/util/aio-posix.c index b5c7f463aa..b5c609b68b 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking) * so disable the optimization now. */ if (blocking) { + assert(in_aio_context_home_thread(ctx)); atomic_add(&ctx->notify_me, 2); } @@ -633,6 +634,7 @@ bool aio_poll(AioContext *ctx, bool blocking) if (blocking) { atomic_sub(&ctx->notify_me, 2); + aio_notify_accept(ctx); } /* Adjust polling time */ @@ -676,8 +678,6 @@ bool aio_poll(AioContext *ctx, bool blocking) } } - aio_notify_accept(ctx); - /* if we have any readable fds, dispatch event */ if (ret > 0) { for (i = 0; i < npfd; i++) { diff --git a/util/aio-win32.c b/util/aio-win32.c index e676a8d9b2..c58957cc4b 100644 --- a/util/aio-win32.c +++ b/util/aio-win32.c @@ -373,11 +373,12 @@ bool aio_poll(AioContext *ctx, bool blocking) ret = WaitForMultipleObjects(count, events, FALSE, timeout); if (blocking) { assert(first); + assert(in_aio_context_home_thread(ctx)); atomic_sub(&ctx->notify_me, 2); + aio_notify_accept(ctx); } if (first) { - aio_notify_accept(ctx); progress |= aio_bh_poll(ctx); first = false; }