From patchwork Wed Nov 8 06:34:47 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergio Lopez X-Patchwork-Id: 835663 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yWxPk2XG7z9s7C for ; Wed, 8 Nov 2017 17:35:35 +1100 (AEDT) Received: from localhost ([::1]:57770 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eCJxS-0004nE-Cb for incoming@patchwork.ozlabs.org; Wed, 08 Nov 2017 01:35:30 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50561) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eCJx1-0004m4-6G for qemu-devel@nongnu.org; Wed, 08 Nov 2017 01:35:04 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eCJx0-0003wN-BH for qemu-devel@nongnu.org; Wed, 08 Nov 2017 01:35:03 -0500 Received: from mx1.redhat.com ([209.132.183.28]:56926) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eCJwv-0003uv-Sv; Wed, 08 Nov 2017 01:34:58 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CAB7D7EA8E; Wed, 8 Nov 2017 06:34:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com CAB7D7EA8E Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=slp@redhat.com Received: from kthompson.redhat.com (ovpn-116-141.ams2.redhat.com [10.36.116.141]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8737C5E1DE; Wed, 8 Nov 2017 06:34:49 +0000 (UTC) From: Sergio Lopez To: qemu-devel@nongnu.org Date: Wed, 8 Nov 2017 07:34:47 +0100 Message-Id: <20171108063447.2842-1-slp@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 08 Nov 2017 06:34:56 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v3] util/async: use atomic_mb_set in qemu_bh_cancel X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sergio Lopez , pbonzini@redhat.com, famz@redhat.com, stefanha@redhat.com, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Commit b7a745d added a qemu_bh_cancel call to the completion function as an optimization to prevent it from unnecessarily rescheduling itself. This completion function is scheduled from worker_thread, after setting the state of a ThreadPoolElement to THREAD_DONE. This was considered to be safe, as the completion function restarts the loop just after the call to qemu_bh_cancel. But, under certain access patterns and scheduling conditions, the loop may wrongly use a pre-fetched elem->state value, reading it as THREAD_QUEUED, and ending the completion function without having processed a pending TPE linked at pool->head: worker thread | I/O thread ------------------------------------------------------------------------ | speculatively read req->state req->state = THREAD_DONE; | qemu_bh_schedule(p->completion_bh) | bh->scheduled = 1; | | qemu_bh_cancel(p->completion_bh) | bh->scheduled = 0; | if (req->state == THREAD_DONE) | // sees THREAD_QUEUED The source of the misunderstanding was that qemu_bh_cancel is now being used by the _consumer_ rather than the producer, and therefore now needs to have acquire semantics just like e.g. aio_bh_poll. In some situations, if there are no other independent requests in the same aio context that could eventually trigger the scheduling of the completion function, the omitted TPE and all operations pending on it will get stuck forever. Signed-off-by: Sergio Lopez --- util/async.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/util/async.c b/util/async.c index 355af73ee7..0e1bd8780a 100644 --- a/util/async.c +++ b/util/async.c @@ -174,7 +174,7 @@ void qemu_bh_schedule(QEMUBH *bh) */ void qemu_bh_cancel(QEMUBH *bh) { - bh->scheduled = 0; + atomic_mb_set(&bh->scheduled, 0); } /* This func is async.The bottom half will do the delete action at the finial