From patchwork Sat May 31 18:29:05 2014
X-Patchwork-Submitter: Marcin Gibuła
X-Patchwork-Id: 354518
Message-ID: <538A1F71.4080203@beyond.pl>
Date: Sat, 31 May 2014 20:29:05 +0200
From: Marcin Gibuła
To: "qemu-devel@nongnu.org"
Cc: Stefan Hajnoczi
Subject: [Qemu-devel] [PATCH] thread-pool: fix deadlock when callbacks depend on each other

When two coroutines submit I/O and the first coroutine depends on the second
to complete (by calling bdrv_drain_all), a deadlock may occur.

This happens because both requests may have completed before the thread pool
notifier was called. When the notifier then runs and the first coroutine calls
aio_poll() to make progress, it hangs forever, because the notifier's
descriptor has already been cleared.

This patch fixes the problem by rearming the thread pool notifier when more
than one completed request is on the list.

Without this patch I could reproduce the bug with snapshot-commit about once
in 10 tries. With this patch I could no longer reproduce it.
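To make the dependency concrete, the failing pattern looks roughly like the
sketch below. It is illustrative only and not taken from the QEMU tree: the
two callbacks and the second_done flag are hypothetical, while aio_poll() and
AioContext are the real APIs the completion path goes through.

#include <stdbool.h>
#include "block/aio.h"

/* Illustrative sketch of the dependency that deadlocks.  first_cb() stands in
 * for the coroutine that calls bdrv_drain_all(); second_cb() is the
 * completion it waits for. */

static bool second_done;

static void second_cb(void *opaque, int ret)
{
    second_done = true;
}

static void first_cb(void *opaque, int ret)
{
    AioContext *ctx = opaque;

    /* Stand-in for bdrv_drain_all(): keep polling until the second request's
     * completion has been delivered.  If both worker threads finished before
     * the notifier callback ran, the notifier has already been cleared;
     * without rearming it, this aio_poll() sleeps forever even though the
     * second request is long done. */
    while (!second_done) {
        aio_poll(ctx, true);
    }
}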
Signed-off-by: Marcin Gibula

--- thread-pool.c	2014-04-17 15:44:45.000000000 +0200
+++ thread-pool.c	2014-05-31 20:20:26.083011514 +0200
@@ -76,6 +76,8 @@ struct ThreadPool {
     int new_threads;     /* backlog of threads we need to create */
     int pending_threads; /* threads created but not running yet */
     int pending_cancellations; /* whether we need a cond_broadcast */
+    int pending_completions; /* whether we need to rearm notifier when
+                                executing callback */
     bool stopping;
 };
 
@@ -110,6 +112,10 @@ static void *worker_thread(void *opaque)
         ret = req->func(req->arg);
 
         req->ret = ret;
+        if (req->common.cb) {
+            pool->pending_completions++;
+        }
+
         /* Write ret before state. */
         smp_wmb();
         req->state = THREAD_DONE;
@@ -185,6 +191,14 @@ restart:
         }
         if (elem->state == THREAD_DONE && elem->common.cb) {
             QLIST_REMOVE(elem, all);
+            /* If more completed requests are waiting, the notifier needs
+             * to be rearmed so the callback can make progress with
+             * aio_poll(). */
+            pool->pending_completions--;
+            if (pool->pending_completions) {
+                event_notifier_set(notifier);
+            }
+
             /* Read state before ret. */
             smp_rmb();
             elem->common.cb(elem->common.opaque, elem->ret);
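For readers without the file at hand, the completion path with this patch
applied looks roughly as follows. This is a condensed paraphrase of
event_notifier_ready() in thread-pool.c, not the verbatim result of applying
the hunks: tracing and cancellation handling are omitted and details may
differ slightly from the actual source.

static void event_notifier_ready(EventNotifier *notifier)
{
    ThreadPool *pool = container_of(notifier, ThreadPool, notifier);
    ThreadPoolElement *elem, *next;

    /* Clearing the notifier up front is what used to lose wakeups: any
     * request that had already completed could only be delivered from this
     * function, yet nothing was left pending to wake aio_poll() again. */
    event_notifier_test_and_clear(notifier);

restart:
    QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
        if (elem->state != THREAD_DONE || !elem->common.cb) {
            continue;
        }
        QLIST_REMOVE(elem, all);

        /* New in this patch: if other completed requests are still queued,
         * rearm the notifier so a callback that re-enters aio_poll()
         * (e.g. via bdrv_drain_all) can still receive them. */
        pool->pending_completions--;
        if (pool->pending_completions) {
            event_notifier_set(notifier);
        }

        /* Read state before ret. */
        smp_rmb();
        elem->common.cb(elem->common.opaque, elem->ret);
        qemu_aio_release(elem);
        goto restart;
    }
}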