From patchwork Sun Aug 14 04:04:49 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Avi Kivity X-Patchwork-Id: 109954 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [140.186.70.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id BDF4BB6F72 for ; Sun, 14 Aug 2011 14:05:06 +1000 (EST) Received: from localhost ([::1]:34402 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QsRwc-00017m-Ib for incoming@patchwork.ozlabs.org; Sun, 14 Aug 2011 00:05:02 -0400 Received: from eggs.gnu.org ([140.186.70.92]:46554) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QsRwW-00017f-OK for qemu-devel@nongnu.org; Sun, 14 Aug 2011 00:04:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QsRwV-0002qy-CT for qemu-devel@nongnu.org; Sun, 14 Aug 2011 00:04:56 -0400 Received: from mx1.redhat.com ([209.132.183.28]:27413) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QsRwV-0002qu-2X for qemu-devel@nongnu.org; Sun, 14 Aug 2011 00:04:55 -0400 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p7E44rW9007742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sun, 14 Aug 2011 00:04:53 -0400 Received: from cleopatra.tlv.redhat.com (cleopatra.tlv.redhat.com [10.35.255.11]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p7E44pIt006463; Sun, 14 Aug 2011 00:04:52 -0400 Received: from s01.tlv.redhat.com (s01.tlv.redhat.com [10.35.255.8]) by cleopatra.tlv.redhat.com (Postfix) with ESMTP id 506C3250B48; Sun, 14 Aug 2011 07:04:51 +0300 (IDT) From: Avi Kivity To: Anthony Liguori , Kevin Wolf Date: Sun, 14 Aug 2011 07:04:49 +0300 Message-Id: <1313294689-21572-1-git-send-email-avi@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id p7E44rW9007742 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 209.132.183.28 Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org Subject: [Qemu-devel] [PATCH v2] posix-aio-compat: fix latency issues X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org In certain circumstances, posix-aio-compat can incur a lot of latency: - threads are created by vcpu threads, so if vcpu affinity is set, aio threads inherit vcpu affinity. This can cause many aio threads to compete for one cpu. - we can create up to max_threads (64) aio threads in one go; since a pthread_create can take around 30μs, we have up to 2ms of cpu time under a global lock. Fix by: - moving thread creation to the main thread, so we inherit the main thread's affinity instead of the vcpu thread's affinity. - if a thread is currently being created, and we need to create yet another thread, let thread being born create the new thread, reducing the amount of time we spend under the main thread. - drop the local lock while creating a thread (we may still hold the global mutex, though) Note this doesn't eliminate latency completely; scheduler artifacts or lack of host cpu resources can still cause it. We may want pre-allocated threads when this cannot be tolerated. Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions. Signed-off-by: Avi Kivity --- v2: simplify do_spawn_thread() locking posix-aio-compat.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 42 insertions(+), 2 deletions(-) diff --git a/posix-aio-compat.c b/posix-aio-compat.c index 8dc00cb..c3febfb 100644 --- a/posix-aio-compat.c +++ b/posix-aio-compat.c @@ -30,6 +30,7 @@ #include "block/raw-posix-aio.h" +static void do_spawn_thread(void); struct qemu_paiocb { BlockDriverAIOCB common; @@ -64,6 +65,9 @@ static pthread_attr_t attr; static int max_threads = 64; static int cur_threads = 0; static int idle_threads = 0; +static int new_threads = 0; /* backlog of threads we need to create */ +static int pending_threads = 0; /* threads created but not running yet */ +static QEMUBH *new_thread_bh; static QTAILQ_HEAD(, qemu_paiocb) request_list; #ifdef CONFIG_PREADV @@ -311,6 +315,11 @@ static void *aio_thread(void *unused) pid = getpid(); + mutex_lock(&lock); + pending_threads--; + mutex_unlock(&lock); + do_spawn_thread(); + while (1) { struct qemu_paiocb *aiocb; ssize_t ret = 0; @@ -381,11 +390,20 @@ static void *aio_thread(void *unused) return NULL; } -static void spawn_thread(void) +static void do_spawn_thread(void) { sigset_t set, oldset; - cur_threads++; + mutex_lock(&lock); + if (!new_threads) { + mutex_unlock(&lock); + return; + } + + new_threads--; + pending_threads++; + + mutex_unlock(&lock); /* block all signals */ if (sigfillset(&set)) die("sigfillset"); @@ -396,6 +414,27 @@ static void spawn_thread(void) if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore"); } +static void spawn_thread_bh_fn(void *opaque) +{ + do_spawn_thread(); +} + +static void spawn_thread(void) +{ + cur_threads++; + new_threads++; + /* If there are threads being created, they will spawn new workers, so + * we don't spend time creating many threads in a loop holding a mutex or + * starving the current vcpu. + * + * If there are no idle threads, ask the main thread to create one, so we + * inherit the correct affinity instead of the vcpu affinity. + */ + if (!pending_threads) { + qemu_bh_schedule(new_thread_bh); + } +} + static void qemu_paio_submit(struct qemu_paiocb *aiocb) { aiocb->ret = -EINPROGRESS; @@ -665,6 +704,7 @@ int paio_init(void) die2(ret, "pthread_attr_setdetachstate"); QTAILQ_INIT(&request_list); + new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL); posix_aio_state = s; return 0;