From patchwork Tue Oct 9 16:26:23 2012
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 190360
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re:
 [RFC PATCH 00/17] Support for multiple "AIO contexts""
Message-ID: <5074502F.5030706@redhat.com>
Date: Tue, 09 Oct 2012 18:26:23 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, Ping Fan Liu, qemu-devel@nongnu.org,
 Avi Kivity
In-Reply-To: <87391n8xmq.fsf@codemonkey.ws>
References: <1348577763-12920-1-git-send-email-pbonzini@redhat.com>
 <20121008113932.GB16332@stefanha-thinkpad.redhat.com>
 <5072CE54.8020208@redhat.com>
 <20121009090811.GB13775@stefanha-thinkpad.redhat.com>
 <877gqzn0xc.fsf@codemonkey.ws> <50743D91.4010900@redhat.com>
 <87391n8xmq.fsf@codemonkey.ws>

On 09/10/2012 17:37, Anthony Liguori wrote:
>>>>> In the very short term, I can imagine an aio fastpath that was only
>>>>> implemented in terms of the device API.  We could have a slow path
>>>>> that acquired the BQL.
>>>
>>> Not sure I follow.
>
> As long as the ioeventfd thread can acquire qemu_mutex in order to call
> bdrv_* functions.  The new device-only API could do this under the
> covers for everything but the linux-aio fast path initially.

Ok, so it's about the locking.  I'm not even sure we need locking at all
if we have cooperative multitasking.  For example, if
bdrv_aio_readv/writev is called from a VCPU thread, it can simply
schedule a bottom half for itself in the appropriate AioContext.  The
same applies to block jobs.

The only part where I'm not sure how this would work is bdrv_read/write,
because of the strange "qemu_aio_wait() calls select with a lock taken"
pattern.  Maybe we can simply forbid synchronous I/O when a non-default
AioContext is set.
This would be entirely hidden in the block layer.  For example, the
patch at the end of this mail does it for bdrv_aio_readv/writev.  Then
we can add a bdrv_aio_readv/writev_unlocked API to the protocols, which
would run outside the bottom half and provide the desired fast path.

Paolo

> That means that we can convert block devices to use the device-only API
> across the board (provided we make BQL recursive).
>
> It also means we get at least some of the benefits of data-plane in the
> short term.

diff --git a/block.c b/block.c
index e95f613..7165e82 100644
--- a/block.c
+++ b/block.c
@@ -3712,15 +3712,6 @@ static AIOPool bdrv_em_co_aio_pool = {
     .cancel = bdrv_aio_co_cancel_em,
 };
 
-static void bdrv_co_em_bh(void *opaque)
-{
-    BlockDriverAIOCBCoroutine *acb = opaque;
-
-    acb->common.cb(acb->common.opaque, acb->req.error);
-    qemu_bh_delete(acb->bh);
-    qemu_aio_release(acb);
-}
-
 /* Invoke bdrv_co_do_readv/bdrv_co_do_writev */
 static void coroutine_fn bdrv_co_do_rw(void *opaque)
 {
@@ -3735,8 +3726,18 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
                                  acb->req.nb_sectors, acb->req.qiov, 0);
     }
 
-    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
-    qemu_bh_schedule(acb->bh);
+    acb->common.cb(acb->common.opaque, acb->req.error);
+    qemu_aio_release(acb);
+}
+
+static void bdrv_co_em_bh(void *opaque)
+{
+    BlockDriverAIOCBCoroutine *acb = opaque;
+    Coroutine *co;
+
+    qemu_bh_delete(acb->bh);
+    co = qemu_coroutine_create(bdrv_co_do_rw);
+    qemu_coroutine_enter(co, acb);
 }
 
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
@@ -3756,8 +3757,8 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
     acb->req.qiov = qiov;
     acb->is_write = is_write;
 
-    co = qemu_coroutine_create(bdrv_co_do_rw);
-    qemu_coroutine_enter(co, acb);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+    qemu_bh_schedule(acb->bh);
 
     return &acb->common;
 }