From patchwork Tue Jul 13 19:23:38 2010
From: Christian Brunner
Date: Tue, 13 Jul 2010 21:23:38 +0200
To: Yehuda Sadeh Weinraub
Cc: Kevin Wolf, ceph-devel@vger.kernel.org, Simone Gotti,
 qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] Re: [PATCH] ceph/rbd block driver for qemu-kvm (v3)
Message-ID: <20100713192338.GA25126@sir.home>
References: <20100531193140.GA13993@chb-desktop> <4C1293B7.1060307@gmail.com>
 <4C1B45DB.4000502@redhat.com>

On Tue, Jul 13, 2010 at 11:27:03AM -0700, Yehuda Sadeh Weinraub wrote:
> > There is another problem with very large I/O requests. I suspect
> > that this can be triggered only with qemu-io and not in kvm, but
> > I'll try to find a proper solution for it anyway.
>
> Have you made any progress with this issue? Just note that there were
> a few changes we introduced recently (a format change that allows
> renaming of rbd images, and some snapshot support), so everything
> will need to be reposted once we figure out the aio issue.

Attached is a patch where I'm trying to solve the issue with pthreads
locking. It works well with qemu-io, but I'm not sure whether it
interferes with other threads in qemu/kvm (I haven't had time to test
this yet).

Another thing I'm not sure about is that these large I/O requests only
seem to happen with qemu-io; I've never seen them inside a virtual
machine.
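To make that concrete: a single oversized request from qemu-io should
be enough to reproduce the warning. Something along these lines ought
to trigger it (the pool and image names here are only examples, and
the size needed may vary):

    qemu-io -c "write 0 64M" rbd:rbd/test

A guest kernel, by contrast, splits requests at its own block-layer
limits long before they reach this size, which would explain why the
problem never shows up inside a virtual machine.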
So do we really have to fix this at all, when the only visible effect
is a "laggy" warning message?

Regards,
Christian

From fcef3d897e0357b252a189ed59e43bfd5c24d229 Mon Sep 17 00:00:00 2001
From: Christian Brunner
Date: Tue, 22 Jun 2010 21:51:09 +0200
Subject: [PATCH 27/27] add queueing delay based on queuesize

---
 block/rbd.c |   31 ++++++++++++++++++++++++++++++-
 1 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 10daf20..c6693d7 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -24,7 +24,7 @@
 #include
 #include

-
+#include <pthread.h>

 int eventfd(unsigned int initval, int flags);

@@ -50,6 +50,7 @@ int eventfd(unsigned int initval, int flags);
  */
 #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)

+#define MAX_QUEUE_SIZE 33554432 /* 32MB */

 typedef struct RBDAIOCB {
     BlockDriverAIOCB common;
@@ -79,6 +80,9 @@ typedef struct BDRVRBDState {
     uint64_t size;
     uint64_t objsize;
     int qemu_aio_count;
+    uint64_t queuesize;
+    pthread_mutex_t *queue_mutex;
+    pthread_cond_t *queue_threshold;
 } BDRVRBDState;

 typedef struct rbd_obj_header_ondisk RbdHeader1;
@@ -334,6 +338,12 @@ static int rbd_open(BlockDriverState *bs, const char *filename, int flags)
     le64_to_cpus((uint64_t *) & header->image_size);
     s->size = header->image_size;
     s->objsize = 1 << header->options.order;
+    s->queuesize = 0;
+
+    s->queue_mutex = qemu_malloc(sizeof(pthread_mutex_t));
+    pthread_mutex_init(s->queue_mutex, NULL);
+    s->queue_threshold = qemu_malloc(sizeof(pthread_cond_t));
+    pthread_cond_init(s->queue_threshold, NULL);

     s->efd = eventfd(0, 0);
     if (s->efd < 0) {
@@ -356,6 +366,11 @@ static void rbd_close(BlockDriverState *bs)
 {
     BDRVRBDState *s = bs->opaque;

+    pthread_cond_destroy(s->queue_threshold);
+    qemu_free(s->queue_threshold);
+    pthread_mutex_destroy(s->queue_mutex);
+    qemu_free(s->queue_mutex);
+
     rados_close_pool(s->pool);
     rados_deinitialize();
 }
@@ -443,6 +458,12 @@ static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb)
     int i;

     acb->aiocnt--;
+    acb->s->queuesize -= rcb->segsize;
+    if (acb->s->queuesize+rcb->segsize > MAX_QUEUE_SIZE && acb->s->queuesize <= MAX_QUEUE_SIZE) {
+        pthread_mutex_lock(acb->s->queue_mutex);
+        pthread_cond_signal(acb->s->queue_threshold);
+        pthread_mutex_unlock(acb->s->queue_mutex);
+    }
     r = rados_aio_get_return_value(c);
     rados_aio_release(c);
     if (acb->write) {
@@ -560,6 +581,14 @@ static BlockDriverAIOCB *rbd_aio_rw_vector(BlockDriverState *bs,
     rcb->segsize = segsize;
     rcb->buf = buf;

+    while (s->queuesize > MAX_QUEUE_SIZE) {
+        pthread_mutex_lock(s->queue_mutex);
+        pthread_cond_wait(s->queue_threshold, s->queue_mutex);
+        pthread_mutex_unlock(s->queue_mutex);
+    }
+
+    s->queuesize += segsize;
+
     if (write) {
         rados_aio_create_completion(rcb, NULL,
             (rados_callback_t) rbd_finish_aiocb,
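One more note on the locking, for whoever picks this up: the textbook
condition-variable idiom keeps every access to the shared counter,
including the predicate check, under the mutex, because
pthread_cond_wait() releases the lock atomically while sleeping, so no
signal can slip in between the test and the wait. Here is a minimal,
self-contained sketch of that idiom (function and variable names are
illustrative, and the file-scope statics stand in for the per-driver
fields in BDRVRBDState; this is not a drop-in replacement for the
hunks above):

    #include <pthread.h>
    #include <stdint.h>

    #define MAX_QUEUE_SIZE 33554432 /* 32MB, as in the patch */

    static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t queue_threshold = PTHREAD_COND_INITIALIZER;
    static uint64_t queuesize; /* bytes currently in flight */

    /* Submission path: block until there is room, then account for
     * the new request. The while loop rechecks the predicate after
     * every wakeup, which also makes spurious wakeups harmless. */
    static void queue_wait_and_add(uint64_t segsize)
    {
        pthread_mutex_lock(&queue_mutex);
        while (queuesize > MAX_QUEUE_SIZE) {
            pthread_cond_wait(&queue_threshold, &queue_mutex);
        }
        queuesize += segsize;
        pthread_mutex_unlock(&queue_mutex);
    }

    /* Completion path: give the room back and wake one waiter. */
    static void queue_sub_and_signal(uint64_t segsize)
    {
        pthread_mutex_lock(&queue_mutex);
        queuesize -= segsize;
        pthread_cond_signal(&queue_threshold);
        pthread_mutex_unlock(&queue_mutex);
    }

As posted, the patch reads and updates s->queuesize outside the mutex:
a signal that fires between the unlocked check in rbd_aio_rw_vector()
and the pthread_cond_wait() would be lost, and the unlocked += / -=
can race between the submitting thread and the librados callback
thread. Keeping all of it under one lock would avoid both problems.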