Subject: Re: [Qemu-devel] [PATCH] block: fix bdrv_exceed_iops_limits wait computation
Date: Wed, 20 Mar 2013 14:29:24 +0100
From: Stefan Hajnoczi
To: Benoît Canet
Cc: kwolf@redhat.com, wuzhy@linux.vnet.ibm.com, qemu-devel@nongnu.org
Message-ID: <20130320132924.GB14441@stefanha-thinkpad.muc.redhat.com>
In-Reply-To: <1363770734-30970-2-git-send-email-benoit@irqsave.net>
References: <1363770734-30970-1-git-send-email-benoit@irqsave.net>
 <1363770734-30970-2-git-send-email-benoit@irqsave.net>

On Wed, Mar 20, 2013 at 10:12:14AM +0100, Benoît Canet wrote:
> This patch fixes an I/O throttling behavior triggered by limiting at 150 iops
> and running a load of 50 threads doing random pwrites on a raw virtio device.
>
> After a few moments the iops count starts to oscillate steadily between 0 and
> a value higher than the limit.
>
> As the load keeps running, the period and the amplitude of the oscillation
> increase until I/O bursts reaching the physical storage's max iops count
> occur.
>
> These bursts are followed by guest I/O starvation.
>
> As the period of this oscillation cycle keeps increasing, the cause must be
> a computation error that slowly increases the wait time.
>
> This patch makes the wait time a bit smaller, and tests confirm that it
> solves the oscillating behavior.
>
> Signed-off-by: Benoit Canet
> ---
>  block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 0a062c9..455d8b0 100644
> --- a/block.c
> +++ b/block.c
> @@ -3739,7 +3739,7 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>      }
>
>      /* Calc approx time to dispatch */
> -    wait_time = (ios_base + 1) / iops_limit;
> +    wait_time = ios_base / iops_limit;
>      if (wait_time > elapsed_time) {
>          wait_time = wait_time - elapsed_time;
>      } else {

I tried to reproduce this without your test case, using:

  dd if=/dev/vdb of=/dev/null bs=4096 iflag=direct

I've pasted printfs below, and they reveal that the wait time increases
monotonically!  In other words, dd is slowed down more and more as it runs:

  bs 0x7f56fe187a30 co 0x7f56fe211cb0 throttled for 1426 ms
  bs 0x7f56fe187a30 co 0x7f56fe211cb0 woke up from throttled_reqs after sleeping
  bs 0x7f56fe187a30 co 0x7f56fe211cb0 throttled for 1431 ms
  bs 0x7f56fe187a30 co 0x7f56fe211cb0 woke up from throttled_reqs after sleeping
  bs 0x7f56fe187a30 co 0x7f56fe211cb0 throttled for 1437 ms
  bs 0x7f56fe187a30 co 0x7f56fe211cb0 woke up from throttled_reqs after sleeping
  ...

Killing dd and starting it again resets the accumulated delay (probably
because we end the slice and its state is cleared).  This suggests that
workloads which are constantly at the I/O limit will experience either the
creeping delay or the oscillations you found.

After applying your patch I observed the opposite behavior: the wait time
decreases until it resets itself.  Perhaps we wait less and less until we
simply finish the slice and all values reset:

  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 496 ms
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 489 ms
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 484 ms
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 480 ms
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 474 ms
  ...
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 300 ms
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 299 ms
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 298 ms
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
  bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 494 ms

I'm not confident that I understand the effects of your patch.  Do you have
an explanation for these results?

More digging will probably be necessary to solve the underlying problem here.
diff --git a/block.c b/block.c
index 0a062c9..7a8c9e6 100644
--- a/block.c
+++ b/block.c
@@ -175,7 +175,9 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
     int64_t wait_time = -1;
 
     if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
+        fprintf(stderr, "bs %p co %p waiting for throttled_reqs\n", bs, qemu_coroutine_self());
         qemu_co_queue_wait(&bs->throttled_reqs);
+        fprintf(stderr, "bs %p co %p woke up from throttled_reqs\n", bs, qemu_coroutine_self());
     }
 
     /* In fact, we hope to keep each request's timing, in FIFO mode. The next
@@ -188,7 +190,9 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
     while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
         qemu_mod_timer(bs->block_timer,
                        wait_time + qemu_get_clock_ns(vm_clock));
+        fprintf(stderr, "bs %p co %p throttled for %"PRId64" ms\n", bs, qemu_coroutine_self(), wait_time / 1000000);
         qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
+        fprintf(stderr, "bs %p co %p woke up from throttled_reqs after sleeping\n", bs, qemu_coroutine_self());
     }
 
     qemu_co_queue_next(&bs->throttled_reqs);
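
For anyone who wants to play with the numbers outside of QEMU, below is a
minimal standalone sketch of just the wait computation touched by the patch.
It models only the arithmetic of the quoted hunk; the slice bookkeeping in
bdrv_exceed_iops_limits (slice_time, slice_end, and the conversion of the wait
to nanoseconds) is deliberately left out, so this illustrates the off-by-one
rather than reproducing the throttling code.  The names ios_base, iops_limit,
and elapsed_time mirror the patch; everything else (calc_wait, the dummy
workload in main) is made up for the example.

/* wait_calc.c - standalone sketch of the wait computation discussed above.
 * Only the arithmetic of the quoted hunk is modelled; the slice handling in
 * block.c is intentionally omitted.
 */
#include <stdio.h>

/* Approximate time (in seconds) until the next request may be dispatched. */
static double calc_wait(double ios_base, double iops_limit,
                        double elapsed_time, int plus_one)
{
    double wait_time = (ios_base + (plus_one ? 1.0 : 0.0)) / iops_limit;

    if (wait_time > elapsed_time) {
        return wait_time - elapsed_time;
    }
    return 0.0;
}

int main(void)
{
    const double iops_limit = 150.0;   /* limit used in the bug report */
    double ios_base;

    /* Model a guest running exactly at the limit: the N-th request of the
     * slice arrives at N / iops_limit seconds.  Compare both formulas.
     */
    for (ios_base = 50; ios_base <= 150; ios_base += 50) {
        double elapsed_time = ios_base / iops_limit;

        printf("ios_base=%3.0f  old (+1): %.4f s  new: %.4f s\n",
               ios_base,
               calc_wait(ios_base, iops_limit, elapsed_time, 1),
               calc_wait(ios_base, iops_limit, elapsed_time, 0));
    }
    return 0;
}

Under this simplified model the old formula charges a constant 1 / iops_limit
of extra wait per request, about 6.7 ms at 150 iops, even when the guest is
exactly at the limit, while the patched formula charges nothing.  How that
extra charge interacts with the slice-extension logic to produce the monotonic
creep in the log above is exactly the part that still needs digging.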