From patchwork Fri Sep 6 15:38:32 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 273235 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 49C572C010A for ; Sat, 7 Sep 2013 01:40:09 +1000 (EST) Received: from localhost ([::1]:38404 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VHy8l-0004Ft-4m for incoming@patchwork.ozlabs.org; Fri, 06 Sep 2013 11:40:07 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40123) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VHy8B-000494-1d for qemu-devel@nongnu.org; Fri, 06 Sep 2013 11:39:35 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VHy86-000557-Gq for qemu-devel@nongnu.org; Fri, 06 Sep 2013 11:39:30 -0400 Received: from mx1.redhat.com ([209.132.183.28]:5868) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VHy86-00054l-6N for qemu-devel@nongnu.org; Fri, 06 Sep 2013 11:39:26 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r86FdKUF012679 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 6 Sep 2013 11:39:20 -0400 Received: from localhost (ovpn-112-53.ams2.redhat.com [10.36.112.53]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r86FdIH0010384; Fri, 6 Sep 2013 11:39:19 -0400 From: Stefan Hajnoczi To: Date: Fri, 6 Sep 2013 17:38:32 +0200 Message-Id: <1378481953-23099-2-git-send-email-stefanha@redhat.com> In-Reply-To: <1378481953-23099-1-git-send-email-stefanha@redhat.com> References: <1378481953-23099-1-git-send-email-stefanha@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id r86FdKUF012679 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.28 Cc: Stefan Hajnoczi , =?UTF-8?q?Beno=C3=AEt=20Canet?= , Anthony Liguori Subject: [Qemu-devel] [PULL 01/42] throttle: Add a new throttling API implementing continuous leaky bucket. X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org From: Benoît Canet Implement the continuous leaky bucket algorithm devised on IRC as a separate module. Signed-off-by: Benoit Canet Signed-off-by: Stefan Hajnoczi --- include/qemu/throttle.h | 110 ++++++++++++++ util/Makefile.objs | 1 + util/throttle.c | 396 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 507 insertions(+) create mode 100644 include/qemu/throttle.h create mode 100644 util/throttle.c diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h new file mode 100644 index 0000000..ab29b0b --- /dev/null +++ b/include/qemu/throttle.h @@ -0,0 +1,110 @@ +/* + * QEMU throttling infrastructure + * + * Copyright (C) Nodalink, SARL. 2013 + * + * Author: + * Benoît Canet + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 or + * (at your option) version 3 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see . + */ + +#ifndef THROTTLE_H +#define THROTTLE_H + +#include +#include "qemu-common.h" +#include "qemu/timer.h" + +#define NANOSECONDS_PER_SECOND 1000000000.0 + +typedef enum { + THROTTLE_BPS_TOTAL, + THROTTLE_BPS_READ, + THROTTLE_BPS_WRITE, + THROTTLE_OPS_TOTAL, + THROTTLE_OPS_READ, + THROTTLE_OPS_WRITE, + BUCKETS_COUNT, +} BucketType; + +/* + * The max parameter of the leaky bucket throttling algorithm can be used to + * allow the guest to do bursts. + * The max value is a pool of I/O that the guest can use without being throttled + * at all. Throttling is triggered once this pool is empty. + */ + +typedef struct LeakyBucket { + double avg; /* average goal in units per second */ + double max; /* leaky bucket max burst in units */ + double level; /* bucket level in units */ +} LeakyBucket; + +/* The following structure is used to configure a ThrottleState + * It contains a bit of state: the bucket field of the LeakyBucket structure. + * However it allows to keep the code clean and the bucket field is reset to + * zero at the right time. + */ +typedef struct ThrottleConfig { + LeakyBucket buckets[BUCKETS_COUNT]; /* leaky buckets */ + uint64_t op_size; /* size of an operation in bytes */ +} ThrottleConfig; + +typedef struct ThrottleState { + ThrottleConfig cfg; /* configuration */ + int64_t previous_leak; /* timestamp of the last leak done */ + QEMUTimer * timers[2]; /* timers used to do the throttling */ + QEMUClockType clock_type; /* the clock used */ +} ThrottleState; + +/* operations on single leaky buckets */ +void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta); + +int64_t throttle_compute_wait(LeakyBucket *bkt); + +/* expose timer computation function for unit tests */ +bool throttle_compute_timer(ThrottleState *ts, + bool is_write, + int64_t now, + int64_t *next_timestamp); + +/* init/destroy cycle */ +void throttle_init(ThrottleState *ts, + QEMUClockType clock_type, + void (read_timer)(void *), + void (write_timer)(void *), + void *timer_opaque); + +void throttle_destroy(ThrottleState *ts); + +bool throttle_have_timer(ThrottleState *ts); + +/* configuration */ +bool throttle_enabled(ThrottleConfig *cfg); + +bool throttle_conflicting(ThrottleConfig *cfg); + +bool throttle_is_valid(ThrottleConfig *cfg); + +void throttle_config(ThrottleState *ts, ThrottleConfig *cfg); + +void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg); + +/* usage */ +bool throttle_schedule_timer(ThrottleState *ts, bool is_write); + +void throttle_account(ThrottleState *ts, bool is_write, uint64_t size); + +#endif diff --git a/util/Makefile.objs b/util/Makefile.objs index dc72ab0..2bb13a2 100644 --- a/util/Makefile.objs +++ b/util/Makefile.objs @@ -11,3 +11,4 @@ util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o util-obj-y += qemu-option.o qemu-progress.o util-obj-y += hexdump.o util-obj-y += crc32c.o +util-obj-y += throttle.o diff --git a/util/throttle.c b/util/throttle.c new file mode 100644 index 0000000..02e6f15 --- /dev/null +++ b/util/throttle.c @@ -0,0 +1,396 @@ +/* + * QEMU throttling infrastructure + * + * Copyright (C) Nodalink, SARL. 2013 + * + * Author: + * Benoît Canet + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 or + * (at your option) version 3 of the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see . + */ + +#include "qemu/throttle.h" +#include "qemu/timer.h" + +/* This function make a bucket leak + * + * @bkt: the bucket to make leak + * @delta_ns: the time delta + */ +void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta_ns) +{ + double leak; + + /* compute how much to leak */ + leak = (bkt->avg * (double) delta_ns) / NANOSECONDS_PER_SECOND; + + /* make the bucket leak */ + bkt->level = MAX(bkt->level - leak, 0); +} + +/* Calculate the time delta since last leak and make proportionals leaks + * + * @now: the current timestamp in ns + */ +static void throttle_do_leak(ThrottleState *ts, int64_t now) +{ + /* compute the time elapsed since the last leak */ + int64_t delta_ns = now - ts->previous_leak; + int i; + + ts->previous_leak = now; + + if (delta_ns <= 0) { + return; + } + + /* make each bucket leak */ + for (i = 0; i < BUCKETS_COUNT; i++) { + throttle_leak_bucket(&ts->cfg.buckets[i], delta_ns); + } +} + +/* do the real job of computing the time to wait + * + * @limit: the throttling limit + * @extra: the number of operation to delay + * @ret: the time to wait in ns + */ +static int64_t throttle_do_compute_wait(double limit, double extra) +{ + double wait = extra * NANOSECONDS_PER_SECOND; + wait /= limit; + return wait; +} + +/* This function compute the wait time in ns that a leaky bucket should trigger + * + * @bkt: the leaky bucket we operate on + * @ret: the resulting wait time in ns or 0 if the operation can go through + */ +int64_t throttle_compute_wait(LeakyBucket *bkt) +{ + double extra; /* the number of extra units blocking the io */ + + if (!bkt->avg) { + return 0; + } + + extra = bkt->level - bkt->max; + + if (extra <= 0) { + return 0; + } + + return throttle_do_compute_wait(bkt->avg, extra); +} + +/* This function compute the time that must be waited while this IO + * + * @is_write: true if the current IO is a write, false if it's a read + * @ret: time to wait + */ +static int64_t throttle_compute_wait_for(ThrottleState *ts, + bool is_write) +{ + BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL, + THROTTLE_OPS_TOTAL, + THROTTLE_BPS_READ, + THROTTLE_OPS_READ}, + {THROTTLE_BPS_TOTAL, + THROTTLE_OPS_TOTAL, + THROTTLE_BPS_WRITE, + THROTTLE_OPS_WRITE}, }; + int64_t wait, max_wait = 0; + int i; + + for (i = 0; i < 4; i++) { + BucketType index = to_check[is_write][i]; + wait = throttle_compute_wait(&ts->cfg.buckets[index]); + if (wait > max_wait) { + max_wait = wait; + } + } + + return max_wait; +} + +/* compute the timer for this type of operation + * + * @is_write: the type of operation + * @now: the current clock timestamp + * @next_timestamp: the resulting timer + * @ret: true if a timer must be set + */ +bool throttle_compute_timer(ThrottleState *ts, + bool is_write, + int64_t now, + int64_t *next_timestamp) +{ + int64_t wait; + + /* leak proportionally to the time elapsed */ + throttle_do_leak(ts, now); + + /* compute the wait time if any */ + wait = throttle_compute_wait_for(ts, is_write); + + /* if the code must wait compute when the next timer should fire */ + if (wait) { + *next_timestamp = now + wait; + return true; + } + + /* else no need to wait at all */ + *next_timestamp = now; + return false; +} + +/* To be called first on the ThrottleState */ +void throttle_init(ThrottleState *ts, + QEMUClockType clock_type, + QEMUTimerCB *read_timer_cb, + QEMUTimerCB *write_timer_cb, + void *timer_opaque) +{ + memset(ts, 0, sizeof(ThrottleState)); + + ts->clock_type = clock_type; + ts->timers[0] = timer_new_ns(clock_type, read_timer_cb, timer_opaque); + ts->timers[1] = timer_new_ns(clock_type, write_timer_cb, timer_opaque); +} + +/* destroy a timer */ +static void throttle_timer_destroy(QEMUTimer **timer) +{ + assert(*timer != NULL); + + timer_del(*timer); + timer_free(*timer); + *timer = NULL; +} + +/* To be called last on the ThrottleState */ +void throttle_destroy(ThrottleState *ts) +{ + int i; + + for (i = 0; i < 2; i++) { + throttle_timer_destroy(&ts->timers[i]); + } +} + +/* is any throttling timer configured */ +bool throttle_have_timer(ThrottleState *ts) +{ + if (ts->timers[0]) { + return true; + } + + return false; +} + +/* Does any throttling must be done + * + * @cfg: the throttling configuration to inspect + * @ret: true if throttling must be done else false + */ +bool throttle_enabled(ThrottleConfig *cfg) +{ + int i; + + for (i = 0; i < BUCKETS_COUNT; i++) { + if (cfg->buckets[i].avg > 0) { + return true; + } + } + + return false; +} + +/* return true if any two throttling parameters conflicts + * + * @cfg: the throttling configuration to inspect + * @ret: true if any conflict detected else false + */ +bool throttle_conflicting(ThrottleConfig *cfg) +{ + bool bps_flag, ops_flag; + bool bps_max_flag, ops_max_flag; + + bps_flag = cfg->buckets[THROTTLE_BPS_TOTAL].avg && + (cfg->buckets[THROTTLE_BPS_READ].avg || + cfg->buckets[THROTTLE_BPS_WRITE].avg); + + ops_flag = cfg->buckets[THROTTLE_OPS_TOTAL].avg && + (cfg->buckets[THROTTLE_OPS_READ].avg || + cfg->buckets[THROTTLE_OPS_WRITE].avg); + + bps_max_flag = cfg->buckets[THROTTLE_BPS_TOTAL].max && + (cfg->buckets[THROTTLE_BPS_READ].max || + cfg->buckets[THROTTLE_BPS_WRITE].max); + + ops_max_flag = cfg->buckets[THROTTLE_OPS_TOTAL].max && + (cfg->buckets[THROTTLE_OPS_READ].max || + cfg->buckets[THROTTLE_OPS_WRITE].max); + + return bps_flag || ops_flag || bps_max_flag || ops_max_flag; +} + +/* check if a throttling configuration is valid + * @cfg: the throttling configuration to inspect + * @ret: true if valid else false + */ +bool throttle_is_valid(ThrottleConfig *cfg) +{ + bool invalid = false; + int i; + + for (i = 0; i < BUCKETS_COUNT; i++) { + if (cfg->buckets[i].avg < 0) { + invalid = true; + } + } + + for (i = 0; i < BUCKETS_COUNT; i++) { + if (cfg->buckets[i].max < 0) { + invalid = true; + } + } + + return !invalid; +} + +/* fix bucket parameters */ +static void throttle_fix_bucket(LeakyBucket *bkt) +{ + double min; + + /* zero bucket level */ + bkt->level = 0; + + /* The following is done to cope with the Linux CFQ block scheduler + * which regroup reads and writes by block of 100ms in the guest. + * When they are two process one making reads and one making writes cfq + * make a pattern looking like the following: + * WWWWWWWWWWWRRRRRRRRRRRRRRWWWWWWWWWWWWWwRRRRRRRRRRRRRRRRR + * Having a max burst value of 100ms of the average will help smooth the + * throttling + */ + min = bkt->avg / 10; + if (bkt->avg && !bkt->max) { + bkt->max = min; + } +} + +/* take care of canceling a timer */ +static void throttle_cancel_timer(QEMUTimer *timer) +{ + assert(timer != NULL); + + timer_del(timer); +} + +/* Used to configure the throttle + * + * @ts: the throttle state we are working on + * @cfg: the config to set + */ +void throttle_config(ThrottleState *ts, ThrottleConfig *cfg) +{ + int i; + + ts->cfg = *cfg; + + for (i = 0; i < BUCKETS_COUNT; i++) { + throttle_fix_bucket(&ts->cfg.buckets[i]); + } + + ts->previous_leak = qemu_clock_get_ns(ts->clock_type); + + for (i = 0; i < 2; i++) { + throttle_cancel_timer(ts->timers[i]); + } +} + +/* used to get config + * + * @ts: the throttle state we are working on + * @cfg: the config to write + */ +void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg) +{ + *cfg = ts->cfg; +} + + +/* Schedule the read or write timer if needed + * + * NOTE: this function is not unit tested due to it's usage of timer_mod + * + * @is_write: the type of operation (read/write) + * @ret: true if the timer has been scheduled else false + */ +bool throttle_schedule_timer(ThrottleState *ts, bool is_write) +{ + int64_t now = qemu_clock_get_ns(ts->clock_type); + int64_t next_timestamp; + bool must_wait; + + must_wait = throttle_compute_timer(ts, + is_write, + now, + &next_timestamp); + + /* request not throttled */ + if (!must_wait) { + return false; + } + + /* request throttled and timer pending -> do nothing */ + if (timer_pending(ts->timers[is_write])) { + return true; + } + + /* request throttled and timer not pending -> arm timer */ + timer_mod(ts->timers[is_write], next_timestamp); + return true; +} + +/* do the accounting for this operation + * + * @is_write: the type of operation (read/write) + * @size: the size of the operation + */ +void throttle_account(ThrottleState *ts, bool is_write, uint64_t size) +{ + double units = 1.0; + + /* if cfg.op_size is defined and smaller than size we compute unit count */ + if (ts->cfg.op_size && size > ts->cfg.op_size) { + units = (double) size / ts->cfg.op_size; + } + + ts->cfg.buckets[THROTTLE_BPS_TOTAL].level += size; + ts->cfg.buckets[THROTTLE_OPS_TOTAL].level += units; + + if (is_write) { + ts->cfg.buckets[THROTTLE_BPS_WRITE].level += size; + ts->cfg.buckets[THROTTLE_OPS_WRITE].level += units; + } else { + ts->cfg.buckets[THROTTLE_BPS_READ].level += size; + ts->cfg.buckets[THROTTLE_OPS_READ].level += units; + } +} +