diff mbox

[08/13] throttle: Add support for burst periods

Message ID 56fe58ac90ed99e5c8dd90a9d4f7bcbb730fd4ee.1454669823.git.berto@igalia.com
State New
Headers show

Commit Message

Alberto Garcia Feb. 5, 2016, 10:59 a.m. UTC
This patch adds support for burst periods to the throttling code.
With this feature the user can keep performing bursts as defined by
the LeakyBucket.max rate for a configurable period of time.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 include/qemu/throttle.h | 41 +++++++++++++++++++++++----
 util/throttle.c         | 73 ++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 96 insertions(+), 18 deletions(-)

Comments

Kevin Wolf Feb. 16, 2016, 10:45 a.m. UTC | #1
Am 05.02.2016 um 11:59 hat Alberto Garcia geschrieben:
> This patch adds support for burst periods to the throttling code.
> With this feature the user can keep performing bursts as defined by
> the LeakyBucket.max rate for a configurable period of time.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  include/qemu/throttle.h | 41 +++++++++++++++++++++++----
>  util/throttle.c         | 73 ++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 96 insertions(+), 18 deletions(-)
> 
> diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
> index 8ec8951..63df690 100644
> --- a/include/qemu/throttle.h
> +++ b/include/qemu/throttle.h
> @@ -2,7 +2,7 @@
>   * QEMU throttling infrastructure
>   *
>   * Copyright (C) Nodalink, EURL. 2013-2014
> - * Copyright (C) Igalia, S.L. 2015
> + * Copyright (C) Igalia, S.L. 2015-2016
>   *
>   * Authors:
>   *   Benoît Canet <benoit.canet@nodalink.com>
> @@ -42,16 +42,47 @@ typedef enum {
>  } BucketType;
>  
>  /*
> - * The max parameter of the leaky bucket throttling algorithm can be used to
> - * allow the guest to do bursts.
> - * The max value is a pool of I/O that the guest can use without being throttled
> - * at all. Throttling is triggered once this pool is empty.
> + * This module implements I/O limits using the leaky bucket
> + * algorithm. The code is independent of the I/O units, but it is
> + * currently used for bytes per second and operations per second.
> + *
> + * Three parameters can be set by the user:
> + *
> + * - avg: the desired I/O limits in units per second.
> + * - max: the limit during bursts, also in units per second.
> + * - burst_length: the maximum length of the burst period, in seconds.
> + *
> + * Here's how it works:
> + *
> + * - The bucket level (number of performed I/O units) is kept in
> + *   bkt.level and leaks at a rate of bkt.avg units per second.
> + *
> + * - The size of the bucket is bkt.max * bkt.burst_length. Once the
> + *   bucket is full no more I/O is performed until the bucket leaks
> + *   again. This is what makes the I/O rate bkt.avg.
> + *
> + * - The bkt.avg rate does not apply until the bucket is full,
> + *   allowing the user to do bursts until then. The I/O limit during
> + *   bursts is bkt.max. To enforce this limit we keep an additional
> + *   bucket in bkt.burst_length that leaks at a rate of bkt.max units
> + *   per second.
> + *
> + * - Because of all of the above, the user can perform I/O at a
> + *   maximum of bkt.max units per second for at most bkt.burst_length
> + *   seconds in a row. After that the bucket will be full and the I/O
> + *   rate will go down to bkt.avg.
> + *
> + * - Since the bucket always leaks at a rate of bkt.avg, this also
> + *   determines how much the user needs to wait before being able to
> + *   do bursts again.
>   */

Good summary, thanks!

>  typedef struct LeakyBucket {
>      double  avg;              /* average goal in units per second */
>      double  max;              /* leaky bucket max burst in units */
>      double  level;            /* bucket level in units */
> +    double  burst_level;      /* bucket level in units (for computing bursts) */
> +    unsigned burst_length;    /* max length of the burst period, in seconds */
>  } LeakyBucket;
>  
>  /* The following structure is used to configure a ThrottleState
> diff --git a/util/throttle.c b/util/throttle.c
> index 6a01cee..371c769 100644
> --- a/util/throttle.c
> +++ b/util/throttle.c
> @@ -41,6 +41,14 @@ void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta_ns)
>  
>      /* make the bucket leak */
>      bkt->level = MAX(bkt->level - leak, 0);
> +
> +    /* if we allow bursts for more than one second we also need to
> +     * keep track of bkt->burst_level so the bkt->max goal per second
> +     * is attained */
> +    if (bkt->burst_length > 1) {
> +        leak = (bkt->max * (double) delta_ns) / NANOSECONDS_PER_SECOND;
> +        bkt->burst_level = MAX(bkt->burst_level - leak, 0);
> +    }
>  }
>  
>  /* Calculate the time delta since last leak and make proportionals leaks
> @@ -91,13 +99,24 @@ int64_t throttle_compute_wait(LeakyBucket *bkt)
>          return 0;
>      }
>  
> -    extra = bkt->level - bkt->max;
> +    /* If the bucket is full then we have to wait */
> +    extra = bkt->level - bkt->max * bkt->burst_length;
> +    if (extra > 0) {
> +        return throttle_do_compute_wait(bkt->avg, extra);
> +    }
>  
> -    if (extra <= 0) {
> -        return 0;
> +    /* If the bucket is not full yet we have to make sure that we
> +     * fulfill the goal of bkt->max units per second. */
> +    if (bkt->burst_length > 1) {
> +        /* We use 1/10 of the max value to smooth the throttling.
> +         * See throttle_fix_bucket() for more details. */
> +        extra = bkt->burst_level - bkt->max / 10;

I don't understand the connection between throttle_fix_bucket() and
this.

throttle_fix_bucket() seems to set a default rate for bursts, which kind
of makes sense to me (but what's the point when this is lower than the
average rate?)

Here we work on a bkt->max that is either supplied by the user and
should therefore be respected, or the default in throttle_fix_bucket()
has already been applied.

What this line does is letting the request wait for more than would be
strictly necessary, or in other words, for the last second before the
bucket runs full, you only allow a tenth of the actual maximum rate.

I understand that having any burst at all helps, so the default that
throttle_fix_bucket() sets used to make sense. I'm not so sure that it
still makes sense with its max < avg setting (max used to be additional
units on top of avg, now it's measured on its own). For the divison by
10 here, however, I'm still puzzled.

What am I missing?

> +        if (extra > 0) {
> +            return throttle_do_compute_wait(bkt->max, extra);
> +        }
>      }
>  
> -    return throttle_do_compute_wait(bkt->avg, extra);
> +    return 0;
>  }

Kevin
Alberto Garcia Feb. 16, 2016, 2:24 p.m. UTC | #2
On Tue 16 Feb 2016 11:45:32 AM CET, Kevin Wolf <kwolf@redhat.com> wrote:

>> +    /* If the bucket is not full yet we have to make sure that we
>> +     * fulfill the goal of bkt->max units per second. */
>> +    if (bkt->burst_length > 1) {
>> +        /* We use 1/10 of the max value to smooth the throttling.
>> +         * See throttle_fix_bucket() for more details. */
>> +        extra = bkt->burst_level - bkt->max / 10;
>
> I don't understand the connection between throttle_fix_bucket() and
> this.
>
> throttle_fix_bucket() seems to set a default rate for bursts, which
> kind of makes sense to me (but what's the point when this is lower
> than the average rate?)

The point is to smoothen the throttling: if you request a max rate of
2000 operations per second, what you actually get is a rate of 200
operations per tenth of a second.

With the current code, if you set bkg->avg to 1000 but not bkt->max
you'll get exactly this. I'm just applying the same logic to bursts.

Berto
diff mbox

Patch

diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
index 8ec8951..63df690 100644
--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -2,7 +2,7 @@ 
  * QEMU throttling infrastructure
  *
  * Copyright (C) Nodalink, EURL. 2013-2014
- * Copyright (C) Igalia, S.L. 2015
+ * Copyright (C) Igalia, S.L. 2015-2016
  *
  * Authors:
  *   Benoît Canet <benoit.canet@nodalink.com>
@@ -42,16 +42,47 @@  typedef enum {
 } BucketType;
 
 /*
- * The max parameter of the leaky bucket throttling algorithm can be used to
- * allow the guest to do bursts.
- * The max value is a pool of I/O that the guest can use without being throttled
- * at all. Throttling is triggered once this pool is empty.
+ * This module implements I/O limits using the leaky bucket
+ * algorithm. The code is independent of the I/O units, but it is
+ * currently used for bytes per second and operations per second.
+ *
+ * Three parameters can be set by the user:
+ *
+ * - avg: the desired I/O limits in units per second.
+ * - max: the limit during bursts, also in units per second.
+ * - burst_length: the maximum length of the burst period, in seconds.
+ *
+ * Here's how it works:
+ *
+ * - The bucket level (number of performed I/O units) is kept in
+ *   bkt.level and leaks at a rate of bkt.avg units per second.
+ *
+ * - The size of the bucket is bkt.max * bkt.burst_length. Once the
+ *   bucket is full no more I/O is performed until the bucket leaks
+ *   again. This is what makes the I/O rate bkt.avg.
+ *
+ * - The bkt.avg rate does not apply until the bucket is full,
+ *   allowing the user to do bursts until then. The I/O limit during
+ *   bursts is bkt.max. To enforce this limit we keep an additional
+ *   bucket in bkt.burst_length that leaks at a rate of bkt.max units
+ *   per second.
+ *
+ * - Because of all of the above, the user can perform I/O at a
+ *   maximum of bkt.max units per second for at most bkt.burst_length
+ *   seconds in a row. After that the bucket will be full and the I/O
+ *   rate will go down to bkt.avg.
+ *
+ * - Since the bucket always leaks at a rate of bkt.avg, this also
+ *   determines how much the user needs to wait before being able to
+ *   do bursts again.
  */
 
 typedef struct LeakyBucket {
     double  avg;              /* average goal in units per second */
     double  max;              /* leaky bucket max burst in units */
     double  level;            /* bucket level in units */
+    double  burst_level;      /* bucket level in units (for computing bursts) */
+    unsigned burst_length;    /* max length of the burst period, in seconds */
 } LeakyBucket;
 
 /* The following structure is used to configure a ThrottleState
diff --git a/util/throttle.c b/util/throttle.c
index 6a01cee..371c769 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -41,6 +41,14 @@  void throttle_leak_bucket(LeakyBucket *bkt, int64_t delta_ns)
 
     /* make the bucket leak */
     bkt->level = MAX(bkt->level - leak, 0);
+
+    /* if we allow bursts for more than one second we also need to
+     * keep track of bkt->burst_level so the bkt->max goal per second
+     * is attained */
+    if (bkt->burst_length > 1) {
+        leak = (bkt->max * (double) delta_ns) / NANOSECONDS_PER_SECOND;
+        bkt->burst_level = MAX(bkt->burst_level - leak, 0);
+    }
 }
 
 /* Calculate the time delta since last leak and make proportionals leaks
@@ -91,13 +99,24 @@  int64_t throttle_compute_wait(LeakyBucket *bkt)
         return 0;
     }
 
-    extra = bkt->level - bkt->max;
+    /* If the bucket is full then we have to wait */
+    extra = bkt->level - bkt->max * bkt->burst_length;
+    if (extra > 0) {
+        return throttle_do_compute_wait(bkt->avg, extra);
+    }
 
-    if (extra <= 0) {
-        return 0;
+    /* If the bucket is not full yet we have to make sure that we
+     * fulfill the goal of bkt->max units per second. */
+    if (bkt->burst_length > 1) {
+        /* We use 1/10 of the max value to smooth the throttling.
+         * See throttle_fix_bucket() for more details. */
+        extra = bkt->burst_level - bkt->max / 10;
+        if (extra > 0) {
+            return throttle_do_compute_wait(bkt->max, extra);
+        }
     }
 
-    return throttle_do_compute_wait(bkt->avg, extra);
+    return 0;
 }
 
 /* This function compute the time that must be waited while this IO
@@ -177,7 +196,11 @@  void throttle_timers_attach_aio_context(ThrottleTimers *tt,
  */
 void throttle_config_init(ThrottleConfig *cfg)
 {
+    unsigned i;
     memset(cfg, 0, sizeof(*cfg));
+    for (i = 0; i < BUCKETS_COUNT; i++) {
+        cfg->buckets[i].burst_length = 1;
+    }
 }
 
 /* To be called first on the ThrottleState */
@@ -301,6 +324,16 @@  bool throttle_is_valid(ThrottleConfig *cfg, Error **errp)
             return false;
         }
 
+        if (!cfg->buckets[i].burst_length) {
+            error_setg(errp, "the burst length cannot be 0");
+            return false;
+        }
+
+        if (cfg->buckets[i].burst_length > 1 && !cfg->buckets[i].max) {
+            error_setg(errp, "burst length set without burst rate");
+            return false;
+        }
+
         if (cfg->buckets[i].max && !cfg->buckets[i].avg) {
             error_setg(errp, "bps_max/iops_max require corresponding"
                        " bps/iops values");
@@ -317,7 +350,7 @@  static void throttle_fix_bucket(LeakyBucket *bkt)
     double min;
 
     /* zero bucket level */
-    bkt->level = 0;
+    bkt->level = bkt->burst_level = 0;
 
     /* The following is done to cope with the Linux CFQ block scheduler
      * which regroup reads and writes by block of 100ms in the guest.
@@ -420,22 +453,36 @@  bool throttle_schedule_timer(ThrottleState *ts,
  */
 void throttle_account(ThrottleState *ts, bool is_write, uint64_t size)
 {
+    const BucketType bucket_types_size[2][2] = {
+        { THROTTLE_BPS_TOTAL, THROTTLE_BPS_READ },
+        { THROTTLE_BPS_TOTAL, THROTTLE_BPS_WRITE }
+    };
+    const BucketType bucket_types_units[2][2] = {
+        { THROTTLE_OPS_TOTAL, THROTTLE_OPS_READ },
+        { THROTTLE_OPS_TOTAL, THROTTLE_OPS_WRITE }
+    };
     double units = 1.0;
+    unsigned i;
 
     /* if cfg.op_size is defined and smaller than size we compute unit count */
     if (ts->cfg.op_size && size > ts->cfg.op_size) {
         units = (double) size / ts->cfg.op_size;
     }
 
-    ts->cfg.buckets[THROTTLE_BPS_TOTAL].level += size;
-    ts->cfg.buckets[THROTTLE_OPS_TOTAL].level += units;
+    for (i = 0; i < 2; i++) {
+        LeakyBucket *bkt;
 
-    if (is_write) {
-        ts->cfg.buckets[THROTTLE_BPS_WRITE].level += size;
-        ts->cfg.buckets[THROTTLE_OPS_WRITE].level += units;
-    } else {
-        ts->cfg.buckets[THROTTLE_BPS_READ].level += size;
-        ts->cfg.buckets[THROTTLE_OPS_READ].level += units;
+        bkt = &ts->cfg.buckets[bucket_types_size[is_write][i]];
+        bkt->level += size;
+        if (bkt->burst_length > 1) {
+            bkt->burst_level += size;
+        }
+
+        bkt = &ts->cfg.buckets[bucket_types_units[is_write][i]];
+        bkt->level += units;
+        if (bkt->burst_length > 1) {
+            bkt->burst_level += units;
+        }
     }
 }