diff mbox

[v2,3/3] util: Add an utility infrastructure used to compute an average on a time slice

Message ID 1410178693-23370-4-git-send-email-benoit.canet@nodalink.com
State New
Headers show

Commit Message

Benoît Canet Sept. 8, 2014, 12:18 p.m. UTC
The algorithm used was defined on the list while discussing the new IO accounting
overhaul.
See http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04954.html

Also the module takes care of computing minimal and maximal values over the time
slice duration.

Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
---
 include/qemu/timed-average.h |  49 +++++++++++++++
 tests/Makefile               |   2 +
 tests/test-timed-average.c   |  62 +++++++++++++++++++
 util/Makefile.objs           |   1 +
 util/timed-average.c         | 138 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 252 insertions(+)
 create mode 100644 include/qemu/timed-average.h
 create mode 100644 tests/test-timed-average.c
 create mode 100644 util/timed-average.c

Comments

Paolo Bonzini Sept. 8, 2014, 2:29 p.m. UTC | #1
Il 08/09/2014 14:18, Benoît Canet ha scritto:
> The algorithm used was defined on the list while discussing the new IO accounting
> overhaul.
> See http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04954.html
> 
> Also the module takes care of computing minimal and maximal values over the time
> slice duration.
> 
> Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>

If you add

int64_t cpu_get_clock(void)
{
    return my_clock_value;
}

to the test, and use a QEMU_CLOCK_VIRTUAL-based average, you should be
able to advance the clock directly in the test with no need for sleep()
and with 100% deterministic results.

> 
> +/* Check if the ta->periods seconds time slice has expired
> + *
> + * If the slice has expired the counters will be reseted
> + *
> + * @ta: the timed average structure used
> + */
> +static void timed_average_check_expiration(TimedAverage *ta)
> +{
> +    int64_t now = qemu_clock_get_ns(ta->clock_type);
> +
> +    /* if we are still in the period slice do nothing */
> +    if (now < ta->expiration) {
> +        return;
> +    }
> +
> +    /* the slice has expired -> create a new slice */
> +    ta->min = UINT64_MAX;
> +    ta->sum = 0;
> +    ta->max = 0;
> +    ta->count = 0;
> +    timed_average_set_expiration(ta);
> +}

This can produce very noisy results if you invoke min/avg/max at the
wrong time.  Some alternatives include:

- create two windows, with twice the suggested expiration period, and
return min/avg/max from the oldest window.  Example

       t=0          |t=1          |t=2          |t=3          |t=4
       wnd0: [0,1)  |wnd0: [1,3)  |             |wnd0: [3,5)  |
       wnd1: [0,2)  |             |wnd1: [2,4)  |             |

Values are returned from:

       wnd0---------|wnd1---------|wnd0---------|wnd1---------|


- if you do not need min/max, you can use exponential smoothing, with a
weighted factor that depends on the time since the last sample.
http://www.drdobbs.com/tools/discontiguous-exponential-averaging/184410671
-- for example, giving 90% weight to the last second.  Of course the
exponential nature means that, in that case, 1-sqrt(10%)=68.3% weight is
given to the last half second, 21.6% weight is given to the previous
half second, and 10% to the entire previous history.  This cannot give
min/max, but can give avg/stdev.

Paolo
Benoît Canet Sept. 8, 2014, 2:49 p.m. UTC | #2
The Monday 08 Sep 2014 à 16:29:26 (+0200), Paolo Bonzini wrote :
> Il 08/09/2014 14:18, Benoît Canet ha scritto:
> > The algorithm used was defined on the list while discussing the new IO accounting
> > overhaul.
> > See http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04954.html
> > 
> > Also the module takes care of computing minimal and maximal values over the time
> > slice duration.
> > 
> > Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
> 
> If you add
> 
> int64_t cpu_get_clock(void)
> {
>     return my_clock_value;
> }
> 
> to the test, and use a QEMU_CLOCK_VIRTUAL-based average, you should be
> able to advance the clock directly in the test with no need for sleep()
> and with 100% deterministic results.
> 
> > 
> > +/* Check if the ta->periods seconds time slice has expired
> > + *
> > + * If the slice has expired the counters will be reseted
> > + *
> > + * @ta: the timed average structure used
> > + */
> > +static void timed_average_check_expiration(TimedAverage *ta)
> > +{
> > +    int64_t now = qemu_clock_get_ns(ta->clock_type);
> > +
> > +    /* if we are still in the period slice do nothing */
> > +    if (now < ta->expiration) {
> > +        return;
> > +    }
> > +
> > +    /* the slice has expired -> create a new slice */
> > +    ta->min = UINT64_MAX;
> > +    ta->sum = 0;
> > +    ta->max = 0;
> > +    ta->count = 0;
> > +    timed_average_set_expiration(ta);
> > +}
> 
Thanks for reviewing.

> This can produce very noisy results if you invoke min/avg/max at the
> wrong time.  Some alternatives include:
> 
> - create two windows, with twice the suggested expiration period, and
> return min/avg/max from the oldest window.  Example
> 
>        t=0          |t=1          |t=2          |t=3          |t=4
>        wnd0: [0,1)  |wnd0: [1,3)  |             |wnd0: [3,5)  |
>        wnd1: [0,2)  |             |wnd1: [2,4)  |             |
> 
> Values are returned from:
> 
>        wnd0---------|wnd1---------|wnd0---------|wnd1---------|


This is neat.
As I think that knowing the minimal and maximal latencies of a block device
would be handy I will implement this.

Best regards

Benoît

> 
> 
> - if you do not need min/max, you can use exponential smoothing, with a
> weighted factor that depends on the time since the last sample.
> http://www.drdobbs.com/tools/discontiguous-exponential-averaging/184410671
> -- for example, giving 90% weight to the last second.  Of course the
> exponential nature means that, in that case, 1-sqrt(10%)=68.3% weight is
> given to the last half second, 21.6% weight is given to the previous
> half second, and 10% to the entire previous history.  This cannot give
> min/max, but can give avg/stdev.
> 
> Paolo
>
Paolo Bonzini Sept. 8, 2014, 3:09 p.m. UTC | #3
Il 08/09/2014 16:49, Benoît Canet ha scritto:
>> > - create two windows, with twice the suggested expiration period, and
>> > return min/avg/max from the oldest window.  Example
>> > 
>> >        t=0          |t=1          |t=2          |t=3          |t=4
>> >        wnd0: [0,1)  |wnd0: [1,3)  |             |wnd0: [3,5)  |
>> >        wnd1: [0,2)  |             |wnd1: [2,4)  |             |
>> > 
>> > Values are returned from:
>> > 
>> >        wnd0---------|wnd1---------|wnd0---------|wnd1---------|
> 
> This is neat.

Alternatively, you can make it probabilistically correct:

    t=0            |t=0.66           |t=1.33             |t=2              |t=2.66
                   |wnd0: [0.66,2)   |                   |wnd0: [2,3.33)   |
    wnd1: [0,0.66) |                 |wnd1: [1.33,2.66)  |                 |

Return from:

    wnd1-----------|wnd1-------------|wnd0---------------|wnd1-------------|wnd0

So you always have 2/3 seconds worth of data, and on average exactly 1 second
worth of data.

The problem is the delay in getting data, which can be big for the minute-
and hour-based statistics.  Suppose you have a spike that lasts 10 seconds,
it might not show in the minute-based statistics for as much as 30 seconds
after it ends (the window switches every 40 seconds).

For min/max you could return min(min0, min1) and max(max0, max1).  Only the
average has this problem.

Exponential smoothing doesn't have this problem.  IIRC uptime uses that.

Paolo
Benoît Canet Sept. 8, 2014, 3:25 p.m. UTC | #4
On Mon, Sep 08, 2014 at 05:09:38PM +0200, Paolo Bonzini wrote:
> Il 08/09/2014 16:49, Benoît Canet ha scritto:
> >> > - create two windows, with twice the suggested expiration period, and
> >> > return min/avg/max from the oldest window.  Example
> >> > 
> >> >        t=0          |t=1          |t=2          |t=3          |t=4
> >> >        wnd0: [0,1)  |wnd0: [1,3)  |             |wnd0: [3,5)  |
> >> >        wnd1: [0,2)  |             |wnd1: [2,4)  |             |
> >> > 
> >> > Values are returned from:
> >> > 
> >> >        wnd0---------|wnd1---------|wnd0---------|wnd1---------|
> > 
> > This is neat.
> 
> Alternatively, you can make it probabilistically correct:
> 
>     t=0            |t=0.66           |t=1.33             |t=2              |t=2.66
>                    |wnd0: [0.66,2)   |                   |wnd0: [2,3.33)   |
>     wnd1: [0,0.66) |                 |wnd1: [1.33,2.66)  |                 |
> 
> Return from:
> 
>     wnd1-----------|wnd1-------------|wnd0---------------|wnd1-------------|wnd0
> 
> So you always have 2/3 seconds worth of data, and on average exactly 1 second
> worth of data.
> 
> The problem is the delay in getting data, which can be big for the minute-
> and hour-based statistics.  Suppose you have a spike that lasts 10 seconds,
> it might not show in the minute-based statistics for as much as 30 seconds
> after it ends (the window switches every 40 seconds).
> 
> For min/max you could return min(min0, min1) and max(max0, max1).  Only the
> average has this problem.
> 
> Exponential smoothing doesn't have this problem.  IIRC uptime uses that.

I am writing this so cloud end users can programatically get informations about
their vms disk statistics.

Cloud end users are known to use their cloud API to script the elasticity of their
architecture.
Some code will poll system statistics to decide if new instances must be launched
or existing instances must be pruned.
This means introducing a delay in the accounting code would slow down their
decisions.

min and max is also useful to know since it gives an idea of the deviation.

So I think the first method you suggested would be the best for a cloud vm.

Best regards

Benoît

> 
> Paolo
Markus Armbruster Sept. 15, 2014, 10:23 a.m. UTC | #5
Benoît Canet <benoit.canet@nodalink.com> writes:

> The algorithm used was defined on the list while discussing the new IO accounting
> overhaul.
> See http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04954.html
>
> Also the module takes care of computing minimal and maximal values over the time
> slice duration.

Please limit commit message line length to about 70 characters, for
readability.  This includes the subject.  Not hard:

    util: Infrastructure for computing averages within a time slice

or

    util: Infrastructure for computing recent averages

Looking forward to v3.
Markus Armbruster Sept. 15, 2014, 11:13 a.m. UTC | #6
Benoît Canet <benoit.canet@nodalink.com> writes:

> On Mon, Sep 08, 2014 at 05:09:38PM +0200, Paolo Bonzini wrote:
>> Il 08/09/2014 16:49, Benoît Canet ha scritto:
>> >> > - create two windows, with twice the suggested expiration period, and
>> >> > return min/avg/max from the oldest window.  Example
>> >> > 
>> >> >        t=0          |t=1          |t=2          |t=3          |t=4
>> >> >        wnd0: [0,1)  |wnd0: [1,3)  |             |wnd0: [3,5)  |
>> >> >        wnd1: [0,2)  |             |wnd1: [2,4)  |             |
>> >> > 
>> >> > Values are returned from:
>> >> > 
>> >> >        wnd0---------|wnd1---------|wnd0---------|wnd1---------|
>> > 
>> > This is neat.
>> 
>> Alternatively, you can make it probabilistically correct:
>> 
>>     t=0            |t=0.66           |t=1.33             |t=2              |t=2.66
>>                    |wnd0: [0.66,2)   |                   |wnd0: [2,3.33)   |
>>     wnd1: [0,0.66) |                 |wnd1: [1.33,2.66)  |                 |
>> 
>> Return from:
>> 
>>     wnd1-----------|wnd1-------------|wnd0---------------|wnd1-------------|wnd0
>> 
>> So you always have 2/3 seconds worth of data, and on average exactly 1 second
>> worth of data.
>> 
>> The problem is the delay in getting data, which can be big for the minute-
>> and hour-based statistics.  Suppose you have a spike that lasts 10 seconds,
>> it might not show in the minute-based statistics for as much as 30 seconds
>> after it ends (the window switches every 40 seconds).
>> 
>> For min/max you could return min(min0, min1) and max(max0, max1).  Only the
>> average has this problem.
>> 
>> Exponential smoothing doesn't have this problem.  IIRC uptime uses that.
>
> I am writing this so cloud end users can programatically get informations about
> their vms disk statistics.
>
> Cloud end users are known to use their cloud API to script the
> elasticity of their
> architecture.
> Some code will poll system statistics to decide if new instances must
> be launched
> or existing instances must be pruned.
> This means introducing a delay in the accounting code would slow down their
> decisions.
>
> min and max is also useful to know since it gives an idea of the deviation.

For what it's worth, the algorithm in the Dr. Dobb's Paolo referenced
can compute a standard deviation.  Can we figure out what users really
want, standard deviation, min/max, or both?

[...]
Benoît Canet Sept. 15, 2014, 11:41 a.m. UTC | #7
On Mon, Sep 15, 2014 at 01:13:08PM +0200, Markus Armbruster wrote:
> Benoît Canet <benoit.canet@nodalink.com> writes:
> 
> > On Mon, Sep 08, 2014 at 05:09:38PM +0200, Paolo Bonzini wrote:
> >> Il 08/09/2014 16:49, Benoît Canet ha scritto:
> >> >> > - create two windows, with twice the suggested expiration period, and
> >> >> > return min/avg/max from the oldest window.  Example
> >> >> > 
> >> >> >        t=0          |t=1          |t=2          |t=3          |t=4
> >> >> >        wnd0: [0,1)  |wnd0: [1,3)  |             |wnd0: [3,5)  |
> >> >> >        wnd1: [0,2)  |             |wnd1: [2,4)  |             |
> >> >> > 
> >> >> > Values are returned from:
> >> >> > 
> >> >> >        wnd0---------|wnd1---------|wnd0---------|wnd1---------|
> >> > 
> >> > This is neat.
> >> 
> >> Alternatively, you can make it probabilistically correct:
> >> 
> >>     t=0            |t=0.66           |t=1.33             |t=2              |t=2.66
> >>                    |wnd0: [0.66,2)   |                   |wnd0: [2,3.33)   |
> >>     wnd1: [0,0.66) |                 |wnd1: [1.33,2.66)  |                 |
> >> 
> >> Return from:
> >> 
> >>     wnd1-----------|wnd1-------------|wnd0---------------|wnd1-------------|wnd0
> >> 
> >> So you always have 2/3 seconds worth of data, and on average exactly 1 second
> >> worth of data.
> >> 
> >> The problem is the delay in getting data, which can be big for the minute-
> >> and hour-based statistics.  Suppose you have a spike that lasts 10 seconds,
> >> it might not show in the minute-based statistics for as much as 30 seconds
> >> after it ends (the window switches every 40 seconds).
> >> 
> >> For min/max you could return min(min0, min1) and max(max0, max1).  Only the
> >> average has this problem.
> >> 
> >> Exponential smoothing doesn't have this problem.  IIRC uptime uses that.
> >
> > I am writing this so cloud end users can programatically get informations about
> > their vms disk statistics.
> >
> > Cloud end users are known to use their cloud API to script the
> > elasticity of their
> > architecture.
> > Some code will poll system statistics to decide if new instances must
> > be launched
> > or existing instances must be pruned.
> > This means introducing a delay in the accounting code would slow down their
> > decisions.
> >
> > min and max is also useful to know since it gives an idea of the deviation.
> 
> For what it's worth, the algorithm in the Dr. Dobb's Paolo referenced
> can compute a standard deviation.  Can we figure out what users really
> want, standard deviation, min/max, or both?

I'll ask to my test subject.

> 
> [...]
Benoît Canet Sept. 15, 2014, 11:44 a.m. UTC | #8
On Mon, Sep 15, 2014 at 01:13:08PM +0200, Markus Armbruster wrote:
> Benoît Canet <benoit.canet@nodalink.com> writes:
> 
> > On Mon, Sep 08, 2014 at 05:09:38PM +0200, Paolo Bonzini wrote:
> >> Il 08/09/2014 16:49, Benoît Canet ha scritto:
> >> >> > - create two windows, with twice the suggested expiration period, and
> >> >> > return min/avg/max from the oldest window.  Example
> >> >> > 
> >> >> >        t=0          |t=1          |t=2          |t=3          |t=4
> >> >> >        wnd0: [0,1)  |wnd0: [1,3)  |             |wnd0: [3,5)  |
> >> >> >        wnd1: [0,2)  |             |wnd1: [2,4)  |             |
> >> >> > 
> >> >> > Values are returned from:
> >> >> > 
> >> >> >        wnd0---------|wnd1---------|wnd0---------|wnd1---------|
> >> > 
> >> > This is neat.
> >> 
> >> Alternatively, you can make it probabilistically correct:
> >> 
> >>     t=0            |t=0.66           |t=1.33             |t=2              |t=2.66
> >>                    |wnd0: [0.66,2)   |                   |wnd0: [2,3.33)   |
> >>     wnd1: [0,0.66) |                 |wnd1: [1.33,2.66)  |                 |
> >> 
> >> Return from:
> >> 
> >>     wnd1-----------|wnd1-------------|wnd0---------------|wnd1-------------|wnd0
> >> 
> >> So you always have 2/3 seconds worth of data, and on average exactly 1 second
> >> worth of data.
> >> 
> >> The problem is the delay in getting data, which can be big for the minute-
> >> and hour-based statistics.  Suppose you have a spike that lasts 10 seconds,
> >> it might not show in the minute-based statistics for as much as 30 seconds
> >> after it ends (the window switches every 40 seconds).
> >> 
> >> For min/max you could return min(min0, min1) and max(max0, max1).  Only the
> >> average has this problem.
> >> 
> >> Exponential smoothing doesn't have this problem.  IIRC uptime uses that.
> >
> > I am writing this so cloud end users can programatically get informations about
> > their vms disk statistics.
> >
> > Cloud end users are known to use their cloud API to script the
> > elasticity of their
> > architecture.
> > Some code will poll system statistics to decide if new instances must
> > be launched
> > or existing instances must be pruned.
> > This means introducing a delay in the accounting code would slow down their
> > decisions.
> >
> > min and max is also useful to know since it gives an idea of the deviation.
> 
> For what it's worth, the algorithm in the Dr. Dobb's Paolo referenced
> can compute a standard deviation.  Can we figure out what users really
> want, standard deviation, min/max, or both?

My test subject think that min/max convey more information than standard deviation
and that anyway the second can be computed with the formers.

Does Red Hat have other tests subjects at hand ?

Best regards

Benoît

> 
> [...]
Benoît Canet Sept. 24, 2014, 1:26 p.m. UTC | #9
On Mon, Sep 08, 2014 at 04:29:26PM +0200, Paolo Bonzini wrote:
> Il 08/09/2014 14:18, Benoît Canet ha scritto:
> > The algorithm used was defined on the list while discussing the new IO accounting
> > overhaul.
> > See http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04954.html
> > 
> > Also the module takes care of computing minimal and maximal values over the time
> > slice duration.
> > 
> > Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
> 
> If you add
> 
> int64_t cpu_get_clock(void)
> {
>     return my_clock_value;
> }
> 
> to the test, and use a QEMU_CLOCK_VIRTUAL-based average, you should be
> able to advance the clock directly in the test with no need for sleep()
> and with 100% deterministic results.

That's a really neat trick. Thanks
diff mbox

Patch

diff --git a/include/qemu/timed-average.h b/include/qemu/timed-average.h
new file mode 100644
index 0000000..200490c
--- /dev/null
+++ b/include/qemu/timed-average.h
@@ -0,0 +1,49 @@ 
+/*
+ * QEMU timed average computation
+ *
+ * Copyright (C) Nodalink, EURL. 2014
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@nodalink.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) version 3 or any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TIMED_AVERAGE_H
+#define TIMED_AVERAGE_H
+
+#include <stdint.h>
+
+#include "qemu/timer.h"
+
+typedef struct TimedAverage {
+    uint64_t      sum;           /* sum of the discrete values */
+    uint64_t      min;           /* minimal value accounted in the slice */
+    uint64_t      max;           /* maximal value accounted in the slice */
+    uint64_t      count;         /* number of values */
+    uint64_t      period;        /* period in seconds */
+    int64_t       expiration;    /* the end of the current period slice in ns */
+    QEMUClockType clock_type;    /* the clock used */
+} TimedAverage;
+
+void timed_average_init(TimedAverage *ta, QEMUClockType clock_type,
+                        uint64_t period);
+
+void timed_average_account(TimedAverage *ta, uint64_t value);
+
+uint64_t timed_average_min(TimedAverage *ta);
+uint64_t timed_average_avg(TimedAverage *ta);
+uint64_t timed_average_max(TimedAverage *ta);
+
+#endif
diff --git a/tests/Makefile b/tests/Makefile
index 469c0a5..9b51b32 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -64,6 +64,7 @@  gcov-files-check-qom-interface-y = qom/object.c
 check-unit-$(CONFIG_POSIX) += tests/test-vmstate$(EXESUF)
 check-unit-y += tests/test-qemu-opts$(EXESUF)
 gcov-files-test-qemu-opts-y = qom/test-qemu-opts.c
+check-unit-y += tests/test-timed-average$(EXESUF)
 
 check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
@@ -259,6 +260,7 @@  tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
 	vmstate.o qemu-file.o \
 	libqemuutil.a
+tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o libqemuutil.a libqemustub.a
 
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/tests/test-timed-average.c b/tests/test-timed-average.c
new file mode 100644
index 0000000..ec35683
--- /dev/null
+++ b/tests/test-timed-average.c
@@ -0,0 +1,62 @@ 
+/*
+ * Timed average computation tests
+ *
+ * Copyright Nodalink, EURL. 2014
+ *
+ * Authors:
+ *  Benoît Canet     <benoit.canet@nodalink.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <unistd.h>
+
+#include "qemu/timed-average.h"
+
+static void account(TimedAverage *ta)
+{
+    timed_average_account(ta, 1);
+    timed_average_account(ta, 5);
+    timed_average_account(ta, 2);
+    timed_average_account(ta, 4);
+    timed_average_account(ta, 3);
+}
+
+static void test_average(void)
+{
+    TimedAverage ta;
+    uint64_t result;
+
+    /* we will compute some average on a period of 1 second */
+    timed_average_init(&ta, QEMU_CLOCK_HOST, 1);
+    account(&ta);
+    result = timed_average_min(&ta);
+    g_assert(result == 1);
+    result = timed_average_avg(&ta);
+    g_assert(result == 3);
+    result = timed_average_max(&ta);
+    g_assert(result == 5);
+
+    /* sleep a while so the current slice expires */
+    sleep(1);
+
+    /* recompute the same average */
+    account(&ta);
+    result = timed_average_min(&ta);
+    g_assert(result == 1);
+    result = timed_average_avg(&ta);
+    g_assert(result == 3);
+    result = timed_average_max(&ta);
+    g_assert(result == 5);
+}
+
+int main(int argc, char **argv)
+{
+    /* tests in the same order as the header function declarations */
+    g_test_init(&argc, &argv, NULL);
+    g_test_add_func("/timed-average/average", test_average);
+    return g_test_run();
+}
+
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 6b3c83b..97d82ce 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -15,3 +15,4 @@  util-obj-y += throttle.o
 util-obj-y += getauxval.o
 util-obj-y += readline.o
 util-obj-y += rfifolock.o
+util-obj-y += timed-average.o
diff --git a/util/timed-average.c b/util/timed-average.c
new file mode 100644
index 0000000..ac45ea7
--- /dev/null
+++ b/util/timed-average.c
@@ -0,0 +1,138 @@ 
+/*
+ * QEMU timed average computation
+ *
+ * Copyright (C) Nodalink, EURL. 2014
+ *
+ * Author:
+ *   Benoît Canet <benoit.canet@nodalink.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) version 3 or any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <string.h>
+
+#include "qemu/timed-average.h"
+
+/* This module compute an average on a time slice of ta->period seconds
+ */
+
+/* Set the expiration of the ta->periods seconds time slice
+ *
+ * @ta: the timed average structure used
+ */
+static void timed_average_set_expiration(TimedAverage *ta)
+{
+    int64_t period = ta->period * NANOSECONDS_PER_SECOND;
+    int64_t now = qemu_clock_get_ns(ta->clock_type);
+    ta->expiration = ROUND_UP(now + 1, period);
+}
+
+/* Check if the ta->periods seconds time slice has expired
+ *
+ * If the slice has expired the counters will be reseted
+ *
+ * @ta: the timed average structure used
+ */
+static void timed_average_check_expiration(TimedAverage *ta)
+{
+    int64_t now = qemu_clock_get_ns(ta->clock_type);
+
+    /* if we are still in the period slice do nothing */
+    if (now < ta->expiration) {
+        return;
+    }
+
+    /* the slice has expired -> create a new slice */
+    ta->min = UINT64_MAX;
+    ta->sum = 0;
+    ta->max = 0;
+    ta->count = 0;
+    timed_average_set_expiration(ta);
+}
+
+/* Initialize a TimedAverage structure
+ *
+ * @ta:         the timed average structure used
+ * @clock_type: the type of clock to use
+ * @period:     the time slice period in seconds
+ */
+void timed_average_init(TimedAverage *ta, QEMUClockType clock_type,
+                        uint64_t period)
+{
+    memset(ta, 0, sizeof(TimedAverage));
+    ta->min = UINT64_MAX;
+    ta->period = period;
+    ta->clock_type = clock_type;
+    timed_average_set_expiration(ta);
+}
+
+/* Account a value in the average
+ *
+ * @ta:    the timed average structure used
+ * @value: the value to account in the average
+ */
+void timed_average_account(TimedAverage *ta, uint64_t value)
+{
+    timed_average_check_expiration(ta);
+    ta->sum += value;
+    ta->count++;
+
+    if (value < ta->min) {
+        ta->min = value;
+    }
+
+    if (value > ta->max) {
+        ta->max = value;
+    }
+}
+
+/* Get the minimal value
+ *
+ * @ta: the timed average structure used
+ */
+uint64_t timed_average_min(TimedAverage *ta)
+{
+    timed_average_check_expiration(ta);
+
+    if (ta->min == UINT64_MAX) {
+        return 0;
+    }
+
+    return ta->min;
+}
+
+/* Get the average value
+ *
+ * @ta: the timed average structure used
+ */
+uint64_t timed_average_avg(TimedAverage *ta)
+{
+    timed_average_check_expiration(ta);
+
+    if (ta->count) {
+        return ta->sum / ta->count;
+    }
+
+    return 0;
+}
+
+/* Get the maximum value
+ *
+ * @ta: the timed average structure used
+ */
+uint64_t timed_average_max(TimedAverage *ta)
+{
+    timed_average_check_expiration(ta);
+    return ta->max;
+}