diff mbox

test-coroutine: add baseline test that times the cost of function calls

Message ID 1407317621-30591-1-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Aug. 6, 2014, 9:33 a.m. UTC
This can be used to compute the cost of coroutine operations.  In the
end the cost of the function call is a few clock cycles, so it's pretty
cheap for now, but it may become more relevant as the coroutine code
is optimized.

For example, here are the results on my machine:

   Function call 100000000 iterations: 0.173884 s
   Yield 100000000 iterations: 8.445064 s
   Lifecycle 1000000 iterations: 0.098445 s
   Nesting 10000 iterations of 1000 depth each: 7.406431 s

One yield takes 83 nanoseconds, one enter takes 97 nanoseconds,
one coroutine allocation takes (roughly, since some of the allocations
in the nesting test do hit the pool) 739 nanoseconds:

   (8.445064 - 0.173884) * 10^9 / 100000000 = 82.7
   (0.098445 * 100 - 0.173884) * 10^9 / 100000000 = 96.7
   (7.406431 * 10 - 0.173884) * 10^9 / 100000000 = 738.9

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tests/test-coroutine.c | 24 ++++++++++++++++
 1 files changed, 24 insertions(+)

Comments

Stefan Hajnoczi Aug. 7, 2014, 9:33 a.m. UTC | #1
On Wed, Aug 06, 2014 at 11:33:41AM +0200, Paolo Bonzini wrote:
> This can be used to compute the cost of coroutine operations.  In the
> end the cost of the function call is a few clock cycles, so it's pretty
> cheap for now, but it may become more relevant as the coroutine code
> is optimized.
> 
> For example, here are the results on my machine:
> 
>    Function call 100000000 iterations: 0.173884 s
>    Yield 100000000 iterations: 8.445064 s
>    Lifecycle 1000000 iterations: 0.098445 s
>    Nesting 10000 iterations of 1000 depth each: 7.406431 s
> 
> One yield takes 83 nanoseconds, one enter takes 97 nanoseconds,
> one coroutine allocation takes (roughly, since some of the allocations
> in the nesting test do hit the pool) 739 nanoseconds:
> 
>    (8.445064 - 0.173884) * 10^9 / 100000000 = 82.7
>    (0.098445 * 100 - 0.173884) * 10^9 / 100000000 = 96.7
>    (7.406431 * 10 - 0.173884) * 10^9 / 100000000 = 738.9
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  tests/test-coroutine.c | 24 ++++++++++++++++
>  1 files changed, 24 insertions(+)

Can't hurt to have this as a comparison.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Kevin Wolf Aug. 7, 2014, 2:08 p.m. UTC | #2
Am 07.08.2014 um 11:33 hat Stefan Hajnoczi geschrieben:
> On Wed, Aug 06, 2014 at 11:33:41AM +0200, Paolo Bonzini wrote:
> > This can be used to compute the cost of coroutine operations.  In the
> > end the cost of the function call is a few clock cycles, so it's pretty
> > cheap for now, but it may become more relevant as the coroutine code
> > is optimized.
> > 
> > For example, here are the results on my machine:
> > 
> >    Function call 100000000 iterations: 0.173884 s
> >    Yield 100000000 iterations: 8.445064 s
> >    Lifecycle 1000000 iterations: 0.098445 s
> >    Nesting 10000 iterations of 1000 depth each: 7.406431 s
> > 
> > One yield takes 83 nanoseconds, one enter takes 97 nanoseconds,
> > one coroutine allocation takes (roughly, since some of the allocations
> > in the nesting test do hit the pool) 739 nanoseconds:
> > 
> >    (8.445064 - 0.173884) * 10^9 / 100000000 = 82.7
> >    (0.098445 * 100 - 0.173884) * 10^9 / 100000000 = 96.7
> >    (7.406431 * 10 - 0.173884) * 10^9 / 100000000 = 738.9
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  tests/test-coroutine.c | 24 ++++++++++++++++
> >  1 files changed, 24 insertions(+)
> 
> Can't hurt to have this as a comparison.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Thanks, applied to the block branch.

Kevin
Ming Lei Aug. 11, 2014, 12:44 a.m. UTC | #3
Hi,

On Wed, Aug 6, 2014 at 5:33 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> This can be used to compute the cost of coroutine operations.  In the
> end the cost of the function call is a few clock cycles, so it's pretty
> cheap for now, but it may become more relevant as the coroutine code
> is optimized.
>
> For example, here are the results on my machine:
>
>    Function call 100000000 iterations: 0.173884 s
>    Yield 100000000 iterations: 8.445064 s
>    Lifecycle 1000000 iterations: 0.098445 s
>    Nesting 10000 iterations of 1000 depth each: 7.406431 s
>
> One yield takes 83 nanoseconds, one enter takes 97 nanoseconds,
> one coroutine allocation takes (roughly, since some of the allocations
> in the nesting test do hit the pool) 739 nanoseconds:
>
>    (8.445064 - 0.173884) * 10^9 / 100000000 = 82.7
>    (0.098445 * 100 - 0.173884) * 10^9 / 100000000 = 96.7
>    (7.406431 * 10 - 0.173884) * 10^9 / 100000000 = 738.9

IMO, it isn't enough to only consider the cost of coroutine APIs,
given stack is switched back and forth and its usage is inevitable
in reality, the cost of this part should be tested too.

I will post a patch to test costs of coroutine from user view today,
so that the total costs can be observed with using coroutine.

Thanks,
Ming Lei Aug. 11, 2014, 10:08 a.m. UTC | #4
On Wed, Aug 6, 2014 at 5:33 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> This can be used to compute the cost of coroutine operations.  In the
> end the cost of the function call is a few clock cycles, so it's pretty
> cheap for now, but it may become more relevant as the coroutine code
> is optimized.
>
> For example, here are the results on my machine:
>
>    Function call 100000000 iterations: 0.173884 s
>    Yield 100000000 iterations: 8.445064 s
>    Lifecycle 1000000 iterations: 0.098445 s
>    Nesting 10000 iterations of 1000 depth each: 7.406431 s
>
> One yield takes 83 nanoseconds, one enter takes 97 nanoseconds,
> one coroutine allocation takes (roughly, since some of the allocations
> in the nesting test do hit the pool) 739 nanoseconds:
>
>    (8.445064 - 0.173884) * 10^9 / 100000000 = 82.7
>    (0.098445 * 100 - 0.173884) * 10^9 / 100000000 = 96.7
>    (7.406431 * 10 - 0.173884) * 10^9 / 100000000 = 738.9

Thought about it further, the above is _not_ cheap if it is computed correctly,
take block layer for example:

- suppose one block device can reach 300K IOPS, like Kevin's loop
over tmpfs file, so handling one IO takes 3.333us
- for handling one IO, at least two enter and one yield are required in current
implementation, so these three operations take 0.277us(0.083 + 0.097 * 2)
for handling one IO(suppose all allocations hit the pool)
- so coroutine-only operations take 8.31%(0.277/3.333), not mention effect
from switching stack

Some modern storage devices can handle millions of IOPS, so using
coroutine surely will slow down these devices a lot.

Thanks,
diff mbox

Patch

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index 760636d..6e634f4 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -288,6 +288,29 @@  static void perf_yield(void)
         maxcycles, duration);
 }
 
+static __attribute__((noinline)) void dummy(unsigned *i)
+{
+    (*i)--;
+}
+
+static void perf_baseline(void)
+{
+    unsigned int i, maxcycles;
+    double duration;
+
+    maxcycles = 100000000;
+    i = maxcycles;
+
+    g_test_timer_start();
+    while (i > 0) {
+        dummy(&i);
+    }
+    duration = g_test_timer_elapsed();
+
+    g_test_message("Function call %u iterations: %f s\n",
+        maxcycles, duration);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
@@ -301,6 +324,7 @@  int main(int argc, char **argv)
         g_test_add_func("/perf/lifecycle", perf_lifecycle);
         g_test_add_func("/perf/nesting", perf_nesting);
         g_test_add_func("/perf/yield", perf_yield);
+        g_test_add_func("/perf/function-call", perf_baseline);
     }
     return g_test_run();
 }