Patchwork [1/2] coroutine: introduce coroutines

Submitter Stefan Hajnoczi
Date May 11, 2011, 10:15 a.m.
Message ID <1305108925-26048-2-git-send-email-stefanha@linux.vnet.ibm.com>
Permalink /patch/95123/
State New

Comments

Stefan Hajnoczi - May 11, 2011, 10:15 a.m.
From: Kevin Wolf <kwolf@redhat.com>

Asynchronous code is becoming very complex.  At the same time
synchronous code is growing because it is convenient to write.
Sometimes duplicate code paths are even added, one synchronous and the
other asynchronous.  This patch introduces coroutines which allow code
that looks synchronous but is asynchronous under the covers.

A coroutine has its own stack and is therefore able to preserve state
across blocking operations, which traditionally require callback
functions and manual marshalling of parameters.

Creating and starting a coroutine is easy:

  coroutine = qemu_coroutine_create(my_coroutine);
  qemu_coroutine_enter(coroutine, my_data);

The coroutine then executes until it returns or yields:

  void coroutine_fn my_coroutine(void *opaque) {
      MyData *my_data = opaque;

      /* do some work */

      qemu_coroutine_yield();

      /* do some more work */
  }

Yielding switches control back to the caller of qemu_coroutine_enter().
This is typically used to switch back to the main thread's event loop
after issuing an asynchronous I/O request.  The request callback will
then invoke qemu_coroutine_enter() once more to switch back to the
coroutine.

Note that coroutines never execute concurrently and should only be used
from threads which hold the global mutex.  This restriction makes
programming with coroutines easier than with threads.  Race conditions
cannot occur since only one coroutine may be active at any time.  Other
coroutines can only run across yield.
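
The enter/yield handoff described above can be sketched outside QEMU with
plain ucontext.  This is only an illustration under stated assumptions, not
the code in this patch: every demo_* name is hypothetical, it switches with
swapcontext() rather than setjmp()/longjmp(), and it does not support
nested coroutines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <ucontext.h>

#define DEMO_STACK_SIZE (64 * 1024)

typedef void DemoEntry(void *opaque);

/* Hypothetical stand-in for a coroutine object; not QEMU's Coroutine. */
typedef struct DemoCoroutine {
    ucontext_t ctx;     /* where the coroutine last stopped */
    ucontext_t caller;  /* where the most recent demo_enter() was called */
    DemoEntry *entry;
    void *opaque;
    void *stack;
    bool finished;
} DemoCoroutine;

/* The coroutine that is running right now, or NULL while the caller runs. */
static DemoCoroutine *demo_current;

static void demo_trampoline(void)
{
    DemoCoroutine *co = demo_current;

    co->entry(co->opaque);
    co->finished = true;
    /* Returning resumes uc_link, i.e. the latest caller of demo_enter(). */
}

static DemoCoroutine *demo_create(DemoEntry *entry)
{
    DemoCoroutine *co = calloc(1, sizeof(*co));
    int rc;

    assert(co != NULL);
    co->stack = malloc(DEMO_STACK_SIZE);
    assert(co->stack != NULL);
    co->entry = entry;

    rc = getcontext(&co->ctx);
    assert(rc == 0);
    (void)rc;
    co->ctx.uc_stack.ss_sp = co->stack;
    co->ctx.uc_stack.ss_size = DEMO_STACK_SIZE;
    co->ctx.uc_link = &co->caller;
    makecontext(&co->ctx, demo_trampoline, 0);
    return co;
}

/* Run the coroutine until it yields or returns. */
static void demo_enter(DemoCoroutine *co, void *opaque)
{
    assert(!co->finished && demo_current == NULL);
    co->opaque = opaque;
    demo_current = co;
    swapcontext(&co->caller, &co->ctx);
    demo_current = NULL;
}

/* Switch back to whoever called demo_enter(). */
static void demo_yield(void)
{
    DemoCoroutine *co = demo_current;

    assert(co != NULL);
    swapcontext(&co->ctx, &co->caller);
}

static void demo_worker(void *opaque)
{
    int *trace = opaque;

    *trace = *trace * 10 + 1;   /* do some work */
    demo_yield();
    *trace = *trace * 10 + 3;   /* do some more work */
}

/* Records the order in which caller and coroutine ran, one digit per step. */
int demo_run(void)
{
    int trace = 0;
    DemoCoroutine *co = demo_create(demo_worker);

    demo_enter(co, &trace);     /* runs the worker up to its yield */
    trace = trace * 10 + 2;     /* caller runs while the worker is suspended */
    demo_enter(co, &trace);     /* resumes the worker after its yield */
    trace = trace * 10 + 4;

    free(co->stack);
    free(co);
    return trace;
}
```

The digits recorded by demo_run() interleave strictly: the worker and its
caller never run at the same time, which is the property the paragraph
above relies on.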

This coroutines implementation is based on the gtk-vnc implementation
written by Anthony Liguori <anthony@codemonkey.ws> but it has been
significantly rewritten by Kevin Wolf <kwolf@redhat.com> to use
setjmp()/longjmp() instead of the more expensive swapcontext().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 Makefile.objs        |    7 +++
 coroutine-ucontext.c |   73 +++++++++++++++++++++++++++
 coroutine-win32.c    |   57 +++++++++++++++++++++
 qemu-coroutine-int.h |   57 +++++++++++++++++++++
 qemu-coroutine.c     |  132 ++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-coroutine.h     |   82 +++++++++++++++++++++++++++++++
 trace-events         |    5 ++
 7 files changed, 413 insertions(+), 0 deletions(-)
 create mode 100644 coroutine-ucontext.c
 create mode 100644 coroutine-win32.c
 create mode 100644 qemu-coroutine-int.h
 create mode 100644 qemu-coroutine.c
 create mode 100644 qemu-coroutine.h
Kevin Wolf - May 11, 2011, 11:20 a.m.
On 11.05.2011 12:15, Stefan Hajnoczi wrote:
> From: Kevin Wolf <kwolf@redhat.com>
> 
> Asynchronous code is becoming very complex.  At the same time
> synchronous code is growing because it is convenient to write.
> Sometimes duplicate code paths are even added, one synchronous and the
> other asynchronous.  This patch introduces coroutines which allow code
> that looks synchronous but is asynchronous under the covers.
> 
> A coroutine has its own stack and is therefore able to preserve state
> across blocking operations, which traditionally require callback
> functions and manual marshalling of parameters.
> 
> Creating and starting a coroutine is easy:
> 
>   coroutine = qemu_coroutine_create(my_coroutine);
>   qemu_coroutine_enter(coroutine, my_data);
> 
> The coroutine then executes until it returns or yields:
> 
>   void coroutine_fn my_coroutine(void *opaque) {
>       MyData *my_data = opaque;
> 
>       /* do some work */
> 
>       qemu_coroutine_yield();
> 
>       /* do some more work */
>   }
> 
> Yielding switches control back to the caller of qemu_coroutine_enter().
> This is typically used to switch back to the main thread's event loop
> after issuing an asynchronous I/O request.  The request callback will
> then invoke qemu_coroutine_enter() once more to switch back to the
> coroutine.
> 
> Note that coroutines never execute concurrently and should only be used
> from threads which hold the global mutex.  This restriction makes
> programming with coroutines easier than with threads.  Race conditions
> cannot occur since only one coroutine may be active at any time.  Other
> coroutines can only run across yield.
> 
> This coroutines implementation is based on the gtk-vnc implementation
> written by Anthony Liguori <anthony@codemonkey.ws> but it has been
> significantly rewritten by Kevin Wolf <kwolf@redhat.com> to use
> setjmp()/longjmp() instead of the more expensive swapcontext().
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

For the diff between my latest version and this patch:

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Paolo Bonzini - May 11, 2011, 12:04 p.m.
On 05/11/2011 12:15 PM, Stefan Hajnoczi wrote:
> +#ifdef __i386__
> +    asm volatile(
> +        "mov %%esp, %%ebx;"
> +        "mov %0, %%esp;"
> +        "pushl %1;"
> +        "call _trampoline;"
> +        "mov %%ebx, %%esp;"
> +        : : "r" (co->stack + co->stack_size), "r" (co) : "ebx"
> +    );

This is incomplete, it should set FS:[4] and FS:[8] to top and bottom of 
stack respectively, otherwise exception handling (including SIGSEGV) is 
broken.  But I think for Windows it's anyway better to use fibers. 
Commit this and either I or Stefan Weil will fix it. :)

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo
Kevin Wolf - May 11, 2011, 12:15 p.m.
On 11.05.2011 14:04, Paolo Bonzini wrote:
> On 05/11/2011 12:15 PM, Stefan Hajnoczi wrote:
>> +#ifdef __i386__
>> +    asm volatile(
>> +        "mov %%esp, %%ebx;"
>> +        "mov %0, %%esp;"
>> +        "pushl %1;"
>> +        "call _trampoline;"
>> +        "mov %%ebx, %%esp;"
>> +        : : "r" (co->stack + co->stack_size), "r" (co) : "ebx"
>> +    );
> 
> This is incomplete, it should set FS:[4] and FS:[8] to top and bottom of 
> stack respectively, otherwise exception handling (including SIGSEGV) is 
> broken.  But I think for Windows it's anyway better to use fibers. 
> Commit this and either I or Stefan Weil will fix it. :)
> 
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Yeah, I didn't feel like searching for the right APIs, but wanted to
have something that builds for win32 at least. And this one seemed to
work for some simple tests in Wine.

So my plan with it was to CC Stefan Weil and have him provide the real
implementation that works on 64 bit, too. ;-)

Kevin
Anthony Liguori - May 11, 2011, 12:36 p.m.
On 05/11/2011 05:15 AM, Stefan Hajnoczi wrote:
> From: Kevin Wolf <kwolf@redhat.com>
>
> Asynchronous code is becoming very complex.  At the same time
> synchronous code is growing because it is convenient to write.
> Sometimes duplicate code paths are even added, one synchronous and the
> other asynchronous.  This patch introduces coroutines which allow code
> that looks synchronous but is asynchronous under the covers.
>
> A coroutine has its own stack and is therefore able to preserve state
> across blocking operations, which traditionally require callback
> functions and manual marshalling of parameters.
>
> Creating and starting a coroutine is easy:
>
>    coroutine = qemu_coroutine_create(my_coroutine);
>    qemu_coroutine_enter(coroutine, my_data);

Why do away with yieldto?

Do we have performance data around setjmp vs. setcontext?

>
> The coroutine then executes until it returns or yields:
>
>    void coroutine_fn my_coroutine(void *opaque) {
>        MyData *my_data = opaque;
>
>        /* do some work */
>
>        qemu_coroutine_yield();
>
>        /* do some more work */
>    }
>
> Yielding switches control back to the caller of qemu_coroutine_enter().
> This is typically used to switch back to the main thread's event loop
> after issuing an asynchronous I/O request.  The request callback will
> then invoke qemu_coroutine_enter() once more to switch back to the
> coroutine.
>
> Note that coroutines never execute concurrently and should only be used
> from threads which hold the global mutex.  This restriction makes
> programming with coroutines easier than with threads.  Race conditions
> cannot occur since only one coroutine may be active at any time.  Other
> coroutines can only run across yield.
>
> This coroutines implementation is based on the gtk-vnc implementation
> written by Anthony Liguori <anthony@codemonkey.ws> but it has been
> significantly rewritten by Kevin Wolf <kwolf@redhat.com> to use
> setjmp()/longjmp() instead of the more expensive swapcontext().
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>   Makefile.objs        |    7 +++
>   coroutine-ucontext.c |   73 +++++++++++++++++++++++++++
>   coroutine-win32.c    |   57 +++++++++++++++++++++
>   qemu-coroutine-int.h |   57 +++++++++++++++++++++
>   qemu-coroutine.c     |  132 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   qemu-coroutine.h     |   82 +++++++++++++++++++++++++++++++
>   trace-events         |    5 ++
>   7 files changed, 413 insertions(+), 0 deletions(-)
>   create mode 100644 coroutine-ucontext.c
>   create mode 100644 coroutine-win32.c
>   create mode 100644 qemu-coroutine-int.h
>   create mode 100644 qemu-coroutine.c
>   create mode 100644 qemu-coroutine.h
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 9d8851e..cba6c2b 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -11,6 +11,12 @@ oslib-obj-$(CONFIG_WIN32) += oslib-win32.o
>   oslib-obj-$(CONFIG_POSIX) += oslib-posix.o
>
>   #######################################################################
> +# coroutines
> +coroutine-obj-y = qemu-coroutine.o
> +coroutine-obj-$(CONFIG_POSIX) += coroutine-ucontext.o
> +coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
> +
> +#######################################################################
>   # block-obj-y is code used by both qemu system emulation and qemu-img
>
>   block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o async.o
> @@ -67,6 +73,7 @@ common-obj-y += readline.o console.o cursor.o qemu-error.o
>   common-obj-y += $(oslib-obj-y)
>   common-obj-$(CONFIG_WIN32) += os-win32.o
>   common-obj-$(CONFIG_POSIX) += os-posix.o
> +common-obj-y += $(coroutine-obj-y)
>
>   common-obj-y += tcg-runtime.o host-utils.o
>   common-obj-y += irq.o ioport.o input.o
> diff --git a/coroutine-ucontext.c b/coroutine-ucontext.c
> new file mode 100644
> index 0000000..97f2b35
> --- /dev/null
> +++ b/coroutine-ucontext.c
> @@ -0,0 +1,73 @@
> +/*
> + * ucontext coroutine initialization code
> + *
> + * Copyright (C) 2006  Anthony Liguori <anthony@codemonkey.ws>
> + * Copyright (C) 2011  Kevin Wolf <kwolf@redhat.com>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.0 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301 USA
> + */
> +
> +/* XXX Is there a nicer way to disable glibc's stack check for longjmp? */
> +#ifdef _FORTIFY_SOURCE
> +#undef _FORTIFY_SOURCE
> +#endif
> +#include <setjmp.h>
> +#include <stdint.h>
> +#include <ucontext.h>
> +#include "qemu-coroutine-int.h"
> +
> +static Coroutine *new_coroutine;
> +
> +static void continuation_trampoline(void)
> +{
> +    Coroutine *co = new_coroutine;
> +
> +    /* Initialize longjmp environment and switch back to
> +     * qemu_coroutine_init_env() in the old ucontext. */
> +    if (!setjmp(co->env)) {
> +        return;
> +    }
> +
> +    while (true) {
> +        co->entry(co->data);
> +        if (!setjmp(co->env)) {
> +            longjmp(co->caller->env, COROUTINE_TERMINATE);
> +        }
> +    }
> +}
> +
> +int qemu_coroutine_init_env(Coroutine *co)
> +{
> +    ucontext_t old_uc, uc;
> +
> +    /* Create a new ucontext for switching to the coroutine stack and setting
> +     * up a longjmp environment. */
> +    if (getcontext(&uc) == -1) {
> +        return -errno;
> +    }
> +
> +    uc.uc_link = &old_uc;
> +    uc.uc_stack.ss_sp = co->stack;
> +    uc.uc_stack.ss_size = co->stack_size;
> +    uc.uc_stack.ss_flags = 0;
> +
> +    new_coroutine = co;
> +    makecontext(&uc, (void *)continuation_trampoline, 0);
> +
> +    /* Initialize the longjmp environment */
> +    swapcontext(&old_uc, &uc);
> +
> +    return 0;
> +}
> diff --git a/coroutine-win32.c b/coroutine-win32.c
> new file mode 100644
> index 0000000..f4521c3
> --- /dev/null
> +++ b/coroutine-win32.c
> @@ -0,0 +1,57 @@
> +/*
> + * Win32 coroutine initialization code
> + *
> + * Copyright (c) 2011 Kevin Wolf <kwolf@redhat.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu-coroutine-int.h"
> +
> +static void __attribute__((used)) trampoline(Coroutine *co)
> +{
> +    if (!setjmp(co->env)) {
> +        return;
> +    }
> +
> +    while (true) {
> +        co->entry(co->data);
> +        if (!setjmp(co->env)) {
> +            longjmp(co->caller->env, COROUTINE_TERMINATE);
> +        }
> +    }
> +}
> +
> +int qemu_coroutine_init_env(Coroutine *co)
> +{
> +#ifdef __i386__
> +    asm volatile(
> +        "mov %%esp, %%ebx;"
> +        "mov %0, %%esp;"
> +        "pushl %1;"
> +        "call _trampoline;"
> +        "mov %%ebx, %%esp;"
> +        : : "r" (co->stack + co->stack_size), "r" (co) : "ebx"
> +    );

So the only Linux host we support is x86??

We can't reasonably do this IMHO.  If we're going to go this route, we 
should at least fall back to setcontext for the sake of portability.

Regards,

Anthony Liguori
Paolo Bonzini - May 11, 2011, 12:46 p.m.
On 05/11/2011 02:36 PM, Anthony Liguori wrote:
>
> So the only Linux host we support is x86??
>
> We can't reasonably do this IMHO.  If we're going to go this route, we
> should at least fall back to setcontext for the sake of portability.

That was:

diff --git a/coroutine-win32.c b/coroutine-win32.c
new file mode 100644
index 0000000..f4521c3
--- /dev/null
+++ b/coroutine-win32.c

See my reply too.

Paolo
Anthony Liguori - May 11, 2011, 12:51 p.m.
On 05/11/2011 07:04 AM, Paolo Bonzini wrote:
> On 05/11/2011 12:15 PM, Stefan Hajnoczi wrote:
>> +#ifdef __i386__
>> + asm volatile(
>> + "mov %%esp, %%ebx;"
>> + "mov %0, %%esp;"
>> + "pushl %1;"
>> + "call _trampoline;"
>> + "mov %%ebx, %%esp;"
>> + : : "r" (co->stack + co->stack_size), "r" (co) : "ebx"
>> + );
>
> This is incomplete, it should set FS:[4] and FS:[8] to top and bottom of
> stack respectively, otherwise exception handling (including SIGSEGV) is
> broken. But I think for Windows it's anyway better to use fibers. Commit
> this and either I or Stefan Weil will fix it. :)
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

How about a generic thread fallback?  That's what we do in gtk-vnc and 
it solves the portability issue in a very robust way.
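
A thread-based fallback of the kind suggested here can be sketched with
nothing but pthreads.  This is an assumption-laden illustration, not the
gtk-vnc code and not part of this patch: the threadco_* names are
hypothetical, and each coroutine is simply a thread that runs only while
it holds the "turn".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef void ThreadCoEntry(void *opaque);

/* Hypothetical thread-backed coroutine; exactly one side runs at a time. */
typedef struct ThreadCo {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool co_turn;       /* true while the coroutine side may run */
    bool finished;
    ThreadCoEntry *entry;
    void *opaque;
} ThreadCo;

static _Thread_local ThreadCo *threadco_self;

static void *threadco_main(void *arg)
{
    ThreadCo *co = arg;

    threadco_self = co;
    pthread_mutex_lock(&co->lock);
    while (!co->co_turn) {              /* wait for the first enter */
        pthread_cond_wait(&co->cond, &co->lock);
    }
    pthread_mutex_unlock(&co->lock);

    co->entry(co->opaque);

    pthread_mutex_lock(&co->lock);
    co->finished = true;
    co->co_turn = false;                /* hand control back for good */
    pthread_cond_broadcast(&co->cond);
    pthread_mutex_unlock(&co->lock);
    return NULL;
}

static void threadco_init(ThreadCo *co, ThreadCoEntry *entry)
{
    int rc;

    pthread_mutex_init(&co->lock, NULL);
    pthread_cond_init(&co->cond, NULL);
    co->co_turn = false;
    co->finished = false;
    co->entry = entry;
    co->opaque = NULL;
    rc = pthread_create(&co->thread, NULL, threadco_main, co);
    assert(rc == 0);
    (void)rc;
}

/* Give the turn to the coroutine and block until it yields or returns. */
static void threadco_enter(ThreadCo *co, void *opaque)
{
    pthread_mutex_lock(&co->lock);
    assert(!co->finished && !co->co_turn);
    co->opaque = opaque;
    co->co_turn = true;
    pthread_cond_broadcast(&co->cond);
    while (co->co_turn) {
        pthread_cond_wait(&co->cond, &co->lock);
    }
    pthread_mutex_unlock(&co->lock);
}

/* Give the turn back to the caller and block until entered again. */
static void threadco_yield(void)
{
    ThreadCo *co = threadco_self;

    assert(co != NULL);
    pthread_mutex_lock(&co->lock);
    co->co_turn = false;
    pthread_cond_broadcast(&co->cond);
    while (!co->co_turn) {
        pthread_cond_wait(&co->cond, &co->lock);
    }
    pthread_mutex_unlock(&co->lock);
}

static void threadco_worker(void *opaque)
{
    int *trace = opaque;

    *trace = *trace * 10 + 1;
    threadco_yield();
    *trace = *trace * 10 + 3;
}

/* Records the order of execution, one digit per step. */
int threadco_demo(void)
{
    ThreadCo co;
    int trace = 0;

    threadco_init(&co, threadco_worker);
    threadco_enter(&co, &trace);
    trace = trace * 10 + 2;
    threadco_enter(&co, &trace);
    trace = trace * 10 + 4;

    pthread_join(co.thread, NULL);
    pthread_mutex_destroy(&co.lock);
    pthread_cond_destroy(&co.cond);
    return trace;
}
```

Every switch in this scheme costs a mutex acquisition plus a condition
variable wakeup of another thread, so it trades speed for portability.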

Regards,

Anthony Liguori

>
> Paolo
>
Paolo Bonzini - May 11, 2011, 12:52 p.m.
On 05/11/2011 02:51 PM, Anthony Liguori wrote:
> How about a generic thread fallback?  That's what we do in gtk-vnc and
> it solves the portability issue in a very robust way.

A very slow way, too (on Windows at least if you use qemu_cond...).

Paolo
Stefan Hajnoczi - May 11, 2011, 12:54 p.m.
On Wed, May 11, 2011 at 1:46 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 05/11/2011 02:36 PM, Anthony Liguori wrote:
> diff --git a/coroutine-win32.c b/coroutine-win32.c

Kevin: This reminds me that I did not run ./check-coroutine on win32.
If you are able to run it in your environment that would be good to
make sure the Windows code works.

Stefan
Anthony Liguori - May 11, 2011, 1:05 p.m.
On 05/11/2011 07:52 AM, Paolo Bonzini wrote:
> On 05/11/2011 02:51 PM, Anthony Liguori wrote:
>> How about a generic thread fallback? That's what we do in gtk-vnc and
>> it solves the portability issue in a very robust way.
>
> A very slow way, too (on Windows at least if you use qemu_cond...).

That doesn't mean you can't do a fiber implementation for Windows... but 
having a highly portable fallback is a good thing.

Regards,

Anthony Liguori

> Paolo
Kevin Wolf - May 11, 2011, 1:08 p.m.
On 11.05.2011 14:36, Anthony Liguori wrote:
> On 05/11/2011 05:15 AM, Stefan Hajnoczi wrote:
>> From: Kevin Wolf <kwolf@redhat.com>
>>
>> Asynchronous code is becoming very complex.  At the same time
>> synchronous code is growing because it is convenient to write.
>> Sometimes duplicate code paths are even added, one synchronous and the
>> other asynchronous.  This patch introduces coroutines which allow code
>> that looks synchronous but is asynchronous under the covers.
>>
>> A coroutine has its own stack and is therefore able to preserve state
>> across blocking operations, which traditionally require callback
>> functions and manual marshalling of parameters.
>>
>> Creating and starting a coroutine is easy:
>>
>>    coroutine = qemu_coroutine_create(my_coroutine);
>>    qemu_coroutine_enter(coroutine, my_data);
> 
> Why do away with yieldto?
> 
> Do we have performance data around setjmp vs. setcontext?

I did some quick ad-hoc tests when I introduced setjmp, but don't have
them any more. IIRC, it was something like a factor of 3.
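
Since those numbers are gone, a rough harness along the following lines
could reproduce the swapcontext() half of the comparison.  It is only a
sketch: the bench_* names are hypothetical, it measures swapcontext()
alone, and the setjmp()/longjmp() side would have to be timed against the
switch code in this patch.  On glibc, swapcontext() also saves and restores
the signal mask, which is a likely source of the difference.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <ucontext.h>

#define BENCH_STACK_SIZE (64 * 1024)

static ucontext_t bench_main_ctx;
static ucontext_t bench_co_ctx;
static long bench_switch_count;

/* Runs on its own stack and bounces straight back on every entry. */
static void bench_body(void)
{
    for (;;) {
        bench_switch_count++;
        swapcontext(&bench_co_ctx, &bench_main_ctx);
    }
}

/*
 * Performs `iterations` round trips (two switches each) and returns the
 * average cost of one switch in nanoseconds.  The total number of
 * switches is stored in *switches.
 */
double bench_swapcontext(long iterations, long *switches)
{
    struct timespec t0, t1;
    void *stack = malloc(BENCH_STACK_SIZE);
    double elapsed_ns;
    int rc;

    assert(iterations > 0 && stack != NULL);
    rc = getcontext(&bench_co_ctx);
    assert(rc == 0);
    (void)rc;
    bench_co_ctx.uc_stack.ss_sp = stack;
    bench_co_ctx.uc_stack.ss_size = BENCH_STACK_SIZE;
    bench_co_ctx.uc_link = NULL;    /* bench_body() never returns */
    makecontext(&bench_co_ctx, bench_body, 0);

    bench_switch_count = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++) {
        bench_switch_count++;
        swapcontext(&bench_main_ctx, &bench_co_ctx);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* The coroutine is suspended and never resumed, so its stack can go. */
    free(stack);
    *switches = bench_switch_count;
    elapsed_ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)bench_switch_count;
}
```

A single run is noisy; repeating the measurement with a large iteration
count and comparing medians would give a more trustworthy ratio.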

>> diff --git a/coroutine-win32.c b/coroutine-win32.c
>> new file mode 100644
>> index 0000000..f4521c3
>> --- /dev/null
>> +++ b/coroutine-win32.c
>> @@ -0,0 +1,57 @@
>> +/*
>> + * Win32 coroutine initialization code
>> + *
>> + * Copyright (c) 2011 Kevin Wolf <kwolf@redhat.com>
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "qemu-coroutine-int.h"
>> +
>> +static void __attribute__((used)) trampoline(Coroutine *co)
>> +{
>> +    if (!setjmp(co->env)) {
>> +        return;
>> +    }
>> +
>> +    while (true) {
>> +        co->entry(co->data);
>> +        if (!setjmp(co->env)) {
>> +            longjmp(co->caller->env, COROUTINE_TERMINATE);
>> +        }
>> +    }
>> +}
>> +
>> +int qemu_coroutine_init_env(Coroutine *co)
>> +{
>> +#ifdef __i386__
>> +    asm volatile(
>> +        "mov %%esp, %%ebx;"
>> +        "mov %0, %%esp;"
>> +        "pushl %1;"
>> +        "call _trampoline;"
>> +        "mov %%ebx, %%esp;"
>> +        : : "r" (co->stack + co->stack_size), "r" (co) : "ebx"
>> +    );
> 
> So the only Linux host we support is x86??
> 
> We can't reasonably do this IMHO.  If we're going to go this route, we 
> should at least fall back to setcontext for the sake of portability.

This is win32 code, Linux uses ucontext for initialization.

Kevin
Paolo Bonzini - May 11, 2011, 1:45 p.m.
On 05/11/2011 03:05 PM, Anthony Liguori wrote:
>>
>> A very slow way, too (on Windows at least if you use qemu_cond...).
>
> That doesn't mean you can't do a fiber implementation for Windows... but
> having a highly portable fallback is a good thing.

I agree but where would you place it, since QEMU is only portable to 
POSIX and Windows?

osdep-$(CONFIG_POSIX) += coroutine-posix.c
osdep-$(CONFIG_WIN32) += coroutine-win32.c
osdep-??? += coroutine-fallback.c

:)

Paolo
Daniel P. Berrange - May 11, 2011, 1:51 p.m.
On Wed, May 11, 2011 at 03:45:39PM +0200, Paolo Bonzini wrote:
> On 05/11/2011 03:05 PM, Anthony Liguori wrote:
> >>
> >>A very slow way, too (on Windows at least if you use qemu_cond...).
> >
> >That doesn't mean you can't do a fiber implementation for Windows... but
> >having a highly portable fallback is a good thing.
> 
> I agree but where would you place it, since QEMU is only portable to
> POSIX and Windows?
> 
> osdep-$(CONFIG_POSIX) += coroutine-posix.c
> osdep-$(CONFIG_WIN32) += coroutine-win32.c
> osdep-??? += coroutine-fallback.c

NetBSD forbids the use of 'makecontext' in any application
which also links to libpthread.so[1]. We used makecontext in
GTK-VNC's coroutines and got random crashes in threaded
apps running on NetBSD. So for NetBSD we tell people to use
the thread based coroutines instead.

So at least one POSIX platform will need a different impl.

Regards,
Daniel

[1] http://www.eila.univ-paris-diderot.fr/cgi-bin/man.cgi?swapcontext+3
Jan Kiszka - May 12, 2011, 9:51 a.m.
On 2011-05-11 12:15, Stefan Hajnoczi wrote:
> From: Kevin Wolf <kwolf@redhat.com>
> 
> Asynchronous code is becoming very complex.  At the same time
> synchronous code is growing because it is convenient to write.
> Sometimes duplicate code paths are even added, one synchronous and the
> other asynchronous.  This patch introduces coroutines which allow code
> that looks synchronous but is asynchronous under the covers.
> 
> A coroutine has its own stack and is therefore able to preserve state
> across blocking operations, which traditionally require callback
> functions and manual marshalling of parameters.
> 
> Creating and starting a coroutine is easy:
> 
>   coroutine = qemu_coroutine_create(my_coroutine);
>   qemu_coroutine_enter(coroutine, my_data);
> 
> The coroutine then executes until it returns or yields:
> 
>   void coroutine_fn my_coroutine(void *opaque) {
>       MyData *my_data = opaque;
> 
>       /* do some work */
> 
>       qemu_coroutine_yield();
> 
>       /* do some more work */
>   }
> 
> Yielding switches control back to the caller of qemu_coroutine_enter().
> This is typically used to switch back to the main thread's event loop
> after issuing an asynchronous I/O request.  The request callback will
> then invoke qemu_coroutine_enter() once more to switch back to the
> coroutine.
> 
> Note that coroutines never execute concurrently and should only be used
> from threads which hold the global mutex.  This restriction makes
> programming with coroutines easier than with threads.  Race conditions
> cannot occur since only one coroutine may be active at any time.  Other
> coroutines can only run across yield.

Mmh, is there anything that conceptually prevents fixing this limitation
later on? I would really like to remove such dependency long-term as
well to have VCPUs operate truly independently on independent device models.

Jan
Stefan Hajnoczi - May 12, 2011, 9:59 a.m.
On Thu, May 12, 2011 at 10:51 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-05-11 12:15, Stefan Hajnoczi wrote:
>> From: Kevin Wolf <kwolf@redhat.com>
>>
>> Asynchronous code is becoming very complex.  At the same time
>> synchronous code is growing because it is convenient to write.
>> Sometimes duplicate code paths are even added, one synchronous and the
>> other asynchronous.  This patch introduces coroutines which allow code
>> that looks synchronous but is asynchronous under the covers.
>>
>> A coroutine has its own stack and is therefore able to preserve state
>> across blocking operations, which traditionally require callback
>> functions and manual marshalling of parameters.
>>
>> Creating and starting a coroutine is easy:
>>
>>   coroutine = qemu_coroutine_create(my_coroutine);
>>   qemu_coroutine_enter(coroutine, my_data);
>>
>> The coroutine then executes until it returns or yields:
>>
>>   void coroutine_fn my_coroutine(void *opaque) {
>>       MyData *my_data = opaque;
>>
>>       /* do some work */
>>
>>       qemu_coroutine_yield();
>>
>>       /* do some more work */
>>   }
>>
>> Yielding switches control back to the caller of qemu_coroutine_enter().
>> This is typically used to switch back to the main thread's event loop
>> after issuing an asynchronous I/O request.  The request callback will
>> then invoke qemu_coroutine_enter() once more to switch back to the
>> coroutine.
>>
>> Note that coroutines never execute concurrently and should only be used
>> from threads which hold the global mutex.  This restriction makes
>> programming with coroutines easier than with threads.  Race conditions
>> cannot occur since only one coroutine may be active at any time.  Other
>> coroutines can only run across yield.
>
> Mmh, is there anything that conceptually prevents fixing this limitation
> later on? I would really like to remove such dependency long-term as
> well to have VCPUs operate truly independently on independent device models.

The use case that has motivated coroutines is the block layer.  It is
synchronous in many places and definitely not thread-safe.  Coroutines
are a step that solves the "synchronous" part of the problem but does
not tackle the "not thread-safe" part.

It is possible to move from coroutines to threads but we need to
remove single-thread assumptions from all the block layer code, which
isn't a small task.  Coroutines do not prevent us from making the
block layer thread-safe!

Stefan
Kevin Wolf - May 12, 2011, 10:02 a.m.
On 12.05.2011 11:51, Jan Kiszka wrote:
> On 2011-05-11 12:15, Stefan Hajnoczi wrote:
>> From: Kevin Wolf <kwolf@redhat.com>
>>
>> Asynchronous code is becoming very complex.  At the same time
>> synchronous code is growing because it is convenient to write.
>> Sometimes duplicate code paths are even added, one synchronous and the
>> other asynchronous.  This patch introduces coroutines which allow code
>> that looks synchronous but is asynchronous under the covers.
>>
>> A coroutine has its own stack and is therefore able to preserve state
>> across blocking operations, which traditionally require callback
>> functions and manual marshalling of parameters.
>>
>> Creating and starting a coroutine is easy:
>>
>>   coroutine = qemu_coroutine_create(my_coroutine);
>>   qemu_coroutine_enter(coroutine, my_data);
>>
>> The coroutine then executes until it returns or yields:
>>
>>   void coroutine_fn my_coroutine(void *opaque) {
>>       MyData *my_data = opaque;
>>
>>       /* do some work */
>>
>>       qemu_coroutine_yield();
>>
>>       /* do some more work */
>>   }
>>
>> Yielding switches control back to the caller of qemu_coroutine_enter().
>> This is typically used to switch back to the main thread's event loop
>> after issuing an asynchronous I/O request.  The request callback will
>> then invoke qemu_coroutine_enter() once more to switch back to the
>> coroutine.
>>
>> Note that coroutines never execute concurrently and should only be used
>> from threads which hold the global mutex.  This restriction makes
>> programming with coroutines easier than with threads.  Race conditions
>> cannot occur since only one coroutine may be active at any time.  Other
>> coroutines can only run across yield.
> 
> Mmh, is there anything that conceptually prevents fixing this limitation
> later on? I would really like to remove such dependency long-term as
> well to have VCPUs operate truly independently on independent device models.

I think it's the defining property of coroutines. If you remove it, you
have full threads. Going from coroutines to threads may be an option in
the long term. The advantage of coroutines is that they provide an
incremental way forward from where we are today. This restriction is
really what makes the transition easy.

Kevin
Jamie Lokier - May 24, 2011, 7:37 p.m.
Daniel P. Berrange wrote:
> On Wed, May 11, 2011 at 03:45:39PM +0200, Paolo Bonzini wrote:
> > On 05/11/2011 03:05 PM, Anthony Liguori wrote:
> > >>
> > >>A very slow way, too (on Windows at least if you use qemu_cond...).
> > >
> > >That doesn't mean you can't do a fiber implementation for Windows... but
> > >having a highly portable fallback is a good thing.
> > 
> > I agree but where would you place it, since QEMU is only portable to
> > POSIX and Windows?
> > 
> > osdep-$(CONFIG_POSIX) += coroutine-posix.c
> > osdep-$(CONFIG_WIN32) += coroutine-win32.c
> > osdep-??? += coroutine-fallback.c
> 
> NetBSD forbids the use of 'makecontext' in any application
> which also links to libpthread.so[1]. We used makecontext in
> GTK-VNC's coroutines and got random crashes in threaded
> apps running on NetBSD. So for NetBSD we tell people to use
> the thread based coroutines instead.

You have to use swapcontext(), no wait, you have to use setjmp(), no wait,
_setjmp(), no wait, threads.... Read on.

From Glibc's FAQ, setjmp/longjmp are not portable choices:

    - UNIX provides no other (portable) way of effecting a synchronous
      context switch (also known as co-routine switch).  Some versions
      support this via setjmp()/longjmp() but this does not work
      universally.

So in principle you should use swapcontext() in portable code.

(By the way, Glibc goes on about how it won't support swapcontext()
from async signal handlers, i.e. preemption, on some architectures
(IA-64/S-390), and I know it has been very subtly broken from a signal
handler on ARM.  Fair enough, somehow disappointing, but doesn't
matter for QEMU coroutines.)

But swapcontext() etc. have been withdrawn from POSIX 2008:

    - Functions to be deleted

      Legacy: Delete all legacy functions except utimes (which should not be legacy).
      OB: Default position is to delete all OB functions.

      XSI Functions to change state

      ....
      _setjmp and _longjmp. Should become obsolete.
      ....
      getcontext, setcontext, makecontext and swapcontext are already
      marked OB and should be withdrawn. And header file <ucontext.h>. 

OB means obsolescent.  They were marked obsolescent a few versions
prior, with the rationale that you can use threads instead...

It's not surprising that NetBSD forbids makecontext() with
libpthread.so.  I suspect old versions of FreeBSD, OpenBSD, DragonFly
BSD, (and Mac OS X?), have the same restriction, because they have a
similar pthreads evolutionary history to LinuxThreads.  LinuxThreads
also breaks when using coroutines that switch stacks, because it uses
the stack pointer to know the current thread.

(LinuxThreads is old now, but that particular quirk still affects me
because some uCLinux platforms, on which I wish to use coroutines, still
don't have working NPTL - but they aren't likely to be running QEMU :-)

Finally, if you are using setjmp/longjmp, consider (from FreeBSD man page):

    The setjmp()/longjmp() pairs save and restore the signal mask
    while _setjmp()/_longjmp() pairs save and restore only the
    register set and the stack.  (See sigprocmask(2).)

As setjmp/longjmp were chosen for performance, you may wish to use
_setjmp/_longjmp instead (when available), as swizzling the signal
mask on each switch may involve a system call and be rather slow.

-- Jamie
Jamie Lokier - May 24, 2011, 7:54 p.m.
Stefan Hajnoczi wrote:
> On Thu, May 12, 2011 at 10:51 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> > On 2011-05-11 12:15, Stefan Hajnoczi wrote:
> >> From: Kevin Wolf <kwolf@redhat.com>
> >>
> >> Asynchronous code is becoming very complex.  At the same time
> >> synchronous code is growing because it is convenient to write.
> >> Sometimes duplicate code paths are even added, one synchronous and the
> >> other asynchronous.  This patch introduces coroutines which allow code
> >> that looks synchronous but is asynchronous under the covers.
> >>
> >> A coroutine has its own stack and is therefore able to preserve state
> >> across blocking operations, which traditionally require callback
> >> functions and manual marshalling of parameters.
> >>
> >> Creating and starting a coroutine is easy:
> >>
> >>   coroutine = qemu_coroutine_create(my_coroutine);
> >>   qemu_coroutine_enter(coroutine, my_data);
> >>
> >> The coroutine then executes until it returns or yields:
> >>
> >>   void coroutine_fn my_coroutine(void *opaque) {
> >>       MyData *my_data = opaque;
> >>
> >>       /* do some work */
> >>
> >>       qemu_coroutine_yield();
> >>
> >>       /* do some more work */
> >>   }
> >>
> >> Yielding switches control back to the caller of qemu_coroutine_enter().
> >> This is typically used to switch back to the main thread's event loop
> >> after issuing an asynchronous I/O request.  The request callback will
> >> then invoke qemu_coroutine_enter() once more to switch back to the
> >> coroutine.
> >>
> >> Note that coroutines never execute concurrently and should only be used
> >> from threads which hold the global mutex.  This restriction makes
> >> programming with coroutines easier than with threads.  Race conditions
> >> cannot occur since only one coroutine may be active at any time.  Other
> >> coroutines can only run across yield.
> >
> > Mmh, is there anything that conceptually prevent fixing this limitation
> > later on? I would really like to remove such dependency long-term as
> > well to have VCPUs operate truly independently on independent device models.
> 
> The use case that has motivated coroutines is the block layer.  It is
> synchronous in many places and definitely not thread-safe.  Coroutines
> is a step that solves the "synchronous" part of the problem but does
> not tackle the "not thread-safe" part.
> 
> It is possible to move from coroutines to threads but we need to
> remove single-thread assumptions from all the block layer code, which
> isn't a small task.  Coroutines does not prevent us from making the
> block layer thread-safe!

Keeping in mind that you may have to do some of the work even with
coroutines.  If the code is not thread safe, it may contain
assumptions that certain state does not change when it makes blocking
I/O calls, which stops being true once you have coroutines and replace
the I/O calls with async calls.  But at least the checking can be
confined to those places in the code.

It's quite similar to the Linux BKL - scheduling points have to be
checked but nowhere else does.  And, like the BKL, it could be "pushed
down" in stages over a long time period, to convert the coroutine code
over to concurrent threads over time, rather than in a single step.

By the end, even with full concurrency, there is still some potential
for coroutines, and/or async calls, to be useful for performance
balancing.

-- Jamie
Stefan Hajnoczi - May 24, 2011, 7:58 p.m.
On Tue, May 24, 2011 at 08:37:50PM +0100, Jamie Lokier wrote:
> Daniel P. Berrange wrote:
> > On Wed, May 11, 2011 at 03:45:39PM +0200, Paolo Bonzini wrote:
> > > On 05/11/2011 03:05 PM, Anthony Liguori wrote:
> > > >>
> > > >>A very slow way, too (on Windows at least if you use qemu_cond...).
> > > >
> > > >That doesn't mean you can't do a fiber implementation for Windows... but
> > > >having a highly portable fallback is a good thing.
> > > 
> > > I agree but where would you place it, since QEMU is only portable to
> > > POSIX and Windows?
> > > 
> > > osdep-$(CONFIG_POSIX) += coroutine-posix.c
> > > osdep-$(CONFIG_WIN32) += coroutine-win32.c
> > > osdep-??? += coroutine-fallback.c
> > 
> > NetBSD forbids the use of 'makecontext' in any application
> > which also links to libpthread.so[1]. We used makecontext in
> > GTK-VNC's coroutines and got random crashes in threaded
> > apps running on NetBSD. So for NetBSD we tell people to use
> > the thread based coroutines instead.
> 
> You have to use swapcontext(), no wait, you have to use setjmp(), no wait,
> _setjmp(), no wait, threads.... Read on.
> 
> From Glibc's FAQ, setjmp/longjmp are not portable choices:
> 
>     - UNIX provides no other (portable) way of effecting a synchronous
>       context switch (also known as co-routine switch).  Some versions
>       support this via setjmp()/longjmp() but this does not work
>       universally.
> 
> So in principle you should use swapcontext() in portable code.
> 
> (By the way, Glibc goes on about how it won't support swapcontext()
> from async signal handlers, i.e. preemption, on some architectures
> (IA-64/S-390), and I know it has been very subtly broken from a signal
> handler on ARM.  Fair enough, somehow disappointing, but doesn't
> matter for QEMU coroutines.)
> 
> But swapcontext() etc. have been withdrawn from POSIX 2008:
> 
>     - Functions to be deleted
> 
>       Legacy: Delete all legacy functions except utimes (which should not be legacy).
>       OB: Default position is to delete all OB functions.
> 
>       XSI Functions to change state
> 
>       ....
>       _setjmp and _longjmp. Should become obsolete.
>       ....
>       getcontext, setcontext, makecontext and swapcontext are already
>       marked OB and should be withdrawn. And header file <ucontext.h>. 
> 
> OB means obsolescent.  They were marked obsolescent a few versions
> prior, with the rationale that you can use threads instead...

Yep, aware of this but at the end of the day these functions are
commonly available.

> It's not surprising that NetBSD forbids makecontext() with
> libpthread.so.  I suspect old versions of FreeBSD, OpenBSD, DragonFly
> BSD, (and Mac OS X?), have the same restriction, because they have a
> similar pthreads evolutionary history to LinuxThreads.  LinuxThreads
> also breaks when using coroutines that switch stacks, because it uses
> the stack pointer to know the current thread.
> 
> (LinuxThreads is old now, but that particular quirk still affects me
> because some uCLinux platforms, on which I wish to use coroutines, still
> don't have working NPTL - but they aren't likely to be running QEMU :-)

That is nasty.

> Finally, if you are using setjmp/longjmp, consider (from FreeBSD man page):
> 
>     The setjmp()/longjmp() pairs save and restore the signal mask
>     while _setjmp()/_longjmp() pairs save and restore only the
>     register set and the stack.  (See sigprocmask(2).)
> 
> As setjmp/longjmp were chosen for performance, you may wish to use
> _setjmp/_longjmp instead (when available), as swizzling the signal
> mask on each switch may involve a system call and be rather slow.

Thanks, I read about that but didn't try to implement special cases
because I don't have relevant OSes here to test against.

My current plan is to try using sigaltstack(2) instead of
makecontext()/swapcontext() as a hack since OpenBSD doesn't have
makecontext()/swapcontext().

TBH I'm almost at the stage where I think we should just use threads
and/or async callbacks, as appropriate.  Hopefully I'll be able to cook
up a reasonably portable implementation of coroutines though, because
the prospect of having to go fully threaded or do async callbacks isn't
attractive in many cases.

Stefan
Jamie Lokier - May 24, 2011, 8:51 p.m.
Stefan Hajnoczi wrote:
> My current plan is to try using sigaltstack(2) instead of
> makecontext()/swapcontext() as a hack since OpenBSD doesn't have
> makecontext()/swapcontext().

sigaltstack() is just a system call to tell the system about an
alternative signal stack - that you have allocated yourself using
malloc().  According to 'info libc "Signal Stack"'.  It won't help you
get a new stack by itself.

Maybe take a look at what GNU Pth does.  It has a similar matrix of
tested platforms using different strategies on each, though it is
slightly different because it obviously doesn't link with
libpthread.so (it provides it!), and it has to context switch from the
SIGALRM handler for pre-emption.

> TBH I'm almost at the stage where I think we should just use threads
> and/or async callbacks, as appropriate.  Hopefully I'll be able to cook
> up a reasonably portable implementation of coroutines though, because
> the prospect of having to go fully threaded or do async callbacks isn't
> attractive in many cases.

Another classic trick is just to call a function recursively which has
a large local array(*), setjmp() every M calls, and longjmp() back to
the start after M*N calls.  That gets you N setjmp() contexts to
switch between, all in the same larger stack so it's fine even with
old pthread implementations, providing the total stack used isn't too
big, and the individual stacks you've allocated aren't too small for
the program.

If the large local array insists on being optimised away, it's
probably better anyway to track the address of a local variable, and
split the stack whenever the address has changed by enough.  Try to
make sure the compiler doesn't optimise away the tail recursion :-)

It works better on non-threaded programs as per-thread stacks are more
likely to have limited size.  *But* the initial thread often has a
large growable stack, just like a single-threaded program.  So it's a
good idea to do the stack carving in the initial thread (doesn't
necessarily have to be at the start of the program).  You may be able
to add guard pages afterwards with mprotect() if you're paranoid :-)

-- Jamie
Anthony Liguori - May 24, 2011, 9:21 p.m.
On 05/24/2011 02:58 PM, Stefan Hajnoczi wrote:
> On Tue, May 24, 2011 at 08:37:50PM +0100, Jamie Lokier wrote:
> Thanks, I read about that but didn't try to implement special cases
> because I don't have relevant OSes here to test against.
>
> My current plan is to try using sigaltstack(2) instead of
> makecontext()/swapcontext() as a hack since OpenBSD doesn't have
> makecontext()/swapcontext().
>
> TBH I'm almost at the stage where I think we should just use threads
> and/or async callbacks, as appropriate.  Hopefully I'll be able to cook
> up a reasonably portable implementation of coroutines though, because
> the prospect of having to go fully threaded or do async callbacks isn't
> attractive in many cases.

Why not use threads as a coroutine callback?  That's essentially what we 
would do to be "fully threaded".

Regards,

Anthony Liguori

>
> Stefan
>
Anthony Liguori - May 24, 2011, 9:22 p.m.
On 05/24/2011 02:58 PM, Stefan Hajnoczi wrote:
> On Tue, May 24, 2011 at 08:37:50PM +0100, Jamie Lokier wrote:
> Thanks, I read about that but didn't try to implement special cases
> because I don't have relevant OSes here to test against.
>
> My current plan is to try using sigaltstack(2) instead of
> makecontext()/swapcontext() as a hack since OpenBSD doesn't have
> makecontext()/swapcontext().
>
> TBH I'm almost at the stage where I think we should just use threads
> and/or async callbacks, as appropriate.  Hopefully I'll be able to cook
> up a reasonably portable implementation of coroutines though, because
> the prospect of having to go fully threaded or do async callbacks isn't
> attractive in many cases.

I meant to say threads as a coroutine fallback.

Regards,

Anthony Liguori

>
> Stefan
>
Stefan Hajnoczi - May 25, 2011, 7:09 a.m.
On Tue, May 24, 2011 at 9:51 PM, Jamie Lokier <jamie@shareable.org> wrote:
> Stefan Hajnoczi wrote:
>> My current plan is to try using sigaltstack(2) instead of
>> makecontext()/swapcontext() as a hack since OpenBSD doesn't have
>> makecontext()/swapcontext().
>
> sigaltstack() is just a system call to tell the system about an
> alternative signal stack - that you have allocated yourself using
> malloc().  According to 'info libc "Signal Stack"'.  It won't help you
> get a new stack by itself.

Issue sigaltstack() with the malloced new stack.  Send yourself a
signal and in a custom signal handler setjmp() to stash away the state
(you're now on the new stack).

>
> Maybe take a look at what GNU Pth does.  It has a similar matrix of
> tested platforms using different strategies on each, though it is
> slightly different because it obviously doesn't link with
> libpthread.so (it provides it!), and it has to context switch from the
> SIGALRM handler for pre-emption.
>
>> TBH I'm almost at the stage where I think we should just use threads
>> and/or async callbacks, as appropriate.  Hopefully I'll be able to cook
>> up a reasonably portable implementation of coroutines though, because
>> the prospect of having to go fully threaded or do async callbacks isn't
>> attractive in many cases.
>
> Another classic trick is just to call a function recursively which has
> a large local array(*), setjmp() every M calls, and longjmp() back to
> the start after M*N calls.  That gets you N setjmp() contexts to
> switch between, all in the same larger stack so it's fine even with
> old pthread implementations, providing the total stack used isn't too
> big, and the individual stacks you've allocated aren't too small for
> the program.

True, I think I've done something like this with alloca() before :(.
It's extremely hacky though.

Stefan
Paolo Bonzini - May 25, 2011, 7:32 a.m.
On 05/24/2011 11:21 PM, Anthony Liguori wrote:
> Why not use threads as a coroutine fallback?  That's essentially what we
> would do to be "fully threaded".

Not exactly, there would be much less synchronization going on.  Using 
threads to implement coroutines means you go through the slow path of 
the synchronization primitives (either mutexes/condvars or barriers) 
twice or more per coroutine switch.  It is really slow, perhaps a 100x
difference.
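
To make the cost visible, here is a rough sketch of a coroutine emulated
with a thread, one mutex and one condition variable.  It is not the
GTK-VNC or QEMU code; ThreadCo, switch_to, co_thread and
run_thread_coroutine are hypothetical names.  Every enter and every yield
goes through a signal plus a blocking wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool co_turn;    /* true: coroutine thread runs, false: caller runs */
    bool done;       /* set when the coroutine body has finished */
    int yields;      /* how many times the body should yield */
    int counter;     /* work done by the coroutine body */
} ThreadCo;

/* Hand control to the other side and block until it hands it back.
 * Must be called with co->lock held. */
static void switch_to(ThreadCo *co, bool to_coroutine)
{
    co->co_turn = to_coroutine;
    pthread_cond_signal(&co->cond);
    while (co->co_turn == to_coroutine) {
        pthread_cond_wait(&co->cond, &co->lock);
    }
}

static void *co_thread(void *opaque)
{
    ThreadCo *co = opaque;

    pthread_mutex_lock(&co->lock);
    while (!co->co_turn) {                 /* wait for the first enter */
        pthread_cond_wait(&co->cond, &co->lock);
    }
    for (int i = 0; i < co->yields; i++) {
        co->counter++;
        switch_to(co, false);              /* yield to the caller */
    }
    co->done = true;                       /* terminate: hand back for good */
    co->co_turn = false;
    pthread_cond_signal(&co->cond);
    pthread_mutex_unlock(&co->lock);
    return NULL;
}

/* Drive the coroutine to completion; returns its counter, or -1 on error. */
static int run_thread_coroutine(int yields)
{
    ThreadCo co = { .co_turn = false, .done = false,
                    .yields = yields, .counter = 0 };
    pthread_t tid;

    pthread_mutex_init(&co.lock, NULL);
    pthread_cond_init(&co.cond, NULL);
    if (pthread_create(&tid, NULL, co_thread, &co) != 0) {
        return -1;
    }

    pthread_mutex_lock(&co.lock);
    while (!co.done) {
        switch_to(&co, true);              /* enter the coroutine */
    }
    pthread_mutex_unlock(&co.lock);

    pthread_join(tid, NULL);
    assert(co.done);
    pthread_cond_destroy(&co.cond);
    pthread_mutex_destroy(&co.lock);
    return co.counter;
}
```

A body that yields three times needs four enters, so eight handovers in
total, each of which puts one thread to sleep and wakes the other.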

Paolo
Bastien ROUCARIES - May 25, 2011, 11:43 a.m.
On Tuesday 24 May 2011 21:58:12, Stefan Hajnoczi wrote:
> On Tue, May 24, 2011 at 08:37:50PM +0100, Jamie Lokier wrote:
> TBH I'm almost at the stage where I think we should just use threads
> and/or async callbacks, as appropriate.  Hopefully I'll be able to cook
> up a reasonably portable implementation of coroutines though, because
> the prospect of having to go fully threaded or do async callbacks isn't
> attractive in many cases.

Maybe a stupid question: why not use the pcl library?

Bastien

> Stefan
Richard Henderson - May 25, 2011, 6:54 p.m.
On 05/25/2011 12:09 AM, Stefan Hajnoczi wrote:
> On Tue, May 24, 2011 at 9:51 PM, Jamie Lokier <jamie@shareable.org> wrote:
>> Stefan Hajnoczi wrote:
>>> My current plan is to try using sigaltstack(2) instead of
>>> makecontext()/swapcontext() as a hack since OpenBSD doesn't have
>>> makecontext()/swapcontext().
>>
>> sigaltstack() is just a system call to tell the system about an
>> alternative signal stack - that you have allocated yourself using
>> malloc().  According to 'info libc "Signal Stack"'.  It won't help you
>> get a new stack by itself.
> 
> Issue sigaltstack() with the malloced new stack.  Send yourself a
> signal and in a custom signal handler setjmp() to stash away the state
> (you're now on the new stack).

Is any of this really easier than simply writing 20-30 lines of
assembly to do what you Really Want And Nothing Else?

Honestly, this is qemu we're talking about, and we assume you've
already ported TCG to the host cpu plus abi.  How hard is it to
just DTRT with a qemu-specific routine, anyway?


r~

Patch

diff --git a/Makefile.objs b/Makefile.objs
index 9d8851e..cba6c2b 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -11,6 +11,12 @@  oslib-obj-$(CONFIG_WIN32) += oslib-win32.o
 oslib-obj-$(CONFIG_POSIX) += oslib-posix.o
 
 #######################################################################
+# coroutines
+coroutine-obj-y = qemu-coroutine.o
+coroutine-obj-$(CONFIG_POSIX) += coroutine-ucontext.o
+coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
+
+#######################################################################
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
 block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o async.o
@@ -67,6 +73,7 @@  common-obj-y += readline.o console.o cursor.o qemu-error.o
 common-obj-y += $(oslib-obj-y)
 common-obj-$(CONFIG_WIN32) += os-win32.o
 common-obj-$(CONFIG_POSIX) += os-posix.o
+common-obj-y += $(coroutine-obj-y)
 
 common-obj-y += tcg-runtime.o host-utils.o
 common-obj-y += irq.o ioport.o input.o
diff --git a/coroutine-ucontext.c b/coroutine-ucontext.c
new file mode 100644
index 0000000..97f2b35
--- /dev/null
+++ b/coroutine-ucontext.c
@@ -0,0 +1,73 @@ 
+/*
+ * ucontext coroutine initialization code
+ *
+ * Copyright (C) 2006  Anthony Liguori <anthony@codemonkey.ws>
+ * Copyright (C) 2011  Kevin Wolf <kwolf@redhat.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.0 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301 USA
+ */
+
+/* XXX Is there a nicer way to disable glibc's stack check for longjmp? */
+#ifdef _FORTIFY_SOURCE
+#undef _FORTIFY_SOURCE
+#endif
+#include <setjmp.h>
+#include <stdint.h>
+#include <ucontext.h>
+#include "qemu-coroutine-int.h"
+
+static Coroutine *new_coroutine;
+
+static void continuation_trampoline(void)
+{
+    Coroutine *co = new_coroutine;
+
+    /* Initialize longjmp environment and switch back to
+     * qemu_coroutine_init_env() in the old ucontext. */
+    if (!setjmp(co->env)) {
+        return;
+    }
+
+    while (true) {
+        co->entry(co->data);
+        if (!setjmp(co->env)) {
+            longjmp(co->caller->env, COROUTINE_TERMINATE);
+        }
+    }
+}
+
+int qemu_coroutine_init_env(Coroutine *co)
+{
+    ucontext_t old_uc, uc;
+
+    /* Create a new ucontext for switching to the coroutine stack and setting
+     * up a longjmp environment. */
+    if (getcontext(&uc) == -1) {
+        return -errno;
+    }
+
+    uc.uc_link = &old_uc;
+    uc.uc_stack.ss_sp = co->stack;
+    uc.uc_stack.ss_size = co->stack_size;
+    uc.uc_stack.ss_flags = 0;
+
+    new_coroutine = co;
+    makecontext(&uc, (void *)continuation_trampoline, 0);
+
+    /* Initialize the longjmp environment */
+    swapcontext(&old_uc, &uc);
+
+    return 0;
+}
diff --git a/coroutine-win32.c b/coroutine-win32.c
new file mode 100644
index 0000000..f4521c3
--- /dev/null
+++ b/coroutine-win32.c
@@ -0,0 +1,57 @@ 
+/*
+ * Win32 coroutine initialization code
+ *
+ * Copyright (c) 2011 Kevin Wolf <kwolf@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-coroutine-int.h"
+
+static void __attribute__((used)) trampoline(Coroutine *co)
+{
+    if (!setjmp(co->env)) {
+        return;
+    }
+
+    while (true) {
+        co->entry(co->data);
+        if (!setjmp(co->env)) {
+            longjmp(co->caller->env, COROUTINE_TERMINATE);
+        }
+    }
+}
+
+int qemu_coroutine_init_env(Coroutine *co)
+{
+#ifdef __i386__
+    asm volatile(
+        "mov %%esp, %%ebx;"
+        "mov %0, %%esp;"
+        "pushl %1;"
+        "call _trampoline;"
+        "mov %%ebx, %%esp;"
+        : : "r" (co->stack + co->stack_size), "r" (co) : "ebx"
+    );
+#else
+    #error This host architecture is not supported for win32
+#endif
+
+    return 0;
+}
diff --git a/qemu-coroutine-int.h b/qemu-coroutine-int.h
new file mode 100644
index 0000000..c86dcc1
--- /dev/null
+++ b/qemu-coroutine-int.h
@@ -0,0 +1,57 @@ 
+/*
+ * Coroutine internals
+ *
+ * Copyright (c) 2011 Kevin Wolf <kwolf@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_COROUTINE_INT_H
+#define QEMU_COROUTINE_INT_H
+
+#include <setjmp.h>
+#include "qemu-common.h"
+#include "qemu-queue.h"
+#include "qemu-coroutine.h"
+
+enum {
+    /* setjmp() return values */
+    COROUTINE_YIELD = 1,
+    COROUTINE_TERMINATE = 2,
+};
+
+struct Coroutine {
+    bool initialized;
+    struct Coroutine *caller;
+
+    size_t stack_size;
+    char *stack;
+
+    /* Used to pass arguments/return values for coroutines */
+    void *data;
+    CoroutineEntry *entry;
+
+    QLIST_ENTRY(Coroutine) pool_next;
+
+    jmp_buf env;
+};
+
+int qemu_coroutine_init_env(Coroutine *co);
+
+#endif
diff --git a/qemu-coroutine.c b/qemu-coroutine.c
new file mode 100644
index 0000000..0927f58
--- /dev/null
+++ b/qemu-coroutine.c
@@ -0,0 +1,132 @@ 
+/*
+ * QEMU coroutines
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Stefan Hajnoczi    <stefanha@linux.vnet.ibm.com>
+ *  Kevin Wolf         <kwolf@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+/* XXX Is there a nicer way to disable glibc's stack check for longjmp? */
+#ifdef _FORTIFY_SOURCE
+#undef _FORTIFY_SOURCE
+#endif
+#include <setjmp.h>
+
+#include "trace.h"
+#include "qemu-queue.h"
+#include "qemu-common.h"
+#include "qemu-coroutine.h"
+#include "qemu-coroutine-int.h"
+
+static QLIST_HEAD(, Coroutine) pool = QLIST_HEAD_INITIALIZER(&pool);
+static __thread Coroutine leader;
+static __thread Coroutine *current;
+
+static void qemu_coroutine_terminate(Coroutine *coroutine)
+{
+    trace_qemu_coroutine_terminate(coroutine);
+    QLIST_INSERT_HEAD(&pool, coroutine, pool_next);
+    coroutine->caller = NULL;
+}
+
+static int coroutine_init(Coroutine *co)
+{
+    if (!co->initialized) {
+        co->initialized = true;
+        co->stack_size = 4 << 20;
+        co->stack = qemu_malloc(co->stack_size);
+    }
+
+    return qemu_coroutine_init_env(co);
+}
+
+Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
+{
+    Coroutine *coroutine;
+
+    coroutine = QLIST_FIRST(&pool);
+
+    if (coroutine) {
+        QLIST_REMOVE(coroutine, pool_next);
+    } else {
+        coroutine = qemu_mallocz(sizeof(*coroutine));
+    }
+
+    coroutine_init(coroutine);
+    coroutine->entry = entry;
+
+    return coroutine;
+}
+
+Coroutine * coroutine_fn qemu_coroutine_self(void)
+{
+    if (current == NULL) {
+        current = &leader;
+    }
+
+    return current;
+}
+
+bool qemu_in_coroutine(void)
+{
+    return (qemu_coroutine_self() != &leader);
+}
+
+static void *coroutine_swap(Coroutine *from, Coroutine *to, void *opaque)
+{
+    int ret;
+
+    to->data = opaque;
+
+    ret = setjmp(from->env);
+    switch (ret) {
+    case COROUTINE_YIELD:
+        return from->data;
+    case COROUTINE_TERMINATE:
+        current = to->caller;
+        qemu_coroutine_terminate(to);
+        return to->data;
+    default:
+        /* Switch to called coroutine */
+        current = to;
+        longjmp(to->env, COROUTINE_YIELD);
+        return NULL;
+    }
+}
+
+void qemu_coroutine_enter(Coroutine *coroutine, void *opaque)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    trace_qemu_coroutine_enter(self, coroutine, opaque);
+
+    if (coroutine->caller) {
+        fprintf(stderr, "Co-routine re-entered recursively\n");
+        abort();
+    }
+
+    coroutine->caller = self;
+    coroutine_swap(self, coroutine, opaque);
+}
+
+void * coroutine_fn qemu_coroutine_yield(void)
+{
+    Coroutine *self = qemu_coroutine_self();
+    Coroutine *to = self->caller;
+
+    trace_qemu_coroutine_yield(self, self->caller);
+
+    if (!to) {
+        fprintf(stderr, "Co-routine is yielding to no one\n");
+        abort();
+    }
+
+    self->caller = NULL;
+    return coroutine_swap(self, to, NULL);
+}
diff --git a/qemu-coroutine.h b/qemu-coroutine.h
new file mode 100644
index 0000000..1a19d18
--- /dev/null
+++ b/qemu-coroutine.h
@@ -0,0 +1,82 @@ 
+/*
+ * QEMU coroutine implementation
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Stefan Hajnoczi    <stefanha@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_COROUTINE_H
+#define QEMU_COROUTINE_H
+
+#include <stdbool.h>
+
+/**
+ * Mark a function that executes in coroutine context
+ *
+ * Functions that execute in coroutine context cannot be called directly from
+ * normal functions.  In the future it would be nice to enable compiler or
+ * static checker support for catching such errors.  This annotation might make
+ * it possible and in the meantime it serves as documentation.
+ *
+ * For example:
+ *
+ *   static void coroutine_fn foo(void) {
+ *       ....
+ *   }
+ */
+#define coroutine_fn
+
+typedef struct Coroutine Coroutine;
+
+/**
+ * Coroutine entry point
+ *
+ * When the coroutine is entered for the first time, opaque is passed in as an
+ * argument.
+ *
+ * When this function returns, the coroutine is destroyed automatically and
+ * execution continues in the caller who last entered the coroutine.
+ */
+typedef void coroutine_fn CoroutineEntry(void *opaque);
+
+/**
+ * Create a new coroutine
+ *
+ * Use qemu_coroutine_enter() to actually transfer control to the coroutine.
+ */
+Coroutine *qemu_coroutine_create(CoroutineEntry *entry);
+
+/**
+ * Transfer control to a coroutine
+ *
+ * The opaque argument is made available to the coroutine either as the entry
+ * function argument if this is the first time a new coroutine is entered, or
+ * as the return value from qemu_coroutine_yield().
+ */
+void qemu_coroutine_enter(Coroutine *coroutine, void *opaque);
+
+/**
+ * Transfer control back to a coroutine's caller
+ *
+ * The return value is the argument passed back in from the next
+ * qemu_coroutine_enter().
+ */
+void * coroutine_fn qemu_coroutine_yield(void);
+
+/**
+ * Get the currently executing coroutine
+ */
+Coroutine * coroutine_fn qemu_coroutine_self(void);
+
+/**
+ * Return whether or not currently inside a coroutine
+ */
+bool qemu_in_coroutine(void);
+
+#endif /* QEMU_COROUTINE_H */
diff --git a/trace-events b/trace-events
index 4f965e2..2d4db05 100644
--- a/trace-events
+++ b/trace-events
@@ -361,3 +361,8 @@  disable milkymist_uart_pulse_irq_tx(void) "Pulse IRQ TX"
 # hw/milkymist-vgafb.c
 disable milkymist_vgafb_memory_read(uint32_t addr, uint32_t value) "addr %08x value %08x"
 disable milkymist_vgafb_memory_write(uint32_t addr, uint32_t value) "addr %08x value %08x"
+
+# qemu-coroutine.c
+qemu_coroutine_enter(void *from, void *to, void *opaque) "from %p to %p opaque %p"
+qemu_coroutine_yield(void *from, void *to) "from %p to %p"
+qemu_coroutine_terminate(void *co) "self %p"