
docs/multiple-iothreads.txt: add documentation on IOThread programming

Message ID 1402322375-18899-1-git-send-email-stefanha@redhat.com

Commit Message

Stefan Hajnoczi June 9, 2014, 1:59 p.m. UTC
This document explains how IOThreads and the main loop are related,
especially how to write code that can run in an IOThread.  Currently only
virtio-blk-data-plane uses these techniques.  The next obvious target is
virtio-scsi; there has also been work on virtio-net.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 docs/multiple-iothreads.txt | 124 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)
 create mode 100644 docs/multiple-iothreads.txt

Comments

Paolo Bonzini June 9, 2014, 2:11 p.m. UTC | #1
Il 09/06/2014 15:59, Stefan Hajnoczi ha scritto:
> This document explains how IOThreads and the main loop are related,
> especially how to write code that can run in an IOThread.  Currently on
> virtio-blk-data-plane uses these techniques.  The next obvious target is
> virtio-scsi; there has also been work on virtio-net.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/multiple-iothreads.txt | 124 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 124 insertions(+)
>  create mode 100644 docs/multiple-iothreads.txt
>
> diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
> new file mode 100644
> index 0000000..f2b008d
> --- /dev/null
> +++ b/docs/multiple-iothreads.txt
> @@ -0,0 +1,124 @@
> +This document explains the IOThread feature and how to write code that runs
> +outside the QEMU global mutex.
> +
> +The main loop and IOThreads
> +---------------------------
> +QEMU is an event-driven program that can do several things at once using an
> +event loop.  The VNC server and the QMP monitor are both processed from the
> +same event loop which monitors their file descriptors until they become
> +readable and then invokes a callback.
> +
> +The default event loop is called the main loop (see main-loop.c).  It is
> +possible to create additional event loop threads using -object
> +iothread,id=my-iothread.
> +
> +Side note: The main loop and IOThread are both event loops but their code is
> +not shared completely.  Sometimes it is useful to remember that although they
> +are conceptually similar they are currently not interchangeable.

Actually, the main loop does include all the iothread code.  So you 
could say that the main loop is a superset of the iothread.

> +How to program for IOThreads
> +----------------------------
> +The main difference between legacy code and new code that can run in an
> +IOThread is dealing explicitly with the event loop object, AioContext
> +(see include/block/aio.h).  Code that only works in the main loop
> +implicitly uses the main loop's AioContext.  Code that supports running
> +in IOThreads must be aware of its AioContext.
> +
> +AioContext supports the following services:
> + * File descriptor monitoring (read/write/error)

POSIX only, at least for now.

> + * Event notifiers (inter-thread signalling)
> + * Timers
> + * Bottom Halves (BH) deferred callbacks
> +
> +There are several old APIs that use the main loop AioContext:
> + * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
> + * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier

seems to be unused

> + * LEGACY timer_new_ms() - create a timer
> + * LEGACY qemu_bh_new() - create a BH
> + * LEGACY qemu_aio_wait() - run an event loop iteration

also seems to be unused except for qemu-io-cmds.c (and easily removed 
from there).

Perhaps add a note (here or elsewhere) that timer_new_ms/qemu_bh_new 
should never be used in the block layer?

> +Since they implicitly work on the main loop they cannot be used in code that
> +runs in an IOThread.  They might cause a crash or deadlock if called from an
> +IOThread since the QEMU global mutex is not held.
> +
> +Instead, use the AioContext functions directly (see include/block/aio.h):
> + * aio_set_fd_handler() - monitor a file descriptor
> + * aio_set_event_notifier() - monitor an event notifier
> + * aio_timer_new() - create a timer
> + * aio_bh_new() - create a BH
> + * aio_poll() - run an event loop iteration
> +
> +The AioContext can be obtained from the IOThread using
> +iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
> +Code that takes an AioContext argument works both in IOThreads or the main
> +loop, depending on which AioContext instance the caller passes in.

Perfect.

> +How to synchronize with an IOThread
> +-----------------------------------
> +AioContext is not thread-safe so some rules must be followed when using file
> +descriptors, event notifiers, timers, or BHs across threads:
> +
> +1. AioContext functions can be called safely from file descriptor, event
> +notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
> +necessary.
> +
> +2. Other threads wishing to access the AioContext must use
> +aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
> +context is acquired no other thread can access it or run event loop iterations
> +in this AioContext.
> +
> +aio_context_acquire()/aio_context_release() calls may be nested.  This
> +means you can call them if you're not sure whether #1 applies.
> +
> +There is currently no lock ordering rule if a thread needs to acquire multiple
> +AioContexts simultaneously.  Therefore, it is only safe for code holding the
> +QEMU global mutex to acquire other AioContexts.

Good point (and a nice way out of the lock ordering quagmire...).

Paolo

> +Side note: the best way to schedule a function call across threads is to create
> +a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
> +acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
> +sure to acquire the AioContext for aio_bh_new() if necessary.
> +
> +The relationship between AioContext and the block layer
> +-------------------------------------------------------
> +The AioContext originates from the QEMU block layer because it provides a
> +scoped way of running event loop iterations until all work is done.  This
> +feature is used to complete all in-flight block I/O requests (see
> +bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
> +used by any QEMU subsystem.
> +
> +The block layer has support for AioContext integrated.  Each BlockDriverState
> +is associated with an AioContext using bdrv_set_aio_context() and
> +bdrv_get_aio_context().  This allows block layer code to process I/O inside the
> +right AioContext.  Other subsystems may wish to follow a similar approach.
> +
> +If main loop code such as a QMP function wishes to access a BlockDriverState it
> +must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
> +IOThread does not run in parallel.
> +
> +Long-running jobs (usually in the form of coroutines) are best scheduled in the
> +BlockDriverState's AioContext to avoid the need to acquire/release around each
> +bdrv_*() call.  Be aware that there is currently no mechanism to get notified
> +when bdrv_set_aio_context() moves this BlockDriverState to a different
> +AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
> +may need to add this if you want to support long-running jobs.
>

Eric Blake June 9, 2014, 3:29 p.m. UTC | #2
On 06/09/2014 07:59 AM, Stefan Hajnoczi wrote:
> This document explains how IOThreads and the main loop are related,
> especially how to write code that can run in an IOThread.  Currently on
> virtio-blk-data-plane uses these techniques.  The next obvious target is
> virtio-scsi; there has also been work on virtio-net.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  docs/multiple-iothreads.txt | 124 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 124 insertions(+)
>  create mode 100644 docs/multiple-iothreads.txt
> 
> diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
> new file mode 100644
> index 0000000..f2b008d
> --- /dev/null
> +++ b/docs/multiple-iothreads.txt
> @@ -0,0 +1,124 @@
> +This document explains the IOThread feature and how to write code that runs
> +outside the QEMU global mutex.

Pre-existing epidemic in this directory, but should you assert copyright
and a license?

> +
> +The main loop and IOThreads
> +---------------------------
> +QEMU is an event-driven program that can do several things at once using an
> +event loop.  The VNC server and the QMP monitor are both processed from the
> +same event loop which monitors their file descriptors until they become

s/loop/loop,/

Fam Zheng June 10, 2014, 2:04 a.m. UTC | #3
On Mon, 06/09 15:59, Stefan Hajnoczi wrote:
> This document explains how IOThreads and the main loop are related,
> especially how to write code that can run in an IOThread.  Currently on

Perhaps s/on/only/ ?

Fam

> virtio-blk-data-plane uses these techniques.  The next obvious target is
> virtio-scsi; there has also been work on virtio-net.
>

Stefan Hajnoczi June 27, 2014, 9:59 a.m. UTC | #4
On Mon, Jun 09, 2014 at 09:29:31AM -0600, Eric Blake wrote:
> On 06/09/2014 07:59 AM, Stefan Hajnoczi wrote:
> > This document explains how IOThreads and the main loop are related,
> > especially how to write code that can run in an IOThread.  Currently on
> > virtio-blk-data-plane uses these techniques.  The next obvious target is
> > virtio-scsi; there has also been work on virtio-net.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  docs/multiple-iothreads.txt | 124 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 124 insertions(+)
> >  create mode 100644 docs/multiple-iothreads.txt
> > 
> > diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
> > new file mode 100644
> > index 0000000..f2b008d
> > --- /dev/null
> > +++ b/docs/multiple-iothreads.txt
> > @@ -0,0 +1,124 @@
> > +This document explains the IOThread feature and how to write code that runs
> > +outside the QEMU global mutex.
> 
> Pre-existing epidemic in this directory, but should you assert copyright
> and a license?

Yes, I'm happy to do that.

Stefan Hajnoczi June 27, 2014, 10:07 a.m. UTC | #5
On Mon, Jun 09, 2014 at 04:11:29PM +0200, Paolo Bonzini wrote:
> >+The main loop and IOThreads
> >+---------------------------
> >+QEMU is an event-driven program that can do several things at once using an
> >+event loop.  The VNC server and the QMP monitor are both processed from the
> >+same event loop which monitors their file descriptors until they become
> >+readable and then invokes a callback.
> >+
> >+The default event loop is called the main loop (see main-loop.c).  It is
> >+possible to create additional event loop threads using -object
> >+iothread,id=my-iothread.
> >+
> >+Side note: The main loop and IOThread are both event loops but their code is
> >+not shared completely.  Sometimes it is useful to remember that although they
> >+are conceptually similar they are currently not interchangeable.
> 
> Actually, the main loop does include all the iothread code.  So you could
> say that the main loop is a superset of the iothread.

Not quite.  The main loop includes AioContext but it does not use
iothread.c (IOThread).

> >+ * LEGACY timer_new_ms() - create a timer
> >+ * LEGACY qemu_bh_new() - create a BH
> >+ * LEGACY qemu_aio_wait() - run an event loop iteration
> 
> also seems to be unused except for qemu-io-cmds.c (and easily removed from
> there).
> 
> Perhaps add a note (here or elsewhere) that timer_new_ms/qemu_bh_new should
> never be used in the block layer?

I'll note it further down where the block layer is mentioned.

Patch

diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
new file mode 100644
index 0000000..f2b008d
--- /dev/null
+++ b/docs/multiple-iothreads.txt
@@ -0,0 +1,124 @@ 
+This document explains the IOThread feature and how to write code that runs
+outside the QEMU global mutex.
+
+The main loop and IOThreads
+---------------------------
+QEMU is an event-driven program that can do several things at once using an
+event loop.  The VNC server and the QMP monitor are both processed from the
+same event loop, which monitors their file descriptors until they become
+readable and then invokes a callback.
+
+The default event loop is called the main loop (see main-loop.c).  It is
+possible to create additional event loop threads using -object
+iothread,id=my-iothread.
+
+Side note: The main loop and IOThread are both event loops, but their code is
+not shared completely.  Sometimes it is useful to remember that, although they
+are conceptually similar, they are currently not interchangeable.
+
+Why IOThreads are useful
+------------------------
+IOThreads allow the user to control the placement of work.  The main loop is a
+scalability bottleneck on hosts with many CPUs.  Work can be spread across
+several IOThreads instead of just one main loop.  When set up correctly, this
+can improve I/O latency and reduce jitter seen by the guest.
+
+The main loop is also deeply associated with the QEMU global mutex, which is a
+scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
+global mutex to serialize execution of QEMU code.  This mutex is necessary
+because a lot of QEMU's code historically was not thread-safe.
+
+The fact that all I/O processing is done in a single main loop and that the
+QEMU global mutex is contended by all vCPU threads and the main loop explains
+why it is desirable to place work into IOThreads.
+
+The experimental virtio-blk data-plane implementation has been benchmarked and
+shows these effects:
+ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
+
+How to program for IOThreads
+----------------------------
+The main difference between legacy code and new code that can run in an
+IOThread is dealing explicitly with the event loop object, AioContext
+(see include/block/aio.h).  Code that only works in the main loop
+implicitly uses the main loop's AioContext.  Code that supports running
+in IOThreads must be aware of its AioContext.
+
+AioContext supports the following services:
+ * File descriptor monitoring (read/write/error)
+ * Event notifiers (inter-thread signalling)
+ * Timers
+ * Bottom Halves (BHs): deferred callbacks
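To make the file descriptor monitoring service concrete, here is a toy, single-threaded sketch of one event loop iteration built on POSIX poll() (fd monitoring is a POSIX-only service). ToyFdHandler and toy_poll_fds() are hypothetical names invented for illustration; this is not QEMU's implementation, and real code would register handlers with aio_set_fd_handler() instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical toy types, NOT QEMU's aio_set_fd_handler() machinery. */
typedef void ToyIOHandler(void *opaque);

typedef struct ToyFdHandler {
    int fd;                 /* file descriptor to monitor */
    ToyIOHandler *io_read;  /* invoked when fd is readable (or hung up/errored) */
    void *opaque;           /* passed through to io_read */
} ToyFdHandler;

#define TOY_MAX_FDS 16

/*
 * Run one event loop iteration: wait up to timeout_ms for any handler's fd
 * to become readable, then invoke the matching callbacks.  Returns the
 * number of callbacks invoked, 0 on timeout, or -1 on error.
 */
static int toy_poll_fds(ToyFdHandler *handlers, size_t n, int timeout_ms)
{
    struct pollfd pfds[TOY_MAX_FDS];

    if (n > TOY_MAX_FDS) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = handlers[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int ret = poll(pfds, (nfds_t)n, timeout_ms);
    if (ret <= 0) {
        return ret;  /* 0: timeout, -1: error */
    }

    int dispatched = 0;
    for (size_t i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            handlers[i].io_read(handlers[i].opaque);
            dispatched++;
        }
    }
    return dispatched;
}
```

For example, registering the read end of a pipe and writing one byte to the other end makes the next toy_poll_fds() call invoke the handler once.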
+
+There are several old APIs that use the main loop AioContext:
+ * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
+ * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
+ * LEGACY timer_new_ms() - create a timer
+ * LEGACY qemu_bh_new() - create a BH
+ * LEGACY qemu_aio_wait() - run an event loop iteration
+
+Since they implicitly work on the main loop, they cannot be used in code that
+runs in an IOThread.  They might cause a crash or deadlock if called from an
+IOThread, since the QEMU global mutex is not held.
+
+Instead, use the AioContext functions directly (see include/block/aio.h):
+ * aio_set_fd_handler() - monitor a file descriptor
+ * aio_set_event_notifier() - monitor an event notifier
+ * aio_timer_new() - create a timer
+ * aio_bh_new() - create a BH
+ * aio_poll() - run an event loop iteration
+
+The AioContext can be obtained from an IOThread using
+iothread_get_aio_context(), or for the main loop using qemu_get_aio_context().
+Code that takes an AioContext argument works both in IOThreads and in the main
+loop, depending on which AioContext instance the caller passes in.
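The following toy model sketches why passing the AioContext explicitly matters. ToyAioContext, toy_bh_new(), toy_bh_schedule(), toy_poll() and request_submit() are hypothetical, single-threaded stand-ins loosely modeled on aio_bh_new(), qemu_bh_schedule() and aio_poll(); they are not QEMU code. The same request_submit() works with a "main loop" context or an "IOThread" context because the caller decides which one to pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy event loop, loosely modeled on AioContext; NOT QEMU code. */
typedef void ToyBHFunc(void *opaque);

typedef struct ToyBH {
    ToyBHFunc *cb;
    void *opaque;
    bool scheduled;
} ToyBH;

#define TOY_MAX_BHS 8

typedef struct ToyAioContext {
    const char *name;          /* e.g. "main" or "iothread0" */
    ToyBH bhs[TOY_MAX_BHS];
    int nbhs;
} ToyAioContext;

/* Stand-in for aio_bh_new(): the BH belongs to the given context */
static ToyBH *toy_bh_new(ToyAioContext *ctx, ToyBHFunc *cb, void *opaque)
{
    if (ctx->nbhs == TOY_MAX_BHS) {
        return NULL;
    }
    ToyBH *bh = &ctx->bhs[ctx->nbhs++];
    bh->cb = cb;
    bh->opaque = opaque;
    bh->scheduled = false;
    return bh;
}

/* Stand-in for qemu_bh_schedule() (single-threaded in this toy) */
static void toy_bh_schedule(ToyBH *bh)
{
    bh->scheduled = true;
}

/* Stand-in for one non-blocking aio_poll() iteration; true if a BH ran */
static bool toy_poll(ToyAioContext *ctx)
{
    bool progress = false;
    for (int i = 0; i < ctx->nbhs; i++) {
        ToyBH *bh = &ctx->bhs[i];
        if (bh->scheduled) {
            bh->scheduled = false;
            bh->cb(bh->opaque);
            progress = true;
        }
    }
    return progress;
}

/* Context-aware code: the caller decides where completion runs */
typedef struct Request {
    ToyAioContext *ctx;
    int completed;
} Request;

static void request_complete_bh(void *opaque)
{
    Request *req = opaque;
    req->completed++;
}

static int request_submit(Request *req, ToyAioContext *ctx)
{
    ToyBH *bh = toy_bh_new(ctx, request_complete_bh, req);
    if (!bh) {
        return -1;
    }
    req->ctx = ctx;
    req->completed = 0;
    toy_bh_schedule(bh);
    return 0;
}
```

Submitting one request to each context and then polling only the IOThread's context completes only the request that was submitted there; the main loop's request stays pending until its own context is polled.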
+
+How to synchronize with an IOThread
+-----------------------------------
+AioContext is not thread-safe, so some rules must be followed when using file
+descriptors, event notifiers, timers, or BHs across threads:
+
+1. AioContext functions can be called safely from file descriptor, event
+notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
+necessary.
+
+2. Other threads wishing to access the AioContext must use
+aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
+context is acquired, no other thread can access it or run event loop iterations
+in this AioContext.
+
+aio_context_acquire()/aio_context_release() calls may be nested.  This
+means you can call them even if you are not sure whether rule #1 applies.
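A sketch of why nesting is convenient, assuming a recursive POSIX mutex as the context lock (QEMU's actual aio_context_acquire() implementation may differ). toy_context_acquire(), toy_context_release() and helper_touch_state() are hypothetical names. The helper acquires the context unconditionally, so it is correct whether or not its caller already holds the context.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Hypothetical toy context; only the locking aspect is modeled. */
typedef struct ToyAioContext {
    pthread_mutex_t lock;  /* recursive: the owning thread may re-acquire it */
    int depth;             /* nesting depth, only touched with lock held */
} ToyAioContext;

static int toy_context_init(ToyAioContext *ctx)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret != 0) {
        return ret;
    }
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (ret == 0) {
        ret = pthread_mutex_init(&ctx->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    ctx->depth = 0;
    return ret;
}

/* Toy stand-ins for aio_context_acquire()/aio_context_release() */
static void toy_context_acquire(ToyAioContext *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->depth++;
}

static void toy_context_release(ToyAioContext *ctx)
{
    ctx->depth--;
    pthread_mutex_unlock(&ctx->lock);
}

/*
 * Hypothetical helper that may be called from code that already holds the
 * context, or from unrelated code.  It acquires unconditionally; with a
 * recursive lock this cannot self-deadlock.  Returns the nesting depth
 * observed while holding the lock.
 */
static int helper_touch_state(ToyAioContext *ctx, int *state)
{
    toy_context_acquire(ctx);
    (*state)++;
    int depth = ctx->depth;
    toy_context_release(ctx);
    return depth;
}
```

Called on its own, the helper sees a nesting depth of 1; called between an outer acquire/release pair, it sees depth 2 and still completes without deadlocking.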
+
+There is currently no lock ordering rule if a thread needs to acquire multiple
+AioContexts simultaneously.  Therefore, it is only safe for code holding the
+QEMU global mutex to acquire other AioContexts.
+
+Side note: the best way to schedule a function call across threads is to create
+a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
+acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
+sure to acquire the AioContext for aio_bh_new() if necessary.
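The cross-thread BH pattern might look roughly like this in a self-contained toy, assuming a pipe as the wake-up mechanism and a C11 atomic for the "scheduled" flag. All Toy* names are hypothetical; in QEMU the corresponding calls are aio_bh_new() and qemu_bh_schedule(). The BH is created before the target thread starts running, and toy_bh_schedule() itself takes no lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical toy types, NOT QEMU's BH implementation. */
typedef void ToyBHFunc(void *opaque);

typedef struct ToyAioContext ToyAioContext;

typedef struct ToyBH {
    ToyAioContext *ctx;     /* context whose event loop runs the callback */
    ToyBHFunc *cb;
    void *opaque;
    atomic_bool scheduled;  /* set by any thread, cleared by the event loop */
} ToyBH;

struct ToyAioContext {
    int notify_fds[2];      /* pipe standing in for an event notifier */
    ToyBH *bh;              /* a single BH keeps the sketch short */
};

/* Toy analogue of aio_bh_new(): done beforehand, by code that owns ctx */
static void toy_bh_init(ToyBH *bh, ToyAioContext *ctx, ToyBHFunc *cb,
                        void *opaque)
{
    bh->ctx = ctx;
    bh->cb = cb;
    bh->opaque = opaque;
    atomic_init(&bh->scheduled, false);
    ctx->bh = bh;
}

/* Toy analogue of qemu_bh_schedule(): callable from any thread, no lock */
static void toy_bh_schedule(ToyBH *bh)
{
    char byte = 0;
    atomic_store(&bh->scheduled, true);
    /* Wake the target event loop if it is blocked in read() */
    ssize_t n = write(bh->ctx->notify_fds[1], &byte, 1);
    (void)n;
}

/* One blocking event loop iteration in the target thread */
static bool toy_poll_once(ToyAioContext *ctx)
{
    char byte;
    if (read(ctx->notify_fds[0], &byte, 1) != 1) {
        return false;
    }
    ToyBH *bh = ctx->bh;
    if (bh && atomic_exchange(&bh->scheduled, false)) {
        bh->cb(bh->opaque);
        return true;
    }
    return false;
}

/* Body of the toy IOThread: run a single iteration, then exit */
static void *toy_iothread_run(void *opaque)
{
    toy_poll_once(opaque);
    return NULL;
}
```

Typical use: create the pipe, call toy_bh_init() from the owning thread, start toy_iothread_run() in a new thread, then call toy_bh_schedule() from the original thread; the callback runs in the IOThread, not in the scheduling thread.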
+
+The relationship between AioContext and the block layer
+-------------------------------------------------------
+The AioContext originates from the QEMU block layer because it provides a
+scoped way of running event loop iterations until all work is done.  This
+feature is used to complete all in-flight block I/O requests (see
+bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
+used by any QEMU subsystem.
+
+AioContext support is integrated into the block layer.  Each BlockDriverState
+is associated with an AioContext using bdrv_set_aio_context() and
+bdrv_get_aio_context().  This allows block layer code to process I/O inside the
+right AioContext.  Other subsystems may wish to follow a similar approach.
+
+If main loop code, such as a QMP function, wishes to access a BlockDriverState,
+it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure that
+the IOThread does not run in parallel, and call aio_context_release() when done.
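A toy illustration of the acquire-before-access rule, assuming a plain mutex that the IOThread holds for each event loop iteration. ToyBlockState, toy_get_aio_context() and toy_query_completed() are hypothetical stand-ins for a BlockDriverState, bdrv_get_aio_context() and a QMP handler; QEMU's real locking is more involved. Because the query holds the context lock, it never observes a request halfway through processing.

```c
#include <pthread.h>

#define TOY_ITERATIONS 100000L

/* Hypothetical toy types; QEMU's real AioContext lock is more involved. */
typedef struct ToyAioContext {
    pthread_mutex_t lock;
} ToyAioContext;

typedef struct ToyBlockState {
    ToyAioContext *ctx;   /* context this state is bound to */
    long in_flight;       /* requests currently being processed */
    long completed;       /* requests finished so far */
} ToyBlockState;

/* Toy stand-in for bdrv_get_aio_context() */
static ToyAioContext *toy_get_aio_context(ToyBlockState *bs)
{
    return bs->ctx;
}

/* Toy IOThread: holds its context for each event loop iteration */
static void *toy_iothread_run(void *opaque)
{
    ToyBlockState *bs = opaque;
    for (long i = 0; i < TOY_ITERATIONS; i++) {
        pthread_mutex_lock(&bs->ctx->lock);
        bs->in_flight++;    /* request starts ... */
        bs->completed++;
        bs->in_flight--;    /* ... and finishes within one iteration */
        pthread_mutex_unlock(&bs->ctx->lock);
    }
    return NULL;
}

/* Main loop code, e.g. a QMP-style query: acquire before touching bs */
static long toy_query_completed(ToyBlockState *bs, long *in_flight_seen)
{
    ToyAioContext *ctx = toy_get_aio_context(bs);
    pthread_mutex_lock(&ctx->lock);
    *in_flight_seen = bs->in_flight;  /* always 0: the IOThread is excluded */
    long completed = bs->completed;
    pthread_mutex_unlock(&ctx->lock);
    return completed;
}
```

Running toy_iothread_run() in one thread while another thread repeatedly calls toy_query_completed() never reports a non-zero in-flight count, because the two never touch the state at the same time.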
+
+Long-running jobs (usually in the form of coroutines) are best scheduled in the
+BlockDriverState's AioContext to avoid the need to acquire/release around each
+bdrv_*() call.  Be aware that there is currently no mechanism to get notified
+when bdrv_set_aio_context() moves this BlockDriverState to a different
+AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
+may need to add such a mechanism if you want to support long-running jobs.
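If a long-running job needs to follow its BlockDriverState to a new AioContext, one possible shape for such a notification mechanism is a pair of detach/attach callbacks invoked around the move. The sketch below is purely hypothetical: ToyContextNotifier, toy_add_context_notifier(), toy_set_aio_context() and ToyJob are invented for illustration and do not describe an existing QEMU API.

```c
#include <stddef.h>

/* Hypothetical toy types; NOT an existing QEMU API. */
typedef struct ToyAioContext {
    const char *name;
} ToyAioContext;

typedef struct ToyBlockState ToyBlockState;

typedef struct ToyContextNotifier {
    void (*detach)(ToyBlockState *bs, void *opaque);
    void (*attach)(ToyBlockState *bs, ToyAioContext *new_ctx, void *opaque);
    void *opaque;
} ToyContextNotifier;

#define TOY_MAX_NOTIFIERS 4

struct ToyBlockState {
    ToyAioContext *ctx;
    ToyContextNotifier notifiers[TOY_MAX_NOTIFIERS];
    size_t n_notifiers;
};

static int toy_add_context_notifier(ToyBlockState *bs, ToyContextNotifier n)
{
    if (bs->n_notifiers == TOY_MAX_NOTIFIERS) {
        return -1;
    }
    bs->notifiers[bs->n_notifiers++] = n;
    return 0;
}

/* Detach from the old context, switch, then attach to the new one */
static void toy_set_aio_context(ToyBlockState *bs, ToyAioContext *new_ctx)
{
    for (size_t i = 0; i < bs->n_notifiers; i++) {
        bs->notifiers[i].detach(bs, bs->notifiers[i].opaque);
    }
    bs->ctx = new_ctx;
    for (size_t i = 0; i < bs->n_notifiers; i++) {
        bs->notifiers[i].attach(bs, new_ctx, bs->notifiers[i].opaque);
    }
}

/* A long-running job that pauses while detached and follows the move */
typedef struct ToyJob {
    ToyAioContext *ctx;
    int paused;
    int moves;
} ToyJob;

static void job_detach(ToyBlockState *bs, void *opaque)
{
    ToyJob *job = opaque;
    (void)bs;
    job->paused = 1;
    job->ctx = NULL;
}

static void job_attach(ToyBlockState *bs, ToyAioContext *new_ctx, void *opaque)
{
    ToyJob *job = opaque;
    (void)bs;
    job->ctx = new_ctx;
    job->paused = 0;
    job->moves++;
}
```

The detach-then-attach order gives the job a window in which it is paused and owns no context, which is where a real implementation would stop issuing requests before resuming in the new AioContext.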