diff mbox series

[v2,3/9] jobs: add exit shim

Message ID 20180823220856.10094-4-jsnow@redhat.com
State New
Headers show
Series jobs: Job Exit Refactoring Pt 1 | expand

Commit Message

John Snow Aug. 23, 2018, 10:08 p.m. UTC
All jobs do the same thing when they leave their running loop:
- Store the return code in a structure
- wait to receive this structure in the main thread
- signal job completion via job_completed

Few jobs do anything beyond exactly this. Consolidate this exit
logic for a net reduction in SLOC.

More seriously, when we utilize job_defer_to_main_loop_bh to call
a function that calls job_completed, job_finalize_single will run
in a context where it has recursively taken the aio_context lock,
which can cause hangs if it puts down a reference that causes a flush.

You can observe this in practice by looking at mirror_exit's careful
placement of job_completed and bdrv_unref calls.

If we centralize job exiting, we can signal job completion from outside
of the aio_context, which should allow for job cleanup code to run with
only one lock, which makes cleanup callbacks less tricky to write.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 include/qemu/job.h | 11 +++++++++++
 job.c              | 18 ++++++++++++++++++
 2 files changed, 29 insertions(+)

Comments

Max Reitz Aug. 27, 2018, 10 a.m. UTC | #1
On 2018-08-24 00:08, John Snow wrote:
> All jobs do the same thing when they leave their running loop:
> - Store the return code in a structure
> - wait to receive this structure in the main thread
> - signal job completion via job_completed
> 
> Few jobs do anything beyond exactly this. Consolidate this exit
> logic for a net reduction in SLOC.
> 
> More seriously, when we utilize job_defer_to_main_loop_bh to call
> a function that calls job_completed, job_finalize_single will run
> in a context where it has recursively taken the aio_context lock,
> which can cause hangs if it puts down a reference that causes a flush.
> 
> You can observe this in practice by looking at mirror_exit's careful
> placement of job_completed and bdrv_unref calls.
> 
> If we centralize job exiting, we can signal job completion from outside
> of the aio_context, which should allow for job cleanup code to run with
> only one lock, which makes cleanup callbacks less tricky to write.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  include/qemu/job.h | 11 +++++++++++
>  job.c              | 18 ++++++++++++++++++
>  2 files changed, 29 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>
diff mbox series

Patch

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 5c92c53ef0..c67f6a647e 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -204,6 +204,17 @@  struct JobDriver {
      */
     void (*drain)(Job *job);
 
+    /**
+     * If the callback is not NULL, exit will be invoked from the main thread
+     * when the job's coroutine has finished, but before transactional
+     * convergence; before @prepare or @abort.
+     *
+     * FIXME TODO: This callback is only temporary to transition remaining jobs
+     * to prepare/commit/abort/clean callbacks and will be removed before 3.1.
+     * is released.
+     */
+    void (*exit)(Job *job);
+
     /**
      * If the callback is not NULL, prepare will be invoked when all the jobs
      * belonging to the same transaction complete; or upon this job's completion
diff --git a/job.c b/job.c
index bc1d970df4..bc8dad4e71 100644
--- a/job.c
+++ b/job.c
@@ -535,6 +535,18 @@  void job_drain(Job *job)
     }
 }
 
+static void job_exit(void *opaque)
+{
+    Job *job = (Job *)opaque;
+    AioContext *aio_context = job->aio_context;
+
+    if (job->driver->exit) {
+        aio_context_acquire(aio_context);
+        job->driver->exit(job);
+        aio_context_release(aio_context);
+    }
+    job_completed(job, job->ret);
+}
 
 /**
  * All jobs must allow a pause point before entering their job proper. This
@@ -547,6 +559,12 @@  static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
+    if (!job->deferred_to_main_loop) {
+        job->deferred_to_main_loop = true;
+        aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                                job_exit,
+                                job);
+    }
 }