diff mbox series

[PULL,1/4] blockjob: do not allow coroutine double entry or entry-after-completion

Message ID 20171121170350.31290-2-jcody@redhat.com
State New

Commit Message

Jeff Cody Nov. 21, 2017, 5:03 p.m. UTC
When block_job_sleep_ns() is called, the coroutine is scheduled for
future execution.  If we allow the job to be re-entered prior to the
scheduled time, we introduce a race condition in which a coroutine can be
entered recursively, or even entered after the coroutine is deleted.

The job->busy flag is used by blockjobs when a coroutine is busy
executing. The function 'block_job_enter()' obeys the busy flag,
and will not enter a coroutine if set.  If we sleep a job, we need to
leave the busy flag set, so that subsequent calls to block_job_enter()
are prevented.

This changes the prior behavior of block_job_cancel() being able to
immediately wake up and cancel a job; in practice, this should not be an
issue, as the coroutine sleep times are generally very small, and the
cancel will occur the next time the coroutine wakes up.

This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockjob.c                   | 7 +++++--
 include/block/blockjob_int.h | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)
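
For readers who want to see the race in isolation, here is a minimal toy
model of the busy-flag guard described in the commit message. It is a
sketch, not QEMU code: ToyJob and all toy_* functions are hypothetical
stand-ins invented for illustration, and the model assumes that a timer
wake-up always enters the coroutine while a block_job_enter()-style call
first checks the busy flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BlockJob; only what the model needs. */
typedef struct ToyJob {
    bool busy;          /* plays the role of job->busy */
    bool sleep_pending; /* a timer wake-up has been scheduled */
    int entries;        /* how often the "coroutine" was entered */
} ToyJob;

/* Models block_job_enter(): obeys the busy flag. */
static bool toy_job_enter(ToyJob *job)
{
    if (job->busy) {
        return false;   /* refused, the job is considered busy */
    }
    job->busy = true;
    job->entries++;
    return true;
}

/* Models the sleep: the old behavior cleared busy, the new one keeps it. */
static void toy_job_sleep(ToyJob *job, bool clear_busy_while_sleeping)
{
    job->sleep_pending = true;
    if (clear_busy_while_sleeping) {
        job->busy = false;
    }
}

/* Models the scheduled wake-up, which enters the coroutine regardless. */
static void toy_timer_fire(ToyJob *job)
{
    assert(job->sleep_pending);
    job->sleep_pending = false;
    job->busy = true;
    job->entries++;
}

/* Counts entries for one sleep with an early enter attempt in between. */
int toy_entries_during_sleep(bool clear_busy_while_sleeping)
{
    ToyJob job = { .busy = true, .sleep_pending = false, .entries = 0 };

    toy_job_sleep(&job, clear_busy_while_sleeping);
    toy_job_enter(&job);    /* e.g. triggered by a cancel request */
    toy_timer_fire(&job);   /* the wake-up that was already scheduled */
    return job.entries;
}
```

In this model, toy_entries_during_sleep(true) counts two entries for a
single sleep (the double entry), while toy_entries_during_sleep(false)
counts one, at the cost of the early enter attempt being ignored until
the wake-up.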

Comments

Kevin Wolf Nov. 22, 2017, 10:25 a.m. UTC | #1
On 21.11.2017 at 18:03, Jeff Cody wrote:
> When block_job_sleep_ns() is called, the coroutine is scheduled for
> future execution.  If we allow the job to be re-entered prior to the
> scheduled time, we introduce a race condition in which a coroutine can be
> entered recursively, or even entered after the coroutine is deleted.
> 
> The job->busy flag is used by blockjobs when a coroutine is busy
> executing. The function 'block_job_enter()' obeys the busy flag,
> and will not enter a coroutine if set.  If we sleep a job, we need to
> leave the busy flag set, so that subsequent calls to block_job_enter()
> are prevented.
> 
> This changes the prior behavior of block_job_cancel() being able to
> immediately wake up and cancel a job; in practice, this should not be an
> issue, as the coroutine sleep times are generally very small, and the
> cancel will occur the next time the coroutine wakes up.
> 
> This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

git bisect identifies this as the commit where qemu-iotests started to
break, e.g. case 020:

--- /home/kwolf/source/qemu/tests/qemu-iotests/020.out  2017-11-20 10:43:53.157894898 +0100
+++ /home/kwolf/source/qemu/tests/qemu-iotests/020.out.bad      2017-11-22 11:22:48.781344756 +0100
@@ -537,7 +537,8 @@
 wrote 65536/65536 bytes at offset 4295098368
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 No errors were found on the image.
-Image committed.
+qemu-img: block/block-backend.c:2086: blk_root_drained_end: Assertion `blk->quiesce_counter' failed.
+./common.rc: line 61: 17396 Aborted                 (core dumped) ( exec "$QEMU_IMG_PROG" $QEMU_IMG_OPTIONS "$@" )
 Reading from the backing file

Patch

diff --git a/blockjob.c b/blockjob.c
index 3a0c491..ff9a614 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -797,11 +797,14 @@  void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
         return;
     }
 
-    job->busy = false;
+    /* We need to leave job->busy set here, because when we have
+     * put a coroutine to 'sleep', we have scheduled it to run in
+     * the future.  We cannot enter that same coroutine again before
+     * it wakes and runs, otherwise we risk double-entry or entry after
+     * completion. */
     if (!block_job_should_pause(job)) {
         co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
     }
-    job->busy = true;
 
     block_job_pause_point(job);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f13ad05..43f3be2 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -143,7 +143,8 @@  void *block_job_create(const char *job_id, const BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will interrupt the wait immediately.
+ * nanoseconds.  Canceling the job will not interrupt the wait, so the
+ * cancel will not process until the coroutine wakes up.
  */
 void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);