From patchwork Mon Dec 4 17:01:02 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kevin Wolf X-Patchwork-Id: 844327 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yrB6f47MWz9t3x for ; Tue, 5 Dec 2017 04:03:51 +1100 (AEDT) Received: from localhost ([::1]:44208 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eLu9k-0006gD-55 for incoming@patchwork.ozlabs.org; Mon, 04 Dec 2017 12:03:48 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34568) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eLu7R-0004CX-VJ for qemu-devel@nongnu.org; Mon, 04 Dec 2017 12:01:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eLu7O-0008EG-3z for qemu-devel@nongnu.org; Mon, 04 Dec 2017 12:01:26 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49272) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eLu7G-0008AX-Fu; Mon, 04 Dec 2017 12:01:14 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 93BC4461C1; Mon, 4 Dec 2017 17:01:13 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-117-31.ams2.redhat.com [10.36.117.31]) by smtp.corp.redhat.com (Postfix) with ESMTP id D39D56F44C; Mon, 4 Dec 2017 17:01:11 +0000 (UTC) From: Kevin Wolf To: qemu-block@nongnu.org Date: Mon, 4 Dec 2017 18:01:02 +0100 Message-Id: <20171204170103.13973-2-kwolf@redhat.com> In-Reply-To: <20171204170103.13973-1-kwolf@redhat.com> References: <20171204170103.13973-1-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Mon, 04 Dec 2017 17:01:13 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, peter.maydell@linaro.org, berto@igalia.com, qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: Alberto Garcia Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are pausing all block jobs during bdrv_reopen_multiple() to prevent any of them from finishing and removing nodes from the graph while they are being reopened. It turns out that pausing a block job doesn't necessarily prevent it from finishing: a paused block job can still run its exit function from the main loop and call block_job_completed(). The mirror block job in particular always goes to the main loop while it is paused (by virtue of the bdrv_drained_begin() call in mirror_run()). Destroying a paused block job during bdrv_reopen_multiple() has two consequences: 1) The references to the nodes involved in the job are released, possibly destroying some of them. If those nodes were in the reopen queue this would trigger the problem originally described in commit 40840e419be, crashing QEMU. 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would not be doing all necessary bdrv_parent_drained_end() calls. I can reproduce problem 1) easily with iotest 030 by increasing STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking the iotest like in this example: https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html This patch keeps an additional reference to all block jobs between block_job_pause_all() and block_job_resume_all(), guaranteeing that they are kept alive. Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf --- blockjob.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/blockjob.c b/blockjob.c index 0ed50b953b..715c2c2680 100644 --- a/blockjob.c +++ b/blockjob.c @@ -730,6 +730,7 @@ void block_job_pause_all(void) AioContext *aio_context = blk_get_aio_context(job->blk); aio_context_acquire(aio_context); + block_job_ref(job); block_job_pause(job); aio_context_release(aio_context); } @@ -808,12 +809,14 @@ void coroutine_fn block_job_pause_point(BlockJob *job) void block_job_resume_all(void) { - BlockJob *job = NULL; - while ((job = block_job_next(job))) { + BlockJob *job, *next; + + QLIST_FOREACH_SAFE(job, &block_jobs, job_list, next) { AioContext *aio_context = blk_get_aio_context(job->blk); aio_context_acquire(aio_context); block_job_resume(job); + block_job_unref(job); aio_context_release(aio_context); } }