From patchwork Wed Feb 13 13:22:09 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kevin Wolf
X-Patchwork-Id: 220143
From: Kevin Wolf
To: qemu-devel@nongnu.org
Date: Wed, 13 Feb 2013 14:22:09 +0100
Message-Id: <1360761733-25347-20-git-send-email-kwolf@redhat.com>
In-Reply-To: <1360761733-25347-1-git-send-email-kwolf@redhat.com>
References: <1360761733-25347-1-git-send-email-kwolf@redhat.com>
Cc: kwolf@redhat.com, pbonzini@redhat.com, lersek@redhat.com, stefanha@redhat.com
Subject: [Qemu-devel] [RFC PATCH v2 19/23] qcow2: Add error handling to the l2meta coroutine

Not exactly bisectable, but one large patch isn't much better either :-(

m->error is used to allow bdrv_drain() to stop with l2meta in error
state rather than go into an endless loop.

Signed-off-by: Kevin Wolf
---
 block/qcow2.c | 44 ++++++++++++++++++++++++++++++++++++++++----
 block/qcow2.h |  3 +++
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 57552aa..2819336 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -774,11 +774,33 @@ static void coroutine_fn process_l2meta(void *opaque)
         m->sleeping = false;
     }
 
+again:
     qemu_co_mutex_lock(&s->lock);
 
     ret = qcow2_alloc_cluster_link_l2(bs, m);
     if (ret < 0) {
-        /* FIXME */
+        /*
+         * This is a nasty situation: We have already completed the allocation
+         * write request and returned success, so just failing it isn't
+         * possible. We need to make sure to return an error during the next
+         * flush.
+         *
+         * However, we still can't drop the l2meta because we want I/O errors
+         * to be recoverable e.g. after the block device has been grown or the
+         * network connection restored. Sleep until the next flush comes and
+         * then retry.
+         */
+        s->flush_error = ret;
+
+        qemu_co_mutex_unlock(&s->lock);
+        qemu_co_rwlock_unlock(&s->l2meta_flush);
+        m->sleeping = true;
+        m->error = true;
+        qemu_coroutine_yield();
+        m->error = false;
+        m->sleeping = false;
+        qemu_co_rwlock_rdlock(&s->l2meta_flush);
+        goto again;
     }
 
     qemu_co_mutex_unlock(&s->lock);
@@ -801,11 +823,12 @@ static bool qcow2_drain(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
     QCowL2Meta *m;
+    bool busy = false;
 
     s->in_l2meta_flush = true;
 again:
     QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
-        if (m->sleeping) {
+        if (m->sleeping && !m->error) {
             qemu_coroutine_enter(m->co, NULL);
             /* next_in_flight link could have become invalid */
             goto again;
@@ -813,7 +836,19 @@ again:
     }
     s->in_l2meta_flush = false;
 
-    return !QLIST_EMPTY(&s->cluster_allocs);
+    /*
+     * If there's still a sleeping l2meta, then an error must have occurred.
+     * Don't consider l2metas in this state as busy, they only get active on
+     * flushes.
+     */
+    QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
+        if (!m->sleeping) {
+            busy = true;
+            break;
+        }
+    }
+
+    return busy;
 }
 
 static inline coroutine_fn void stop_l2meta(BlockDriverState *bs)
@@ -1683,7 +1718,8 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
         }
     }
 
-    ret = 0;
+    ret = s->flush_error;
+    s->flush_error = 0;
 fail:
     qemu_co_mutex_unlock(&s->lock);
     resume_l2meta(bs);
diff --git a/block/qcow2.h b/block/qcow2.h
index 1d7cdab..504f10f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -171,6 +171,8 @@ typedef struct BDRVQcowState {
     CoRwlock l2meta_flush;
     bool in_l2meta_flush;
 
+    int flush_error;
+
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
     uint32_t crypt_method_header;
     AES_KEY aes_encrypt_key;
@@ -250,6 +252,7 @@ typedef struct QCowL2Meta
      * be reentered in order to cancel the timer.
      */
     bool sleeping;
+    bool error;
 
     /** Coroutine that handles delayed COW and updates L2 entry */
     Coroutine *co;
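
For readers following the state handling, here is a minimal standalone sketch of the three rules this patch introduces: a failed L2 link latches the error and parks the request, drain counts only non-sleeping requests as busy, and the next flush reports the latched error once and clears it. This is not QEMU code. The toy_* names and the array-based list are hypothetical stand-ins for QCowL2Meta, BDRVQcowState and the cluster_allocs list, and the coroutine yield/re-enter and locking are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QCowL2Meta; only the two flags matter here. */
struct toy_l2meta {
    bool sleeping;
    bool error;
};

/* Hypothetical stand-in for BDRVQcowState. */
struct toy_state {
    struct toy_l2meta *allocs;  /* models s->cluster_allocs as an array */
    size_t n_allocs;
    int flush_error;            /* latched negative errno, 0 if none */
};

/*
 * Models the failure branch of process_l2meta(): remember the error for
 * the next flush and park the request. Callers pass a negative errno
 * value such as -EIO. The real code yields the coroutine at this point.
 */
static void toy_park_on_error(struct toy_state *s, struct toy_l2meta *m,
                              int ret)
{
    assert(ret < 0);
    s->flush_error = ret;
    m->sleeping = true;
    m->error = true;
}

/*
 * Models the return value of the patched qcow2_drain(): a request that is
 * still sleeping after the wake-up pass must be parked in the error state,
 * so only non-sleeping requests count as busy.
 */
static bool toy_drain_busy(const struct toy_state *s)
{
    for (size_t i = 0; i < s->n_allocs; i++) {
        if (!s->allocs[i].sleeping) {
            return true;
        }
    }
    return false;
}

/*
 * Models the tail of qcow2_co_flush_to_os(): report the latched error
 * exactly once, then clear it.
 */
static int toy_flush(struct toy_state *s)
{
    int ret = s->flush_error;
    s->flush_error = 0;
    return ret;
}
```

In this model a single in-flight request makes toy_drain_busy() return true; after toy_park_on_error() it returns false, which corresponds to bdrv_drain() being able to stop instead of looping forever. The first toy_flush() afterwards returns the latched error and a second one returns 0.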