Patchwork [RFC,v2,19/23] qcow2: Add error handling to the l2meta coroutine

login
register
mail settings
Submitter Kevin Wolf
Date Feb. 13, 2013, 1:22 p.m.
Message ID <1360761733-25347-20-git-send-email-kwolf@redhat.com>
Download mbox | patch
Permalink /patch/220143/
State New
Headers show

Comments

Kevin Wolf - Feb. 13, 2013, 1:22 p.m.
Not exactly bisectable, but one large patch isn't much better either :-(

m->error is used to allow bdrv_drain() to stop with l2meta in error
state rather than go into an endless loop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2.c |   44 ++++++++++++++++++++++++++++++++++++++++----
 block/qcow2.h |    3 +++
 2 files changed, 43 insertions(+), 4 deletions(-)
Stefan Hajnoczi - Feb. 18, 2013, 3:42 p.m.
On Wed, Feb 13, 2013 at 02:22:09PM +0100, Kevin Wolf wrote:
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 57552aa..2819336 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -774,11 +774,33 @@ static void coroutine_fn process_l2meta(void *opaque)
>          m->sleeping = false;
>      }
>  
> +again:
>      qemu_co_mutex_lock(&s->lock);
>  
>      ret = qcow2_alloc_cluster_link_l2(bs, m);
>      if (ret < 0) {
> -        /* FIXME */
> +        /*
> +         * This is a nasty situation: We have already completed the allocation
> +         * write request and returned success, so just failing it isn't
> +         * possible. We need to make sure to return an error during the next
> +         * flush.
> +         *
> +         * However, we still can't drop the l2meta because we want I/O errors
> +         * to be recoverable e.g. after the block device has been grown or the
> +         * network connection restored. Sleep until the next flush comes and
> +         * then retry.
> +         */

A failed flush is live migrated by hw/virtio-blk.c but what happens when
we fail during drain?
Kevin Wolf - Feb. 21, 2013, 9:35 a.m.
On Mon, Feb 18, 2013 at 04:42:55PM +0100, Stefan Hajnoczi wrote:
> On Wed, Feb 13, 2013 at 02:22:09PM +0100, Kevin Wolf wrote:
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 57552aa..2819336 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -774,11 +774,33 @@ static void coroutine_fn process_l2meta(void *opaque)
> >          m->sleeping = false;
> >      }
> >  
> > +again:
> >      qemu_co_mutex_lock(&s->lock);
> >  
> >      ret = qcow2_alloc_cluster_link_l2(bs, m);
> >      if (ret < 0) {
> > -        /* FIXME */
> > +        /*
> > +         * This is a nasty situation: We have already completed the allocation
> > +         * write request and returned success, so just failing it isn't
> > +         * possible. We need to make sure to return an error during the next
> > +         * flush.
> > +         *
> > +         * However, we still can't drop the l2meta because we want I/O errors
> > +         * to be recoverable e.g. after the block device has been grown or the
> > +         * network connection restored. Sleep until the next flush comes and
> > +         * then retry.
> > +         */
> 
> A failed flush is live migrated by hw/virtio-blk.c but what happens when
> we fail during drain?

That's a very good questions. Looks like things become rather hairy...
This would be a case where we really need a VMState for block drivers
(which is in fact how the whole rerror/werror handling would have been
implemented best).

Juan, any chance to introduce such a thing without breaking everything?
Is there something like optional top-level sections?

Kevin
Kevin Wolf - Feb. 21, 2013, 10:17 a.m.
On Thu, Feb 21, 2013 at 10:35:42AM +0100, Kevin Wolf wrote:
> On Mon, Feb 18, 2013 at 04:42:55PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Feb 13, 2013 at 02:22:09PM +0100, Kevin Wolf wrote:
> > > diff --git a/block/qcow2.c b/block/qcow2.c
> > > index 57552aa..2819336 100644
> > > --- a/block/qcow2.c
> > > +++ b/block/qcow2.c
> > > @@ -774,11 +774,33 @@ static void coroutine_fn process_l2meta(void *opaque)
> > >          m->sleeping = false;
> > >      }
> > >  
> > > +again:
> > >      qemu_co_mutex_lock(&s->lock);
> > >  
> > >      ret = qcow2_alloc_cluster_link_l2(bs, m);
> > >      if (ret < 0) {
> > > -        /* FIXME */
> > > +        /*
> > > +         * This is a nasty situation: We have already completed the allocation
> > > +         * write request and returned success, so just failing it isn't
> > > +         * possible. We need to make sure to return an error during the next
> > > +         * flush.
> > > +         *
> > > +         * However, we still can't drop the l2meta because we want I/O errors
> > > +         * to be recoverable e.g. after the block device has been grown or the
> > > +         * network connection restored. Sleep until the next flush comes and
> > > +         * then retry.
> > > +         */
> > 
> > A failed flush is live migrated by hw/virtio-blk.c but what happens when
> > we fail during drain?
> 
> That's a very good questions. Looks like things become rather hairy...
> This would be a case where we really need a VMState for block drivers
> (which is in fact how the whole rerror/werror handling would have been
> implemented best).
> 
> Juan, any chance to introduce such a thing without breaking everything?
> Is there something like optional top-level sections?

In fact, don't we have the same problem today, when flushing the image
on the source fails during completion of the migration? With this series
it just becomes much more likely to happen in practice.

Kevin

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index 57552aa..2819336 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -774,11 +774,33 @@  static void coroutine_fn process_l2meta(void *opaque)
         m->sleeping = false;
     }
 
+again:
     qemu_co_mutex_lock(&s->lock);
 
     ret = qcow2_alloc_cluster_link_l2(bs, m);
     if (ret < 0) {
-        /* FIXME */
+        /*
+         * This is a nasty situation: We have already completed the allocation
+         * write request and returned success, so just failing it isn't
+         * possible. We need to make sure to return an error during the next
+         * flush.
+         *
+         * However, we still can't drop the l2meta because we want I/O errors
+         * to be recoverable e.g. after the block device has been grown or the
+         * network connection restored. Sleep until the next flush comes and
+         * then retry.
+         */
+        s->flush_error = ret;
+
+        qemu_co_mutex_unlock(&s->lock);
+        qemu_co_rwlock_unlock(&s->l2meta_flush);
+        m->sleeping = true;
+        m->error = true;
+        qemu_coroutine_yield();
+        m->error = false;
+        m->sleeping = false;
+        qemu_co_rwlock_rdlock(&s->l2meta_flush);
+        goto again;
     }
 
     qemu_co_mutex_unlock(&s->lock);
@@ -801,11 +823,12 @@  static bool qcow2_drain(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
     QCowL2Meta *m;
+    bool busy = false;
 
     s->in_l2meta_flush = true;
 again:
     QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
-        if (m->sleeping) {
+        if (m->sleeping && !m->error) {
             qemu_coroutine_enter(m->co, NULL);
             /* next_in_flight link could have become invalid */
             goto again;
@@ -813,7 +836,19 @@  again:
     }
     s->in_l2meta_flush = false;
 
-    return !QLIST_EMPTY(&s->cluster_allocs);
+    /*
+     * If there's still a sleeping l2meta, then an error must have occured.
+     * Don't consider l2metas in this state as busy, they only get active on
+     * flushes.
+     */
+    QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
+        if (!m->sleeping) {
+            busy = true;
+            break;
+        }
+    }
+
+    return busy;
 }
 
 static inline coroutine_fn void stop_l2meta(BlockDriverState *bs)
@@ -1683,7 +1718,8 @@  static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
         }
     }
 
-    ret = 0;
+    ret = s->flush_error;
+    s->flush_error = 0;
 fail:
     qemu_co_mutex_unlock(&s->lock);
     resume_l2meta(bs);
diff --git a/block/qcow2.h b/block/qcow2.h
index 1d7cdab..504f10f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -171,6 +171,8 @@  typedef struct BDRVQcowState {
     CoRwlock l2meta_flush;
     bool in_l2meta_flush;
 
+    int flush_error;
+
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
     uint32_t crypt_method_header;
     AES_KEY aes_encrypt_key;
@@ -250,6 +252,7 @@  typedef struct QCowL2Meta
      * be reentered in order to cancel the timer.
      */
     bool sleeping;
+    bool error;
 
     /** Coroutine that handles delayed COW and updates L2 entry */
     Coroutine *co;