diff mbox

[v5,13/15] block: add support for partial streaming

Message ID 1326460457-19446-14-git-send-email-stefanha@linux.vnet.ibm.com
State New
Headers show

Commit Message

Stefan Hajnoczi Jan. 13, 2012, 1:14 p.m. UTC
From: Marcelo Tosatti <mtosatti@redhat.com>

Add support for streaming data from an intermediate section of the
image chain (see patch and documentation for details).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block/stream.c |   91 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 block_int.h    |    3 +-
 blockdev.c     |   11 ++++--
 3 files changed, 96 insertions(+), 9 deletions(-)

Comments

Kevin Wolf Jan. 17, 2012, 1:27 p.m. UTC | #1
Am 13.01.2012 14:14, schrieb Stefan Hajnoczi:
> From: Marcelo Tosatti <mtosatti@redhat.com>
> 
> Add support for streaming data from an intermediate section of the
> image chain (see patch and documentation for details).
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

I'm afraid that in the review for the previous version I couldn't see
the wood for the trees... This does limit the COR requests issued by
image streaming, but not those issued by the guest. Am I missing
something? This is not what we want, is it?

Kevin
Marcelo Tosatti Jan. 17, 2012, 1:50 p.m. UTC | #2
On Tue, Jan 17, 2012 at 02:27:04PM +0100, Kevin Wolf wrote:
> Am 13.01.2012 14:14, schrieb Stefan Hajnoczi:
> > From: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > Add support for streaming data from an intermediate section of the
> > image chain (see patch and documentation for details).
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> 
> I'm afraid that in the review for the previous version I couldn't see
> the wood for the trees... This does limit the COR requests issued by
> image streaming, but not those issued by the guest. Am I missing
> something? This is not what we want, is it?

What you mean "limit the COR requests"? 

bdrv_co_is_allocated_base (or its new name) relies on
bbdrv_co_is_allocated being synchronous.

Sectors are only allocated in the top image, and in that case the
situation regaring synchronicity is the same as without shared base
option, that is, the serialization in bdrv_aio_read/bdrv_aio_write level
is responsible for correctness.
Kevin Wolf Jan. 17, 2012, 2:05 p.m. UTC | #3
Am 17.01.2012 14:50, schrieb Marcelo Tosatti:
> On Tue, Jan 17, 2012 at 02:27:04PM +0100, Kevin Wolf wrote:
>> Am 13.01.2012 14:14, schrieb Stefan Hajnoczi:
>>> From: Marcelo Tosatti <mtosatti@redhat.com>
>>>
>>> Add support for streaming data from an intermediate section of the
>>> image chain (see patch and documentation for details).
>>>
>>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>
>> I'm afraid that in the review for the previous version I couldn't see
>> the wood for the trees... This does limit the COR requests issued by
>> image streaming, but not those issued by the guest. Am I missing
>> something? This is not what we want, is it?
> 
> What you mean "limit the COR requests"? 

base -> sn1 -> sn2

You only want to copy the content of sn1 into sn2 and keep base. The
streaming coroutine is doing the right thing because it checks
is_allocated_base. However, if it is the guest that reads some data from
base, COR copies it into sn2 even though it's in the common base file.

Maybe streaming shouldn't enable normal COR on images, but instead of
calling bdrv_co_read it could directly call bdrv_co_copy_on_readv().

Kevin
Marcelo Tosatti Jan. 17, 2012, 3:47 p.m. UTC | #4
On Tue, Jan 17, 2012 at 03:05:29PM +0100, Kevin Wolf wrote:
> Am 17.01.2012 14:50, schrieb Marcelo Tosatti:
> > On Tue, Jan 17, 2012 at 02:27:04PM +0100, Kevin Wolf wrote:
> >> Am 13.01.2012 14:14, schrieb Stefan Hajnoczi:
> >>> From: Marcelo Tosatti <mtosatti@redhat.com>
> >>>
> >>> Add support for streaming data from an intermediate section of the
> >>> image chain (see patch and documentation for details).
> >>>
> >>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >>
> >> I'm afraid that in the review for the previous version I couldn't see
> >> the wood for the trees... This does limit the COR requests issued by
> >> image streaming, but not those issued by the guest. Am I missing
> >> something? This is not what we want, is it?
> > 
> > What you mean "limit the COR requests"? 
> 
> base -> sn1 -> sn2
> 
> You only want to copy the content of sn1 into sn2 and keep base. The
> streaming coroutine is doing the right thing because it checks
> is_allocated_base. However, if it is the guest that reads some data from
> base, COR copies it into sn2 even though it's in the common base file.

Ah, yes.

> Maybe streaming shouldn't enable normal COR on images, but instead of
> calling bdrv_co_read it could directly call bdrv_co_copy_on_readv().

That would work.
Stefan Hajnoczi Jan. 17, 2012, 3:55 p.m. UTC | #5
On Tue, Jan 17, 2012 at 3:47 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Tue, Jan 17, 2012 at 03:05:29PM +0100, Kevin Wolf wrote:
>> Am 17.01.2012 14:50, schrieb Marcelo Tosatti:
>> > On Tue, Jan 17, 2012 at 02:27:04PM +0100, Kevin Wolf wrote:
>> >> Am 13.01.2012 14:14, schrieb Stefan Hajnoczi:
>> >>> From: Marcelo Tosatti <mtosatti@redhat.com>
>> >>>
>> >>> Add support for streaming data from an intermediate section of the
>> >>> image chain (see patch and documentation for details).
>> >>>
>> >>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> >>
>> >> I'm afraid that in the review for the previous version I couldn't see
>> >> the wood for the trees... This does limit the COR requests issued by
>> >> image streaming, but not those issued by the guest. Am I missing
>> >> something? This is not what we want, is it?
>> >
>> > What you mean "limit the COR requests"?
>>
>> base -> sn1 -> sn2
>>
>> You only want to copy the content of sn1 into sn2 and keep base. The
>> streaming coroutine is doing the right thing because it checks
>> is_allocated_base. However, if it is the guest that reads some data from
>> base, COR copies it into sn2 even though it's in the common base file.
>
> Ah, yes.
>
>> Maybe streaming shouldn't enable normal COR on images, but instead of
>> calling bdrv_co_read it could directly call bdrv_co_copy_on_readv().
>
> That would work.

Sounds like a good suggestion.  It will prevent the case where a guest
is doing heavy read I/O during image streaming with a 'base' and we
bloat the destination image file.

I'll resend the series with this fix.

Stefan
diff mbox

Patch

diff --git a/block/stream.c b/block/stream.c
index 93f0305..7532f5e 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -57,6 +57,7 @@  typedef struct StreamBlockJob {
     BlockJob common;
     RateLimit limit;
     BlockDriverState *base;
+    char backing_file_id[1024];
 } StreamBlockJob;
 
 static int coroutine_fn stream_populate(BlockDriverState *bs,
@@ -75,10 +76,76 @@  static int coroutine_fn stream_populate(BlockDriverState *bs,
     return bdrv_co_readv(bs, sector_num, nb_sectors, &qiov);
 }
 
+/*
+ * Given an image chain: [BASE] -> [INTER1] -> [INTER2] -> [TOP]
+ *
+ * Return true if the given sector is allocated in top.
+ * Return false if the given sector is allocated in intermediate images.
+ * Return true otherwise.
+ *
+ * 'pnum' is set to the number of sectors (including and immediately following
+ *  the specified sector) that are known to be in the same
+ *  allocated/unallocated state.
+ *
+ */
+static int coroutine_fn is_allocated_base(BlockDriverState *top,
+                                          BlockDriverState *base,
+                                          int64_t sector_num,
+                                          int nb_sectors, int *pnum)
+{
+    BlockDriverState *intermediate;
+    int ret, n;
+
+    ret = bdrv_co_is_allocated(top, sector_num, nb_sectors, &n);
+    if (ret) {
+        *pnum = n;
+        return ret;
+    }
+
+    /*
+     * Is the unallocated chunk [sector_num, n] also
+     * unallocated between base and top?
+     */
+    intermediate = top->backing_hd;
+
+    while (intermediate) {
+        int pnum_inter;
+
+        /* reached base */
+        if (intermediate == base) {
+            *pnum = n;
+            return 1;
+        }
+        ret = bdrv_co_is_allocated(intermediate, sector_num, nb_sectors,
+                                   &pnum_inter);
+        if (ret < 0) {
+            return ret;
+        } else if (ret) {
+            *pnum = pnum_inter;
+            return 0;
+        }
+
+        /*
+         * [sector_num, nb_sectors] is unallocated on top but intermediate
+         * might have
+         *
+         * [sector_num+x, nr_sectors] allocated.
+         */
+        if (n > pnum_inter) {
+            n = pnum_inter;
+        }
+
+        intermediate = intermediate->backing_hd;
+    }
+
+    return 1;
+}
+
 static void coroutine_fn stream_run(void *opaque)
 {
     StreamBlockJob *s = opaque;
     BlockDriverState *bs = s->common.bs;
+    BlockDriverState *base = s->base;
     int64_t sector_num, end;
     int ret = 0;
     int n;
@@ -101,8 +168,15 @@  retry:
             break;
         }
 
-        ret = bdrv_co_is_allocated(bs, sector_num,
-                                   STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
+
+        if (base) {
+            ret = is_allocated_base(bs, base, sector_num,
+                                    STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
+        } else {
+            ret = bdrv_co_is_allocated(bs, sector_num,
+                                       STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+                                       &n);
+        }
         trace_stream_one_iteration(s, sector_num, n, ret);
         if (ret == 0) {
             if (s->common.speed) {
@@ -119,6 +193,7 @@  retry:
         if (ret < 0) {
             break;
         }
+        ret = 0;
 
         /* Publish progress */
         s->common.offset += n * BDRV_SECTOR_SIZE;
@@ -132,7 +207,11 @@  retry:
     bdrv_disable_copy_on_read(bs);
 
     if (sector_num == end && ret == 0) {
-        ret = bdrv_change_backing_file(bs, NULL, NULL);
+        const char *base_id = NULL;
+        if (base) {
+            base_id = s->backing_file_id;
+        }
+        ret = bdrv_change_backing_file(bs, base_id, NULL);
     }
 
     qemu_vfree(buf);
@@ -158,7 +237,8 @@  static BlockJobType stream_job_type = {
 };
 
 int stream_start(BlockDriverState *bs, BlockDriverState *base,
-                 BlockDriverCompletionFunc *cb, void *opaque)
+                 const char *base_id, BlockDriverCompletionFunc *cb,
+                 void *opaque)
 {
     StreamBlockJob *s;
     Coroutine *co;
@@ -169,6 +249,9 @@  int stream_start(BlockDriverState *bs, BlockDriverState *base,
     }
 
     s->base = base;
+    if (base_id) {
+        pstrcpy(s->backing_file_id, sizeof(s->backing_file_id), base_id);
+    }
 
     co = qemu_coroutine_create(stream_run);
     trace_stream_start(bs, base, s, co, opaque);
diff --git a/block_int.h b/block_int.h
index c7c9178..ed92884 100644
--- a/block_int.h
+++ b/block_int.h
@@ -333,6 +333,7 @@  void block_job_cancel(BlockJob *job);
 bool block_job_is_cancelled(BlockJob *job);
 
 int stream_start(BlockDriverState *bs, BlockDriverState *base,
-                 BlockDriverCompletionFunc *cb, void *opaque);
+                 const char *base_id, BlockDriverCompletionFunc *cb,
+                 void *opaque);
 
 #endif /* BLOCK_INT_H */
diff --git a/blockdev.c b/blockdev.c
index 45a6ba6..ed3002d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -956,6 +956,7 @@  void qmp_block_stream(const char *device, bool has_base,
                       const char *base, Error **errp)
 {
     BlockDriverState *bs;
+    BlockDriverState *base_bs = NULL;
     int ret;
 
     bs = bdrv_find(device);
@@ -964,13 +965,15 @@  void qmp_block_stream(const char *device, bool has_base,
         return;
     }
 
-    /* Base device not supported */
     if (base) {
-        error_set(errp, QERR_NOT_SUPPORTED);
-        return;
+        base_bs = bdrv_find_backing_image(bs, base);
+        if (base_bs == NULL) {
+            error_set(errp, QERR_BASE_NOT_FOUND, base);
+            return;
+        }
     }
 
-    ret = stream_start(bs, NULL, block_stream_cb, bs);
+    ret = stream_start(bs, base_bs, base, block_stream_cb, bs);
     if (ret < 0) {
         switch (ret) {
         case -EBUSY: