Patchwork [RFC,v2,03/23] qcow2: Change handle_dependency to byte granularity

Submitter Kevin Wolf
Date Feb. 13, 2013, 1:21 p.m.
Message ID <1360761733-25347-4-git-send-email-kwolf@redhat.com>
Permalink /patch/220169/
State New

Comments

Kevin Wolf - Feb. 13, 2013, 1:21 p.m.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c |   40 ++++++++++++++++++++++++++++------------
 block/qcow2.h         |   11 +++++++++++
 2 files changed, 39 insertions(+), 12 deletions(-)
Stefan Hajnoczi - Feb. 14, 2013, 2:48 p.m.
On Wed, Feb 13, 2013 at 02:21:53PM +0100, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-cluster.c |   40 ++++++++++++++++++++++++++++------------
>  block/qcow2.h         |   11 +++++++++++
>  2 files changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 0e804ba..a3b2447 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -756,31 +756,41 @@ out:
>   * Check if there already is an AIO write request in flight which allocates
>   * the same cluster. In this case we need to wait until the previous
>   * request has completed and updated the L2 table accordingly.
> + *
> + * Returns:
> + *   0       if there was no dependency. *cur_bytes indicates the number of
> + *           bytes from guest_offset that can be read before the next
> + *           dependency must be processed (or the request is complete)

s/read/accessed/

IMO "read" is too specific, the doc comment doesn't need to contain
knowledge of what the caller will do at guest_offset.

> + *
> + *   -EAGAIN if we had to wait for another request, previously gathered
> + *           information on cluster allocation may be invalid now. The caller
> + *           must start over anyway, so consider *cur_bytes undefined.
>   */
>  static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
> -    unsigned int *nb_clusters)
> +    uint64_t *cur_bytes)
>  {
>      BDRVQcowState *s = bs->opaque;
>      QCowL2Meta *old_alloc;
> +    uint64_t bytes = *cur_bytes;
>  
>      QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
>  
> -        uint64_t start = guest_offset >> s->cluster_bits;
> -        uint64_t end = start + *nb_clusters;
> -        uint64_t old_start = old_alloc->offset >> s->cluster_bits;
> -        uint64_t old_end = old_start + old_alloc->nb_clusters;
> +        uint64_t start = guest_offset;

I'm not sure this is safe.  Previously the function caught write
requests which overlap at cluster granularity, now it will allow them?

Stefan
Kevin Wolf - Feb. 14, 2013, 3:45 p.m.
On Thu, Feb 14, 2013 at 03:48:51PM +0100, Stefan Hajnoczi wrote:
> On Wed, Feb 13, 2013 at 02:21:53PM +0100, Kevin Wolf wrote:
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/qcow2-cluster.c |   40 ++++++++++++++++++++++++++++------------
> >  block/qcow2.h         |   11 +++++++++++
> >  2 files changed, 39 insertions(+), 12 deletions(-)
> > 
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index 0e804ba..a3b2447 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -756,31 +756,41 @@ out:
> >   * Check if there already is an AIO write request in flight which allocates
> >   * the same cluster. In this case we need to wait until the previous
> >   * request has completed and updated the L2 table accordingly.
> > + *
> > + * Returns:
> > + *   0       if there was no dependency. *cur_bytes indicates the number of
> > + *           bytes from guest_offset that can be read before the next
> > + *           dependency must be processed (or the request is complete)
> 
> s/read/accessed/
> 
> IMO "read" is too specific, the doc comment doesn't need to contain
> knowledge of what the caller will do at guest_offset.

Right. In fact, "read" is even wrong, if anything it should be "written to".

> > + *
> > + *   -EAGAIN if we had to wait for another request, previously gathered
> > + *           information on cluster allocation may be invalid now. The caller
> > + *           must start over anyway, so consider *cur_bytes undefined.
> >   */
> >  static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
> > -    unsigned int *nb_clusters)
> > +    uint64_t *cur_bytes)
> >  {
> >      BDRVQcowState *s = bs->opaque;
> >      QCowL2Meta *old_alloc;
> > +    uint64_t bytes = *cur_bytes;
> >  
> >      QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
> >  
> > -        uint64_t start = guest_offset >> s->cluster_bits;
> > -        uint64_t end = start + *nb_clusters;
> > -        uint64_t old_start = old_alloc->offset >> s->cluster_bits;
> > -        uint64_t old_end = old_start + old_alloc->nb_clusters;
> > +        uint64_t start = guest_offset;
> 
> I'm not sure this is safe.  Previously the function caught write
> requests which overlap at cluster granularity, now it will allow them?

At this point in the series, old_start and old_end are still aligned to
cluster boundaries. So the cases in which an overlap is detected stay exactly
the same with this patch. This is just a more precise description of what we
really wanted to compare.

Kevin
Stefan Hajnoczi - Feb. 15, 2013, 9:12 a.m.
On Thu, Feb 14, 2013 at 04:45:06PM +0100, Kevin Wolf wrote:
> On Thu, Feb 14, 2013 at 03:48:51PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Feb 13, 2013 at 02:21:53PM +0100, Kevin Wolf wrote:
> > > + *
> > > + *   -EAGAIN if we had to wait for another request, previously gathered
> > > + *           information on cluster allocation may be invalid now. The caller
> > > + *           must start over anyway, so consider *cur_bytes undefined.
> > >   */
> > >  static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
> > > -    unsigned int *nb_clusters)
> > > +    uint64_t *cur_bytes)
> > >  {
> > >      BDRVQcowState *s = bs->opaque;
> > >      QCowL2Meta *old_alloc;
> > > +    uint64_t bytes = *cur_bytes;
> > >  
> > >      QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
> > >  
> > > -        uint64_t start = guest_offset >> s->cluster_bits;
> > > -        uint64_t end = start + *nb_clusters;
> > > -        uint64_t old_start = old_alloc->offset >> s->cluster_bits;
> > > -        uint64_t old_end = old_start + old_alloc->nb_clusters;
> > > +        uint64_t start = guest_offset;
> > 
> > I'm not sure this is safe.  Previously the function caught write
> > requests which overlap at cluster granularity, now it will allow them?
> 
> At this point in the series, old_start and old_end are still aligned to
> cluster boundaries. So the cases in which an overlap is detected stay exactly
> the same with this patch. This is just a more precise description of what we
> really wanted to compare.

Good point.  In an overlap comparison between A and B only one of them
needs to be cluster-aligned.  I've used this trick myself in the past
but missed it in this code review.

Stefan

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0e804ba..a3b2447 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -756,31 +756,41 @@  out:
  * Check if there already is an AIO write request in flight which allocates
  * the same cluster. In this case we need to wait until the previous
  * request has completed and updated the L2 table accordingly.
+ *
+ * Returns:
+ *   0       if there was no dependency. *cur_bytes indicates the number of
+ *           bytes from guest_offset that can be read before the next
+ *           dependency must be processed (or the request is complete)
+ *
+ *   -EAGAIN if we had to wait for another request, previously gathered
+ *           information on cluster allocation may be invalid now. The caller
+ *           must start over anyway, so consider *cur_bytes undefined.
  */
 static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
-    unsigned int *nb_clusters)
+    uint64_t *cur_bytes)
 {
     BDRVQcowState *s = bs->opaque;
     QCowL2Meta *old_alloc;
+    uint64_t bytes = *cur_bytes;
 
     QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
 
-        uint64_t start = guest_offset >> s->cluster_bits;
-        uint64_t end = start + *nb_clusters;
-        uint64_t old_start = old_alloc->offset >> s->cluster_bits;
-        uint64_t old_end = old_start + old_alloc->nb_clusters;
+        uint64_t start = guest_offset;
+        uint64_t end = start + bytes;
+        uint64_t old_start = l2meta_cow_start(old_alloc);
+        uint64_t old_end = l2meta_cow_end(old_alloc);
 
         if (end <= old_start || start >= old_end) {
             /* No intersection */
         } else {
             if (start < old_start) {
                 /* Stop at the start of a running allocation */
-                *nb_clusters = old_start - start;
+                bytes = old_start - start;
             } else {
-                *nb_clusters = 0;
+                bytes = 0;
             }
 
-            if (*nb_clusters == 0) {
+            if (bytes == 0) {
                 /* Wait for the dependency to complete. We need to recheck
                  * the free/allocated clusters when we continue. */
                 qemu_co_mutex_unlock(&s->lock);
@@ -791,9 +801,9 @@  static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
         }
     }
 
-    if (!*nb_clusters) {
-        abort();
-    }
+    /* Make sure that existing clusters and new allocations are only used up to
+     * the next dependency if we shortened the request above */
+    *cur_bytes = bytes;
 
     return 0;
 }
@@ -872,6 +882,7 @@  int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
     uint64_t *l2_table;
     unsigned int nb_clusters, keep_clusters;
     uint64_t cluster_offset;
+    uint64_t cur_bytes;
 
     trace_qcow2_alloc_clusters_offset(qemu_coroutine_self(), offset,
                                       n_start, n_end);
@@ -884,6 +895,7 @@  again:
     l2_index = offset_to_l2_index(s, offset);
     nb_clusters = MIN(size_to_clusters(s, n_end << BDRV_SECTOR_BITS),
                       s->l2_size - l2_index);
+    n_end = MIN(n_end, nb_clusters * s->cluster_sectors);
 
     /*
      * Now start gathering as many contiguous clusters as possible:
@@ -908,7 +920,8 @@  again:
      * 3. If the request still hasn't completed, allocate new clusters,
      *    considering any cluster_offset of steps 1c or 2.
      */
-    ret = handle_dependencies(bs, offset, &nb_clusters);
+    cur_bytes = (n_end - n_start) * BDRV_SECTOR_SIZE;
+    ret = handle_dependencies(bs, offset, &cur_bytes);
     if (ret == -EAGAIN) {
         goto again;
     } else if (ret < 0) {
@@ -919,6 +932,9 @@  again:
          * correctly during the next loop iteration. */
     }
 
+    nb_clusters = size_to_clusters(s, offset + cur_bytes)
+                - (offset >> s->cluster_bits);
+
     /* Find L2 entry for the first involved cluster */
     ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
     if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 1322012..72e6d12 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -303,6 +303,17 @@  static inline bool qcow2_need_accurate_refcounts(BDRVQcowState *s)
     return !(s->incompatible_features & QCOW2_INCOMPAT_DIRTY);
 }
 
+static inline uint64_t l2meta_cow_start(QCowL2Meta *m)
+{
+    return m->offset + m->cow_start.offset;
+}
+
+static inline uint64_t l2meta_cow_end(QCowL2Meta *m)
+{
+    return m->offset + m->cow_end.offset
+        + (m->cow_end.nb_sectors << BDRV_SECTOR_BITS);
+}
+
 // FIXME Need qcow2_ prefix to global functions
 
 /* qcow2.c functions */