
qcow2: Rewrite qcow2_alloc_bytes()

Message ID 1423079623-4253-1-git-send-email-mreitz@redhat.com
State New

Commit Message

Max Reitz Feb. 4, 2015, 7:53 p.m. UTC
qcow2_alloc_bytes() is a function with insufficient error handling and
an unnecessary goto. This patch rewrites it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 77 +++++++++++++++++++++++++++++---------------------
 1 file changed, 45 insertions(+), 32 deletions(-)

Comments

Eric Blake Feb. 4, 2015, 9:59 p.m. UTC | #1
On 02/04/2015 12:53 PM, Max Reitz wrote:

My review jumps around a bit; try to follow the numbers to learn my
stream of consciousness on reviewing it.

tl;dr:
It looks like this is functionally correct (you won't break behavior),
but that you have a pessimization bug (see [7]), so you ought to spin v2.

> qcow2_alloc_bytes() is a function with insufficient error handling and
> an unnecessary goto. This patch rewrites it.

Not clear what the added error handling is from just the commit message. [1]

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 77 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 45 insertions(+), 32 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 9afdb40..cafca0a 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -759,46 +759,51 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
>  int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    int64_t offset, cluster_offset;
> -    int free_in_cluster;
> +    int64_t offset, new_cluster = 0, cluster_end;
> +    size_t free_in_cluster;
>  
>      BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
>      assert(size > 0 && size <= s->cluster_size);

[0] Not listed here, but I finally figured it out and it made the rest
of my review easier: on entry, s->free_byte_offset is either 0 (there is
no existing compressed cluster with an available tail, does that mean
s->free_cluster_index is also 0? [10]) or an actual byte offset of a
tail (not cluster-aligned, s->free_cluster_index is non-zero).

> -    if (s->free_byte_offset == 0) {
> -        offset = qcow2_alloc_clusters(bs, s->cluster_size);

The old code had one cluster alloc here; [2]

> -        if (offset < 0) {
> -            return offset;
> +
> +    if (s->free_byte_offset) {
> +        int refcount = qcow2_get_refcount(bs,
> +            s->free_byte_offset >> s->cluster_bits);
> +        if (refcount < 0) {
> +            return refcount;
> +        }
> +
> +        if (refcount == 0xffff) {
> +            s->free_byte_offset = 0;

[1] This is some of the added error checking (the old code silently
failed if it overflowed a refcount - perhaps possible with 2M clusters
and LOTS of compressed data fitting into a single cluster).

>          }
> -        s->free_byte_offset = offset;
>      }
> - redo:
> +
>      free_in_cluster = s->cluster_size -
>          offset_into_cluster(s, s->free_byte_offset);

Based on [0], free_in_cluster is either s->cluster_size
(s->free_byte_offset was 0 and we have no tail) or smaller than
s->cluster_size (s->free_byte_offset was non-zero).

> -    if (size <= free_in_cluster) {
> -        /* enough space in current cluster */
> -        offset = s->free_byte_offset;

The old code determined a successful return value here [9]

> -        s->free_byte_offset += size;
> -        free_in_cluster -= size;
> -        if (free_in_cluster == 0)
> -            s->free_byte_offset = 0;
> -        if (offset_into_cluster(s, offset) != 0)
> -            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
> -                                          QCOW2_DISCARD_NEVER);

[1] another instance of missing error checking - the old code fails to
notice a botched refcount update, while the new code correctly undoes
things if you encounter failure [8]

> -    } else {
> -        offset = qcow2_alloc_clusters(bs, s->cluster_size);

[2] and a second here. If my logic is right, you could never trigger
both in a single run (the first is triggered only if s->free_byte_offset
was 0 [meaning we have no existing sector to add more compressed data
to], the function requires that size <= s->cluster_size, so
free_in_cluster is s->cluster_size if the first alloc triggered, and we
skip the second.  Oh, that ignores the 'redo:' label, so I have to
re-evaluate if we can ever allocate a cluster, then goto redo, then
allocate again; but looking at [3], the only time we jump back to redo
is right after setting 'offset' to the start of a new cluster, so again
only one allocation).  So, I'd expect that your rewrite cleans things up
to have just a single cluster allocation. [4]

> -        if (offset < 0) {
> -            return offset;
> +
> +    if (!s->free_byte_offset || free_in_cluster < size) {
> +        new_cluster = qcow2_alloc_clusters(bs, s->cluster_size);

[4] and here's the one allocation.  A freshly-allocated cluster starts
life with a ref-count of one; it is only if we reuse the tail of an
existing cluster that we need to bump the refcount higher, so there is
more logic needed later on deciding refcount changes [5].  If we made
this allocation, then s->free_byte_offset is either 0 or the start of a
tail that is too small but that we might be able to use if the
allocation is contiguous [6].

> +        if (new_cluster < 0) {
> +            return new_cluster;
>          }
> -        cluster_offset = start_of_cluster(s, s->free_byte_offset);
> -        if ((cluster_offset + s->cluster_size) == offset) {
> -            /* we are lucky: contiguous data */
> -            offset = s->free_byte_offset;

[9] or here

> -            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
> -                                          QCOW2_DISCARD_NEVER);
> -            s->free_byte_offset += size;
> -        } else {
> -            s->free_byte_offset = offset;
> -            goto redo;

[3] this is the backwards jump that makes the old code HARD to
understand.  Thanks for killing it!

> +
> +        cluster_end = start_of_cluster(s, s->free_byte_offset) +
> +            s->cluster_size;

[6] cluster_end is now either s->cluster_size (s->free_byte_offset was
0) or the first cluster-aligned address after s->free_byte_offset (will
tell us if new_cluster is contiguous). If I'm not mistaken, you could
also write this as:

cluster_end = ROUND_UP(s->free_byte_offset, s->cluster_size);

which would give the same value for a non-zero s->free_byte_offset, but
the value 0 instead of s->cluster_size if s->free_byte_offset is 0.
I'll analyze at point [7] what semantic difference that would make.

> +
> +        if (!s->free_cluster_index || cluster_end != new_cluster) {
> +            s->free_byte_offset = new_cluster;
> +        }

[7] This was the only mention of s->free_cluster_index in the patch.  I
had to hunt for its use in the code base; there are only two places in
the code base that set it to non-zero: alloc_clusters_noref() and
update_refcount().  But alloc_clusters_noref() is called by
qcow2_alloc_clusters(), which we just called.  Therefore, the left half
of the || is always true, and you always end up changing
s->free_byte_offset (that is, when you allocate a new cluster, you never
reuse the tail of an existing cluster, even if the two were contiguous).
[11]

At one point in my review, I wondered if point [0] should assert that
!s->free_byte_offset should imply !s->free_cluster_index (more on that
at [10], but I convinced myself that would be wrong).

Let's ignore the left half of the ||, and focus on the right half.  If
s->free_byte_offset was non-zero, you now know whether the new cluster
is contiguous (use the existing tail, and the overflow into the new
cluster is safe) or not (use only the new cluster); either way,
s->free_byte_offset is now the point where the compressed data will be
written.  If s->free_byte_offset was 0, you want to unconditionally
change it to the start of the just-allocated cluster.

Note that if you used my suggestion above at making cluster_end == 0
rather than s->cluster_size for the !s->free_byte_offset case, then you
are guaranteed that cluster_end != new_cluster is a sufficient test for
when to correctly rewrite s->free_byte_offset (since new_cluster will be
non-zero, because qcow2_alloc_clusters() will never return the header
cluster); whereas with your code, there is a very minute chance that
qcow2_alloc_clusters() could be equal to s->cluster_size (that is, the
very first cluster after the header, perhaps possible if earlier actions
on the file moved the L1 and refcount table to later in the file and
freed up cluster index 1).  Using just the right half of the ||, if that
rare chance under your code happened, then we would NOT rewrite
s->free_byte_offset, and actually have a bug of returning 0 to the caller.

> +    }
> +
> +    if (offset_into_cluster(s, s->free_byte_offset)) {

I'd feel a bit better if you added
assert(s->free_byte_offset);
just before this if (which happens to be the case with your
pessimization on the left half of || [7], and would also be the case
with my proposed simplification of [6]).

> +        int ret = qcow2_update_cluster_refcount(bs,
> +                s->free_byte_offset >> s->cluster_bits, 1,
> +                QCOW2_DISCARD_NEVER);

[5] good - you are only incrementing the refcount of an existing cluster
if you use the tail.

> +        if (ret < 0) {
> +            if (new_cluster > 0) {
> +                qcow2_free_clusters(bs, new_cluster, s->cluster_size,
> +                                    QCOW2_DISCARD_OTHER);
> +            }

[8] nice that you avoid leaking the new allocation on a failed refcount
update.

> +            return ret;
>          }
>      }
>  
> @@ -807,6 +812,14 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
>       * be flushed before the caller's L2 table updates.
>       */
>      qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
> +
> +    offset = s->free_byte_offset;

[9] while you don't compute it until the very end.

> +
> +    s->free_byte_offset += size;
> +    if (!offset_into_cluster(s, s->free_byte_offset)) {
> +        s->free_byte_offset = 0;

[10] Hmm. This condition means that the compressed data fits EXACTLY
into the tail of the existing active compressed cluster, so the next
byte allocation will need to start a new cluster.  However, you are NOT
setting s->free_cluster_index back to 0, so we are back to my analysis
of the test at point [7] being suspect.  As I argued above, it is
possible that new_cluster could be == s->cluster_size, and so you needed
something other than just 'cluster_end != new_cluster' for deciding
whether to rewrite s->free_byte_offset when there was no existing tail
to even consider.  So my naive thought was that you could set
s->free_cluster_index to 0 at this point, and then assert at point [0]
that s->free_byte_offset == 0 implies !s->free_cluster_index.  But
thinking about it more, that pessimizes more than it helps
(s->free_cluster_index is useful to keep non-zero as much as possible),
and I think my alternative of being careful to avoid cluster_end ==
s->cluster_size is more elegant.

> +    }
> +
>      return offset;
>  }
>  
> 

[11] In summary, everything you have looks like it works, except for the
fact that your rewrite has a pessimization that EVERY compressed write
that spills a cluster tail will start at the beginning of a new cluster,
rather than taking advantage of contiguous cluster allocations to reuse
the tail.  Can we design a test that would exercise this, and make sure
we don't pessimize in any future changes?  Off the top of my head, it
might be possible to design the test to have minimum-size 512-byte
clusters, then do a series of 3 compressed writes of 512 bytes of data
that occupies 300 bytes when compressed.  With your pessimization, you
are guaranteed that this would consume three clusters (each with
refcount 1, and with a tail of 212 bytes each, regardless of whether
they are contiguous); with the original code or with my proposed fix, if
we can guarantee contiguous allocation, it would only consume two
clusters (both with refcount 2 because they contain 1.5 compressed data
clusters each: the first cluster holding 300 bytes of data write 1 and
212 bytes of data write 2, and the second cluster holding 88 bytes of
data write 2, 300 bytes of data write 3, and 124 bytes of tail).
Eric Blake Feb. 4, 2015, 10:07 p.m. UTC | #2
On 02/04/2015 02:59 PM, Eric Blake wrote:
>> qcow2_alloc_bytes() is a function with insufficient error handling and
>> an unnecessary goto. This patch rewrites it.

>> - redo:
>> +
>>      free_in_cluster = s->cluster_size -
>>          offset_into_cluster(s, s->free_byte_offset);
> 
> Based on [0], free_in_cluster is either s->cluster_size
> (s->free_byte_offset was 0 and we have no tail) or smaller than
> s->cluster_size (s->free_byte_offset was non-zero).

Maybe even worth an assert?

assert(free_in_cluster < s->cluster_size + !s->free_byte_offset);
Max Reitz Feb. 5, 2015, 1:25 p.m. UTC | #3
On 2015-02-04 at 16:59, Eric Blake wrote:
> On 02/04/2015 12:53 PM, Max Reitz wrote:
>
> My review jumps around a bit; try to follow the numbers to learn my
> stream of consciousness on reviewing it.
>
> tl;dr:
> It looks like this is functionally correct (you won't break behavior),
> but that you have a pessimization bug (see [7]), so you ought to spin v2.
>
>> qcow2_alloc_bytes() is a function with insufficient error handling and
>> an unnecessary goto. This patch rewrites it.
> Not clear what the added error handling is from just the commit message. [1]

Well, there are two cases, and I didn't really bother spelling them out
in the commit message, since this patch completely rewrites the function
anyway.

[snip]

>> +
>> +        cluster_end = start_of_cluster(s, s->free_byte_offset) +
>> +            s->cluster_size;
> [6] cluster_end is now either s->cluster_size (s->free_byte_offset was
> 0) or the first cluster-aligned address after s->free_byte_offset (will
> tell us if new_cluster is contiguous). If I'm not mistaken, you could
> also write this as:
>
> cluster_end = ROUND_UP(s->free_byte_offset, s->cluster_size);
>
> which would give the same value for a non-zero s->free_byte_offset, but
> the value 0 instead of s->cluster_size if s->free_byte_offset is 0.
> I'll analyze at point [7] what semantic difference that would make.

Indeed. I was wondering about s->free_byte_offset maybe being aligned to 
a cluster boundary and thus breaking this, but that can never happen (if 
it is aligned, it will be set to 0 at the end of this function). If it 
did happen, the free_in_cluster calculation would break, too.

I'll add an assert(!s->free_byte_offset || offset_into_cluster(s, 
s->free_byte_offset)); at the start of this function (which might have 
avoided "[0] ... I finally figured it out").

>> +
>> +        if (!s->free_cluster_index || cluster_end != new_cluster) {
>> +            s->free_byte_offset = new_cluster;
>> +        }
> [7] This was the only mention of s->free_cluster_index in the patch.

Oops, that's because I meant to use s->free_byte_offset.

> I
> had to hunt for its use in the code base;

Sorry. :-/

> there are only two places in
> the code base that set it to non-zero: alloc_clusters_noref() and
> update_refcount().  But alloc_clusters_noref() is called by
> qcow2_alloc_clusters(), which we just called.  Therefore, the left half
> of the || is always true, and you always end up changing
> s->free_byte_offset (that is, when you allocate a new cluster, you never
> reuse the tail of an existing cluster, even if the two were contiguous).
> [11]
>
> At one point in my review, I wondered if point [0] should assert that
> !s->free_byte_offset should imply !s->free_cluster_index (more on that
> at [10], but I convinced myself that would be wrong).
>
> Let's ignore the left half of the ||, and focus on the right half.  If
> s->free_byte_offset was non-zero, you now know whether the new cluster
> is contiguous (use the existing tail, and the overflow into the new
> cluster is safe) or not (use only the new cluster); either way,
> s->free_byte_offset is now the point where the compressed data will be
> written.  If s->free_byte_offset was 0, you want to unconditionally
> change it to the start of the just-allocated cluster.
>
> Note that if you used my suggestion above at making cluster_end == 0
> rather than s->cluster_size for the !s->free_byte_offset case, then you
> are guaranteed that cluster_end != new_cluster is a sufficient test for
> when to correctly rewrite s->free_byte_offset (since new_cluster will be
> non-zero, because qcow2_alloc_clusters() will never return the header
> cluster); whereas with your code, there is a very minute chance that
> the return of qcow2_alloc_clusters() could equal s->cluster_size (that is, the
> very first cluster after the header, perhaps possible if earlier actions
> on the file moved the L1 and refcount table to later in the file and
> freed up cluster index 1).  Using just the right half of the ||, if that
> rare chance under your code happened, then we would NOT rewrite
> s->free_byte_offset, and actually have a bug of returning 0 to the caller.

That silly small mistake made your whole review so much more 
difficult... I'm sorry, again.

>> +    }
>> +
>> +    if (offset_into_cluster(s, s->free_byte_offset)) {
> I'd feel a bit better if you added
> assert(s->free_byte_offset);
> just before this if (which happens to be the case with your
> pessimization on the left half of || [7], and would also be the case
> with my proposed simplification of [6]).

Yes, why not.

Thanks for your review!

Max

Patch

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9afdb40..cafca0a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -759,46 +759,51 @@  int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t offset, cluster_offset;
-    int free_in_cluster;
+    int64_t offset, new_cluster = 0, cluster_end;
+    size_t free_in_cluster;
 
     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
     assert(size > 0 && size <= s->cluster_size);
-    if (s->free_byte_offset == 0) {
-        offset = qcow2_alloc_clusters(bs, s->cluster_size);
-        if (offset < 0) {
-            return offset;
+
+    if (s->free_byte_offset) {
+        int refcount = qcow2_get_refcount(bs,
+            s->free_byte_offset >> s->cluster_bits);
+        if (refcount < 0) {
+            return refcount;
+        }
+
+        if (refcount == 0xffff) {
+            s->free_byte_offset = 0;
         }
-        s->free_byte_offset = offset;
     }
- redo:
+
     free_in_cluster = s->cluster_size -
         offset_into_cluster(s, s->free_byte_offset);
-    if (size <= free_in_cluster) {
-        /* enough space in current cluster */
-        offset = s->free_byte_offset;
-        s->free_byte_offset += size;
-        free_in_cluster -= size;
-        if (free_in_cluster == 0)
-            s->free_byte_offset = 0;
-        if (offset_into_cluster(s, offset) != 0)
-            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
-                                          QCOW2_DISCARD_NEVER);
-    } else {
-        offset = qcow2_alloc_clusters(bs, s->cluster_size);
-        if (offset < 0) {
-            return offset;
+
+    if (!s->free_byte_offset || free_in_cluster < size) {
+        new_cluster = qcow2_alloc_clusters(bs, s->cluster_size);
+        if (new_cluster < 0) {
+            return new_cluster;
         }
-        cluster_offset = start_of_cluster(s, s->free_byte_offset);
-        if ((cluster_offset + s->cluster_size) == offset) {
-            /* we are lucky: contiguous data */
-            offset = s->free_byte_offset;
-            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
-                                          QCOW2_DISCARD_NEVER);
-            s->free_byte_offset += size;
-        } else {
-            s->free_byte_offset = offset;
-            goto redo;
+
+        cluster_end = start_of_cluster(s, s->free_byte_offset) +
+            s->cluster_size;
+
+        if (!s->free_cluster_index || cluster_end != new_cluster) {
+            s->free_byte_offset = new_cluster;
+        }
+    }
+
+    if (offset_into_cluster(s, s->free_byte_offset)) {
+        int ret = qcow2_update_cluster_refcount(bs,
+                s->free_byte_offset >> s->cluster_bits, 1,
+                QCOW2_DISCARD_NEVER);
+        if (ret < 0) {
+            if (new_cluster > 0) {
+                qcow2_free_clusters(bs, new_cluster, s->cluster_size,
+                                    QCOW2_DISCARD_OTHER);
+            }
+            return ret;
         }
     }
 
@@ -807,6 +812,14 @@  int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
      * be flushed before the caller's L2 table updates.
      */
     qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache);
+
+    offset = s->free_byte_offset;
+
+    s->free_byte_offset += size;
+    if (!offset_into_cluster(s, s->free_byte_offset)) {
+        s->free_byte_offset = 0;
+    }
+
     return offset;
 }