diff mbox

[14/20] qcow2: Factor out count_cow_clusters

Message ID 4F916168.6090002@redhat.com
State New
Headers show

Commit Message

Kevin Wolf April 20, 2012, 1:15 p.m. UTC
Am 20.04.2012 15:06, schrieb Stefan Hajnoczi:
> On Fri, Apr 20, 2012 at 1:27 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 19.04.2012 23:18, schrieb Marcelo Tosatti:
>>>> On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote:
>>>>>> There is one intended change in functionality in this patch, which is
>>>>>> that it allocates new clusters even when it could satisfy the first part
>>>>>> of the request with already allocated clusters. In order to check if
>>>>>> there is a problem with this scenario, the following patch should revert
>>>>>> to the old behaviour:
>>>>>>
>>>>>> --- a/block/qcow2-cluster.c
>>>>>> +++ b/block/qcow2-cluster.c
>>>>>> @@ -847,7 +847,7 @@ again:
>>>>>>          keep_clusters = count_contiguous_clusters(nb_clusters,
>>>>>> s->cluster_size,
>>>>>>                                                    &l2_table[l2_index],
>>>>>> 0, 0);
>>>>>>          assert(keep_clusters <= nb_clusters);
>>>>>> -        nb_clusters -= keep_clusters;
>>>>>> +        nb_clusters = 0;
>>>>>>      } else {
>>>>>>          /* For the moment, overwrite compressed clusters one by one */
>>>>>>          if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
>>>>>>
>>>>>> The rest is meant to be a functionally equivalent rewrite of the old
>>>>>> code that was required in order to allow this change.
>>>>>
>>>>> Testing.
>>>>
>>>> Corruption gone with patch above.
>>>
>>> Okay, so something must be wrong only with the new code paths, which
>>> makes things a bit easier. I'll have another closer look. Stefan, can
>>> you re-review 250196f1 as well?
>>
>> I'm taking a look.
> 
> Just looking at the qemu-img check output that Marcelo posted I'm
> interpreting that as OFLAG_COPIED was set on the offset but its
> refcount was 2.
> 
> The refcount shouldn't be 2 unless internal snapshots were used.
> 
> So we probably incremented the refcount either too often or for the
> wrong block (which is more likely since the guest sees corruption).

I've looked at the same thing now and I think the same cluster was used
both for a regular data allocation and for a new refcount block. This
may happen because alloc_refcount_block() uses s->free_cluster_index
which is updated by qcow2_alloc_cluster_noref(), but not by
qcow2_alloc_cluster_at(). Just a theory so far, though, and not yet
tried out in practice. If this is it, a patch would look like this
(completely untested as well):

     /* Check how many clusters there are free */
@@ -602,11 +603,16 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs,
uint64_t offset,
     }

     /* And then allocate them */
+    old_free_cluster_index = s->free_cluster_index;
+    s->free_cluster_index = cluster_index + i;
+
     ret = update_refcount(bs, offset, i << s->cluster_bits, 1);
     if (ret < 0) {
         return ret;
     }

+    s->free_cluster_index = old_free_cluster_index;
+
     return i;
 }

Kevin
diff mbox

Patch

--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -587,6 +587,7 @@  int qcow2_alloc_clusters_at(BlockDriverState *bs,
uint64_t offset,
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t cluster_index;
+    uint64_t old_free_cluster_index;
     int i, refcount, ret;