Patchwork [v2,0.15.0] qcow2: Fix L1 table size after bdrv_snapshot_goto

Submitter Kevin Wolf
Date Aug. 4, 2011, 4:24 p.m.
Message ID <1312475099-3323-1-git-send-email-kwolf@redhat.com>
Permalink /patch/108537/
State New

Comments

Kevin Wolf - Aug. 4, 2011, 4:24 p.m.
When loading an internal snapshot whose L1 table is smaller than the current L1
table, the size of the current L1 would be shrunk to the snapshot's L1 size in
memory, but not on disk. This led to incorrect refcount updates and eventually
to image corruption.

Instead of writing the new L1 size to disk, this simply retains the bigger L1
size that is currently in use and makes sure that the unused part is zeroed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---

And the moment you send it out, you notice that it's wrong... *sigh*

v2:
- Check for s->l1_size > sn->l1_size in order to avoid disasters...

Philipp, I think this should fix your corruption. Please give it a try.

Anthony, this must go into 0.15. Given the short time until -rc2, do you prefer
to pick it up directly or should I send a pull request tomorrow? The patch
looks obvious, is tested with the given testcase and survives a basic
qemu-iotests run (though qemu-iotests doesn't exercise snapshots a lot)

Stefan, please review :-)

 block/qcow2-snapshot.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)
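
A minimal standalone sketch of the zeroing idea described in the commit
message (illustrative only, not the actual QEMU code; the helper name is made
up, and the byte count is written out with the sizeof(uint64_t) factor that
comes up later in the thread):

    #include <stdint.h>
    #include <string.h>

    /*
     * When switching to a snapshot whose L1 table has fewer entries than the
     * currently active one, keep the larger L1 size and zero the tail entries
     * that the snapshot does not cover, instead of shrinking l1_size.
     */
    static void clear_unused_l1_tail(uint64_t *l1_table,
                                     unsigned cur_l1_size, /* current entries */
                                     unsigned sn_l1_size)  /* snapshot entries */
    {
        if (cur_l1_size > sn_l1_size) {
            /* memset takes a byte count, so scale the entry count */
            memset(l1_table + sn_l1_size, 0,
                   (cur_l1_size - sn_l1_size) * sizeof(uint64_t));
        }
    }
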
Philipp Hahn - Aug. 4, 2011, 5:19 p.m.
Hello,

On Thursday 04 August 2011 18:24:59 Kevin Wolf wrote:
> When loading an internal snapshot whose L1 table is smaller than the
> current L1 table, the size of the current L1 would be shrunk to the
> snapshot's L1 size in memory, but not on disk. This led to incorrect
> refcount updates and eventually to image corruption.
>
> Instead of writing the new L1 size to disk, this simply retains the bigger
> L1 size that is currently in use and makes sure that the unused part is
> zeroed.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Philipp Hahn <hahn@univention.de>

> Philipp, I think this should fix your corruption. Please give it a try.
Yes, the patch looks conceptually right and fixes the observed problem.

> Anthony, this must go into 0.15.
That bug is also found in 0.12.4 and 0.14.1, so if there ever should be an 
update to those branches, that fix should be applied there as well.

Thanks for your fast support.

Sincerely
Philipp Hahn
Frediano Ziglio - Aug. 5, 2011, 6:35 a.m.
2011/8/4 Kevin Wolf <kwolf@redhat.com>:
> When loading an internal snapshot whose L1 table is smaller than the current L1
> table, the size of the current L1 would be shrunk to the snapshot's L1 size in
> memory, but not on disk. This led to incorrect refcount updates and eventually
> to image corruption.
>
> Instead of writing the new L1 size to disk, this simply retains the bigger L1
> size that is currently in use and makes sure that the unused part is zeroed.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>
> And the moment you send it out, you notice that it's wrong... *sigh*
>
> v2:
> - Check for s->l1_size > sn->l1_size in order to avoid disasters...
>
> Philipp, I think this should fix your corruption. Please give it a try.
>
> Anthony, this must go into 0.15. Given the short time until -rc2, do you prefer
> to pick it up directly or should I send a pull request tomorrow? The patch
> looks obvious, is tested with the given testcase and survives a basic
> qemu-iotests run (though qemu-iotests doesn't exercise snapshots a lot)
>
> Stefan, please review :-)
>
>  block/qcow2-snapshot.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
> index 74823a5..6972e66 100644
> --- a/block/qcow2-snapshot.c
> +++ b/block/qcow2-snapshot.c
> @@ -330,8 +330,11 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>     if (qcow2_grow_l1_table(bs, sn->l1_size, true) < 0)
>         goto fail;
>
> -    s->l1_size = sn->l1_size;
> +    if (s->l1_size > sn->l1_size) {
> +        memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size);
> +    }
>     l1_size2 = s->l1_size * sizeof(uint64_t);
> +
>     /* copy the snapshot l1 table to the current l1 table */
>     if (bdrv_pread(bs->file, sn->l1_table_offset,
>                    s->l1_table, l1_size2) != l1_size2)
> --
> 1.7.6
>

This patch looked odd at first sight. First, qcow2_grow_l1_table is
called to shrink L1, so perhaps it should be called qcow2_resize_l1_table.
It might also be better to clear the entries in qcow2_grow_l1_table
instead of in qcow2_snapshot_goto, to avoid the same problem in other
callers of qcow2_grow_l1_table. The other oddity (still to be
understood) is: why does some code use l1_table above l1_size?

Frediano
Kevin Wolf - Aug. 5, 2011, 7:51 a.m.
Am 05.08.2011 08:35, schrieb Frediano Ziglio:
> 2011/8/4 Kevin Wolf <kwolf@redhat.com>:
>> When loading an internal snapshot whose L1 table is smaller than the current L1
>> table, the size of the current L1 would be shrunk to the snapshot's L1 size in
>> memory, but not on disk. This led to incorrect refcount updates and eventually
>> to image corruption.
>>
>> Instead of writing the new L1 size to disk, this simply retains the bigger L1
>> size that is currently in use and makes sure that the unused part is zeroed.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>
>> And the moment you send it out, you notice that it's wrong... *sigh*
>>
>> v2:
>> - Check for s->l1_size > sn->l1_size in order to avoid disasters...
>>
>> Philipp, I think this should fix your corruption. Please give it a try.
>>
>> Anthony, this must go into 0.15. Given the short time until -rc2, do you prefer
>> to pick it up directly or should I send a pull request tomorrow? The patch
>> looks obvious, is tested with the given testcase and survives a basic
>> qemu-iotests run (though qemu-iotests doesn't exercise snapshots a lot)
>>
>> Stefan, please review :-)
>>
>>  block/qcow2-snapshot.c |    5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
>> index 74823a5..6972e66 100644
>> --- a/block/qcow2-snapshot.c
>> +++ b/block/qcow2-snapshot.c
>> @@ -330,8 +330,11 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>>     if (qcow2_grow_l1_table(bs, sn->l1_size, true) < 0)
>>         goto fail;
>>
>> -    s->l1_size = sn->l1_size;
>> +    if (s->l1_size > sn->l1_size) {
>> +        memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size);
>> +    }
>>     l1_size2 = s->l1_size * sizeof(uint64_t);
>> +
>>     /* copy the snapshot l1 table to the current l1 table */
>>     if (bdrv_pread(bs->file, sn->l1_table_offset,
>>                    s->l1_table, l1_size2) != l1_size2)
>> --
>> 1.7.6
>>
> 
> This patch looked odd at first sight. First a qcow2_grow_l1_table is
> called to shrink L1 so perhaps should be qcow2_resize_l1_table.

No, it doesn't shrink the table:

    if (min_size <= s->l1_size)
        return 0;

> Perhaps also it would be better to clean entries in
> qcow2_grow_l1_table instead of  qcow2_snapshot_goto to avoid same
> problem in different calls to qcow2_grow_l1_table. The other oddity
> (still to understand) is: why does some code use l1_table above
> l1_size ??

Which code do you mean specifically?

Kevin
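
A simplified sketch of the grow-only behaviour referred to above (illustrative
only; the real qcow2_grow_l1_table also writes the new table to disk, updates
refcounts and rewrites the image header):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    static int grow_l1_table_sketch(uint64_t **l1_table, unsigned *l1_size,
                                    unsigned min_size)
    {
        /* Never shrinks: a smaller request is simply a no-op. */
        if (min_size <= *l1_size) {
            return 0;
        }

        uint64_t *new_table = calloc(min_size, sizeof(uint64_t));
        if (new_table == NULL) {
            return -1;
        }
        /* Copy the existing entries; the newly added tail stays zeroed. */
        memcpy(new_table, *l1_table, *l1_size * sizeof(uint64_t));
        free(*l1_table);
        *l1_table = new_table;
        *l1_size = min_size;
        return 0;
    }
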
Frediano Ziglio - Aug. 5, 2011, 8:28 a.m.
2011/8/5 Kevin Wolf <kwolf@redhat.com>:
> Am 05.08.2011 08:35, schrieb Frediano Ziglio:
>> 2011/8/4 Kevin Wolf <kwolf@redhat.com>:
>>> When loading an internal snapshot whose L1 table is smaller than the current L1
>>> table, the size of the current L1 would be shrunk to the snapshot's L1 size in
>>> memory, but not on disk. This led to incorrect refcount updates and eventually
>>> to image corruption.
>>>
>>> Instead of writing the new L1 size to disk, this simply retains the bigger L1
>>> size that is currently in use and makes sure that the unused part is zeroed.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>
>>> And the moment you send it out, you notice that it's wrong... *sigh*
>>>
>>> v2:
>>> - Check for s->l1_size > sn->l1_size in order to avoid disasters...
>>>
>>> Philipp, I think this should fix your corruption. Please give it a try.
>>>
>>> Anthony, this must go into 0.15. Given the short time until -rc2, do you prefer
>>> to pick it up directly or should I send a pull request tomorrow? The patch
>>> looks obvious, is tested with the given testcase and survives a basic
>>> qemu-iotests run (though qemu-iotests doesn't exercise snapshots a lot)
>>>
>>> Stefan, please review :-)
>>>
>>>  block/qcow2-snapshot.c |    5 ++++-
>>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
>>> index 74823a5..6972e66 100644
>>> --- a/block/qcow2-snapshot.c
>>> +++ b/block/qcow2-snapshot.c
>>> @@ -330,8 +330,11 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>>>     if (qcow2_grow_l1_table(bs, sn->l1_size, true) < 0)
>>>         goto fail;
>>>
>>> -    s->l1_size = sn->l1_size;
>>> +    if (s->l1_size > sn->l1_size) {
>>> +        memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size);
>>> +    }
>>>     l1_size2 = s->l1_size * sizeof(uint64_t);
>>> +
>>>     /* copy the snapshot l1 table to the current l1 table */
>>>     if (bdrv_pread(bs->file, sn->l1_table_offset,
>>>                    s->l1_table, l1_size2) != l1_size2)
>>> --
>>> 1.7.6
>>>
>>
>> This patch looked odd at first sight. First a qcow2_grow_l1_table is
>> called to shrink L1 so perhaps should be qcow2_resize_l1_table.
>
> No, it doesn't shrink the table:
>
>    if (min_size <= s->l1_size)
>        return 0;
>

Yes, but perhaps returning success without clipping anything is not
quite correct; perhaps qcow2_snapshot_goto should not call
qcow2_grow_l1_table with a smaller value in the first place.

>> Perhaps also it would be better to clean entries in
>> qcow2_grow_l1_table instead of  qcow2_snapshot_goto to avoid same
>> problem in different calls to qcow2_grow_l1_table. The other oddity
>> (still to understand) is: why does some code use l1_table above
>> l1_size ??
>
> Which code do you mean specifically?
>
> Kevin
>

I think this is the issue

# 1024 -> 128 entries per L2 table -> 128k covered per L2 table
# 16M / 128k per L1 entry -> 128 L1 entries
qemu-img create -f qcow2 /tmp/sn.qcow2 16m -o cluster_size=1024
qemu-img snapshot -c foo /tmp/sn.qcow2
# extend L1
qemu-io -c 'write -b 0 4M' /tmp/sn.qcow2
# shrink
qemu-img snapshot -a foo /tmp/sn.qcow2
qemu-img check /tmp/sn.qcow2

Well... I was trying to get a leak and got a core instead. I removed
your patch and got leaks.

Also, shouldn't it be

    memset(s->l1_table + sn->l1_size, 0,
           (s->l1_size - sn->l1_size) * sizeof(uint64_t));

instead of

    memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size);

Frediano
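
A small self-contained check of that point (illustrative only): s->l1_table is
a uint64_t *, so the pointer arithmetic in the first argument is already in
entries, but memset's length argument is a byte count.

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        uint64_t l1[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        unsigned sn_l1_size = 4, cur_l1_size = 8;

        /* Without the sizeof factor only 4 bytes are cleared... */
        memset(l1 + sn_l1_size, 0, cur_l1_size - sn_l1_size);
        assert(l1[5] != 0);                  /* ...so entry 5 is untouched */

        /* Scaled by sizeof(uint64_t), all four tail entries are cleared. */
        memset(l1 + sn_l1_size, 0,
               (cur_l1_size - sn_l1_size) * sizeof(uint64_t));
        assert(l1[5] == 0 && l1[7] == 0);
        return 0;
    }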

Patch

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 74823a5..6972e66 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -330,8 +330,11 @@  int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
     if (qcow2_grow_l1_table(bs, sn->l1_size, true) < 0)
         goto fail;
 
-    s->l1_size = sn->l1_size;
+    if (s->l1_size > sn->l1_size) {
+        memset(s->l1_table + sn->l1_size, 0, s->l1_size - sn->l1_size);
+    }
     l1_size2 = s->l1_size * sizeof(uint64_t);
+
     /* copy the snapshot l1 table to the current l1 table */
     if (bdrv_pread(bs->file, sn->l1_table_offset,
                    s->l1_table, l1_size2) != l1_size2)