diff mbox

[v6] qcow2: Buffer L1 table in snapshot refcount update

Message ID 1415719671-16257-1-git-send-email-mreitz@redhat.com
State New
Headers show

Commit Message

Max Reitz Nov. 11, 2014, 3:27 p.m. UTC
From: Zhang Haoyu <zhanghy@sangfor.com>

Buffer the active L1 table in qcow2_update_snapshot_refcount() in order
to prevent in-place conversion of the L1 table buffer in the
BDRVQcowState to big endian and back, which would lead to data
corruption if that buffer was accessed concurrently. This should not
happen but better being safe than sorry.

Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
v6 for "snapshot: use local variable to bdrv_pwrite_sync L1 table" (I
changed the commit message wording to make it more clear what this patch
does and why we want it).

Changes in v6:
- Only copy the local buffer back into s->l1_table if we are indeed
  accessing the local L1 table
- Use qemu_vfree() instead of g_free()
---
 block/qcow2-refcount.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

Comments

Max Reitz Nov. 20, 2014, 2:32 p.m. UTC | #1
On 2014-11-11 at 16:27, Max Reitz wrote:
> From: Zhang Haoyu <zhanghy@sangfor.com>
>
> Buffer the active L1 table in qcow2_update_snapshot_refcount() in order
> to prevent in-place conversion of the L1 table buffer in the
> BDRVQcowState to big endian and back, which would lead to data
> corruption if that buffer was accessed concurrently. This should not
> happen but better being safe than sorry.
>
> Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> v6 for "snapshot: use local variable to bdrv_pwrite_sync L1 table" (I
> changed the commit message wording to make it more clear what this patch
> does and why we want it).
>
> Changes in v6:
> - Only copy the local buffer back into s->l1_table if we are indeed
>    accessing the local L1 table
> - Use qemu_vfree() instead of g_free()
> ---
>   block/qcow2-refcount.c | 30 ++++++++++++++----------------
>   1 file changed, 14 insertions(+), 16 deletions(-)

Ping
Max Reitz Nov. 27, 2014, 3:09 p.m. UTC | #2
On 2014-11-20 at 15:32, Max Reitz wrote:
> On 2014-11-11 at 16:27, Max Reitz wrote:
>> From: Zhang Haoyu <zhanghy@sangfor.com>
>>
>> Buffer the active L1 table in qcow2_update_snapshot_refcount() in order
>> to prevent in-place conversion of the L1 table buffer in the
>> BDRVQcowState to big endian and back, which would lead to data
>> corruption if that buffer was accessed concurrently. This should not
>> happen but better being safe than sorry.
>>
>> Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> v6 for "snapshot: use local variable to bdrv_pwrite_sync L1 table" (I
>> changed the commit message wording to make it more clear what this patch
>> does and why we want it).
>>
>> Changes in v6:
>> - Only copy the local buffer back into s->l1_table if we are indeed
>>    accessing the local L1 table
>> - Use qemu_vfree() instead of g_free()
>> ---
>>   block/qcow2-refcount.c | 30 ++++++++++++++----------------
>>   1 file changed, 14 insertions(+), 16 deletions(-)
>
> Ping

-(Png)²

(got it? because i² = -1, ha-ha-haa)
Stefan Hajnoczi Nov. 28, 2014, 10:29 a.m. UTC | #3
On Tue, Nov 11, 2014 at 04:27:51PM +0100, Max Reitz wrote:
> From: Zhang Haoyu <zhanghy@sangfor.com>
> 
> Buffer the active L1 table in qcow2_update_snapshot_refcount() in order
> to prevent in-place conversion of the L1 table buffer in the
> BDRVQcowState to big endian and back, which would lead to data
> corruption if that buffer was accessed concurrently. This should not
> happen but better being safe than sorry.
> 
> Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
> v6 for "snapshot: use local variable to bdrv_pwrite_sync L1 table" (I
> changed the commit message wording to make it more clear what this patch
> does and why we want it).
> 
> Changes in v6:
> - Only copy the local buffer back into s->l1_table if we are indeed
>   accessing the local L1 table
> - Use qemu_vfree() instead of g_free()
> ---
>  block/qcow2-refcount.c | 30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)

If there is a code path where the L1 table is accessed while
qcow2_update_snapshot_refcount() is blocked, this patch does not fix the
bug.

It trades an L1 table entry corruption (due to endianness mismatch on
little-endian hosts) for a race condition where a stale L1 table is
accessed or L1 changes are overwritten when
qcow2_update_snapshot_refcount() memcpys back to s->l1_table.

Please identify the root cause and fix that.

> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 9afdb40..c0c4a50 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -877,14 +877,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>  {
>      BDRVQcowState *s = bs->opaque;
>      uint64_t *l1_table, *l2_table, l2_offset, offset, l1_size2;
> -    bool l1_allocated = false;
> +    bool active_l1 = false;
>      int64_t old_offset, old_l2_offset;
>      int i, j, l1_modified = 0, nb_csectors, refcount;
>      int ret;
>  
>      l2_table = NULL;
> -    l1_table = NULL;
>      l1_size2 = l1_size * sizeof(uint64_t);
> +    l1_table = qemu_try_blockalign(bs->file, l1_size2);
> +    if (l1_table == NULL) {
> +        ret = -ENOMEM;
> +        goto fail;
> +    }
>  
>      s->cache_discards = true;
>  
> @@ -892,13 +896,6 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>       * l1_table_offset when it is the current s->l1_table_offset! Be careful
>       * when changing this! */
>      if (l1_table_offset != s->l1_table_offset) {
> -        l1_table = g_try_malloc0(align_offset(l1_size2, 512));
> -        if (l1_size2 && l1_table == NULL) {
> -            ret = -ENOMEM;
> -            goto fail;
> -        }
> -        l1_allocated = true;
> -
>          ret = bdrv_pread(bs->file, l1_table_offset, l1_table, l1_size2);
>          if (ret < 0) {
>              goto fail;
> @@ -908,8 +905,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>              be64_to_cpus(&l1_table[i]);
>      } else {
>          assert(l1_size == s->l1_size);
> -        l1_table = s->l1_table;
> -        l1_allocated = false;
> +        memcpy(l1_table, s->l1_table, l1_size2);
> +        active_l1 = true;
>      }
>  
>      for(i = 0; i < l1_size; i++) {
> @@ -1051,13 +1048,14 @@ fail:
>          }
>  
>          ret = bdrv_pwrite_sync(bs->file, l1_table_offset, l1_table, l1_size2);
> -
> -        for (i = 0; i < l1_size; i++) {
> -            be64_to_cpus(&l1_table[i]);
> +        if (active_l1 && ret == 0) {
> +            for (i = 0; i < l1_size; i++) {
> +                be64_to_cpus(&l1_table[i]);
> +            }
> +            memcpy(s->l1_table, l1_table, l1_size2);
>          }
>      }
> -    if (l1_allocated)
> -        g_free(l1_table);
> +    qemu_vfree(l1_table);
>      return ret;
>  }
>  
> -- 
> 1.9.3
> 
>
diff mbox

Patch

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9afdb40..c0c4a50 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -877,14 +877,18 @@  int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t *l1_table, *l2_table, l2_offset, offset, l1_size2;
-    bool l1_allocated = false;
+    bool active_l1 = false;
     int64_t old_offset, old_l2_offset;
     int i, j, l1_modified = 0, nb_csectors, refcount;
     int ret;
 
     l2_table = NULL;
-    l1_table = NULL;
     l1_size2 = l1_size * sizeof(uint64_t);
+    l1_table = qemu_try_blockalign(bs->file, l1_size2);
+    if (l1_table == NULL) {
+        ret = -ENOMEM;
+        goto fail;
+    }
 
     s->cache_discards = true;
 
@@ -892,13 +896,6 @@  int qcow2_update_snapshot_refcount(BlockDriverState *bs,
      * l1_table_offset when it is the current s->l1_table_offset! Be careful
      * when changing this! */
     if (l1_table_offset != s->l1_table_offset) {
-        l1_table = g_try_malloc0(align_offset(l1_size2, 512));
-        if (l1_size2 && l1_table == NULL) {
-            ret = -ENOMEM;
-            goto fail;
-        }
-        l1_allocated = true;
-
         ret = bdrv_pread(bs->file, l1_table_offset, l1_table, l1_size2);
         if (ret < 0) {
             goto fail;
@@ -908,8 +905,8 @@  int qcow2_update_snapshot_refcount(BlockDriverState *bs,
             be64_to_cpus(&l1_table[i]);
     } else {
         assert(l1_size == s->l1_size);
-        l1_table = s->l1_table;
-        l1_allocated = false;
+        memcpy(l1_table, s->l1_table, l1_size2);
+        active_l1 = true;
     }
 
     for(i = 0; i < l1_size; i++) {
@@ -1051,13 +1048,14 @@  fail:
         }
 
         ret = bdrv_pwrite_sync(bs->file, l1_table_offset, l1_table, l1_size2);
-
-        for (i = 0; i < l1_size; i++) {
-            be64_to_cpus(&l1_table[i]);
+        if (active_l1 && ret == 0) {
+            for (i = 0; i < l1_size; i++) {
+                be64_to_cpus(&l1_table[i]);
+            }
+            memcpy(s->l1_table, l1_table, l1_size2);
         }
     }
-    if (l1_allocated)
-        g_free(l1_table);
+    qemu_vfree(l1_table);
     return ret;
 }