Patchwork [3/4] Block: readonly changes

Submitter: Naphtali Sprei
Date: Feb. 3, 2010, 6:32 p.m.
Message ID: <1265221948-12613-4-git-send-email-nsprei@redhat.com>
Permalink: /patch/44403/
State: New

Comments

Naphtali Sprei - Feb. 3, 2010, 6:32 p.m.
Open backing file for read-only
During commit upgrade to read-write and back at end to read-only

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
---
 block.c     |   64 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 block_int.h |    1 +
 2 files changed, 57 insertions(+), 8 deletions(-)
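
The decision that the bdrv_commit() hunk of this patch makes for the backing file can be summarized in a small sketch. This is an illustration only, not code from the patch: the names commit_plan, plan_commit, upgrade_flags, downgrade_flags and EX_O_RDWR are hypothetical, and the value given to EX_O_RDWR is an arbitrary stand-in for BDRV_O_RDWR.

```c
/* Illustrative stand-in for BDRV_O_RDWR; the value is an assumption. */
#define EX_O_RDWR 0x0002

/* Hypothetical summary of the three outcomes in the patched bdrv_commit(). */
enum commit_plan {
    COMMIT_REFUSE,                 /* the patch returns -EACCES */
    COMMIT_DIRECT,                 /* backing file is already writable */
    COMMIT_UPGRADE_THEN_DOWNGRADE  /* re-open RW, commit, re-open RO */
};

/* Both arguments describe the backing file, as in the patch. */
enum commit_plan plan_commit(int keep_read_only, int read_only)
{
    if (keep_read_only) {
        return COMMIT_REFUSE;
    }
    if (read_only) {
        return COMMIT_UPGRADE_THEN_DOWNGRADE;
    }
    return COMMIT_DIRECT;
}

/* Flags for the temporary read-write re-open. */
int upgrade_flags(int open_flags)
{
    return open_flags | EX_O_RDWR;
}

/* Flags for the final read-only re-open. */
int downgrade_flags(int open_flags)
{
    return open_flags & ~EX_O_RDWR;
}
```

In this reading, a backing file that was explicitly requested read-only is never upgraded, while one that is merely opened read-only by default is upgraded only for the duration of the commit.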
Jamie Lokier - Feb. 3, 2010, 9:06 p.m.
Naphtali Sprei wrote:
> Open backing file for read-only
> During commit upgrade to read-write and back at end to read-only

> +    if (ro) { /* re-open as RO */
> +        bs_ro = bdrv_new("");
> +        ret = bdrv_open2(bs_ro, bs->backing_hd->filename,  bs->backing_hd->open_flags & ~BDRV_O_RDWR, NULL);
> +        if (ret < 0) {
> +            bdrv_delete(bs_ro);
> +            return -EACCES;
> +        }
> +        bdrv_close(bs->backing_hd);
> +        qemu_free(bs->backing_hd);
> +        bs->backing_hd = bs_ro;
> +        bs->backing_hd->keep_read_only = 0;
> +    }

I think the general idea is perfect.

A couple of concerns come to mind.

1. When changing read-write to read-only, if the backing file is a complex
   format like qcow2 (or any others), is it possible for this bdrv_open2()
   to read metadata such as format indexes, and even data, _before_
   all changes maintained by bs->backing_hd have been written to the file?

   (If the complex formats were like real filesystems and had a "mounted"
   flag, as real filesystems tend to, then it would be an issue, but I'm
   not aware of any of them doing that.)

   Are there any such issues when switching from read-only to read-write
   earlier?  (It seems unlikely).

2. Secondly, what if the re-open gets a different file (testable with
   fstat()).  I know, you get what you deserve if you rename files, but
   still, do any of the formats which use backing files have a UUID check
   or something to confirm they are using the right backing file, which
   might be subverted by this?

3. What about the bdrv file/device-locking which was actively looking
   like it might get in a couple of months ago.  Did it get in, and if
   so does it conflict with this upgrade pattern?
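
Concern 2 notes that a changed file is testable with fstat(). A minimal sketch of such a check at the plain file-descriptor level follows, assuming a POSIX system. The helpers same_file() and reopen_checked() are hypothetical and are not part of the patch or of QEMU's block layer. Two descriptors are treated as the same file when their st_dev and st_ino match.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if both descriptors refer to the same
 * file (same device and inode), 0 if they differ, -1 if fstat() fails.
 */
int same_file(int fd_a, int fd_b)
{
    struct stat sa, sb;

    if (fstat(fd_a, &sa) != 0 || fstat(fd_b, &sb) != 0) {
        return -1;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/*
 * Hypothetical helper: open path with the given flags and accept the new
 * descriptor only if it refers to the same file as old_fd.
 * Returns the new descriptor, or -1 on any failure or mismatch.
 */
int reopen_checked(const char *path, int old_fd, int flags)
{
    int new_fd;

    assert(path != NULL);

    new_fd = open(path, flags);
    if (new_fd < 0) {
        return -1;
    }
    if (same_file(old_fd, new_fd) != 1) {
        close(new_fd); /* the path now names a different file */
        return -1;
    }
    return new_fd;
}
```

This sketch keeps the old descriptor open while it compares. A variant that must close the old file first could instead save st_dev and st_ino before closing and compare them after the re-open.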

Thanks!
-- Jamie
Naphtali Sprei - Feb. 4, 2010, 9:36 a.m.
Jamie Lokier wrote:
> Naphtali Sprei wrote:
>> Open backing file for read-only
>> During commit upgrade to read-write and back at end to read-only
> 
>> +    if (ro) { /* re-open as RO */
>> +        bs_ro = bdrv_new("");
>> +        ret = bdrv_open2(bs_ro, bs->backing_hd->filename,  bs->backing_hd->open_flags & ~BDRV_O_RDWR, NULL);
>> +        if (ret < 0) {
>> +            bdrv_delete(bs_ro);
>> +            return -EACCES;
>> +        }
>> +        bdrv_close(bs->backing_hd);
>> +        qemu_free(bs->backing_hd);
>> +        bs->backing_hd = bs_ro;
>> +        bs->backing_hd->keep_read_only = 0;
>> +    }
> 
> I think the general idea is perfect.
> 
> A couple of concerns come to mind.
> 
> 1. When changing read-write to read-only, if the backing file is a complex
>    format like qcow2 (or any others), is it possible for this bdrv_open2()
>    to read metadata such as format indexes, and even data, _before_
>    all changes maintained by bs->backing_hd have been written to the file?
> 
>    (If the complex formats were like real filesystems and had a "mounted"
>    flag, as real filesystems tend to, then it would be an issue, but I'm
>    not aware of any of them doing that.)
> 
>    Are there any such issues when switching from read-only to read-write
>    earlier?  (It seems unlikely).
> 

Good question. I looked at some of the formats (qcow, qcow2, vmdk) and didn't see
anything problematic: their close functions don't appear to change the real file;
they only drop in-memory data and free memory.
Still, an answer from an expert would help.

> 2. Secondly, what if the re-open gets a different file (testable with
>    fstat()).  I know, you get what you deserve if you rename files, but
>    still, do any of the formats which use backing files have a UUID check
>    or something to confirm they are using the right backing file, which
>    might be subverted by this?

I didn't see any such checking/validation.
It seems that handling such cases would add more complexity than it is worth.

> 
> 3. What about the bdrv file/device-locking which was actively looking
>    like it might get in a couple of months ago.  Did it get in, and if
>    so does it conflict with this upgrade pattern?

AFAIK, the locking thread died out, and I don't think anything was committed.
But there is certainly a tight relationship between read-only, locking and sharing.

> 
> Thanks!
> -- Jamie

 Thanks,

  Naphtali
Kevin Wolf - Feb. 4, 2010, 10:44 a.m.
On 04.02.2010 10:36, Naphtali Sprei wrote:
> Jamie Lokier wrote:
>> Naphtali Sprei wrote:
>>> Open backing file for read-only
>>> During commit upgrade to read-write and back at end to read-only
>>
>>> +    if (ro) { /* re-open as RO */
>>> +        bs_ro = bdrv_new("");
>>> +        ret = bdrv_open2(bs_ro, bs->backing_hd->filename,  bs->backing_hd->open_flags & ~BDRV_O_RDWR, NULL);
>>> +        if (ret < 0) {
>>> +            bdrv_delete(bs_ro);
>>> +            return -EACCES;
>>> +        }
>>> +        bdrv_close(bs->backing_hd);
>>> +        qemu_free(bs->backing_hd);
>>> +        bs->backing_hd = bs_ro;
>>> +        bs->backing_hd->keep_read_only = 0;
>>> +    }
>>
>> I think the general idea is perfect.
>>
>> A couple of concerns come to mind.
>>
>> 1. When changing read-write to read-only, if the backing file is a complex
>>    format like qcow2 (or any others), is it possible for this bdrv_open2()
>>    to read metadata such as format indexes, and even data, _before_
>>    all changes maintained by bs->backing_hd have been written to the file?
>>
>>    (If the complex formats were like real filesystems and had a "mounted"
>>    flag, as real filesystems tend to, then it would be an issue, but I'm
>>    not aware of any of them doing that.)
>>
>>    Are there any such issues when switching from read-only to read-write
>>    earlier?  (It seems unlikely).
>>
> 
> Good question. I looked at some of the formats (qcow, qcow2, vmdk) and didn't see
> anything problematic: their close functions don't appear to change the real file;
> they only drop in-memory data and free memory.
> Still, an answer from an expert would help.

It would probably work currently, but I think it's still an invalid
assumption. To make this code safe in terms of avoiding corruption, we
should first close the old bs and then open the new one.

Unfortunately, this means in turn that if re-opening fails, it's just
bad luck and the VM has lost its image. In this case I'm not sure if
there is any possible error recovery, so we might end up abort()ing.
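
The close-first ordering described above can be sketched with plain POSIX descriptors. This is an illustration under stated assumptions, not QEMU code: struct image and image_reopen() are hypothetical stand-ins for a BlockDriverState and its re-open path. Pending writes are flushed, the old descriptor is closed, and only then is the file opened again. If that open fails, the image is left closed, which is the "lost image" case mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a block device state. */
struct image {
    const char *filename;
    int fd;    /* -1 when the image is closed */
    int flags; /* flags used for the current open */
};

/*
 * Close the old descriptor first, then re-open with new_flags.
 * Returns 0 on success. If the flush fails, returns -errno and leaves
 * the image open and unchanged. If the re-open fails, returns -errno
 * and leaves img->fd == -1: there is no old handle to fall back to.
 */
int image_reopen(struct image *img, int new_flags)
{
    int fd;

    assert(img != NULL && img->filename != NULL);

    if (img->fd >= 0) {
        /* Flush only if the descriptor could have been written to. */
        if ((img->flags & O_ACCMODE) != O_RDONLY && fsync(img->fd) != 0) {
            return -errno;
        }
        close(img->fd);
        img->fd = -1;
    }

    fd = open(img->filename, new_flags);
    if (fd < 0) {
        return -errno; /* image is now closed; caller must decide what to do */
    }

    img->fd = fd;
    img->flags = new_flags;
    return 0;
}
```

A caller would use image_reopen(&img, O_RDWR) before a commit and image_reopen(&img, O_RDONLY) afterwards. What to do when the second call fails is exactly the open question raised here.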

>> 2. Secondly, what if the re-open gets a different file (testable with
>>    fstat()).  I know, you get what you deserve if you rename files, but
>>    still, do any of the formats which use backing files have a UUID check
>>    or something to confirm they are using the right backing file, which
>>    might be subverted by this?

qcow and qcow2 don't. I'm not sure about VMDK, but I don't think there
is any such check either. And even if such checks were in place we would
have the same problem as above: If we detected an error, what to do?
abort()?

Kevin

Patch

diff --git a/block.c b/block.c
index 66564de..527b146 100644
--- a/block.c
+++ b/block.c
@@ -451,7 +451,6 @@  int bdrv_open2(BlockDriverState *bs, const char *filename, int flags,
     if (flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))
         bs->enable_write_cache = 1;
 
-    bs->read_only = (flags & BDRV_O_RDWR) == 0;
     if (!(flags & BDRV_O_FILE)) {
         open_flags = (flags & (BDRV_O_RDWR | BDRV_O_CACHE_MASK|BDRV_O_NATIVE_AIO));
         if (bs->is_temporary) { /* snapshot should be writeable */
@@ -466,6 +465,7 @@  int bdrv_open2(BlockDriverState *bs, const char *filename, int flags,
         goto free_and_fail;
     }
 
+    bs->keep_read_only = bs->read_only = !(open_flags & BDRV_O_RDWR);
     if (drv->bdrv_getlength) {
         bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
     }
@@ -482,13 +482,28 @@  int bdrv_open2(BlockDriverState *bs, const char *filename, int flags,
                      filename, bs->backing_file);
         if (bs->backing_format[0] != '\0')
             back_drv = bdrv_find_format(bs->backing_format);
+
+        open_flags &= ~BDRV_O_RDWR; /* clear RW, then restore from orig */
+        if (bs->is_temporary) {
+            open_flags |= (flags & BDRV_O_RDWR);
+        }
+        
         ret = bdrv_open2(bs->backing_hd, backing_filename, open_flags,
                          back_drv);
-        bs->backing_hd->read_only =  (open_flags & BDRV_O_RDWR) == 0;
+        if (ret < 0) {
+            open_flags &= ~BDRV_O_RDWR;  /* Fall-back to read-only for the backing file */
+            ret = bdrv_open2(bs->backing_hd, backing_filename, open_flags,
+                             back_drv);
+        }
         if (ret < 0) {
             bdrv_close(bs);
             return ret;
         }
+        if (!bs->is_temporary) {
+            bs->backing_hd->keep_read_only = bs->keep_read_only; /* base image inherits from "parent" and open read-only */
+        } else {
+            bs->backing_hd->keep_read_only = !(flags & BDRV_O_RDWR);
+        }
     }
 
     if (!bdrv_key_required(bs)) {
@@ -564,19 +579,34 @@  int bdrv_commit(BlockDriverState *bs)
 {
     BlockDriver *drv = bs->drv;
     int64_t i, total_sectors;
-    int n, j;
+    int n, j, ro;
     int ret = 0;
     unsigned char sector[512];
+    BlockDriverState *bs_rw, *bs_ro;
 
     if (!drv)
         return -ENOMEDIUM;
+    
+    if (!bs->backing_hd) {
+	return -ENOTSUP;
+    }
 
-    if (bs->read_only) {
+    if (bs->backing_hd->keep_read_only) {
 	return -EACCES;
     }
+    
+    ro = bs->backing_hd->read_only;
 
-    if (!bs->backing_hd) {
-	return -ENOTSUP;
+    if (ro) { /* re-open as RW */
+        bs_rw = bdrv_new("");
+        ret = bdrv_open2(bs_rw, bs->backing_hd->filename,  bs->backing_hd->open_flags | BDRV_O_RDWR, NULL);
+        if (ret < 0) {
+            bdrv_delete(bs_rw);
+            return -EACCES;
+        }
+        bdrv_close(bs->backing_hd);
+        qemu_free(bs->backing_hd);
+        bs->backing_hd = bs_rw;
     }
 
     total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
@@ -584,11 +614,13 @@  int bdrv_commit(BlockDriverState *bs)
         if (drv->bdrv_is_allocated(bs, i, 65536, &n)) {
             for(j = 0; j < n; j++) {
                 if (bdrv_read(bs, i, sector, 1) != 0) {
-                    return -EIO;
+                    ret = -EIO;
+                    goto ro_cleanup;
                 }
 
                 if (bdrv_write(bs->backing_hd, i, sector, 1) != 0) {
-                    return -EIO;
+                    ret = -EIO;
+                    goto ro_cleanup;
                 }
                 i++;
 	    }
@@ -608,6 +640,22 @@  int bdrv_commit(BlockDriverState *bs)
      */
     if (bs->backing_hd)
         bdrv_flush(bs->backing_hd);
+
+ro_cleanup:
+
+    if (ro) { /* re-open as RO */
+        bs_ro = bdrv_new("");
+        ret = bdrv_open2(bs_ro, bs->backing_hd->filename,  bs->backing_hd->open_flags & ~BDRV_O_RDWR, NULL);
+        if (ret < 0) {
+            bdrv_delete(bs_ro);
+            return -EACCES;
+        }
+        bdrv_close(bs->backing_hd);
+        qemu_free(bs->backing_hd);
+        bs->backing_hd = bs_ro;
+        bs->backing_hd->keep_read_only = 0;
+    }
+
     return ret;
 }
 
diff --git a/block_int.h b/block_int.h
index 9144d37..02fae5b 100644
--- a/block_int.h
+++ b/block_int.h
@@ -130,6 +130,7 @@  struct BlockDriverState {
     int64_t total_sectors; /* if we are reading a disk image, give its
                               size in sectors */
     int read_only; /* if true, the media is read only */
+    int keep_read_only; /* if true, the media was requested to stay read only */
     int open_flags; /* flags used to open the file */
     int removable; /* if true, the media can be removed */
     int locked;    /* if true, the media cannot temporarily be ejected */