diff mbox

block: Flush image after open

Message ID 1299687353-20424-1-git-send-email-kwolf@redhat.com
State New
Headers show

Commit Message

Kevin Wolf March 9, 2011, 4:15 p.m. UTC
Quoting the bug report:

    qemu ensures that guest writes and qemu metadata writes hit the disk
    when necessary to prevent data corruption. However, if an image was
    in host pagecache prior to starting qemu, for example after running
    qemu-img convert, then nothing prevents writes from reaching the
    disk out of order, potentially causing corruption.

I'm not entirely sure if there is a realistic case where we would get
corruption, but it's probably a case of better safe than sorry.

Reported-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

Comments

Christoph Hellwig March 9, 2011, 5:27 p.m. UTC | #1
On Wed, Mar 09, 2011 at 05:15:53PM +0100, Kevin Wolf wrote:
> Quoting the bug report:
> 
>     qemu ensures that guest writes and qemu metadata writes hit the disk
>     when necessary to prevent data corruption. However, if an image was
>     in host pagecache prior to starting qemu, for example after running
>     qemu-img convert, then nothing prevents writes from reaching the
>     disk out of order, potentially causing corruption.
> 
> I'm not entirely sure if there is a realistic case where we would get
> corruption, but it's probably a case of better safe than sorry.

Except for SCSI with ordered tags (which we don't support) there are not
ordering guarantees in the storage protocols, and as such the above explanation
doesn't make any sense at all.
Anthony Liguori March 9, 2011, 5:38 p.m. UTC | #2
On 03/09/2011 11:27 AM, Christoph Hellwig wrote:
> On Wed, Mar 09, 2011 at 05:15:53PM +0100, Kevin Wolf wrote:
>> Quoting the bug report:
>>
>>      qemu ensures that guest writes and qemu metadata writes hit the disk
>>      when necessary to prevent data corruption. However, if an image was
>>      in host pagecache prior to starting qemu, for example after running
>>      qemu-img convert, then nothing prevents writes from reaching the
>>      disk out of order, potentially causing corruption.
>>
>> I'm not entirely sure if there is a realistic case where we would get
>> corruption, but it's probably a case of better safe than sorry.
> Except for SCSI with ordered tags (which we don't support) there are not
> ordering guarantees in the storage protocols, and as such the above explanation
> doesn't make any sense at all.

Even if there was, a guest shouldn't be relying on the ordering of a 
write that comes from a non-guest.

I don't understand the failure scenario here.

Regards,

Anthony Liguori

>
Avi Kivity March 21, 2011, 12:23 p.m. UTC | #3
On 03/09/2011 07:38 PM, Anthony Liguori wrote:
> On 03/09/2011 11:27 AM, Christoph Hellwig wrote:
>> On Wed, Mar 09, 2011 at 05:15:53PM +0100, Kevin Wolf wrote:
>>> Quoting the bug report:
>>>
>>>      qemu ensures that guest writes and qemu metadata writes hit the 
>>> disk
>>>      when necessary to prevent data corruption. However, if an image 
>>> was
>>>      in host pagecache prior to starting qemu, for example after 
>>> running
>>>      qemu-img convert, then nothing prevents writes from reaching the
>>>      disk out of order, potentially causing corruption.
>>>
>>> I'm not entirely sure if there is a realistic case where we would get
>>> corruption, but it's probably a case of better safe than sorry.
>> Except for SCSI with ordered tags (which we don't support) there are not
>> ordering guarantees in the storage protocols, and as such the above 
>> explanation
>> doesn't make any sense at all.
>
> Even if there was, a guest shouldn't be relying on the ordering of a 
> write that comes from a non-guest.
>
> I don't understand the failure scenario here.

   $ cp x.img y.img
   $ qemu -drive file=y.img,cache=writeback
<read something from disk, send it over the network>
<no guest flushes>
<host crash>

The guest may expect that any or none of its writes hit the disk, but 
that anything that it read from the disk, stays there.
Kevin Wolf March 21, 2011, 1:02 p.m. UTC | #4
Am 21.03.2011 13:23, schrieb Avi Kivity:
> On 03/09/2011 07:38 PM, Anthony Liguori wrote:
>> On 03/09/2011 11:27 AM, Christoph Hellwig wrote:
>>> On Wed, Mar 09, 2011 at 05:15:53PM +0100, Kevin Wolf wrote:
>>>> Quoting the bug report:
>>>>
>>>>      qemu ensures that guest writes and qemu metadata writes hit the 
>>>> disk
>>>>      when necessary to prevent data corruption. However, if an image 
>>>> was
>>>>      in host pagecache prior to starting qemu, for example after 
>>>> running
>>>>      qemu-img convert, then nothing prevents writes from reaching the
>>>>      disk out of order, potentially causing corruption.
>>>>
>>>> I'm not entirely sure if there is a realistic case where we would get
>>>> corruption, but it's probably a case of better safe than sorry.
>>> Except for SCSI with ordered tags (which we don't support) there are not
>>> ordering guarantees in the storage protocols, and as such the above 
>>> explanation
>>> doesn't make any sense at all.
>>
>> Even if there was, a guest shouldn't be relying on the ordering of a 
>> write that comes from a non-guest.
>>
>> I don't understand the failure scenario here.
> 
>    $ cp x.img y.img
>    $ qemu -drive file=y.img,cache=writeback
> <read something from disk, send it over the network>
> <no guest flushes>
> <host crash>
> 
> The guest may expect that any or none of its writes hit the disk, but 
> that anything that it read from the disk, stays there.

Is it true for real hardware? Consider a reboot, you could still have
some data in a volatile disk write cache if the OS that ran before the
reboot hasn't flushed it.

Kevin
Avi Kivity March 21, 2011, 1:21 p.m. UTC | #5
On 03/21/2011 03:02 PM, Kevin Wolf wrote:
> Am 21.03.2011 13:23, schrieb Avi Kivity:
> >  On 03/09/2011 07:38 PM, Anthony Liguori wrote:
> >>  On 03/09/2011 11:27 AM, Christoph Hellwig wrote:
> >>>  On Wed, Mar 09, 2011 at 05:15:53PM +0100, Kevin Wolf wrote:
> >>>>  Quoting the bug report:
> >>>>
> >>>>       qemu ensures that guest writes and qemu metadata writes hit the
> >>>>  disk
> >>>>       when necessary to prevent data corruption. However, if an image
> >>>>  was
> >>>>       in host pagecache prior to starting qemu, for example after
> >>>>  running
> >>>>       qemu-img convert, then nothing prevents writes from reaching the
> >>>>       disk out of order, potentially causing corruption.
> >>>>
> >>>>  I'm not entirely sure if there is a realistic case where we would get
> >>>>  corruption, but it's probably a case of better safe than sorry.
> >>>  Except for SCSI with ordered tags (which we don't support) there are not
> >>>  ordering guarantees in the storage protocols, and as such the above
> >>>  explanation
> >>>  doesn't make any sense at all.
> >>
> >>  Even if there was, a guest shouldn't be relying on the ordering of a
> >>  write that comes from a non-guest.
> >>
> >>  I don't understand the failure scenario here.
> >
> >     $ cp x.img y.img
> >     $ qemu -drive file=y.img,cache=writeback
> >  <read something from disk, send it over the network>
> >  <no guest flushes>
> >  <host crash>
> >
> >  The guest may expect that any or none of its writes hit the disk, but
> >  that anything that it read from the disk, stays there.
>
> Is it true for real hardware? Consider a reboot, you could still have
> some data in a volatile disk write cache if the OS that ran before the
> reboot hasn't flushed it.

That's if RESET doesn't flush the cache.  It's probably false for fc or 
iscsi, but possibly true for IDE.

But it can't happen for a single-boot host, or a dual boot host with no 
shared partitions.
diff mbox

Patch

diff --git a/block.c b/block.c
index 8dea0b5..bf6892f 100644
--- a/block.c
+++ b/block.c
@@ -648,6 +648,14 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
             bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
     }
 
+    /* Make sure that the image is consistent on disk */
+    if (!bdrv_is_read_only(bs)) {
+        ret = bdrv_flush(bs);
+        if (ret < 0) {
+            goto unlink_and_fail;
+        }
+    }
+
     return 0;
 
 unlink_and_fail: