diff mbox

[v2] block: Produce zeros when protocols reading beyond end of file

Message ID 1375754020-20545-1-git-send-email-asias@redhat.com
State New
Headers show

Commit Message

Asias He Aug. 6, 2013, 1:53 a.m. UTC
From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>

While Asias is debugging an issue creating qcow2 images on top of
non-file protocols.  It boils down to this example using NBD:

$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'

Notice the open -g option to set bs->growable.  This means you can
read/write beyond end of file.  Reading beyond end of file is supposed
to produce zeroes.

We rely on this behavior in qcow2_create2() during qcow2 image
creation.  We create a new file and then write the qcow2 header
structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
sector size, block.c first uses bdrv_read() on the empty file to fetch
the first sector (should be all zeroes).

Here is the output from the qemu-io NBD example above:

$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
00000000:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
00000010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
00000020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
...

We are not zeroing the buffer!  As a result qcow2 image creation on top
of protocols is not guaranteed to work even when file creation is
supported by the protocol.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Asias He <asias@redhat.com>
---
 block.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

Comments

Fam Zheng Aug. 6, 2013, 2:02 a.m. UTC | #1
On Tue, 08/06 09:53, Asias He wrote:
> From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> 
> While Asias is debugging an issue creating qcow2 images on top of
> non-file protocols.  It boils down to this example using NBD:
> 
> $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> 
> Notice the open -g option to set bs->growable.  This means you can
> read/write beyond end of file.  Reading beyond end of file is supposed
> to produce zeroes.
> 
> We rely on this behavior in qcow2_create2() during qcow2 image
> creation.  We create a new file and then write the qcow2 header
> structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
> sector size, block.c first uses bdrv_read() on the empty file to fetch
> the first sector (should be all zeroes).
> 
> Here is the output from the qemu-io NBD example above:
> 
> $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> 00000000:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> 00000010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> 00000020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> ...
> 
> We are not zeroing the buffer!  As a result qcow2 image creation on top
> of protocols is not guaranteed to work even when file creation is
> supported by the protocol.

It seems to me that "read beyond EOF" is more protocol defined than to
be forced in block layer. Is this behavior common to all protocols, is
it reasonable if some protocol wants other values than zero?

Thanks.
Asias He Aug. 6, 2013, 2:38 a.m. UTC | #2
On Tue, Aug 06, 2013 at 10:02:22AM +0800, Fam Zheng wrote:
> On Tue, 08/06 09:53, Asias He wrote:
> > From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> > 
> > While Asias is debugging an issue creating qcow2 images on top of
> > non-file protocols.  It boils down to this example using NBD:
> > 
> > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> > 
> > Notice the open -g option to set bs->growable.  This means you can
> > read/write beyond end of file.  Reading beyond end of file is supposed
> > to produce zeroes.
> > 
> > We rely on this behavior in qcow2_create2() during qcow2 image
> > creation.  We create a new file and then write the qcow2 header
> > structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
> > sector size, block.c first uses bdrv_read() on the empty file to fetch
> > the first sector (should be all zeroes).
> > 
> > Here is the output from the qemu-io NBD example above:
> > 
> > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> > 00000000:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> > 00000010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> > 00000020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> > ...
> > 
> > We are not zeroing the buffer!  As a result qcow2 image creation on top
> > of protocols is not guaranteed to work even when file creation is
> > supported by the protocol.
> 
> It seems to me that "read beyond EOF" is more protocol defined than to
> be forced in block layer. Is this behavior common to all protocols, is
> it reasonable if some protocol wants other values than zero?

The reason to do this in block layer is that we do not want to duplicate
the memset in all protocols.

Do we actually have protocols that really want values other than zero?

> Thanks.
> 
> -- 
> Fam
Stefan Hajnoczi Aug. 7, 2013, 8:04 a.m. UTC | #3
On Tue, Aug 06, 2013 at 10:38:32AM +0800, Asias He wrote:
> On Tue, Aug 06, 2013 at 10:02:22AM +0800, Fam Zheng wrote:
> > On Tue, 08/06 09:53, Asias He wrote:
> > > From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> > > 
> > > While Asias is debugging an issue creating qcow2 images on top of
> > > non-file protocols.  It boils down to this example using NBD:
> > > 
> > > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> > > 
> > > Notice the open -g option to set bs->growable.  This means you can
> > > read/write beyond end of file.  Reading beyond end of file is supposed
> > > to produce zeroes.
> > > 
> > > We rely on this behavior in qcow2_create2() during qcow2 image
> > > creation.  We create a new file and then write the qcow2 header
> > > structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
> > > sector size, block.c first uses bdrv_read() on the empty file to fetch
> > > the first sector (should be all zeroes).
> > > 
> > > Here is the output from the qemu-io NBD example above:
> > > 
> > > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> > > 00000000:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> > > 00000010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> > > 00000020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> > > ...
> > > 
> > > We are not zeroing the buffer!  As a result qcow2 image creation on top
> > > of protocols is not guaranteed to work even when file creation is
> > > supported by the protocol.
> > 
> > It seems to me that "read beyond EOF" is more protocol defined than to
> > be forced in block layer. Is this behavior common to all protocols, is
> > it reasonable if some protocol wants other values than zero?
> 
> The reason to do this in block layer is that we do not want to duplicate
> the memset in all protocols.
> 
> Do we actually have protocols that really want values other than zero?

I think we rely on zeroes when bs->growable is true, because
bdrv_pwrite() handles sub-sector I/O using read-modify-write.  So it
reads beyond EOF first (expects to get zeroes), then copies the
sub-sector data from the caller, then writes it back.  If we don't zero
beyond-EOF data then we would write unexpected values to the image.

Stefan
Stefan Hajnoczi Aug. 13, 2013, 1:50 p.m. UTC | #4
On Tue, Aug 06, 2013 at 09:53:40AM +0800, Asias He wrote:
> From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> 
> While Asias is debugging an issue creating qcow2 images on top of
> non-file protocols.  It boils down to this example using NBD:
> 
> $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> 
> Notice the open -g option to set bs->growable.  This means you can
> read/write beyond end of file.  Reading beyond end of file is supposed
> to produce zeroes.
> 
> We rely on this behavior in qcow2_create2() during qcow2 image
> creation.  We create a new file and then write the qcow2 header
> structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
> sector size, block.c first uses bdrv_read() on the empty file to fetch
> the first sector (should be all zeroes).
> 
> Here is the output from the qemu-io NBD example above:
> 
> $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> 00000000:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> 00000010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> 00000020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> ...
> 
> We are not zeroing the buffer!  As a result qcow2 image creation on top
> of protocols is not guaranteed to work even when file creation is
> supported by the protocol.
> 
> Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Signed-off-by: Asias He <asias@redhat.com>
> ---
>  block.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan
Stefan Hajnoczi Aug. 22, 2013, 12:13 p.m. UTC | #5
On Tue, Aug 06, 2013 at 09:53:40AM +0800, Asias He wrote:
> From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> 
> While Asias is debugging an issue creating qcow2 images on top of
> non-file protocols.  It boils down to this example using NBD:
> 
> $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> 
> Notice the open -g option to set bs->growable.  This means you can
> read/write beyond end of file.  Reading beyond end of file is supposed
> to produce zeroes.
> 
> We rely on this behavior in qcow2_create2() during qcow2 image
> creation.  We create a new file and then write the qcow2 header
> structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
> sector size, block.c first uses bdrv_read() on the empty file to fetch
> the first sector (should be all zeroes).
> 
> Here is the output from the qemu-io NBD example above:
> 
> $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> 00000000:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> 00000010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> 00000020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
> ...
> 
> We are not zeroing the buffer!  As a result qcow2 image creation on top
> of protocols is not guaranteed to work even when file creation is
> supported by the protocol.
> 
> Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Signed-off-by: Asias He <asias@redhat.com>
> ---
>  block.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)

Applied again on top of Asias' fix so qcow2 vmstate doesn't break.

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan
diff mbox

Patch

diff --git a/block.c b/block.c
index 01b66d8..f3cd9fb 100644
--- a/block.c
+++ b/block.c
@@ -2544,7 +2544,35 @@  static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         }
     }
 
-    ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+    if (!bs->growable) {
+        ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+    } else {
+        /* Read zeros after EOF of growable BDSes */
+        int64_t len, total_sectors, max_nb_sectors;
+
+        len = bdrv_getlength(bs);
+        if (len < 0) {
+            ret = len;
+            goto out;
+        }
+
+        total_sectors = len >> BDRV_SECTOR_BITS;
+        max_nb_sectors = MAX(0, total_sectors - sector_num);
+        if (max_nb_sectors > 0) {
+            ret = drv->bdrv_co_readv(bs, sector_num,
+                                     MIN(nb_sectors, max_nb_sectors), qiov);
+        } else {
+            ret = 0;
+        }
+
+        /* Reading beyond end of file is supposed to produce zeroes */
+        if (ret == 0 && total_sectors < sector_num + nb_sectors) {
+            uint64_t offset = MAX(0, total_sectors - sector_num);
+            uint64_t bytes = (sector_num + nb_sectors - offset) *
+                              BDRV_SECTOR_SIZE;
+            qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes);
+        }
+    }
 
 out:
     tracked_request_end(&req);