Message ID | 1375690310-9051-1-git-send-email-asias@redhat.com |
---|---|
State | New |
Headers | show |
Am 05.08.2013 um 10:11 hat Asias He geschrieben: > From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> > > While Asias is debugging an issue creating qcow2 images on top of > non-file protocols. It boils down to this example using NBD: > > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' > > Notice the open -g option to set bs->growable. This means you can > read/write beyond end of file. Reading beyond end of file is supposed > to produce zeroes. > > We rely on this behavior in qcow2_create2() during qcow2 image > creation. We create a new file and then write the qcow2 header > structure using bdrv_pwrite(). Since QCowHeader is not a multiple of > sector size, block.c first uses bdrv_read() on the empty file to fetch > the first sector (should be all zeroes). > > Here is the output from the qemu-io NBD example above: > > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' > 00000000: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > 00000010: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > 00000020: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > ... > > We are not zeroing the buffer! As a result qcow2 image creation on top > of protocols is not guaranteed to work even when file creation is > supported by the protocol. > > Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> > Signed-off-by: Asias He <asias@redhat.com> > --- > block.c | 30 +++++++++++++++++++++++++++++- > 1 file changed, 29 insertions(+), 1 deletion(-) > > diff --git a/block.c b/block.c > index 01b66d8..deaf0a0 100644 > --- a/block.c > +++ b/block.c > @@ -2544,7 +2544,35 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, > } > } > > - ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > + if (!bs->drv->protocol_name) { I think !bs->growable is the right check. Checking for the protocol name is always a hack and most times wrong. > + ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > + } else { > + /* NBD doesn't support reading beyond end of file. */ This is not only for NBD, make it a neutral comment like: /* Read zeros after EOF of growable BDSes */ > + int64_t len, total_sectors, max_nb_sectors; > + > + len = bdrv_getlength(bs); > + if (len < 0) { > + ret = len; > + goto out; > + } > + > + total_sectors = len >> BDRV_SECTOR_BITS; > + max_nb_sectors = MAX(0, total_sectors - sector_num); > + if (max_nb_sectors > 0) { > + ret = drv->bdrv_co_readv(bs, sector_num, > + MIN(nb_sectors, max_nb_sectors), qiov); > + } else { > + ret = 0; > + } > + > + /* Reading beyond end of file is supposed to produce zeroes */ > + if (ret == 0 && total_sectors < sector_num + nb_sectors) { > + size_t offset = MAX(0, total_sectors - sector_num); > + size_t bytes = (sector_num + nb_sectors - offset) * > + BDRV_SECTOR_SIZE; uint64_t for both offset and bytes, size_t can be 32 bits. > + qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes); > + } > + } > > out: > tracked_request_end(&req); Kevin
On Mon, Aug 5, 2013 at 10:11 AM, Asias He <asias@redhat.com> wrote: > From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> > > While Asias is debugging an issue creating qcow2 images on top of > non-file protocols. It boils down to this example using NBD: > > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' > > Notice the open -g option to set bs->growable. This means you can > read/write beyond end of file. Reading beyond end of file is supposed > to produce zeroes. > > We rely on this behavior in qcow2_create2() during qcow2 image > creation. We create a new file and then write the qcow2 header > structure using bdrv_pwrite(). Since QCowHeader is not a multiple of > sector size, block.c first uses bdrv_read() on the empty file to fetch > the first sector (should be all zeroes). > > Here is the output from the qemu-io NBD example above: > > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' > 00000000: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > 00000010: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > 00000020: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > ... > > We are not zeroing the buffer! As a result qcow2 image creation on top > of protocols is not guaranteed to work even when file creation is > supported by the protocol. > > Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> > Signed-off-by: Asias He <asias@redhat.com> > --- > block.c | 30 +++++++++++++++++++++++++++++- > 1 file changed, 29 insertions(+), 1 deletion(-) > > diff --git a/block.c b/block.c > index 01b66d8..deaf0a0 100644 > --- a/block.c > +++ b/block.c > @@ -2544,7 +2544,35 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, > } > } > > - ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > + if (!bs->drv->protocol_name) { > + ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > + } else { > + /* NBD doesn't support reading beyond end of file. */ > + int64_t len, total_sectors, max_nb_sectors; > + > + len = bdrv_getlength(bs); > + if (len < 0) { > + ret = len; > + goto out; > + } > + > + total_sectors = len >> BDRV_SECTOR_BITS; > + max_nb_sectors = MAX(0, total_sectors - sector_num); > + if (max_nb_sectors > 0) { > + ret = drv->bdrv_co_readv(bs, sector_num, > + MIN(nb_sectors, max_nb_sectors), qiov); > + } else { > + ret = 0; > + } > + > + /* Reading beyond end of file is supposed to produce zeroes */ > + if (ret == 0 && total_sectors < sector_num + nb_sectors) { > + size_t offset = MAX(0, total_sectors - sector_num); > + size_t bytes = (sector_num + nb_sectors - offset) * > + BDRV_SECTOR_SIZE; > + qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes); > + } > + } This patch breaks qemu-iotests ./check -qcow2 022. This happens because qcow2 temporarily sets ->growable = 1 for vmstate accesses (which are stored beyond the end of regular image data). static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf, int64_t pos, int size) { BDRVQcowState *s = bs->opaque; int growable = bs->growable; int ret; BLKDBG_EVENT(bs->file, BLKDBG_VMSTATE_LOAD); bs->growable = 1; ret = bdrv_pread(bs, qcow2_vm_state_offset(s) + pos, buf, size); bs->growable = growable; return ret; } Please *always* run qemu-iotests before submitting block patches: http://qemu-project.org/Documentation/QemuIoTests A simple but ugly way to fix this is for block.c to also have a ->zero_beyond_eof flag which enables the behavior you are adding. qcow2_load_vmstate() would disable ->zero_beyond_eof temporarily in addition to enabling ->growable. It's not easy to call the internal qcow2_co_readv() from qcow2_load_vmstate() because the vmstate functions are byte granularity (like bdrv_pread()) while .bdrv_co_readv() is sector-granularity. Stefan
On Fri, Aug 16, 2013 at 10:41:36AM +0200, Stefan Hajnoczi wrote: > On Mon, Aug 5, 2013 at 10:11 AM, Asias He <asias@redhat.com> wrote: > > From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> > > > > While Asias is debugging an issue creating qcow2 images on top of > > non-file protocols. It boils down to this example using NBD: > > > > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' > > > > Notice the open -g option to set bs->growable. This means you can > > read/write beyond end of file. Reading beyond end of file is supposed > > to produce zeroes. > > > > We rely on this behavior in qcow2_create2() during qcow2 image > > creation. We create a new file and then write the qcow2 header > > structure using bdrv_pwrite(). Since QCowHeader is not a multiple of > > sector size, block.c first uses bdrv_read() on the empty file to fetch > > the first sector (should be all zeroes). > > > > Here is the output from the qemu-io NBD example above: > > > > $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512' > > 00000000: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > > 00000010: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > > 00000020: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ................ > > ... > > > > We are not zeroing the buffer! As a result qcow2 image creation on top > > of protocols is not guaranteed to work even when file creation is > > supported by the protocol. > > > > Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> > > Signed-off-by: Asias He <asias@redhat.com> > > --- > > block.c | 30 +++++++++++++++++++++++++++++- > > 1 file changed, 29 insertions(+), 1 deletion(-) > > > > diff --git a/block.c b/block.c > > index 01b66d8..deaf0a0 100644 > > --- a/block.c > > +++ b/block.c > > @@ -2544,7 +2544,35 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, > > } > > } > > > > - ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > > + if (!bs->drv->protocol_name) { > > + ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > > + } else { > > + /* NBD doesn't support reading beyond end of file. */ > > + int64_t len, total_sectors, max_nb_sectors; > > + > > + len = bdrv_getlength(bs); > > + if (len < 0) { > > + ret = len; > > + goto out; > > + } > > + > > + total_sectors = len >> BDRV_SECTOR_BITS; > > + max_nb_sectors = MAX(0, total_sectors - sector_num); > > + if (max_nb_sectors > 0) { > > + ret = drv->bdrv_co_readv(bs, sector_num, > > + MIN(nb_sectors, max_nb_sectors), qiov); > > + } else { > > + ret = 0; > > + } > > + > > + /* Reading beyond end of file is supposed to produce zeroes */ > > + if (ret == 0 && total_sectors < sector_num + nb_sectors) { > > + size_t offset = MAX(0, total_sectors - sector_num); > > + size_t bytes = (sector_num + nb_sectors - offset) * > > + BDRV_SECTOR_SIZE; > > + qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes); > > + } > > + } > > This patch breaks qemu-iotests ./check -qcow2 022. This happens > because qcow2 temporarily sets ->growable = 1 for vmstate accesses > (which are stored beyond the end of regular image data). I am a bit confused. This is from the other mail: """ > > I think it would break qcow2_load_vmstate(), which is basically a > > bdrv_pread() after the end of the image. > > I see, then only protocols have to zeroing the buffer? In case of > protocols, I think bdrv_getlength() returns the underlying file > length, so qcow2_load_vmstate() would be a bdrv_pread() within the > result of bdrv_getlength(). Limiting it to protocols solves the problem, I think. """ And in v1 of this patch, Kevin wanted bs->growable check instad of the protocol_name one. """ > - ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); > + if (!bs->drv->protocol_name) { I think !bs->growable is the right check. Checking for the protocol name is always a hack and most times wrong. """ Switching back to the protocol_name check, ./check -qcow2 022 test passes. > static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf, > int64_t pos, int size) > { > BDRVQcowState *s = bs->opaque; > int growable = bs->growable; > int ret; > > BLKDBG_EVENT(bs->file, BLKDBG_VMSTATE_LOAD); > bs->growable = 1; > ret = bdrv_pread(bs, qcow2_vm_state_offset(s) + pos, buf, size); > bs->growable = growable; > > return ret; > } > > Please *always* run qemu-iotests before submitting block patches: > http://qemu-project.org/Documentation/QemuIoTests OK. > A simple but ugly way to fix this is for block.c to also have a > ->zero_beyond_eof flag which enables the behavior you are adding. > qcow2_load_vmstate() would disable ->zero_beyond_eof temporarily in > addition to enabling ->growable. I am wondering why the ->growable logic is introduced in the first place. Adding yet another this kind of flag looks realy ugly ;( > It's not easy to call the internal qcow2_co_readv() from > qcow2_load_vmstate() because the vmstate functions are byte > granularity (like bdrv_pread()) while .bdrv_co_readv() is > sector-granularity. > > Stefan
On Mon, Aug 19, 2013 at 02:36:14PM +0800, Asias He wrote: > On Fri, Aug 16, 2013 at 10:41:36AM +0200, Stefan Hajnoczi wrote: > > On Mon, Aug 5, 2013 at 10:11 AM, Asias He <asias@redhat.com> wrote: > > A simple but ugly way to fix this is for block.c to also have a > > ->zero_beyond_eof flag which enables the behavior you are adding. > > qcow2_load_vmstate() would disable ->zero_beyond_eof temporarily in > > addition to enabling ->growable. > > I am wondering why the ->growable logic is introduced in the first > place. Adding yet another this kind of flag looks realy ugly ;( bs->growable serves two functions: 1. It means you can read/write beyond the end of the file, for example, when creating a new image file. Normally BlockDriverState rejects requests beyond the EOF. 2. qcow2 uses it as a hack to implement the vmstate area after the end of regular guest data. This is the ugly part but it always worked up until now. #1 is a legitimate use case. You don't always know how big the file will end up so it's much more convenient to grow on demand, instead of having to bdrv_truncate() all the time. #2 is hacky but hard to solve without duplicating the bounce buffer and coroutine wrapping logic in bdrv_pread() (Kevin has suggested calling bdrv_co_readv() internally instead of bdrv_pread()). Stefan
diff --git a/block.c b/block.c index 01b66d8..deaf0a0 100644 --- a/block.c +++ b/block.c @@ -2544,7 +2544,35 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, } } - ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); + if (!bs->drv->protocol_name) { + ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov); + } else { + /* NBD doesn't support reading beyond end of file. */ + int64_t len, total_sectors, max_nb_sectors; + + len = bdrv_getlength(bs); + if (len < 0) { + ret = len; + goto out; + } + + total_sectors = len >> BDRV_SECTOR_BITS; + max_nb_sectors = MAX(0, total_sectors - sector_num); + if (max_nb_sectors > 0) { + ret = drv->bdrv_co_readv(bs, sector_num, + MIN(nb_sectors, max_nb_sectors), qiov); + } else { + ret = 0; + } + + /* Reading beyond end of file is supposed to produce zeroes */ + if (ret == 0 && total_sectors < sector_num + nb_sectors) { + size_t offset = MAX(0, total_sectors - sector_num); + size_t bytes = (sector_num + nb_sectors - offset) * + BDRV_SECTOR_SIZE; + qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes); + } + } out: tracked_request_end(&req);