Message ID | a56a55df5012c863f79cdb45ce2ba5a886ac37f4.1362580930.git.jcody@redhat.com |
---|---|
State | New |
Headers | show |
Am 06.03.2013 15:47, schrieb Jeff Cody: > Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks > QCOW images, as well other future image formats (such as VHDX) that may call > bdrv_truncate(bs->file) from within a read/write operation. For example, QCOW > will cause an assert, due to tracked_requests not being empty (since the > read/write that called bdrv_truncate() is still in progress). > > This modifies the check to only force the bdrv_drain_all() call if the BDS to > be truncated is not growable, or if we are shrinking the BDS. Otherwise, if we > are just growing the file, allow it to happen without forcing a call to > bdrv_drain_all(). This will actually break iscsi_truncate(). Paolo requested that the iscsi_tuncate command should use sync I/O for rereading the LUN capacity. Using sync I/O is only possible if there are no async requests in-flight. Looking at the source I have not found a place where bs->growable is set to 0 for any block driver, maybe I miss something. Having bs->growable for iSCSI would also be ok. Shouldn't it be possible to call bdrv_drain_all() any time? There are other places where this is called. One I have in mind is e.g. if you cancel an ongoing block migration. Peter > > Signed-off-by: Jeff Cody <jcody@redhat.com> > --- > block.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/block.c b/block.c > index 124a9eb..dce844c 100644 > --- a/block.c > +++ b/block.c > @@ -2450,8 +2450,14 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset) > if (bdrv_in_use(bs)) > return -EBUSY; > > - /* There better not be any in-flight IOs when we truncate the device. */ > - bdrv_drain_all(); > + /* Don't force a drain if we are just growing the file - otherwise, > + * using bdrv_truncate() from within a block driver in a read/write > + * operation will cause an assert, because bdrv_drain_all() will assert if > + * a tracked_request is still in progress. */ > + if (!bs->growable || offset < bdrv_getlength(bs)) { > + /* There better not be any in-flight IOs when we truncate the device. */ > + bdrv_drain_all(); > + } > > ret = drv->bdrv_truncate(bs, offset); > if (ret == 0) { >
Il 06/03/2013 18:50, Peter Lieven ha scritto: >> > Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks >> > QCOW images, as well other future image formats (such as VHDX) that may call >> > bdrv_truncate(bs->file) from within a read/write operation. For example, QCOW >> > will cause an assert, due to tracked_requests not being empty (since the >> > read/write that called bdrv_truncate() is still in progress). I'm not sure such bdrv_truncate calls are necessary. QCOW2 doesn't have them (almost; there is one in qcow2_write_compressed, I'm not even sure that one is necessary though), and I think QCOW's breaks using it with a block device as a backing file. Paolo
On Wed, Mar 06, 2013 at 07:06:24PM +0100, Paolo Bonzini wrote: > Il 06/03/2013 18:50, Peter Lieven ha scritto: > >> > Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks > >> > QCOW images, as well other future image formats (such as VHDX) that may call > >> > bdrv_truncate(bs->file) from within a read/write operation. For example, QCOW > >> > will cause an assert, due to tracked_requests not being empty (since the > >> > read/write that called bdrv_truncate() is still in progress). > > I'm not sure such bdrv_truncate calls are necessary. QCOW2 doesn't have > them (almost; there is one in qcow2_write_compressed, I'm not even sure > that one is necessary though), and I think QCOW's breaks using it with a > block device as a backing file. > > Paolo QCOW breaks with it using a normal raw posix file as a device. As a test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that drive mounted, and try to partition and format it. QEMU now asserts. The nicety of being able to using truncate during a write call, especially for VHDX (which can have relatively large block/cluster sizes), so to grow the file sparsely in a dynamically allocated file.
Am 06.03.2013 19:06, schrieb Paolo Bonzini: > Il 06/03/2013 18:50, Peter Lieven ha scritto: >>>> Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks >>>> QCOW images, as well other future image formats (such as VHDX) that may call >>>> bdrv_truncate(bs->file) from within a read/write operation. For example, QCOW >>>> will cause an assert, due to tracked_requests not being empty (since the >>>> read/write that called bdrv_truncate() is still in progress). > > I'm not sure such bdrv_truncate calls are necessary. QCOW2 doesn't have > them (almost; there is one in qcow2_write_compressed, I'm not even sure > that one is necessary though), and I think QCOW's breaks using it with a > block device as a backing file. I think we have to check the sense of bs->growable nevertheless. It is set to 1 for all drivers which can't be right?! int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags) { BlockDriverState *bs; BlockDriver *drv; int ret; drv = bdrv_find_protocol(filename); if (!drv) { return -ENOENT; } bs = bdrv_new(""); ret = bdrv_open_common(bs, NULL, filename, flags, drv); if (ret < 0) { bdrv_delete(bs); return ret; } bs->growable = 1; *pbs = bs; return 0; } I think each driver should set it accordingly on its own. Peter > > Paolo >
Il 06/03/2013 19:14, Jeff Cody ha scritto: > QCOW breaks with it using a normal raw posix file as a device. As a > test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that > drive mounted, and try to partition and format it. QEMU now asserts. > > The nicety of being able to using truncate during a write call, > especially for VHDX (which can have relatively large block/cluster > sizes), so to grow the file sparsely in a dynamically allocated file. Perhaps we need two APIs, "truncate" and "revalidate". Truncate should be a no-op if (!bs->growable). Revalidate could be called by the block_resize monitor command with no size specified. Paolo
On Wed, Mar 06, 2013 at 06:50:56PM +0100, Peter Lieven wrote: > Am 06.03.2013 15:47, schrieb Jeff Cody: > > Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks > > QCOW images, as well other future image formats (such as VHDX) that may call > > bdrv_truncate(bs->file) from within a read/write operation. For example, QCOW > > will cause an assert, due to tracked_requests not being empty (since the > > read/write that called bdrv_truncate() is still in progress). > > > > This modifies the check to only force the bdrv_drain_all() call if the BDS to > > be truncated is not growable, or if we are shrinking the BDS. Otherwise, if we > > are just growing the file, allow it to happen without forcing a call to > > bdrv_drain_all(). > > This will actually break iscsi_truncate(). Paolo requested that the iscsi_tuncate command > should use sync I/O for rereading the LUN capacity. Using sync I/O is only possible if > there are no async requests in-flight. > > Looking at the source I have not found a place where bs->growable is set to 0 for any > block driver, maybe I miss something. Having bs->growable for iSCSI would also be ok. > > Shouldn't it be possible to call bdrv_drain_all() any time? There are other places > where this is called. One I have in mind is e.g. if you cancel an ongoing block migration. > That is a good point - what happens to QCOW now, if there is a block job in progress (e.g. block-commit, block-stream, etc...)? I would imagine -EBUSY gets thrown, since bdrv_truncate() checks bdrv_in_use(). > > > > > Signed-off-by: Jeff Cody <jcody@redhat.com> > > --- > > block.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/block.c b/block.c > > index 124a9eb..dce844c 100644 > > --- a/block.c > > +++ b/block.c > > @@ -2450,8 +2450,14 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset) > > if (bdrv_in_use(bs)) > > return -EBUSY; > > > > - /* There better not be any in-flight IOs when we truncate the device. */ > > - bdrv_drain_all(); > > + /* Don't force a drain if we are just growing the file - otherwise, > > + * using bdrv_truncate() from within a block driver in a read/write > > + * operation will cause an assert, because bdrv_drain_all() will assert if > > + * a tracked_request is still in progress. */ > > + if (!bs->growable || offset < bdrv_getlength(bs)) { > > + /* There better not be any in-flight IOs when we truncate the device. */ > > + bdrv_drain_all(); > > + } > > > > ret = drv->bdrv_truncate(bs, offset); > > if (ret == 0) { > > >
Am 06.03.2013 19:42, schrieb Jeff Cody: > On Wed, Mar 06, 2013 at 07:25:39PM +0100, Peter Lieven wrote: >> Am 06.03.2013 19:14, schrieb Jeff Cody: >>> On Wed, Mar 06, 2013 at 07:06:24PM +0100, Paolo Bonzini wrote: >>>> Il 06/03/2013 18:50, Peter Lieven ha scritto: >>>>>>> Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks >>>>>>> QCOW images, as well other future image formats (such as VHDX) that may call >>>>>>> bdrv_truncate(bs->file) from within a read/write operation. For example, QCOW >>>>>>> will cause an assert, due to tracked_requests not being empty (since the >>>>>>> read/write that called bdrv_truncate() is still in progress). >>>> >>>> I'm not sure such bdrv_truncate calls are necessary. QCOW2 doesn't have >>>> them (almost; there is one in qcow2_write_compressed, I'm not even sure >>>> that one is necessary though), and I think QCOW's breaks using it with a >>>> block device as a backing file. >>>> >>>> Paolo >>> >>> QCOW breaks with it using a normal raw posix file as a device. As a >>> test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that >>> drive mounted, and try to partition and format it. QEMU now asserts. >>> >>> The nicety of being able to using truncate during a write call, >>> especially for VHDX (which can have relatively large block/cluster >>> sizes), so to grow the file sparsely in a dynamically allocated file. >>> >> >> why don't you call vhdx_truncate() there instead of bdrv_truncate()? >> >> Peter >> > > What I want to do is grow the underlying file, so bdrv_truncate() is > called on bs->file. vhdx_truncate() would primarily care about > updating the vhdx file structures / headers to reflect the new file > size (which is what vhdx_block_allocate() does), but I need to > actually make the underlying file larger (and ideally, in a sparse > fashion). > As Paolo suggested we probably need to fix the underlying misuage of bdrv_truncate() for non growable devices and for that we additionally need to set bs->growable not to 1 per default, but only if the devices are really growable: qcow2, raw (only on regular file), vhdx? I maybe missing some. Paolo?
On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: > Il 06/03/2013 19:14, Jeff Cody ha scritto: > > QCOW breaks with it using a normal raw posix file as a device. As a > > test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that > > drive mounted, and try to partition and format it. QEMU now asserts. > > > > The nicety of being able to using truncate during a write call, > > especially for VHDX (which can have relatively large block/cluster > > sizes), so to grow the file sparsely in a dynamically allocated file. > > Perhaps we need two APIs, "truncate" and "revalidate". > > Truncate should be a no-op if (!bs->growable). > > Revalidate could be called by the block_resize monitor command with no > size specified. > > Paolo I think that is a good solution. Is it better to have "truncate" and "revalidate", or "truncate" and "grow", with grow being a subset of truncate, with fewer restrictions? There may still be operations where it is OK to grow a file, but not OK to shrink it.
Am 06.03.2013 19:48, schrieb Jeff Cody: > On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: >> Il 06/03/2013 19:14, Jeff Cody ha scritto: >>> QCOW breaks with it using a normal raw posix file as a device. As a >>> test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that >>> drive mounted, and try to partition and format it. QEMU now asserts. >>> >>> The nicety of being able to using truncate during a write call, >>> especially for VHDX (which can have relatively large block/cluster >>> sizes), so to grow the file sparsely in a dynamically allocated file. >> >> Perhaps we need two APIs, "truncate" and "revalidate". >> >> Truncate should be a no-op if (!bs->growable). >> >> Revalidate could be called by the block_resize monitor command with no >> size specified. >> >> Paolo > > I think that is a good solution. Is it better to have "truncate" and > "revalidate", or "truncate" and "grow", with grow being a subset of > truncate, with fewer restrictions? There may still be operations > where it is OK to grow a file, but not OK to shrink it. > Or as a first step: a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable) b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there c) Fix the value of bs->growable for all drivers Peter
Il 06/03/2013 19:32, Jeff Cody ha scritto: > On Wed, Mar 06, 2013 at 06:50:56PM +0100, Peter Lieven wrote: >> Looking at the source I have not found a place where bs->growable is set to 0 for any >> block driver, maybe I miss something. Having bs->growable for iSCSI would also be ok. >> >> Shouldn't it be possible to call bdrv_drain_all() any time? There are other places >> where this is called. One I have in mind is e.g. if you cancel an ongoing block migration. > > That is a good point - what happens to QCOW now, if there is a block > job in progress (e.g. block-commit, block-stream, etc...)? I would > imagine -EBUSY gets thrown, since bdrv_truncate() checks > bdrv_in_use(). No, bs->file is not marked in use. Only bs is. Paolo
Il 06/03/2013 20:03, Peter Lieven ha scritto: > Am 06.03.2013 19:48, schrieb Jeff Cody: >> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: >>> Il 06/03/2013 19:14, Jeff Cody ha scritto: >>>> QCOW breaks with it using a normal raw posix file as a device. As a >>>> test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that >>>> drive mounted, and try to partition and format it. QEMU now asserts. >>>> >>>> The nicety of being able to using truncate during a write call, >>>> especially for VHDX (which can have relatively large block/cluster >>>> sizes), so to grow the file sparsely in a dynamically allocated file. >>> >>> Perhaps we need two APIs, "truncate" and "revalidate". >>> >>> Truncate should be a no-op if (!bs->growable). >>> >>> Revalidate could be called by the block_resize monitor command with no >>> size specified. >>> >>> Paolo >> >> I think that is a good solution. Is it better to have "truncate" and >> "revalidate", or "truncate" and "grow", with grow being a subset of >> truncate, with fewer restrictions? There may still be operations >> where it is OK to grow a file, but not OK to shrink it. > > Or as a first step: > > a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable) > b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there > c) Fix the value of bs->growable for all drivers Let's start from (c). bdrv_file_open sets bs->growable = 1. I think it should be removed and only the file protocol should set it. Then we can add bdrv_revalidate and, for block_resize, call bdrv_revalidate+bdrv_truncate. For bs->growable = 0 && !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual size is the same or bigger as the one requested, and fail otherwise. Paolo
Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben: > Il 06/03/2013 20:03, Peter Lieven ha scritto: > > Am 06.03.2013 19:48, schrieb Jeff Cody: > >> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: > >>> Il 06/03/2013 19:14, Jeff Cody ha scritto: > >>>> QCOW breaks with it using a normal raw posix file as a device. As a > >>>> test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that > >>>> drive mounted, and try to partition and format it. QEMU now asserts. > >>>> > >>>> The nicety of being able to using truncate during a write call, > >>>> especially for VHDX (which can have relatively large block/cluster > >>>> sizes), so to grow the file sparsely in a dynamically allocated file. > >>> > >>> Perhaps we need two APIs, "truncate" and "revalidate". > >>> > >>> Truncate should be a no-op if (!bs->growable). > >>> > >>> Revalidate could be called by the block_resize monitor command with no > >>> size specified. > >>> > >>> Paolo > >> > >> I think that is a good solution. Is it better to have "truncate" and > >> "revalidate", or "truncate" and "grow", with grow being a subset of > >> truncate, with fewer restrictions? There may still be operations > >> where it is OK to grow a file, but not OK to shrink it. What semantics would the both operations have? Is truncate the same as it used to be? I don't really understand what "revalidate" would do, it sounds like a read-only operation from its name? > > Or as a first step: > > > > a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable) > > b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there > > c) Fix the value of bs->growable for all drivers > > Let's start from (c). bdrv_file_open sets bs->growable = 1. I think it > should be removed and only the file protocol should set it. This is probably right. > Then we can add bdrv_revalidate and, for block_resize, call > bdrv_revalidate+bdrv_truncate. For bs->growable = 0 && > !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual > size is the same or bigger as the one requested, and fail otherwise. This one not so much. bs->growable does not mean that you can use bdrv_truncate. It rather means that you may write beyond the end of the file even without truncating it first. Mabye bs->auto_grow would be a better for it. So bs->growable == true implies that bdrv_truncate() should be allowed as well, because obviously changing the BDS size is possbile, but it's not true the other way round. Kevin
On 06.03.2013 21:39, Paolo Bonzini wrote: > Il 06/03/2013 20:03, Peter Lieven ha scritto: >> Am 06.03.2013 19:48, schrieb Jeff Cody: >>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: >>>> Il 06/03/2013 19:14, Jeff Cody ha scritto: >>>>> QCOW breaks with it using a normal raw posix file as a device. As a >>>>> test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that >>>>> drive mounted, and try to partition and format it. QEMU now asserts. >>>>> >>>>> The nicety of being able to using truncate during a write call, >>>>> especially for VHDX (which can have relatively large block/cluster >>>>> sizes), so to grow the file sparsely in a dynamically allocated file. >>>> >>>> Perhaps we need two APIs, "truncate" and "revalidate". >>>> >>>> Truncate should be a no-op if (!bs->growable). >>>> >>>> Revalidate could be called by the block_resize monitor command with no >>>> size specified. >>>> >>>> Paolo >>> >>> I think that is a good solution. Is it better to have "truncate" and >>> "revalidate", or "truncate" and "grow", with grow being a subset of >>> truncate, with fewer restrictions? There may still be operations >>> where it is OK to grow a file, but not OK to shrink it. >> >> Or as a first step: >> >> a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable) >> b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there >> c) Fix the value of bs->growable for all drivers > > Let's start from (c). bdrv_file_open sets bs->growable = 1. I think it > should be removed and only the file protocol should set it. > > Then we can add bdrv_revalidate and, for block_resize, call > bdrv_revalidate+bdrv_truncate. For bs->growable = 0 && > !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual > size is the same or bigger as the one requested, and fail otherwise. > > Paolo > Regarding brd_drain_all(). Is the fix right to call it only on device shrink? In this case it has to be added to iscsi_truncate as well. Peter
On 07.03.2013 09:50, Kevin Wolf wrote: > Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben: >> Il 06/03/2013 20:03, Peter Lieven ha scritto: >>> Am 06.03.2013 19:48, schrieb Jeff Cody: >>>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: >>>>> Il 06/03/2013 19:14, Jeff Cody ha scritto: >>>>>> QCOW breaks with it using a normal raw posix file as a device. As a >>>>>> test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that >>>>>> drive mounted, and try to partition and format it. QEMU now asserts. >>>>>> >>>>>> The nicety of being able to using truncate during a write call, >>>>>> especially for VHDX (which can have relatively large block/cluster >>>>>> sizes), so to grow the file sparsely in a dynamically allocated file. >>>>> >>>>> Perhaps we need two APIs, "truncate" and "revalidate". >>>>> >>>>> Truncate should be a no-op if (!bs->growable). >>>>> >>>>> Revalidate could be called by the block_resize monitor command with no >>>>> size specified. >>>>> >>>>> Paolo >>>> >>>> I think that is a good solution. Is it better to have "truncate" and >>>> "revalidate", or "truncate" and "grow", with grow being a subset of >>>> truncate, with fewer restrictions? There may still be operations >>>> where it is OK to grow a file, but not OK to shrink it. > > What semantics would the both operations have? Is truncate the same as > it used to be? I don't really understand what "revalidate" would do, it > sounds like a read-only operation from its name? > >>> Or as a first step: >>> >>> a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable) >>> b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there >>> c) Fix the value of bs->growable for all drivers >> >> Let's start from (c). bdrv_file_open sets bs->growable = 1. I think it >> should be removed and only the file protocol should set it. > > This is probably right. If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This flag was introduced as a fix for this problem. bdrv_check_byte_request() does nothing useful if bs->growable is 1. Peter
Am 06.03.2013 um 19:27 hat Peter Lieven geschrieben: > Am 06.03.2013 19:06, schrieb Paolo Bonzini: > > Il 06/03/2013 18:50, Peter Lieven ha scritto: > >>>> Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks > >>>> QCOW images, as well other future image formats (such as VHDX) that may call > >>>> bdrv_truncate(bs->file) from within a read/write operation. For example, QCOW > >>>> will cause an assert, due to tracked_requests not being empty (since the > >>>> read/write that called bdrv_truncate() is still in progress). > > > > I'm not sure such bdrv_truncate calls are necessary. QCOW2 doesn't have > > them (almost; there is one in qcow2_write_compressed, I'm not even sure > > that one is necessary though), and I think QCOW's breaks using it with a > > block device as a backing file. > > I think we have to check the sense of bs->growable nevertheless. It is set > to 1 for all drivers which can't be right?! For everything that goes through bdrv_file_open(), which are protocol drivers, not format drivers. It is required for files so that formats can write past EOF; for other drivers that can't actually grow their backing storage it doesn't hurt because they will return an eror anyway when you write to it. "bdrv_truncate() works" and "bs->growable == true" are totally different things, so even though it doesn't look right at the first sight, bs->growable does its job correctly. In your other mail you're talking about setting it for raw, qcow2, VHDX. This would be discussing on the entirely wrong level, this isn't about formats, but about protocols (file, host_device, nbd, iscsi, http, vvfat...), that are below the format drivers. Kevin
Am 07.03.2013 um 09:53 hat Peter Lieven geschrieben: > On 06.03.2013 21:39, Paolo Bonzini wrote: > >Il 06/03/2013 20:03, Peter Lieven ha scritto: > >>Am 06.03.2013 19:48, schrieb Jeff Cody: > >>>On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: > >>>>Il 06/03/2013 19:14, Jeff Cody ha scritto: > >>>>>QCOW breaks with it using a normal raw posix file as a device. As a > >>>>>test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that > >>>>>drive mounted, and try to partition and format it. QEMU now asserts. > >>>>> > >>>>>The nicety of being able to using truncate during a write call, > >>>>>especially for VHDX (which can have relatively large block/cluster > >>>>>sizes), so to grow the file sparsely in a dynamically allocated file. > >>>> > >>>>Perhaps we need two APIs, "truncate" and "revalidate". > >>>> > >>>>Truncate should be a no-op if (!bs->growable). > >>>> > >>>>Revalidate could be called by the block_resize monitor command with no > >>>>size specified. > >>>> > >>>>Paolo > >>> > >>>I think that is a good solution. Is it better to have "truncate" and > >>>"revalidate", or "truncate" and "grow", with grow being a subset of > >>>truncate, with fewer restrictions? There may still be operations > >>>where it is OK to grow a file, but not OK to shrink it. > >> > >>Or as a first step: > >> > >>a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable) > >>b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there > >>c) Fix the value of bs->growable for all drivers > > > >Let's start from (c). bdrv_file_open sets bs->growable = 1. I think it > >should be removed and only the file protocol should set it. > > > >Then we can add bdrv_revalidate and, for block_resize, call > >bdrv_revalidate+bdrv_truncate. For bs->growable = 0 && > >!bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual > >size is the same or bigger as the one requested, and fail otherwise. > > > >Paolo > > > > Regarding brd_drain_all(). Is the fix right to call it only on device shrink? > In this case it has to be added to iscsi_truncate as well. The real fix would bdrv_drain(bs). I hope we're not too far away from that today, even though we're not quite there. Kevin
Am 07.03.2013 um 09:56 hat Peter Lieven geschrieben: > On 07.03.2013 09:50, Kevin Wolf wrote: > >Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben: > >>Il 06/03/2013 20:03, Peter Lieven ha scritto: > >>>Am 06.03.2013 19:48, schrieb Jeff Cody: > >>>>On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: > >>>>>Il 06/03/2013 19:14, Jeff Cody ha scritto: > >>>>>>QCOW breaks with it using a normal raw posix file as a device. As a > >>>>>>test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that > >>>>>>drive mounted, and try to partition and format it. QEMU now asserts. > >>>>>> > >>>>>>The nicety of being able to using truncate during a write call, > >>>>>>especially for VHDX (which can have relatively large block/cluster > >>>>>>sizes), so to grow the file sparsely in a dynamically allocated file. > >>>>> > >>>>>Perhaps we need two APIs, "truncate" and "revalidate". > >>>>> > >>>>>Truncate should be a no-op if (!bs->growable). > >>>>> > >>>>>Revalidate could be called by the block_resize monitor command with no > >>>>>size specified. > >>>>> > >>>>>Paolo > >>>> > >>>>I think that is a good solution. Is it better to have "truncate" and > >>>>"revalidate", or "truncate" and "grow", with grow being a subset of > >>>>truncate, with fewer restrictions? There may still be operations > >>>>where it is OK to grow a file, but not OK to shrink it. > > > >What semantics would the both operations have? Is truncate the same as > >it used to be? I don't really understand what "revalidate" would do, it > >sounds like a read-only operation from its name? > > > >>>Or as a first step: > >>> > >>>a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable) > >>>b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there > >>>c) Fix the value of bs->growable for all drivers > >> > >>Let's start from (c). bdrv_file_open sets bs->growable = 1. I think it > >>should be removed and only the file protocol should set it. > > > >This is probably right. > > If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This > flag was introduced as a fix for this problem. > > bdrv_check_byte_request() does nothing useful if bs->growable is 1. Don't ignore the difference between bdrv_open() and bdrv_file_open(). Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is opened through bdrv_open() and has bs->growable = false. Its bs->file is using the file protocol (raw-posix driver) and opened by bdrv_file_open(). This one has bs->file->growable = true so that qcow2 can write to newly allocated areas without calling bdrv_truncate() first. Kevin
Am 07.03.2013 um 10:03 schrieb Kevin Wolf <kwolf@redhat.com>: > Am 07.03.2013 um 09:56 hat Peter Lieven geschrieben: >> On 07.03.2013 09:50, Kevin Wolf wrote: >>> Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben: >>>> Il 06/03/2013 20:03, Peter Lieven ha scritto: >>>>> Am 06.03.2013 19:48, schrieb Jeff Cody: >>>>>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: >>>>>>> Il 06/03/2013 19:14, Jeff Cody ha scritto: >>>>>>>> QCOW breaks with it using a normal raw posix file as a device. As a >>>>>>>> test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that >>>>>>>> drive mounted, and try to partition and format it. QEMU now asserts. >>>>>>>> >>>>>>>> The nicety of being able to using truncate during a write call, >>>>>>>> especially for VHDX (which can have relatively large block/cluster >>>>>>>> sizes), so to grow the file sparsely in a dynamically allocated file. >>>>>>> >>>>>>> Perhaps we need two APIs, "truncate" and "revalidate". >>>>>>> >>>>>>> Truncate should be a no-op if (!bs->growable). >>>>>>> >>>>>>> Revalidate could be called by the block_resize monitor command with no >>>>>>> size specified. >>>>>>> >>>>>>> Paolo >>>>>> >>>>>> I think that is a good solution. Is it better to have "truncate" and >>>>>> "revalidate", or "truncate" and "grow", with grow being a subset of >>>>>> truncate, with fewer restrictions? There may still be operations >>>>>> where it is OK to grow a file, but not OK to shrink it. >>> >>> What semantics would the both operations have? Is truncate the same as >>> it used to be? I don't really understand what "revalidate" would do, it >>> sounds like a read-only operation from its name? >>> >>>>> Or as a first step: >>>>> >>>>> a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable) >>>>> b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there >>>>> c) Fix the value of bs->growable for all drivers >>>> >>>> Let's start from (c). bdrv_file_open sets bs->growable = 1. I think it >>>> should be removed and only the file protocol should set it. >>> >>> This is probably right. >> >> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This >> flag was introduced as a fix for this problem. >> >> bdrv_check_byte_request() does nothing useful if bs->growable is 1. > > Don't ignore the difference between bdrv_open() and bdrv_file_open(). > Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is > opened through bdrv_open() and has bs->growable = false. Its bs->file is > using the file protocol (raw-posix driver) and opened by > bdrv_file_open(). This one has bs->file->growable = true so that qcow2 > can write to newly allocated areas without calling bdrv_truncate() > first. Sorry, I have to admin I am little confused by what is happening in bdrv_open(). However, what I can say is that bs->growable is 1 for an iSCSI backed harddrive and I wonder how this can happen if bdrv_file_open is not used for opening it because that is the only place where bs->growable is set to 1. cmdline: x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio Peter > > Kevin
Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben: > >> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This > >> flag was introduced as a fix for this problem. > >> > >> bdrv_check_byte_request() does nothing useful if bs->growable is 1. > > > > Don't ignore the difference between bdrv_open() and bdrv_file_open(). > > Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is > > opened through bdrv_open() and has bs->growable = false. Its bs->file is > > using the file protocol (raw-posix driver) and opened by > > bdrv_file_open(). This one has bs->file->growable = true so that qcow2 > > can write to newly allocated areas without calling bdrv_truncate() > > first. > > Sorry, I have to admin I am little confused by what is happening in bdrv_open(). > > However, what I can say is that bs->growable is 1 for an iSCSI backed > harddrive and I wonder how this can happen if bdrv_file_open is not used for > opening it because that is the only place where bs->growable is set to 1. > > cmdline: > x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio It is used for the iscsi driver. You have a raw BDS (growable == false) on top of an iscsi one (growable == true). Kevin
Am 07.03.2013 um 10:22 schrieb Kevin Wolf <kwolf@redhat.com>: > Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben: >>>> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This >>>> flag was introduced as a fix for this problem. >>>> >>>> bdrv_check_byte_request() does nothing useful if bs->growable is 1. >>> >>> Don't ignore the difference between bdrv_open() and bdrv_file_open(). >>> Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is >>> opened through bdrv_open() and has bs->growable = false. Its bs->file is >>> using the file protocol (raw-posix driver) and opened by >>> bdrv_file_open(). This one has bs->file->growable = true so that qcow2 >>> can write to newly allocated areas without calling bdrv_truncate() >>> first. >> >> Sorry, I have to admin I am little confused by what is happening in bdrv_open(). >> >> However, what I can say is that bs->growable is 1 for an iSCSI backed >> harddrive and I wonder how this can happen if bdrv_file_open is not used for >> opening it because that is the only place where bs->growable is set to 1. >> >> cmdline: >> x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio > > It is used for the iscsi driver. You have a raw BDS (growable == false) > on top of an iscsi one (growable == true). Ok, but growable == true is wrong for the iSCSI driver isn`t it? Peter
Am 07.03.2013 um 10:25 hat Peter Lieven geschrieben: > > Am 07.03.2013 um 10:22 schrieb Kevin Wolf <kwolf@redhat.com>: > > > Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben: > >>>> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This > >>>> flag was introduced as a fix for this problem. > >>>> > >>>> bdrv_check_byte_request() does nothing useful if bs->growable is 1. > >>> > >>> Don't ignore the difference between bdrv_open() and bdrv_file_open(). > >>> Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is > >>> opened through bdrv_open() and has bs->growable = false. Its bs->file is > >>> using the file protocol (raw-posix driver) and opened by > >>> bdrv_file_open(). This one has bs->file->growable = true so that qcow2 > >>> can write to newly allocated areas without calling bdrv_truncate() > >>> first. > >> > >> Sorry, I have to admin I am little confused by what is happening in bdrv_open(). > >> > >> However, what I can say is that bs->growable is 1 for an iSCSI backed > >> harddrive and I wonder how this can happen if bdrv_file_open is not used for > >> opening it because that is the only place where bs->growable is set to 1. > >> > >> cmdline: > >> x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio > > > > It is used for the iscsi driver. You have a raw BDS (growable == false) > > on top of an iscsi one (growable == true). > > Ok, but growable == true is wrong for the iSCSI driver isn`t it? I guess it depends on your definition. If you say that growable includes the capability of growing the image, then yes, it's wrong. If you only interpret it as permission to write beyond EOF (if the driver supports that), then it's right even though any such attempt will result in an error. Practically speaking, the difference is between -EIO returned from bdrv_check_byte_request() and -EIO from iscsi_aio_write16_cb(), i.e. the result is the same. Kevin
Am 07.03.2013 um 11:00 schrieb Kevin Wolf <kwolf@redhat.com>: > Am 07.03.2013 um 10:25 hat Peter Lieven geschrieben: >> >> Am 07.03.2013 um 10:22 schrieb Kevin Wolf <kwolf@redhat.com>: >> >>> Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben: >>>>>> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This >>>>>> flag was introduced as a fix for this problem. >>>>>> >>>>>> bdrv_check_byte_request() does nothing useful if bs->growable is 1. >>>>> >>>>> Don't ignore the difference between bdrv_open() and bdrv_file_open(). >>>>> Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is >>>>> opened through bdrv_open() and has bs->growable = false. Its bs->file is >>>>> using the file protocol (raw-posix driver) and opened by >>>>> bdrv_file_open(). This one has bs->file->growable = true so that qcow2 >>>>> can write to newly allocated areas without calling bdrv_truncate() >>>>> first. >>>> >>>> Sorry, I have to admin I am little confused by what is happening in bdrv_open(). >>>> >>>> However, what I can say is that bs->growable is 1 for an iSCSI backed >>>> harddrive and I wonder how this can happen if bdrv_file_open is not used for >>>> opening it because that is the only place where bs->growable is set to 1. >>>> >>>> cmdline: >>>> x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio >>> >>> It is used for the iscsi driver. You have a raw BDS (growable == false) >>> on top of an iscsi one (growable == true). >> >> Ok, but growable == true is wrong for the iSCSI driver isn`t it? > > I guess it depends on your definition. If you say that growable includes > the capability of growing the image, then yes, it's wrong. If you only > interpret it as permission to write beyond EOF (if the driver supports > that), then it's right even though any such attempt will result in an > error. > > Practically speaking, the difference is between -EIO returned from > bdrv_check_byte_request() and -EIO from iscsi_aio_write16_cb(), i.e. > the result is the same. Yes, but there are many broken storage implementations outside which might react freaky if there is a read/write beyond the LUN size or with negative offset. The check_request would block such requests earlier protecting the infrastructure. I have a case open with the vendor of a storage system where I am able to crash the storage sending illegal requests. I personally would feel more comfortable if illegal requests where just not possible. Peter
Il 07/03/2013 09:53, Peter Lieven ha scritto: >> >> Then we can add bdrv_revalidate and, for block_resize, call >> bdrv_revalidate+bdrv_truncate. For bs->growable = 0 && >> !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual >> size is the same or bigger as the one requested, and fail otherwise. >> > > Regarding brd_drain_all(). Is the fix right to call it only on device > shrink? I think the fix is to only call it for the monitor command. Optionally, when shrinking, assert that there are no requests in flight. Paolo > In this case it has to be added to iscsi_truncate as well.
Il 07/03/2013 09:50, Kevin Wolf ha scritto: > Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben: >> Il 06/03/2013 20:03, Peter Lieven ha scritto: >>> Am 06.03.2013 19:48, schrieb Jeff Cody: >>>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote: >>>>> Il 06/03/2013 19:14, Jeff Cody ha scritto: >>>>>> QCOW breaks with it using a normal raw posix file as a device. As a >>>>>> test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that >>>>>> drive mounted, and try to partition and format it. QEMU now asserts. >>>>>> >>>>>> The nicety of being able to using truncate during a write call, >>>>>> especially for VHDX (which can have relatively large block/cluster >>>>>> sizes), so to grow the file sparsely in a dynamically allocated file. >>>>> >>>>> Perhaps we need two APIs, "truncate" and "revalidate". >>>>> >>>>> Truncate should be a no-op if (!bs->growable). >>>>> >>>>> Revalidate could be called by the block_resize monitor command with no >>>>> size specified. >>>> >>>> I think that is a good solution. Is it better to have "truncate" and >>>> "revalidate", or "truncate" and "grow", with grow being a subset of >>>> truncate, with fewer restrictions? There may still be operations >>>> where it is OK to grow a file, but not OK to shrink it. > > What semantics would the both operations have? Is truncate the same as > it used to be? I don't really understand what "revalidate" would do, it > sounds like a read-only operation from its name? Revalidate would update the current size. Files fetch it on all calls to bdrv_getlength, but other backends cache it. It would visit the BDS ->file chain from the bottom (bs->file->file->file...) to the top implicitly, with no need for an explicit recursion in the callback (like we do for flush_to_os). Before starting the recursion, bdrv_revalidate does a bdrv_drain_all, so using the current iscsi_truncate for iscsi_revalidate would be fine. The block_resize command will always call revalidate before doing anything. Then block_resize will try to do a bdrv_truncate if the callback is there. Another change that could make sense, is to make bdrv_truncate succeed if there is no bdrv_truncate callback but bs is larger than the requested size. This would be a difference from today's if (!drv->bdrv_truncate) return -ENOTSUP; And it could even do the same if the callback is there, but returns ENOTSUP. It would simplify some code, like this in raw-posix.c's raw_truncate: } else if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) { if (offset > raw_getlength(bs)) { return -EINVAL; } } else { return -ENOTSUP; } The "else if" branch can just go away. The remaining question is about usage of bdrv_truncate in the formats. One important aspect is that both QCOW and VHDX, in addition to using bdrv_truncate to extend the file, rely on bdrv_getlength to fetch the last _used_ byte of the file, not the available space. I think we can say that bdrv_getlength behaves like that iff bs->growable. If so, QCOW and VHDX should fail to open for write if the underlying file has !bs->growable. Which will never fail right now, but it will as soon as we implement this: >> Let's start from (c). bdrv_file_open sets bs->growable = 1. I think it >> should be removed and only the file protocol should set it. > > This is probably right. Good. More precisely, raw_open should only set it if the file is a regular file. Paolo
On 07.03.2013 17:09, Paolo Bonzini wrote: > Il 07/03/2013 09:53, Peter Lieven ha scritto: >>> >>> Then we can add bdrv_revalidate and, for block_resize, call >>> bdrv_revalidate+bdrv_truncate. For bs->growable = 0 && >>> !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual >>> size is the same or bigger as the one requested, and fail otherwise. >>> >> >> Regarding brd_drain_all(). Is the fix right to call it only on device >> shrink? > > I think the fix is to only call it for the monitor command. Optionally, > when shrinking, assert that there are no requests in flight. Okay. What is the plan? just fix this or fix the whole thing (which seems to be some work). The suggested patch from Jeff will break iscsi_truncate as bs->growable is 1 currently. Peter > > Paolo > >> In this case it has to be added to iscsi_truncate as well. >
Il 08/03/2013 08:53, Peter Lieven ha scritto: >> >> I think the fix is to only call it for the monitor command. Optionally, >> when shrinking, assert that there are no requests in flight. > > Okay. > > What is the plan? just fix this or fix the whole thing (which seems to > be some > work). > > The suggested patch from Jeff will break iscsi_truncate as bs->growable > is 1 currently. I guess it's up to Kevin and Jeff. Paolo
Am 08.03.2013 um 10:23 hat Paolo Bonzini geschrieben: > Il 08/03/2013 08:53, Peter Lieven ha scritto: > >> > >> I think the fix is to only call it for the monitor command. Optionally, > >> when shrinking, assert that there are no requests in flight. > > > > Okay. > > > > What is the plan? just fix this or fix the whole thing (which seems to > > be some > > work). > > > > The suggested patch from Jeff will break iscsi_truncate as bs->growable > > is 1 currently. > > I guess it's up to Kevin and Jeff. Someone needs to send a fix for the qcow1 (and probably vmdk) regression, otherwise I'll have to revert the patch. Kevin
Am 08.03.2013 um 10:35 schrieb Kevin Wolf <kwolf@redhat.com>: > Am 08.03.2013 um 10:23 hat Paolo Bonzini geschrieben: >> Il 08/03/2013 08:53, Peter Lieven ha scritto: >>>> >>>> I think the fix is to only call it for the monitor command. Optionally, >>>> when shrinking, assert that there are no requests in flight. >>> >>> Okay. >>> >>> What is the plan? just fix this or fix the whole thing (which seems to >>> be some >>> work). >>> >>> The suggested patch from Jeff will break iscsi_truncate as bs->growable >>> is 1 currently. >> >> I guess it's up to Kevin and Jeff. > > Someone needs to send a fix for the qcow1 (and probably vmdk) > regression, otherwise I'll have to revert the patch. What about Paolos suggestion to call bdrv_drain_all() from the block_resize command and not in bdrv_truncate? It is not a the complete solution, but it will fix the regression. However, what happens if someone resizes a qcow2 device? Peter > > Kevin
Am 08.03.2013 um 12:46 hat Peter Lieven geschrieben: > > Am 08.03.2013 um 10:35 schrieb Kevin Wolf <kwolf@redhat.com>: > > > Am 08.03.2013 um 10:23 hat Paolo Bonzini geschrieben: > >> Il 08/03/2013 08:53, Peter Lieven ha scritto: > >>>> > >>>> I think the fix is to only call it for the monitor command. Optionally, > >>>> when shrinking, assert that there are no requests in flight. > >>> > >>> Okay. > >>> > >>> What is the plan? just fix this or fix the whole thing (which seems to > >>> be some > >>> work). > >>> > >>> The suggested patch from Jeff will break iscsi_truncate as bs->growable > >>> is 1 currently. > >> > >> I guess it's up to Kevin and Jeff. > > > > Someone needs to send a fix for the qcow1 (and probably vmdk) > > regression, otherwise I'll have to revert the patch. > > What about Paolos suggestion to call bdrv_drain_all() from the block_resize > command and not in bdrv_truncate? It is not a the complete solution, but it > will fix the regression. However, what happens if someone resizes a qcow2 > device? I suppose you mean qcow1? It doesn't support resizing images, but even if it did, this shouldn't be a problem: The problematic case is the call of bdrv_drain_all() during a read/write function, which would have to wait for itself to complete. As soon as you restrict the bdrv_drain_all() to the monitor, waiting for in-flight I/Os should just work. Kevin
Am 08.03.2013 12:56, schrieb Kevin Wolf: > Am 08.03.2013 um 12:46 hat Peter Lieven geschrieben: >> >> Am 08.03.2013 um 10:35 schrieb Kevin Wolf <kwolf@redhat.com>: >> >>> Am 08.03.2013 um 10:23 hat Paolo Bonzini geschrieben: >>>> Il 08/03/2013 08:53, Peter Lieven ha scritto: >>>>>> >>>>>> I think the fix is to only call it for the monitor command. Optionally, >>>>>> when shrinking, assert that there are no requests in flight. >>>>> >>>>> Okay. >>>>> >>>>> What is the plan? just fix this or fix the whole thing (which seems to >>>>> be some >>>>> work). >>>>> >>>>> The suggested patch from Jeff will break iscsi_truncate as bs->growable >>>>> is 1 currently. >>>> >>>> I guess it's up to Kevin and Jeff. >>> >>> Someone needs to send a fix for the qcow1 (and probably vmdk) >>> regression, otherwise I'll have to revert the patch. >> >> What about Paolos suggestion to call bdrv_drain_all() from the block_resize >> command and not in bdrv_truncate? It is not a the complete solution, but it >> will fix the regression. However, what happens if someone resizes a qcow2 >> device? > > I suppose you mean qcow1? It doesn't support resizing images, but even > if it did, this shouldn't be a problem: The problematic case is the call > of bdrv_drain_all() during a read/write function, which would have to > wait for itself to complete. As soon as you restrict the > bdrv_drain_all() to the monitor, waiting for in-flight I/Os should just > work. Ok, then please ignore / revert the patch that added bdrv_drain_all() in bdrv_truncate() and I will sent a new one that adds bdrv_drain_all() to qmp_block_resize(). Peter > > Kevin >
diff --git a/block.c b/block.c index 124a9eb..dce844c 100644 --- a/block.c +++ b/block.c @@ -2450,8 +2450,14 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset) if (bdrv_in_use(bs)) return -EBUSY; - /* There better not be any in-flight IOs when we truncate the device. */ - bdrv_drain_all(); + /* Don't force a drain if we are just growing the file - otherwise, + * using bdrv_truncate() from within a block driver in a read/write + * operation will cause an assert, because bdrv_drain_all() will assert if + * a tracked_request is still in progress. */ + if (!bs->growable || offset < bdrv_getlength(bs)) { + /* There better not be any in-flight IOs when we truncate the device. */ + bdrv_drain_all(); + } ret = drv->bdrv_truncate(bs, offset); if (ret == 0) {
Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks QCOW images, as well other future image formats (such as VHDX) that may call bdrv_truncate(bs->file) from within a read/write operation. For example, QCOW will cause an assert, due to tracked_requests not being empty (since the read/write that called bdrv_truncate() is still in progress). This modifies the check to only force the bdrv_drain_all() call if the BDS to be truncated is not growable, or if we are shrinking the BDS. Otherwise, if we are just growing the file, allow it to happen without forcing a call to bdrv_drain_all(). Signed-off-by: Jeff Cody <jcody@redhat.com> --- block.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)