diff mbox

[1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Message ID a56a55df5012c863f79cdb45ce2ba5a886ac37f4.1362580930.git.jcody@redhat.com
State New
Headers show

Commit Message

Jeff Cody March 6, 2013, 2:47 p.m. UTC
Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks
QCOW images, as well other future image formats (such as VHDX) that may call
bdrv_truncate(bs->file) from within a read/write operation.  For example, QCOW
will cause an assert, due to tracked_requests not being empty (since the
read/write that called bdrv_truncate() is still in progress).

This modifies the check to only force the bdrv_drain_all() call if the BDS to
be truncated is not growable, or if we are shrinking the BDS.  Otherwise, if we
are just growing the file, allow it to happen without forcing a call to
bdrv_drain_all().

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Peter Lieven March 6, 2013, 5:50 p.m. UTC | #1
Am 06.03.2013 15:47, schrieb Jeff Cody:
> Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks
> QCOW images, as well other future image formats (such as VHDX) that may call
> bdrv_truncate(bs->file) from within a read/write operation.  For example, QCOW
> will cause an assert, due to tracked_requests not being empty (since the
> read/write that called bdrv_truncate() is still in progress).
> 
> This modifies the check to only force the bdrv_drain_all() call if the BDS to
> be truncated is not growable, or if we are shrinking the BDS.  Otherwise, if we
> are just growing the file, allow it to happen without forcing a call to
> bdrv_drain_all().

This will actually break iscsi_truncate(). Paolo requested that the iscsi_tuncate command
should use sync I/O for rereading the LUN capacity. Using sync I/O is only possible if
there are no async requests in-flight.

Looking at the source I have not found a place where bs->growable is set to 0 for any
block driver, maybe I miss something. Having bs->growable for iSCSI would also be ok.

Shouldn't it be possible to call bdrv_drain_all() any time? There are other places
where this is called. One I have in mind is e.g. if you cancel an ongoing block migration.

Peter

> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 124a9eb..dce844c 100644
> --- a/block.c
> +++ b/block.c
> @@ -2450,8 +2450,14 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>      if (bdrv_in_use(bs))
>          return -EBUSY;
>  
> -    /* There better not be any in-flight IOs when we truncate the device. */
> -    bdrv_drain_all();
> +    /* Don't force a drain if we are just growing the file - otherwise,
> +     * using bdrv_truncate() from within a block driver in a read/write
> +     * operation will cause an assert, because bdrv_drain_all() will assert if
> +     * a tracked_request is still in progress. */
> +    if (!bs->growable || offset < bdrv_getlength(bs)) {
> +        /* There better not be any in-flight IOs when we truncate the device. */
> +        bdrv_drain_all();
> +    }
>  
>      ret = drv->bdrv_truncate(bs, offset);
>      if (ret == 0) {
>
Paolo Bonzini March 6, 2013, 6:06 p.m. UTC | #2
Il 06/03/2013 18:50, Peter Lieven ha scritto:
>> > Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks
>> > QCOW images, as well other future image formats (such as VHDX) that may call
>> > bdrv_truncate(bs->file) from within a read/write operation.  For example, QCOW
>> > will cause an assert, due to tracked_requests not being empty (since the
>> > read/write that called bdrv_truncate() is still in progress).

I'm not sure such bdrv_truncate calls are necessary.  QCOW2 doesn't have
them (almost; there is one in qcow2_write_compressed, I'm not even sure
that one is necessary though), and I think QCOW's breaks using it with a
block device as a backing file.

Paolo
Jeff Cody March 6, 2013, 6:14 p.m. UTC | #3
On Wed, Mar 06, 2013 at 07:06:24PM +0100, Paolo Bonzini wrote:
> Il 06/03/2013 18:50, Peter Lieven ha scritto:
> >> > Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks
> >> > QCOW images, as well other future image formats (such as VHDX) that may call
> >> > bdrv_truncate(bs->file) from within a read/write operation.  For example, QCOW
> >> > will cause an assert, due to tracked_requests not being empty (since the
> >> > read/write that called bdrv_truncate() is still in progress).
> 
> I'm not sure such bdrv_truncate calls are necessary.  QCOW2 doesn't have
> them (almost; there is one in qcow2_write_compressed, I'm not even sure
> that one is necessary though), and I think QCOW's breaks using it with a
> block device as a backing file.
> 
> Paolo

QCOW breaks with it using a normal raw posix file as a device.  As a
test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
drive mounted, and try to partition and format it.  QEMU now asserts.

The nicety of being able to using truncate during a write call,
especially for VHDX (which can have relatively large block/cluster
sizes), so to grow the file sparsely in a dynamically allocated file.
Peter Lieven March 6, 2013, 6:27 p.m. UTC | #4
Am 06.03.2013 19:06, schrieb Paolo Bonzini:
> Il 06/03/2013 18:50, Peter Lieven ha scritto:
>>>> Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks
>>>> QCOW images, as well other future image formats (such as VHDX) that may call
>>>> bdrv_truncate(bs->file) from within a read/write operation.  For example, QCOW
>>>> will cause an assert, due to tracked_requests not being empty (since the
>>>> read/write that called bdrv_truncate() is still in progress).
> 
> I'm not sure such bdrv_truncate calls are necessary.  QCOW2 doesn't have
> them (almost; there is one in qcow2_write_compressed, I'm not even sure
> that one is necessary though), and I think QCOW's breaks using it with a
> block device as a backing file.

I think we have to check the sense of bs->growable nevertheless. It is set
to 1 for all drivers which can't be right?!

int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags)
{
    BlockDriverState *bs;
    BlockDriver *drv;
    int ret;

    drv = bdrv_find_protocol(filename);
    if (!drv) {
        return -ENOENT;
    }

    bs = bdrv_new("");
    ret = bdrv_open_common(bs, NULL, filename, flags, drv);
    if (ret < 0) {
        bdrv_delete(bs);
        return ret;
    }
    bs->growable = 1;
    *pbs = bs;
    return 0;
}

I think each driver should set it accordingly on its own.

Peter


> 
> Paolo
>
Paolo Bonzini March 6, 2013, 6:31 p.m. UTC | #5
Il 06/03/2013 19:14, Jeff Cody ha scritto:
> QCOW breaks with it using a normal raw posix file as a device.  As a
> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
> drive mounted, and try to partition and format it.  QEMU now asserts.
> 
> The nicety of being able to using truncate during a write call,
> especially for VHDX (which can have relatively large block/cluster
> sizes), so to grow the file sparsely in a dynamically allocated file.

Perhaps we need two APIs, "truncate" and "revalidate".

Truncate should be a no-op if (!bs->growable).

Revalidate could be called by the block_resize monitor command with no
size specified.

Paolo
Jeff Cody March 6, 2013, 6:32 p.m. UTC | #6
On Wed, Mar 06, 2013 at 06:50:56PM +0100, Peter Lieven wrote:
> Am 06.03.2013 15:47, schrieb Jeff Cody:
> > Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks
> > QCOW images, as well other future image formats (such as VHDX) that may call
> > bdrv_truncate(bs->file) from within a read/write operation.  For example, QCOW
> > will cause an assert, due to tracked_requests not being empty (since the
> > read/write that called bdrv_truncate() is still in progress).
> > 
> > This modifies the check to only force the bdrv_drain_all() call if the BDS to
> > be truncated is not growable, or if we are shrinking the BDS.  Otherwise, if we
> > are just growing the file, allow it to happen without forcing a call to
> > bdrv_drain_all().
> 
> This will actually break iscsi_truncate(). Paolo requested that the iscsi_tuncate command
> should use sync I/O for rereading the LUN capacity. Using sync I/O is only possible if
> there are no async requests in-flight.
> 
> Looking at the source I have not found a place where bs->growable is set to 0 for any
> block driver, maybe I miss something. Having bs->growable for iSCSI would also be ok.
> 
> Shouldn't it be possible to call bdrv_drain_all() any time? There are other places
> where this is called. One I have in mind is e.g. if you cancel an ongoing block migration.
> 

That is a good point - what happens to QCOW now, if there is a block
job in progress (e.g. block-commit, block-stream, etc...)?  I would
imagine -EBUSY gets thrown, since bdrv_truncate() checks
bdrv_in_use().

> 
> > 
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> >  block.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index 124a9eb..dce844c 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -2450,8 +2450,14 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
> >      if (bdrv_in_use(bs))
> >          return -EBUSY;
> >  
> > -    /* There better not be any in-flight IOs when we truncate the device. */
> > -    bdrv_drain_all();
> > +    /* Don't force a drain if we are just growing the file - otherwise,
> > +     * using bdrv_truncate() from within a block driver in a read/write
> > +     * operation will cause an assert, because bdrv_drain_all() will assert if
> > +     * a tracked_request is still in progress. */
> > +    if (!bs->growable || offset < bdrv_getlength(bs)) {
> > +        /* There better not be any in-flight IOs when we truncate the device. */
> > +        bdrv_drain_all();
> > +    }
> >  
> >      ret = drv->bdrv_truncate(bs, offset);
> >      if (ret == 0) {
> > 
>
Peter Lieven March 6, 2013, 6:46 p.m. UTC | #7
Am 06.03.2013 19:42, schrieb Jeff Cody:
> On Wed, Mar 06, 2013 at 07:25:39PM +0100, Peter Lieven wrote:
>> Am 06.03.2013 19:14, schrieb Jeff Cody:
>>> On Wed, Mar 06, 2013 at 07:06:24PM +0100, Paolo Bonzini wrote:
>>>> Il 06/03/2013 18:50, Peter Lieven ha scritto:
>>>>>>> Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks
>>>>>>> QCOW images, as well other future image formats (such as VHDX) that may call
>>>>>>> bdrv_truncate(bs->file) from within a read/write operation.  For example, QCOW
>>>>>>> will cause an assert, due to tracked_requests not being empty (since the
>>>>>>> read/write that called bdrv_truncate() is still in progress).
>>>>
>>>> I'm not sure such bdrv_truncate calls are necessary.  QCOW2 doesn't have
>>>> them (almost; there is one in qcow2_write_compressed, I'm not even sure
>>>> that one is necessary though), and I think QCOW's breaks using it with a
>>>> block device as a backing file.
>>>>
>>>> Paolo
>>>
>>> QCOW breaks with it using a normal raw posix file as a device.  As a
>>> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
>>> drive mounted, and try to partition and format it.  QEMU now asserts.
>>>
>>> The nicety of being able to using truncate during a write call,
>>> especially for VHDX (which can have relatively large block/cluster
>>> sizes), so to grow the file sparsely in a dynamically allocated file.
>>>
>>
>> why don't you call vhdx_truncate() there instead of bdrv_truncate()?
>>
>> Peter
>>
> 
> What I want to do is grow the underlying file, so bdrv_truncate() is
> called on bs->file.  vhdx_truncate() would primarily care about
> updating the vhdx file structures / headers to reflect the new file
> size (which is what vhdx_block_allocate() does), but I need to
> actually make the underlying file larger (and ideally, in a sparse
> fashion). 
> 

As Paolo suggested we probably need to fix the underlying misuage of
bdrv_truncate() for non growable devices and for that we additionally
need to set bs->growable not to 1 per default, but only if the devices
are really growable: qcow2, raw (only on regular file), vhdx? I maybe
missing some.

Paolo?
Jeff Cody March 6, 2013, 6:48 p.m. UTC | #8
On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
> Il 06/03/2013 19:14, Jeff Cody ha scritto:
> > QCOW breaks with it using a normal raw posix file as a device.  As a
> > test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
> > drive mounted, and try to partition and format it.  QEMU now asserts.
> > 
> > The nicety of being able to using truncate during a write call,
> > especially for VHDX (which can have relatively large block/cluster
> > sizes), so to grow the file sparsely in a dynamically allocated file.
> 
> Perhaps we need two APIs, "truncate" and "revalidate".
> 
> Truncate should be a no-op if (!bs->growable).
> 
> Revalidate could be called by the block_resize monitor command with no
> size specified.
> 
> Paolo

I think that is a good solution.  Is it better to have "truncate" and
"revalidate", or "truncate" and "grow", with grow being a subset of
truncate, with fewer restrictions?  There may still be operations
where it is OK to grow a file, but not OK to shrink it.
Peter Lieven March 6, 2013, 7:03 p.m. UTC | #9
Am 06.03.2013 19:48, schrieb Jeff Cody:
> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
>> Il 06/03/2013 19:14, Jeff Cody ha scritto:
>>> QCOW breaks with it using a normal raw posix file as a device.  As a
>>> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
>>> drive mounted, and try to partition and format it.  QEMU now asserts.
>>>
>>> The nicety of being able to using truncate during a write call,
>>> especially for VHDX (which can have relatively large block/cluster
>>> sizes), so to grow the file sparsely in a dynamically allocated file.
>>
>> Perhaps we need two APIs, "truncate" and "revalidate".
>>
>> Truncate should be a no-op if (!bs->growable).
>>
>> Revalidate could be called by the block_resize monitor command with no
>> size specified.
>>
>> Paolo
> 
> I think that is a good solution.  Is it better to have "truncate" and
> "revalidate", or "truncate" and "grow", with grow being a subset of
> truncate, with fewer restrictions?  There may still be operations
> where it is OK to grow a file, but not OK to shrink it.
> 

Or as a first step:

a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable)
b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there
c) Fix the value of bs->growable for all drivers

Peter
Paolo Bonzini March 6, 2013, 8:22 p.m. UTC | #10
Il 06/03/2013 19:32, Jeff Cody ha scritto:
> On Wed, Mar 06, 2013 at 06:50:56PM +0100, Peter Lieven wrote:
>> Looking at the source I have not found a place where bs->growable is set to 0 for any
>> block driver, maybe I miss something. Having bs->growable for iSCSI would also be ok.
>>
>> Shouldn't it be possible to call bdrv_drain_all() any time? There are other places
>> where this is called. One I have in mind is e.g. if you cancel an ongoing block migration.
> 
> That is a good point - what happens to QCOW now, if there is a block
> job in progress (e.g. block-commit, block-stream, etc...)?  I would
> imagine -EBUSY gets thrown, since bdrv_truncate() checks
> bdrv_in_use().

No, bs->file is not marked in use.  Only bs is.

Paolo
Paolo Bonzini March 6, 2013, 8:39 p.m. UTC | #11
Il 06/03/2013 20:03, Peter Lieven ha scritto:
> Am 06.03.2013 19:48, schrieb Jeff Cody:
>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
>>> Il 06/03/2013 19:14, Jeff Cody ha scritto:
>>>> QCOW breaks with it using a normal raw posix file as a device.  As a
>>>> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
>>>> drive mounted, and try to partition and format it.  QEMU now asserts.
>>>>
>>>> The nicety of being able to using truncate during a write call,
>>>> especially for VHDX (which can have relatively large block/cluster
>>>> sizes), so to grow the file sparsely in a dynamically allocated file.
>>>
>>> Perhaps we need two APIs, "truncate" and "revalidate".
>>>
>>> Truncate should be a no-op if (!bs->growable).
>>>
>>> Revalidate could be called by the block_resize monitor command with no
>>> size specified.
>>>
>>> Paolo
>>
>> I think that is a good solution.  Is it better to have "truncate" and
>> "revalidate", or "truncate" and "grow", with grow being a subset of
>> truncate, with fewer restrictions?  There may still be operations
>> where it is OK to grow a file, but not OK to shrink it.
> 
> Or as a first step:
> 
> a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable)
> b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there
> c) Fix the value of bs->growable for all drivers

Let's start from (c).  bdrv_file_open sets bs->growable = 1.  I think it
should be removed and only the file protocol should set it.

Then we can add bdrv_revalidate and, for block_resize, call
bdrv_revalidate+bdrv_truncate.  For bs->growable = 0 &&
!bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual
size is the same or bigger as the one requested, and fail otherwise.

Paolo
Kevin Wolf March 7, 2013, 8:50 a.m. UTC | #12
Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:
> Il 06/03/2013 20:03, Peter Lieven ha scritto:
> > Am 06.03.2013 19:48, schrieb Jeff Cody:
> >> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
> >>> Il 06/03/2013 19:14, Jeff Cody ha scritto:
> >>>> QCOW breaks with it using a normal raw posix file as a device.  As a
> >>>> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
> >>>> drive mounted, and try to partition and format it.  QEMU now asserts.
> >>>>
> >>>> The nicety of being able to using truncate during a write call,
> >>>> especially for VHDX (which can have relatively large block/cluster
> >>>> sizes), so to grow the file sparsely in a dynamically allocated file.
> >>>
> >>> Perhaps we need two APIs, "truncate" and "revalidate".
> >>>
> >>> Truncate should be a no-op if (!bs->growable).
> >>>
> >>> Revalidate could be called by the block_resize monitor command with no
> >>> size specified.
> >>>
> >>> Paolo
> >>
> >> I think that is a good solution.  Is it better to have "truncate" and
> >> "revalidate", or "truncate" and "grow", with grow being a subset of
> >> truncate, with fewer restrictions?  There may still be operations
> >> where it is OK to grow a file, but not OK to shrink it.

What semantics would the both operations have? Is truncate the same as
it used to be? I don't really understand what "revalidate" would do, it
sounds like a read-only operation from its name?

> > Or as a first step:
> > 
> > a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable)
> > b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there
> > c) Fix the value of bs->growable for all drivers
> 
> Let's start from (c).  bdrv_file_open sets bs->growable = 1.  I think it
> should be removed and only the file protocol should set it.

This is probably right.

> Then we can add bdrv_revalidate and, for block_resize, call
> bdrv_revalidate+bdrv_truncate.  For bs->growable = 0 &&
> !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual
> size is the same or bigger as the one requested, and fail otherwise.

This one not so much. bs->growable does not mean that you can use
bdrv_truncate. It rather means that you may write beyond the end of the
file even without truncating it first. Mabye bs->auto_grow would be a
better for it.

So bs->growable == true implies that bdrv_truncate() should be allowed
as well, because obviously changing the BDS size is possbile, but it's
not true the other way round.

Kevin
Peter Lieven March 7, 2013, 8:53 a.m. UTC | #13
On 06.03.2013 21:39, Paolo Bonzini wrote:
> Il 06/03/2013 20:03, Peter Lieven ha scritto:
>> Am 06.03.2013 19:48, schrieb Jeff Cody:
>>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
>>>> Il 06/03/2013 19:14, Jeff Cody ha scritto:
>>>>> QCOW breaks with it using a normal raw posix file as a device.  As a
>>>>> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
>>>>> drive mounted, and try to partition and format it.  QEMU now asserts.
>>>>>
>>>>> The nicety of being able to using truncate during a write call,
>>>>> especially for VHDX (which can have relatively large block/cluster
>>>>> sizes), so to grow the file sparsely in a dynamically allocated file.
>>>>
>>>> Perhaps we need two APIs, "truncate" and "revalidate".
>>>>
>>>> Truncate should be a no-op if (!bs->growable).
>>>>
>>>> Revalidate could be called by the block_resize monitor command with no
>>>> size specified.
>>>>
>>>> Paolo
>>>
>>> I think that is a good solution.  Is it better to have "truncate" and
>>> "revalidate", or "truncate" and "grow", with grow being a subset of
>>> truncate, with fewer restrictions?  There may still be operations
>>> where it is OK to grow a file, but not OK to shrink it.
>>
>> Or as a first step:
>>
>> a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable)
>> b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there
>> c) Fix the value of bs->growable for all drivers
>
> Let's start from (c).  bdrv_file_open sets bs->growable = 1.  I think it
> should be removed and only the file protocol should set it.
>
> Then we can add bdrv_revalidate and, for block_resize, call
> bdrv_revalidate+bdrv_truncate.  For bs->growable = 0 &&
> !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual
> size is the same or bigger as the one requested, and fail otherwise.
>
> Paolo
>

Regarding brd_drain_all(). Is the fix right to call it only on device shrink?
In this case it has to be added to iscsi_truncate as well.

Peter
Peter Lieven March 7, 2013, 8:56 a.m. UTC | #14
On 07.03.2013 09:50, Kevin Wolf wrote:
> Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:
>> Il 06/03/2013 20:03, Peter Lieven ha scritto:
>>> Am 06.03.2013 19:48, schrieb Jeff Cody:
>>>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
>>>>> Il 06/03/2013 19:14, Jeff Cody ha scritto:
>>>>>> QCOW breaks with it using a normal raw posix file as a device.  As a
>>>>>> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
>>>>>> drive mounted, and try to partition and format it.  QEMU now asserts.
>>>>>>
>>>>>> The nicety of being able to using truncate during a write call,
>>>>>> especially for VHDX (which can have relatively large block/cluster
>>>>>> sizes), so to grow the file sparsely in a dynamically allocated file.
>>>>>
>>>>> Perhaps we need two APIs, "truncate" and "revalidate".
>>>>>
>>>>> Truncate should be a no-op if (!bs->growable).
>>>>>
>>>>> Revalidate could be called by the block_resize monitor command with no
>>>>> size specified.
>>>>>
>>>>> Paolo
>>>>
>>>> I think that is a good solution.  Is it better to have "truncate" and
>>>> "revalidate", or "truncate" and "grow", with grow being a subset of
>>>> truncate, with fewer restrictions?  There may still be operations
>>>> where it is OK to grow a file, but not OK to shrink it.
>
> What semantics would the both operations have? Is truncate the same as
> it used to be? I don't really understand what "revalidate" would do, it
> sounds like a read-only operation from its name?
>
>>> Or as a first step:
>>>
>>> a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable)
>>> b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there
>>> c) Fix the value of bs->growable for all drivers
>>
>> Let's start from (c).  bdrv_file_open sets bs->growable = 1.  I think it
>> should be removed and only the file protocol should set it.
>
> This is probably right.

If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This
flag was introduced as a fix for this problem.

bdrv_check_byte_request() does nothing useful if bs->growable is 1.

Peter
Kevin Wolf March 7, 2013, 8:57 a.m. UTC | #15
Am 06.03.2013 um 19:27 hat Peter Lieven geschrieben:
> Am 06.03.2013 19:06, schrieb Paolo Bonzini:
> > Il 06/03/2013 18:50, Peter Lieven ha scritto:
> >>>> Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this breaks
> >>>> QCOW images, as well other future image formats (such as VHDX) that may call
> >>>> bdrv_truncate(bs->file) from within a read/write operation.  For example, QCOW
> >>>> will cause an assert, due to tracked_requests not being empty (since the
> >>>> read/write that called bdrv_truncate() is still in progress).
> > 
> > I'm not sure such bdrv_truncate calls are necessary.  QCOW2 doesn't have
> > them (almost; there is one in qcow2_write_compressed, I'm not even sure
> > that one is necessary though), and I think QCOW's breaks using it with a
> > block device as a backing file.
> 
> I think we have to check the sense of bs->growable nevertheless. It is set
> to 1 for all drivers which can't be right?!

For everything that goes through bdrv_file_open(), which are protocol
drivers, not format drivers. It is required for files so that formats
can write past EOF; for other drivers that can't actually grow their
backing storage it doesn't hurt because they will return an eror anyway
when you write to it.

"bdrv_truncate() works" and "bs->growable == true" are totally different
things, so even though it doesn't look right at the first sight,
bs->growable does its job correctly.

In your other mail you're talking about setting it for raw, qcow2, VHDX.
This would be discussing on the entirely wrong level, this isn't about
formats, but about protocols (file, host_device, nbd, iscsi, http,
vvfat...), that are below the format drivers.

Kevin
Kevin Wolf March 7, 2013, 8:59 a.m. UTC | #16
Am 07.03.2013 um 09:53 hat Peter Lieven geschrieben:
> On 06.03.2013 21:39, Paolo Bonzini wrote:
> >Il 06/03/2013 20:03, Peter Lieven ha scritto:
> >>Am 06.03.2013 19:48, schrieb Jeff Cody:
> >>>On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
> >>>>Il 06/03/2013 19:14, Jeff Cody ha scritto:
> >>>>>QCOW breaks with it using a normal raw posix file as a device.  As a
> >>>>>test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
> >>>>>drive mounted, and try to partition and format it.  QEMU now asserts.
> >>>>>
> >>>>>The nicety of being able to using truncate during a write call,
> >>>>>especially for VHDX (which can have relatively large block/cluster
> >>>>>sizes), so to grow the file sparsely in a dynamically allocated file.
> >>>>
> >>>>Perhaps we need two APIs, "truncate" and "revalidate".
> >>>>
> >>>>Truncate should be a no-op if (!bs->growable).
> >>>>
> >>>>Revalidate could be called by the block_resize monitor command with no
> >>>>size specified.
> >>>>
> >>>>Paolo
> >>>
> >>>I think that is a good solution.  Is it better to have "truncate" and
> >>>"revalidate", or "truncate" and "grow", with grow being a subset of
> >>>truncate, with fewer restrictions?  There may still be operations
> >>>where it is OK to grow a file, but not OK to shrink it.
> >>
> >>Or as a first step:
> >>
> >>a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable)
> >>b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there
> >>c) Fix the value of bs->growable for all drivers
> >
> >Let's start from (c).  bdrv_file_open sets bs->growable = 1.  I think it
> >should be removed and only the file protocol should set it.
> >
> >Then we can add bdrv_revalidate and, for block_resize, call
> >bdrv_revalidate+bdrv_truncate.  For bs->growable = 0 &&
> >!bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual
> >size is the same or bigger as the one requested, and fail otherwise.
> >
> >Paolo
> >
> 
> Regarding brd_drain_all(). Is the fix right to call it only on device shrink?
> In this case it has to be added to iscsi_truncate as well.

The real fix would bdrv_drain(bs). I hope we're not too far away from
that today, even though we're not quite there.

Kevin
Kevin Wolf March 7, 2013, 9:03 a.m. UTC | #17
Am 07.03.2013 um 09:56 hat Peter Lieven geschrieben:
> On 07.03.2013 09:50, Kevin Wolf wrote:
> >Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:
> >>Il 06/03/2013 20:03, Peter Lieven ha scritto:
> >>>Am 06.03.2013 19:48, schrieb Jeff Cody:
> >>>>On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
> >>>>>Il 06/03/2013 19:14, Jeff Cody ha scritto:
> >>>>>>QCOW breaks with it using a normal raw posix file as a device.  As a
> >>>>>>test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
> >>>>>>drive mounted, and try to partition and format it.  QEMU now asserts.
> >>>>>>
> >>>>>>The nicety of being able to using truncate during a write call,
> >>>>>>especially for VHDX (which can have relatively large block/cluster
> >>>>>>sizes), so to grow the file sparsely in a dynamically allocated file.
> >>>>>
> >>>>>Perhaps we need two APIs, "truncate" and "revalidate".
> >>>>>
> >>>>>Truncate should be a no-op if (!bs->growable).
> >>>>>
> >>>>>Revalidate could be called by the block_resize monitor command with no
> >>>>>size specified.
> >>>>>
> >>>>>Paolo
> >>>>
> >>>>I think that is a good solution.  Is it better to have "truncate" and
> >>>>"revalidate", or "truncate" and "grow", with grow being a subset of
> >>>>truncate, with fewer restrictions?  There may still be operations
> >>>>where it is OK to grow a file, but not OK to shrink it.
> >
> >What semantics would the both operations have? Is truncate the same as
> >it used to be? I don't really understand what "revalidate" would do, it
> >sounds like a read-only operation from its name?
> >
> >>>Or as a first step:
> >>>
> >>>a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable)
> >>>b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there
> >>>c) Fix the value of bs->growable for all drivers
> >>
> >>Let's start from (c).  bdrv_file_open sets bs->growable = 1.  I think it
> >>should be removed and only the file protocol should set it.
> >
> >This is probably right.
> 
> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This
> flag was introduced as a fix for this problem.
> 
> bdrv_check_byte_request() does nothing useful if bs->growable is 1.

Don't ignore the difference between bdrv_open() and bdrv_file_open().
Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
opened through bdrv_open() and has bs->growable = false. Its bs->file is
using the file protocol (raw-posix driver) and opened by
bdrv_file_open(). This one has bs->file->growable = true so that qcow2
can write to newly allocated areas without calling bdrv_truncate()
first.

Kevin
Peter Lieven March 7, 2013, 9:16 a.m. UTC | #18
Am 07.03.2013 um 10:03 schrieb Kevin Wolf <kwolf@redhat.com>:

> Am 07.03.2013 um 09:56 hat Peter Lieven geschrieben:
>> On 07.03.2013 09:50, Kevin Wolf wrote:
>>> Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:
>>>> Il 06/03/2013 20:03, Peter Lieven ha scritto:
>>>>> Am 06.03.2013 19:48, schrieb Jeff Cody:
>>>>>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
>>>>>>> Il 06/03/2013 19:14, Jeff Cody ha scritto:
>>>>>>>> QCOW breaks with it using a normal raw posix file as a device.  As a
>>>>>>>> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
>>>>>>>> drive mounted, and try to partition and format it.  QEMU now asserts.
>>>>>>>> 
>>>>>>>> The nicety of being able to using truncate during a write call,
>>>>>>>> especially for VHDX (which can have relatively large block/cluster
>>>>>>>> sizes), so to grow the file sparsely in a dynamically allocated file.
>>>>>>> 
>>>>>>> Perhaps we need two APIs, "truncate" and "revalidate".
>>>>>>> 
>>>>>>> Truncate should be a no-op if (!bs->growable).
>>>>>>> 
>>>>>>> Revalidate could be called by the block_resize monitor command with no
>>>>>>> size specified.
>>>>>>> 
>>>>>>> Paolo
>>>>>> 
>>>>>> I think that is a good solution.  Is it better to have "truncate" and
>>>>>> "revalidate", or "truncate" and "grow", with grow being a subset of
>>>>>> truncate, with fewer restrictions?  There may still be operations
>>>>>> where it is OK to grow a file, but not OK to shrink it.
>>> 
>>> What semantics would the both operations have? Is truncate the same as
>>> it used to be? I don't really understand what "revalidate" would do, it
>>> sounds like a read-only operation from its name?
>>> 
>>>>> Or as a first step:
>>>>> 
>>>>> a) Call brdv_drain_all() only if the device is shrinked (independently of !bs->growable)
>>>>> b) Call brdv_drain_all() inside iscsi_truncate() because it is a special requirement there
>>>>> c) Fix the value of bs->growable for all drivers
>>>> 
>>>> Let's start from (c).  bdrv_file_open sets bs->growable = 1.  I think it
>>>> should be removed and only the file protocol should set it.
>>> 
>>> This is probably right.
>> 
>> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This
>> flag was introduced as a fix for this problem.
>> 
>> bdrv_check_byte_request() does nothing useful if bs->growable is 1.
> 
> Don't ignore the difference between bdrv_open() and bdrv_file_open().
> Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
> opened through bdrv_open() and has bs->growable = false. Its bs->file is
> using the file protocol (raw-posix driver) and opened by
> bdrv_file_open(). This one has bs->file->growable = true so that qcow2
> can write to newly allocated areas without calling bdrv_truncate()
> first.

Sorry, I have to admin I am little confused by what is happening in bdrv_open().

However, what I can say is that bs->growable is 1 for an iSCSI backed
harddrive and I wonder how this can happen if bdrv_file_open is not used for
opening it because that is the only place where bs->growable is set to 1.

cmdline:
x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio

Peter

> 
> Kevin
Kevin Wolf March 7, 2013, 9:22 a.m. UTC | #19
Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben:
> >> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This
> >> flag was introduced as a fix for this problem.
> >> 
> >> bdrv_check_byte_request() does nothing useful if bs->growable is 1.
> > 
> > Don't ignore the difference between bdrv_open() and bdrv_file_open().
> > Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
> > opened through bdrv_open() and has bs->growable = false. Its bs->file is
> > using the file protocol (raw-posix driver) and opened by
> > bdrv_file_open(). This one has bs->file->growable = true so that qcow2
> > can write to newly allocated areas without calling bdrv_truncate()
> > first.
> 
> Sorry, I have to admin I am little confused by what is happening in bdrv_open().
> 
> However, what I can say is that bs->growable is 1 for an iSCSI backed
> harddrive and I wonder how this can happen if bdrv_file_open is not used for
> opening it because that is the only place where bs->growable is set to 1.
> 
> cmdline:
> x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio

It is used for the iscsi driver. You have a raw BDS (growable == false)
on top of an iscsi one (growable == true).

Kevin
Peter Lieven March 7, 2013, 9:25 a.m. UTC | #20
Am 07.03.2013 um 10:22 schrieb Kevin Wolf <kwolf@redhat.com>:

> Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben:
>>>> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This
>>>> flag was introduced as a fix for this problem.
>>>> 
>>>> bdrv_check_byte_request() does nothing useful if bs->growable is 1.
>>> 
>>> Don't ignore the difference between bdrv_open() and bdrv_file_open().
>>> Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
>>> opened through bdrv_open() and has bs->growable = false. Its bs->file is
>>> using the file protocol (raw-posix driver) and opened by
>>> bdrv_file_open(). This one has bs->file->growable = true so that qcow2
>>> can write to newly allocated areas without calling bdrv_truncate()
>>> first.
>> 
>> Sorry, I have to admin I am little confused by what is happening in bdrv_open().
>> 
>> However, what I can say is that bs->growable is 1 for an iSCSI backed
>> harddrive and I wonder how this can happen if bdrv_file_open is not used for
>> opening it because that is the only place where bs->growable is set to 1.
>> 
>> cmdline:
>> x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio
> 
> It is used for the iscsi driver. You have a raw BDS (growable == false)
> on top of an iscsi one (growable == true).

Ok, but growable == true is wrong for the iSCSI driver isn`t it?

Peter
Kevin Wolf March 7, 2013, 10 a.m. UTC | #21
Am 07.03.2013 um 10:25 hat Peter Lieven geschrieben:
> 
> Am 07.03.2013 um 10:22 schrieb Kevin Wolf <kwolf@redhat.com>:
> 
> > Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben:
> >>>> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This
> >>>> flag was introduced as a fix for this problem.
> >>>> 
> >>>> bdrv_check_byte_request() does nothing useful if bs->growable is 1.
> >>> 
> >>> Don't ignore the difference between bdrv_open() and bdrv_file_open().
> >>> Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
> >>> opened through bdrv_open() and has bs->growable = false. Its bs->file is
> >>> using the file protocol (raw-posix driver) and opened by
> >>> bdrv_file_open(). This one has bs->file->growable = true so that qcow2
> >>> can write to newly allocated areas without calling bdrv_truncate()
> >>> first.
> >> 
> >> Sorry, I have to admin I am little confused by what is happening in bdrv_open().
> >> 
> >> However, what I can say is that bs->growable is 1 for an iSCSI backed
> >> harddrive and I wonder how this can happen if bdrv_file_open is not used for
> >> opening it because that is the only place where bs->growable is set to 1.
> >> 
> >> cmdline:
> >> x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio
> > 
> > It is used for the iscsi driver. You have a raw BDS (growable == false)
> > on top of an iscsi one (growable == true).
> 
> Ok, but growable == true is wrong for the iSCSI driver isn`t it?

I guess it depends on your definition. If you say that growable includes
the capability of growing the image, then yes, it's wrong. If you only
interpret it as permission to write beyond EOF (if the driver supports
that), then it's right even though any such attempt will result in an
error.

Practically speaking, the difference is between -EIO returned from
bdrv_check_byte_request() and -EIO from iscsi_aio_write16_cb(), i.e.
the result is the same.

Kevin
Peter Lieven March 7, 2013, 10:22 a.m. UTC | #22
Am 07.03.2013 um 11:00 schrieb Kevin Wolf <kwolf@redhat.com>:

> Am 07.03.2013 um 10:25 hat Peter Lieven geschrieben:
>> 
>> Am 07.03.2013 um 10:22 schrieb Kevin Wolf <kwolf@redhat.com>:
>> 
>>> Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben:
>>>>>> If bs->growable is 1 for all drivers, whats the fix status of CVE-2008-0928? This
>>>>>> flag was introduced as a fix for this problem.
>>>>>> 
>>>>>> bdrv_check_byte_request() does nothing useful if bs->growable is 1.
>>>>> 
>>>>> Don't ignore the difference between bdrv_open() and bdrv_file_open().
>>>>> Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
>>>>> opened through bdrv_open() and has bs->growable = false. Its bs->file is
>>>>> using the file protocol (raw-posix driver) and opened by
>>>>> bdrv_file_open(). This one has bs->file->growable = true so that qcow2
>>>>> can write to newly allocated areas without calling bdrv_truncate()
>>>>> first.
>>>> 
>>>> Sorry, I have to admin I am little confused by what is happening in bdrv_open().
>>>> 
>>>> However, what I can say is that bs->growable is 1 for an iSCSI backed
>>>> harddrive and I wonder how this can happen if bdrv_file_open is not used for
>>>> opening it because that is the only place where bs->growable is set to 1.
>>>> 
>>>> cmdline:
>>>> x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0 -vnc :1 -boot dc -monitor stdio
>>> 
>>> It is used for the iscsi driver. You have a raw BDS (growable == false)
>>> on top of an iscsi one (growable == true).
>> 
>> Ok, but growable == true is wrong for the iSCSI driver isn`t it?
> 
> I guess it depends on your definition. If you say that growable includes
> the capability of growing the image, then yes, it's wrong. If you only
> interpret it as permission to write beyond EOF (if the driver supports
> that), then it's right even though any such attempt will result in an
> error.
> 
> Practically speaking, the difference is between -EIO returned from
> bdrv_check_byte_request() and -EIO from iscsi_aio_write16_cb(), i.e.
> the result is the same.

Yes, but there are many broken storage implementations outside which might react
freaky if there is a read/write beyond the LUN size or with negative offset.
The check_request would block such requests earlier protecting the infrastructure.
I have a case open with the vendor of a storage system where I am able to crash
the storage sending illegal requests. I personally would feel more comfortable
if illegal requests where just not possible.

Peter
Paolo Bonzini March 7, 2013, 4:09 p.m. UTC | #23
Il 07/03/2013 09:53, Peter Lieven ha scritto:
>>
>> Then we can add bdrv_revalidate and, for block_resize, call
>> bdrv_revalidate+bdrv_truncate.  For bs->growable = 0 &&
>> !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual
>> size is the same or bigger as the one requested, and fail otherwise.
>>
> 
> Regarding brd_drain_all(). Is the fix right to call it only on device
> shrink?

I think the fix is to only call it for the monitor command.  Optionally,
when shrinking, assert that there are no requests in flight.

Paolo

> In this case it has to be added to iscsi_truncate as well.
Paolo Bonzini March 7, 2013, 4:45 p.m. UTC | #24
Il 07/03/2013 09:50, Kevin Wolf ha scritto:
> Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:
>> Il 06/03/2013 20:03, Peter Lieven ha scritto:
>>> Am 06.03.2013 19:48, schrieb Jeff Cody:
>>>> On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
>>>>> Il 06/03/2013 19:14, Jeff Cody ha scritto:
>>>>>> QCOW breaks with it using a normal raw posix file as a device.  As a
>>>>>> test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
>>>>>> drive mounted, and try to partition and format it.  QEMU now asserts.
>>>>>>
>>>>>> The nicety of being able to using truncate during a write call,
>>>>>> especially for VHDX (which can have relatively large block/cluster
>>>>>> sizes), so to grow the file sparsely in a dynamically allocated file.
>>>>>
>>>>> Perhaps we need two APIs, "truncate" and "revalidate".
>>>>>
>>>>> Truncate should be a no-op if (!bs->growable).
>>>>>
>>>>> Revalidate could be called by the block_resize monitor command with no
>>>>> size specified.
>>>>
>>>> I think that is a good solution.  Is it better to have "truncate" and
>>>> "revalidate", or "truncate" and "grow", with grow being a subset of
>>>> truncate, with fewer restrictions?  There may still be operations
>>>> where it is OK to grow a file, but not OK to shrink it.
> 
> What semantics would the both operations have? Is truncate the same as
> it used to be? I don't really understand what "revalidate" would do, it
> sounds like a read-only operation from its name?

Revalidate would update the current size.  Files fetch it on all calls
to bdrv_getlength, but other backends cache it.  It would visit the BDS
->file chain from the bottom (bs->file->file->file...) to the top
implicitly, with no need for an explicit recursion in the callback (like
we do for flush_to_os).  Before starting the recursion, bdrv_revalidate
does a bdrv_drain_all, so using the current iscsi_truncate for
iscsi_revalidate would be fine.

The block_resize command will always call revalidate before doing
anything.  Then block_resize will try to do a bdrv_truncate if the
callback is there.

Another change that could make sense, is to make bdrv_truncate succeed
if there is no bdrv_truncate callback but bs is larger than the
requested size.  This would be a difference from today's

    if (!drv->bdrv_truncate)
        return -ENOTSUP;

And it could even do the same if the callback is there, but returns
ENOTSUP.  It would simplify some code, like this in raw-posix.c's
raw_truncate:

    } else if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
       if (offset > raw_getlength(bs)) {
           return -EINVAL;
       }
    } else {
        return -ENOTSUP;
    }

The "else if" branch can just go away.

The remaining question is about usage of bdrv_truncate in the formats.
One important aspect is that both QCOW and VHDX, in addition to using
bdrv_truncate to extend the file, rely on bdrv_getlength to fetch the
last _used_ byte of the file, not the available space.  I think we can
say that bdrv_getlength behaves like that iff bs->growable.  If so, QCOW
and VHDX should fail to open for write if the underlying file has
!bs->growable.  Which will never fail right now, but it will as soon as
we implement this:

>> Let's start from (c).  bdrv_file_open sets bs->growable = 1.  I think it
>> should be removed and only the file protocol should set it.
> 
> This is probably right.

Good.  More precisely, raw_open should only set it if the file is a
regular file.

Paolo
Peter Lieven March 8, 2013, 7:53 a.m. UTC | #25
On 07.03.2013 17:09, Paolo Bonzini wrote:
> Il 07/03/2013 09:53, Peter Lieven ha scritto:
>>>
>>> Then we can add bdrv_revalidate and, for block_resize, call
>>> bdrv_revalidate+bdrv_truncate.  For bs->growable = 0 &&
>>> !bs->drv->bdrv_truncate, bdrv_truncate can just check that the actual
>>> size is the same or bigger as the one requested, and fail otherwise.
>>>
>>
>> Regarding brd_drain_all(). Is the fix right to call it only on device
>> shrink?
>
> I think the fix is to only call it for the monitor command.  Optionally,
> when shrinking, assert that there are no requests in flight.

Okay.

What is the plan? just fix this or fix the whole thing (which seems to be some
work).

The suggested patch from Jeff will break iscsi_truncate as bs->growable is 1 currently.

Peter

>
> Paolo
>
>> In this case it has to be added to iscsi_truncate as well.
>
Paolo Bonzini March 8, 2013, 9:23 a.m. UTC | #26
Il 08/03/2013 08:53, Peter Lieven ha scritto:
>>
>> I think the fix is to only call it for the monitor command.  Optionally,
>> when shrinking, assert that there are no requests in flight.
> 
> Okay.
> 
> What is the plan? just fix this or fix the whole thing (which seems to
> be some
> work).
> 
> The suggested patch from Jeff will break iscsi_truncate as bs->growable
> is 1 currently.

I guess it's up to Kevin and Jeff.

Paolo
Kevin Wolf March 8, 2013, 9:35 a.m. UTC | #27
Am 08.03.2013 um 10:23 hat Paolo Bonzini geschrieben:
> Il 08/03/2013 08:53, Peter Lieven ha scritto:
> >>
> >> I think the fix is to only call it for the monitor command.  Optionally,
> >> when shrinking, assert that there are no requests in flight.
> > 
> > Okay.
> > 
> > What is the plan? just fix this or fix the whole thing (which seems to
> > be some
> > work).
> > 
> > The suggested patch from Jeff will break iscsi_truncate as bs->growable
> > is 1 currently.
> 
> I guess it's up to Kevin and Jeff.

Someone needs to send a fix for the qcow1 (and probably vmdk)
regression, otherwise I'll have to revert the patch.

Kevin
Peter Lieven March 8, 2013, 11:46 a.m. UTC | #28
Am 08.03.2013 um 10:35 schrieb Kevin Wolf <kwolf@redhat.com>:

> Am 08.03.2013 um 10:23 hat Paolo Bonzini geschrieben:
>> Il 08/03/2013 08:53, Peter Lieven ha scritto:
>>>> 
>>>> I think the fix is to only call it for the monitor command.  Optionally,
>>>> when shrinking, assert that there are no requests in flight.
>>> 
>>> Okay.
>>> 
>>> What is the plan? just fix this or fix the whole thing (which seems to
>>> be some
>>> work).
>>> 
>>> The suggested patch from Jeff will break iscsi_truncate as bs->growable
>>> is 1 currently.
>> 
>> I guess it's up to Kevin and Jeff.
> 
> Someone needs to send a fix for the qcow1 (and probably vmdk)
> regression, otherwise I'll have to revert the patch.

What about Paolos suggestion to call bdrv_drain_all() from the block_resize
command and not in bdrv_truncate? It is not a the complete solution, but it
will fix the regression. However, what happens if someone resizes a qcow2
device?

Peter

> 
> Kevin
Kevin Wolf March 8, 2013, 11:56 a.m. UTC | #29
Am 08.03.2013 um 12:46 hat Peter Lieven geschrieben:
> 
> Am 08.03.2013 um 10:35 schrieb Kevin Wolf <kwolf@redhat.com>:
> 
> > Am 08.03.2013 um 10:23 hat Paolo Bonzini geschrieben:
> >> Il 08/03/2013 08:53, Peter Lieven ha scritto:
> >>>> 
> >>>> I think the fix is to only call it for the monitor command.  Optionally,
> >>>> when shrinking, assert that there are no requests in flight.
> >>> 
> >>> Okay.
> >>> 
> >>> What is the plan? just fix this or fix the whole thing (which seems to
> >>> be some
> >>> work).
> >>> 
> >>> The suggested patch from Jeff will break iscsi_truncate as bs->growable
> >>> is 1 currently.
> >> 
> >> I guess it's up to Kevin and Jeff.
> > 
> > Someone needs to send a fix for the qcow1 (and probably vmdk)
> > regression, otherwise I'll have to revert the patch.
> 
> What about Paolos suggestion to call bdrv_drain_all() from the block_resize
> command and not in bdrv_truncate? It is not a the complete solution, but it
> will fix the regression. However, what happens if someone resizes a qcow2
> device?

I suppose you mean qcow1? It doesn't support resizing images, but even
if it did, this shouldn't be a problem: The problematic case is the call
of bdrv_drain_all() during a read/write function, which would have to
wait for itself to complete. As soon as you restrict the
bdrv_drain_all() to the monitor, waiting for in-flight I/Os should just
work.

Kevin
Peter Lieven March 9, 2013, 9:36 a.m. UTC | #30
Am 08.03.2013 12:56, schrieb Kevin Wolf:
> Am 08.03.2013 um 12:46 hat Peter Lieven geschrieben:
>>
>> Am 08.03.2013 um 10:35 schrieb Kevin Wolf <kwolf@redhat.com>:
>>
>>> Am 08.03.2013 um 10:23 hat Paolo Bonzini geschrieben:
>>>> Il 08/03/2013 08:53, Peter Lieven ha scritto:
>>>>>>
>>>>>> I think the fix is to only call it for the monitor command.  Optionally,
>>>>>> when shrinking, assert that there are no requests in flight.
>>>>>
>>>>> Okay.
>>>>>
>>>>> What is the plan? just fix this or fix the whole thing (which seems to
>>>>> be some
>>>>> work).
>>>>>
>>>>> The suggested patch from Jeff will break iscsi_truncate as bs->growable
>>>>> is 1 currently.
>>>>
>>>> I guess it's up to Kevin and Jeff.
>>>
>>> Someone needs to send a fix for the qcow1 (and probably vmdk)
>>> regression, otherwise I'll have to revert the patch.
>>
>> What about Paolos suggestion to call bdrv_drain_all() from the block_resize
>> command and not in bdrv_truncate? It is not a the complete solution, but it
>> will fix the regression. However, what happens if someone resizes a qcow2
>> device?
> 
> I suppose you mean qcow1? It doesn't support resizing images, but even
> if it did, this shouldn't be a problem: The problematic case is the call
> of bdrv_drain_all() during a read/write function, which would have to
> wait for itself to complete. As soon as you restrict the
> bdrv_drain_all() to the monitor, waiting for in-flight I/Os should just
> work.

Ok, then please ignore / revert the patch that added bdrv_drain_all() in bdrv_truncate() and
I will sent a new one that adds bdrv_drain_all() to qmp_block_resize().

Peter


> 
> Kevin
>
diff mbox

Patch

diff --git a/block.c b/block.c
index 124a9eb..dce844c 100644
--- a/block.c
+++ b/block.c
@@ -2450,8 +2450,14 @@  int bdrv_truncate(BlockDriverState *bs, int64_t offset)
     if (bdrv_in_use(bs))
         return -EBUSY;
 
-    /* There better not be any in-flight IOs when we truncate the device. */
-    bdrv_drain_all();
+    /* Don't force a drain if we are just growing the file - otherwise,
+     * using bdrv_truncate() from within a block driver in a read/write
+     * operation will cause an assert, because bdrv_drain_all() will assert if
+     * a tracked_request is still in progress. */
+    if (!bs->growable || offset < bdrv_getlength(bs)) {
+        /* There better not be any in-flight IOs when we truncate the device. */
+        bdrv_drain_all();
+    }
 
     ret = drv->bdrv_truncate(bs, offset);
     if (ret == 0) {