[Qemu-block,v2,0/3] block: Warn about usage of growing formats over non-growable protocols

Message ID 554B6EBB.1010001@redhat.com
State New

Commit Message

Paolo Bonzini May 7, 2015, 1:55 p.m. UTC
On 07/05/2015 15:20, Kevin Wolf wrote:
> > Does ENOSPC over LVM (dm-linear) work at all, and who generates the
> > ENOSPC there?
>
> The LVM use case is what oVirt uses, so I'm pretty sure that it works.
> I'm not sure who generates the ENOSPC, but it's not qemu anyway. If I
> had to guess, I'd say that the kernel block layer might just forbid
> writing after EOF for any block device.

Indeed, though it's VFS (blkdev_write_iter in fs/block_dev.c) and not
the block layer.  It looks like we need this:


Paolo

Comments

Kevin Wolf May 7, 2015, 2:07 p.m. UTC | #1
On 07.05.2015 at 15:55, Paolo Bonzini wrote:
> 
> 
> On 07/05/2015 15:20, Kevin Wolf wrote:
> > > Does ENOSPC over LVM (dm-linear) work at all, and who generates the
> > > ENOSPC there?
> >
> > The LVM use case is what oVirt uses, so I'm pretty sure that it works.
> > I'm not sure who generates the ENOSPC, but it's not qemu anyway. If I
> > had to guess, I'd say that the kernel block layer might just forbid
> > writing after EOF for any block device.
> 
> Indeed, though it's VFS (blkdev_write_iter in fs/block_dev.c) and not
> the block layer.  It looks like we need this:
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 93e46f3..e54c433 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -461,7 +461,7 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
>      }
>  
>      if (offset > len || len - offset < size) {
> -        return -EIO;
> +        return -ENOSPC;
>      }

This is not right for two reasons: The first is that this is
BlockBackend code and it wouldn't even take effect for the qcow2 case
where we're writing past EOF only on the protocol layer. The second is
that -ENOSPC is only for writes and not for reads.

For the protocol level, bdrv_aligned_preadv() has code to handle reads
past EOF if bs->zero_beyond_eof is set. This is always the case, except
for qcow2, which has the snapshot VM state after EOF, so the driver is
called for that.
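
The zero_beyond_eof behaviour described above can be illustrated with a
minimal sketch. This is not QEMU's bdrv_aligned_preadv(); the function
name sketch_read_zero_beyond_eof() and the in-memory "file" are
assumptions made purely for illustration. It copies whatever part of the
request lies inside the file and zero-fills the rest, without ever
calling a driver for the part past EOF.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not a QEMU function: read `bytes` bytes at `offset`
 * from an in-memory "file" of `file_len` bytes into `buf`. Any part of the
 * request that lies past EOF is returned as zeros.
 */
static void sketch_read_zero_beyond_eof(const uint8_t *file, int64_t file_len,
                                        int64_t offset, uint8_t *buf,
                                        int64_t bytes)
{
    assert(file_len >= 0 && offset >= 0 && bytes >= 0);

    int64_t in_file = 0;    /* number of requested bytes that exist on disk */

    if (offset < file_len) {
        in_file = file_len - offset;
        if (in_file > bytes) {
            in_file = bytes;
        }
        memcpy(buf, file + offset, (size_t)in_file);
    }

    /* Everything after the last real byte reads back as zeros. */
    memset(buf + in_file, 0, (size_t)(bytes - in_file));
}
```

A read that starts inside the file and ends past EOF therefore succeeds
in full; only the tail of the buffer is synthesized.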

For writes, the driver is always called. The expectation is that beyond
EOF it resizes the image file if it can, and returns -ENOSPC if it can't.
We could change this to have a check directly in bdrv_aligned_pwrite()
and then drivers would have to advertise whether they can extend a file
beyond EOF or not so we know whether to apply the check or not
(essentially the growable flag that Max wants to add), but I'm not sure
what we would win with that.
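
To make the alternative concrete, here is a rough sketch of what such a
generic check could look like. Everything in it is hypothetical: the
SketchDriver struct, its growable field and sketch_check_write() do not
exist in QEMU, and the flag only stands in for whatever a driver would
advertise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a protocol driver's state; not a QEMU struct. */
typedef struct SketchDriver {
    int64_t length;     /* current size of the image file in bytes */
    bool growable;      /* hypothetical: can the driver extend past EOF? */
} SketchDriver;

/*
 * Returns 0 if a write of `bytes` at `offset` may be handed to the driver,
 * or -ENOSPC if it would end past EOF on a driver that cannot grow.
 */
static int sketch_check_write(const SketchDriver *drv, int64_t offset,
                              int64_t bytes)
{
    assert(offset >= 0 && bytes >= 0 && drv->length >= 0);

    if (drv->growable) {
        return 0;       /* the driver resizes the file itself */
    }

    /* Phrased as a subtraction so that offset + bytes cannot overflow. */
    if (offset > drv->length || drv->length - offset < bytes) {
        return -ENOSPC;
    }

    return 0;
}
```

With this shape, a fixed-size device such as an LVM volume would get
-ENOSPC from the generic code, while a regular file would still reach
the driver and be extended there.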

Kevin
Paolo Bonzini May 7, 2015, 2:16 p.m. UTC | #2
On 07/05/2015 16:07, Kevin Wolf wrote:
> This is not right for two reasons: The first is that this is
> BlockBackend code

I think it would take effect for the qemu-nbd case though.

> and it wouldn't even take effect for the qcow2 case
> where we're writing past EOF only on the protocol layer. The second is
> that -ENOSPC is only for writes and not for reads.

This is right.

Reads in the kernel return 0, but in QEMU we do not want that.  The code
currently returns -EIO, but perhaps -EINVAL is a better match.  It also
happens to be what Linux returns for discards.

Paolo

> For the protocol level, bdrv_aligned_preadv() has code to handle reads
> past EOF if bs->zero_beyond_eof is set. This is always the case, except
> for qcow2, which has the snapshot VM state after EOF, so the driver is
> called for that.
> 
> For writes, the driver is always called. The expectation is that beyond
> EOF it resizes the image file if it can, and returns -ENOSPC if it can't.
> We could change this to have a check directly in bdrv_aligned_pwrite()
> and then drivers would have to advertise whether they can extend a file
> beyond EOF or not so we know whether to apply the check or not
> (essentially the growable flag that Max wants to add), but I'm not sure
> what we would win with that.
Kevin Wolf May 7, 2015, 2:34 p.m. UTC | #3
On 07.05.2015 at 16:16, Paolo Bonzini wrote:
> 
> 
> On 07/05/2015 16:07, Kevin Wolf wrote:
> > This is not right for two reasons: The first is that this is
> > BlockBackend code
> 
> I think it would take effect for the qemu-nbd case though.

Oh, you want to change the server code rather than the client?

Wait... Are you saying that NBD sends a (platform specific) errno value
over the network? :-/

In theory, what error code the NBD server needs to send should be
specified by the NBD protocol. Am I right to assume that it doesn't do
that? In any case, I'm not sure whether qemu's internal error code
should change just for NBD. Producing the right error code for the
protocol is the job of nbd_co_receive_request().

> > and it wouldn't even take effect for the qcow2 case
> > where we're writing past EOF only on the protocol layer. The second is
> > that -ENOSPC is only for writes and not for reads.
> 
> This is right.
> 
> Reads in the kernel return 0, but in QEMU we do not want that.  The code
> currently returns -EIO, but perhaps -EINVAL is a better match.  It also
> happens to be what Linux returns for discards.

Perhaps it is, yes. It shouldn't make a difference for guests anyway.
(Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
most likely not what we want.)

Kevin
Paolo Bonzini May 7, 2015, 2:50 p.m. UTC | #4
On 07/05/2015 16:34, Kevin Wolf wrote:
> On 07.05.2015 at 16:16, Paolo Bonzini wrote:
>>
>>
>> On 07/05/2015 16:07, Kevin Wolf wrote:
>>> This is not right for two reasons: The first is that this is
>>> BlockBackend code
>>
>> I think it would take effect for the qemu-nbd case though.
> 
> Oh, you want to change the server code rather than the client?

Yes.

> Wait... Are you saying that NBD sends a (platform specific) errno value
> over the network? :-/

Yes. :/  That said, at least the error codes that Linux places in
/usr/include/asm/errno-base.h seem to be pretty much standard---at least
Windows and most Unices share them---with the exception of EAGAIN.

I'll send a patch to NBD to standardize the set of error codes that it
sends.
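
As a rough illustration of what standardizing the codes could mean, the
sketch below translates between host errno values and a small fixed set
of wire values. The WIRE_* names, both function names, and the choice of
EINVAL as the fallback are assumptions for this example only; the
numeric values are simply the Linux ones from errno-base.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical fixed wire values. They equal the Linux numbers from
 * errno-base.h, so a Linux peer would see no change.
 */
enum {
    WIRE_OK     = 0,
    WIRE_EPERM  = 1,
    WIRE_EIO    = 5,
    WIRE_ENOMEM = 12,
    WIRE_EINVAL = 22,
    WIRE_ENOSPC = 28,
};

static_assert(WIRE_ENOSPC == 28, "wire values must not follow the host");

/* Server side: map the host's errno to a platform-independent wire value. */
static uint32_t sketch_errno_to_wire(int host_errno)
{
    switch (host_errno) {
    case 0:      return WIRE_OK;
    case EPERM:  return WIRE_EPERM;
    case EIO:    return WIRE_EIO;
    case ENOMEM: return WIRE_ENOMEM;
    case ENOSPC: return WIRE_ENOSPC;
    case EINVAL:
    default:     return WIRE_EINVAL;   /* assumed fallback for anything else */
    }
}

/* Client side: map a received wire value back to the local errno. */
static int sketch_wire_to_errno(uint32_t wire)
{
    switch (wire) {
    case WIRE_OK:     return 0;
    case WIRE_EPERM:  return EPERM;
    case WIRE_EIO:    return EIO;
    case WIRE_ENOMEM: return ENOMEM;
    case WIRE_ENOSPC: return ENOSPC;
    case WIRE_EINVAL:
    default:          return EINVAL;   /* unknown values degrade to EINVAL */
    }
}
```

Codes that differ between platforms, such as EAGAIN, never reach the
wire in this scheme; they collapse to the fallback on the sending side.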

> In theory, what error code the NBD server needs to send should be
> specified by the NBD protocol. Am I right to assume that it doesn't do
> that?

Nope.

> In any case, I'm not sure whether qemu's internal error code
> should change just for NBD. Producing the right error code for the
> protocol is the job of nbd_co_receive_request().

Ok, so it shouldn't reach blk_check_request at all.  But then, we should
aim at making blk_check_request's checks assertions.

>>> and it wouldn't even take effect for the qcow2 case
>>> where we're writing past EOF only on the protocol layer. The second is
>>> that -ENOSPC is only for writes and not for reads.
>>
>> This is right.
>>
>> Reads in the kernel return 0, but in QEMU we do not want that.  The code
>> currently returns -EIO, but perhaps -EINVAL is a better match.  It also
>> happens to be what Linux returns for discards.
> 
> Perhaps it is, yes. It shouldn't make a difference for guests anyway.
> (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
> most likely not what we want.)

Yes, we want the check duplicated in all BlockBackend users.  Most of
them already do it; see the work that Markus did last year, I think.

Paolo
Kevin Wolf May 8, 2015, 10:08 a.m. UTC | #5
On 07.05.2015 at 16:50, Paolo Bonzini wrote:
> On 07/05/2015 16:34, Kevin Wolf wrote:
> > On 07.05.2015 at 16:16, Paolo Bonzini wrote:
> >>
> >>
> >> On 07/05/2015 16:07, Kevin Wolf wrote:
> >>> This is not right for two reasons: The first is that this is
> >>> BlockBackend code
> >>
> >> I think it would take effect for the qemu-nbd case though.
> > 
> > Oh, you want to change the server code rather than the client?
> 
> Yes.

Actually, considering all the information in this thread, I'm inclined
to think that we should change both sides. qemu-nbd because ENOSPC might be what
clients expect by analogy with Linux block devices, even if the
behaviour for accesses beyond the device size isn't specified in the NBD
protocol and the server might just do anything. As long as the behaviour
is undefined, it's nice to do what most people may expect.

And as the real fix, change the NBD client, because even if new qemu-nbd
versions will be nice, we shouldn't rely on undefined behaviour. We know
that old qemu-nbd servers won't produce ENOSPC and I'm not sure what
other NBD servers do.

> > Wait... Are you saying that NBD sends a (platform specific) errno value
> > over the network? :-/
> 
> Yes. :/  That said, at least the error codes that Linux places in
> /usr/include/asm/errno-base.h seem to be pretty much standard---at least
> Windows and most Unices share them---with the exception of EAGAIN.
> 
> I'll send a patch to NBD to standardize the set of error codes that it
> sends.

Thanks, that will be helpful in the future.

Is this the right place to look up the spec?
http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt

If so, the commands seem to be hopelessly underspecified, especially
with respect to error conditions. And where it says something about
errors, it doesn't make sense: The server is forbidden to reply to a
NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)

> > In theory, what error code the NBD server needs to send should be
> > specified by the NBD protocol. Am I right to assume that it doesn't do
> > that?
> 
> Nope.
> 
> > In any case, I'm not sure whether qemu's internal error code
> > should change just for NBD. Producing the right error code for the
> > protocol is the job of nbd_co_receive_request().
> 
> Ok, so it shouldn't reach blk_check_request at all.  But then, we should
> aim at making blk_check_request's checks assertions.

Sounds fair as a goal, but I don't think all devices have such checks
yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk)
just a while ago.

> >>> and it wouldn't even take effect for the qcow2 case
> >>> where we're writing past EOF only on the protocol layer. The second is
> >>> that -ENOSPC is only for writes and not for reads.
> >>
> >> This is right.
> >>
> >> Reads in the kernel return 0, but in QEMU we do not want that.  The code
> >> currently returns -EIO, but perhaps -EINVAL is a better match.  It also
> >> happens to be what Linux returns for discards.
> > 
> > Perhaps it is, yes. It shouldn't make a difference for guests anyway.
> > (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
> > most likely not what we want.)
> 
> Yes, we want the check duplicated in all BlockBackend users.  Most of
> them already do it, see the work that Markus did last year I think.

I wouldn't call it duplicated because the action to take is different
for each device, but yes, the check belongs there.

Kevin
Paolo Bonzini May 8, 2015, 10:16 a.m. UTC | #6
On 08/05/2015 12:08, Kevin Wolf wrote:
> Actually, considering all the information in this thread, I'm inclined
> to think that we should change both sides. qemu-nbd because ENOSPC might be what
> clients expect by analogy with Linux block devices, even if the
> behaviour for accesses beyond the device size isn't specified in the NBD
> protocol and the server might just do anything. As long as the behaviour
> is undefined, it's nice to do what most people may expect.
> 
> And as the real fix, change the NBD client, because even if new qemu-nbd
> versions will be nice, we shouldn't rely on undefined behaviour. We know
> that old qemu-nbd servers won't produce ENOSPC and I'm not sure what
> other NBD servers do.

Sounds like a plan.

> Thanks, that will be helpful in the future.
> 
> Is this the right place to look up the spec?
> http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt

Yes.

> If so, the commands seem to be hopelessly underspecified, especially
> with respect to error conditions. And where it says something about
> errors, it doesn't make sense: The server is forbidden to reply to a
> NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)

So does nbd-server. O:-)  Looks like you're reading the spec too
literally (which is never a bad thing).

>> Ok, so it shouldn't reach blk_check_request at all.  But then, we should
>> aim at making blk_check_request's checks assertions.
> 
> Sounds fair as a goal, but I don't think all devices have such checks
> yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk)
> just a while ago.

Indeed ("aim at").

Paolo
Kevin Wolf May 8, 2015, 10:34 a.m. UTC | #7
On 08.05.2015 at 12:16, Paolo Bonzini wrote:
> On 08/05/2015 12:08, Kevin Wolf wrote:
> > If so, the commands seem to be hopelessly underspecified, especially
> > with respect to error conditions. And where it says something about
> > errors, it doesn't make sense: The server is forbidden to reply to a
> > NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)
> 
> So does nbd-server. O:-)  Looks like you're reading the spec too
> literally (which is never a bad thing).

I don't think there is such a thing as reading a spec too literally.
Specs are meant to be read literally. If a specification is open to
interpretation, you don't need it. So I'd rather say I've found a bug
in the spec. ;-)

As you already seem to be working on the NBD mailing list, do you want
to fix this, or should I subscribe and send a patch myself?

Kevin
Paolo Bonzini May 8, 2015, 11 a.m. UTC | #8
On 08/05/2015 12:34, Kevin Wolf wrote:
> On 08.05.2015 at 12:16, Paolo Bonzini wrote:
>> On 08/05/2015 12:08, Kevin Wolf wrote:
>>> If so, the commands seem to be hopelessly underspecified, especially
>>> with respect to error conditions. And where it says something about
>>> errors, it doesn't make sense: The server is forbidden to reply to a
>>> NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)
>>
>> So does nbd-server. O:-)  Looks like you're reading the spec too
>> literally (which is never a bad thing).
> 
> I don't think there is such a thing as reading a spec too literally.
> Specs are meant to be read literally. If a specification is open to
> interpretation, you don't need it. So I'd rather say I've found a bug
> in the spec. ;-)

You have.  The bug is a single missing word ("successful" reply), but it
is still a bug.

There is another bug, in that it talks about "outstanding" writes rather
than "completed" writes.

> As you already seem to be working on the NBD mailing list, do you want
> to fix this, or should I subscribe and send a patch myself?

You've been CCed on the fix.

Paolo
Max Reitz May 8, 2015, 12:58 p.m. UTC | #9
On 08.05.2015 12:08, Kevin Wolf wrote:
> On 07.05.2015 at 16:50, Paolo Bonzini wrote:
>> On 07/05/2015 16:34, Kevin Wolf wrote:
>>> On 07.05.2015 at 16:16, Paolo Bonzini wrote:
>>>>
>>>> On 07/05/2015 16:07, Kevin Wolf wrote:
>>>>> This is not right for two reasons: The first is that this is
>>>>> BlockBackend code
>>>> I think it would take effect for the qemu-nbd case though.
>>> Oh, you want to change the server code rather than the client?
>> Yes.
> Actually, considering all the information in this thread, I'm inclined
> to think that we should change both sides. qemu-nbd because ENOSPC might be what
> clients expect by analogy with Linux block devices, even if the
> behaviour for accesses beyond the device size isn't specified in the NBD
> protocol and the server might just do anything. As long as the behaviour
> is undefined, it's nice to do what most people may expect.

It is practically defined by what the reference implementation does, and 
that is return EINVAL (as I said in the other thread), with the 
reasoning being that it's an invalid request. I concur, the client 
should simply not send a request beyond the export length, doing so is 
wrong. So it is the client that should catch this case and return ENOSPC.

> And as the real fix, change the NBD client, because even if new qemu-nbd
> versions will be nice, we shouldn't rely on undefined behaviour. We know
> that old qemu-nbd servers won't produce ENOSPC and I'm not sure what
> other NBD servers do.

Return EINVAL.

Max

>>> Wait... Are you saying that NBD sends a (platform specific) errno value
>>> over the network? :-/
>> Yes. :/  That said, at least the error codes that Linux places in
>> /usr/include/asm/errno-base.h seem to be pretty much standard---at least
>> Windows and most Unices share them---with the exception of EAGAIN.
>>
>> I'll send a patch to NBD to standardize the set of error codes that it
>> sends.
> Thanks, that will be helpful in the future.
>
> Is this the right place to look up the spec?
> http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt
>
> If so, the commands seem to be hopelessly underspecified, especially
> with respect to error conditions. And where it says something about
> errors, it doesn't make sense: The server is forbidden to reply to a
> NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)
>
>>> In theory, what error code the NBD server needs to send should be
>>> specified by the NBD protocol. Am I right to assume that it doesn't do
>>> that?
>> Nope.
>>
>>> In any case, I'm not sure whether qemu's internal error code
>>> should change just for NBD. Producing the right error code for the
>>> protocol is the job of nbd_co_receive_request().
>> Ok, so it shouldn't reach blk_check_request at all.  But then, we should
>> aim at making blk_check_request's checks assertions.
> Sounds fair as a goal, but I don't think all devices have such checks
> yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk)
> just a while ago.
>
>>>>> and it wouldn't even take effect for the qcow2 case
>>>>> where we're writing past EOF only on the protocol layer. The second is
>>>>> that -ENOSPC is only for writes and not for reads.
>>>> This is right.
>>>>
>>>> Reads in the kernel return 0, but in QEMU we do not want that.  The code
>>>> currently returns -EIO, but perhaps -EINVAL is a better match.  It also
>>>> happens to be what Linux returns for discards.
>>> Perhaps it is, yes. It shouldn't make a difference for guests anyway.
>>> (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
>>> most likely not what we want.)
>> Yes, we want the check duplicated in all BlockBackend users.  Most of
>> them already do it, see the work that Markus did last year I think.
> I wouldn't call it duplicated because the action to take is different
> for each device, but yes, the check belongs there.
>
> Kevin

Patch

diff --git a/block/block-backend.c b/block/block-backend.c
index 93e46f3..e54c433 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -461,7 +461,7 @@  static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
     }
 
     if (offset > len || len - offset < size) {
-        return -EIO;
+        return -ENOSPC;
     }
 
     return 0;