Message ID | 554B6EBB.1010001@redhat.com |
---|---|
State | New |
On 07.05.2015 at 15:55, Paolo Bonzini wrote:
>
>
> On 07/05/2015 15:20, Kevin Wolf wrote:
> > > Does ENOSPC over LVM (dm-linear) work at all, and who generates the
> > > ENOSPC there?
> >
> > The LVM use case is what oVirt uses, so I'm pretty sure that it works.
> > I'm not sure who generates the ENOSPC, but it's not qemu anyway. If I
> > had to guess, I'd say that the kernel block layer might just forbid
> > writing after EOF for any block device.
>
> Indeed, though it's VFS (blkdev_write_iter in fs/block_dev.c) and not
> the block layer. It looks like we need this:
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 93e46f3..e54c433 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -461,7 +461,7 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
>      }
>
>      if (offset > len || len - offset < size) {
> -        return -EIO;
> +        return -ENOSPC;
>      }

This is not right for two reasons: The first is that this is
BlockBackend code and it wouldn't even take effect for the qcow2 case
where we're writing past EOF only on the protocol layer. The second is
that -ENOSPC is only for writes and not for reads.

For the protocol level, bdrv_aligned_preadv() has code to handle reads
past EOF if bs->zero_beyond_eof is set. This is always the case, except
for qcow2, which has the snapshot VM state after EOF, so the driver is
called for that.

For writes, the driver is always called. The expectation is that beyond
EOF it resizes the image file if it can, and returns -ENOSPC if it can't.
We could change this to have a check directly in bdrv_aligned_pwrite()
and then drivers would have to advertise whether they can extend a file
beyond EOF or not so we know whether to apply the check or not
(essentially the growable flag that Max wants to add), but I'm not sure
what we would win with that.

Kevin
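The protocol-layer behaviour Kevin describes for requests past EOF can be modelled in a few lines. This is a simplified toy, not QEMU's bdrv_aligned_preadv() or its write path: `ToyImage`, `toy_read()`, `toy_write()` and `TOY_CAPACITY` are hypothetical, `zero_beyond_eof` stands in for the flag of the same name, and `growable` stands in for the flag Max wants to add. Reads past EOF are zero-filled. Writes past EOF either extend the image or fail with -ENOSPC, which is what a full LV would do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CAPACITY 4096  /* hypothetical hard limit, e.g. the size of an LV */

/* Hypothetical, simplified model of a protocol-layer image file. */
typedef struct {
    int64_t size;          /* current EOF in bytes */
    bool zero_beyond_eof;  /* reads past EOF are served as zeroes */
    bool growable;         /* writes past EOF may extend the file */
    uint8_t data[TOY_CAPACITY];
} ToyImage;

/* Read n bytes at offset. Bytes past EOF are zero-filled if zero_beyond_eof
 * is set; otherwise a real driver would handle them itself (qcow2 keeps its
 * VM state there), which this toy model does not implement. */
static int toy_read(const ToyImage *img, int64_t offset, uint8_t *buf,
                    int64_t n)
{
    if (offset < 0 || n < 0) {
        return -EINVAL;
    }
    int64_t in_file = 0;
    if (offset < img->size) {
        int64_t avail = img->size - offset;
        in_file = avail < n ? avail : n;
        memcpy(buf, img->data + offset, (size_t)in_file);
    }
    if (in_file < n) {
        if (!img->zero_beyond_eof) {
            return -ENOTSUP;  /* driver-specific path not modelled */
        }
        memset(buf + in_file, 0, (size_t)(n - in_file));
    }
    return 0;
}

/* Write n bytes at offset. Past EOF the image grows if it can; otherwise
 * the write fails with -ENOSPC. */
static int toy_write(ToyImage *img, int64_t offset, const uint8_t *buf,
                     int64_t n)
{
    if (offset < 0 || n < 0) {
        return -EINVAL;
    }
    /* Overflow-safe form of "offset + n > size": does it go past EOF? */
    if (offset > img->size || img->size - offset < n) {
        /* Past EOF: grow only if the file may grow and there is room. */
        if (!img->growable ||
            offset > TOY_CAPACITY || TOY_CAPACITY - offset < n) {
            return -ENOSPC;
        }
        if (offset > img->size) {
            /* The gap between the old EOF and offset reads as zeroes. */
            memset(img->data + img->size, 0, (size_t)(offset - img->size));
        }
        img->size = offset + n;
    }
    assert(img->size <= TOY_CAPACITY);
    memcpy(img->data + offset, buf, (size_t)n);
    return 0;
}
```

Here the fixed capacity plays the role of the underlying LV. Once the image cannot be extended any further, -ENOSPC has to come from the layer that owns the storage, not from a generic bounds check.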
On 07/05/2015 16:07, Kevin Wolf wrote:
> This is not right for two reasons: The first is that this is
> BlockBackend code

I think it would take effect for the qemu-nbd case though.

> and it wouldn't even take effect for the qcow2 case
> where we're writing past EOF only on the protocol layer. The second is
> that -ENOSPC is only for writes and not for reads.

This is right.

Reads in the kernel return 0, but in QEMU we do not want that. The code
currently returns -EIO, but perhaps -EINVAL is a better match. It also
happens to be what Linux returns for discards.

Paolo

> For the protocol level, bdrv_aligned_preadv() has code to handle reads
> past EOF if bs->zero_beyond_eof is set. This is always the case, except
> for qcow2, which has the snapshot VM state after EOF, so the driver is
> called for that.
>
> For writes, the driver is always called. The expectation is that beyond
> EOF it resizes the image file if it can, and returns -ENOSPC if it can't.
> We could change this to have a check directly in bdrv_aligned_pwrite()
> and then drivers would have to advertise whether they can extend a file
> beyond EOF or not so we know whether to apply the check or not
> (essentially the growable flag that Max wants to add), but I'm not sure
> what we would win with that.
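Paolo's point that the kernel reports a read past the end as 0 bytes, rather than as an error, can be observed with plain POSIX I/O. The sketch below uses a regular temporary file as a stand-in for a block device, because opening a device node needs privileges. For a regular file, a write past EOF would instead extend the file, and that is exactly where block devices differ and return ENOSPC. The helper name `read_past_eof_demo()` and the /tmp path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: create a 4-byte temporary file and pread() at
 * an offset well past its end. Returns pread()'s result, where 0 means
 * "end of file" rather than an error, or -1 if setup failed. */
static ssize_t read_past_eof_demo(void)
{
    char path[] = "/tmp/eofdemoXXXXXX";
    char buf[16];
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);  /* the file is removed once fd is closed */
    ssize_t r = -1;
    if (write(fd, "abcd", 4) == 4) {
        r = pread(fd, buf, sizeof buf, 4096);
        assert(r <= (ssize_t)sizeof buf);  /* never more than requested */
    }
    close(fd);
    return r;
}
```

A return value of 0 here is a short read, not a failure. An emulated disk cannot pass that through to the guest as-is, which is why QEMU needs an explicit error code for this case.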
On 07.05.2015 at 16:16, Paolo Bonzini wrote:
>
>
> On 07/05/2015 16:07, Kevin Wolf wrote:
> > This is not right for two reasons: The first is that this is
> > BlockBackend code
>
> I think it would take effect for the qemu-nbd case though.

Oh, you want to change the server code rather than the client?

Wait... Are you saying that NBD sends a (platform specific) errno value
over the network? :-/

In theory, what error code the NBD server needs to send should be
specified by the NBD protocol. Am I right to assume that it doesn't do
that?

In any case, I'm not sure whether qemu's internal error code
should change just for NBD. Producing the right error code for the
protocol is the job of nbd_co_receive_request().

> > and it wouldn't even take effect for the qcow2 case
> > where we're writing past EOF only on the protocol layer. The second is
> > that -ENOSPC is only for writes and not for reads.
>
> This is right.
>
> Reads in the kernel return 0, but in QEMU we do not want that. The code
> currently returns -EIO, but perhaps -EINVAL is a better match. It also
> happens to be what Linux returns for discards.

Perhaps it is, yes. It shouldn't make a difference for guests anyway.
(Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
most likely not what we want.)

Kevin
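Kevin's remark about werror=enospc is easiest to see as a decision table. With that policy, which QEMU documents as the default for werror, -ENOSPC pauses the VM so that an administrator can add space and resume. Other errors are reported to the guest. A bounds-check failure that returned -ENOSPC would therefore stop the VM for what is really a guest bug. The enum and function below are hypothetical and only mirror the documented werror values, not QEMU's internal code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the -drive werror= values. */
typedef enum { ON_ERR_REPORT, ON_ERR_IGNORE, ON_ERR_ENOSPC, ON_ERR_STOP } OnError;

/* What the device model does with a failed write. */
typedef enum { ACT_REPORT_TO_GUEST, ACT_IGNORE, ACT_STOP_VM } ErrorAction;

/* Decide the action for a failed write, given the policy and -errno. */
static ErrorAction write_error_action(OnError policy, int ret)
{
    assert(ret < 0);  /* only called for failed requests */
    switch (policy) {
    case ON_ERR_IGNORE:
        return ACT_IGNORE;
    case ON_ERR_STOP:
        return ACT_STOP_VM;
    case ON_ERR_ENOSPC:
        /* Pause only when the host ran out of space, so the condition
         * can be fixed and the request retried; anything else is
         * reported to the guest. */
        return ret == -ENOSPC ? ACT_STOP_VM : ACT_REPORT_TO_GUEST;
    case ON_ERR_REPORT:
    default:
        return ACT_REPORT_TO_GUEST;
    }
}
```

With the patch as first proposed, an out-of-range guest write would take the ACT_STOP_VM branch. With -EINVAL or -EIO it is reported to the guest, which is the behaviour both participants want for invalid requests.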
On 07/05/2015 16:34, Kevin Wolf wrote:
> On 07.05.2015 at 16:16, Paolo Bonzini wrote:
>>
>>
>> On 07/05/2015 16:07, Kevin Wolf wrote:
>>> This is not right for two reasons: The first is that this is
>>> BlockBackend code
>>
>> I think it would take effect for the qemu-nbd case though.
>
> Oh, you want to change the server code rather than the client?

Yes.

> Wait... Are you saying that NBD sends a (platform specific) errno value
> over the network? :-/

Yes. :/ That said, at least the error codes that Linux places in
/usr/include/asm/errno-base.h seem to be pretty much standard---at least
Windows and most Unices share them---with the exception of EAGAIN.

I'll send a patch to NBD to standardize the set of error codes that it
sends.

> In theory, what error code the NBD server needs to send should be
> specified by the NBD protocol. Am I right to assume that it doesn't do
> that?

Nope.

> In any case, I'm not sure whether qemu's internal error code
> should change just for NBD. Producing the right error code for the
> protocol is the job of nbd_co_receive_request().

Ok, so it shouldn't reach blk_check_request at all. But then, we should
aim at making blk_check_request's checks assertions.

>>> and it wouldn't even take effect for the qcow2 case
>>> where we're writing past EOF only on the protocol layer. The second is
>>> that -ENOSPC is only for writes and not for reads.
>>
>> This is right.
>>
>> Reads in the kernel return 0, but in QEMU we do not want that. The code
>> currently returns -EIO, but perhaps -EINVAL is a better match. It also
>> happens to be what Linux returns for discards.
>
> Perhaps it is, yes. It shouldn't make a difference for guests anyway.
> (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
> most likely not what we want.)

Yes, we want the check duplicated in all BlockBackend users. Most of
them already do it, see the work that Markus did last year I think.

Paolo
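Standardizing the error codes that go over the wire comes down to translating at the protocol boundary in both directions, instead of sending the host's raw errno. The sketch below assumes a small fixed set whose numbers match Linux's errno-base.h, as Paolo suggests, so Linux peers see no change. The `WIRE_*` names, the chosen set, and the decision to squash unknown errors to EINVAL are assumptions for illustration. At the time of this thread the NBD protocol did not define such a set.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical fixed wire values, numerically equal to Linux errno-base.h. */
enum {
    WIRE_OK     = 0,
    WIRE_EPERM  = 1,
    WIRE_EIO    = 5,
    WIRE_ENOMEM = 12,
    WIRE_EINVAL = 22,
    WIRE_ENOSPC = 28,
};

/* Server side: host errno (positive) -> value placed in the reply. */
static uint32_t errno_to_wire(int err)
{
    assert(err >= 0);  /* expects a positive errno, or 0 for success */
    switch (err) {
    case 0:      return WIRE_OK;
    case EPERM:  return WIRE_EPERM;
    case EIO:    return WIRE_EIO;
    case ENOMEM: return WIRE_ENOMEM;
    case ENOSPC: return WIRE_ENOSPC;
    case EINVAL:
    default:     return WIRE_EINVAL;  /* squash anything unexpected */
    }
}

/* Client side: wire value -> host errno, whatever the host's numbering. */
static int wire_to_errno(uint32_t code)
{
    switch (code) {
    case WIRE_OK:     return 0;
    case WIRE_EPERM:  return EPERM;
    case WIRE_EIO:    return EIO;
    case WIRE_ENOMEM: return ENOMEM;
    case WIRE_ENOSPC: return ENOSPC;
    case WIRE_EINVAL:
    default:          return EINVAL;
    }
}
```

Because both directions go through an explicit table, a platform-specific value such as EAGAIN never reaches the wire, and a host whose errno numbering differs from Linux still decodes replies correctly.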
On 07.05.2015 at 16:50, Paolo Bonzini wrote:
> On 07/05/2015 16:34, Kevin Wolf wrote:
> > On 07.05.2015 at 16:16, Paolo Bonzini wrote:
> >>
> >>
> >> On 07/05/2015 16:07, Kevin Wolf wrote:
> >>> This is not right for two reasons: The first is that this is
> >>> BlockBackend code
> >>
> >> I think it would take effect for the qemu-nbd case though.
> >
> > Oh, you want to change the server code rather than the client?
>
> Yes.

Actually, considering all the information in this thread, I'm inclined
to think that we should change both sides. Change qemu-nbd, because
ENOSPC might be what clients expect by analogy with Linux block devices,
even if the behaviour for accesses beyond the device size isn't
specified in the NBD protocol and the server might just do anything. As
long as the behaviour is undefined, it's nice to do what most people may
expect.

And, as the real fix, change the nbd client, because even if new
qemu-nbd versions will be nice, we shouldn't rely on undefined
behaviour. We know that old qemu-nbd servers won't produce ENOSPC and
I'm not sure what other NBD servers do.

> > Wait... Are you saying that NBD sends a (platform specific) errno value
> > over the network? :-/
>
> Yes. :/ That said, at least the error codes that Linux places in
> /usr/include/asm/errno-base.h seem to be pretty much standard---at least
> Windows and most Unices share them---with the exception of EAGAIN.
>
> I'll send a patch to NBD to standardize the set of error codes that it
> sends.

Thanks, that will be helpful in the future.

Is this the right place to look up the spec?
http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt

If so, the commands seem to be hopelessly underspecified, especially
with respect to error conditions. And where it says something about
errors, it doesn't make sense: The server is forbidden to reply to a
NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)

> > In theory, what error code the NBD server needs to send should be
> > specified by the NBD protocol. Am I right to assume that it doesn't do
> > that?
>
> Nope.
>
> > In any case, I'm not sure whether qemu's internal error code
> > should change just for NBD. Producing the right error code for the
> > protocol is the job of nbd_co_receive_request().
>
> Ok, so it shouldn't reach blk_check_request at all. But then, we should
> aim at making blk_check_request's checks assertions.

Sounds fair as a goal, but I don't think all devices have such checks
yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk)
just a while ago.

> >>> and it wouldn't even take effect for the qcow2 case
> >>> where we're writing past EOF only on the protocol layer. The second is
> >>> that -ENOSPC is only for writes and not for reads.
> >>
> >> This is right.
> >>
> >> Reads in the kernel return 0, but in QEMU we do not want that. The code
> >> currently returns -EIO, but perhaps -EINVAL is a better match. It also
> >> happens to be what Linux returns for discards.
> >
> > Perhaps it is, yes. It shouldn't make a difference for guests anyway.
> > (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
> > most likely not what we want.)
>
> Yes, we want the check duplicated in all BlockBackend users. Most of
> them already do it, see the work that Markus did last year I think.

I wouldn't call it duplicated because the action to take is different
for each device, but yes, the check belongs there.

Kevin
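Kevin's proposal to fix the client can be sketched as a pre-flight check that runs before a request is put on the wire, so the result no longer depends on what the server happens to do. This is a minimal sketch under an assumption taken from this thread rather than from the NBD protocol: writes past the end of the export report -ENOSPC, as a Linux block device would, while reads and trims report -EINVAL. `ReqType` and `client_check_request()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical request types for this sketch. */
typedef enum { REQ_READ, REQ_WRITE, REQ_TRIM } ReqType;

/* Validate a request against the export size before sending it.
 * The range test has the same shape as in blk_check_byte_request():
 * "offset > size || size - offset < len" never computes offset + len,
 * so it cannot overflow. */
static int client_check_request(int64_t export_size, ReqType type,
                                int64_t offset, int64_t len)
{
    assert(export_size >= 0);
    if (offset < 0 || len < 0) {
        return -EINVAL;
    }
    if (offset > export_size || export_size - offset < len) {
        /* Writes past the end behave like a full block device;
         * reads and trims past the end are simply invalid. */
        return type == REQ_WRITE ? -ENOSPC : -EINVAL;
    }
    return 0;
}
```

A naive `offset + len > export_size` would overflow, which is undefined behaviour for signed integers, when both values are near INT64_MAX. The subtraction form avoids that.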
On 08/05/2015 12:08, Kevin Wolf wrote:
> Actually, considering all the information in this thread, I'm inclined
> to think that we should change both sides. Change qemu-nbd, because
> ENOSPC might be what clients expect by analogy with Linux block devices,
> even if the behaviour for accesses beyond the device size isn't
> specified in the NBD protocol and the server might just do anything. As
> long as the behaviour is undefined, it's nice to do what most people may
> expect.
>
> And, as the real fix, change the nbd client, because even if new
> qemu-nbd versions will be nice, we shouldn't rely on undefined
> behaviour. We know that old qemu-nbd servers won't produce ENOSPC and
> I'm not sure what other NBD servers do.

Sounds like a plan.

> Thanks, that will be helpful in the future.
>
> Is this the right place to look up the spec?
> http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt

Yes.

> If so, the commands seem to be hopelessly underspecified, especially
> with respect to error conditions. And where it says something about
> errors, it doesn't make sense: The server is forbidden to reply to a
> NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)

So does nbd-server. O:-) Looks like you're reading the spec too
literally (which is never a bad thing).

>> Ok, so it shouldn't reach blk_check_request at all. But then, we should
>> aim at making blk_check_request's checks assertions.
>
> Sounds fair as a goal, but I don't think all devices have such checks
> yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk)
> just a while ago.

Indeed ("aim at").

Paolo
On 08.05.2015 at 12:16, Paolo Bonzini wrote:
> On 08/05/2015 12:08, Kevin Wolf wrote:
> > If so, the commands seem to be hopelessly underspecified, especially
> > with respect to error conditions. And where it says something about
> > errors, it doesn't make sense: The server is forbidden to reply to a
> > NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)
>
> So does nbd-server. O:-) Looks like you're reading the spec too
> literally (which is never a bad thing).

I don't think there is such a thing as reading a spec too literally.
Specs are meant to be read literally. If a specification is open to
interpretation, you don't need it. So I'd rather say I've found a bug
in the spec. ;-)

As you already seem to be working on the NBD mailing list, do you want
to fix this, or should I subscribe and send a patch myself?

Kevin
On 08/05/2015 12:34, Kevin Wolf wrote:
> On 08.05.2015 at 12:16, Paolo Bonzini wrote:
>> On 08/05/2015 12:08, Kevin Wolf wrote:
>>> If so, the commands seem to be hopelessly underspecified, especially
>>> with respect to error conditions. And where it says something about
>>> errors, it doesn't make sense: The server is forbidden to reply to a
>>> NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)
>>
>> So does nbd-server. O:-) Looks like you're reading the spec too
>> literally (which is never a bad thing).
>
> I don't think there is such a thing as reading a spec too literally.
> Specs are meant to be read literally. If a specification is open to
> interpretation, you don't need it. So I'd rather say I've found a bug
> in the spec. ;-)

You have. The bug is a single missing word ("successful" before
"reply"), but it is still a bug. There is another bug, in that it talks
about "outstanding" writes rather than "completed" writes.

> As you already seem to be working on the NBD mailing list, do you want
> to fix this, or should I subscribe and send a patch myself?

You've been CCed on the fix.

Paolo
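With the missing word restored, a failed NBD_CMD_FLUSH still gets a reply; it just must not be a successful one. A minimal sketch of a server building that reply, assuming the classic 16-byte simple reply layout: a 32-bit magic 0x67446698, a 32-bit error field and the client's 8-byte handle echoed back, with integers in big-endian byte order. `flush_fn`, `encode_flush_reply()` and the two example hooks are hypothetical, and returning 5 for failure assumes the errno-base.h numbering discussed earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIMPLE_REPLY_MAGIC 0x67446698u  /* NBD simple reply magic */

/* Store v big-endian (network byte order) at p. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical backend hook: returns 0 on success or a wire error code. */
typedef uint32_t (*flush_fn)(void *opaque);

/* Example hooks for this sketch; 5 is EIO in the errno-base.h numbering. */
static uint32_t flush_ok(void *opaque)   { (void)opaque; return 0; }
static uint32_t flush_fail(void *opaque) { (void)opaque; return 5; }

/* Run the flush and always build a reply: the error field is 0 only if
 * the flush succeeded. The 8-byte handle is echoed back unchanged. */
static void encode_flush_reply(uint8_t out[16], const uint8_t handle[8],
                               flush_fn flush, void *opaque)
{
    assert(flush != NULL);
    uint32_t err = flush(opaque);
    put_be32(out, SIMPLE_REPLY_MAGIC);
    put_be32(out + 4, err);
    memcpy(out + 8, handle, 8);
}
```

Always replying, with the error in the reply, lets the client match the failure to its request through the handle. If the server stayed silent instead, the client would be left waiting for a reply that never arrives.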
On 08.05.2015 12:08, Kevin Wolf wrote:
> On 07.05.2015 at 16:50, Paolo Bonzini wrote:
>> On 07/05/2015 16:34, Kevin Wolf wrote:
>>> On 07.05.2015 at 16:16, Paolo Bonzini wrote:
>>>>
>>>> On 07/05/2015 16:07, Kevin Wolf wrote:
>>>>> This is not right for two reasons: The first is that this is
>>>>> BlockBackend code
>>>> I think it would take effect for the qemu-nbd case though.
>>> Oh, you want to change the server code rather than the client?
>> Yes.
> Actually, considering all the information in this thread, I'm inclined
> to think that we should change both sides. Change qemu-nbd, because
> ENOSPC might be what clients expect by analogy with Linux block devices,
> even if the behaviour for accesses beyond the device size isn't
> specified in the NBD protocol and the server might just do anything. As
> long as the behaviour is undefined, it's nice to do what most people may
> expect.

It is practically defined by what the reference implementation does,
and that is to return EINVAL (as I said in the other thread), the
reasoning being that it's an invalid request. I concur: the client
should simply not send a request beyond the export length; doing so is
wrong. So it is the client that should catch this case and return
ENOSPC.

> And, as the real fix, change the nbd client, because even if new
> qemu-nbd versions will be nice, we shouldn't rely on undefined
> behaviour. We know that old qemu-nbd servers won't produce ENOSPC and
> I'm not sure what other NBD servers do.

They return EINVAL.

Max

>>> Wait... Are you saying that NBD sends a (platform specific) errno value
>>> over the network? :-/
>> Yes. :/ That said, at least the error codes that Linux places in
>> /usr/include/asm/errno-base.h seem to be pretty much standard---at least
>> Windows and most Unices share them---with the exception of EAGAIN.
>>
>> I'll send a patch to NBD to standardize the set of error codes that it
>> sends.
> Thanks, that will be helpful in the future.
>
> Is this the right place to look up the spec?
> http://sourceforge.net/p/nbd/code/ci/master/tree/doc/proto.txt
>
> If so, the commands seem to be hopelessly underspecified, especially
> with respect to error conditions. And where it says something about
> errors, it doesn't make sense: The server is forbidden to reply to a
> NBD_CMD_FLUSH if it failed... (qemu-nbd ignores this, obviously)
>
>>> In theory, what error code the NBD server needs to send should be
>>> specified by the NBD protocol. Am I right to assume that it doesn't do
>>> that?
>> Nope.
>>
>>> In any case, I'm not sure whether qemu's internal error code
>>> should change just for NBD. Producing the right error code for the
>>> protocol is the job of nbd_co_receive_request().
>> Ok, so it shouldn't reach blk_check_request at all. But then, we should
>> aim at making blk_check_request's checks assertions.
> Sounds fair as a goal, but I don't think all devices have such checks
> yet. We've fixed the most common devices (IDE, scsi-disk and virtio-blk)
> just a while ago.
>
>>>>> and it wouldn't even take effect for the qcow2 case
>>>>> where we're writing past EOF only on the protocol layer. The second is
>>>>> that -ENOSPC is only for writes and not for reads.
>>>> This is right.
>>>>
>>>> Reads in the kernel return 0, but in QEMU we do not want that. The code
>>>> currently returns -EIO, but perhaps -EINVAL is a better match. It also
>>>> happens to be what Linux returns for discards.
>>> Perhaps it is, yes. It shouldn't make a difference for guests anyway.
>>> (Unlike -ENOSPC for writes, which would trigger werror=enospc! That's
>>> most likely not what we want.)
>> Yes, we want the check duplicated in all BlockBackend users. Most of
>> them already do it, see the work that Markus did last year I think.
> I wouldn't call it duplicated because the action to take is different
> for each device, but yes, the check belongs there.
>
> Kevin
diff --git a/block/block-backend.c b/block/block-backend.c
index 93e46f3..e54c433 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -461,7 +461,7 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
     }
 
     if (offset > len || len - offset < size) {
-        return -EIO;
+        return -ENOSPC;
     }
 
     return 0;