Message ID | 20210317151734.41656-1-kwolf@redhat.com |
---|---|
State | New |
Headers | show |
Series | [RFC] curl: Allow reading after EOF | expand |
On 3/17/21 10:17 AM, Kevin Wolf wrote: > This makes the curl driver more consistent with file-posix in that it > doesn't return errors any more for reading after the end of the remote > file. Instead, zeros are returned for these areas. > > This inconsistency was reported in: > https://bugzilla.redhat.com/show_bug.cgi?id=1935061 > > Note that the image used in this bug report has a corrupted snapshot > table, which means that the qcow2 driver tries to do a zero-length read > after EOF on its image file. > > The old behaviour of the curl driver can hardly be called a bug, but the > inconsistency turned out to be confusing. > > Reported-by: Alice Frosi <afrosi@redhat.com> > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > > It is not entirely clear to me if this is something we want to do. If we > do care about consistency between protocol drivers, something like this > should probably be done in block/io.c eventually - but that would > require converting bs->total_sectors to byte granularity first. Something that's been (low priority) on my todo list for a while. NBD has the same problem. > > Any opinions on what the most desirable semantics would be and whether > we should patch individual drivers until we can have a generic solution? In nbdkit, we took the following approach in the 'truncate' driver: If presented with an image that is not a multiple of the desired block size, we round the image size up (corner cases for images with sizes near 2^63 where rounding would wrap to negative; and since qemu enforces a max image size at 2^63-2^32 to avoid 32-bit operations ever overflowing). Reads of the virtual tail come back as zero, writes to the virtual tail are allowed if they would write zero into the tail, and fail with ENOSPC otherwise. Doing that in the block layer makes more sense than doing it per-driver. Thus, I'm not sure if I'm a fan of this patch. > > block/curl.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/block/curl.c b/block/curl.c > index 50e741a0d7..a8d87a1813 100644 > --- a/block/curl.c > +++ b/block/curl.c > @@ -898,6 +898,7 @@ out: > static int coroutine_fn curl_co_preadv(BlockDriverState *bs, > uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags) > { > + BDRVCURLState *s = bs->opaque; > CURLAIOCB acb = { > .co = qemu_coroutine_self(), > .ret = -EINPROGRESS, > @@ -906,6 +907,15 @@ static int coroutine_fn curl_co_preadv(BlockDriverState *bs, > .bytes = bytes > }; > > + if (offset > s->len || bytes > s->len - offset) { > + uint64_t req_bytes = offset > s->len ? 0 : s->len - offset; > + qemu_iovec_memset(qiov, req_bytes, 0, bytes - req_bytes); > + bytes = req_bytes; > + } > + if (bytes == 0) { > + return 0; > + } > + > curl_setup_preadv(bs, &acb); > while (acb.ret == -EINPROGRESS) { > qemu_coroutine_yield(); >
On 3/17/21 10:32 AM, Eric Blake wrote: > On 3/17/21 10:17 AM, Kevin Wolf wrote: >> This makes the curl driver more consistent with file-posix in that it >> doesn't return errors any more for reading after the end of the remote >> file. Instead, zeros are returned for these areas. >> >> This inconsistency was reported in: >> https://bugzilla.redhat.com/show_bug.cgi?id=1935061 >> >> Note that the image used in this bug report has a corrupted snapshot >> table, which means that the qcow2 driver tries to do a zero-length read >> after EOF on its image file. >> >> The old behaviour of the curl driver can hardly be called a bug, but the >> inconsistency turned out to be confusing. >> >> Reported-by: Alice Frosi <afrosi@redhat.com> >> Signed-off-by: Kevin Wolf <kwolf@redhat.com> >> --- >> >> It is not entirely clear to me if this is something we want to do. If we >> do care about consistency between protocol drivers, something like this >> should probably be done in block/io.c eventually - but that would >> require converting bs->total_sectors to byte granularity first. > > Something that's been (low priority) on my todo list for a while. NBD > has the same problem. Actually, NBD has already been patched to fuzz around the lack of byte-accurateness in the block layer; see commit 9cf638508. So doing something similar in the curl driver as a workaround until the block layer does it for everyone is tolerable, but does not scale. > >> >> Any opinions on what the most desirable semantics would be and whether >> we should patch individual drivers until we can have a generic solution? > > In nbdkit, we took the following approach in the 'truncate' driver: > > If presented with an image that is not a multiple of the desired block > size, we round the image size up (corner cases for images with sizes > near 2^63 where rounding would wrap to negative; and since qemu enforces > a max image size at 2^63-2^32 to avoid 32-bit operations ever > overflowing). Reads of the virtual tail come back as zero, writes to > the virtual tail are allowed if they would write zero into the tail, and > fail with ENOSPC otherwise. The current code in block/nbd.c does this for reads, but fails on EIO without regards to the content of what is being attempted to write into that tail. I like the nbdkit behavior better. > > Doing that in the block layer makes more sense than doing it per-driver. > > Thus, I'm not sure if I'm a fan of this patch. > >> >> block/curl.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/block/curl.c b/block/curl.c >> index 50e741a0d7..a8d87a1813 100644 >> --- a/block/curl.c >> +++ b/block/curl.c >> @@ -898,6 +898,7 @@ out: >> static int coroutine_fn curl_co_preadv(BlockDriverState *bs, >> uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags) >> { >> + BDRVCURLState *s = bs->opaque; >> CURLAIOCB acb = { >> .co = qemu_coroutine_self(), >> .ret = -EINPROGRESS, >> @@ -906,6 +907,15 @@ static int coroutine_fn curl_co_preadv(BlockDriverState *bs, >> .bytes = bytes >> }; >> >> + if (offset > s->len || bytes > s->len - offset) { >> + uint64_t req_bytes = offset > s->len ? 0 : s->len - offset; >> + qemu_iovec_memset(qiov, req_bytes, 0, bytes - req_bytes); >> + bytes = req_bytes; In nbd.c, I also have: if (offset >= client->info.size) { assert(bytes < BDRV_SECTOR_SIZE); if (offset + bytes > client->info.size) { assert(slop < BDRV_SECTOR_SIZE); With those assertions added, I can give it Reviewed-by: Eric Blake <eblake@redhat.com> >> + } >> + if (bytes == 0) { >> + return 0; >> + } >> + >> curl_setup_preadv(bs, &acb); >> while (acb.ret == -EINPROGRESS) { >> qemu_coroutine_yield(); >> >
On Wed, Mar 17, 2021 at 04:17:34PM +0100, Kevin Wolf wrote: > This makes the curl driver more consistent with file-posix in that it > doesn't return errors any more for reading after the end of the remote > file. Instead, zeros are returned for these areas. > > This inconsistency was reported in: > https://bugzilla.redhat.com/show_bug.cgi?id=1935061 > > Note that the image used in this bug report has a corrupted snapshot > table, which means that the qcow2 driver tries to do a zero-length read > after EOF on its image file. > > The old behaviour of the curl driver can hardly be called a bug, but the > inconsistency turned out to be confusing. > > Reported-by: Alice Frosi <afrosi@redhat.com> > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > > It is not entirely clear to me if this is something we want to do. If we > do care about consistency between protocol drivers, something like this > should probably be done in block/io.c eventually - but that would > require converting bs->total_sectors to byte granularity first. > > Any opinions on what the most desirable semantics would be and whether > we should patch individual drivers until we can have a generic solution? What valid scenarios are there for wanting to read beyond the bounds of the protocol driver storage ? Why was file-posix allowing this so far ? If I've given file-posix a 10 GB plain file or device and something requests a read from the 11 GB offset, IMHO, that is a sign of serious error somewhere and possible impending doom. For writable storage, I would think that read + write should be symmetric, by which I mean if a read() at a particular offset succeeds, then I would also expect a write() at the same offset to succeed, and have its data later returned by a read(). We generally can't write at an offset beyond the storage (unless we are intending to auto-enlarge a plain file), so I think we shouldn't allow reads either. > > diff --git a/block/curl.c b/block/curl.c > index 50e741a0d7..a8d87a1813 100644 > --- a/block/curl.c > +++ b/block/curl.c > @@ -898,6 +898,7 @@ out: > static int coroutine_fn curl_co_preadv(BlockDriverState *bs, > uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags) > { > + BDRVCURLState *s = bs->opaque; > CURLAIOCB acb = { > .co = qemu_coroutine_self(), > .ret = -EINPROGRESS, > @@ -906,6 +907,15 @@ static int coroutine_fn curl_co_preadv(BlockDriverState *bs, > .bytes = bytes > }; > > + if (offset > s->len || bytes > s->len - offset) { > + uint64_t req_bytes = offset > s->len ? 0 : s->len - offset; > + qemu_iovec_memset(qiov, req_bytes, 0, bytes - req_bytes); > + bytes = req_bytes; > + } > + if (bytes == 0) { > + return 0; > + } > + > curl_setup_preadv(bs, &acb); > while (acb.ret == -EINPROGRESS) { > qemu_coroutine_yield(); > -- > 2.30.2 > > Regards, Daniel
Am 17.03.2021 um 16:46 hat Eric Blake geschrieben: > On 3/17/21 10:32 AM, Eric Blake wrote: > > On 3/17/21 10:17 AM, Kevin Wolf wrote: > >> This makes the curl driver more consistent with file-posix in that it > >> doesn't return errors any more for reading after the end of the remote > >> file. Instead, zeros are returned for these areas. > >> > >> This inconsistency was reported in: > >> https://bugzilla.redhat.com/show_bug.cgi?id=1935061 > >> > >> Note that the image used in this bug report has a corrupted snapshot > >> table, which means that the qcow2 driver tries to do a zero-length read > >> after EOF on its image file. > >> > >> The old behaviour of the curl driver can hardly be called a bug, but the > >> inconsistency turned out to be confusing. > >> > >> Reported-by: Alice Frosi <afrosi@redhat.com> > >> Signed-off-by: Kevin Wolf <kwolf@redhat.com> > >> --- > >> > >> It is not entirely clear to me if this is something we want to do. If we > >> do care about consistency between protocol drivers, something like this > >> should probably be done in block/io.c eventually - but that would > >> require converting bs->total_sectors to byte granularity first. > > > > Something that's been (low priority) on my todo list for a while. NBD > > has the same problem. > > Actually, NBD has already been patched to fuzz around the lack of > byte-accurateness in the block layer; see commit 9cf638508. So doing > something similar in the curl driver as a workaround until the block > layer does it for everyone is tolerable, but does not scale. Right, duplicating the same code in every protocol driver wouldn't be great. > >> Any opinions on what the most desirable semantics would be and whether > >> we should patch individual drivers until we can have a generic solution? > > > > In nbdkit, we took the following approach in the 'truncate' driver: > > > > If presented with an image that is not a multiple of the desired block > > size, we round the image size up (corner cases for images with sizes > > near 2^63 where rounding would wrap to negative; and since qemu enforces > > a max image size at 2^63-2^32 to avoid 32-bit operations ever > > overflowing). Reads of the virtual tail come back as zero, writes to > > the virtual tail are allowed if they would write zero into the tail, and > > fail with ENOSPC otherwise. > > The current code in block/nbd.c does this for reads, but fails on EIO > without regards to the content of what is being attempted to write into > that tail. I like the nbdkit behavior better. I don't, honestly. Making the success of a write depend on the content written is just too surprising. Is there a real use case behind this that would justify it? I think the file-posix behaviour (which is really just the kernel behaviour) is the most sane: On regular files, which can be extended, extend the image file on writes. On block devices, return -ENOSPC. > > Doing that in the block layer makes more sense than doing it per-driver. > > > > Thus, I'm not sure if I'm a fan of this patch. What we need to decide is the semantics that we want to have. If the behaviour implemented by this patch (or a similar patch) is what we want, then I would say we should commit it, even if we're planning to refactor things in the long term so that it's implemented in the generic block layer. > >> > >> block/curl.c | 10 ++++++++++ > >> 1 file changed, 10 insertions(+) > >> > >> diff --git a/block/curl.c b/block/curl.c > >> index 50e741a0d7..a8d87a1813 100644 > >> --- a/block/curl.c > >> +++ b/block/curl.c > >> @@ -898,6 +898,7 @@ out: > >> static int coroutine_fn curl_co_preadv(BlockDriverState *bs, > >> uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags) > >> { > >> + BDRVCURLState *s = bs->opaque; > >> CURLAIOCB acb = { > >> .co = qemu_coroutine_self(), > >> .ret = -EINPROGRESS, > >> @@ -906,6 +907,15 @@ static int coroutine_fn curl_co_preadv(BlockDriverState *bs, > >> .bytes = bytes > >> }; > >> > >> + if (offset > s->len || bytes > s->len - offset) { > >> + uint64_t req_bytes = offset > s->len ? 0 : s->len - offset; > >> + qemu_iovec_memset(qiov, req_bytes, 0, bytes - req_bytes); > >> + bytes = req_bytes; > > In nbd.c, I also have: > if (offset >= client->info.size) { > assert(bytes < BDRV_SECTOR_SIZE); > > if (offset + bytes > client->info.size) { > assert(slop < BDRV_SECTOR_SIZE); > > With those assertions added, I can give it Hm, that requires splitting it into two blocks. Maybe I should just copy the whole thing from the NBD driver. Makes it a bit longer, but then it's more obviously doing the same. Anyway, these assertions show that we have already decided on the desired behaviour for reads because we already read zeros in bdrv_aligned_preadv() as long as things are sector aligned. Changing bs->total_sectors to bytes would result in the same behaviour as this patch. What we can try is how hard this conversion is really. If we can just do it right away, it's probably preferable. Kevin
Am 17.03.2021 um 17:12 hat Daniel P. Berrangé geschrieben: > On Wed, Mar 17, 2021 at 04:17:34PM +0100, Kevin Wolf wrote: > > This makes the curl driver more consistent with file-posix in that it > > doesn't return errors any more for reading after the end of the remote > > file. Instead, zeros are returned for these areas. > > > > This inconsistency was reported in: > > https://bugzilla.redhat.com/show_bug.cgi?id=1935061 > > > > Note that the image used in this bug report has a corrupted snapshot > > table, which means that the qcow2 driver tries to do a zero-length read > > after EOF on its image file. > > > > The old behaviour of the curl driver can hardly be called a bug, but the > > inconsistency turned out to be confusing. > > > > Reported-by: Alice Frosi <afrosi@redhat.com> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > --- > > > > It is not entirely clear to me if this is something we want to do. If we > > do care about consistency between protocol drivers, something like this > > should probably be done in block/io.c eventually - but that would > > require converting bs->total_sectors to byte granularity first. > > > > Any opinions on what the most desirable semantics would be and whether > > we should patch individual drivers until we can have a generic solution? > > What valid scenarios are there for wanting to read beyond the bounds > of the protocol driver storage ? Why was file-posix allowing this > so far ? > > If I've given file-posix a 10 GB plain file or device and something > requests a read from the 11 GB offset, IMHO, that is a sign of serious > error somewhere and possible impending doom. > > For writable storage, I would think that read + write should be > symmetric, by which I mean if a read() at a particular offset > succeeds, then I would also expect a write() at the same offset to > succeed, and have its data later returned by a read(). > > We generally can't write at an offset beyond the storage (unless we > are intending to auto-enlarge a plain file), so I think we shouldn't > allow reads either. It is definitely related to format drivers that grow their image files. I think the reason for allowing this may have been that with O_DIRECT, you need aligned requests and when format drivers write just a few bytes, we actually do a RMW - and you don't want to get an error during the read part just because the image file will only be resized by the write. Since curl is a read-only protocol driver (at the moment, I actually have an experimental branch that adds write support so we can run iotests for http), this reason doesn't really apply. At the moment, it would be just for consistency. Kevin
On 3/17/21 11:43 AM, Kevin Wolf wrote: >>> It is not entirely clear to me if this is something we want to do. If we >>> do care about consistency between protocol drivers, something like this >>> should probably be done in block/io.c eventually - but that would >>> require converting bs->total_sectors to byte granularity first. >>> >>> Any opinions on what the most desirable semantics would be and whether >>> we should patch individual drivers until we can have a generic solution? >> >> What valid scenarios are there for wanting to read beyond the bounds >> of the protocol driver storage ? Why was file-posix allowing this >> so far ? >> Our block driver already filters all reads larger than the image size rounded to the nearest sector; so this discussion is ONLY about the 511 bytes possible in an unaligned file at the protocol layer and its rounded-up size at the block layer. >> If I've given file-posix a 10 GB plain file or device and something >> requests a read from the 11 GB offset, IMHO, that is a sign of serious >> error somewhere and possible impending doom. The block layer won't permit that; it's too far beyond the 511 bytes of rounding up a sector-unaligned image. >> >> For writable storage, I would think that read + write should be >> symmetric, by which I mean if a read() at a particular offset >> succeeds, then I would also expect a write() at the same offset to >> succeed, and have its data later returned by a read(). >> >> We generally can't write at an offset beyond the storage (unless we >> are intending to auto-enlarge a plain file), so I think we shouldn't >> allow reads either. > > It is definitely related to format drivers that grow their image files. > I think the reason for allowing this may have been that with O_DIRECT, > you need aligned requests and when format drivers write just a few > bytes, we actually do a RMW - and you don't want to get an error during > the read part just because the image file will only be resized by the > write. I like the nbdkit behavior for symmetry: since we can read the tail as zero, allowing write as zero does not change the size but also avoids an ENOSPC, while allowing the guest full control over the bytes prior to the unaligned tail. But I can also live with the symmetry of reads from the final sector see zero, but writes to the final sector fail (basically, the final sector becomes read-only, even if the rest of the image is writable). > > Since curl is a read-only protocol driver (at the moment, I actually > have an experimental branch that adds write support so we can run > iotests for http), this reason doesn't really apply. At the moment, it > would be just for consistency.
diff --git a/block/curl.c b/block/curl.c index 50e741a0d7..a8d87a1813 100644 --- a/block/curl.c +++ b/block/curl.c @@ -898,6 +898,7 @@ out: static int coroutine_fn curl_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags) { + BDRVCURLState *s = bs->opaque; CURLAIOCB acb = { .co = qemu_coroutine_self(), .ret = -EINPROGRESS, @@ -906,6 +907,15 @@ static int coroutine_fn curl_co_preadv(BlockDriverState *bs, .bytes = bytes }; + if (offset > s->len || bytes > s->len - offset) { + uint64_t req_bytes = offset > s->len ? 0 : s->len - offset; + qemu_iovec_memset(qiov, req_bytes, 0, bytes - req_bytes); + bytes = req_bytes; + } + if (bytes == 0) { + return 0; + } + curl_setup_preadv(bs, &acb); while (acb.ret == -EINPROGRESS) { qemu_coroutine_yield();
This makes the curl driver more consistent with file-posix in that it doesn't return errors any more for reading after the end of the remote file. Instead, zeros are returned for these areas. This inconsistency was reported in: https://bugzilla.redhat.com/show_bug.cgi?id=1935061 Note that the image used in this bug report has a corrupted snapshot table, which means that the qcow2 driver tries to do a zero-length read after EOF on its image file. The old behaviour of the curl driver can hardly be called a bug, but the inconsistency turned out to be confusing. Reported-by: Alice Frosi <afrosi@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- It is not entirely clear to me if this is something we want to do. If we do care about consistency between protocol drivers, something like this should probably be done in block/io.c eventually - but that would require converting bs->total_sectors to byte granularity first. Any opinions on what the most desirable semantics would be and whether we should patch individual drivers until we can have a generic solution? block/curl.c | 10 ++++++++++ 1 file changed, 10 insertions(+)