[v2,3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure

Message ID 1415873823-13844-4-git-send-email-armbru@redhat.com
State New

Commit Message

Markus Armbruster Nov. 13, 2014, 10:17 a.m. UTC
When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its
end.  When that fails, we pretend the hole extends to the end of file.
Wrong.  Except when SEEK_END fails, we screw up and claim it extends
to offset -1.  More wrong.

Fortunately, these seeks are very unlikely to fail.  Fix it anyway, by
returning failure.  The caller will then pretend there are no holes.
Inaccurate, but safe.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 block/raw-posix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Max Reitz Nov. 13, 2014, 10:22 a.m. UTC | #1
On 2014-11-13 at 11:17, Markus Armbruster wrote:
> When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its
> end.  When that fails, we pretend the hole extends to the end of file.
> Wrong.  Except when SEEK_END fails, we screw up and claim it extends
> to offset -1.  More wrong.
>
> Fortunately, these seeks are very unlikely to fail.  Fix it anyway, by
> returning failure.  The caller will then pretend there are no holes.
> Inaccurate, but safe.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   block/raw-posix.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>
Kevin Wolf Nov. 13, 2014, 1:03 p.m. UTC | #2
On 13.11.2014 at 11:17, Markus Armbruster wrote:
> When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its
> end.  When that fails, we pretend the hole extends to the end of file.
> Wrong.

Wrong only in some cases, see below.

> Except when SEEK_END fails, we screw up and claim it extends
> to offset -1.  More wrong.
> 
> Fortunately, these seeks are very unlikely to fail.  Fix it anyway, by
> returning failure.  The caller will then pretend there are no holes.
> Inaccurate, but safe.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/raw-posix.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index fd80d84..2a12a50 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>      } else {
>          /* On a hole.  We need another syscall to find its end.  */
>          *data = lseek(s->fd, start, SEEK_DATA);
> -        if (*data == -1) {
> -            *data = lseek(s->fd, 0, SEEK_END);
> +        if (*data < 0) {
> +            /* no idea where the hole ends, give up (unlikely to happen) */

Not quite unlikely. If the file ends with a sparse area, we'll get
-1/ENXIO here.

lseek() with SEEK_DATA starting in a hole when there is no data until
EOF is actually the part that isn't documented in the man page, but
ENXIO is what I'm seeing here on RHEL 7.

> +            return -errno;
>          }
>      }

Kevin
Eric Blake Nov. 13, 2014, 2:52 p.m. UTC | #3
On 11/13/2014 06:03 AM, Kevin Wolf wrote:
> On 13.11.2014 at 11:17, Markus Armbruster wrote:
>> When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its
>> end.  When that fails, we pretend the hole extends to the end of file.
>> Wrong.
> 
> Wrong only in some cases, see below.
> 
>> Except when SEEK_END fails, we screw up and claim it extends
>> to offset -1.  More wrong.


>> +++ b/block/raw-posix.c
>> @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>>      } else {
>>          /* On a hole.  We need another syscall to find its end.  */
>>          *data = lseek(s->fd, start, SEEK_DATA);
>> -        if (*data == -1) {
>> -            *data = lseek(s->fd, 0, SEEK_END);
>> +        if (*data < 0) {
>> +            /* no idea where the hole ends, give up (unlikely to happen) */
> 
> Not quite unlikely. If the file ends with a sparse area, we'll get
> -1/ENXIO here.
> 
> lseek() with SEEK_DATA starting in a hole when there is no data until
> EOF is actually the part that isn't documented in the man page, but
> ENXIO is what I'm seeing here on RHEL 7.

Here's the (proposed) POSIX wording:

http://austingroupbugs.net/view.php?id=415

And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
so maybe we should special case it.
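
Special-casing ENXIO could look roughly like the following. This is only a sketch, not the patch under review: find_hole_end() is a hypothetical helper name, and it assumes the caller already knows (from SEEK_HOLE) that start lies in a hole. ENXIO from SEEK_DATA is taken to mean "no data up to EOF", so the end of the hole is found with SEEK_END; any other error is passed up.

```c
#define _GNU_SOURCE             /* for SEEK_DATA on glibc */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: @start is assumed to be inside a hole.
 * Store the end of that hole in @end and return 0, or return a
 * negative errno if we cannot find out.
 */
static int find_hole_end(int fd, off_t start, off_t *end)
{
#if defined SEEK_DATA
    off_t offs = lseek(fd, start, SEEK_DATA);

    if (offs >= 0) {
        *end = offs;            /* next data begins here */
        return 0;
    }
    if (errno != ENXIO) {
        return -errno;          /* real failure, we learned nothing */
    }

    /*
     * ENXIO: no data between start and EOF, so the hole is a
     * trailing one.  (ENXIO is also what we get when start is at or
     * beyond EOF; in that case *end may end up below start.)
     */
    offs = lseek(fd, 0, SEEK_END);
    if (offs < 0) {
        return -errno;
    }
    *end = offs;
    return 0;
#else
    (void)fd;
    (void)start;
    (void)end;
    return -ENOTSUP;
#endif
}
```

The point of the sketch is only that ENXIO and "some other error" take different paths; whether the extra SEEK_END is worth it is a separate question.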
Eric Blake Nov. 13, 2014, 3:29 p.m. UTC | #4
On 11/13/2014 07:52 AM, Eric Blake wrote:
> On 11/13/2014 06:03 AM, Kevin Wolf wrote:
>> On 13.11.2014 at 11:17, Markus Armbruster wrote:
>>> When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its
>>> end.  When that fails, we pretend the hole extends to the end of file.
>>> Wrong.
>>
>> Wrong only in some cases, see below.
>>
>>> Except when SEEK_END fails, we screw up and claim it extends
>>> to offset -1.  More wrong.
> 
> 
>>> +++ b/block/raw-posix.c
>>> @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>>>      } else {
>>>          /* On a hole.  We need another syscall to find its end.  */
>>>          *data = lseek(s->fd, start, SEEK_DATA);
>>> -        if (*data == -1) {
>>> -            *data = lseek(s->fd, 0, SEEK_END);
>>> +        if (*data < 0) {
>>> +            /* no idea where the hole ends, give up (unlikely to happen) */
>>
>> Not quite unlikely. If the file ends with a sparse area, we'll get
>> -1/ENXIO here.
>>
>> lseek() with SEEK_DATA starting in a hole when there is no data until
>> EOF is actually the part that isn't documented in the man page, but
>> ENXIO is what I'm seeing here on RHEL 7.
> 
> Here's the (proposed) POSIX wording:
> 
> http://austingroupbugs.net/view.php?id=415
> 
> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
> so maybe we should special case it.
> 

Uggh.  Historical practice on Solaris (and therefore the POSIX wording)
says that SEEK_HOLE in a trailing hole is allowed (but not required) to
seek to EOF instead of reporting the offset requested.  I have no clue
why this was done, but it is VERY annoying - it means that if you
provide an offset within a tail hole of a file, you cannot reliably tell
if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
 For applications that are reading a file sequentially but skipping over
holes, this behavior is fine (it short-circuits the hole/data search
points and might shave an iteration off a loop).  But for OUR purposes,
where we are merely trying to ascertain whether we are in a hole, we
have an inaccurate response - since SEEK_HOLE does NOT return the offset
we passed in, we are prone to treat the offset as belonging to data,
which is a pessimization (you never get wrong results by treating a hole
as data and reading it, but it is definitely slower).

I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
SEEK_DATA, if you want to accurately determine whether an offset happens
to live within a trailing hole.

(By the way, I really wish Solaris had implemented a variant that
queried, but did NOT change the file offset - maybe Linux can add that
as an extension, and give it sane semantics of not special casing
trailing holes...)
Max Reitz Nov. 13, 2014, 3:44 p.m. UTC | #5
On 2014-11-13 at 16:29, Eric Blake wrote:
> On 11/13/2014 07:52 AM, Eric Blake wrote:
>> On 11/13/2014 06:03 AM, Kevin Wolf wrote:
>>> On 13.11.2014 at 11:17, Markus Armbruster wrote:
>>>> When SEEK_HOLE tells us we're in a hole, we try SEEK_DATA to find its
>>>> end.  When that fails, we pretend the hole extends to the end of file.
>>>> Wrong.
>>> Wrong only in some cases, see below.
>>>
>>>> Except when SEEK_END fails, we screw up and claim it extends
>>>> to offset -1.  More wrong.
>>
>>>> +++ b/block/raw-posix.c
>>>> @@ -1494,8 +1494,9 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>>>>       } else {
>>>>           /* On a hole.  We need another syscall to find its end.  */
>>>>           *data = lseek(s->fd, start, SEEK_DATA);
>>>> -        if (*data == -1) {
>>>> -            *data = lseek(s->fd, 0, SEEK_END);
>>>> +        if (*data < 0) {
>>>> +            /* no idea where the hole ends, give up (unlikely to happen) */
>>> Not quite unlikely. If the file ends with a sparse area, we'll get
>>> -1/ENXIO here.
>>>
>>> lseek() with SEEK_DATA starting in a hole when there is no data until
>>> EOF is actually the part that isn't documented in the man page, but
>>> ENXIO is what I'm seeing here on RHEL 7.
>> Here's the (proposed) POSIX wording:
>>
>> http://austingroupbugs.net/view.php?id=415
>>
>> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
>> so maybe we should special case it.
>>
> Uggh.  Historical practice on Solaris (and therefore the POSIX wording)
> says that SEEK_HOLE in a trailing hole is allowed (but not required) to
> seek to EOF instead of reporting the offset requested.  I have no clue
> why this was done, but it is VERY annoying - it means that if you
> provide an offset within a tail hole of a file, you cannot reliably tell
> if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
>   For applications that are reading a file sequentially but skipping over
> holes, this behavior is fine (it short-circuits the hole/data search
> points and might shave an iteration off a loop).  But for OUR purposes,
> where we are merely trying to ascertain whether we are in a hole, we
> have an inaccurate response - since SEEK_HOLE does NOT return the offset
> we passed in, we are prone to treat the offset as belonging to data,
> which is a pessimization (you never get wrong results by treating a hole
> as data and reading it, but it is definitely slower).
>
> I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
> SEEK_DATA, if you want to accurately determine whether an offset happens
> to live within a trailing hole.
>
> (By the way, I really wish Solaris had implemented a variant that
> queried, but did NOT change the file offset - maybe Linux can add that
> as an extension, and give it sane semantics of not special casing
> trailing holes...)

Are you asking for fiemap? :-P

Max
Eric Blake Nov. 13, 2014, 3:47 p.m. UTC | #6
On 11/13/2014 08:29 AM, Eric Blake wrote:

>>> lseek() with SEEK_DATA starting in a hole when there is no data until
>>> EOF is actually the part that isn't documented in the man page, but
>>> ENXIO is what I'm seeing here on RHEL 7.
>>
>> Here's the (proposed) POSIX wording:
>>
>> http://austingroupbugs.net/view.php?id=415
>>
>> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
>> so maybe we should special case it.
>>
> 
> Uggh.  Historical practice on Solaris (and therefore the POSIX wording)
> says that SEEK_HOLE in a trailing hole is allowed (but not required) to
> seek to EOF instead of reporting the offset requested.  I have no clue
> why this was done, but it is VERY annoying - it means that if you
> provide an offset within a tail hole of a file, you cannot reliably tell
> if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
>  For applications that are reading a file sequentially but skipping over
> holes, this behavior is fine (it short-circuits the hole/data search
> points and might shave an iteration off a loop).  But for OUR purposes,
> where we are merely trying to ascertain whether we are in a hole, we
> have an inaccurate response - since SEEK_HOLE does NOT return the offset
> we passed in, we are prone to treat the offset as belonging to data,
> which is a pessimization (you never get wrong results by treating a hole
> as data and reading it, but it is definitely slower).
> 
> I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
> SEEK_DATA, if you want to accurately determine whether an offset happens
> to live within a trailing hole.

Here's a table of possible situations, based solely on POSIX wording
(and not on actual tests on Solaris or Linux, although it shouldn't be
too hard to confirm behavior):

0-length file:
lseek(fd, 0, SEEK_HOLE) => -1 ENXIO
lseek(fd, 0, SEEK_DATA) => -1 ENXIO
conclusion: 0 is at EOF

file of any size:
lseek(fd, size_or_larger, SEEK_HOLE) => -1 ENXIO
lseek(fd, size_or_larger, SEEK_DATA) => -1 ENXIO
conclusion: size_or_larger is at or beyond EOF

file where offset is in a hole, but data appears later:
lseek(fd, offset, SEEK_HOLE) => offset
lseek(fd, offset, SEEK_DATA) => end_of_hole
conclusion: offset through end_of_hole is in a hole

file where offset is data, whether or not a hole appears later:
lseek(fd, offset, SEEK_HOLE) => end_of_data
lseek(fd, offset, SEEK_DATA) => offset
conclusion: offset through end_of_data is in data

file where offset is in a tail hole, option 1:
lseek(fd, offset, SEEK_HOLE) => offset
lseek(fd, offset, SEEK_DATA) => -1 ENXIO
conclusion: offset through EOF is in hole, but another seek needed to
learn EOF

file where offset is in a tail hole, option 2:
lseek(fd, offset, SEEK_HOLE) => EOF
lseek(fd, offset, SEEK_DATA) => -1 ENXIO
conclusion: offset through EOF is in hole, no additional seek needed

The two calls are both necessary, in order to learn which extent type
offset belongs to, and to tell where that extent ends; and the behaviors
are distinguishable (if both lseek() succeed, we have both numbers we
want; if both fail with ENXIO, we know the offset is at or beyond EOF;
and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
hole); and we can tell at runtime what to do about a trailing hole (if
the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
find EOF; if the return value is larger than offset, we have EOF for
free).  You can optimize by calling SEEK_HOLE first (if it fails with
ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
insufficient to give you all the information you need.
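
The table above can be turned into a small decision function.  A minimal sketch, assuming the caller has already issued both lseek() calls and saved errno after each one; the enum and the name classify_offset() are made up for illustration and do not exist in raw-posix.c.  It follows the table, where a trailing hole shows up as SEEK_DATA failing with ENXIO while SEEK_HOLE succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Illustrative names only, not part of any existing API. */
enum extent_kind {
    EXT_DATA,       /* offset is in data, ends at the SEEK_HOLE result */
    EXT_HOLE,       /* offset is in a hole, ends at the SEEK_DATA result */
    EXT_TAIL_HOLE,  /* offset is in a hole that runs to EOF */
    EXT_EOF,        /* offset is at or beyond EOF */
    EXT_UNKNOWN     /* some other failure, or contradictory answers */
};

/*
 * @hole and @data are the return values of lseek(fd, offset, SEEK_HOLE)
 * and lseek(fd, offset, SEEK_DATA); @hole_errno and @data_errno are the
 * errno values saved right after a call that returned -1.
 */
static enum extent_kind classify_offset(off_t offset,
                                        off_t hole, int hole_errno,
                                        off_t data, int data_errno)
{
    assert(offset >= 0);

    if (hole < 0 && data < 0) {
        /* both fail with ENXIO: at or beyond EOF */
        return (hole_errno == ENXIO && data_errno == ENXIO)
            ? EXT_EOF : EXT_UNKNOWN;
    }
    if (hole < 0) {
        /* SEEK_HOLE failed although SEEK_DATA worked: not in the table */
        return EXT_UNKNOWN;
    }
    if (data < 0) {
        /* tail hole, option 1 (hole == offset) or option 2 (hole == EOF) */
        return data_errno == ENXIO ? EXT_TAIL_HOLE : EXT_UNKNOWN;
    }
    if (data == offset && hole > offset) {
        return EXT_DATA;
    }
    if (hole == offset && data > offset) {
        return EXT_HOLE;
    }
    return EXT_UNKNOWN;         /* e.g. file changed between the calls */
}
```

For EXT_TAIL_HOLE the caller still has to compare the SEEK_HOLE result with offset to decide whether one more lseek(fd, 0, SEEK_END) is needed to learn EOF.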
Eric Blake Nov. 13, 2014, 3:49 p.m. UTC | #7
On 11/13/2014 08:44 AM, Max Reitz wrote:

>> (By the way, I really wish Solaris had implemented a variant that
>> queried, but did NOT change the file offset - maybe Linux can add that
>> as an extension, and give it sane semantics of not special casing
>> trailing holes...)
> 
> Are you asking for fiemap? :-P

Not that bulky; maybe just two more constants SEEK_PEEK_HOLE and
SEEK_PEEK_DATA, which return the same values as their non-peek
counterparts but without modifying the fd offset.
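
For comparison, the closest userspace approximation is to save and restore the offset around the query.  A rough sketch only: peek_seek() is a hypothetical name, SEEK_PEEK_HOLE and SEEK_PEEK_DATA do not exist, and unlike a real kernel extension this is not atomic, so another user of the same file description can observe or disturb the temporary offset.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: like lseek(fd, offset, whence), but puts the
 * file offset back where it was afterwards.  Returns -1 with errno
 * set on failure.  Not atomic with respect to other users of fd.
 */
static off_t peek_seek(int fd, off_t offset, int whence)
{
    off_t saved, result;
    int saved_errno;

    saved = lseek(fd, 0, SEEK_CUR);     /* remember current offset */
    if (saved < 0) {
        return -1;
    }

    result = lseek(fd, offset, whence); /* the actual query */
    saved_errno = errno;

    if (lseek(fd, saved, SEEK_SET) < 0) {
        return -1;                      /* could not restore offset */
    }

    errno = saved_errno;                /* report the query's errno */
    return result;
}
```

That is three syscalls instead of one, which is part of why a kernel-side variant would be nicer.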
Eric Blake Nov. 13, 2014, 3:52 p.m. UTC | #8
On 11/13/2014 08:49 AM, Eric Blake wrote:
> On 11/13/2014 08:44 AM, Max Reitz wrote:
> 
>>> (By the way, I really wish Solaris had implemented a variant that
>>> queried, but did NOT change the file offset - maybe Linux can add that
>>> as an extension, and give it sane semantics of not special casing
>>> trailing holes...)
>>
>> Are you asking for fiemap? :-P
> 
> Not that bulky; maybe just two more constants SEEK_PEEK_HOLE and
> SEEK_PEEK_DATA, which return the same values as their non-peek
> counterparts but without modifying the fd offset.

And not the first time I've requested it.  From 2011:
https://lkml.org/lkml/2011/4/22/91
Eric Blake Nov. 13, 2014, 4:01 p.m. UTC | #9
On 11/13/2014 08:47 AM, Eric Blake wrote:

> The two calls are both necessary, in order to learn which extent type
> offset belongs to, and to tell where that extent ends; and the behaviors
> are distinguishable (if both lseek() succeed, we have both numbers we
> want; if both fail with ENXIO, we know the offset is at or beyond EOF;
> and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
              ^
I meant SEEK_DATA here.

> hole); and we can tell at runtime what to do about a trailing hole (if
> the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
> find EOF; if the return value is larger than offset, we have EOF for
> free).  You can optimize by calling SEEK_HOLE first (if it fails with
> ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
> insufficient to give you all the information you need.
>
Markus Armbruster Nov. 14, 2014, 1:12 p.m. UTC | #10
Eric Blake <eblake@redhat.com> writes:

> On 11/13/2014 08:29 AM, Eric Blake wrote:
>
>>>> lseek() with SEEK_DATA starting in a hole when there is no data until
>>>> EOF is actually the part that isn't documented in the man page, but
>>>> ENXIO is what I'm seeing here on RHEL 7.
>>>
>>> Here's the (proposed) POSIX wording:
>>>
>>> http://austingroupbugs.net/view.php?id=415
>>>
>>> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
>>> so maybe we should special case it.
>>>
>> 
>> Uggh.  Historical practice on Solaris (and therefore the POSIX wording)
>> says that SEEK_HOLE in a trailing hole is allowed (but not required) to
>> seek to EOF instead of reporting the offset requested.  I have no clue
>> why this was done, but it is VERY annoying - it means that if you
>> provide an offset within a tail hole of a file, you cannot reliably tell
>> if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
>>  For applications that are reading a file sequentially but skipping over
>> holes, this behavior is fine (it short-circuits the hole/data search
>> points and might shave an iteration off a loop).  But for OUR purposes,
>> where we are merely trying to ascertain whether we are in a hole, we
>> have an inaccurate response - since SEEK_HOLE does NOT return the offset
>> we passed in, we are prone to treat the offset as belonging to data,
>> which is a pessimization (you never get wrong results by treating a hole
>> as data and reading it, but it is definitely slower).
>> 
>> I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
>> SEEK_DATA, if you want to accurately determine whether an offset happens
>> to live within a trailing hole.
>
> Here's a table of possible situations, based solely on POSIX wording
> (and not on actual tests on Solaris or Linux, although it shouldn't be
> too hard to confirm behavior):
>
> 0-length file:
> lseek(fd, 0, SEEK_HOLE) => -1 ENXIO
> lseek(fd, 0, SEEK_DATA) => -1 ENXIO
> conclusion: 0 is at EOF

Isn't this a special case of the next one?

> file of any size:
> lseek(fd, size_or_larger, SEEK_HOLE) => -1 ENXIO
> lseek(fd, size_or_larger, SEEK_DATA) => -1 ENXIO
> conclusion: size_or_larger is at or beyond EOF
>
> file where offset is in a hole, but data appears later:
> lseek(fd, offset, SEEK_HOLE) => offset
> lseek(fd, offset, SEEK_DATA) => end_of_hole
> conclusion: offset through end_of_hole is in a hole
>
> file where offset is data, whether or not a hole appears later:
> lseek(fd, offset, SEEK_HOLE) => end_of_data
> lseek(fd, offset, SEEK_DATA) => offset
> conclusion: offset through end_of_data is in data
>
> file where offset is in a tail hole, option 1:
> lseek(fd, offset, SEEK_HOLE) => offset
> lseek(fd, offset, SEEK_DATA) => -1 ENXIO
> conclusion: offset through EOF is in hole, but another seek needed to
> learn EOF
>
> file where offset is in a tail hole, option 2:
> lseek(fd, offset, SEEK_HOLE) => EOF
> lseek(fd, offset, SEEK_DATA) => -1 ENXIO
> conclusion: offset through EOF is in hole, no additional seek needed
>
> The two calls are both necessary, in order to learn which extent type
> offset belongs to, and to tell where that extent ends; and the behaviors
> are distinguishable (if both lseek() succeed, we have both numbers we
> want; if both fail with ENXIO, we know the offset is at or beyond EOF;
> and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
> hole); and we can tell at runtime what to do about a trailing hole (if
> the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
> find EOF; if the return value is larger than offset, we have EOF for
> free).  You can optimize by calling SEEK_HOLE first (if it fails with
> ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
> insufficient to give you all the information you need.

Not discussed: how to handle failures other than ENXIO.

The appended code still avoids a second seek in one case.  Useful mostly
because it saves us from handling a second seek's contradictory
information.


/*
 * Find allocation range in @bs around offset @start.
 * May change underlying file descriptor's file offset.
 * If @start is not in a hole, store @start in @data, and the
 * beginning of the next hole in @hole, and return 0.
 * If @start is in a non-trailing hole, store @start in @hole and the
 * beginning of the next non-hole in @data, and return 0.
 * If @start is in a trailing hole or beyond EOF, return -ENXIO.
 * If we can't find out, return a negative errno other than -ENXIO.
 */
static int find_allocation(BlockDriverState *bs, off_t start,
                           off_t *data, off_t *hole)
{
#if defined SEEK_HOLE && defined SEEK_DATA
    BDRVRawState *s = bs->opaque;
    off_t offs;

    /*
     * SEEK_DATA cases:
     * D1. offs == start: start is in data
     * D2. offs > start: start is in a hole, next data at offs
     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
     *                              or start is beyond EOF
     *     If the latter happens, the file has been truncated behind
     *     our back since we opened it.  Best we can do is treat like
     *     a trailing hole.
     * D4. offs < 0, errno != ENXIO: we learned nothing
     */
    offs = lseek(s->fd, start, SEEK_DATA);
    if (offs < 0) {
        return -errno;          /* D3 or D4 */
    }
    assert(offs >= start);

    if (offs > start) {
        /* D2: in hole, next data at offs */
        *hole = start;
        *data = offs;
        return 0;
    }

    /* D1: in data, end not yet known */

    /*
     * SEEK_HOLE cases:
     * H1. offs == start: start is in a hole
     *     If this happens here, a hole has been dug behind our back
     *     since the previous lseek().
     * H2. offs > start: either start is in data, next hole at offs,
     *                   or start is in trailing hole, EOF at offs
     *     Linux treats trailing holes like any other hole: offs ==
     *     start.  Solaris seeks to EOF instead: offs > start (blech).
     *     If that happens here, a hole has been dug behind our back
     *     since the previous lseek().
     * H3. offs < 0, errno = ENXIO: start is beyond EOF
     *     If this happens, the file has been truncated behind our
     *     back since we opened it.  Treat it like a trailing hole.
     * H4. offs < 0, errno != ENXIO: we learned nothing
     *     Pretend we know nothing at all, i.e. "forget" about D1.
     */
    offs = lseek(s->fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -errno;          /* D1 and (H3 or H4) */
    }
    assert(offs >= start);

    if (offs > start) {
        /*
         * D1 and H2: either in data, next hole at offs, or it was in
         * data but is now in a trailing hole.  Treating the latter as
         * if there was data extending to EOF is safe, so simply do
         * that.
         */
        *data = start;
        *hole = offs;
        return 0;
    }

    /* D1 and H1 */
    return -EBUSY;
#else
    return -ENOTSUP;
#endif
}
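
A caller might consume the three kinds of result along these lines.  This is a sketch under assumptions, not code from the series: struct extent and interpret_allocation() are invented names, and @remaining stands for however many bytes the caller asked about.  -ENXIO is treated as a trailing hole, and any other failure falls back to "no holes known", which is inaccurate but safe.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Illustrative result type, not part of the block layer. */
struct extent {
    bool is_hole;       /* true: reads as zeroes, false: treat as data */
    off_t length;       /* how many bytes from @start this covers */
};

/*
 * @ret, @data and @hole are what find_allocation() returned and stored
 * for @start; @remaining is the number of bytes the caller cares about.
 */
static struct extent interpret_allocation(int ret, off_t start,
                                          off_t data, off_t hole,
                                          off_t remaining)
{
    /* default: pretend everything is data (safe for any error) */
    struct extent e = { false, remaining };

    if (ret == -ENXIO) {
        /* trailing hole or beyond EOF: hole covers the whole request */
        e.is_hole = true;
    } else if (ret == 0 && data == start && hole > start) {
        /* in data, next hole begins at @hole */
        e.length = hole - start;
    } else if (ret == 0 && hole == start && data > start) {
        /* in a hole, next data begins at @data */
        e.is_hole = true;
        e.length = data - start;
    }

    if (e.length > remaining) {
        e.length = remaining;   /* never report more than was asked */
    }
    return e;
}
```

With this shape the -EBUSY and -ENOTSUP returns need no special handling in the caller; they simply take the "all data" default.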
Eric Blake Nov. 15, 2014, 12:47 a.m. UTC | #11
On 11/14/2014 06:12 AM, Markus Armbruster wrote:
>> 0-length file:
>> lseek(fd, 0, SEEK_HOLE) => -1 ENXIO
>> lseek(fd, 0, SEEK_DATA) => -1 ENXIO
>> conclusion: 0 is at EOF
> 
> Isn't this a special case of the next one?
> 
>> file of any size:
>> lseek(fd, size_or_larger, SEEK_HOLE) => -1 ENXIO
>> lseek(fd, size_or_larger, SEEK_DATA) => -1 ENXIO
>> conclusion: size_or_larger is at or beyond EOF

Yes.

>>
>> The two calls are both necessary, in order to learn which extent type
>> offset belongs to, and to tell where that extent ends; and the behaviors
>> are distinguishable (if both lseek() succeed, we have both numbers we
>> want; if both fail with ENXIO, we know the offset is at or beyond EOF;
>> and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
>> hole); and we can tell at runtime what to do about a trailing hole (if
>> the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
>> find EOF; if the return value is larger than offset, we have EOF for
>> free).  You can optimize by calling SEEK_HOLE first (if it fails with
>> ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
>> insufficient to give you all the information you need.
> 
> Not discussed: how to handle failures other than ENXIO.
> 
> The appended code still avoids a second seek in one case.  Useful mostly
> because it saves us from handling a second seek's contradictory
> information.

Slick - I focused on SEEK_HOLE first, but you focused on SEEK_DATA
first.  Your comments make all the difference.

> 
> 
> /*
>  * Find allocation range in @bs around offset @start.
>  * May change underlying file descriptor's file offset.
>  * If @start is not in a hole, store @start in @data, and the
>  * beginning of the next hole in @hole, and return 0.
>  * If @start is in a non-trailing hole, store @start in @hole and the
>  * beginning of the next non-hole in @data, and return 0.
>  * If @start is in a trailing hole or beyond EOF, return -ENXIO.

And caller can blindly and safely treat that as a trailing hole, as needed.

>  * If we can't find out, return a negative errno other than -ENXIO.
>  */
> static int find_allocation(BlockDriverState *bs, off_t start,
>                            off_t *data, off_t *hole)
> {
> #if defined SEEK_HOLE && defined SEEK_DATA

I seriously doubt you'd find a system with one but not both of these
constants defined.  But it doesn't hurt to check both.

>     BDRVRawState *s = bs->opaque;
>     off_t offs;
> 
>     /*
>      * SEEK_DATA cases:
>      * D1. offs == start: start is in data
>      * D2. offs > start: start is in a hole, next data at offs
>      * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
>      *                              or start is beyond EOF
>      *     If the latter happens, the file has been truncated behind
>      *     our back since we opened it.  Best we can do is treat like
>      *     a trailing hole.
>      * D4. offs < 0, errno != ENXIO: we learned nothing
>      */

Correct.

>     offs = lseek(s->fd, start, SEEK_DATA);
>     if (offs < 0) {
>         return -errno;          /* D3 or D4 */
>     }
>     assert(offs >= start);
> 
>     if (offs > start) {
>         /* D2: in hole, next data at offs */
>         *hole = start;
>         *data = offs;
>         return 0;
>     }
> 
>     /* D1: in data, end not yet known */
> 
>     /*
>      * SEEK_HOLE cases:
>      * H1. offs == start: start is in a hole
>      *     If this happens here, a hole has been dug behind our back
>      *     since the previous lseek().
>      * H2. offs > start: either start is in data, next hole at offs,
>      *                   or start is in trailing hole, EOF at offs
>      *     Linux treats trailing holes like any other hole: offs ==
>      *     start.  Solaris seeks to EOF instead: offs > start (blech).

Correct in isolation.  Coupled with the additional knowledge that we are
in state D1 (and already treated D3 as a trailing hole with early exit),...

>      *     If that happens here, a hole has been dug behind our back
>      *     since the previous lseek().

...this is further true for this function.

>      * H3. offs < 0, errno = ENXIO: start is beyond EOF
>      *     If this happens, the file has been truncated behind our
>      *     back since we opened it.  Treat it like a trailing hole.
>      * H4. offs < 0, errno != ENXIO: we learned nothing
>      *     Pretend we know nothing at all, i.e. "forget" about D1.
>      */
>     offs = lseek(s->fd, start, SEEK_HOLE);
>     if (offs < 0) {
>         return -errno;          /* D1 and (H3 or H4) */
>     }
>     assert(offs >= start);
> 
>     if (offs > start) {
>         /*
>          * D1 and H2: either in data, next hole at offs, or it was in
>          * data but is now in a trailing hole.  Treating the latter as
>          * if there was data extending to EOF is safe, so simply do
>          * that.
>          */
>         *data = start;
>         *hole = offs;
>         return 0;
>     }

Reasonable.

> 
>     /* D1 and H1 */
>     return -EBUSY;
> #else
>     return -ENOTSUP;
> #endif
> }

I like it.  Maybe we could do better than -ENOTSUP (by treating the
entire file as data and the hole at EOF), but if the caller handles
ENOTSUP differently from ENXIO, you don't necessarily need to do it here.

Looking forward to this in an actual v3 patch.

Patch

diff --git a/block/raw-posix.c b/block/raw-posix.c
index fd80d84..2a12a50 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1494,8 +1494,9 @@  static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
     } else {
         /* On a hole.  We need another syscall to find its end.  */
         *data = lseek(s->fd, start, SEEK_DATA);
-        if (*data == -1) {
-            *data = lseek(s->fd, 0, SEEK_END);
+        if (*data < 0) {
+            /* no idea where the hole ends, give up (unlikely to happen) */
+            return -errno;
         }
     }