Message ID | 20201002170620.1611673-5-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | Fix getdents{64} regression on some FS | expand |
* Adhemerval Zanella via Libc-alpha: > It reserves some space on the allocated internal buffer to be > used as a the returned dirent struct. The kernel obtained dirent64 > struct are copied to the temporary buffer on each readdir call. > > The overflow test is moved once the dirent64 entry is copied > to the temporary buffer, and a subsequent readdir will obtain the > next entry. The idea is an overflow fails to return the entry on > readdir, but a next readdir might still obtain the next entry. > (for filesystem that does not have the concept of sequential d_off, > such as ext4). Can you make the translation buffer a separate allocation? Then you can resize it to accommodate long names. Thanks, Florian
On 13/10/2020 12:59, Florian Weimer wrote: > * Adhemerval Zanella via Libc-alpha: > >> It reserves some space on the allocated internal buffer to be >> used as a the returned dirent struct. The kernel obtained dirent64 >> struct are copied to the temporary buffer on each readdir call. >> >> The overflow test is moved once the dirent64 entry is copied >> to the temporary buffer, and a subsequent readdir will obtain the >> next entry. The idea is an overflow fails to return the entry on >> readdir, but a next readdir might still obtain the next entry. >> (for filesystem that does not have the concept of sequential d_off, >> such as ext4). > > Can you make the translation buffer a separate allocation? Then you can > resize it to accommodate long names. I followed the idea of mips64 getdents64 (which came from your suggestion) and it seems that at least for most majority of filesystems [1], the buffer size should accommodate the maximum supported file size (ReiserFS seems to be an outlier, but it also seems that Linux VFS limits the maximum size to 255 characters). Using a single allocation make the code slight faster (which not sure if plays much difference since a lot of memory copy would be required for non-LFS) and it does not require to handle the possible reallocation failure on readdir. [1] https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits (Not sure how exact this reflect the latest Linux version though)
* Adhemerval Zanella: > Using a single allocation make the code slight faster (which not sure if > plays much difference since a lot of memory copy would be required for > non-LFS) and it does not require to handle the possible reallocation > failure on readdir. But you'd still have to handle the case where name is longer than the allocation, to avoid a buffer overflow. So an error case still exists. Thanks, Florian
On 19/10/2020 05:18, Florian Weimer wrote: > * Adhemerval Zanella: > >> Using a single allocation make the code slight faster (which not sure if >> plays much difference since a lot of memory copy would be required for >> non-LFS) and it does not require to handle the possible reallocation >> failure on readdir. > > But you'd still have to handle the case where name is longer than the > allocation, to avoid a buffer overflow. So an error case still exists. > > Thanks, > Florian The dirstream_ret_entry already handles it by returning EOVERFLOW for such case. Also keep in mind this limitation is only for the old non-LFS interface, for the LFS one it will use all the available allocated buffer created by the opendir (it still have the 32kb limit for getdents64 call). I can add a more complex buffer handling to resize the auxiliary buffer, but I am not sure it really pays of the complexity.
* Adhemerval Zanella: > On 19/10/2020 05:18, Florian Weimer wrote: >> * Adhemerval Zanella: >> >>> Using a single allocation make the code slight faster (which not sure if >>> plays much difference since a lot of memory copy would be required for >>> non-LFS) and it does not require to handle the possible reallocation >>> failure on readdir. >> >> But you'd still have to handle the case where name is longer than the >> allocation, to avoid a buffer overflow. So an error case still exists. >> >> Thanks, >> Florian > > The dirstream_ret_entry already handles it by returning EOVERFLOW for such > case. Also keep in mind this limitation is only for the old non-LFS > interface, for the LFS one it will use all the available allocated buffer > created by the opendir (it still have the 32kb limit for getdents64 call). > I can add a more complex buffer handling to resize the auxiliary buffer, > but I am not sure it really pays of the complexity. Hmm. The error code should be ENAMETOOLONG. Ideally, it should be delayed until the end of the stream, so that the rest of the directory can be listed (similar to what we do for readdir_r). It's probably okay to report an ENOMEM failure immediately. Anyway, I'll look at the overflow check and see if we can move this forward, and then revisit the translation buffer allocation in a separate patch. Thanks, Florian
On 19/10/2020 17:50, Florian Weimer wrote: > * Adhemerval Zanella: > >> On 19/10/2020 05:18, Florian Weimer wrote: >>> * Adhemerval Zanella: >>> >>>> Using a single allocation make the code slight faster (which not sure if >>>> plays much difference since a lot of memory copy would be required for >>>> non-LFS) and it does not require to handle the possible reallocation >>>> failure on readdir. >>> >>> But you'd still have to handle the case where name is longer than the >>> allocation, to avoid a buffer overflow. So an error case still exists. >>> >>> Thanks, >>> Florian >> >> The dirstream_ret_entry already handles it by returning EOVERFLOW for such >> case. Also keep in mind this limitation is only for the old non-LFS >> interface, for the LFS one it will use all the available allocated buffer >> created by the opendir (it still have the 32kb limit for getdents64 call). >> I can add a more complex buffer handling to resize the auxiliary buffer, >> but I am not sure it really pays of the complexity. > > Hmm. The error code should be ENAMETOOLONG. Ideally, it should be > delayed until the end of the stream, so that the rest of the directory > can be listed (similar to what we do for readdir_r). It's probably okay > to report an ENOMEM failure immediately. ENAMETOOLONG does make more sense, I have fixed it locally. I am not sure I understood by 'delayed', it it report on the readdir call, but the next entry will still be reported in the next readdir call. The construction such as won't work: while (readdir (dp) != NULL) { [...] } But this will work as intended: while (1) { errno = 0; struct dirent *entry = readdir (dp); if (entry == NULL) { if (errno == ENAMETOOLONG) continue; break; } } Even if go to buffer resize, application will still need to handle ENOMEM so I am not sure if using separated buffer is really an improvement here (at least on trying to adequate usual way to call opendir functions). > > Anyway, I'll look at the overflow check and see if we can move this > forward, and then revisit the translation buffer allocation in a > separate patch. Thanks.
* Adhemerval Zanella via Libc-alpha: > ENAMETOOLONG does make more sense, I have fixed it locally. I am not > sure I understood by 'delayed', it it report on the readdir call, but > the next entry will still be reported in the next readdir call. The > construction such as won't work: > > while (readdir (dp) != NULL) > { > [...] > } > > But this will work as intended: > > while (1) > { > errno = 0; > struct dirent *entry = readdir (dp); > if (entry == NULL) > { > if (errno == ENAMETOOLONG) > continue; > break; > } > } I think it's unreasonable to expect that this pattern will work. It assumes that readdir updates dp despite the error. In other implementations, this may produce an endless loop. This is why for readdir_r, we record the ENAMETOOLONG error as pending, and report it only at the end. (We should do the same for EOVERFLOW errors.) > Even if go to buffer resize, application will still need to handle > ENOMEM so I am not sure if using separated buffer is really an > improvement here (at least on trying to adequate usual way to call > opendir functions). ENOMEM is a temporary error, ENAMETOOLONG depends on file system content and may be much more difficult to resolve. Thanks, Florian
On 20/10/2020 04:38, Florian Weimer wrote: > * Adhemerval Zanella via Libc-alpha: > >> ENAMETOOLONG does make more sense, I have fixed it locally. I am not >> sure I understood by 'delayed', it it report on the readdir call, but >> the next entry will still be reported in the next readdir call. The >> construction such as won't work: >> >> while (readdir (dp) != NULL) >> { >> [...] >> } >> >> But this will work as intended: >> >> while (1) >> { >> errno = 0; >> struct dirent *entry = readdir (dp); >> if (entry == NULL) >> { >> if (errno == ENAMETOOLONG) >> continue; >> break; >> } >> } > > I think it's unreasonable to expect that this pattern will work. It > assumes that readdir updates dp despite the error. In other > implementations, this may produce an endless loop. This is why for > readdir_r, we record the ENAMETOOLONG error as pending, and report it > only at the end. (We should do the same for EOVERFLOW errors.) This is what POSIX states as the way to check for errors: "Applications wishing to check for error situations should set errno to 0 before calling readdir(). If errno is set to non-zero on return, an error occurred." What the standard is not clear is whether the position as the directory stream is updated in a case of failure. I think it is a fair assumption since it also states that 'When the end of the directory is encountered, a null pointer shall be returned and errno is not changed.'. So there is a distinction between end of directory stream and a failure (in the example above checking for 'errno == 0' should indicate the end of the stream). And it also defines EOVERFLOW as possible transient error. It is possible to do for readdir_r because the function return is different than the returned 'direct' itself, for readdir what we have is the 'errno' to signal such failure. > >> Even if go to buffer resize, application will still need to handle >> ENOMEM so I am not sure if using separated buffer is really an >> improvement here (at least on trying to adequate usual way to call >> opendir functions). > > ENOMEM is a temporary error, ENAMETOOLONG depends on file system content > and may be much more difficult to resolve. Even with ENOMEM, application will require to actually handle a possible readdir failure and acts accordingly (either by retrying or skip the file). But I consider failing with ENOMEM in fact much more confusing, it exposes internal implementation that needs to be handled by the application (the internal memory reallocation), where ENAMETOOLONG is indeed a system limitation that can be resolved by using LFS support. That's why this limitation is only for non-LFS interface. This whole fix is a best effort to a deprecated interface.
* Adhemerval Zanella: > On 20/10/2020 04:38, Florian Weimer wrote: >> * Adhemerval Zanella via Libc-alpha: >> >>> ENAMETOOLONG does make more sense, I have fixed it locally. I am not >>> sure I understood by 'delayed', it it report on the readdir call, but >>> the next entry will still be reported in the next readdir call. The >>> construction such as won't work: >>> >>> while (readdir (dp) != NULL) >>> { >>> [...] >>> } >>> >>> But this will work as intended: >>> >>> while (1) >>> { >>> errno = 0; >>> struct dirent *entry = readdir (dp); >>> if (entry == NULL) >>> { >>> if (errno == ENAMETOOLONG) >>> continue; >>> break; >>> } >>> } >> >> I think it's unreasonable to expect that this pattern will work. Itg >> assumes that readdir updates dp despite the error. In other >> implementations, this may produce an endless loop. This is why for >> readdir_r, we record the ENAMETOOLONG error as pending, and report it >> only at the end. (We should do the same for EOVERFLOW errors.) > > This is what POSIX states as the way to check for errors: > > "Applications wishing to check for error situations should set errno > to 0 before calling readdir(). If errno is set to non-zero on return, > an error occurred." > > What the standard is not clear is whether the position as the directory > stream is updated in a case of failure. Yes, that was my concern. I found this in POSIX (in 2.3 Error Numbers): | If an error condition is detected, the action requested may have been | partially performed, unless otherwise stated. But in general we would be rather concerned if a regular read or write on a file descriptor failed *and* still updated the file offset. Not sure about directory streams. > I think it is a fair assumption > since it also states that 'When the end of the directory is encountered, > a null pointer shall be returned and errno is not changed.'. So there > is a distinction between end of directory stream and a failure (in the > example above checking for 'errno == 0' should indicate the end of the > stream). The question is whether you are supposed to be able to continue reading and encounter errno == 0 eventually. In general, this is simply not true, consider EIO or EUCLEAN. This is why I went with delayed error reporting for readdir_r: it is simply not reasonable to expect that applications continue calling readddir_r or readdir after the first error because it's not clear how to can avoid an endless loop (even assuming that the failing readdir call somehow advanced the read pointer). I hope this explains my point of view. > And it also defines EOVERFLOW as possible transient error. It > is possible to do for readdir_r because the function return is different > than the returned 'direct' itself, for readdir what we have is the > 'errno' to signal such failure. I don't understand this point. The approach to detect EOF vs error is different for readdir_r, but the question whether continue after the first error is equally relevant. >>> Even if go to buffer resize, application will still need to handle >>> ENOMEM so I am not sure if using separated buffer is really an >>> improvement here (at least on trying to adequate usual way to call >>> opendir functions). >> >> ENOMEM is a temporary error, ENAMETOOLONG depends on file system content >> and may be much more difficult to resolve. > > Even with ENOMEM, application will require to actually handle a possible > readdir failure and acts accordingly (either by retrying or skip the > file). But I consider failing with ENOMEM in fact much more confusing, > it exposes internal implementation that needs to be handled by the > application (the internal memory reallocation), where ENAMETOOLONG is > indeed a system limitation that can be resolved by using LFS support. I think readdir can already fail with ENOMEM if the error comes from the kernel, for example from ext4_ind_map_blocks and ext4_alloc_branch. So adding another corner case where ENOMEM can result does not seem egregiously bad to me. Thanks, Florian
On 20/10/2020 09:35, Florian Weimer wrote: > * Adhemerval Zanella: > >> On 20/10/2020 04:38, Florian Weimer wrote: >>> * Adhemerval Zanella via Libc-alpha: >>> >>>> ENAMETOOLONG does make more sense, I have fixed it locally. I am not >>>> sure I understood by 'delayed', it it report on the readdir call, but >>>> the next entry will still be reported in the next readdir call. The >>>> construction such as won't work: >>>> >>>> while (readdir (dp) != NULL) >>>> { >>>> [...] >>>> } >>>> >>>> But this will work as intended: >>>> >>>> while (1) >>>> { >>>> errno = 0; >>>> struct dirent *entry = readdir (dp); >>>> if (entry == NULL) >>>> { >>>> if (errno == ENAMETOOLONG) >>>> continue; >>>> break; >>>> } >>>> } >>> >>> I think it's unreasonable to expect that this pattern will work. Itg >>> assumes that readdir updates dp despite the error. In other >>> implementations, this may produce an endless loop. This is why for >>> readdir_r, we record the ENAMETOOLONG error as pending, and report it >>> only at the end. (We should do the same for EOVERFLOW errors.) >> >> This is what POSIX states as the way to check for errors: >> >> "Applications wishing to check for error situations should set errno >> to 0 before calling readdir(). If errno is set to non-zero on return, >> an error occurred." >> >> What the standard is not clear is whether the position as the directory >> stream is updated in a case of failure. > > Yes, that was my concern. I found this in POSIX (in 2.3 Error Numbers): > > | If an error condition is detected, the action requested may have been > | partially performed, unless otherwise stated. > > But in general we would be rather concerned if a regular read or write > on a file descriptor failed *and* still updated the file offset. Not > sure about directory streams. I think the snippet you stated does not really prevent us to return an empty entry with a transient error (ENAMETOOLONG), since it is clear different than entries read error for getdents itself (EINVAL, ENOENT). Former mean that only the referred entry could not be obtained, where latter means that either there is a EOF (if the directory has been removed) or the buffer is too small. For latter we might still try to resize the internal buffer to accommodate the required size, but this incur in the memory failure possible error nonetheless. > >> I think it is a fair assumption >> since it also states that 'When the end of the directory is encountered, >> a null pointer shall be returned and errno is not changed.'. So there >> is a distinction between end of directory stream and a failure (in the >> example above checking for 'errno == 0' should indicate the end of the >> stream). > > The question is whether you are supposed to be able to continue reading > and encounter errno == 0 eventually. In general, this is simply not > true, consider EIO or EUCLEAN. This is why I went with delayed error > reporting for readdir_r: it is simply not reasonable to expect that > applications continue calling readddir_r or readdir after the first > error because it's not clear how to can avoid an endless loop (even > assuming that the failing readdir call somehow advanced the read > pointer). > > I hope this explains my point of view. I understood your point, however we are constraint by opendir interface. Different than readdir_r, for readdir we can't signal that some error has occurred in obtained the entry while returning the partial entry information (for the case of ENAMETOOLONG for instance). We can just skip the ENAMETOOLONG entries is set the errno only for EOF, but even with this approach we might encounter different errors (due getdents failure for instance) which might shadow this specific interface error glibc might trigger. > >> And it also defines EOVERFLOW as possible transient error. It >> is possible to do for readdir_r because the function return is different >> than the returned 'direct' itself, for readdir what we have is the >> 'errno' to signal such failure. > > I don't understand this point. The approach to detect EOF vs error is > different for readdir_r, but the question whether continue after the > first error is equally relevant. My point is if we decide to ignore entries with ENAMETOOLONG, how to signal the readdir caller this has happened? > >>>> Even if go to buffer resize, application will still need to handle >>>> ENOMEM so I am not sure if using separated buffer is really an >>>> improvement here (at least on trying to adequate usual way to call >>>> opendir functions). >>> >>> ENOMEM is a temporary error, ENAMETOOLONG depends on file system content >>> and may be much more difficult to resolve. >> >> Even with ENOMEM, application will require to actually handle a possible >> readdir failure and acts accordingly (either by retrying or skip the >> file). But I consider failing with ENOMEM in fact much more confusing, >> it exposes internal implementation that needs to be handled by the >> application (the internal memory reallocation), where ENAMETOOLONG is >> indeed a system limitation that can be resolved by using LFS support. > > I think readdir can already fail with ENOMEM if the error comes from the > kernel, for example from ext4_ind_map_blocks and ext4_alloc_branch. So > adding another corner case where ENOMEM can result does not seem > egregiously bad to me. Is it a possible error for getdents? I couldn't really track it, but I haven't dig into kernel code that deep. In any case it would be good to update the getdents implementation.
On 20/10/2020 09:35, Florian Weimer wrote: > * Adhemerval Zanella: > >> On 20/10/2020 04:38, Florian Weimer wrote: >>> * Adhemerval Zanella via Libc-alpha: >>> >>>> ENAMETOOLONG does make more sense, I have fixed it locally. I am not >>>> sure I understood by 'delayed', it it report on the readdir call, but >>>> the next entry will still be reported in the next readdir call. The >>>> construction such as won't work: >>>> >>>> while (readdir (dp) != NULL) >>>> { >>>> [...] >>>> } >>>> >>>> But this will work as intended: >>>> >>>> while (1) >>>> { >>>> errno = 0; >>>> struct dirent *entry = readdir (dp); >>>> if (entry == NULL) >>>> { >>>> if (errno == ENAMETOOLONG) >>>> continue; >>>> break; >>>> } >>>> } >>> >>> I think it's unreasonable to expect that this pattern will work. Itg >>> assumes that readdir updates dp despite the error. In other >>> implementations, this may produce an endless loop. This is why for >>> readdir_r, we record the ENAMETOOLONG error as pending, and report it >>> only at the end. (We should do the same for EOVERFLOW errors.) >> >> This is what POSIX states as the way to check for errors: >> >> "Applications wishing to check for error situations should set errno >> to 0 before calling readdir(). If errno is set to non-zero on return, >> an error occurred." >> >> What the standard is not clear is whether the position as the directory >> stream is updated in a case of failure. > > Yes, that was my concern. I found this in POSIX (in 2.3 Error Numbers): > > | If an error condition is detected, the action requested may have been > | partially performed, unless otherwise stated. > > But in general we would be rather concerned if a regular read or write > on a file descriptor failed *and* still updated the file offset. Not > sure about directory streams. > >> I think it is a fair assumption >> since it also states that 'When the end of the directory is encountered, >> a null pointer shall be returned and errno is not changed.'. So there >> is a distinction between end of directory stream and a failure (in the >> example above checking for 'errno == 0' should indicate the end of the >> stream). > > The question is whether you are supposed to be able to continue reading > and encounter errno == 0 eventually. In general, this is simply not > true, consider EIO or EUCLEAN. This is why I went with delayed error > reporting for readdir_r: it is simply not reasonable to expect that > applications continue calling readddir_r or readdir after the first > error because it's not clear how to can avoid an endless loop (even > assuming that the failing readdir call somehow advanced the read > pointer). > > I hope this explains my point of view. Thinking twice I think a resizable buffer does make sense in this situation. I will send an updated set with this modification.
diff --git a/sysdeps/unix/sysv/linux/opendir.c b/sysdeps/unix/sysv/linux/opendir.c index 2198224588..24bd63d2ba 100644 --- a/sysdeps/unix/sysv/linux/opendir.c +++ b/sysdeps/unix/sysv/linux/opendir.c @@ -22,6 +22,7 @@ #include <sys/param.h> /* For MIN and MAX. */ #include <not-cancel.h> +#include <readdir.h> /* For return_buffer_size. */ enum { opendir_oflags = O_RDONLY|O_NDELAY|O_DIRECTORY|O_LARGEFILE|O_CLOEXEC @@ -103,8 +104,9 @@ __alloc_dir (int fd, bool close_fd, int flags, const struct stat64 *statp) enum { max_buffer_size = 1048576 }; const size_t allocation_size = 32768; - _Static_assert (allocation_size >= sizeof (struct dirent64), - "allocation_size < sizeof (struct dirent64)"); + _Static_assert (allocation_size >= sizeof (struct dirent64) + + return_buffer_size, + "opendir buffer size smaller than required"); /* Increase allocation if requested, but not if the value appears to be bogus. It will be between 32Kb and 1Mb. */ diff --git a/sysdeps/unix/sysv/linux/readdir.c b/sysdeps/unix/sysv/linux/readdir.c index ca2a8964e9..8eab0f4c9b 100644 --- a/sysdeps/unix/sysv/linux/readdir.c +++ b/sysdeps/unix/sysv/linux/readdir.c @@ -19,7 +19,7 @@ #include <dirent.h> #if !_DIRENT_MATCHES_DIRENT64 -#include <dirstream.h> +#include <readdir.h> /* Read a directory entry from DIRP. */ struct dirent * @@ -30,16 +30,12 @@ __readdir_unlocked (DIR *dirp) do { - size_t reclen; - if (dirp->offset >= dirp->size) { /* We've emptied out our buffer. Refill it. */ - size_t maxread = dirp->allocation; - ssize_t bytes; - - bytes = __getdents (dirp->fd, dirp->data, maxread); + ssize_t bytes = __getdents64 (dirp->fd, dirstream_data (dirp), + dirstream_alloc_size (dirp)); if (bytes <= 0) { /* On some systems getdents fails with ENOENT when the @@ -54,19 +50,18 @@ __readdir_unlocked (DIR *dirp) dp = NULL; break; } - dirp->size = (size_t) bytes; + dirp->size = bytes; /* Reset the offset into the buffer. */ dirp->offset = 0; } - dp = (struct dirent *) &dirp->data[dirp->offset]; - - reclen = dp->d_reclen; - - dirp->offset += reclen; - - dirp->filepos = dp->d_off; + dp = dirstream_ret_entry (dirp); + if (dp == NULL) + { + __set_errno (EOVERFLOW); + break; + } /* Skip deleted files. */ } while (dp->d_ino == 0); diff --git a/sysdeps/unix/sysv/linux/readdir.h b/sysdeps/unix/sysv/linux/readdir.h new file mode 100644 index 0000000000..4dc219e220 --- /dev/null +++ b/sysdeps/unix/sysv/linux/readdir.h @@ -0,0 +1,117 @@ +/* Linux readdir internal implementation details. + Copyright (C) 2020 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#ifndef _DIRSTREAM_NOLFS_H +#define _DIRSTREAM_NOLFS_H + +#if !_DIRENT_MATCHES_DIRENT64 +# include <dirstream.h> + +/* getdents64 is used internally for both LFS and non-LFS implementations. + The non-LFS interface reserves part of the allocated buffer to return the + non-LFS 'struct dirent' entry. */ + +/* This defines the reserved space size on DIR internal buffer to use as the + returned 'struct dirent' from a 'readdir' call. + + The largest possible practical length of the d_name member are 255 + Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus + 10 bytes from header, for a total of 776 bytes total. + + Also it should take in cosideration the alignment requirement for + getdents64 call. */ +enum { return_buffer_size = 1024 + + sizeof (off64_t) + - _Alignof (((struct __dirstream) {0}).data) }; + +_Static_assert ((_Alignof (((struct __dirstream) {0}).data) + + return_buffer_size) % sizeof (off64_t) == 0, + "return_buffer_size does not align the buffer properly"); + +/* Return the avaliable buffer size to use with getdents64 calls. */ +static inline size_t +dirstream_alloc_size (struct __dirstream *ds) +{ + return ds->allocation - return_buffer_size; +} + +/* Return the start of the allocated buffer minus the reserved part to use on + non-LFS readdir call. */ +static inline void * +dirstream_data (struct __dirstream *ds) +{ + return (char *) ds->data + return_buffer_size; +} + +/* Return the allocated buffer used on non-LFS readdir call. */ +static inline struct dirent * +dirstream_ret (struct __dirstream *ds) +{ + return (struct dirent *) ds->data; +} + +/* Return the current dirent64 entry from the reserved buffer used on + getdent64. */ +static inline struct dirent64 * +dirstream_entry (struct __dirstream *ds) +{ + size_t offset = return_buffer_size + ds->offset; + return (struct dirent64 *) ((char *) ds->data + offset); +} + +/* Copy one obtained entry from 'getdents64' call to the reserved space + on DS allocated buffer and updated its internal state. */ +static inline struct dirent * +dirstream_ret_entry (struct __dirstream *ds) +{ + struct dirent64 *dp64 = dirstream_entry (ds); + struct dirent *dp = dirstream_ret (ds); + + dp->d_ino = dp64->d_ino; + + dp->d_off = dp64->d_off; + if (dp->d_off != dp64->d_off) + /* Overflow. */ + return NULL; + + const size_t size_diff = (offsetof (struct dirent64, d_name) + - offsetof (struct dirent, d_name)); + const size_t alignment = _Alignof (struct dirent); + size_t new_reclen = (dp64->d_reclen - size_diff + alignment - 1) + & ~(alignment - 1); + if (new_reclen > return_buffer_size) + /* Overflow. */ + return NULL; + dp->d_reclen = new_reclen; + + dp->d_type = dp64->d_type; + + memcpy (dp->d_name, dp64->d_name, + dp64->d_reclen - offsetof (struct dirent64, d_name)); + + ds->offset += dp64->d_reclen; + ds->filepos = dp64->d_off; + + return dp; +} +#else +/* No need to reserve an buffer space if dirent has already LFS support. */ +enum { return_buffer_size = 0 }; +#endif /* _DIRENT_MATCHES_DIRENT64 */ + +#endif