diff mbox

libstdc++/67747 Allocate space for dirent::d_name

Message ID 20150929113726.GU12094@redhat.com
State New
Headers show

Commit Message

Jonathan Wakely Sept. 29, 2015, 11:37 a.m. UTC
POSIX says that dirent::d_name has an unspecified length, so calls to
readdir_r must pass a buffer with enough trailing space for
{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
GNU/Linux and BSD where d_name is a large array, but fails on Solaris
32-bit.

This uses pathconf to get NAME_MAX and allocates a buffer.

Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
this to trunk today (and backport all the filesystem fixes to
gcc-5-branch).

Comments

Martin Sebor Sept. 29, 2015, 6:54 p.m. UTC | #1
On 09/29/2015 05:37 AM, Jonathan Wakely wrote:
> POSIX says that dirent::d_name has an unspecified length, so calls to
> readdir_r must pass a buffer with enough trailing space for
> {NAME_MAX}+1 characters. I wasn't doing that, which works OK on
> GNU/Linux and BSD where d_name is a large array, but fails on Solaris
> 32-bit.
>
> This uses pathconf to get NAME_MAX and allocates a buffer.
>
> Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
> this to trunk today (and backport all the filesystem fixes to
> gcc-5-branch).

Calling pathconf is only necessary when _POSIX_NO_TRUNC is zero
which I think exists mainly for legacy file systems. Otherwise,
it's safe to use NAME_MAX instead. Avoiding the call to pathconf
also avoids the TOCTOU between it and the call to opendir, and
hardcoding the value makes it possible to avoid dynamically
allocating the dirent buffer.

I didn't remember the MAX_PATH value on Windows anymore but from
what I've just read online it sounds like it's defined to 260.

Defaulting to 255 on POSIX is appropriate. On XSI systems, the
minimum required value is _XOPEN_NAME_MAX which is 255 (I would
suggest using the macro instead when it's defined). Otherwise,
the strictly conforming minimum value would be 14 -- the value
of _POSIX_NAME_MAX, but since 255 is greater it's fine.

Other than that, I tend to be leery of using plain char arrays
as buffers for objects of bigger types. I don't know to what
extent this is a problem for libstdc++ anymore as more and more
hardware is tolerant of misaligned accesses and as the default
new expression typically returns memory suitably aligned for
the largest fundamental type. But since there is no requirement
in the language that it do so and I would tend to err on the
side of caution and use operator new (as opposed to
new char[len]).

Martin

PS I'm interpreting _POSIX_NO_TRUNC being zero as more
restrictive than if it was non-zero and so calling pathconf(p,
_PC_NO_TRUNC) should be required to also return non-zero for
such an implementation, regardless of p. But let me check that
I'm reading it right.
Jonathan Wakely Sept. 30, 2015, 11:01 a.m. UTC | #2
On 29/09/15 12:54 -0600, Martin Sebor wrote:
>On 09/29/2015 05:37 AM, Jonathan Wakely wrote:
>>POSIX says that dirent::d_name has an unspecified length, so calls to
>>readdir_r must pass a buffer with enough trailing space for
>>{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
>>GNU/Linux and BSD where d_name is a large array, but fails on Solaris
>>32-bit.
>>
>>This uses pathconf to get NAME_MAX and allocates a buffer.
>>
>>Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
>>this to trunk today (and backport all the filesystem fixes to
>>gcc-5-branch).
>
>Calling pathconf is only necessary when _POSIX_NO_TRUNC is zero
>which I think exists mainly for legacy file systems. Otherwise,
>it's safe to use NAME_MAX instead. Avoiding the call to pathconf

Oh, nice. I was using NAME_MAX originally but the glibc readdir_r(3)
man-page has an example using pathconf and says that should be used,
so I went down that route.

>also avoids the TOCTOU between it and the call to opendir, and

Also nice.

>hardcoding the value makes it possible to avoid dynamically
>allocating the dirent buffer.

Can we be sure NAME_MAX will never be too big for the stack?

As currently written _Dir::advance() can call itself recursively to
skip the . and .. directories, so if we put the dirent buffer on the
stack then maybe we should re-use the same one not create three large
frames.

>I didn't remember the MAX_PATH value on Windows anymore but from
>what I've just read online it sounds like it's defined to 260.

Yes, I found that value, but I think I found something saying the
individual components were limited to 255. I'll make it 260 anyway. I
think that might relate to UTF-16 characters, so we'd need a larger
buffer for narrow characters, but I'm not sure what mingw supports
here anyway. The Windows implementations are just stubs where someone
who cares can add working code.

>Defaulting to 255 on POSIX is appropriate. On XSI systems, the
>minimum required value is _XOPEN_NAME_MAX which is 255 (I would
>suggest using the macro instead when it's defined). Otherwise,
>the strictly conforming minimum value would be 14 -- the value
>of _POSIX_NAME_MAX, but since 255 is greater it's fine.
>
>Other than that, I tend to be leery of using plain char arrays
>as buffers for objects of bigger types. I don't know to what
>extent this is a problem for libstdc++ anymore as more and more
>hardware is tolerant of misaligned accesses and as the default
>new expression typically returns memory suitably aligned for
>the largest fundamental type. But since there is no requirement
>in the language that it do so and I would tend to err on the
>side of caution and use operator new (as opposed to
>new char[len]).

For some reason I thought new char[len] would be suitably aligned for
any type with sizeof <= len, but that's only true for operator new.
(I should check I haven't made the same assumption elsewhere in the
library!)

std::aligned_union<offsetof(dirent, d_name)+NAME_MAX, dirent>::type
will give us what we need.


>Martin
>
>PS I'm interpreting _POSIX_NO_TRUNC being zero as more
>restrictive than if it was non-zero and so calling pathconf(p,
>_PC_NO_TRUNC) should be required to also return non-zero for
>such an implementation, regardless of p. But let me check that
>I'm reading it right.
>
Martin Sebor Sept. 30, 2015, 3:30 p.m. UTC | #3
On 09/30/2015 05:01 AM, Jonathan Wakely wrote:
> On 29/09/15 12:54 -0600, Martin Sebor wrote:
>> On 09/29/2015 05:37 AM, Jonathan Wakely wrote:
>>> POSIX says that dirent::d_name has an unspecified length, so calls to
>>> readdir_r must pass a buffer with enough trailing space for
>>> {NAME_MAX}+1 characters. I wasn't doing that, which works OK on
>>> GNU/Linux and BSD where d_name is a large array, but fails on Solaris
>>> 32-bit.
>>>
>>> This uses pathconf to get NAME_MAX and allocates a buffer.
>>>
>>> Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
>>> this to trunk today (and backport all the filesystem fixes to
>>> gcc-5-branch).
>>
>> Calling pathconf is only necessary when _POSIX_NO_TRUNC is zero
>> which I think exists mainly for legacy file systems. Otherwise,
>> it's safe to use NAME_MAX instead. Avoiding the call to pathconf
>
> Oh, nice. I was using NAME_MAX originally but the glibc readdir_r(3)
> man-page has an example using pathconf and says that should be used,
> so I went down that route.

GLIBC pathconf calls statfs to get the NAME_MAX value from the OS,
so there's some chance that some unusual file system will define
it as greater than 255. I tested all those on my Fedora PC and on
my RHEL 7.2 powerpc64le box and didn't find one.

There also isn't one in the Wikipedia comparison of file systems:
   https://en.wikipedia.org/wiki/Comparison_of_file_systems

But to be 100% safe, we would need to call pathconf if readdir
failed due to truncation.

> Can we be sure NAME_MAX will never be too big for the stack?

When it's defined I think it's probably safe in practice but not
guaranteed. Also, like all of these <limits.h> constants, it's not
guaranteed to be defined at all when the implementation doesn't
impose a limit. I believe AIX is one implementation that doesn't
define it. So some preprocessor magic will be required to deal
with that.

Martin
Martin Sebor Oct. 1, 2015, 10:56 p.m. UTC | #4
> Other than that, I tend to be leery of using plain char arrays
> as buffers for objects of bigger types. I don't know to what
> extent this is a problem for libstdc++ anymore as more and more
> hardware is tolerant of misaligned accesses and as the default
> new expression typically returns memory suitably aligned for
> the largest fundamental type. But since there is no requirement
> in the language that it do so and I would tend to err on the
> side of caution and use operator new (as opposed to
> new char[len]).

Actually, after re-reading 5.3.4, I see that this idiom is supposed
to be safe. Paragraph 11 has this note explaining the requirement
for the char array form of the new expression to adjust offset
the (already aligned) pointer returned by the allocation function
by a multiple of the strictest fundamental alignment:

   Note: Because allocation functions are assumed to return pointers
   to storage that is appropriately aligned for objects of any type
   with fundamental alignment, this constraint on array allocation
   overhead permits the common idiom of allocating character arrays
   into which objects of other types will later be placed.

Sorry for sounding the false alarm. I must have been remembering
problems with arrays allocated on the stack.

Martin
Florian Weimer Oct. 2, 2015, 12:16 p.m. UTC | #5
On 09/29/2015 01:37 PM, Jonathan Wakely wrote:
> POSIX says that dirent::d_name has an unspecified length, so calls to
> readdir_r must pass a buffer with enough trailing space for
> {NAME_MAX}+1 characters. I wasn't doing that, which works OK on
> GNU/Linux and BSD where d_name is a large array, but fails on Solaris
> 32-bit.
> 
> This uses pathconf to get NAME_MAX and allocates a buffer.

This still has a buffer overflow on certain file systems.

You must not use readdir_r, it is deprecated and always insecure.  We
should probably mark it as such in the glibc headers.

Have we already released code which uses readdir_r?

Florian
Jonathan Wakely Oct. 2, 2015, 12:34 p.m. UTC | #6
On 02/10/15 14:16 +0200, Florian Weimer wrote:
>On 09/29/2015 01:37 PM, Jonathan Wakely wrote:
>> POSIX says that dirent::d_name has an unspecified length, so calls to
>> readdir_r must pass a buffer with enough trailing space for
>> {NAME_MAX}+1 characters. I wasn't doing that, which works OK on
>> GNU/Linux and BSD where d_name is a large array, but fails on Solaris
>> 32-bit.
>>
>> This uses pathconf to get NAME_MAX and allocates a buffer.
>
>This still has a buffer overflow on certain file systems.
>
>You must not use readdir_r, it is deprecated and always insecure.  We
>should probably mark it as such in the glibc headers.

OK, I'll just use readdir() then. The directory stream is private to
the library type, so the only way to call readdir() concurrently on a
single directory stream is to increment iterators concurrently, which
is undefined anyway.

So that will work as long as readdir() doesn't use a global static
buffer shared between streams, i.e. it meets the POSIX requirement
that "They shall not be affected by a call to readdir() on a different
directory stream." I don't know if mingw meets that, but there is lots
of work needed to make this stuff work in mingw.

>Have we already released code which uses readdir_r?

No, it's only on trunk and the gcc-5-branch, not in any release.
Sebastian Huber Oct. 2, 2015, 12:37 p.m. UTC | #7
On 02/10/15 14:16, Florian Weimer wrote:
> On 09/29/2015 01:37 PM, Jonathan Wakely wrote:
>> >POSIX says that dirent::d_name has an unspecified length, so calls to
>> >readdir_r must pass a buffer with enough trailing space for
>> >{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
>> >GNU/Linux and BSD where d_name is a large array, but fails on Solaris
>> >32-bit.
>> >
>> >This uses pathconf to get NAME_MAX and allocates a buffer.
> This still has a buffer overflow on certain file systems.
>
> You must not use readdir_r, it is deprecated and always insecure.  We
> should probably mark it as such in the glibc headers.

The READDIR(3) man page should be updated as well, since it doesn't 
mention that readdir_r() is deprecated and always insecure.
Florian Weimer Oct. 2, 2015, 12:41 p.m. UTC | #8
On 10/02/2015 02:34 PM, Jonathan Wakely wrote:
> On 02/10/15 14:16 +0200, Florian Weimer wrote:
>> On 09/29/2015 01:37 PM, Jonathan Wakely wrote:
>>> POSIX says that dirent::d_name has an unspecified length, so calls to
>>> readdir_r must pass a buffer with enough trailing space for
>>> {NAME_MAX}+1 characters. I wasn't doing that, which works OK on
>>> GNU/Linux and BSD where d_name is a large array, but fails on Solaris
>>> 32-bit.
>>>
>>> This uses pathconf to get NAME_MAX and allocates a buffer.
>>
>> This still has a buffer overflow on certain file systems.
>>
>> You must not use readdir_r, it is deprecated and always insecure.  We
>> should probably mark it as such in the glibc headers.
> 
> OK, I'll just use readdir() then. The directory stream is private to
> the library type, so the only way to call readdir() concurrently on a
> single directory stream is to increment iterators concurrently, which
> is undefined anyway.

Right, that's the only case where readdir_r could be theoretically
useful.  But it's not a global structure, the callers have to coordinate
anyway, and so you could well use an external lock.

> So that will work as long as readdir() doesn't use a global static
> buffer shared between streams, i.e. it meets the POSIX requirement
> that "They shall not be affected by a call to readdir() on a different
> directory stream." I don't know if mingw meets that, but there is lots
> of work needed to make this stuff work in mingw.

If mingw has this flaw, it is worth fixing on its own, and mingw is
sufficiently alive that sticking workarounds into libstdc++ for its bugs
doesn't make sense (IMHO).

>> Have we already released code which uses readdir_r?
> 
> No, it's only on trunk and the gcc-5-branch, not in any release.

Good, no need to treat this as a security vulnerability. :)
Florian Weimer Oct. 2, 2015, 12:52 p.m. UTC | #9
On 10/02/2015 02:37 PM, Sebastian Huber wrote:
> 
> 
> On 02/10/15 14:16, Florian Weimer wrote:
>> On 09/29/2015 01:37 PM, Jonathan Wakely wrote:
>>> >POSIX says that dirent::d_name has an unspecified length, so calls to
>>> >readdir_r must pass a buffer with enough trailing space for
>>> >{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
>>> >GNU/Linux and BSD where d_name is a large array, but fails on Solaris
>>> >32-bit.
>>> >
>>> >This uses pathconf to get NAME_MAX and allocates a buffer.
>> This still has a buffer overflow on certain file systems.
>>
>> You must not use readdir_r, it is deprecated and always insecure.  We
>> should probably mark it as such in the glibc headers.
> 
> The READDIR(3) man page should be updated as well, since it doesn't
> mention that readdir_r() is deprecated and always insecure.

Right, and I filed: https://bugzilla.kernel.org/show_bug.cgi?id=105391

Florian
Martin Sebor Oct. 2, 2015, 4:57 p.m. UTC | #10
On 10/02/2015 06:34 AM, Jonathan Wakely wrote:
> On 02/10/15 14:16 +0200, Florian Weimer wrote:
>> On 09/29/2015 01:37 PM, Jonathan Wakely wrote:
>>> POSIX says that dirent::d_name has an unspecified length, so calls to
>>> readdir_r must pass a buffer with enough trailing space for
>>> {NAME_MAX}+1 characters. I wasn't doing that, which works OK on
>>> GNU/Linux and BSD where d_name is a large array, but fails on Solaris
>>> 32-bit.
>>>
>>> This uses pathconf to get NAME_MAX and allocates a buffer.
>>
>> This still has a buffer overflow on certain file systems.
>>
>> You must not use readdir_r, it is deprecated and always insecure.  We
>> should probably mark it as such in the glibc headers.
>
> OK, I'll just use readdir() then. The directory stream is private to
> the library type, so the only way to call readdir() concurrently on a
> single directory stream is to increment iterators concurrently, which
> is undefined anyway.
>
> So that will work as long as readdir() doesn't use a global static
> buffer shared between streams, i.e. it meets the POSIX requirement
> that "They shall not be affected by a call to readdir() on a different
> directory stream." I don't know if mingw meets that, but there is lots
> of work needed to make this stuff work in mingw.

Readdir isn't required to be thread-safe (it may reference global
data) so calling it in multiple threads even with a different dirp
argument is undefined. A thread-unsafe implementation can meet the
POSIX requirement and still access global data but without locking.

The Solaris implementation, for example, is explicitly documented
as thread unsafe.

Martin
Jonathan Wakely Oct. 2, 2015, 5:01 p.m. UTC | #11
On 02/10/15 10:57 -0600, Martin Sebor wrote:
>On 10/02/2015 06:34 AM, Jonathan Wakely wrote:
>>On 02/10/15 14:16 +0200, Florian Weimer wrote:
>>>On 09/29/2015 01:37 PM, Jonathan Wakely wrote:
>>>>POSIX says that dirent::d_name has an unspecified length, so calls to
>>>>readdir_r must pass a buffer with enough trailing space for
>>>>{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
>>>>GNU/Linux and BSD where d_name is a large array, but fails on Solaris
>>>>32-bit.
>>>>
>>>>This uses pathconf to get NAME_MAX and allocates a buffer.
>>>
>>>This still has a buffer overflow on certain file systems.
>>>
>>>You must not use readdir_r, it is deprecated and always insecure.  We
>>>should probably mark it as such in the glibc headers.
>>
>>OK, I'll just use readdir() then. The directory stream is private to
>>the library type, so the only way to call readdir() concurrently on a
>>single directory stream is to increment iterators concurrently, which
>>is undefined anyway.
>>
>>So that will work as long as readdir() doesn't use a global static
>>buffer shared between streams, i.e. it meets the POSIX requirement
>>that "They shall not be affected by a call to readdir() on a different
>>directory stream." I don't know if mingw meets that, but there is lots
>>of work needed to make this stuff work in mingw.
>
>Readdir isn't required to be thread-safe (it may reference global
>data) so calling it in multiple threads even with a different dirp
>argument is undefined. A thread-unsafe implementation can meet the
>POSIX requirement and still access global data but without locking.
>
>The Solaris implementation, for example, is explicitly documented
>as thread unsafe.

At this point I think I'm just going to disable the filesystem lib on
Solaris!
Florian Weimer Oct. 2, 2015, 5:09 p.m. UTC | #12
On 10/02/2015 06:57 PM, Martin Sebor wrote:

> Readdir isn't required to be thread-safe (it may reference global
> data) so calling it in multiple threads even with a different dirp
> argument is undefined. A thread-unsafe implementation can meet the
> POSIX requirement and still access global data but without locking.

A readdir implementation which is not thread-safe when used with
different directory streams is just buggy.  It may be conforming, but so
is an implementation that always fails with EOVERFLOW.

> The Solaris implementation, for example, is explicitly documented
> as thread unsafe.

MT-safety is ambiguous for functions which operate on pointer arguments.
 It is not clear if it is permitted to call the function without
synchronization on overlapping objects.

memcpy has the same thread-safety level as readdir on Solaris
(operations on overlapping objects are not thread-safe, non-overlapping
objects are), and is documented to be MT-safe.

Florian
Martin Sebor Oct. 2, 2015, 5:37 p.m. UTC | #13
On 10/02/2015 11:09 AM, Florian Weimer wrote:
> On 10/02/2015 06:57 PM, Martin Sebor wrote:
>
>> Readdir isn't required to be thread-safe (it may reference global
>> data) so calling it in multiple threads even with a different dirp
>> argument is undefined. A thread-unsafe implementation can meet the
>> POSIX requirement and still access global data but without locking.
>
> A readdir implementation which is not thread-safe when used with
> different directory streams is just buggy.  It may be conforming, but so
> is an implementation that always fails with EOVERFLOW.

POSIX is clear: The readdir() function need not be thread-safe.
There's nothing wrong with relying on a function's thread safety
when it's documented to be thread safe by the implementation,
even if POSIX doesn't guarantee it. But doing so otherwise,
against both the standard and against the documentation, would
be risky to say the least.

>
>> The Solaris implementation, for example, is explicitly documented
>> as thread unsafe.
>
> MT-safety is ambiguous for functions which operate on pointer arguments.
>   It is not clear if it is permitted to call the function without
> synchronization on overlapping objects.
>
> memcpy has the same thread-safety level as readdir on Solaris

I'm not sure what you are basing this assertion on. In the man
pages I have looked at, memcpy is documented as MT-Safe. readdir
is documented as MT-Unsafe. The Unsafe definition is clear:
contains global and static data that is not protected.

For example, Solaris 11.2:
http://docs.oracle.com/cd/E36784_01/html/E36874/readdir-3c.html
http://docs.oracle.com/cd/E36784_01/html/E36874/memcpy-3c.html

Martin
Florian Weimer Oct. 2, 2015, 5:43 p.m. UTC | #14
On 10/02/2015 07:37 PM, Martin Sebor wrote:

> I'm not sure what you are basing this assertion on. In the man
> pages I have looked at, memcpy is documented as MT-Safe. readdir
> is documented as MT-Unsafe. The Unsafe definition is clear:
> contains global and static data that is not protected.

I think the Solaris thread-safety attributes are a bit too simplistic to
capture the whole scope of thread safety issues for functions operating
on mutable data.

> For example, Solaris 11.2:
> http://docs.oracle.com/cd/E36784_01/html/E36874/readdir-3c.html

“It is safe to use readdir() in a threaded application, so long as only
one thread reads from the directory stream at any given time. The
readdir() function is generally preferred over the readdir_r() function.”

Florian
Martin Sebor Oct. 2, 2015, 5:53 p.m. UTC | #15
On 10/02/2015 11:43 AM, Florian Weimer wrote:
> On 10/02/2015 07:37 PM, Martin Sebor wrote:
>
>> I'm not sure what you are basing this assertion on. In the man
>> pages I have looked at, memcpy is documented as MT-Safe. readdir
>> is documented as MT-Unsafe. The Unsafe definition is clear:
>> contains global and static data that is not protected.
>
> I think the Solaris thread-safety attributes are a bit too simplistic to
> capture the whole scope of thread safety issues for functions operating
> on mutable data.
>
>> For example, Solaris 11.2:
>> http://docs.oracle.com/cd/E36784_01/html/E36874/readdir-3c.html
>
> “It is safe to use readdir() in a threaded application, so long as only
> one thread reads from the directory stream at any given time. The
> readdir() function is generally preferred over the readdir_r() function.”

Ah, I see the discrepancy. I had a few pages open, one for Solaris
9 and one for 11.2. The newer one has the quote, the older one does
not.

Here's the older page:
http://docs.oracle.com/cd/E19683-01/816-0213/6m6ne3895/index.html

With that, I agree that using readdir is thread-safe on Solaris 11.2.
It is not safe on Solaris 9, and it would not be safe on other systems
that don't document it as such.

Martin
Florian Weimer Oct. 2, 2015, 6:02 p.m. UTC | #16
On 10/02/2015 07:53 PM, Martin Sebor wrote:

> Here's the older page:
> http://docs.oracle.com/cd/E19683-01/816-0213/6m6ne3895/index.html
> 
> With that, I agree that using readdir is thread-safe on Solaris 11.2.
> It is not safe on Solaris 9, and it would not be safe on other systems
> that don't document it as such.

I'm pretty sure the Solaris implementation did not change, only the
documentation.

Florian
diff mbox

Patch

commit 16ff5d124b8e6c5d1f9dd4edb81b6ca5c9129134
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Sep 29 11:58:19 2015 +0100

    PR libstdc++/67747 Allocate space for dirent::d_name
    
    	PR libstdc++/67747
    	* src/filesystem/dir.cc (_Dir::dirent_buf): New member.
    	(get_name_max): New function.
    	(native_readdir) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Copy to supplied
    	dirent object. Handle end of directory.
    	(_Dir::advance): Allocate space for d_name.

diff --git a/libstdc++-v3/src/filesystem/dir.cc b/libstdc++-v3/src/filesystem/dir.cc
index bce751c..d29f8eb 100644
--- a/libstdc++-v3/src/filesystem/dir.cc
+++ b/libstdc++-v3/src/filesystem/dir.cc
@@ -25,8 +25,12 @@ 
 #include <experimental/filesystem>
 #include <utility>
 #include <stack>
+#include <stddef.h>
 #include <string.h>
 #include <errno.h>
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+# include <unistd.h>
+#endif
 #ifdef _GLIBCXX_HAVE_DIRENT_H
 # ifdef _GLIBCXX_HAVE_SYS_TYPES_H
 #  include <sys/types.h>
@@ -64,20 +68,23 @@  struct fs::_Dir
   fs::path		path;
   directory_entry	entry;
   file_type		type = file_type::none;
+  unique_ptr<char[]>	dirent_buf;
 };
 
 namespace
 {
   template<typename Bitmask>
-    inline bool is_set(Bitmask obj, Bitmask bits)
+    inline bool
+    is_set(Bitmask obj, Bitmask bits)
     {
       return (obj & bits) != Bitmask::none;
     }
 
   // Returns {dirp, p} on success, {nullptr, p} on error.
   // If an ignored EACCES error occurs returns {}.
-  fs::_Dir
-  open_dir(const fs::path& p, fs::directory_options options, std::error_code* ec)
+  inline fs::_Dir
+  open_dir(const fs::path& p, fs::directory_options options,
+	   std::error_code* ec)
   {
     if (ec)
       ec->clear();
@@ -99,8 +106,22 @@  namespace
     return {nullptr, p};
   }
 
+  inline long
+  get_name_max(const fs::path& path __attribute__((__unused__)))
+  {
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+    long name_max = pathconf(path.c_str(), _PC_NAME_MAX);
+    if (name_max != -1)
+      return name_max;
+#endif
+
+    // Maximum path component on Windows is 255 (UTF-16?) characters,
+    // which is a reasonable default for POSIX too.
+    return 255;
+  }
+
   inline fs::file_type
-  get_file_type(const dirent& d __attribute__((__unused__)))
+  get_file_type(const ::dirent& d __attribute__((__unused__)))
   {
 #ifdef _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE
     switch (d.d_type)
@@ -129,12 +150,26 @@  namespace
 #endif
   }
 
-  int
+  inline int
   native_readdir(DIR* dirp, ::dirent*& entryp)
   {
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-    if ((entryp = ::readdir(dirp)))
-      return 0;
+    const int saved_errno = errno;
+    errno = 0;
+    if (auto entp = ::readdir(dirp))
+      {
+	size_t name_len = strlen(entp->d_name);
+	if (name_len > 255)
+	  return ENAMETOOLONG;
+	size_t len = offsetof(::dirent, d_name) + name_len + 1;
+	memcpy(entryp, entp, len);
+	return 0;
+      }
+    else if (errno == 0) // End of directory reached.
+      {
+	errno = saved_errno;
+	entryp = nullptr;
+      }
     return errno;
 #else
     return ::readdir_r(dirp, entryp, &entryp);
@@ -142,6 +177,7 @@  namespace
   }
 }
 
+
 // Returns false when the end of the directory entries is reached.
 // Reports errors by setting ec or throwing.
 bool
@@ -150,9 +186,15 @@  fs::_Dir::advance(error_code* ec, directory_options options)
   if (ec)
     ec->clear();
 
-  ::dirent ent;
-  ::dirent* result = &ent;
-  if (int err = native_readdir(dirp, result))
+  if (!dirent_buf)
+    {
+      size_t len = offsetof(::dirent, d_name) + get_name_max(path) + 1;
+      dirent_buf.reset(new char[len]);
+    }
+
+  ::dirent* entp = reinterpret_cast<::dirent*>(dirent_buf.get());
+
+  if (int err = native_readdir(dirp, entp))
     {
       if (err == EACCES
         && is_set(options, directory_options::skip_permission_denied))
@@ -165,13 +207,13 @@  fs::_Dir::advance(error_code* ec, directory_options options)
       ec->assign(err, std::generic_category());
       return true;
     }
-  else if (result != nullptr)
+  else if (entp != nullptr)
     {
       // skip past dot and dot-dot
-      if (!strcmp(ent.d_name, ".") || !strcmp(ent.d_name, ".."))
+      if (!strcmp(entp->d_name, ".") || !strcmp(entp->d_name, ".."))
 	return advance(ec, options);
-      entry = fs::directory_entry{path / ent.d_name};
-      type = get_file_type(ent);
+      entry = fs::directory_entry{path / entp->d_name};
+      type = get_file_type(*entp);
       return true;
     }
   else