[4/4] Remove broken posix_fallocate, posix_fallocate64 fallback code [BZ#15661]

Message ID 20150424134516.6795441F484D0@oldenburg.str.redhat.com
State New

Commit Message

Florian Weimer April 24, 2015, 12:53 p.m. UTC
The previous implementation could result in silent data corruption,
and this has been observed to happen with application code.
---
 ChangeLog                                          |  18 ++++
 NEWS                                               |  22 ++--
 sysdeps/posix/posix_fallocate.c                    |  93 -----------------
 sysdeps/posix/posix_fallocate64.c                  | 113 ---------------------
 .../sysv/linux/mips/mips64/n32/posix_fallocate.c   |   8 +-
 .../sysv/linux/mips/mips64/n32/posix_fallocate64.c |   9 +-
 sysdeps/unix/sysv/linux/posix_fallocate.c          |   8 +-
 sysdeps/unix/sysv/linux/posix_fallocate64.c        |  26 +++--
 .../unix/sysv/linux/wordsize-64/posix_fallocate.c  |  10 +-
 9 files changed, 56 insertions(+), 251 deletions(-)
 delete mode 100644 sysdeps/posix/posix_fallocate.c
 delete mode 100644 sysdeps/posix/posix_fallocate64.c

Comments

Florian Weimer May 5, 2015, 3:37 p.m. UTC | #1
On 04/24/2015 02:53 PM, Florian Weimer wrote:
> The previous implementation could result in silent data corruption,
> and this has been observed to happen with application code.

I'd appreciate some comment on this patch.  Do you agree that this is
the right approach?
Paul Eggert May 5, 2015, 3:59 p.m. UTC | #2
On 05/05/2015 08:37 AM, Florian Weimer wrote:
> On 04/24/2015 02:53 PM, Florian Weimer wrote:
>> The previous implementation could result in silent data corruption,
>> and this has been observed to happen with application code.
> I'd appreciate some comment on this patch.  Do you agree that this is
> the right approach?
>

I just now read the bug report and patch and agree that the patch is the 
right way to go.
Carlos O'Donell May 5, 2015, 8:28 p.m. UTC | #3
On 04/24/2015 08:53 AM, Florian Weimer wrote:
> The previous implementation could result in silent data corruption,
> and this has been observed to happen with application code.

In principle I agree with the removal of all of the fallback fallocate
code, it simply can't work reliably, and a reliable solution is ridiculously
expensive (see Rich's comments in the BZ about CAS over all the mmap'd pages).

The bug with O_APPEND files is real, and yet another reason to remove the
fallback code.

My opinion is that some of the failure modes talked about in the bugzilla
are invalid, for example having another thread or process calling truncate
is already a race condition, you don't need the fallocate fallback to
expose a race that corrupts data. The other thread might truncate after
you had written data to the file, resulting in the loss of data since
there was no synchronization. If there was synchronization then there would
be no problem since the thread calling truncate would wait for posix_fallocate
to complete before truncating.

The other side of the coin is that POSIX goes on further to say in
"2.9.7 Thread Interactions with Regular File Operations" that threads
should never see interleaving sets of file operations, but it is insane
to do anything like that because it kills performance, so you don't get
those guarantees in Linux.

What worries me though is that this change could break existing systems
that relied on this emulation to do something sensible for filesystems
that don't support fallocate. These binaries could easily be single threaded
systems with no other process touching their files and writing to filesystems
that don't support fallocate. If that is a sensible class of users, then we
need to version the interface, with the old version continuing to call the
fallback code and the new version not calling the fallback code.

In summary:

OK to checkin as long as you version the interface to prevent breaking
existing applications. Unless you can show that all filesystems a sensible
person might care about support fallocate, making versioning a waste of
time.

Thoughts?

Cheers,
Carlos.
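
The O_APPEND problem mentioned above can be reproduced outside glibc. The following is a minimal sketch, not the glibc code: the helper name `append_pwrite_growth` and the temporary file path are made up for illustration. It issues the same kind of one-byte `pwrite` that the fallback loop uses, on a descriptor that has O_APPEND set. The Linux pwrite(2) manual page documents that such a write is appended to the end of the file regardless of the offset, so on Linux the helper is expected to report growth instead of an in-place write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo helper (not glibc code).  Creates a 10-byte file
   with O_APPEND set, issues a one-byte pwrite at offset 0 the way the
   fallback loop does, and returns how much the file grew:
   0 if the byte landed at offset 0, 1 if it was appended instead,
   -1 if the setup failed.  */
static long
append_pwrite_growth (void)
{
  char path[] = "/tmp/fallocate-append-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);                /* The descriptor keeps the file alive.  */

  long growth = -1;
  struct stat st;

  /* mkstemp does not open the file with O_APPEND, so set it here.  */
  int flags = fcntl (fd, F_GETFL);
  if (flags >= 0
      && fcntl (fd, F_SETFL, flags | O_APPEND) == 0
      && write (fd, "0123456789", 10) == 10
      /* Same shape as the fallback's __pwrite (fd, "", 1, offset).  */
      && pwrite (fd, "", 1, 0) == 1
      && fstat (fd, &st) == 0)
    growth = (long) st.st_size - 10;

  close (fd);
  return growth;
}
```

A result of 1 means the "allocation" byte was added after the existing data, so the file ends up longer than the caller asked for and the probed block was never touched.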
Christoph Hellwig May 5, 2015, 8:48 p.m. UTC | #4
On Tue, May 05, 2015 at 04:28:41PM -0400, Carlos O'Donell wrote:
> The other side of the coin is that POSIX goes on further to say in
> "2.9.7 Thread Interactions with Regular File Operations" that threads
> should never see interleaving sets of file operations, but it is insane
> to do anything like that because it kills performance, so you don't get
> those guarantees in Linux.

Which specific guarantees do you see violated with a sane filesystem like
XFS?
Florian Weimer May 6, 2015, 7:19 a.m. UTC | #5
On 05/05/2015 10:28 PM, Carlos O'Donell wrote:
> On 04/24/2015 08:53 AM, Florian Weimer wrote:
>> The previous implementation could result in silent data corruption,
>> and this has been observed to happen with application code.
> 
> In principle I agree with the removal of all of the fallback fallocate
> code, it simply can't work reliably, and a reliable solution is ridiculously
> expensive (see Rich's comments in the BZ about CAS over all the mmap'd pages).

It's also not covered by the memory model, I think.

> The bug with O_APPEND files is real, and yet another reason to remove the
> fallback code.

We should handle that better at the very least.

We could clear O_APPEND, but only in single-threaded mode; I don't think
it's worth the effort.  Re-opening the descriptor through /proc/self/fd
does not work because closing that descriptor would release POSIX
advisory locks.

> What worries me though is that this change could break existing systems
> that relied on this emulation to do something sensible for filesystems
> that don't support fallocate. These binaries could easily be single threaded
> systems with no other process touching their files and writing to filesystems
> that don't support fallocate. If that is a sensible class of users, then we
> need to version the interface, with the old version continuing to call the
> fallback code and the new version not calling the fallback code.

After sleeping on your comments, I actually did my homework.  The gist
is that we cannot remove fallback, I think not even with the
compatibility symbol.

Various file systems do not support fallocate.  This includes NFS, where
even the most recent version makes it optional to implement in the server.

SQLite ignores the posix_fallocate return value, but MariaDB does not.
A recompiled MariaDB would suddenly start to fail, and the DBA would
have to disable pre-allocation in the configuration.  If I read the
source correctly, systemd-journald will stop logging, and there is no
knob to turn off fallocate.  Same for libvirt, it will fail to create
backing files for storage devices.

Both MariaDB and libvirt are often run on NFS storage, so a glibc change
which removes fallback would actually affect them.  For the code we
ship, we can move the fallback to the applications, but there is no good
way to make sure that happens with third-party applications.  I do not
believe the compatibility symbol mechanism is a good alternative because
the breakage will be file-system-dependent and may not be noticed during
testing.  (I'm generally skeptical of using compatibility symbols this way.)

Maybe we could remove the write loop and perform only an ftruncate call
which (hopefully) increases the file size.  This would take care of the
O_APPEND issue and remove most of the races.  Using posix_fallocate to
avoid ENOSPC later would not work, but with thin provisioning,
deduplicating storage and compression going around these days, I don't
think writing zero blocks has that effect in practice anyway
(particularly not on NFS).  I'll ask around.
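
What "moving the fallback to the applications" might look like is sketched below, under stated assumptions. The helper names `reserve_or_extend` and `demo_reserve` are hypothetical and not taken from MariaDB, libvirt, systemd or glibc. The sketch assumes the application is the only writer of the file, and it treats only EOPNOTSUPP (the value the patch passes through from the kernel) as "file system cannot preallocate". The point it illustrates is that `posix_fallocate` returns an error number rather than setting errno, and that the caller has to decide what an unsupported file system means for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical application-side helper (not part of glibc).  Returns 0
   or an error number, like posix_fallocate itself.  Assumes that
   offset + len does not overflow off_t and that no other thread or
   process resizes the file concurrently.  */
static int
reserve_or_extend (int fd, off_t offset, off_t len)
{
  if (offset < 0 || len <= 0)
    return EINVAL;

  /* posix_fallocate reports failure through its return value,
     not through errno.  */
  int err = posix_fallocate (fd, offset, len);
  if (err != EOPNOTSUPP)
    return err;

  /* The file system cannot preallocate.  Only make the file long
     enough; this reserves no blocks, so later writes can still
     fail with ENOSPC.  */
  struct stat st;
  if (fstat (fd, &st) != 0)
    return errno;
  if (st.st_size < offset + len && ftruncate (fd, offset + len) != 0)
    return errno;
  return 0;
}

/* Usage sketch: reserve 4096 bytes in a fresh temporary file and
   return the resulting file size, or -1 on failure.  */
static long
demo_reserve (void)
{
  char path[] = "/tmp/fallocate-reserve-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);

  long size = -1;
  struct stat st;
  if (reserve_or_extend (fd, 0, 4096) == 0 && fstat (fd, &st) == 0)
    size = (long) st.st_size;
  close (fd);
  return size;
}
```

The fstat-then-ftruncate step is only acceptable here because of the single-writer assumption; without it the helper has the same race that is discussed for the library fallback.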
Carlos O'Donell May 6, 2015, 8:58 p.m. UTC | #6
On 05/05/2015 04:48 PM, Christoph Hellwig wrote:
> On Tue, May 05, 2015 at 04:28:41PM -0400, Carlos O'Donell wrote:
>> The other side of the coin is that POSIX goes on further to say in
>> "2.9.7 Thread Interactions with Regular File Operations" that threads
>> should never see interleaving sets of file operations, but it is insane
>> to do anything like that because it kills performance, so you don't get
>> those guarantees in Linux.
> 
> Which specific guarantees do you see violated with a sane filesystem like
> XFS?

I have not verified that XFS behaves as is expected by POSIX, but I was 
going by Linus's comments when this issue was discussed and then fixed
in 3.14.

In particular:
http://article.gmane.org/gmane.linux.kernel/398249

With the original thread here:
http://thread.gmane.org/gmane.linux.kernel/397980

Would an fstat on XFS show the in-progress IO being done by a call to
write? If it does, then it violates POSIX, which requires that none
or all of the write show up in the fstat call.

The standard statement in question is:
~~~
2.9.7 Thread Interactions with Regular File Operations
All of the functions chmod( ), close( ), fchmod( ), fcntl( ), fstat( ), 
ftruncate( ), lseek( ), open( ), read( ), readlink( ), stat( ), symlink( ), 
and write( ) shall be atomic with respect to each other in the effects 
specified in IEEE Std 1003.1-2001 when they operate on regular files. If two 
threads each call one of these functions, each call shall either see all of 
the specified effects of the other call, or none of them.
~~~

Cheers,
Carlos.
Paul Eggert May 6, 2015, 10:48 p.m. UTC | #7
Florian Weimer wrote:
> Maybe we could remove the write loop and perform only an ftruncate call
> which (hopefully) increases the file size.  This would take care of the
> O_APPEND issue and remove most of the races.

I like this idea.

> Using posix_fallocate to
> avoid ENOSPC later would not work, but with thin provisioning,
> deduplicating storage and compression going around these days, I don't
> think writing zero blocks has that effect in practice anyway

That's right.

> (particularly not on NFS).

It's in draft NFS v4.2 as the ALLOCATE operation; see:

https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-38

This is pretty much bleeding-edge of course.
Rich Felker May 6, 2015, 11:29 p.m. UTC | #8
On Wed, May 06, 2015 at 04:58:56PM -0400, Carlos O'Donell wrote:
> On 05/05/2015 04:48 PM, Christoph Hellwig wrote:
> > On Tue, May 05, 2015 at 04:28:41PM -0400, Carlos O'Donell wrote:
> >> The other side of the coin is that POSIX goes on further to say in
> >> "2.9.7 Thread Interactions with Regular File Operations" that threads
> >> should never see interleaving sets of file operations, but it is insane
> >> to do anything like that because it kills performance, so you don't get
> >> those guarantees in Linux.
> > 
> > Which specific guarantees do you see violated with a sane filesystem like
> > XFS?
> 
> I have not verified that XFS behaves as is expected by POSIX, but I was 
> going by Linus's comments when this issue was discussed and then fixed
> in 3.14.
> 
> In particular:
> http://article.gmane.org/gmane.linux.kernel/398249
> 
> With the original thread here:
> http://thread.gmane.org/gmane.linux.kernel/397980
> 
> Would an fstat on XFS show the in-progress IO being done by a call to
> write? If it does, then it violates POSIX, which requires that none
> or all of the write show up in the fstat call.
> 
> The standard statement in question is:
> ~~~
> 2.9.7 Thread Interactions with Regular File Operations
> All of the functions chmod( ), close( ), fchmod( ), fcntl( ), fstat( ), 
> ftruncate( ), lseek( ), open( ), read( ), readlink( ), stat( ), symlink( ), 
> and write( ) shall be atomic with respect to each other in the effects 
> specified in IEEE Std 1003.1-2001 when they operate on regular files. If two 
> threads each call one of these functions, each call shall either see all of 
> the specified effects of the other call, or none of them.
> ~~~

I'm pretty sure Linux has a lot of bugs in this regard. Unless the
standard is to be relaxed, I think the right solution is either for
the kernel to simulate atomicity or to break out of the long write and
return a short write when another operation tries to access the file
state while it's in progress. Sadly there does not seem to be anything
userspace can do to work around the kernel bugs, though.

Rich
Rich Felker May 6, 2015, 11:30 p.m. UTC | #9
On Wed, May 06, 2015 at 03:48:38PM -0700, Paul Eggert wrote:
> Florian Weimer wrote:
> >Maybe we could remove the write loop and perform only an ftruncate call
> >which (hopefully) increases the file size.  This would take care of the
> >O_APPEND issue and remove most of the races.
> 
> I like this idea.

If I'm not mistaken ftruncate could still reduce the file size if it
races with another operation that would extend the file. This is also
a data loss bug.

Rich
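
The shrinking race can be made concrete with a single-threaded replay of the three steps in the order a bad interleaving would produce them. This is only a sketch: the helper name, the 100-byte target and the 200-byte write are arbitrary choices for illustration, and the "other writer" is played by the same thread so that the outcome is deterministic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo helper (not glibc code).  Replays an
   ftruncate-only "allocation" racing with a writer and returns the
   final file size, or -1 if the setup failed.  */
static long
ftruncate_race_final_size (void)
{
  char path[] = "/tmp/fallocate-race-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);

  const off_t target = 100;     /* offset + len requested by the caller.  */
  long result = -1;
  struct stat st;
  char data[200];
  memset (data, 'x', sizeof data);

  /* Step 1: the allocator sees an empty file and decides to grow it.  */
  if (fstat (fd, &st) != 0 || st.st_size >= target)
    goto out;

  /* Step 2: meanwhile another writer extends the file past the target.  */
  if (pwrite (fd, data, sizeof data, 0) != (ssize_t) sizeof data)
    goto out;

  /* Step 3: the allocator acts on its stale size and truncates.  */
  if (ftruncate (fd, target) != 0)
    goto out;

  if (fstat (fd, &st) == 0)
    result = (long) st.st_size;

 out:
  close (fd);
  return result;
}
```

The writer stored 200 bytes, but the file ends up 100 bytes long: the second half of the data is gone without any call having reported an error.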
Roland McGrath May 7, 2015, 6:19 p.m. UTC | #10
> If I'm not mistaken ftruncate could still reduce the file size if it
> races with another operation that would extend the file. This is also
> a data loss bug.

I concur.
Carlos O'Donell May 7, 2015, 7:01 p.m. UTC | #11
On 05/06/2015 03:19 AM, Florian Weimer wrote:
> On 05/05/2015 10:28 PM, Carlos O'Donell wrote:
>> On 04/24/2015 08:53 AM, Florian Weimer wrote:
>>> The previous implementation could result in silent data corruption,
>>> and this has been observed to happen with application code.
>>
>> In principle I agree with the removal of all of the fallback fallocate
>> code, it simply can't work reliably, and a reliable solution is ridiculously
>> expensive (see Rich's comments in the BZ about CAS over all the mmap'd pages).
> 
> It's also not covered by the memory model, I think.
> 
>> The bug with O_APPEND files is real, and yet another reason to remove the
>> fallback code.
> 
> We should handle that better at the very least.
> 
> We could clear O_APPEND, but only in single-threaded mode; I don't think
> it's worth the effort.  Re-opening the descriptor through /proc/self/fd
> does not work because closing that descriptor would release POSIX
> advisory locks.

I do not think we need to do that, and I agree with some of your comments
below.

Keep in mind that we need only ensure that subsequent writes succeed
and that the file is the right length on the filesystem. This in my mind
means we need only call `ftruncate` successfully.

>> What worries me though is that this change could break existing systems
>> that relied on this emulation to do something sensible for filesystems
>> that don't support fallocate. These binaries could easily be single threaded
>> systems with no other process touching their files and writing to filesystems
>> that don't support fallocate. If that is a sensible class of users, then we
>> need to version the interface, with the old version continuing to call the
>> fallback code and the new version not calling the fallback code.
> 
> After sleeping on your comments, I actually did my homework.  The gist
> is that we cannot remove fallback, I think not even with the
> compatibility symbol.
> 
> Various file systems do not support fallocate.  This includes NFS, where
> even the most recent version makes it optional to implement in the server.

OK.

> SQLite ignores the posix_fallocate return value, but MariaDB does not.
> A recompiled MariaDB would suddenly start to fail, and the DBA would
> have to disable pre-allocation in the configuration.  If I read the
> source correctly, systemd-journald will stop logging, and there is no
> knob to turn off fallocate.  Same for libvirt, it will fail to create
> backing files for storage devices.

OK.

> Both MariaDB and libvirt are often run on NFS storage, so a glibc change
> which removes fallback would actually affect them.  For the code we
> ship, we can move the fallback to the applications, but there is no good
> way to make sure that happens with third-party applications.  I do not
> believe the compatibility symbol mechanism is a good alternative because
> the breakage will be file-system-dependent and may not be noticed during
> testing.  (I'm generally skeptical of using compatibility symbols this way.)

That is a difference of opinion, but I buy your analysis, despite our best
efforts with compatibility symbols the NFS use case would remain and users
would see failures everywhere after a recompilation. It would not be prudent
of us to do this, and it is exactly what I worried about.

> Maybe we could remove the write loop and perform only an ftruncate call
> which (hopefully) increases the file size.  This would take care of the
> O_APPEND issue and remove most of the races.  Using posix_fallocate to
> avoid ENOSPC later would not work, but with thin provisioning,
> deduplicating storage and compression going around these days, I don't
> think writing zero blocks has that effect in practice anyway
> (particularly not on NFS).  I'll ask around.

I agree. I was thinking exactly the same thing when I saw the write loop.
Unfortunately only fallocate at the kernel fs layer is going to guarantee
you never see ENOSPC in all reasonable situations.

Cheers,
Carlos.
Florian Weimer May 7, 2015, 7:05 p.m. UTC | #12
On 05/07/2015 08:19 PM, Roland McGrath wrote:
>> If I'm not mistaken ftruncate could still reduce the file size if it
>> races with another operation that would extend the file. This is also
>> a data loss bug.
> 
> I concur.

It happens with length == 0.  We could error out with EINVAL instead of
calling ftruncate.

Daniel Berrange pointed me to these bugs:

  https://sourceware.org/bugzilla/show_bug.cgi?id=17322
  https://bugzilla.redhat.com/show_bug.cgi?id=1140250
  https://bugzilla.redhat.com/show_bug.cgi?id=1077068

This suggests that people actually rely on the current allocation
behavior.  Combined with my previous analysis that applications will
start to fail if we remove the fallback and return EINVAL, I now think
we need to keep the allocation loop.

I don't like this situation.  It's a strong argument against providing
approximate user-space emulation (setxid is another example, I'm sure
there are others).  These experiences may be relevant to the getrandom
debate.

I'm working on a patch with a few minor fixes to posix_fallocate and an
update to the manual.  I don't think we can do better at present,
unfortunately.
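
For reference, the corruption hazard in the allocation loop can be replayed in the same single-threaded way. The sketch below assumes a 512-byte block and probes its last byte, similar in spirit to the pread/pwrite pair in sysdeps/posix/posix_fallocate.c. The helper name and the byte value 'A' are made up for illustration, and the concurrent writer is again played by the same thread so that the interleaving is fixed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo helper (not glibc code).  Replays the fallback's
   read-then-write probe racing with a writer.  Returns the final
   value of the probed byte, or -1 if the setup failed.  */
static int
write_loop_race_final_byte (void)
{
  char path[] = "/tmp/fallocate-loop-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);

  const off_t probe = 511;      /* Last byte of an assumed 512-byte block.  */
  int result = -1;
  unsigned char c = 0xff;

  /* Start with a 512-byte file that reads back as zeros.  */
  if (ftruncate (fd, probe + 1) != 0)
    goto out;

  /* Step 1: the allocator reads the probe byte, sees zero, and
     concludes that it has to write to allocate the block.  */
  if (pread (fd, &c, 1, probe) != 1 || c != 0)
    goto out;

  /* Step 2: meanwhile another writer stores real data there.  */
  if (pwrite (fd, "A", 1, probe) != 1)
    goto out;

  /* Step 3: the allocator writes its zero byte over that data.  */
  if (pwrite (fd, "", 1, probe) != 1)
    goto out;

  if (pread (fd, &c, 1, probe) == 1)
    result = c;

 out:
  close (fd);
  return result;
}
```

The helper returns 0 rather than 'A': the application's byte was replaced by the allocator's zero, and neither side saw an error.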

Patch

diff --git a/ChangeLog b/ChangeLog
index b927022..9219d8b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,23 @@ 
 2015-04-24  Florian Weimer  <fweimer@redhat.com>
 
+	[BZ#15661]
+	* sysdeps/posix/posix_fallocate.c: Remove.
+	* sysdeps/posix/posix_fallocate64.c: Likewise.
+	* sysdeps/unix/sysv/linux/posix_fallocate.c (posix_fallocate):
+	Remove internal_fallocate function and fallback.
+	* sysdeps/unix/sysv/linux/posix_fallocate64.c
+	(__posix_fallocate64_l64): Likewise.  Establish aliases previously
+	defined in sysdeps/posix/posix_fallocate64.c.
+	* sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate.c
+	(posix_fallocate): Remove internal_fallocate function and
+	fallback.
+	* sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate64.c
+	(__posix_fallocate64_l64): Likewise.
+	* sysdeps/unix/sysv/linux/wordsize-64/posix_fallocate.c
+	(posix_fallocate): Likewise.
+
+2015-04-24  Florian Weimer  <fweimer@redhat.com>
+
 	* sysdeps/unix/sysv/linux/posix_fallocate.c (posix_fallocate):
 	Assume __ASSUME_FALLOCATE is always true.
 	* sysdeps/unix/sysv/linux/posix_fallocate64.c
diff --git a/NEWS b/NEWS
index ccc4d13..016629f 100644
--- a/NEWS
+++ b/NEWS
@@ -9,14 +9,14 @@  Version 2.22
 
 * The following bugs are resolved with this release:
 
-  4719, 6792, 13064, 14094, 14841, 14906, 15319, 15467, 15790, 15969, 16351,
-  16512, 16560, 16783, 16850, 17090, 17195, 17269, 17523, 17542, 17569,
-  17588, 17596, 17620, 17621, 17628, 17631, 17711, 17776, 17779, 17792,
-  17836, 17912, 17916, 17930, 17932, 17944, 17949, 17964, 17965, 17967,
-  17969, 17978, 17987, 17991, 17996, 17998, 17999, 18019, 18020, 18029,
-  18030, 18032, 18036, 18038, 18039, 18042, 18043, 18046, 18047, 18068,
-  18080, 18093, 18100, 18104, 18110, 18111, 18128, 18138, 18185, 18197,
-  18206, 18210, 18211, 18247, 18287.
+  4719, 6792, 13064, 14094, 14841, 14906, 15319, 15467, 15661, 15790, 15969,
+  16351, 16512, 16560, 16783, 16850, 17090, 17195, 17269, 17523, 17542,
+  17569, 17588, 17596, 17620, 17621, 17628, 17631, 17711, 17776, 17779,
+  17792, 17836, 17912, 17916, 17930, 17932, 17944, 17949, 17964, 17965,
+  17967, 17969, 17978, 17987, 17991, 17996, 17998, 17999, 18019, 18020,
+  18029, 18030, 18032, 18036, 18038, 18039, 18042, 18043, 18046, 18047,
+  18068, 18080, 18093, 18100, 18104, 18110, 18111, 18128, 18138, 18185,
+  18197, 18206, 18210, 18211, 18247, 18287.
 
 * A buffer overflow in gethostbyname_r and related functions performing DNS
   requests has been fixed.  If the NSS functions were called with a
@@ -25,6 +25,12 @@  Version 2.22
   potentially arbitrary code execution, using crafted, but syntactically
   valid DNS responses.  (CVE-2015-1781)
 
+* The fallback emulation of posix_fallocate and posix_fallocate64 was
+  removed because it could result in silent data corruption on file systems
+  which do not implement fallocate support in the kernel.  posix_fallocate
+  and posix_fallocate64 will now fail and return ENOTSUP if the file system
+  does not support fallocate operations.
+
 * A powerpc and powerpc64 optimization for TLS, similar to TLS descriptors
   for LD and GD on x86 and x86-64, has been implemented.  You will need
   binutils-2.24 or later to enable this optimization.
diff --git a/sysdeps/posix/posix_fallocate.c b/sysdeps/posix/posix_fallocate.c
deleted file mode 100644
index d15d603..0000000
--- a/sysdeps/posix/posix_fallocate.c
+++ /dev/null
@@ -1,93 +0,0 @@ 
-/* Copyright (C) 2000-2015 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <errno.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <sys/stat.h>
-#include <sys/statfs.h>
-
-/* Reserve storage for the data of the file associated with FD.  */
-
-int
-posix_fallocate (int fd, __off_t offset, __off_t len)
-{
-  struct stat64 st;
-  struct statfs f;
-
-  /* `off_t' is a signed type.  Therefore we can determine whether
-     OFFSET + LEN is too large if it is a negative value.  */
-  if (offset < 0 || len < 0)
-    return EINVAL;
-  if (offset + len < 0)
-    return EFBIG;
-
-  /* First thing we have to make sure is that this is really a regular
-     file.  */
-  if (__fxstat64 (_STAT_VER, fd, &st) != 0)
-    return EBADF;
-  if (S_ISFIFO (st.st_mode))
-    return ESPIPE;
-  if (! S_ISREG (st.st_mode))
-    return ENODEV;
-
-  if (len == 0)
-    {
-      if (st.st_size < offset)
-	{
-	  int ret = __ftruncate (fd, offset);
-
-	  if (ret != 0)
-	    ret = errno;
-	  return ret;
-	}
-      return 0;
-    }
-
-  /* We have to know the block size of the filesystem to get at least some
-     sort of performance.  */
-  if (__fstatfs (fd, &f) != 0)
-    return errno;
-
-  /* Try to play safe.  */
-  if (f.f_bsize == 0)
-    f.f_bsize = 512;
-
-  /* Write something to every block.  */
-  for (offset += (len - 1) % f.f_bsize; len > 0; offset += f.f_bsize)
-    {
-      len -= f.f_bsize;
-
-      if (offset < st.st_size)
-	{
-	  unsigned char c;
-	  ssize_t rsize = __pread (fd, &c, 1, offset);
-
-	  if (rsize < 0)
-	    return errno;
-	  /* If there is a non-zero byte, the block must have been
-	     allocated already.  */
-	  else if (rsize == 1 && c != 0)
-	    continue;
-	}
-
-      if (__pwrite (fd, "", 1, offset) != 1)
-	return errno;
-    }
-
-  return 0;
-}
diff --git a/sysdeps/posix/posix_fallocate64.c b/sysdeps/posix/posix_fallocate64.c
deleted file mode 100644
index b845df7..0000000
--- a/sysdeps/posix/posix_fallocate64.c
+++ /dev/null
@@ -1,113 +0,0 @@ 
-/* Copyright (C) 2000-2015 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include <errno.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <sys/stat.h>
-#include <sys/statfs.h>
-
-/* Reserve storage for the data of the file associated with FD.  */
-
-int
-__posix_fallocate64_l64 (int fd, __off64_t offset, __off64_t len)
-{
-  struct stat64 st;
-  struct statfs64 f;
-
-  /* `off64_t' is a signed type.  Therefore we can determine whether
-     OFFSET + LEN is too large if it is a negative value.  */
-  if (offset < 0 || len < 0)
-    return EINVAL;
-  if (offset + len < 0)
-    return EFBIG;
-
-  /* First thing we have to make sure is that this is really a regular
-     file.  */
-  if (__fxstat64 (_STAT_VER, fd, &st) != 0)
-    return EBADF;
-  if (S_ISFIFO (st.st_mode))
-    return ESPIPE;
-  if (! S_ISREG (st.st_mode))
-    return ENODEV;
-
-  if (len == 0)
-    {
-      if (st.st_size < offset)
-	{
-	  int ret = __ftruncate64 (fd, offset);
-
-	  if (ret != 0)
-	    ret = errno;
-	  return ret;
-	}
-      return 0;
-    }
-
-  /* We have to know the block size of the filesystem to get at least some
-     sort of performance.  */
-  if (__fstatfs64 (fd, &f) != 0)
-    return errno;
-
-  /* Try to play safe.  */
-  if (f.f_bsize == 0)
-    f.f_bsize = 512;
-
-  /* Write something to every block.  */
-  for (offset += (len - 1) % f.f_bsize; len > 0; offset += f.f_bsize)
-    {
-      len -= f.f_bsize;
-
-      if (offset < st.st_size)
-	{
-	  unsigned char c;
-	  ssize_t rsize = __libc_pread64 (fd, &c, 1, offset);
-
-	  if (rsize < 0)
-	    return errno;
-	  /* If there is a non-zero byte, the block must have been
-	     allocated already.  */
-	  else if (rsize == 1 && c != 0)
-	    continue;
-	}
-
-      if (__libc_pwrite64 (fd, "", 1, offset) != 1)
-	return errno;
-    }
-
-  return 0;
-}
-
-#undef __posix_fallocate64_l64
-#include <shlib-compat.h>
-#include <bits/wordsize.h>
-
-#if __WORDSIZE == 32 && SHLIB_COMPAT(libc, GLIBC_2_2, GLIBC_2_3_3)
-
-int
-attribute_compat_text_section
-__posix_fallocate64_l32 (int fd, off64_t offset, size_t len)
-{
-  return __posix_fallocate64_l64 (fd, offset, len);
-}
-
-versioned_symbol (libc, __posix_fallocate64_l64, posix_fallocate64,
-		  GLIBC_2_3_3);
-compat_symbol (libc, __posix_fallocate64_l32, posix_fallocate64, GLIBC_2_2);
-#else
-strong_alias (__posix_fallocate64_l64, posix_fallocate64);
-#endif
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate.c b/sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate.c
index a9c8d73..5d926f5 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate.c
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate.c
@@ -18,10 +18,6 @@ 
 #include <fcntl.h>
 #include <sysdep.h>
 
-#define posix_fallocate static internal_fallocate
-#include <sysdeps/posix/posix_fallocate.c>
-#undef posix_fallocate
-
 /* Reserve storage for the data of the file associated with FD.  */
 int
 posix_fallocate (int fd, __off_t offset, __off_t len)
@@ -31,7 +27,5 @@  posix_fallocate (int fd, __off_t offset, __off_t len)
 
   if (! INTERNAL_SYSCALL_ERROR_P (res, err))
     return 0;
-  if (INTERNAL_SYSCALL_ERRNO (res, err) != EOPNOTSUPP)
-    return INTERNAL_SYSCALL_ERRNO (res, err);
-  return internal_fallocate (fd, offset, len);
+  return INTERNAL_SYSCALL_ERRNO (res, err);
 }
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate64.c b/sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate64.c
index 503e918..5d3a636 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate64.c
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/posix_fallocate64.c
@@ -18,11 +18,6 @@ 
 #include <fcntl.h>
 #include <sysdep.h>
 
-extern int __posix_fallocate64_l64 (int fd, __off64_t offset, __off64_t len);
-#define __posix_fallocate64_l64 static internal_fallocate64
-#include <sysdeps/posix/posix_fallocate64.c>
-#undef __posix_fallocate64_l64
-
 /* Reserve storage for the data of the file associated with FD.  */
 int
 __posix_fallocate64_l64 (int fd, __off64_t offset, __off64_t len)
@@ -32,7 +27,5 @@  __posix_fallocate64_l64 (int fd, __off64_t offset, __off64_t len)
 
   if (! INTERNAL_SYSCALL_ERROR_P (res, err))
     return 0;
-  if (INTERNAL_SYSCALL_ERRNO (res, err) != EOPNOTSUPP)
-    return INTERNAL_SYSCALL_ERRNO (res, err);
-  return internal_fallocate64 (fd, offset, len);
+  return INTERNAL_SYSCALL_ERRNO (res, err);
 }
diff --git a/sysdeps/unix/sysv/linux/posix_fallocate.c b/sysdeps/unix/sysv/linux/posix_fallocate.c
index 4587029..b6124db 100644
--- a/sysdeps/unix/sysv/linux/posix_fallocate.c
+++ b/sysdeps/unix/sysv/linux/posix_fallocate.c
@@ -18,10 +18,6 @@ 
 #include <fcntl.h>
 #include <sysdep.h>
 
-#define posix_fallocate static internal_fallocate
-#include <sysdeps/posix/posix_fallocate.c>
-#undef posix_fallocate
-
 /* Reserve storage for the data of the file associated with FD.  */
 int
 posix_fallocate (int fd, __off_t offset, __off_t len)
@@ -33,7 +29,5 @@  posix_fallocate (int fd, __off_t offset, __off_t len)
 
   if (! INTERNAL_SYSCALL_ERROR_P (res, err))
     return 0;
-  if (INTERNAL_SYSCALL_ERRNO (res, err) != EOPNOTSUPP)
-    return INTERNAL_SYSCALL_ERRNO (res, err);
-  return internal_fallocate (fd, offset, len);
+  return INTERNAL_SYSCALL_ERRNO (res, err);
 }
diff --git a/sysdeps/unix/sysv/linux/posix_fallocate64.c b/sysdeps/unix/sysv/linux/posix_fallocate64.c
index 771e59c..97c5a57 100644
--- a/sysdeps/unix/sysv/linux/posix_fallocate64.c
+++ b/sysdeps/unix/sysv/linux/posix_fallocate64.c
@@ -15,14 +15,11 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include <bits/wordsize.h>
 #include <fcntl.h>
+#include <shlib-compat.h>
 #include <sysdep.h>
 
-extern int __posix_fallocate64_l64 (int fd, __off64_t offset, __off64_t len);
-#define __posix_fallocate64_l64 static internal_fallocate64
-#include <sysdeps/posix/posix_fallocate64.c>
-#undef __posix_fallocate64_l64
-
 /* Reserve storage for the data of the file associated with FD.  */
 int
 __posix_fallocate64_l64 (int fd, __off64_t offset, __off64_t len)
@@ -36,7 +33,20 @@  __posix_fallocate64_l64 (int fd, __off64_t offset, __off64_t len)
 
   if (! INTERNAL_SYSCALL_ERROR_P (res, err))
     return 0;
-  if (INTERNAL_SYSCALL_ERRNO (res, err) != EOPNOTSUPP)
-    return INTERNAL_SYSCALL_ERRNO (res, err);
-  return internal_fallocate64 (fd, offset, len);
+  return INTERNAL_SYSCALL_ERRNO (res, err);
+}
+
+#if __WORDSIZE == 32 && SHLIB_COMPAT(libc, GLIBC_2_2, GLIBC_2_3_3)
+int
+attribute_compat_text_section
+__posix_fallocate64_l32 (int fd, off64_t offset, size_t len)
+{
+  return __posix_fallocate64_l64 (fd, offset, len);
 }
+
+versioned_symbol (libc, __posix_fallocate64_l64, posix_fallocate64,
+		  GLIBC_2_3_3);
+compat_symbol (libc, __posix_fallocate64_l32, posix_fallocate64, GLIBC_2_2);
+#else
+strong_alias (__posix_fallocate64_l64, posix_fallocate64);
+#endif
diff --git a/sysdeps/unix/sysv/linux/wordsize-64/posix_fallocate.c b/sysdeps/unix/sysv/linux/wordsize-64/posix_fallocate.c
index 8ae8a29..992d8cb 100644
--- a/sysdeps/unix/sysv/linux/wordsize-64/posix_fallocate.c
+++ b/sysdeps/unix/sysv/linux/wordsize-64/posix_fallocate.c
@@ -16,13 +16,10 @@ 
    <http://www.gnu.org/licenses/>.  */
 
 #include <fcntl.h>
+#include <errno.h>
 #include <kernel-features.h>
 #include <sysdep.h>
 
-#define posix_fallocate static internal_fallocate
-#include <sysdeps/posix/posix_fallocate.c>
-#undef posix_fallocate
-
 /* The alpha architecture introduced the fallocate system call in
    2.6.33-rc1, so we still need the fallback code.  */
 #if !defined __ASSUME_FALLOCATE && defined __NR_fallocate
@@ -56,11 +53,10 @@  posix_fallocate (int fd, __off_t offset, __off_t len)
 	__have_fallocate = -1;
       else
 # endif
-	if (INTERNAL_SYSCALL_ERRNO (res, err) != EOPNOTSUPP)
-	  return INTERNAL_SYSCALL_ERRNO (res, err);
+	return INTERNAL_SYSCALL_ERRNO (res, err);
     }
 #endif
 
-  return internal_fallocate (fd, offset, len);
+  return ENOSYS;
 }
 weak_alias (posix_fallocate, posix_fallocate64)