libc: add fallback for posix_fallocate() when not supported by underlying FS

Submitted by Olivier Blin on Sept. 29, 2014, 2:43 p.m.

Details

Message ID 1412001812-9943-1-git-send-email-olivier.blin@softathome.com
State New
Headers show

Commit Message

Olivier Blin Sept. 29, 2014, 2:43 p.m.
EOPNOTSUPP is not a valid error code for posix_fallocate(), the
implementation should have a fallback when the underlying filesystem
does not support fallocate().

This is especially useful when using posix_fallocate() on tmpfs with
kernels older than 3.5, for which there was no fallocate support.

This copies the implementation of the internal fallback from glibc,
with a few adaptations for internals symbols.

Though, there are no internal symbols for the following functions in
uClibc:
- fstat64
- fstatfs64 (not important, since it is not weak)
- ftruncate/ftruncate64
---
 libc/sysdeps/linux/common/posix_fallocate.c   | 78 ++++++++++++++++++++++++++-
 libc/sysdeps/linux/common/posix_fallocate64.c | 78 ++++++++++++++++++++++++++-
 2 files changed, 154 insertions(+), 2 deletions(-)

Comments

Rich Felker Sept. 29, 2014, 4:42 p.m.
On Mon, Sep 29, 2014 at 04:43:32PM +0200, Olivier Blin wrote:
> EOPNOTSUPP is not a valid error code for posix_fallocate(), the
> implementation should have a fallback when the underlying filesystem
> does not support fallocate().
> 
> This is especially useful when using posix_fallocate() on tmpfs with
> kernels older than 3.5, for which there was no fallocate support.
> 
> This copies the implementation of the internal fallback from glibc,
> with a few adaptations for internals symbols.

This code is dangerous (has race conditions that cause file
corruption) and should not be added. See the bug report for the
corresponding code in glibc:

https://sourceware.org/bugzilla/show_bug.cgi?id=15661

Rich
Olivier Blin Sept. 29, 2014, 5:39 p.m.
On 09/29/14 18:42, Rich Felker wrote:
> On Mon, Sep 29, 2014 at 04:43:32PM +0200, Olivier Blin wrote:
> > EOPNOTSUPP is not a valid error code for posix_fallocate(), the
> > implementation should have a fallback when the underlying filesystem
> > does not support fallocate().
> >
> > This is especially useful when using posix_fallocate() on tmpfs with
> > kernels older than 3.5, for which there was no fallocate support.
> >
> > This copies the implementation of the internal fallback from glibc,
> > with a few adaptations for internals symbols.
>
> This code is dangerous (has race conditions that cause file
> corruption) and should not be added. See the bug report for the
> corresponding code in glibc:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=15661
>
> Rich
Indeed, this is dangerous.

According to the specification of posix_fallocate(), there does not seem 
to be an appropriate error code to return when the filesystem driver 
does not natively support fallocate().
In this case, the libc should probably provide a fallback implementation.

Users of posix_fallocate() do not expect it to fail if the FS does not 
support fallocate().
See for example the usage in wayland:
http://cgit.freedesktop.org/wayland/wayland/tree/cursor/os-compatibility.c?h=1.6&id=1.6.0#n79 


To have a better fallback, I don't see a way to avoid the extra child 
process either, since we can only do the atomic compare-and-swap through 
mmap.
Does anyone have a better suggestion to implement the fallback?

Thanks for bringing this up
Anthony Basile Sept. 30, 2014, 9:21 p.m.
On 09/29/14 13:39, Olivier Blin wrote:
> On 09/29/14 18:42, Rich Felker wrote:
>> On Mon, Sep 29, 2014 at 04:43:32PM +0200, Olivier Blin wrote:
>> > EOPNOTSUPP is not a valid error code for posix_fallocate(), the
>> > implementation should have a fallback when the underlying filesystem
>> > does not support fallocate().
>> >
>> > This is especially useful when using posix_fallocate() on tmpfs with
>> > kernels older than 3.5, for which there was no fallocate support.
>> >
>> > This copies the implementation of the internal fallback from glibc,
>> > with a few adaptations for internals symbols.
>>
>> This code is dangerous (has race conditions that cause file
>> corruption) and should not be added. See the bug report for the
>> corresponding code in glibc:
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=15661
>>
>> Rich
> Indeed, this is dangerous.
>
> According to the specification of posix_fallocate(), there does not seem
> to be an appropriate error code to return when the filesystem driver
> does not natively support fallocate().
> In this case, the libc should probably provide a fallback implementation.

Why can't there be an implementation specific error message here since 
the specs are silent on the issue?  I see no problem with returning 
ENOTSUP even though its not specified in [1].

>
> Users of posix_fallocate() do not expect it to fail if the FS does not
> support fallocate().

The specs don't guarantee that.  A better approach would be to check for 
the errors specified in [1], in say a switch-case, and then have a 
default which is some other "unspecified" error.

> See for example the usage in wayland:
> http://cgit.freedesktop.org/wayland/wayland/tree/cursor/os-compatibility.c?h=1.6&id=1.6.0#n79
>

Sounds like a bug in wayland.  glibc might be creating some unreasonable 
expectations.

>
> To have a better fallback, I don't see a way to avoid the extra child
> process either, since we can only do the atomic compare-and-swap through
> mmap.
> Does anyone have a better suggestion to implement the fallback?

I'm not sure there is any good fallback if the fallocate syscall is not 
available.  Again, it seems reasonable to me to just let it fail rather 
than trying to implement it via some brute force.

Ref.
[1] 
http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html
Rich Felker Sept. 30, 2014, 10:01 p.m.
On Tue, Sep 30, 2014 at 05:21:43PM -0400, Anthony G. Basile wrote:
> On 09/29/14 13:39, Olivier Blin wrote:
> >On 09/29/14 18:42, Rich Felker wrote:
> >>On Mon, Sep 29, 2014 at 04:43:32PM +0200, Olivier Blin wrote:
> >>> EOPNOTSUPP is not a valid error code for posix_fallocate(), the
> >>> implementation should have a fallback when the underlying filesystem
> >>> does not support fallocate().
> >>>
> >>> This is especially useful when using posix_fallocate() on tmpfs with
> >>> kernels older than 3.5, for which there was no fallocate support.
> >>>
> >>> This copies the implementation of the internal fallback from glibc,
> >>> with a few adaptations for internals symbols.
> >>
> >>This code is dangerous (has race conditions that cause file
> >>corruption) and should not be added. See the bug report for the
> >>corresponding code in glibc:
> >>
> >>https://sourceware.org/bugzilla/show_bug.cgi?id=15661
> >>
> >>Rich
> >Indeed, this is dangerous.
> >
> >According to the specification of posix_fallocate(), there does not seem
> >to be an appropriate error code to return when the filesystem driver
> >does not natively support fallocate().
> >In this case, the libc should probably provide a fallback implementation.
> 
> Why can't there be an implementation specific error message here
> since the specs are silent on the issue?  I see no problem with
> returning ENOTSUP even though its not specified in [1].

For details see XSH 2.3 Error Numbers:

"Implementations may support additional errors not included in this
list, may generate errors included in this list under circumstances
other than those described here, or may contain extensions or
limitations that prevent some errors from occurring.

...

Implementations may generate error numbers listed here under
circumstances other than those described, if and only if all those
error conditions can always be treated identically to the error
conditions as described in this volume of POSIX.1-2008.
Implementations shall not generate a different error number from one
required by this volume of POSIX.1-2008 for an error condition
described in this volume of POSIX.1-2008, but may generate additional
errors unless explicitly disallowed for a particular function."

Source:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_03

Rich

Patch hide | download patch | download mbox

diff --git a/libc/sysdeps/linux/common/posix_fallocate.c b/libc/sysdeps/linux/common/posix_fallocate.c
index 76771e3..cc4553c 100644
--- a/libc/sysdeps/linux/common/posix_fallocate.c
+++ b/libc/sysdeps/linux/common/posix_fallocate.c
@@ -12,12 +12,88 @@ 
 #include <fcntl.h>
 #include <bits/kernel-features.h>
 #include <stdint.h>
+#include <sys/statfs.h>
 
 #if defined __NR_fallocate
+/* Reserve storage for the data of the file associated with FD.  */
+/* Adapted from glibc */
+int
+internal_fallocate (int fd, __off_t offset, __off_t len)
+{
+  struct stat64 st;
+  struct statfs f;
+
+  /* `off_t' is a signed type.  Therefore we can determine whether
+     OFFSET + LEN is too large if it is a negative value.  */
+  if (offset < 0 || len < 0)
+    return EINVAL;
+  if (offset + len < 0)
+    return EFBIG;
+
+  /* First thing we have to make sure is that this is really a regular
+     file.  */
+  if (fstat64 (fd, &st) != 0)
+    return EBADF;
+  if (S_ISFIFO (st.st_mode))
+    return ESPIPE;
+  if (! S_ISREG (st.st_mode))
+    return ENODEV;
+
+  if (len == 0)
+    {
+      if (st.st_size < offset)
+	{
+	  int ret = ftruncate (fd, offset);
+
+	  if (ret != 0)
+	    ret = errno;
+	  return ret;
+	}
+      return 0;
+    }
+
+  /* We have to know the block size of the filesystem to get at least some
+     sort of performance.  */
+  if (__libc_fstatfs (fd, &f) != 0)
+    return errno;
+
+  /* Try to play safe.  */
+  if (f.f_bsize == 0)
+    f.f_bsize = 512;
+
+  /* Write something to every block.  */
+  for (offset += (len - 1) % f.f_bsize; len > 0; offset += f.f_bsize)
+    {
+      len -= f.f_bsize;
+
+      if (offset < st.st_size)
+	{
+	  unsigned char c;
+	  ssize_t rsize = __libc_pread (fd, &c, 1, offset);
+
+	  if (rsize < 0)
+	    return errno;
+	  /* If there is a non-zero byte, the block must have been
+	     allocated already.  */
+	  else if (rsize == 1 && c != 0)
+	    continue;
+	}
+
+      if (__libc_pwrite (fd, "", 1, offset) != 1)
+	return errno;
+    }
+
+  return 0;
+}
+
 extern __typeof(fallocate) __libc_fallocate attribute_hidden;
 int posix_fallocate(int fd, __off_t offset, __off_t len)
 {
-	return __libc_fallocate(fd, 0, offset, len);
+	int result = __libc_fallocate(fd, 0, offset, len);
+	if (result != EOPNOTSUPP)
+		return result;
+
+	return internal_fallocate(fd, offset, len);
 }
 # if defined __UCLIBC_HAS_LFS__ && __WORDSIZE == 64
 strong_alias(posix_fallocate,posix_fallocate64)
diff --git a/libc/sysdeps/linux/common/posix_fallocate64.c b/libc/sysdeps/linux/common/posix_fallocate64.c
index 12ddbc2..2f02626 100644
--- a/libc/sysdeps/linux/common/posix_fallocate64.c
+++ b/libc/sysdeps/linux/common/posix_fallocate64.c
@@ -12,15 +12,91 @@ 
 #include <fcntl.h>
 #include <bits/kernel-features.h>
 #include <stdint.h>
+#include <sys/statfs.h>
 
 #if defined __NR_fallocate
 # if __WORDSIZE == 64
 /* Can use normal posix_fallocate() */
 # elif __WORDSIZE == 32
+/* Reserve storage for the data of the file associated with FD.  */
+/* Adapted from glibc */
+static int
+internal_fallocate64 (int fd, __off64_t offset, __off64_t len)
+{
+  struct stat64 st;
+  struct statfs64 f;
+
+  /* `off64_t' is a signed type.  Therefore we can determine whether
+     OFFSET + LEN is too large if it is a negative value.  */
+  if (offset < 0 || len < 0)
+    return EINVAL;
+  if (offset + len < 0)
+    return EFBIG;
+
+  /* First thing we have to make sure is that this is really a regular
+     file.  */
+  if (fstat64 (fd, &st) != 0)
+    return EBADF;
+  if (S_ISFIFO (st.st_mode))
+    return ESPIPE;
+  if (! S_ISREG (st.st_mode))
+    return ENODEV;
+
+  if (len == 0)
+    {
+      if (st.st_size < offset)
+	{
+	  int ret = ftruncate64 (fd, offset);
+
+	  if (ret != 0)
+	    ret = errno;
+	  return ret;
+	}
+      return 0;
+    }
+
+  /* We have to know the block size of the filesystem to get at least some
+     sort of performance.  */
+  if (fstatfs64 (fd, &f) != 0)
+    return errno;
+
+  /* Try to play safe.  */
+  if (f.f_bsize == 0)
+    f.f_bsize = 512;
+
+  /* Write something to every block.  */
+  for (offset += (len - 1) % f.f_bsize; len > 0; offset += f.f_bsize)
+    {
+      len -= f.f_bsize;
+
+      if (offset < st.st_size)
+	{
+	  unsigned char c;
+	  ssize_t rsize = __libc_pread64 (fd, &c, 1, offset);
+
+	  if (rsize < 0)
+	    return errno;
+	  /* If there is a non-zero byte, the block must have been
+	     allocated already.  */
+	  else if (rsize == 1 && c != 0)
+	    continue;
+	}
+
+      if (__libc_pwrite64 (fd, "", 1, offset) != 1)
+	return errno;
+    }
+
+  return 0;
+}
+
 extern __typeof(fallocate64) __libc_fallocate64 attribute_hidden;
 int posix_fallocate64(int fd, __off64_t offset, __off64_t len)
 {
-	return __libc_fallocate64(fd, 0, offset, len);
+	int result = __libc_fallocate64(fd, 0, offset, len);
+	if (result != EOPNOTSUPP)
+		return result;
+
+	return internal_fallocate64(fd, offset, len);
 }
 # else
 #  error your machine is neither 32 bit or 64 bit ... it must be magical