io: Remove copy_file_range emulation

Message ID 871rzesyvp.fsf@oldenburg2.str.redhat.com
State New

Commit Message

Florian Weimer June 27, 2019, 5:48 p.m. UTC
The kernel is evolving this interface (e.g., removal of the
restriction on cross-device copies), and keeping up with that
is difficult.  Applications which need the function should
run kernels which support the system call instead of relying on
the imperfect glibc emulation.

2019-06-27  Florian Weimer  <fweimer@redhat.com>

	io: Remove the copy_file_range emulation.
	* sysdeps/unix/sysv/linux/copy_file_range.c (copy_file_range): Do
	not define and call copy_file_range_compat.
	* io/Makefile (tests-static, tests-internal): Do not add
	tst-copy_file_range-compat.
	* io/copy_file_range-compat.c: Remove file.
	* io/copy_file_range.c (copy_file_range): Define as stub.
	* io/tst-copy_file_range-compat.c: Remove file.
	* io/tst-copy_file_range.c (xdevfile): Remove variable.
	(typical_sizes): Update comment.  Remove 16K sizes.
	(maximum_offset, maximum_offset_errno, maximum_offset_hard_limit):
	Remove variables.
	(find_maximum_offset, pipe_as_source, pipe_as_destination)
	(delayed_write_failure_beginning, delayed_write_failure_end)
	(cross_device_failure, enospc_failure_1, enospc_failure)
	(oappend_failure): Remove functions.
	(tests): Adjust test case list.
	(do_test): Remove file system search code.  Check for ENOSYS from
	copy_file_range.  Do not free xdevfile.
	* manual/llio.texi (Copying File Data): Document ENOSYS error from
	copy_file_range.  Do not document the EXDEV error, which future
	kernels may not report.  Update the wording to reflect that
	further errors are possible.
	* sysdeps/unix/sysv/linux/alpha/kernel-features.h
	[__LINUX_KERNEL_VERSION < 0x040D00] (__ASSUME_COPY_FILE_RANGE): Do
	not undefine.
	* sysdeps/unix/sysv/linux/arm/kernel-features.h
	[__LINUX_KERNEL_VERSION < 0x040700] (__ASSUME_COPY_FILE_RANGE):
	Likewise.
	* sysdeps/unix/sysv/linux/kernel-features.h
	[__LINUX_KERNEL_VERSION >= 0x040500] (__ASSUME_COPY_FILE_RANGE):
	Remove definition.
	* sysdeps/unix/sysv/linux/microblaze/kernel-features.h
	[__LINUX_KERNEL_VERSION < 0x040A00] (__ASSUME_COPY_FILE_RANGE): Do
	not undefine.
	* sysdeps/unix/sysv/linux/sh/kernel-features.h
	[__LINUX_KERNEL_VERSION < 0x040800] (__ASSUME_COPY_FILE_RANGE):
	Likewise.
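
The hexadecimal thresholds in the kernel-features.h entries above are easier to read once decoded. The following is a minimal sketch, not part of the patch, assuming the usual (major << 16) | (minor << 8) | patch packing for __LINUX_KERNEL_VERSION; the helper names are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helpers, not glibc interfaces.  They assume the
   version is packed as (major << 16) | (minor << 8) | patch.  */
static unsigned int
kernel_version_major (unsigned int v)
{
  return (v >> 16) & 0xff;
}

static unsigned int
kernel_version_minor (unsigned int v)
{
  return (v >> 8) & 0xff;
}

static unsigned int
kernel_version_patch (unsigned int v)
{
  return v & 0xff;
}

/* Write "major.minor.patch" for V into BUF, which has SIZE bytes.  */
static void
format_kernel_version (unsigned int v, char *buf, size_t size)
{
  snprintf (buf, size, "%u.%u.%u", kernel_version_major (v),
            kernel_version_minor (v), kernel_version_patch (v));
}
```

Under that assumption, 0x040500 reads as 4.5.0, 0x040700 as 4.7.0, 0x040800 as 4.8.0, 0x040A00 as 4.10.0, and 0x040D00 as 4.13.0.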

Comments

Adhemerval Zanella Netto June 27, 2019, 7:22 p.m. UTC | #1
On 27/06/2019 14:48, Florian Weimer wrote:
> diff --git a/NEWS b/NEWS
> index 8a2fecef47..3961715106 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -36,6 +36,13 @@ Major new features:
>  
>  Deprecated and removed features, and other changes affecting compatibility:
>  
> +* The copy_file_range function fails with ENOSYS if the kernel does not
> +  support the system call of the same name.  Previously, user space
> +  emulation was performed, but its behavior did not match the kernel
> +  behavior, which was deemed too confusing.  Applications which use the
> +  copy_file_range function will have to be run on kernels which implement
> +  the copy_file_range system call.
> +
>  * The functions clock_gettime, clock_getres, clock_settime,
>    clock_getcpuclockid, clock_nanosleep were removed from the librt library
>    for new applications (on architectures which had them).  Instead, the

Maybe add that the minimum kernel version that implements the syscall is Linux 4.5.
Do you plan to backport it to 2.27, 2.28, and/or 2.29?

Michael, we will need to update the man pages. 

LGTM with a nit below regarding a gratuitous extra line.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
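
To make the NEWS entry concrete for callers: with the emulation gone, an application that must also run on kernels without the system call has to handle ENOSYS itself. The following is a minimal sketch of one way to do that, not part of the patch and not the removed glibc code. The helper name copy_bytes_with_fallback is hypothetical, it copies at the current file offsets only, and it falls back only on ENOSYS; whether to fall back on other errors (for example EXDEV) is left to the caller.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to LENGTH bytes from INFD to OUTFD at
   their current file offsets.  Returns the number of bytes copied,
   which may be less than LENGTH, or -1 with errno set.  */
static ssize_t
copy_bytes_with_fallback (int infd, int outfd, size_t length)
{
  ssize_t ret = copy_file_range (infd, NULL, outfd, NULL, length, 0);
  if (ret >= 0 || errno != ENOSYS)
    return ret;

  /* The system call is unavailable.  Do one bounded read followed by
     writes until the data read has been written.  */
  char buf[8192];
  if (length > sizeof (buf))
    length = sizeof (buf);
  ssize_t nread = read (infd, buf, length);
  if (nread <= 0)
    return nread;

  ssize_t done = 0;
  while (done < nread)
    {
      ssize_t nwritten = write (outfd, buf + done, nread - done);
      if (nwritten < 0)
        {
          if (errno == EINTR)
            continue;
          /* Best effort: rewind the input by the unwritten part so
             that the caller can resume after the error.  */
          int saved_errno = errno;
          (void) lseek (infd, -(off_t) (nread - done), SEEK_CUR);
          errno = saved_errno;
          return done > 0 ? done : -1;
        }
      done += nwritten;
    }
  return done;
}
```

Like copy_file_range itself, the helper may copy fewer bytes than requested, so a caller would normally invoke it in a loop until it returns 0 or -1.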

> diff --git a/io/Makefile b/io/Makefile
> index 088e86da77..ac3e29e1ba 100644
> --- a/io/Makefile
> +++ b/io/Makefile
> @@ -75,11 +75,6 @@ tests		:= test-utime test-stat test-stat2 test-lfs tst-getcwd \
>  		   tst-fts tst-fts-lfs tst-open-tmpfile \
>  		   tst-copy_file_range tst-getcwd-abspath tst-lockf
>  
> -# This test includes the compat implementation of copy_file_range,
> -# which uses internal, unexported libc functions.
> -tests-static += tst-copy_file_range-compat
> -tests-internal += tst-copy_file_range-compat
> -
>  # Likewise for statx, but we do not need static linking here.
>  tests-internal += tst-statx
>  

Ok.

> diff --git a/io/copy_file_range-compat.c b/io/copy_file_range-compat.c
> deleted file mode 100644
> index 58dbeef3e9..0000000000
> --- a/io/copy_file_range-compat.c
> +++ /dev/null
> @@ -1,160 +0,0 @@
> -/* Emulation of copy_file_range.
> -   Copyright (C) 2017-2019 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -/* The following macros should be defined before including this
> -   file:
> -
> -   COPY_FILE_RANGE_DECL   Declaration specifiers for the function below.
> -   COPY_FILE_RANGE        Name of the function to define.  */
> -
> -#include <errno.h>
> -#include <fcntl.h>
> -#include <inttypes.h>
> -#include <limits.h>
> -#include <sys/stat.h>
> -#include <sys/types.h>
> -#include <unistd.h>
> -
> -COPY_FILE_RANGE_DECL
> -ssize_t
> -COPY_FILE_RANGE (int infd, __off64_t *pinoff,
> -                 int outfd, __off64_t *poutoff,
> -                 size_t length, unsigned int flags)
> -{
> -  if (flags != 0)
> -    {
> -      __set_errno (EINVAL);
> -      return -1;
> -    }
> -
> -  {
> -    struct stat64 instat;
> -    struct stat64 outstat;
> -    if (fstat64 (infd, &instat) != 0 || fstat64 (outfd, &outstat) != 0)
> -      return -1;
> -    if (S_ISDIR (instat.st_mode) || S_ISDIR (outstat.st_mode))
> -      {
> -        __set_errno (EISDIR);
> -        return -1;
> -      }
> -    if (!S_ISREG (instat.st_mode) || !S_ISREG (outstat.st_mode))
> -      {
> -        /* We need a regular input file so that the we can seek
> -           backwards in case of a write failure.  */
> -        __set_errno (EINVAL);
> -        return -1;
> -      }
> -    if (instat.st_dev != outstat.st_dev)
> -      {
> -        /* Cross-device copies are not supported.  */
> -        __set_errno (EXDEV);
> -        return -1;
> -      }
> -  }
> -
> -  /* The output descriptor must not have O_APPEND set.  */
> -  {
> -    int flags = __fcntl (outfd, F_GETFL);
> -    if (flags & O_APPEND)
> -      {
> -        __set_errno (EBADF);
> -        return -1;
> -      }
> -  }
> -
> -  /* Avoid an overflow in the result.  */
> -  if (length > SSIZE_MAX)
> -    length = SSIZE_MAX;
> -
> -  /* Main copying loop.  The buffer size is arbitrary and is a
> -     trade-off between stack size consumption, cache usage, and
> -     amortization of system call overhead.  */
> -  size_t copied = 0;
> -  char buf[8192];
> -  while (length > 0)
> -    {
> -      size_t to_read = length;
> -      if (to_read > sizeof (buf))
> -        to_read = sizeof (buf);
> -
> -      /* Fill the buffer.  */
> -      ssize_t read_count;
> -      if (pinoff == NULL)
> -        read_count = read (infd, buf, to_read);
> -      else
> -        read_count = __libc_pread64 (infd, buf, to_read, *pinoff);
> -      if (read_count == 0)
> -        /* End of file reached prematurely.  */
> -        return copied;
> -      if (read_count < 0)
> -        {
> -          if (copied > 0)
> -            /* Report the number of bytes copied so far.  */
> -            return copied;
> -          return -1;
> -        }
> -      if (pinoff != NULL)
> -        *pinoff += read_count;
> -
> -      /* Write the buffer part which was read to the destination.  */
> -      char *end = buf + read_count;
> -      for (char *p = buf; p < end; )
> -        {
> -          ssize_t write_count;
> -          if (poutoff == NULL)
> -            write_count = write (outfd, p, end - p);
> -          else
> -            write_count = __libc_pwrite64 (outfd, p, end - p, *poutoff);
> -          if (write_count < 0)
> -            {
> -              /* Adjust the input read position to match what we have
> -                 written, so that the caller can pick up after the
> -                 error.  */
> -              size_t written = p - buf;
> -              /* NB: This needs to be signed so that we can form the
> -                 negative value below.  */
> -              ssize_t overread = read_count - written;
> -              if (pinoff == NULL)
> -                {
> -                  if (overread > 0)
> -                    {
> -                      /* We are on an error recovery path, so we
> -                         cannot deal with failure here.  */
> -                      int save_errno = errno;
> -                      (void) __libc_lseek64 (infd, -overread, SEEK_CUR);
> -                      __set_errno (save_errno);
> -                    }
> -                }
> -              else /* pinoff != NULL */
> -                *pinoff -= overread;
> -
> -              if (copied + written > 0)
> -                /* Report the number of bytes copied so far.  */
> -                return copied + written;
> -              return -1;
> -            }
> -          p += write_count;
> -          if (poutoff != NULL)
> -            *poutoff += write_count;
> -        } /* Write loop.  */
> -
> -      copied += read_count;
> -      length -= read_count;
> -    }
> -  return copied;
> -}

Ok.

> diff --git a/io/copy_file_range.c b/io/copy_file_range.c
> index 7b968be19d..59fb979773 100644
> --- a/io/copy_file_range.c
> +++ b/io/copy_file_range.c
> @@ -1,4 +1,4 @@
> -/* Generic implementation of copy_file_range.
> +/* Stub implementation of copy_file_range.
>     Copyright (C) 2017-2019 Free Software Foundation, Inc.
>     This file is part of the GNU C Library.
>  
> @@ -16,7 +16,15 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> -#define COPY_FILE_RANGE_DECL
> -#define COPY_FILE_RANGE copy_file_range
> +#include <errno.h>
> +#include <unistd.h>
>  
> -#include <io/copy_file_range-compat.c>
> +ssize_t
> +copy_file_range (int infd, __off64_t *pinoff,
> +                 int outfd, __off64_t *poutoff,
> +                 size_t length, unsigned int flags)
> +{
> +  __set_errno (ENOSYS);
> +  return -1;
> +}
> +stub_warning (copy_file_range)


Ok.

> diff --git a/io/tst-copy_file_range-compat.c b/io/tst-copy_file_range-compat.c
> deleted file mode 100644
> index fe6de8ac68..0000000000
> --- a/io/tst-copy_file_range-compat.c
> +++ /dev/null
> @@ -1,30 +0,0 @@
> -/* Test the fallback implementation of copy_file_range.
> -   Copyright (C) 2017-2019 Free Software Foundation, Inc.
> -   This file is part of the GNU C Library.
> -
> -   The GNU C Library is free software; you can redistribute it and/or
> -   modify it under the terms of the GNU Lesser General Public
> -   License as published by the Free Software Foundation; either
> -   version 2.1 of the License, or (at your option) any later version.
> -
> -   The GNU C Library is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> -   Lesser General Public License for more details.
> -
> -   You should have received a copy of the GNU Lesser General Public
> -   License along with the GNU C Library; if not, see
> -   <http://www.gnu.org/licenses/>.  */
> -
> -/* Get the declaration of the official copy_of_range function.  */
> -#include <unistd.h>
> -
> -/* Compile a local version of copy_file_range.  */
> -#define COPY_FILE_RANGE_DECL static
> -#define COPY_FILE_RANGE copy_file_range_compat
> -#include <io/copy_file_range-compat.c>
> -
> -/* Re-use the test, but run it against copy_file_range_compat defined
> -   above.  */
> -#define copy_file_range copy_file_range_compat
> -#include "tst-copy_file_range.c"
> diff --git a/io/tst-copy_file_range.c b/io/tst-copy_file_range.c
> index a5dcf3c1f6..353ba509be 100644
> --- a/io/tst-copy_file_range.c
> +++ b/io/tst-copy_file_range.c
> @@ -20,22 +20,15 @@
>  #include <errno.h>
>  #include <fcntl.h>
>  #include <inttypes.h>
> -#include <libgen.h>
> -#include <poll.h>
> -#include <sched.h>
>  #include <stdbool.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <support/check.h>
> -#include <support/namespace.h>
>  #include <support/support.h>
>  #include <support/temp_file.h>
>  #include <support/test-driver.h>
>  #include <support/xunistd.h>
> -#ifdef CLONE_NEWNS
> -# include <sys/mount.h>
> -#endif
>  
>  /* Boolean flags which indicate whether to use pointers with explicit
>     output flags.  */
> @@ -49,10 +42,6 @@ static int infd;
>  static char *outfile;
>  static int outfd;
>  
> -/* Like the above, but on a different file system.  xdevfile can be
> -   NULL if no suitable file system has been found.  */
> -static char *xdevfile;
> -
>  /* Input and output offsets.  Set according to do_inoff and do_outoff
>     before the test.  The offsets themselves are always set to
>     zero.  */
> @@ -61,13 +50,10 @@ static off64_t *pinoff;
>  static off64_t outoff;
>  static off64_t *poutoff;
>  
> -/* These are a collection of copy sizes used in tests.  The selection
> -   takes into account that the fallback implementation uses an
> -   internal buffer of 8192 bytes.  */
> +/* These are a collection of copy sizes used in tests.    */
>  enum { maximum_size = 99999 };
>  static const int typical_sizes[] =
> -  { 0, 1, 2, 3, 1024, 2048, 4096, 8191, 8192, 8193, 16383, 16384, 16385,
> -    maximum_size };
> +  { 0, 1, 2, 3, 1024, 2048, 4096, 8191, 8192, 8193, maximum_size };
>  
>  /* The random contents of this array can be used as a pattern to check
>     for correct write operations.  */
> @@ -76,101 +62,6 @@ static unsigned char random_data[maximum_size];
>  /* The size chosen by the test harness.  */
>  static int current_size;
>  
> -/* Maximum writable file offset.  Updated by find_maximum_offset
> -   below.  */
> -static off64_t maximum_offset;
> -
> -/* Error code when crossing the offset.  */
> -static int maximum_offset_errno;
> -
> -/* If true: Writes which cross the limit will fail.  If false: Writes
> -   which cross the limit will result in a partial write.  */
> -static bool maximum_offset_hard_limit;
> -
> -/* Fills maximum_offset etc. above.  Truncates outfd as a side
> -   effect.  */
> -static void
> -find_maximum_offset (void)
> -{
> -  xftruncate (outfd, 0);
> -  if (maximum_offset != 0)
> -    return;
> -
> -  uint64_t upper = -1;
> -  upper >>= 1;                  /* Maximum of off64_t.  */
> -  TEST_VERIFY ((off64_t) upper > 0);
> -  TEST_VERIFY ((off64_t) (upper + 1) < 0);
> -  if (lseek64 (outfd, upper, SEEK_SET) >= 0)
> -    {
> -      if (write (outfd, "", 1) == 1)
> -        FAIL_EXIT1 ("created a file larger than the off64_t range");
> -    }
> -
> -  uint64_t lower = 1024 * 1024; /* A reasonable minimum file size.  */
> -  /* Loop invariant: writing at lower succeeds, writing at upper fails.  */
> -  while (lower + 1 < upper)
> -    {
> -      uint64_t middle = (lower + upper) / 2;
> -      if (test_verbose > 0)
> -        printf ("info: %s: remaining test range %" PRIu64 " .. %" PRIu64
> -                ", probe at %" PRIu64 "\n", __func__, lower, upper, middle);
> -      xftruncate (outfd, 0);
> -      if (lseek64 (outfd, middle, SEEK_SET) >= 0
> -          && write (outfd, "", 1) == 1)
> -        lower = middle;
> -      else
> -        upper = middle;
> -    }
> -  TEST_VERIFY (lower + 1 == upper);
> -  maximum_offset = lower;
> -  printf ("info: maximum writable file offset: %" PRIu64 " (%" PRIx64 ")\n",
> -          lower, lower);
> -
> -  /* Check that writing at the valid offset actually works.  */
> -  xftruncate (outfd, 0);
> -  xlseek (outfd, lower, SEEK_SET);
> -  TEST_COMPARE (write (outfd, "", 1), 1);
> -
> -  /* Cross the boundary with a two-byte write.  This can either result
> -     in a short write, or a failure.  */
> -  xlseek (outfd, lower, SEEK_SET);
> -  ssize_t ret = write (outfd, " ", 2);
> -  if (ret < 0)
> -    {
> -      maximum_offset_errno = errno;
> -      maximum_offset_hard_limit = true;
> -    }
> -  else
> -    maximum_offset_hard_limit = false;
> -
> -  /* Check that writing at the next offset actually fails.  This also
> -     obtains the expected errno value.  */
> -  xftruncate (outfd, 0);
> -  const char *action;
> -  if (lseek64 (outfd, lower + 1, SEEK_SET) != 0)
> -    {
> -      if (write (outfd, "", 1) != -1)
> -        FAIL_EXIT1 ("write to impossible offset %" PRIu64 " succeeded",
> -                    lower + 1);
> -      action = "writing";
> -      int errno_copy = errno;
> -      if (maximum_offset_hard_limit)
> -        TEST_COMPARE (errno_copy, maximum_offset_errno);
> -      else
> -        maximum_offset_errno = errno_copy;
> -    }
> -  else
> -    {
> -      action = "seeking";
> -      maximum_offset_errno = errno;
> -    }
> -  printf ("info: %s out of range fails with %m (%d)\n",
> -          action, maximum_offset_errno);
> -
> -  xftruncate (outfd, 0);
> -  xlseek (outfd, 0, SEEK_SET);
> -}
> -
>  /* Perform a copy of a file.  */
>  static void
>  simple_file_copy (void)
> @@ -247,390 +138,6 @@ simple_file_copy (void)
>    free (bytes);
>  }
>  
> -/* Test that reading from a pipe willfails.  */
> -static void
> -pipe_as_source (void)
> -{
> -  int pipefds[2];
> -  xpipe (pipefds);
> -
> -  for (int length = 0; length < 2; ++length)
> -    {
> -      if (test_verbose > 0)
> -        printf ("info: %s: length=%d\n", __func__, length);
> -
> -      /* Make sure that there is something to copy in the pipe.  */
> -      xwrite (pipefds[1], "@", 1);
> -
> -      TEST_COMPARE (copy_file_range (pipefds[0], pinoff, outfd, poutoff,
> -                                     length, 0), -1);
> -      /* Linux 4.10 and later return EINVAL.  Older kernels return
> -         EXDEV.  */
> -      TEST_VERIFY (errno == EINVAL || errno == EXDEV);
> -      TEST_COMPARE (inoff, 0);
> -      TEST_COMPARE (outoff, 0);
> -      TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), 0);
> -
> -      /* Make sure that nothing was read.  */
> -      char buf = 'A';
> -      TEST_COMPARE (read (pipefds[0], &buf, 1), 1);
> -      TEST_COMPARE (buf, '@');
> -    }
> -
> -  xclose (pipefds[0]);
> -  xclose (pipefds[1]);
> -}
> -
> -/* Test that writing to a pipe fails.  */
> -static void
> -pipe_as_destination (void)
> -{
> -  /* Make sure that there is something to read in the input file.  */
> -  xwrite (infd, "abc", 3);
> -  xlseek (infd, 0, SEEK_SET);
> -
> -  int pipefds[2];
> -  xpipe (pipefds);
> -
> -  for (int length = 0; length < 2; ++length)
> -    {
> -      if (test_verbose > 0)
> -        printf ("info: %s: length=%d\n", __func__, length);
> -
> -      TEST_COMPARE (copy_file_range (infd, pinoff, pipefds[1], poutoff,
> -                                     length, 0), -1);
> -      /* Linux 4.10 and later return EINVAL.  Older kernels return
> -         EXDEV.  */
> -      TEST_VERIFY (errno == EINVAL || errno == EXDEV);
> -      TEST_COMPARE (inoff, 0);
> -      TEST_COMPARE (outoff, 0);
> -      TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
> -
> -      /* Make sure that nothing was written.  */
> -      struct pollfd pollfd = { .fd = pipefds[0], .events = POLLIN, };
> -      TEST_COMPARE (poll (&pollfd, 1, 0), 0);
> -    }
> -
> -  xclose (pipefds[0]);
> -  xclose (pipefds[1]);
> -}
> -
> -/* Test a write failure after (potentially) writing some bytes.
> -   Failure occurs near the start of the buffer.  */
> -static void
> -delayed_write_failure_beginning (void)
> -{
> -  /* We need to write something to provoke the error.  */
> -  if (current_size == 0)
> -    return;
> -  xwrite (infd, random_data, sizeof (random_data));
> -  xlseek (infd, 0, SEEK_SET);
> -
> -  /* Write failure near the start.  The actual error code varies among
> -     file systems.  */
> -  find_maximum_offset ();
> -  off64_t where = maximum_offset;
> -
> -  if (current_size == 1)
> -    ++where;
> -  outoff = where;
> -  if (do_outoff)
> -    xlseek (outfd, 1, SEEK_SET);
> -  else
> -    xlseek (outfd, where, SEEK_SET);
> -  if (maximum_offset_hard_limit || where > maximum_offset)
> -    {
> -      TEST_COMPARE (copy_file_range (infd, pinoff, outfd, poutoff,
> -                                     sizeof (random_data), 0), -1);
> -      TEST_COMPARE (errno, maximum_offset_errno);
> -      TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
> -      TEST_COMPARE (inoff, 0);
> -      if (do_outoff)
> -        TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), 1);
> -      else
> -        TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), where);
> -      TEST_COMPARE (outoff, where);
> -      struct stat64 st;
> -      xfstat (outfd, &st);
> -      TEST_COMPARE (st.st_size, 0);
> -    }
> -  else
> -    {
> -      /* The offset is not a hard limit.  This means we write one
> -         byte.  */
> -      TEST_COMPARE (copy_file_range (infd, pinoff, outfd, poutoff,
> -                                     sizeof (random_data), 0), 1);
> -      if (do_inoff)
> -        {
> -          TEST_COMPARE (inoff, 1);
> -          TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
> -        }
> -      else
> -        {
> -          TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 1);
> -          TEST_COMPARE (inoff, 0);
> -        }
> -      if (do_outoff)
> -        {
> -          TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), 1);
> -          TEST_COMPARE (outoff, where + 1);
> -        }
> -      else
> -        {
> -          TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), where + 1);
> -          TEST_COMPARE (outoff, where);
> -        }
> -      struct stat64 st;
> -      xfstat (outfd, &st);
> -      TEST_COMPARE (st.st_size, where + 1);
> -    }
> -}
> -
> -/* Test a write failure after (potentially) writing some bytes.
> -   Failure occurs near the end of the buffer.  */
> -static void
> -delayed_write_failure_end (void)
> -{
> -  if (current_size <= 1)
> -    /* This would be same as the first test because there is not
> -       enough data to write to make a difference.  */
> -    return;
> -  xwrite (infd, random_data, sizeof (random_data));
> -  xlseek (infd, 0, SEEK_SET);
> -
> -  find_maximum_offset ();
> -  off64_t where = maximum_offset - current_size + 1;
> -  if (current_size == sizeof (random_data))
> -    /* Otherwise we do not reach the non-writable byte.  */
> -    ++where;
> -  outoff = where;
> -  if (do_outoff)
> -    xlseek (outfd, 1, SEEK_SET);
> -  else
> -    xlseek (outfd, where, SEEK_SET);
> -  ssize_t ret = copy_file_range (infd, pinoff, outfd, poutoff,
> -                                 sizeof (random_data), 0);
> -  if (ret < 0)
> -    {
> -      TEST_COMPARE (ret, -1);
> -      TEST_COMPARE (errno, maximum_offset_errno);
> -      struct stat64 st;
> -      xfstat (outfd, &st);
> -      TEST_COMPARE (st.st_size, 0);
> -    }
> -  else
> -    {
> -      /* The first copy succeeded.  This happens in the emulation
> -         because the internal buffer of limited size does not
> -         necessarily cross the off64_t boundary on the first write
> -         operation.  */
> -      if (test_verbose > 0)
> -        printf ("info:   copy_file_range (%zu) returned %zd\n",
> -                sizeof (random_data), ret);
> -      TEST_VERIFY (ret > 0);
> -      TEST_VERIFY (ret < maximum_size);
> -      struct stat64 st;
> -      xfstat (outfd, &st);
> -      TEST_COMPARE (st.st_size, where + ret);
> -      if (do_inoff)
> -        {
> -          TEST_COMPARE (inoff, ret);
> -          TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
> -        }
> -      else
> -          TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), ret);
> -
> -      char *buffer = xmalloc (ret);
> -      TEST_COMPARE (pread64 (outfd, buffer, ret, where), ret);
> -      TEST_VERIFY (memcmp (buffer, random_data, ret) == 0);
> -      free (buffer);
> -
> -      /* The second copy fails.  */
> -      TEST_COMPARE (copy_file_range (infd, pinoff, outfd, poutoff,
> -                                     sizeof (random_data), 0), -1);
> -      TEST_COMPARE (errno, maximum_offset_errno);
> -    }
> -}
> -
> -/* Test a write failure across devices.  */
> -static void
> -cross_device_failure (void)
> -{
> -  if (xdevfile == NULL)
> -    /* Subtest not supported due to missing cross-device file.  */
> -    return;
> -
> -  /* We need something to write.  */
> -  xwrite (infd, random_data, sizeof (random_data));
> -  xlseek (infd, 0, SEEK_SET);
> -
> -  int xdevfd = xopen (xdevfile, O_RDWR | O_LARGEFILE, 0);
> -  TEST_COMPARE (copy_file_range (infd, pinoff, xdevfd, poutoff,
> -                                 current_size, 0), -1);
> -  TEST_COMPARE (errno, EXDEV);
> -  TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
> -  struct stat64 st;
> -  xfstat (xdevfd, &st);
> -  TEST_COMPARE (st.st_size, 0);
> -
> -  xclose (xdevfd);
> -}
> -
> -/* Try to exercise ENOSPC behavior with a tempfs file system (so that
> -   we do not have to fill up a regular file system to get the error).
> -   This function runs in a subprocess, so that we do not change the
> -   mount namespace of the actual test process.  */
> -static void
> -enospc_failure_1 (void *closure)
> -{
> -#ifdef CLONE_NEWNS
> -  support_become_root ();
> -
> -  /* Make sure that we do not alter the file system mounts of the
> -     parents.  */
> -  if (! support_enter_mount_namespace ())
> -    {
> -      printf ("warning: ENOSPC test skipped\n");
> -      return;
> -    }
> -
> -  char *mountpoint = closure;
> -  if (mount ("none", mountpoint, "tmpfs", MS_NODEV | MS_NOEXEC,
> -             "size=500k") != 0)
> -    {
> -      printf ("warning: could not mount tmpfs at %s: %m\n", mountpoint);
> -      return;
> -    }
> -
> -  /* The source file must reside on the same file system.  */
> -  char *intmpfsfile = xasprintf ("%s/%s", mountpoint, "in");
> -  int intmpfsfd = xopen (intmpfsfile, O_RDWR | O_CREAT | O_LARGEFILE, 0600);
> -  xwrite (intmpfsfd, random_data, sizeof (random_data));
> -  xlseek (intmpfsfd, 1, SEEK_SET);
> -  inoff = 1;
> -
> -  char *outtmpfsfile = xasprintf ("%s/%s", mountpoint, "out");
> -  int outtmpfsfd = xopen (outtmpfsfile, O_RDWR | O_CREAT | O_LARGEFILE, 0600);
> -
> -  /* Fill the file with data until ENOSPC is reached.  */
> -  while (true)
> -    {
> -      ssize_t ret = write (outtmpfsfd, random_data, sizeof (random_data));
> -      if (ret < 0 && errno != ENOSPC)
> -        FAIL_EXIT1 ("write to %s: %m", outtmpfsfile);
> -      if (ret < sizeof (random_data))
> -        break;
> -    }
> -  TEST_COMPARE (write (outtmpfsfd, "", 1), -1);
> -  TEST_COMPARE (errno, ENOSPC);
> -  off64_t maxsize = xlseek (outtmpfsfd, 0, SEEK_CUR);
> -  TEST_VERIFY_EXIT (maxsize > sizeof (random_data));
> -
> -  /* Constructed the expected file contents.  */
> -  char *expected = xmalloc (maxsize);
> -  TEST_COMPARE (pread64 (outtmpfsfd, expected, maxsize, 0), maxsize);
> -  /* Go back a little, so some bytes can be written.  */
> -  enum { offset = 20000 };
> -  TEST_VERIFY_EXIT (offset < maxsize);
> -  TEST_VERIFY_EXIT (offset < sizeof (random_data));
> -  memcpy (expected + maxsize - offset, random_data + 1, offset);
> -
> -  if (do_outoff)
> -    {
> -      outoff = maxsize - offset;
> -      xlseek (outtmpfsfd, 2, SEEK_SET);
> -    }
> -  else
> -    xlseek (outtmpfsfd, -offset, SEEK_CUR);
> -
> -  /* First call is expected to succeed because we made room for some
> -     bytes.  */
> -  TEST_COMPARE (copy_file_range (intmpfsfd, pinoff, outtmpfsfd, poutoff,
> -                                 maximum_size, 0), offset);
> -  if (do_inoff)
> -    {
> -      TEST_COMPARE (inoff, 1 + offset);
> -      TEST_COMPARE (xlseek (intmpfsfd, 0, SEEK_CUR), 1);
> -    }
> -  else
> -      TEST_COMPARE (xlseek (intmpfsfd, 0, SEEK_CUR), 1 + offset);
> -  if (do_outoff)
> -    {
> -      TEST_COMPARE (outoff, maxsize);
> -      TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_CUR), 2);
> -    }
> -  else
> -    TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_CUR), maxsize);
> -  struct stat64 st;
> -  xfstat (outtmpfsfd, &st);
> -  TEST_COMPARE (st.st_size, maxsize);
> -  char *actual = xmalloc (st.st_size);
> -  TEST_COMPARE (pread64 (outtmpfsfd, actual, st.st_size, 0), st.st_size);
> -  TEST_VERIFY (memcmp (expected, actual, maxsize) == 0);
> -
> -  /* Second call should fail with ENOSPC.  */
> -  TEST_COMPARE (copy_file_range (intmpfsfd, pinoff, outtmpfsfd, poutoff,
> -                                 maximum_size, 0), -1);
> -  TEST_COMPARE (errno, ENOSPC);
> -
> -  /* Offsets should be unchanged.  */
> -  if (do_inoff)
> -    {
> -      TEST_COMPARE (inoff, 1 + offset);
> -      TEST_COMPARE (xlseek (intmpfsfd, 0, SEEK_CUR), 1);
> -    }
> -  else
> -    TEST_COMPARE (xlseek (intmpfsfd, 0, SEEK_CUR), 1 + offset);
> -  if (do_outoff)
> -    {
> -      TEST_COMPARE (outoff, maxsize);
> -      TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_CUR), 2);
> -    }
> -  else
> -    TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_CUR), maxsize);
> -  TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_END), maxsize);
> -  TEST_COMPARE (pread64 (outtmpfsfd, actual, maxsize, 0), maxsize);
> -  TEST_VERIFY (memcmp (expected, actual, maxsize) == 0);
> -
> -  free (actual);
> -  free (expected);
> -
> -  xclose (intmpfsfd);
> -  xclose (outtmpfsfd);
> -  free (intmpfsfile);
> -  free (outtmpfsfile);
> -
> -#else /* !CLONE_NEWNS */
> -  puts ("warning: ENOSPC test skipped (no mount namespaces)");
> -#endif
> -}
> -
> -/* Call enospc_failure_1 in a subprocess.  */
> -static void
> -enospc_failure (void)
> -{
> -  char *mountpoint
> -    = support_create_temp_directory ("tst-copy_file_range-enospc-");
> -  support_isolate_in_subprocess (enospc_failure_1, mountpoint);
> -  free (mountpoint);
> -}
> -
> -/* The target file descriptor must have O_APPEND enabled.  */
> -static void
> -oappend_failure (void)
> -{
> -  /* Add data, to make sure we do not fail because there is
> -     insufficient input data.  */
> -  xwrite (infd, random_data, current_size);
> -  xlseek (infd, 0, SEEK_SET);
> -
> -  xclose (outfd);
> -  outfd = xopen (outfile, O_RDWR | O_APPEND, 0);
> -  TEST_COMPARE (copy_file_range (infd, pinoff, outfd, poutoff,
> -                                 current_size, 0), -1);
> -  TEST_COMPARE (errno, EBADF);
> -}
> -
>  /* Test that a short input file results in a shortened copy.  */
>  static void
>  short_copy (void)

Ok.

> @@ -721,14 +228,6 @@ struct test_case
>  static struct test_case tests[] =
>    {
>      { "simple_file_copy", simple_file_copy, .sizes = true },
> -    { "pipe_as_source", pipe_as_source, },
> -    { "pipe_as_destination", pipe_as_destination, },
> -    { "delayed_write_failure_beginning", delayed_write_failure_beginning,
> -      .sizes = true },
> -    { "delayed_write_failure_end", delayed_write_failure_end, .sizes = true },
> -    { "cross_device_failure", cross_device_failure, .sizes = true },
> -    { "enospc_failure", enospc_failure, },
> -    { "oappend_failure", oappend_failure, .sizes = true },
>      { "short_copy", short_copy, .sizes = true },
>    };
>  
> @@ -738,59 +237,20 @@ do_test (void)
>    for (unsigned char *p = random_data; p < array_end (random_data); ++p)
>      *p = rand () >> 24;
>  
> +

Gratuitous extra line?

>    infd = create_temp_file ("tst-copy_file_range-in-", &infile);
> +  outfd = create_temp_file ("tst-copy_file_range-out-", &outfile);
>    {
> -    int outfd = create_temp_file ("tst-copy_file_range-out-", &outfile);
> -    if (!support_descriptor_supports_holes (outfd))
> -      FAIL_UNSUPPORTED ("File %s does not support holes", outfile);
> -    xclose (outfd);
> -  }
> -
> -  /* Try to find a different directory from the default input/output
> -     file.  */
> -  {
> -    struct stat64 instat;
> -    xfstat (infd, &instat);
> -    static const char *const candidates[] =
> -      { NULL, "/var/tmp", "/dev/shm" };
> -    for (const char *const *c = candidates; c < array_end (candidates); ++c)
> -      {
> -        const char *path = *c;
> -        char *to_free = NULL;
> -        if (path == NULL)
> -          {
> -            to_free = xreadlink ("/proc/self/exe");
> -            path = dirname (to_free);
> -          }
> -
> -        struct stat64 cstat;
> -        xstat (path, &cstat);
> -        if (cstat.st_dev == instat.st_dev)
> -          {
> -            free (to_free);
> -            continue;
> -          }
> -
> -        printf ("info: using alternate temporary files directory: %s\n", path);
> -        xdevfile = xasprintf ("%s/tst-copy_file_range-xdev-XXXXXX", path);
> -        free (to_free);
> -        break;
> -      }
> -    if (xdevfile != NULL)
> +    ssize_t ret = copy_file_range (infd, NULL, outfd, NULL, 0, 0);
> +    if (ret != 0)
>        {
> -        int xdevfd = mkstemp (xdevfile);
> -        if (xdevfd < 0)
> -          FAIL_EXIT1 ("mkstemp (\"%s\"): %m", xdevfile);
> -        struct stat64 xdevst;
> -        xfstat (xdevfd, &xdevst);
> -        TEST_VERIFY (xdevst.st_dev != instat.st_dev);
> -        add_temp_file (xdevfile);
> -        xclose (xdevfd);
> +        if (errno == ENOSYS)
> +          FAIL_UNSUPPORTED ("copy_file_range is not support on this system");
> +        FAIL_EXIT1 ("copy_file_range probing call: %m");
>        }
> -    else
> -      puts ("warning: no alternate directory on different file system found");
>    }
>    xclose (infd);
> +  xclose (outfd);
>  
>    for (do_inoff = 0; do_inoff < 2; ++do_inoff)
>      for (do_outoff = 0; do_outoff < 2; ++do_outoff)
> @@ -832,7 +292,6 @@ do_test (void)
>  
>    free (infile);
>    free (outfile);
> -  free (xdevfile);
>  
>    return 0;
>  }

Ok.
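
For illustration only (this sketch is not part of the patch): the probing
call added to do_test above can be lifted into a standalone helper.  The
names probe_copy_file_range and make_temp_fd, and the /tmp template, are
hypothetical.  Like the test, the sketch assumes that a zero-length copy
between two regular files returns 0 when the system call is available.

```c
#define _GNU_SOURCE             /* For the copy_file_range declaration.  */
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file, unlink it right away,
   and return its descriptor, or -1 on failure.  */
static int
make_temp_fd (void)
{
  char name[] = "/tmp/cfr-probe-XXXXXX";
  int fd = mkstemp (name);
  if (fd >= 0)
    unlink (name);
  return fd;
}

/* Hypothetical probe.  Returns 1 if a zero-length copy_file_range call
   succeeds, 0 if it fails with ENOSYS, and -1 (with errno set) on any
   other failure, including failure to create the temporary files.  */
int
probe_copy_file_range (void)
{
  int infd = make_temp_fd ();
  if (infd < 0)
    return -1;
  int outfd = make_temp_fd ();
  if (outfd < 0)
    {
      int saved_errno = errno;
      close (infd);
      errno = saved_errno;
      return -1;
    }

  int result = 1;
  if (copy_file_range (infd, NULL, outfd, NULL, 0, 0) != 0)
    result = errno == ENOSYS ? 0 : -1;

  /* Preserve errno from the probing call across the cleanup.  */
  int saved_errno = errno;
  close (infd);
  close (outfd);
  errno = saved_errno;
  return result;
}
```

A result of 0 corresponds to the FAIL_UNSUPPORTED branch in the test, and
-1 to the FAIL_EXIT1 branch.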

> diff --git a/manual/llio.texi b/manual/llio.texi
> index e89affd666..447126b7eb 100644
> --- a/manual/llio.texi
> +++ b/manual/llio.texi
> @@ -1404,10 +1404,13 @@ failure occurs.  The return value is zero if the end of the input file
>  is encountered immediately.
>  
>  If no bytes can be copied, to report an error, @code{copy_file_range}
> -returns the value @math{-1} and sets @code{errno}.  The following
> -@code{errno} error conditions are specific to this function:
> +returns the value @math{-1} and sets @code{errno}.  The table below
> +lists some of the error conditions for this function.
>  
>  @table @code
> +@item ENOSYS
> +The kernel does not implement the required functionality.
> +
>  @item EISDIR
>  At least one of the descriptors @var{inputfd} or @var{outputfd} refers
>  to a directory.

Ok.

> @@ -1437,9 +1440,6 @@ reading.
>  
>  The argument @var{outputfd} is not a valid file descriptor open for
>  writing, or @var{outputfd} has been opened with @code{O_APPEND}.
> -
> -@item EXDEV
> -The input and output files reside on different file systems.
>  @end table
>  
>  In addition, @code{copy_file_range} can fail with the error codes

Ok.

> diff --git a/sysdeps/unix/sysv/linux/alpha/kernel-features.h b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
> index 4a5d029c1d..81f6c3633a 100644
> --- a/sysdeps/unix/sysv/linux/alpha/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
> @@ -49,7 +49,6 @@
>  /* Support for copy_file_range, statx was added in kernel 4.13.  */
>  #if __LINUX_KERNEL_VERSION < 0x040D00
>  # undef __ASSUME_MLOCK2
> -# undef __ASSUME_COPY_FILE_RANGE
>  # undef __ASSUME_STATX
>  #endif
>  

Ok.

> diff --git a/sysdeps/unix/sysv/linux/arm/kernel-features.h b/sysdeps/unix/sysv/linux/arm/kernel-features.h
> index 2d2d355844..4220adff37 100644
> --- a/sysdeps/unix/sysv/linux/arm/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/arm/kernel-features.h
> @@ -45,7 +45,6 @@
>     present in 32-bit kernels from 4.4 and 4.5 respectively.  */
>  #if __LINUX_KERNEL_VERSION < 0x040700
>  # undef __ASSUME_MLOCK2
> -# undef __ASSUME_COPY_FILE_RANGE
>  #endif
>  
>  #undef __ASSUME_CLONE_DEFAULT

Ok.

> diff --git a/sysdeps/unix/sysv/linux/copy_file_range.c b/sysdeps/unix/sysv/linux/copy_file_range.c
> index 70961007a5..e950db3bf5 100644
> --- a/sysdeps/unix/sysv/linux/copy_file_range.c
> +++ b/sysdeps/unix/sysv/linux/copy_file_range.c
> @@ -20,27 +20,16 @@
>  #include <sysdep-cancel.h>
>  #include <unistd.h>
>  
> -/* Include the fallback implementation.  */
> -#ifndef __ASSUME_COPY_FILE_RANGE
> -#define COPY_FILE_RANGE_DECL static
> -#define COPY_FILE_RANGE copy_file_range_compat
> -#include <io/copy_file_range-compat.c>
> -#endif
> -
>  ssize_t
>  copy_file_range (int infd, __off64_t *pinoff,
>                   int outfd, __off64_t *poutoff,
>                   size_t length, unsigned int flags)
>  {
>  #ifdef __NR_copy_file_range
> -  ssize_t ret = SYSCALL_CANCEL (copy_file_range, infd, pinoff, outfd, poutoff,
> -                                length, flags);
> -# ifndef __ASSUME_COPY_FILE_RANGE
> -  if (ret == -1 && errno == ENOSYS)
> -    ret = copy_file_range_compat (infd, pinoff, outfd, poutoff, length, flags);
> -# endif
> -  return ret;
> -#else  /* !__NR_copy_file_range */
> -  return copy_file_range_compat (infd, pinoff, outfd, poutoff, length, flags);
> +  return SYSCALL_CANCEL (copy_file_range, infd, pinoff, outfd, poutoff,
> +                         length, flags);
> +#else
> +  __set_errno (ENOSYS);
> +  return -1;
>  #endif
>  }

Ok.

> diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
> index bc5c959f58..1518bb5228 100644
> --- a/sysdeps/unix/sysv/linux/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/kernel-features.h
> @@ -100,10 +100,6 @@
>  # define __ASSUME_MLOCK2 1
>  #endif
>  
> -#if __LINUX_KERNEL_VERSION >= 0x040500
> -# define __ASSUME_COPY_FILE_RANGE 1
> -#endif
> -
>  /* Support for statx was added in kernel 4.11.  */
>  #if __LINUX_KERNEL_VERSION >= 0x040B00
>  # define __ASSUME_STATX 1

Ok.

> diff --git a/sysdeps/unix/sysv/linux/microblaze/kernel-features.h b/sysdeps/unix/sysv/linux/microblaze/kernel-features.h
> index 8df19400af..a787409295 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/microblaze/kernel-features.h
> @@ -60,11 +60,6 @@
>  # undef __ASSUME_MLOCK2
>  #endif
>  
> -/* Support for the copy_file_range syscall was added in 4.10.  */
> -#if __LINUX_KERNEL_VERSION < 0x040A00
> -# undef __ASSUME_COPY_FILE_RANGE
> -#endif
> -
>  /* Support for statx was added in kernel 4.12.  */
>  #if __LINUX_KERNEL_VERSION < 0X040C00
>  # undef __ASSUME_STATX

Ok.

> diff --git a/sysdeps/unix/sysv/linux/sh/kernel-features.h b/sysdeps/unix/sysv/linux/sh/kernel-features.h
> index b11a5cb544..0f287fbf85 100644
> --- a/sysdeps/unix/sysv/linux/sh/kernel-features.h
> +++ b/sysdeps/unix/sysv/linux/sh/kernel-features.h
> @@ -49,7 +49,6 @@
>  # undef __ASSUME_RENAMEAT2
>  # undef __ASSUME_EXECVEAT
>  # undef __ASSUME_MLOCK2
> -# undef __ASSUME_COPY_FILE_RANGE
>  #endif
>  
>  /* sh does not support the statx system call before 5.1.  */
> 

Ok.

Florian Weimer June 28, 2019, 7:58 a.m. UTC | #2
* Adhemerval Zanella:

>> diff --git a/NEWS b/NEWS
>> index 8a2fecef47..3961715106 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -36,6 +36,13 @@ Major new features:
>>  
>>  Deprecated and removed features, and other changes affecting compatibility:
>>  
>> +* The copy_file_range function fails with ENOSYS if the kernel does not
>> +  support the system call of the same name.  Previously, user space
>> +  emulation was performed, but its behavior did not match the kernel
>> +  behavior, which was deemed too confusing.  Applications which use the
>> +  copy_file_range function will have to be run on kernels which implement
>> +  the copy_file_range system call.
>> +
>>  * The functions clock_gettime, clock_getres, clock_settime,
>>    clock_getcpuclockid, clock_nanosleep were removed from the librt library
>>    for new applications (on architectures which had them).  Instead, the
>
> Maybe add that the minimum kernel version that implements the syscall
> is Linux 4.5.

I'm going to add this sentence to the NEWS entry:

  Support for most architectures was added in version 4.5 of the
  mainline Linux kernel.
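
For illustration only (not part of the patch or of glibc): an application
that must still run on kernels without the system call could handle the
ENOSYS error described in the NEWS entry itself.  The sketch below assumes
a plain read/write loop is an acceptable substitute; the function names are
hypothetical.  It also falls back on EXDEV, which older kernels report for
cross-file-system copies.  Unlike the removed emulation, it does not rewind
the input after a failed write and does not retry on EINTR.

```c
#define _GNU_SOURCE             /* For the copy_file_range declaration.  */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: copy from INFD to OUTFD with read and write
   until end of file.  Returns 0 on success, -1 with errno set.  */
static int
copy_fd_read_write (int infd, int outfd)
{
  char buf[8192];
  for (;;)
    {
      ssize_t nread = read (infd, buf, sizeof (buf));
      if (nread == 0)
        return 0;               /* End of input.  */
      if (nread < 0)
        return -1;
      for (ssize_t done = 0; done < nread; )
        {
          ssize_t nwritten = write (outfd, buf + done, nread - done);
          if (nwritten < 0)
            return -1;
          done += nwritten;
        }
    }
}

/* Hypothetical helper: copy the rest of INFD to OUTFD, preferring
   copy_file_range.  Returns 0 on success, -1 with errno set.  */
static int
copy_fd (int infd, int outfd)
{
  for (;;)
    {
      ssize_t ret = copy_file_range (infd, NULL, outfd, NULL, 1 << 20, 0);
      if (ret == 0)
        return 0;               /* End of input.  */
      if (ret < 0)
        {
          /* No system call (ENOSYS), or a kernel which refuses
             cross-device copies (EXDEV): copy in user space.  */
          if (errno == ENOSYS || errno == EXDEV)
            return copy_fd_read_write (infd, outfd);
          return -1;
        }
    }
}

/* Hypothetical entry point: copy the file SRC to DST, creating or
   truncating DST.  Returns 0 on success, -1 with errno set.  */
int
copy_file (const char *src, const char *dst)
{
  int infd = open (src, O_RDONLY);
  if (infd < 0)
    return -1;
  int outfd = open (dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (outfd < 0)
    {
      int saved_errno = errno;
      close (infd);
      errno = saved_errno;
      return -1;
    }

  int ret = copy_fd (infd, outfd);
  int saved_errno = errno;
  close (infd);
  if (close (outfd) != 0 && ret == 0)
    {
      ret = -1;
      saved_errno = errno;
    }
  errno = saved_errno;
  return ret;
}
```

Because the file offsets are used (both offset pointers are NULL), the
fallback loop continues from wherever copy_file_range stopped.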

> Do you plan to backport it to 2.27, 2.28, and/or 2.29?

I had not, but I now think we should do it.  I will also try to excise
it from our downstream releases.

> Michael, we will need to update the man pages. 
>
> LGTM with a nit below regarding a gratuitous extra line.
>
> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Thanks, fixed the nit, and finally tests are running.

Florian

Patch

diff --git a/NEWS b/NEWS
index 8a2fecef47..3961715106 100644
--- a/NEWS
+++ b/NEWS
@@ -36,6 +36,13 @@  Major new features:
 
 Deprecated and removed features, and other changes affecting compatibility:
 
+* The copy_file_range function fails with ENOSYS if the kernel does not
+  support the system call of the same name.  Previously, user space
+  emulation was performed, but its behavior did not match the kernel
+  behavior, which was deemed too confusing.  Applications which use the
+  copy_file_range function will have to be run on kernels which implement
+  the copy_file_range system call.
+
 * The functions clock_gettime, clock_getres, clock_settime,
   clock_getcpuclockid, clock_nanosleep were removed from the librt library
   for new applications (on architectures which had them).  Instead, the
diff --git a/io/Makefile b/io/Makefile
index 088e86da77..ac3e29e1ba 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -75,11 +75,6 @@  tests		:= test-utime test-stat test-stat2 test-lfs tst-getcwd \
 		   tst-fts tst-fts-lfs tst-open-tmpfile \
 		   tst-copy_file_range tst-getcwd-abspath tst-lockf
 
-# This test includes the compat implementation of copy_file_range,
-# which uses internal, unexported libc functions.
-tests-static += tst-copy_file_range-compat
-tests-internal += tst-copy_file_range-compat
-
 # Likewise for statx, but we do not need static linking here.
 tests-internal += tst-statx
 
diff --git a/io/copy_file_range-compat.c b/io/copy_file_range-compat.c
deleted file mode 100644
index 58dbeef3e9..0000000000
--- a/io/copy_file_range-compat.c
+++ /dev/null
@@ -1,160 +0,0 @@ 
-/* Emulation of copy_file_range.
-   Copyright (C) 2017-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-/* The following macros should be defined before including this
-   file:
-
-   COPY_FILE_RANGE_DECL   Declaration specifiers for the function below.
-   COPY_FILE_RANGE        Name of the function to define.  */
-
-#include <errno.h>
-#include <fcntl.h>
-#include <inttypes.h>
-#include <limits.h>
-#include <sys/stat.h>
-#include <sys/types.h>
-#include <unistd.h>
-
-COPY_FILE_RANGE_DECL
-ssize_t
-COPY_FILE_RANGE (int infd, __off64_t *pinoff,
-                 int outfd, __off64_t *poutoff,
-                 size_t length, unsigned int flags)
-{
-  if (flags != 0)
-    {
-      __set_errno (EINVAL);
-      return -1;
-    }
-
-  {
-    struct stat64 instat;
-    struct stat64 outstat;
-    if (fstat64 (infd, &instat) != 0 || fstat64 (outfd, &outstat) != 0)
-      return -1;
-    if (S_ISDIR (instat.st_mode) || S_ISDIR (outstat.st_mode))
-      {
-        __set_errno (EISDIR);
-        return -1;
-      }
-    if (!S_ISREG (instat.st_mode) || !S_ISREG (outstat.st_mode))
-      {
-        /* We need a regular input file so that the we can seek
-           backwards in case of a write failure.  */
-        __set_errno (EINVAL);
-        return -1;
-      }
-    if (instat.st_dev != outstat.st_dev)
-      {
-        /* Cross-device copies are not supported.  */
-        __set_errno (EXDEV);
-        return -1;
-      }
-  }
-
-  /* The output descriptor must not have O_APPEND set.  */
-  {
-    int flags = __fcntl (outfd, F_GETFL);
-    if (flags & O_APPEND)
-      {
-        __set_errno (EBADF);
-        return -1;
-      }
-  }
-
-  /* Avoid an overflow in the result.  */
-  if (length > SSIZE_MAX)
-    length = SSIZE_MAX;
-
-  /* Main copying loop.  The buffer size is arbitrary and is a
-     trade-off between stack size consumption, cache usage, and
-     amortization of system call overhead.  */
-  size_t copied = 0;
-  char buf[8192];
-  while (length > 0)
-    {
-      size_t to_read = length;
-      if (to_read > sizeof (buf))
-        to_read = sizeof (buf);
-
-      /* Fill the buffer.  */
-      ssize_t read_count;
-      if (pinoff == NULL)
-        read_count = read (infd, buf, to_read);
-      else
-        read_count = __libc_pread64 (infd, buf, to_read, *pinoff);
-      if (read_count == 0)
-        /* End of file reached prematurely.  */
-        return copied;
-      if (read_count < 0)
-        {
-          if (copied > 0)
-            /* Report the number of bytes copied so far.  */
-            return copied;
-          return -1;
-        }
-      if (pinoff != NULL)
-        *pinoff += read_count;
-
-      /* Write the buffer part which was read to the destination.  */
-      char *end = buf + read_count;
-      for (char *p = buf; p < end; )
-        {
-          ssize_t write_count;
-          if (poutoff == NULL)
-            write_count = write (outfd, p, end - p);
-          else
-            write_count = __libc_pwrite64 (outfd, p, end - p, *poutoff);
-          if (write_count < 0)
-            {
-              /* Adjust the input read position to match what we have
-                 written, so that the caller can pick up after the
-                 error.  */
-              size_t written = p - buf;
-              /* NB: This needs to be signed so that we can form the
-                 negative value below.  */
-              ssize_t overread = read_count - written;
-              if (pinoff == NULL)
-                {
-                  if (overread > 0)
-                    {
-                      /* We are on an error recovery path, so we
-                         cannot deal with failure here.  */
-                      int save_errno = errno;
-                      (void) __libc_lseek64 (infd, -overread, SEEK_CUR);
-                      __set_errno (save_errno);
-                    }
-                }
-              else /* pinoff != NULL */
-                *pinoff -= overread;
-
-              if (copied + written > 0)
-                /* Report the number of bytes copied so far.  */
-                return copied + written;
-              return -1;
-            }
-          p += write_count;
-          if (poutoff != NULL)
-            *poutoff += write_count;
-        } /* Write loop.  */
-
-      copied += read_count;
-      length -= read_count;
-    }
-  return copied;
-}
diff --git a/io/copy_file_range.c b/io/copy_file_range.c
index 7b968be19d..59fb979773 100644
--- a/io/copy_file_range.c
+++ b/io/copy_file_range.c
@@ -1,4 +1,4 @@ 
-/* Generic implementation of copy_file_range.
+/* Stub implementation of copy_file_range.
    Copyright (C) 2017-2019 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,7 +16,15 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#define COPY_FILE_RANGE_DECL
-#define COPY_FILE_RANGE copy_file_range
+#include <errno.h>
+#include <unistd.h>
 
-#include <io/copy_file_range-compat.c>
+ssize_t
+copy_file_range (int infd, __off64_t *pinoff,
+                 int outfd, __off64_t *poutoff,
+                 size_t length, unsigned int flags)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
+stub_warning (copy_file_range)
diff --git a/io/tst-copy_file_range-compat.c b/io/tst-copy_file_range-compat.c
deleted file mode 100644
index fe6de8ac68..0000000000
--- a/io/tst-copy_file_range-compat.c
+++ /dev/null
@@ -1,30 +0,0 @@ 
-/* Test the fallback implementation of copy_file_range.
-   Copyright (C) 2017-2019 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-/* Get the declaration of the official copy_of_range function.  */
-#include <unistd.h>
-
-/* Compile a local version of copy_file_range.  */
-#define COPY_FILE_RANGE_DECL static
-#define COPY_FILE_RANGE copy_file_range_compat
-#include <io/copy_file_range-compat.c>
-
-/* Re-use the test, but run it against copy_file_range_compat defined
-   above.  */
-#define copy_file_range copy_file_range_compat
-#include "tst-copy_file_range.c"
diff --git a/io/tst-copy_file_range.c b/io/tst-copy_file_range.c
index a5dcf3c1f6..353ba509be 100644
--- a/io/tst-copy_file_range.c
+++ b/io/tst-copy_file_range.c
@@ -20,22 +20,15 @@ 
 #include <errno.h>
 #include <fcntl.h>
 #include <inttypes.h>
-#include <libgen.h>
-#include <poll.h>
-#include <sched.h>
 #include <stdbool.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <support/check.h>
-#include <support/namespace.h>
 #include <support/support.h>
 #include <support/temp_file.h>
 #include <support/test-driver.h>
 #include <support/xunistd.h>
-#ifdef CLONE_NEWNS
-# include <sys/mount.h>
-#endif
 
 /* Boolean flags which indicate whether to use pointers with explicit
    output flags.  */
@@ -49,10 +42,6 @@  static int infd;
 static char *outfile;
 static int outfd;
 
-/* Like the above, but on a different file system.  xdevfile can be
-   NULL if no suitable file system has been found.  */
-static char *xdevfile;
-
 /* Input and output offsets.  Set according to do_inoff and do_outoff
    before the test.  The offsets themselves are always set to
    zero.  */
@@ -61,13 +50,10 @@  static off64_t *pinoff;
 static off64_t outoff;
 static off64_t *poutoff;
 
-/* These are a collection of copy sizes used in tests.  The selection
-   takes into account that the fallback implementation uses an
-   internal buffer of 8192 bytes.  */
+/* These are a collection of copy sizes used in tests.    */
 enum { maximum_size = 99999 };
 static const int typical_sizes[] =
-  { 0, 1, 2, 3, 1024, 2048, 4096, 8191, 8192, 8193, 16383, 16384, 16385,
-    maximum_size };
+  { 0, 1, 2, 3, 1024, 2048, 4096, 8191, 8192, 8193, maximum_size };
 
 /* The random contents of this array can be used as a pattern to check
    for correct write operations.  */
@@ -76,101 +62,6 @@  static unsigned char random_data[maximum_size];
 /* The size chosen by the test harness.  */
 static int current_size;
 
-/* Maximum writable file offset.  Updated by find_maximum_offset
-   below.  */
-static off64_t maximum_offset;
-
-/* Error code when crossing the offset.  */
-static int maximum_offset_errno;
-
-/* If true: Writes which cross the limit will fail.  If false: Writes
-   which cross the limit will result in a partial write.  */
-static bool maximum_offset_hard_limit;
-
-/* Fills maximum_offset etc. above.  Truncates outfd as a side
-   effect.  */
-static void
-find_maximum_offset (void)
-{
-  xftruncate (outfd, 0);
-  if (maximum_offset != 0)
-    return;
-
-  uint64_t upper = -1;
-  upper >>= 1;                  /* Maximum of off64_t.  */
-  TEST_VERIFY ((off64_t) upper > 0);
-  TEST_VERIFY ((off64_t) (upper + 1) < 0);
-  if (lseek64 (outfd, upper, SEEK_SET) >= 0)
-    {
-      if (write (outfd, "", 1) == 1)
-        FAIL_EXIT1 ("created a file larger than the off64_t range");
-    }
-
-  uint64_t lower = 1024 * 1024; /* A reasonable minimum file size.  */
-  /* Loop invariant: writing at lower succeeds, writing at upper fails.  */
-  while (lower + 1 < upper)
-    {
-      uint64_t middle = (lower + upper) / 2;
-      if (test_verbose > 0)
-        printf ("info: %s: remaining test range %" PRIu64 " .. %" PRIu64
-                ", probe at %" PRIu64 "\n", __func__, lower, upper, middle);
-      xftruncate (outfd, 0);
-      if (lseek64 (outfd, middle, SEEK_SET) >= 0
-          && write (outfd, "", 1) == 1)
-        lower = middle;
-      else
-        upper = middle;
-    }
-  TEST_VERIFY (lower + 1 == upper);
-  maximum_offset = lower;
-  printf ("info: maximum writable file offset: %" PRIu64 " (%" PRIx64 ")\n",
-          lower, lower);
-
-  /* Check that writing at the valid offset actually works.  */
-  xftruncate (outfd, 0);
-  xlseek (outfd, lower, SEEK_SET);
-  TEST_COMPARE (write (outfd, "", 1), 1);
-
-  /* Cross the boundary with a two-byte write.  This can either result
-     in a short write, or a failure.  */
-  xlseek (outfd, lower, SEEK_SET);
-  ssize_t ret = write (outfd, " ", 2);
-  if (ret < 0)
-    {
-      maximum_offset_errno = errno;
-      maximum_offset_hard_limit = true;
-    }
-  else
-    maximum_offset_hard_limit = false;
-
-  /* Check that writing at the next offset actually fails.  This also
-     obtains the expected errno value.  */
-  xftruncate (outfd, 0);
-  const char *action;
-  if (lseek64 (outfd, lower + 1, SEEK_SET) != 0)
-    {
-      if (write (outfd, "", 1) != -1)
-        FAIL_EXIT1 ("write to impossible offset %" PRIu64 " succeeded",
-                    lower + 1);
-      action = "writing";
-      int errno_copy = errno;
-      if (maximum_offset_hard_limit)
-        TEST_COMPARE (errno_copy, maximum_offset_errno);
-      else
-        maximum_offset_errno = errno_copy;
-    }
-  else
-    {
-      action = "seeking";
-      maximum_offset_errno = errno;
-    }
-  printf ("info: %s out of range fails with %m (%d)\n",
-          action, maximum_offset_errno);
-
-  xftruncate (outfd, 0);
-  xlseek (outfd, 0, SEEK_SET);
-}
-
 /* Perform a copy of a file.  */
 static void
 simple_file_copy (void)
@@ -247,390 +138,6 @@  simple_file_copy (void)
   free (bytes);
 }
 
-/* Test that reading from a pipe willfails.  */
-static void
-pipe_as_source (void)
-{
-  int pipefds[2];
-  xpipe (pipefds);
-
-  for (int length = 0; length < 2; ++length)
-    {
-      if (test_verbose > 0)
-        printf ("info: %s: length=%d\n", __func__, length);
-
-      /* Make sure that there is something to copy in the pipe.  */
-      xwrite (pipefds[1], "@", 1);
-
-      TEST_COMPARE (copy_file_range (pipefds[0], pinoff, outfd, poutoff,
-                                     length, 0), -1);
-      /* Linux 4.10 and later return EINVAL.  Older kernels return
-         EXDEV.  */
-      TEST_VERIFY (errno == EINVAL || errno == EXDEV);
-      TEST_COMPARE (inoff, 0);
-      TEST_COMPARE (outoff, 0);
-      TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), 0);
-
-      /* Make sure that nothing was read.  */
-      char buf = 'A';
-      TEST_COMPARE (read (pipefds[0], &buf, 1), 1);
-      TEST_COMPARE (buf, '@');
-    }
-
-  xclose (pipefds[0]);
-  xclose (pipefds[1]);
-}
-
-/* Test that writing to a pipe fails.  */
-static void
-pipe_as_destination (void)
-{
-  /* Make sure that there is something to read in the input file.  */
-  xwrite (infd, "abc", 3);
-  xlseek (infd, 0, SEEK_SET);
-
-  int pipefds[2];
-  xpipe (pipefds);
-
-  for (int length = 0; length < 2; ++length)
-    {
-      if (test_verbose > 0)
-        printf ("info: %s: length=%d\n", __func__, length);
-
-      TEST_COMPARE (copy_file_range (infd, pinoff, pipefds[1], poutoff,
-                                     length, 0), -1);
-      /* Linux 4.10 and later return EINVAL.  Older kernels return
-         EXDEV.  */
-      TEST_VERIFY (errno == EINVAL || errno == EXDEV);
-      TEST_COMPARE (inoff, 0);
-      TEST_COMPARE (outoff, 0);
-      TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
-
-      /* Make sure that nothing was written.  */
-      struct pollfd pollfd = { .fd = pipefds[0], .events = POLLIN, };
-      TEST_COMPARE (poll (&pollfd, 1, 0), 0);
-    }
-
-  xclose (pipefds[0]);
-  xclose (pipefds[1]);
-}
-
-/* Test a write failure after (potentially) writing some bytes.
-   Failure occurs near the start of the buffer.  */
-static void
-delayed_write_failure_beginning (void)
-{
-  /* We need to write something to provoke the error.  */
-  if (current_size == 0)
-    return;
-  xwrite (infd, random_data, sizeof (random_data));
-  xlseek (infd, 0, SEEK_SET);
-
-  /* Write failure near the start.  The actual error code varies among
-     file systems.  */
-  find_maximum_offset ();
-  off64_t where = maximum_offset;
-
-  if (current_size == 1)
-    ++where;
-  outoff = where;
-  if (do_outoff)
-    xlseek (outfd, 1, SEEK_SET);
-  else
-    xlseek (outfd, where, SEEK_SET);
-  if (maximum_offset_hard_limit || where > maximum_offset)
-    {
-      TEST_COMPARE (copy_file_range (infd, pinoff, outfd, poutoff,
-                                     sizeof (random_data), 0), -1);
-      TEST_COMPARE (errno, maximum_offset_errno);
-      TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
-      TEST_COMPARE (inoff, 0);
-      if (do_outoff)
-        TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), 1);
-      else
-        TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), where);
-      TEST_COMPARE (outoff, where);
-      struct stat64 st;
-      xfstat (outfd, &st);
-      TEST_COMPARE (st.st_size, 0);
-    }
-  else
-    {
-      /* The offset is not a hard limit.  This means we write one
-         byte.  */
-      TEST_COMPARE (copy_file_range (infd, pinoff, outfd, poutoff,
-                                     sizeof (random_data), 0), 1);
-      if (do_inoff)
-        {
-          TEST_COMPARE (inoff, 1);
-          TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
-        }
-      else
-        {
-          TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 1);
-          TEST_COMPARE (inoff, 0);
-        }
-      if (do_outoff)
-        {
-          TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), 1);
-          TEST_COMPARE (outoff, where + 1);
-        }
-      else
-        {
-          TEST_COMPARE (xlseek (outfd, 0, SEEK_CUR), where + 1);
-          TEST_COMPARE (outoff, where);
-        }
-      struct stat64 st;
-      xfstat (outfd, &st);
-      TEST_COMPARE (st.st_size, where + 1);
-    }
-}
-
-/* Test a write failure after (potentially) writing some bytes.
-   Failure occurs near the end of the buffer.  */
-static void
-delayed_write_failure_end (void)
-{
-  if (current_size <= 1)
-    /* This would be same as the first test because there is not
-       enough data to write to make a difference.  */
-    return;
-  xwrite (infd, random_data, sizeof (random_data));
-  xlseek (infd, 0, SEEK_SET);
-
-  find_maximum_offset ();
-  off64_t where = maximum_offset - current_size + 1;
-  if (current_size == sizeof (random_data))
-    /* Otherwise we do not reach the non-writable byte.  */
-    ++where;
-  outoff = where;
-  if (do_outoff)
-    xlseek (outfd, 1, SEEK_SET);
-  else
-    xlseek (outfd, where, SEEK_SET);
-  ssize_t ret = copy_file_range (infd, pinoff, outfd, poutoff,
-                                 sizeof (random_data), 0);
-  if (ret < 0)
-    {
-      TEST_COMPARE (ret, -1);
-      TEST_COMPARE (errno, maximum_offset_errno);
-      struct stat64 st;
-      xfstat (outfd, &st);
-      TEST_COMPARE (st.st_size, 0);
-    }
-  else
-    {
-      /* The first copy succeeded.  This happens in the emulation
-         because the internal buffer of limited size does not
-         necessarily cross the off64_t boundary on the first write
-         operation.  */
-      if (test_verbose > 0)
-        printf ("info:   copy_file_range (%zu) returned %zd\n",
-                sizeof (random_data), ret);
-      TEST_VERIFY (ret > 0);
-      TEST_VERIFY (ret < maximum_size);
-      struct stat64 st;
-      xfstat (outfd, &st);
-      TEST_COMPARE (st.st_size, where + ret);
-      if (do_inoff)
-        {
-          TEST_COMPARE (inoff, ret);
-          TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
-        }
-      else
-          TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), ret);
-
-      char *buffer = xmalloc (ret);
-      TEST_COMPARE (pread64 (outfd, buffer, ret, where), ret);
-      TEST_VERIFY (memcmp (buffer, random_data, ret) == 0);
-      free (buffer);
-
-      /* The second copy fails.  */
-      TEST_COMPARE (copy_file_range (infd, pinoff, outfd, poutoff,
-                                     sizeof (random_data), 0), -1);
-      TEST_COMPARE (errno, maximum_offset_errno);
-    }
-}
-
-/* Test a write failure across devices.  */
-static void
-cross_device_failure (void)
-{
-  if (xdevfile == NULL)
-    /* Subtest not supported due to missing cross-device file.  */
-    return;
-
-  /* We need something to write.  */
-  xwrite (infd, random_data, sizeof (random_data));
-  xlseek (infd, 0, SEEK_SET);
-
-  int xdevfd = xopen (xdevfile, O_RDWR | O_LARGEFILE, 0);
-  TEST_COMPARE (copy_file_range (infd, pinoff, xdevfd, poutoff,
-                                 current_size, 0), -1);
-  TEST_COMPARE (errno, EXDEV);
-  TEST_COMPARE (xlseek (infd, 0, SEEK_CUR), 0);
-  struct stat64 st;
-  xfstat (xdevfd, &st);
-  TEST_COMPARE (st.st_size, 0);
-
-  xclose (xdevfd);
-}
-
-/* Try to exercise ENOSPC behavior with a tempfs file system (so that
-   we do not have to fill up a regular file system to get the error).
-   This function runs in a subprocess, so that we do not change the
-   mount namespace of the actual test process.  */
-static void
-enospc_failure_1 (void *closure)
-{
-#ifdef CLONE_NEWNS
-  support_become_root ();
-
-  /* Make sure that we do not alter the file system mounts of the
-     parents.  */
-  if (! support_enter_mount_namespace ())
-    {
-      printf ("warning: ENOSPC test skipped\n");
-      return;
-    }
-
-  char *mountpoint = closure;
-  if (mount ("none", mountpoint, "tmpfs", MS_NODEV | MS_NOEXEC,
-             "size=500k") != 0)
-    {
-      printf ("warning: could not mount tmpfs at %s: %m\n", mountpoint);
-      return;
-    }
-
-  /* The source file must reside on the same file system.  */
-  char *intmpfsfile = xasprintf ("%s/%s", mountpoint, "in");
-  int intmpfsfd = xopen (intmpfsfile, O_RDWR | O_CREAT | O_LARGEFILE, 0600);
-  xwrite (intmpfsfd, random_data, sizeof (random_data));
-  xlseek (intmpfsfd, 1, SEEK_SET);
-  inoff = 1;
-
-  char *outtmpfsfile = xasprintf ("%s/%s", mountpoint, "out");
-  int outtmpfsfd = xopen (outtmpfsfile, O_RDWR | O_CREAT | O_LARGEFILE, 0600);
-
-  /* Fill the file with data until ENOSPC is reached.  */
-  while (true)
-    {
-      ssize_t ret = write (outtmpfsfd, random_data, sizeof (random_data));
-      if (ret < 0 && errno != ENOSPC)
-        FAIL_EXIT1 ("write to %s: %m", outtmpfsfile);
-      if (ret < sizeof (random_data))
-        break;
-    }
-  TEST_COMPARE (write (outtmpfsfd, "", 1), -1);
-  TEST_COMPARE (errno, ENOSPC);
-  off64_t maxsize = xlseek (outtmpfsfd, 0, SEEK_CUR);
-  TEST_VERIFY_EXIT (maxsize > sizeof (random_data));
-
-  /* Constructed the expected file contents.  */
-  char *expected = xmalloc (maxsize);
-  TEST_COMPARE (pread64 (outtmpfsfd, expected, maxsize, 0), maxsize);
-  /* Go back a little, so some bytes can be written.  */
-  enum { offset = 20000 };
-  TEST_VERIFY_EXIT (offset < maxsize);
-  TEST_VERIFY_EXIT (offset < sizeof (random_data));
-  memcpy (expected + maxsize - offset, random_data + 1, offset);
-
-  if (do_outoff)
-    {
-      outoff = maxsize - offset;
-      xlseek (outtmpfsfd, 2, SEEK_SET);
-    }
-  else
-    xlseek (outtmpfsfd, -offset, SEEK_CUR);
-
-  /* First call is expected to succeed because we made room for some
-     bytes.  */
-  TEST_COMPARE (copy_file_range (intmpfsfd, pinoff, outtmpfsfd, poutoff,
-                                 maximum_size, 0), offset);
-  if (do_inoff)
-    {
-      TEST_COMPARE (inoff, 1 + offset);
-      TEST_COMPARE (xlseek (intmpfsfd, 0, SEEK_CUR), 1);
-    }
-  else
-      TEST_COMPARE (xlseek (intmpfsfd, 0, SEEK_CUR), 1 + offset);
-  if (do_outoff)
-    {
-      TEST_COMPARE (outoff, maxsize);
-      TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_CUR), 2);
-    }
-  else
-    TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_CUR), maxsize);
-  struct stat64 st;
-  xfstat (outtmpfsfd, &st);
-  TEST_COMPARE (st.st_size, maxsize);
-  char *actual = xmalloc (st.st_size);
-  TEST_COMPARE (pread64 (outtmpfsfd, actual, st.st_size, 0), st.st_size);
-  TEST_VERIFY (memcmp (expected, actual, maxsize) == 0);
-
-  /* Second call should fail with ENOSPC.  */
-  TEST_COMPARE (copy_file_range (intmpfsfd, pinoff, outtmpfsfd, poutoff,
-                                 maximum_size, 0), -1);
-  TEST_COMPARE (errno, ENOSPC);
-
-  /* Offsets should be unchanged.  */
-  if (do_inoff)
-    {
-      TEST_COMPARE (inoff, 1 + offset);
-      TEST_COMPARE (xlseek (intmpfsfd, 0, SEEK_CUR), 1);
-    }
-  else
-    TEST_COMPARE (xlseek (intmpfsfd, 0, SEEK_CUR), 1 + offset);
-  if (do_outoff)
-    {
-      TEST_COMPARE (outoff, maxsize);
-      TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_CUR), 2);
-    }
-  else
-    TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_CUR), maxsize);
-  TEST_COMPARE (xlseek (outtmpfsfd, 0, SEEK_END), maxsize);
-  TEST_COMPARE (pread64 (outtmpfsfd, actual, maxsize, 0), maxsize);
-  TEST_VERIFY (memcmp (expected, actual, maxsize) == 0);
-
-  free (actual);
-  free (expected);
-
-  xclose (intmpfsfd);
-  xclose (outtmpfsfd);
-  free (intmpfsfile);
-  free (outtmpfsfile);
-
-#else /* !CLONE_NEWNS */
-  puts ("warning: ENOSPC test skipped (no mount namespaces)");
-#endif
-}
-
-/* Call enospc_failure_1 in a subprocess.  */
-static void
-enospc_failure (void)
-{
-  char *mountpoint
-    = support_create_temp_directory ("tst-copy_file_range-enospc-");
-  support_isolate_in_subprocess (enospc_failure_1, mountpoint);
-  free (mountpoint);
-}
-
-/* The target file descriptor must have O_APPEND enabled.  */
-static void
-oappend_failure (void)
-{
-  /* Add data, to make sure we do not fail because there is
-     insufficient input data.  */
-  xwrite (infd, random_data, current_size);
-  xlseek (infd, 0, SEEK_SET);
-
-  xclose (outfd);
-  outfd = xopen (outfile, O_RDWR | O_APPEND, 0);
-  TEST_COMPARE (copy_file_range (infd, pinoff, outfd, poutoff,
-                                 current_size, 0), -1);
-  TEST_COMPARE (errno, EBADF);
-}
-
 /* Test that a short input file results in a shortened copy.  */
 static void
 short_copy (void)
@@ -721,14 +228,6 @@  struct test_case
 static struct test_case tests[] =
   {
     { "simple_file_copy", simple_file_copy, .sizes = true },
-    { "pipe_as_source", pipe_as_source, },
-    { "pipe_as_destination", pipe_as_destination, },
-    { "delayed_write_failure_beginning", delayed_write_failure_beginning,
-      .sizes = true },
-    { "delayed_write_failure_end", delayed_write_failure_end, .sizes = true },
-    { "cross_device_failure", cross_device_failure, .sizes = true },
-    { "enospc_failure", enospc_failure, },
-    { "oappend_failure", oappend_failure, .sizes = true },
     { "short_copy", short_copy, .sizes = true },
   };
 
@@ -738,59 +237,20 @@  do_test (void)
   for (unsigned char *p = random_data; p < array_end (random_data); ++p)
     *p = rand () >> 24;
 
+
   infd = create_temp_file ("tst-copy_file_range-in-", &infile);
+  outfd = create_temp_file ("tst-copy_file_range-out-", &outfile);
   {
-    int outfd = create_temp_file ("tst-copy_file_range-out-", &outfile);
-    if (!support_descriptor_supports_holes (outfd))
-      FAIL_UNSUPPORTED ("File %s does not support holes", outfile);
-    xclose (outfd);
-  }
-
-  /* Try to find a different directory from the default input/output
-     file.  */
-  {
-    struct stat64 instat;
-    xfstat (infd, &instat);
-    static const char *const candidates[] =
-      { NULL, "/var/tmp", "/dev/shm" };
-    for (const char *const *c = candidates; c < array_end (candidates); ++c)
-      {
-        const char *path = *c;
-        char *to_free = NULL;
-        if (path == NULL)
-          {
-            to_free = xreadlink ("/proc/self/exe");
-            path = dirname (to_free);
-          }
-
-        struct stat64 cstat;
-        xstat (path, &cstat);
-        if (cstat.st_dev == instat.st_dev)
-          {
-            free (to_free);
-            continue;
-          }
-
-        printf ("info: using alternate temporary files directory: %s\n", path);
-        xdevfile = xasprintf ("%s/tst-copy_file_range-xdev-XXXXXX", path);
-        free (to_free);
-        break;
-      }
-    if (xdevfile != NULL)
+    ssize_t ret = copy_file_range (infd, NULL, outfd, NULL, 0, 0);
+    if (ret != 0)
       {
-        int xdevfd = mkstemp (xdevfile);
-        if (xdevfd < 0)
-          FAIL_EXIT1 ("mkstemp (\"%s\"): %m", xdevfile);
-        struct stat64 xdevst;
-        xfstat (xdevfd, &xdevst);
-        TEST_VERIFY (xdevst.st_dev != instat.st_dev);
-        add_temp_file (xdevfile);
-        xclose (xdevfd);
+        if (errno == ENOSYS)
+          FAIL_UNSUPPORTED ("copy_file_range is not supported on this system");
+        FAIL_EXIT1 ("copy_file_range probing call: %m");
       }
-    else
-      puts ("warning: no alternate directory on different file system found");
   }
   xclose (infd);
+  xclose (outfd);
 
   for (do_inoff = 0; do_inoff < 2; ++do_inoff)
     for (do_outoff = 0; do_outoff < 2; ++do_outoff)
@@ -832,7 +292,6 @@  do_test (void)
 
   free (infile);
   free (outfile);
-  free (xdevfile);
 
   return 0;
 }
diff --git a/manual/llio.texi b/manual/llio.texi
index e89affd666..447126b7eb 100644
--- a/manual/llio.texi
+++ b/manual/llio.texi
@@ -1404,10 +1404,13 @@  failure occurs.  The return value is zero if the end of the input file
 is encountered immediately.
 
 If no bytes can be copied, to report an error, @code{copy_file_range}
-returns the value @math{-1} and sets @code{errno}.  The following
-@code{errno} error conditions are specific to this function:
+returns the value @math{-1} and sets @code{errno}.  The table below
+lists some of the error conditions for this function.
 
 @table @code
+@item ENOSYS
+The kernel does not implement the required functionality.
+
 @item EISDIR
 At least one of the descriptors @var{inputfd} or @var{outputfd} refers
 to a directory.
@@ -1437,9 +1440,6 @@  reading.
 
 The argument @var{outputfd} is not a valid file descriptor open for
 writing, or @var{outputfd} has been opened with @code{O_APPEND}.
-
-@item EXDEV
-The input and output files reside on different file systems.
 @end table
 
 In addition, @code{copy_file_range} can fail with the error codes
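With the emulation removed, a caller that must keep working on kernels without the system call has to handle ENOSYS itself. The following is a minimal sketch of one way to do that, not part of this patch. The helper name copy_bytes, the 64 KiB buffer, and the decision to also fall back on EXDEV (which older kernels report for cross-device copies) are assumptions made for illustration.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* For the copy_file_range declaration.  */
#endif
#include <errno.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not a glibc function.  Copy up to LENGTH bytes
   from INFD to OUTFD at their current file offsets.  Return the number
   of bytes copied (possibly fewer than LENGTH), or -1 with errno set
   if nothing could be copied.  */
static ssize_t
copy_bytes (int infd, int outfd, size_t length)
{
  if (length > SSIZE_MAX)
    length = SSIZE_MAX;

  ssize_t ret = copy_file_range (infd, NULL, outfd, NULL, length, 0);
  /* Assumption: only ENOSYS and EXDEV trigger the fallback; all other
     errors are reported to the caller unchanged.  */
  if (ret >= 0 || (errno != ENOSYS && errno != EXDEV))
    return ret;

  /* Fallback: plain read/write loop through a bounce buffer.  */
  char buf[65536];
  size_t total = 0;
  while (total < length)
    {
      size_t want = length - total;
      if (want > sizeof (buf))
        want = sizeof (buf);
      ssize_t n = read (infd, buf, want);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return total > 0 ? (ssize_t) total : -1;
        }
      if (n == 0)
        break;                  /* End of input.  */
      for (ssize_t off = 0; off < n; )
        {
          ssize_t w = write (outfd, buf + off, n - off);
          if (w < 0)
            {
              if (errno == EINTR)
                continue;
              /* Bytes already read but not written are lost here.  */
              return total > 0 ? (ssize_t) total : -1;
            }
          off += w;
          total += w;
        }
    }
  return (ssize_t) total;
}
```

Like copy_file_range itself, the helper may return a short count, so callers are expected to loop until the desired length has been copied or zero is returned. The write-failure path shows why a user-space loop is only an approximation: data consumed from the input cannot be pushed back, so the input offset may run ahead of what was actually written.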
diff --git a/sysdeps/unix/sysv/linux/alpha/kernel-features.h b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
index 4a5d029c1d..81f6c3633a 100644
--- a/sysdeps/unix/sysv/linux/alpha/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/alpha/kernel-features.h
@@ -49,7 +49,6 @@ 
 /* Support for copy_file_range, statx was added in kernel 4.13.  */
 #if __LINUX_KERNEL_VERSION < 0x040D00
 # undef __ASSUME_MLOCK2
-# undef __ASSUME_COPY_FILE_RANGE
 # undef __ASSUME_STATX
 #endif
 
diff --git a/sysdeps/unix/sysv/linux/arm/kernel-features.h b/sysdeps/unix/sysv/linux/arm/kernel-features.h
index 2d2d355844..4220adff37 100644
--- a/sysdeps/unix/sysv/linux/arm/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/arm/kernel-features.h
@@ -45,7 +45,6 @@ 
    present in 32-bit kernels from 4.4 and 4.5 respectively.  */
 #if __LINUX_KERNEL_VERSION < 0x040700
 # undef __ASSUME_MLOCK2
-# undef __ASSUME_COPY_FILE_RANGE
 #endif
 
 #undef __ASSUME_CLONE_DEFAULT
diff --git a/sysdeps/unix/sysv/linux/copy_file_range.c b/sysdeps/unix/sysv/linux/copy_file_range.c
index 70961007a5..e950db3bf5 100644
--- a/sysdeps/unix/sysv/linux/copy_file_range.c
+++ b/sysdeps/unix/sysv/linux/copy_file_range.c
@@ -20,27 +20,16 @@ 
 #include <sysdep-cancel.h>
 #include <unistd.h>
 
-/* Include the fallback implementation.  */
-#ifndef __ASSUME_COPY_FILE_RANGE
-#define COPY_FILE_RANGE_DECL static
-#define COPY_FILE_RANGE copy_file_range_compat
-#include <io/copy_file_range-compat.c>
-#endif
-
 ssize_t
 copy_file_range (int infd, __off64_t *pinoff,
                  int outfd, __off64_t *poutoff,
                  size_t length, unsigned int flags)
 {
 #ifdef __NR_copy_file_range
-  ssize_t ret = SYSCALL_CANCEL (copy_file_range, infd, pinoff, outfd, poutoff,
-                                length, flags);
-# ifndef __ASSUME_COPY_FILE_RANGE
-  if (ret == -1 && errno == ENOSYS)
-    ret = copy_file_range_compat (infd, pinoff, outfd, poutoff, length, flags);
-# endif
-  return ret;
-#else  /* !__NR_copy_file_range */
-  return copy_file_range_compat (infd, pinoff, outfd, poutoff, length, flags);
+  return SYSCALL_CANCEL (copy_file_range, infd, pinoff, outfd, poutoff,
+                         length, flags);
+#else
+  __set_errno (ENOSYS);
+  return -1;
 #endif
 }
diff --git a/sysdeps/unix/sysv/linux/kernel-features.h b/sysdeps/unix/sysv/linux/kernel-features.h
index bc5c959f58..1518bb5228 100644
--- a/sysdeps/unix/sysv/linux/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/kernel-features.h
@@ -100,10 +100,6 @@ 
 # define __ASSUME_MLOCK2 1
 #endif
 
-#if __LINUX_KERNEL_VERSION >= 0x040500
-# define __ASSUME_COPY_FILE_RANGE 1
-#endif
-
 /* Support for statx was added in kernel 4.11.  */
 #if __LINUX_KERNEL_VERSION >= 0x040B00
 # define __ASSUME_STATX 1
diff --git a/sysdeps/unix/sysv/linux/microblaze/kernel-features.h b/sysdeps/unix/sysv/linux/microblaze/kernel-features.h
index 8df19400af..a787409295 100644
--- a/sysdeps/unix/sysv/linux/microblaze/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/microblaze/kernel-features.h
@@ -60,11 +60,6 @@ 
 # undef __ASSUME_MLOCK2
 #endif
 
-/* Support for the copy_file_range syscall was added in 4.10.  */
-#if __LINUX_KERNEL_VERSION < 0x040A00
-# undef __ASSUME_COPY_FILE_RANGE
-#endif
-
 /* Support for statx was added in kernel 4.12.  */
 #if __LINUX_KERNEL_VERSION < 0X040C00
 # undef __ASSUME_STATX
diff --git a/sysdeps/unix/sysv/linux/sh/kernel-features.h b/sysdeps/unix/sysv/linux/sh/kernel-features.h
index b11a5cb544..0f287fbf85 100644
--- a/sysdeps/unix/sysv/linux/sh/kernel-features.h
+++ b/sysdeps/unix/sysv/linux/sh/kernel-features.h
@@ -49,7 +49,6 @@ 
 # undef __ASSUME_RENAMEAT2
 # undef __ASSUME_EXECVEAT
 # undef __ASSUME_MLOCK2
-# undef __ASSUME_COPY_FILE_RANGE
 #endif
 
 /* sh does not support the statx system call before 5.1.  */