Message ID | 20190526061100.21761-10-amir73il@gmail.com |
---|---|
State | New |
Series | Fixes for major copy_file_range() issues |
On Sun, May 26, 2019 at 09:11:00AM +0300, Amir Goldstein wrote:
> Update with all the missing errors the syscall can return, the
> behaviour the syscall should have w.r.t. to copies within single
> files, etc.
>
> [Amir] Copying beyond EOF returns zero.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  man2/copy_file_range.2 | 93 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 77 insertions(+), 16 deletions(-)
>
> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> index 2438b63c8..fab11f977 100644
> --- a/man2/copy_file_range.2
> +++ b/man2/copy_file_range.2
> @@ -42,9 +42,9 @@ without the additional cost of transferring data from the kernel to user space
>  and then back into the kernel.
>  It copies up to
>  .I len
> -bytes of data from file descriptor
> +bytes of data from the source file descriptor
>  .I fd_in
> -to file descriptor
> +to target file descriptor

"to the target file descriptor"

>  .IR fd_out ,
>  overwriting any data that exists within the requested range of the target file.
>  .PP
> @@ -74,6 +74,11 @@ is not changed, but
>  .I off_in
>  is adjusted appropriately.
>  .PP
> +.I fd_in
> +and
> +.I fd_out
> +can refer to the same file. If they refer to the same file, then the source and
> +target ranges are not allowed to overlap.

Please start each sentence on a new line, per mkerrisk rules.

>  .PP
>  The
>  .I flags
> @@ -84,6 +89,11 @@ Upon successful completion,
>  .BR copy_file_range ()
>  will return the number of bytes copied between files.
>  This could be less than the length originally requested.
> +If the file offset of
> +.I fd_in
> +is at or past the end of file, no bytes are copied, and
> +.BR copy_file_range ()
> +returns zero.
>  .PP
>  On error,
>  .BR copy_file_range ()
> @@ -93,12 +103,16 @@ is set to indicate the error.
>  .SH ERRORS
>  .TP
>  .B EBADF
> -One or more file descriptors are not valid; or
> +One or more file descriptors are not valid.
> +.TP
> +.B EBADF
>  .I fd_in
>  is not open for reading; or
>  .I fd_out
> -is not open for writing; or
> -the
> +is not open for writing.
> +.TP
> +.B EBADF
> +The
>  .B O_APPEND
>  flag is set for the open file description (see
>  .BR open (2))
> @@ -106,17 +120,36 @@ referred to by the file descriptor
>  .IR fd_out .
>  .TP
>  .B EFBIG
> -An attempt was made to write a file that exceeds the implementation-defined
> -maximum file size or the process's file size limit,
> -or to write at a position past the maximum allowed offset.
> +An attempt was made to write at a position past the maximum file offset the
> +kernel supports.
> +.TP
> +.B EFBIG
> +An attempt was made to write a range that exceeds the allowed maximum file size.
> +The maximum file size differs between filesystem implemenations and can be

"implementations"

> +different to the maximum allowed file offset.

"...different from the maximum..."

> +.TP
> +.B EFBIG
> +An attempt was made to write beyond the process's file size resource
> +limit. This may also result in the process receiving a
> +.I SIGXFSZ
> +signal.

Start new sentences on a new line, please.

>  .TP
>  .B EINVAL
> -Requested range extends beyond the end of the source file; or the
> +The
>  .I flags
>  argument is not 0.
>  .TP
> -.B EIO
> -A low-level I/O error occurred while copying.
> +.B EINVAL
> +.I fd_in
> +and
> +.I fd_out
> +refer to the same file and the source and target ranges overlap.
> +.TP
> +.B EINVAL
> +.I fd_in
> +or
> +.I fd_out
> +is not a regular file.

Adding the word "either" at the beginning of the sentence (e.g. "Either
fd_in or fd_out is not a regular file.") would help this flow better.

>  .TP
>  .B EISDIR
>  .I fd_in
> @@ -124,22 +157,50 @@ or
>  .I fd_out
>  refers to a directory.
>  .TP
> +.B EOVERFLOW
> +The requested source or destination range is too large to represent in the
> +specified data types.
> +.TP
> +.B EIO
> +A low-level I/O error occurred while copying.
> +.TP
>  .B ENOMEM
>  Out of memory.
>  .TP
> -.B ENOSPC
> -There is not enough space on the target filesystem to complete the copy.
> -.TP
>  .B EXDEV
>  The files referred to by
>  .IR file_in " and " file_out
> -are not on the same mounted filesystem.
> +are not on the same mounted filesystem (pre Linux 5.3).
> +.TP
> +.B ENOSPC
> +There is not enough space on the target filesystem to complete the copy.

Why move this?

> +.TP
> +.B TXTBSY
> +.I fd_in
> +or
> +.I fd_out
> +refers to an active swap file.

"Either fd_in or fd_out refers to..."

> +.TP
> +.B EPERM
> +.I fd_out
> +refers to an immutable file.
> +.TP
> +.B EACCES
> +The user does not have write permissions for the destination file.
>  .SH VERSIONS
>  The
>  .BR copy_file_range ()
>  system call first appeared in Linux 4.5, but glibc 2.27 provides a user-space
>  emulation when it is not available.
>  .\" https://sourceware.org/git/?p=glibc.git;a=commit;f=posix/unistd.h;h=bad7a0c81f501fbbcc79af9eaa4b8254441c4a1f
> +.PP
> +A major rework of the kernel implementation occurred in 5.3. Areas of the API
> +that weren't clearly defined were clarified and the API bounds are much more
> +strictly checked than on earlier kernels. Applications should target the
> +behaviour and requirements of 5.3 kernels.

Are there any weird cases where a program targetting 5.3 behavior would
fail or get stuck in an infinite loop on a 5.2 kernel?

Particularly since glibc spat out a copy_file_range fallback for 2.29
that tries to emulate the kernel behavior 100%. It even refuses
cross-filesystem copies (because hey, we documented that :() even though
that's perfectly fine for a userspace implementation.

TBH I suspect that we ought to get the glibc developers to remove the
"no cross device copies" code from their implementation and then update
the manpage to say that cross device copies are supposed to be
supported all the time, at least as of glibc 2.(futureversion).

Anyways, thanks for taking on the c_f_r cleanup! :)

--D

> +.PP
> +First support for cross-filesystem copies was introduced in Linux 5.3. Older
> +kernels will return -EXDEV when cross-filesystem copies are attempted.
>  .SH CONFORMING TO
>  The
>  .BR copy_file_range ()
> @@ -224,7 +285,7 @@ main(int argc, char **argv)
>          }
>
>          len \-= ret;
> -    } while (len > 0);
> +    } while (len > 0 && ret > 0);
>
>      close(fd_in);
>      close(fd_out);
> --
> 2.17.1
>
> > +A major rework of the kernel implementation occurred in 5.3. Areas of the API
> > +that weren't clearly defined were clarified and the API bounds are much more
> > +strictly checked than on earlier kernels. Applications should target the
> > +behaviour and requirements of 5.3 kernels.
>
> Are there any weird cases where a program targetting 5.3 behavior would
> fail or get stuck in an infinite loop on a 5.2 kernel?

I don't think so. When Dave wrote this paragraph the behavior was changed
from short copy to EINVAL. That would have been a problem to maintain
old vs. new copy loops, but now the behavior did not change in that respect.

>
> Particularly since glibc spat out a copy_file_range fallback for 2.29
> that tries to emulate the kernel behavior 100%. It even refuses
> cross-filesystem copies (because hey, we documented that :() even though
> that's perfectly fine for a userspace implementation.
>
> TBH I suspect that we ought to get the glibc developers to remove the
> "no cross device copies" code from their implementation and then update
> the manpage to say that cross device copies are supposed to be
> supported all the time, at least as of glibc 2.(futureversion).

I don't see a problem with copy_file_range() returning EXDEV.
That is why I left EXDEV in the man page.
Tools should know how to deal with EXDEV by now.
If you are running on a new kernel, you get better likelihood
for copy_file_range() to do clone or in-kernel copy for you.

>
> Anyways, thanks for taking on the c_f_r cleanup! :)
>

Sure, get ready for another round ;-)

Thanks for the review!
Amir.
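The point that tools should know how to deal with EXDEV can be illustrated with a copy loop that falls back to read()/write(). What follows is only a minimal sketch, not code from the patch, from glibc, or from any existing tool. The helper names copy_with_fallback() and copy_by_read_write() are hypothetical, and treating ENOSYS like EXDEV is an assumption added here for kernels that lack the syscall. The loop also stops when copy_file_range() returns zero, which is the "copying beyond EOF returns zero" behaviour the patch documents.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: plain read()/write() copy, used only as a fallback.
 * Returns the number of bytes copied, or -1 with errno set. */
ssize_t copy_by_read_write(int fd_in, int fd_out, size_t len)
{
    char buf[65536];
    size_t total = 0;

    while (total < len) {
        size_t want = len - total;
        if (want > sizeof(buf))
            want = sizeof(buf);

        ssize_t nread = read(fd_in, buf, want);
        if (nread < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (nread == 0)          /* EOF on the source */
            break;

        /* write() may be short, so loop until the chunk is flushed. */
        for (ssize_t done = 0; done < nread; ) {
            ssize_t nwritten = write(fd_out, buf + done, (size_t)(nread - done));
            if (nwritten < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += nwritten;
        }
        total += (size_t)nread;
    }
    return (ssize_t)total;
}

/* Hypothetical helper: copy up to len bytes using the file offsets of both
 * descriptors. Returns the number of bytes copied, or -1 with errno set. */
ssize_t copy_with_fallback(int fd_in, int fd_out, size_t len)
{
    size_t total = 0;

    while (total < len) {
        ssize_t ret = copy_file_range(fd_in, NULL, fd_out, NULL, len - total, 0);
        if (ret > 0) {
            total += (size_t)ret;
            continue;
        }
        if (ret == 0)            /* source offset at or past EOF: stop */
            break;
        /* Assumption: EXDEV and ENOSYS are the only errors worth retrying
         * in user space; everything else is reported to the caller. */
        if (errno == EXDEV || errno == ENOSYS) {
            ssize_t rest = copy_by_read_write(fd_in, fd_out, len - total);
            if (rest < 0)
                return -1;
            total += (size_t)rest;
            break;
        }
        return -1;
    }
    return (ssize_t)total;
}
```

A caller would pass two open descriptors and the requested length, for example copy_with_fallback(fd_in, fd_out, len), and treat a result shorter than len as "source exhausted" rather than as an error.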
diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
index 2438b63c8..fab11f977 100644
--- a/man2/copy_file_range.2
+++ b/man2/copy_file_range.2
@@ -42,9 +42,9 @@ without the additional cost of transferring data from the kernel to user space
 and then back into the kernel.
 It copies up to
 .I len
-bytes of data from file descriptor
+bytes of data from the source file descriptor
 .I fd_in
-to file descriptor
+to target file descriptor
 .IR fd_out ,
 overwriting any data that exists within the requested range of the target file.
 .PP
@@ -74,6 +74,11 @@ is not changed, but
 .I off_in
 is adjusted appropriately.
 .PP
+.I fd_in
+and
+.I fd_out
+can refer to the same file. If they refer to the same file, then the source and
+target ranges are not allowed to overlap.
 .PP
 The
 .I flags
@@ -84,6 +89,11 @@ Upon successful completion,
 .BR copy_file_range ()
 will return the number of bytes copied between files.
 This could be less than the length originally requested.
+If the file offset of
+.I fd_in
+is at or past the end of file, no bytes are copied, and
+.BR copy_file_range ()
+returns zero.
 .PP
 On error,
 .BR copy_file_range ()
@@ -93,12 +103,16 @@ is set to indicate the error.
 .SH ERRORS
 .TP
 .B EBADF
-One or more file descriptors are not valid; or
+One or more file descriptors are not valid.
+.TP
+.B EBADF
 .I fd_in
 is not open for reading; or
 .I fd_out
-is not open for writing; or
-the
+is not open for writing.
+.TP
+.B EBADF
+The
 .B O_APPEND
 flag is set for the open file description (see
 .BR open (2))
@@ -106,17 +120,36 @@ referred to by the file descriptor
 .IR fd_out .
 .TP
 .B EFBIG
-An attempt was made to write a file that exceeds the implementation-defined
-maximum file size or the process's file size limit,
-or to write at a position past the maximum allowed offset.
+An attempt was made to write at a position past the maximum file offset the
+kernel supports.
+.TP
+.B EFBIG
+An attempt was made to write a range that exceeds the allowed maximum file size.
+The maximum file size differs between filesystem implemenations and can be
+different to the maximum allowed file offset.
+.TP
+.B EFBIG
+An attempt was made to write beyond the process's file size resource
+limit. This may also result in the process receiving a
+.I SIGXFSZ
+signal.
 .TP
 .B EINVAL
-Requested range extends beyond the end of the source file; or the
+The
 .I flags
 argument is not 0.
 .TP
-.B EIO
-A low-level I/O error occurred while copying.
+.B EINVAL
+.I fd_in
+and
+.I fd_out
+refer to the same file and the source and target ranges overlap.
+.TP
+.B EINVAL
+.I fd_in
+or
+.I fd_out
+is not a regular file.
 .TP
 .B EISDIR
 .I fd_in
@@ -124,22 +157,50 @@ or
 .I fd_out
 refers to a directory.
 .TP
+.B EOVERFLOW
+The requested source or destination range is too large to represent in the
+specified data types.
+.TP
+.B EIO
+A low-level I/O error occurred while copying.
+.TP
 .B ENOMEM
 Out of memory.
 .TP
-.B ENOSPC
-There is not enough space on the target filesystem to complete the copy.
-.TP
 .B EXDEV
 The files referred to by
 .IR file_in " and " file_out
-are not on the same mounted filesystem.
+are not on the same mounted filesystem (pre Linux 5.3).
+.TP
+.B ENOSPC
+There is not enough space on the target filesystem to complete the copy.
+.TP
+.B TXTBSY
+.I fd_in
+or
+.I fd_out
+refers to an active swap file.
+.TP
+.B EPERM
+.I fd_out
+refers to an immutable file.
+.TP
+.B EACCES
+The user does not have write permissions for the destination file.
 .SH VERSIONS
 The
 .BR copy_file_range ()
 system call first appeared in Linux 4.5, but glibc 2.27 provides a user-space
 emulation when it is not available.
 .\" https://sourceware.org/git/?p=glibc.git;a=commit;f=posix/unistd.h;h=bad7a0c81f501fbbcc79af9eaa4b8254441c4a1f
+.PP
+A major rework of the kernel implementation occurred in 5.3. Areas of the API
+that weren't clearly defined were clarified and the API bounds are much more
+strictly checked than on earlier kernels. Applications should target the
+behaviour and requirements of 5.3 kernels.
+.PP
+First support for cross-filesystem copies was introduced in Linux 5.3. Older
+kernels will return -EXDEV when cross-filesystem copies are attempted.
 .SH CONFORMING TO
 The
 .BR copy_file_range ()
@@ -224,7 +285,7 @@ main(int argc, char **argv)
         }
 
         len \-= ret;
-    } while (len > 0);
+    } while (len > 0 && ret > 0);
 
     close(fd_in);
     close(fd_out);
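The same-file rule added by this patch (source and target ranges must not overlap, otherwise EINVAL) amounts to an interval intersection test. Below is a minimal user-space sketch of such a pre-check, assuming non-negative offsets. ranges_overlap() is a hypothetical helper, not part of the patch, and the kernel's own check may differ in detail, for example in how it treats a length that runs past the end of the source file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: true if the byte ranges [off_in, off_in + len) and
 * [off_out, off_out + len) share at least one byte.
 * Assumes both offsets are non-negative. */
bool ranges_overlap(off_t off_in, off_t off_out, size_t len)
{
    if (len == 0)
        return false;            /* an empty range cannot overlap anything */

    off_t lo = off_in < off_out ? off_in : off_out;
    off_t hi = off_in < off_out ? off_out : off_in;

    /* The ranges intersect exactly when the higher start offset falls
     * inside the range that begins at the lower start offset. */
    return (unsigned long long)(hi - lo) < (unsigned long long)len;
}
```

An application copying within a single file could call ranges_overlap(off_in, off_out, len) first and take a different path (for instance a temporary buffer) when it returns true, instead of relying on the EINVAL return.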