[v4,02/11] VFS: copy_file_range check validity of input source offset

Message ID 20181026201057.36899-4-olga.kornievskaia@gmail.com
State New
Headers show
Series
  • client-side support for "inter" SSC copy
Related show

Commit Message

Olga Kornievskaia Oct. 26, 2018, 8:10 p.m.
From: Olga Kornievskaia <kolga@netapp.com>

Input source offset can't be beyond the end of the file.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/read_write.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Matthew Wilcox Oct. 26, 2018, 9:26 p.m. | #1
On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> +++ b/fs/read_write.c
> @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  		}
>  	}
>  
> +	if (pos_in >= i_size_read(inode_in))
> +		return -EINVAL;
> +
>  	if (file_out->f_op->copy_file_range) {
>  		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
>  						      pos_out, len, flags);

Is this the right place to check this?  Surely we should be checking for
this before calling clone_file_range()?
Dave Chinner Oct. 27, 2018, 9:27 a.m. | #2
On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> Input source offset can't be beyond the end of the file.
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/read_write.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index fb4ffca..b3b304e 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  		}
>  	}
>  
> +	if (pos_in >= i_size_read(inode_in))
> +		return -EINVAL;
> +

vfs_copy_file_range seems ot be missing a wide range of checks.
rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
checks in generic_write_checks() apply, right? And the same security
issues like stripping setuid bits, etc? And we need to touch
atime on the source file, too?

We've just merged 5 or so patches in 4.19-rc8 and we're ready to
merge another ~30 patch series to fix all the stuff missing from the
clone/dedupe file range operations that make them safe and robust.
It seems like copy_file_range is all the checks it needs, too?

Cheers,

Dave.
Olga Kornievskaia Oct. 29, 2018, 2:41 p.m. | #3
On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > Input source offset can't be beyond the end of the file.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/read_write.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index fb4ffca..b3b304e 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >               }
> >       }
> >
> > +     if (pos_in >= i_size_read(inode_in))
> > +             return -EINVAL;
> > +
>
> vfs_copy_file_range seems ot be missing a wide range of checks.
> rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> checks in generic_write_checks() apply, right? And the same security
> issues like stripping setuid bits, etc? And we need to touch
> atime on the source file, too?

Yes sound like needed checks.

> We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> merge another ~30 patch series to fix all the stuff missing from the
> clone/dedupe file range operations that make them safe and robust.
> It seems like copy_file_range is all the checks it needs, too?

Are you proposing to not do this check now in favor of the proper work
that will do all of those checks you listed above? I can not volunteer
to provide this comprehensive check. However if this is the path
community decides is the best then I can move this check into NFS for
now and remove it once VFS provides such check later.

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Olga Kornievskaia Oct. 29, 2018, 4:09 p.m. | #4
On Fri, Oct 26, 2018 at 5:26 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > +++ b/fs/read_write.c
> > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >               }
> >       }
> >
> > +     if (pos_in >= i_size_read(inode_in))
> > +             return -EINVAL;
> > +
> >       if (file_out->f_op->copy_file_range) {
> >               ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
> >                                                     pos_out, len, flags);
>
> Is this the right place to check this?  Surely we should be checking for
> this before calling clone_file_range()?

Ops, indeed this is the wrong place. If I were to keep this check here
then I need to move it where it was originally located. However, right
now I'm included to not do the check in VFS at all and move it to NFS.
Dave Chinner Oct. 30, 2018, 9:03 a.m. | #5
On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > Input source offset can't be beyond the end of the file.
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/read_write.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index fb4ffca..b3b304e 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > >               }
> > >       }
> > >
> > > +     if (pos_in >= i_size_read(inode_in))
> > > +             return -EINVAL;
> > > +
> >
> > vfs_copy_file_range seems ot be missing a wide range of checks.
> > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > checks in generic_write_checks() apply, right? And the same security
> > issues like stripping setuid bits, etc? And we need to touch
> > atime on the source file, too?
> 
> Yes sound like needed checks.
> 
> > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > merge another ~30 patch series to fix all the stuff missing from the
> > clone/dedupe file range operations that make them safe and robust.
> > It seems like copy_file_range is all the checks it needs, too?
> 
> Are you proposing to not do this check now in favor of the proper work
> that will do all of those checks you listed above?

No, I'm saying that if you're adding one check, there's a whole heap
of checks that still need to be added, *especially* if this is going
to fall back to page cache copy between superblocks that may have
different limits and constraints.

There's security issues in this API. They need to be fixed before we
allow it to do more and potentially expose more problems due to it's
wider capability.

> I can not volunteer
> to provide this comprehensive check.

Why not?

Cheers,

Dave.
Olga Kornievskaia Oct. 30, 2018, 1:40 p.m. | #6
On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > >
> > > > Input source offset can't be beyond the end of the file.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/read_write.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > index fb4ffca..b3b304e 100644
> > > > --- a/fs/read_write.c
> > > > +++ b/fs/read_write.c
> > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > >               }
> > > >       }
> > > >
> > > > +     if (pos_in >= i_size_read(inode_in))
> > > > +             return -EINVAL;
> > > > +
> > >
> > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > checks in generic_write_checks() apply, right? And the same security
> > > issues like stripping setuid bits, etc? And we need to touch
> > > atime on the source file, too?
> >
> > Yes sound like needed checks.
> >
> > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > merge another ~30 patch series to fix all the stuff missing from the
> > > clone/dedupe file range operations that make them safe and robust.
> > > It seems like copy_file_range is all the checks it needs, too?
> >
> > Are you proposing to not do this check now in favor of the proper work
> > that will do all of those checks you listed above?
>
> No, I'm saying that if you're adding one check, there's a whole heap
> of checks that still need to be added, *especially* if this is going
> to fall back to page cache copy between superblocks that may have
> different limits and constraints.
>
> There's security issues in this API. They need to be fixed before we
> allow it to do more and potentially expose more problems due to it's
> wider capability.

Sounds like you are arguing that enabling generic copy_file_range()
via do_splice() isn't a good thing without those checks. I'm totally
OK with removing this functionality. As I mentioned earlier I added it
as I thought it was beneficial and I assumed that do_splice() was a
generic enough API to handle what was needed.

What this patch series was intended for was removing/relocating the
cross device check and it sounds like I should be keeping it just to
that.

> > I can not volunteer
> > to provide this comprehensive check.
>
> Why not?

I don't have the expertise in the VFS code.

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Olga Kornievskaia Oct. 30, 2018, 9:10 p.m. | #7
On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > >
> > > > Input source offset can't be beyond the end of the file.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/read_write.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > index fb4ffca..b3b304e 100644
> > > > --- a/fs/read_write.c
> > > > +++ b/fs/read_write.c
> > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > >               }
> > > >       }
> > > >
> > > > +     if (pos_in >= i_size_read(inode_in))
> > > > +             return -EINVAL;
> > > > +
> > >
> > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > checks in generic_write_checks() apply, right? And the same security
> > > issues like stripping setuid bits, etc? And we need to touch
> > > atime on the source file, too?
> >
> > Yes sound like needed checks.
> >
> > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > merge another ~30 patch series to fix all the stuff missing from the
> > > clone/dedupe file range operations that make them safe and robust.
> > > It seems like copy_file_range is all the checks it needs, too?
> >
> > Are you proposing to not do this check now in favor of the proper work
> > that will do all of those checks you listed above?
>
> No, I'm saying that if you're adding one check, there's a whole heap
> of checks that still need to be added, *especially* if this is going
> to fall back to page cache copy between superblocks that may have
> different limits and constraints.
>
> There's security issues in this API. They need to be fixed before we
> allow it to do more and potentially expose more problems due to it's
> wider capability.

Before I totally give up on this feature, can you help me understand
your concerns with allowing the generic copy_file_range via
do_splice().

I have mentioned I'm not a VFS expert thus I come from just looking at
the available documentation and the code.

I don't see any restrictions on the files being passed in the
do_splice_direct(). There are no restrictions that they must be from
the same filesystem or file system type. But perhaps this not the
concern you had but more about checking validity of arguments?

I have looked at Dave Wong's, if I'm not mistaken these 2 are the
relevant patches:
[PATCH 02/28] vfs: check file ranges before cloning files
 -- a couple but not all checks apply to copy_file_range() .
specifically, the offsets don't wrap and offset isn't past eof (as my
patch suggests). Other checks have to do with aligned memory which I
don't believe is needed or other dedup requirement that don't apply.

[PATCH 04/28] vfs: strengthen checking of file range inputs to
generic_remap_checks
 -- these checks apply to the code once we fall back to the
do_splice(). it looks to me that perhaps exporting
generic_access_check_limits()/generic_write_check_limits so that they
can be used by the copy_file_range when it get pulled in.

Also, can you elaborate one what "security issues" are present in this
API? Is it "stripping setuid bits" and so something like calling
file_remove_priv() that should be done when the fallback to the
do_splice_direct() happens?

As for the atime, wouldn't the ->copy_file_range() be updating the
file attributes? I guess for the fallback case, the attributes need to
be updated.

If those are checks/issues needed to be addressed and would then get
the generic copy_file_range() in, I could give a go at a patch (or 2).
>
> > I can not volunteer
> > to provide this comprehensive check.
>
> Why not?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Olga Kornievskaia Oct. 30, 2018, 9:12 p.m. | #8
On Tue, Oct 30, 2018 at 5:10 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > >
> > > > > Input source offset can't be beyond the end of the file.
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/read_write.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > index fb4ffca..b3b304e 100644
> > > > > --- a/fs/read_write.c
> > > > > +++ b/fs/read_write.c
> > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > >               }
> > > > >       }
> > > > >
> > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > +             return -EINVAL;
> > > > > +
> > > >
> > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > checks in generic_write_checks() apply, right? And the same security
> > > > issues like stripping setuid bits, etc? And we need to touch
> > > > atime on the source file, too?
> > >
> > > Yes sound like needed checks.
> > >
> > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > clone/dedupe file range operations that make them safe and robust.
> > > > It seems like copy_file_range is all the checks it needs, too?
> > >
> > > Are you proposing to not do this check now in favor of the proper work
> > > that will do all of those checks you listed above?
> >
> > No, I'm saying that if you're adding one check, there's a whole heap
> > of checks that still need to be added, *especially* if this is going
> > to fall back to page cache copy between superblocks that may have
> > different limits and constraints.
> >
> > There's security issues in this API. They need to be fixed before we
> > allow it to do more and potentially expose more problems due to it's
> > wider capability.
>
> Before I totally give up on this feature, can you help me understand
> your concerns with allowing the generic copy_file_range via
> do_splice().
>
> I have mentioned I'm not a VFS expert thus I come from just looking at
> the available documentation and the code.
>
> I don't see any restrictions on the files being passed in the
> do_splice_direct(). There are no restrictions that they must be from
> the same filesystem or file system type. But perhaps this not the
> concern you had but more about checking validity of arguments?
>
> I have looked at Dave Wong's, if I'm not mistaken these 2 are the

apologizes Darrick Wong's patches.

> relevant patches:
> [PATCH 02/28] vfs: check file ranges before cloning files
>  -- a couple but not all checks apply to copy_file_range() .
> specifically, the offsets don't wrap and offset isn't past eof (as my
> patch suggests). Other checks have to do with aligned memory which I
> don't believe is needed or other dedup requirement that don't apply.
>
> [PATCH 04/28] vfs: strengthen checking of file range inputs to
> generic_remap_checks
>  -- these checks apply to the code once we fall back to the
> do_splice(). it looks to me that perhaps exporting
> generic_access_check_limits()/generic_write_check_limits so that they
> can be used by the copy_file_range when it get pulled in.
>
> Also, can you elaborate one what "security issues" are present in this
> API? Is it "stripping setuid bits" and so something like calling
> file_remove_priv() that should be done when the fallback to the
> do_splice_direct() happens?
>
> As for the atime, wouldn't the ->copy_file_range() be updating the
> file attributes? I guess for the fallback case, the attributes need to
> be updated.
>
> If those are checks/issues needed to be addressed and would then get
> the generic copy_file_range() in, I could give a go at a patch (or 2).
> >
> > > I can not volunteer
> > > to provide this comprehensive check.
> >
> > Why not?
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
Dave Chinner Oct. 30, 2018, 11:40 p.m. | #9
On Tue, Oct 30, 2018 at 09:40:09AM -0400, Olga Kornievskaia wrote:
> On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > >
> > > > > Input source offset can't be beyond the end of the file.
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/read_write.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > index fb4ffca..b3b304e 100644
> > > > > --- a/fs/read_write.c
> > > > > +++ b/fs/read_write.c
> > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > >               }
> > > > >       }
> > > > >
> > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > +             return -EINVAL;
> > > > > +
> > > >
> > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > checks in generic_write_checks() apply, right? And the same security
> > > > issues like stripping setuid bits, etc? And we need to touch
> > > > atime on the source file, too?
> > >
> > > Yes sound like needed checks.
> > >
> > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > clone/dedupe file range operations that make them safe and robust.
> > > > It seems like copy_file_range is all the checks it needs, too?
> > >
> > > Are you proposing to not do this check now in favor of the proper work
> > > that will do all of those checks you listed above?
> >
> > No, I'm saying that if you're adding one check, there's a whole heap
> > of checks that still need to be added, *especially* if this is going
> > to fall back to page cache copy between superblocks that may have
> > different limits and constraints.
> >
> > There's security issues in this API. They need to be fixed before we
> > allow it to do more and potentially expose more problems due to it's
> > wider capability.
> 
> Sounds like you are arguing that enabling generic copy_file_range()
> via do_splice() isn't a good thing without those checks.

It's already enabled for filesystems that don't provide
->remap_file_range or ->copy_file_range. Your changes don't
introduce new problems, that just made me aware that there are
serious problems here, too.

The problem we've got now is that filesystem that use
->remap_file_range() are going to have very different error
detection and handling when Darrick's patchset is merged. They are
going to reject setuid files, obey rlimit, LFS, superblock max
sizes, etc. while ->copy_file_range() and do_splice_direct() methods
are not.

That really needs to be fixed.

> I'm totally
> OK with removing this functionality. As I mentioned earlier I added it
> as I thought it was beneficial and I assumed that do_splice() was a
> generic enough API to handle what was needed.

That doesn't fix the problems that are already there. :/

> > > I can not volunteer
> > > to provide this comprehensive check.
> >
> > Why not?
> 
> I don't have the expertise in the VFS code.

Just like all the other people without VFS expertise who fix
problems introduced in code written by people without VFS expertise.
I'm certainly no VFS expert, either, I've just seen, found and
helped fix a bunch of serious problems in the VFS dedupe/reflink
code that were uncovered by data corruptions on XFS filesystems.

That's part of being a filesystem developer - the VFS is "common
property" of all the filesystems and everyone has to do their bit to
maintain it....

Cheers,

Dave.
Dave Chinner Oct. 31, 2018, 12:14 a.m. | #10
On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote:
> On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > >
> > > > > Input source offset can't be beyond the end of the file.
> > > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/read_write.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > index fb4ffca..b3b304e 100644
> > > > > --- a/fs/read_write.c
> > > > > +++ b/fs/read_write.c
> > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > >               }
> > > > >       }
> > > > >
> > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > +             return -EINVAL;
> > > > > +
> > > >
> > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > checks in generic_write_checks() apply, right? And the same security
> > > > issues like stripping setuid bits, etc? And we need to touch
> > > > atime on the source file, too?
> > >
> > > Yes sound like needed checks.
> > >
> > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > clone/dedupe file range operations that make them safe and robust.
> > > > It seems like copy_file_range is all the checks it needs, too?
> > >
> > > Are you proposing to not do this check now in favor of the proper work
> > > that will do all of those checks you listed above?
> >
> > No, I'm saying that if you're adding one check, there's a whole heap
> > of checks that still need to be added, *especially* if this is going
> > to fall back to page cache copy between superblocks that may have
> > different limits and constraints.
> >
> > There's security issues in this API. They need to be fixed before we
> > allow it to do more and potentially expose more problems due to it's
> > wider capability.
> 
> Before I totally give up on this feature, can you help me understand
> your concerns with allowing the generic copy_file_range via
> do_splice().

it's not do_splice_direct() i'm concerned about. It's /writing data
without adequate checks/ that I'm concerned about.
->copy_file_range() also writes data, so it needs to undergo the
same safety checks as well.

> I have mentioned I'm not a VFS expert thus I come from just looking at
> the available documentation and the code.
> 
> I don't see any restrictions on the files being passed in the
> do_splice_direct(). There are no restrictions that they must be from
> the same filesystem or file system type. But perhaps this not the
> concern you had but more about checking validity of arguments?
> 
> I have looked at Dave Wong's, if I'm not mistaken these 2 are the
> relevant patches:
> [PATCH 02/28] vfs: check file ranges before cloning files
>  -- a couple but not all checks apply to copy_file_range() .

Yes, of course - clone/dedupe have different constraints, but the
core checks are still needed for copy_file_range(). 

For example, the man page says:

EINVAL
	Requested range extends beyond the end of the source
	file; or the flags argument is not 0.

Your patch above doesn't actually check that - it only checks if the
pos_in is beyond EOF. It needs to check if pos_in + len is beyond
EOF. After checking for wraps, of course.

> [PATCH 04/28] vfs: strengthen checking of file range inputs to
> generic_remap_checks
>  -- these checks apply to the code once we fall back to the
> do_splice().

man page says:

EFBIG
       An  attempt  was made to write a file that exceeds the
       implementation-defined maximum file size or the process's
       file size limit, or to write at a position past the maximum
       allowed offset.

These conditions apply to the destination file regards of the method
used to copy the data. That's what the generic methods now check for
clone/dedupe, and need to be used here, too.

e.g. In the case of NFS, if the user ion the NFS client is not
allowed to copy a file (e.g. say user RLIMIT restrictions) then that
should not be bypassed just because the NFS client can do a server
side copy. The restriction has been placed on the local user calling
copy_file_range(), not on the server side implementation.

> Also, can you elaborate one what "security issues" are present in this
> API? Is it "stripping setuid bits" and so something like calling
> file_remove_priv() that should be done when the fallback to the
> do_splice_direct() happens?

From 4.19-rc8:

7debbf015f58 xfs: update ctime and remove suid before cloning files

Which then got moved into the generic remap_file_range code in
Darrick's "vfs: remap helper should update destination inode
metadata" patch:

https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d

We can't assume that a server side copy is going to strip setuid
bits or even update target files c/mtimes. 

> As for the atime, wouldn't the ->copy_file_range() be updating the
> file attributes? I guess for the fallback case, the attributes need to
> be updated.

Generic VFS code should take care of atime updates. Hence if the
source file needs an atime update, it should be in the generic code
and done for all copy methods.

> If those are checks/issues needed to be addressed and would then get
> the generic copy_file_range() in, I could give a go at a patch (or 2).

yes, those are the ones I'm aware of. BUt there may be more, I
haven't really looked that closely...

Cheers,

Dave.
Olga Kornievskaia Oct. 31, 2018, 2:51 p.m. | #11
On Tue, Oct 30, 2018 at 8:15 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote:
> > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > >
> > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > >
> > > > > > Input source offset can't be beyond the end of the file.
> > > > > >
> > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > ---
> > > > > >  fs/read_write.c | 3 +++
> > > > > >  1 file changed, 3 insertions(+)
> > > > > >
> > > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > > index fb4ffca..b3b304e 100644
> > > > > > --- a/fs/read_write.c
> > > > > > +++ b/fs/read_write.c
> > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > > >               }
> > > > > >       }
> > > > > >
> > > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > > +             return -EINVAL;
> > > > > > +
> > > > >
> > > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > > checks in generic_write_checks() apply, right? And the same security
> > > > > issues like stripping setuid bits, etc? And we need to touch
> > > > > atime on the source file, too?
> > > >
> > > > Yes sound like needed checks.
> > > >
> > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > > clone/dedupe file range operations that make them safe and robust.
> > > > > It seems like copy_file_range is all the checks it needs, too?
> > > >
> > > > Are you proposing to not do this check now in favor of the proper work
> > > > that will do all of those checks you listed above?
> > >
> > > No, I'm saying that if you're adding one check, there's a whole heap
> > > of checks that still need to be added, *especially* if this is going
> > > to fall back to page cache copy between superblocks that may have
> > > different limits and constraints.
> > >
> > > There's security issues in this API. They need to be fixed before we
> > > allow it to do more and potentially expose more problems due to it's
> > > wider capability.
> >
> > Before I totally give up on this feature, can you help me understand
> > your concerns with allowing the generic copy_file_range via
> > do_splice().
>
> it's not do_splice_direct() i'm concerned about. It's /writing data
> without adequate checks/ that I'm concerned about.
> ->copy_file_range() also writes data, so it needs to undergo the
> same safety checks as well.

Thank you Dave for clarifying and elaborating on the points. As you
pointed out this concerns apply to the current code the same way as to
the patch series. Those concerns should be address however I feel like
they shouldn't be the responsibility of this particular patch series.
Therefore, I ask for the community to either make any final comments
for any changes that are needed to "version 7" patches and if no more
comments arise I would like to ask for this to be added to the queue
for the next kernel version.

Then the next patch series would be just VFS and would add appropriate
checks and then allow for the generic copy_file_range() via do_splice.

> > I have mentioned I'm not a VFS expert thus I come from just looking at
> > the available documentation and the code.
> >
> > I don't see any restrictions on the files being passed in the
> > do_splice_direct(). There are no restrictions that they must be from
> > the same filesystem or file system type. But perhaps this not the
> > concern you had but more about checking validity of arguments?
> >
> > I have looked at Dave Wong's, if I'm not mistaken these 2 are the
> > relevant patches:
> > [PATCH 02/28] vfs: check file ranges before cloning files
> >  -- a couple but not all checks apply to copy_file_range() .
>
> Yes, of course - clone/dedupe have different constraints, but the
> core checks are still needed for copy_file_range().
>
> For example, the man page says:
>
> EINVAL
>         Requested range extends beyond the end of the source
>         file; or the flags argument is not 0.
>
> Your patch above doesn't actually check that - it only checks if the
> pos_in is beyond EOF. It needs to check if pos_in + len is beyond
> EOF. After checking for wraps, of course.

There was a reason why I didn't include the "pos_in + len" check. It
sparked the conversation why should "pos_in + len" be an error, when a
"read" system call would just return a "short" read and EOF. So I
dropped the check for "pst_in + len" to be an error.

https://www.spinics.net/lists/linux-nfs/msg62627.html  (probably not
all the conversations but some of them)

> > [PATCH 04/28] vfs: strengthen checking of file range inputs to
> > generic_remap_checks
> >  -- these checks apply to the code once we fall back to the
> > do_splice().
>
> man page says:
>
> EFBIG
>        An  attempt  was made to write a file that exceeds the
>        implementation-defined maximum file size or the process's
>        file size limit, or to write at a position past the maximum
>        allowed offset.
>
> These conditions apply to the destination file regards of the method
> used to copy the data. That's what the generic methods now check for
> clone/dedupe, and need to be used here, too.

Agreed and once Darrek patches are in, copy_file_range() can use them too.

> e.g. In the case of NFS, if the user ion the NFS client is not
> allowed to copy a file (e.g. say user RLIMIT restrictions) then that
> should not be bypassed just because the NFS client can do a server
> side copy. The restriction has been placed on the local user calling
> copy_file_range(), not on the server side implementation.
>
> > Also, can you elaborate one what "security issues" are present in this
> > API? Is it "stripping setuid bits" and so something like calling
> > file_remove_priv() that should be done when the fallback to the
> > do_splice_direct() happens?
>
> From 4.19-rc8:
>
> 7debbf015f58 xfs: update ctime and remove suid before cloning files
>
> Which then got moved into the generic remap_file_range code in
> Darrick's "vfs: remap helper should update destination inode
> metadata" patch:
>
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d
>
> We can't assume that a server side copy is going to strip setuid
> bits or even update target files c/mtimes.

I would like to discuss your concerns about updating attributes
(c/m/atimes), why shouldn't it be a ->copy_file_range()
responsibility. copy_file_rage is basically a read+write. As far as I
can tell, vfs_read and vfs_write (in VFS) don't deal with updating
attributes. I'm guessing it's assumed that underlying file systems are
going to take care of it (unless of course I misread the code).

> > As for the atime, wouldn't the ->copy_file_range() be updating the
> > file attributes? I guess for the fallback case, the attributes need to
> > be updated.
>
> Generic VFS code should take care of atime updates. Hence if the
> source file needs an atime update, it should be in the generic code
> and done for all copy methods.
>
> > If those are checks/issues needed to be addressed and would then get
> > the generic copy_file_range() in, I could give a go at a patch (or 2).
>
> yes, those are the ones I'm aware of. BUt there may be more, I
> haven't really looked that closely...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Dave Chinner Oct. 31, 2018, 11:33 p.m. | #12
On Wed, Oct 31, 2018 at 10:51:48AM -0400, Olga Kornievskaia wrote:
> On Tue, Oct 30, 2018 at 8:15 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote:
> > > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > > > >
> > > > > > > Input source offset can't be beyond the end of the file.
> > > > > > >
> > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > > > ---
> > > > > > >  fs/read_write.c | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > >
> > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > > > index fb4ffca..b3b304e 100644
> > > > > > > --- a/fs/read_write.c
> > > > > > > +++ b/fs/read_write.c
> > > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > > > >               }
> > > > > > >       }
> > > > > > >
> > > > > > > +     if (pos_in >= i_size_read(inode_in))
> > > > > > > +             return -EINVAL;
> > > > > > > +
> > > > > >
> > > > > > vfs_copy_file_range seems ot be missing a wide range of checks.
> > > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > > > checks in generic_write_checks() apply, right? And the same security
> > > > > > issues like stripping setuid bits, etc? And we need to touch
> > > > > > atime on the source file, too?
> > > > >
> > > > > Yes sound like needed checks.
> > > > >
> > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > > > clone/dedupe file range operations that make them safe and robust.
> > > > > > It seems like copy_file_range is all the checks it needs, too?
> > > > >
> > > > > Are you proposing to not do this check now in favor of the proper work
> > > > > that will do all of those checks you listed above?
> > > >
> > > > No, I'm saying that if you're adding one check, there's a whole heap
> > > > of checks that still need to be added, *especially* if this is going
> > > > to fall back to page cache copy between superblocks that may have
> > > > different limits and constraints.
> > > >
> > > > There's security issues in this API. They need to be fixed before we
> > > > allow it to do more and potentially expose more problems due to it's
> > > > wider capability.
> > >
> > > Before I totally give up on this feature, can you help me understand
> > > your concerns with allowing the generic copy_file_range via
> > > do_splice().
> >
> > it's not do_splice_direct() i'm concerned about. It's /writing data
> > without adequate checks/ that I'm concerned about.
> > ->copy_file_range() also writes data, so it needs to undergo the
> > same safety checks as well.
> 
> Thank you Dave for clarifying and elaborating on the points. As you
> pointed out this concerns apply to the current code the same way as to
> the patch series. Those concerns should be address however I feel like
> they shouldn't be the responsibility of this particular patch series.
> Therefore, I ask for the community to either make any final comments
> for any changes that are needed to "version 7" patches and if no more
> comments arise I would like to ask for this to be added to the queue
> for the next kernel version.
> 
> Then the next patch series would be just VFS and would add appropriate
> checks and then allow for the generic copy_file_range() via do_splice.

That's fine by me.

> 
> > > I have mentioned I'm not a VFS expert thus I come from just looking at
> > > the available documentation and the code.
> > >
> > > I don't see any restrictions on the files being passed in the
> > > do_splice_direct(). There are no restrictions that they must be from
> > > the same filesystem or file system type. But perhaps this not the
> > > concern you had but more about checking validity of arguments?
> > >
> > > I have looked at Dave Wong's, if I'm not mistaken these 2 are the
> > > relevant patches:
> > > [PATCH 02/28] vfs: check file ranges before cloning files
> > >  -- a couple but not all checks apply to copy_file_range() .
> >
> > Yes, of course - clone/dedupe have different constraints, but the
> > core checks are still needed for copy_file_range().
> >
> > For example, the man page says:
> >
> > EINVAL
> >         Requested range extends beyond the end of the source
> >         file; or the flags argument is not 0.
> >
> > Your patch above doesn't actually check that - it only checks if the
> > pos_in is beyond EOF. It needs to check if pos_in + len is beyond
> > EOF. After checking for wraps, of course.
> 
> There was a reason why I didn't include the "pos_in + len" check. It
> sparked the conversation why should "pos_in + len" be an error, when a
> "read" system call would just return a "short" read and EOF. So I
> dropped the check for "pst_in + len" to be an error.

So man page patches will be required, too. :)

Basically, we need to nail down the expected semantics, make sure
they are correctly documented and /enforced consistently/ across all
filesystems.


> > >  -- these checks apply to the code once we fall back to the
> > > do_splice().
> >
> > man page says:
> >
> > EFBIG
> >        An  attempt  was made to write a file that exceeds the
> >        implementation-defined maximum file size or the process's
> >        file size limit, or to write at a position past the maximum
> >        allowed offset.
> >
> > These conditions apply to the destination file regards of the method
> > used to copy the data. That's what the generic methods now check for
> > clone/dedupe, and need to be used here, too.
> 
> Agreed and once Darrek patches are in, copy_file_range() can use them too.

Should be in the next couple of days.

> > 7debbf015f58 xfs: update ctime and remove suid before cloning files
> >
> > Which then got moved into the generic remap_file_range code in
> > Darrick's "vfs: remap helper should update destination inode
> > metadata" patch:
> >
> > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d
> >
> > We can't assume that a server side copy is going to strip setuid
> > bits or even update target files c/mtimes.
> 
> I would like to discuss your concerns about updating attributes
> (c/m/atimes), why shouldn't it be a ->copy_file_range()
> responsibility. copy_file_rage is basically a read+write. As far as I
> can tell, vfs_read and vfs_write (in VFS) don't deal with updating
> attributes.

You're looking at the wrong level. The VFS layer is the first
multiplexing layer, allowing filesystems to select a method of
handling functionality. They then make use of "generic helpers"
to implement the required functionality, and they contain the
required updates.

ie.g. A list of generic helpers with atime update callers from my
cscope index:

f fs/pipe.c          pipe_read                   343 file_accessed(filp);
h fs/readdir.c       iterate_dir                  56 file_accessed(file);
i fs/splice.c        generic_file_splice_read    311 file_accessed(in);
j fs/splice.c        splice_direct_to_actor      992 file_accessed(in);
p mm/filemap.c       generic_file_buffered_read 2299 file_accessed(filp);
q mm/filemap.c       generic_file_read_iter     2339 file_accessed(file);
r mm/filemap.c       generic_file_mmap          2736 file_accessed(file);

These are effectively reference implementations of the file reading
infrastructure. Filesystems often have customised implementations
but they all must contain the same functioanlity and behaviour as
the reference implementation.

> I'm guessing it's assumed that underlying file systems are
> going to take care of it (unless of course I misread the code).

Only the ones that don't specifically call the generic helper to do
the work.

IOWs, what I'd like to see is a generic_copy_file_range() as the
reference implemenation using a page cache copy. This contains all
the required checks, timestamp updates, etc. If the filesystem does
not supply ->copy_file_range, then generic_copy_file_range() is
called, not do_splice_direct(). Indeed, a filesystem should be able
to do:

	.copy_file_range = xfs_copy_file_range,

xfs_copy_file_range(...)
{
	trace_xfs_copy_file_range(...)
	return generic_copy_file_range(....);
}

and have everything work correctly.

Cheers,

Dave.

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index fb4ffca..b3b304e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1594,6 +1594,9 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		}
 	}
 
+	if (pos_in >= i_size_read(inode_in))
+		return -EINVAL;
+
 	if (file_out->f_op->copy_file_range) {
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);