diff mbox series

[v5,06/10] libfs: Validate negative dentries in case-insensitive directories

Message ID 20230812004146.30980-7-krisman@suse.de
State Superseded
Series Support negative dentries on case-insensitive ext4 and f2fs

Commit Message

Gabriel Krisman Bertazi Aug. 12, 2023, 12:41 a.m. UTC
From: Gabriel Krisman Bertazi <krisman@collabora.com>

Introduce a dentry revalidation helper to be used by case-insensitive
filesystems to check if it is safe to reuse a negative dentry.

A negative dentry is safe to reuse on a case-insensitive lookup if it
was created during a case-insensitive lookup and the current lookup will
not instantiate a dentry. If this is a creation lookup, we also need to
make sure the dentry name is a case-sensitive (byte-exact) match for the
name under lookup, in order to keep the name-preserving semantics.
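
To illustrate the name-preserving requirement, here is a minimal
userspace sketch (not kernel code). `struct name` is a hypothetical
stand-in for `struct qstr`, and the ASCII-only `strncasecmp` is only an
assumption standing in for the real Unicode casefolding comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for the kernel's struct qstr. */
struct name {
	const char *str;
	size_t len;
};

/* ASCII-only approximation of a case-insensitive name match. */
static bool ci_match(const struct name *a, const struct name *b)
{
	assert(a && b);
	return a->len == b->len && strncasecmp(a->str, b->str, a->len) == 0;
}

/* Byte-exact match: the extra check needed on the creation path. */
static bool exact_match(const struct name *a, const struct name *b)
{
	assert(a && b);
	return a->len == b->len && memcmp(a->str, b->str, a->len) == 0;
}
```

With a negative dentry cached as "FOO" and a creation lookup for "foo",
ci_match() succeeds but exact_match() fails. Reusing the dentry there
would create the file as "FOO" instead of the name the user asked for.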

dentry->d_name is only checked by the case-insensitive d_revalidate hook
in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
d_revalidate is always called with the parent inode at least
read-locked, and therefore the name cannot change from under us.

d_revalidate is only called in 4 places: lookup_dcache, __lookup_slow,
lookup_open and lookup_fast:

  - lookup_dcache always calls it with zeroed flags, with the exception
    of when coming from __lookup_hash, which needs the parent locked
    already, for instance in the open/creation path, which is locked in
    open_last_lookups.

  - In __lookup_slow, either the parent inode is read-locked by the
    caller (lookup_slow), or it is called with no flags (lookup_one*).
    The read lock suffices to prevent ->d_name modifications, with the
    exception of one case: __d_unalias will call __d_move to fix a
    directory accessible from multiple dentries, which effectively swaps
    ->d_name while holding only the shared read lock.  This happens
    through this flow:

    lookup_slow()  //LOOKUP_CREATE
      d_lookup()
        ->d_lookup()
          d_splice_alias()
            __d_unalias()
              __d_move()

    Nevertheless, this case is not a problem because negative dentries
    are not allowed to be moved with __d_move.  In addition,
    d_instantiate shouldn't race with this case because its callers
    also acquire the parent inode lock, preventing it from racing with
    lookup creation, so the dentry cannot become positive and be moved
    while inside d_revalidate, which would be a problem if a parallel
    lookup did d_splice_alias.

  - lookup_open also requires the parent to be locked in the creation
    case, which is done in open_last_lookups.

  - lookup_fast will indeed be called with the parent unlocked, but it
    shouldn't be called with LOOKUP_CREATE.  Either it is called in
    link_path_walk, where nd->flags doesn't have LOOKUP_CREATE yet, or in
    open_last_lookups.  In the latter case it also never has
    LOOKUP_CREATE, because it is only called in the !O_CREAT case, which
    means op->intent doesn't have LOOKUP_CREATE (set in build_open_flags
    only if O_CREAT is set).

Finally, for LOOKUP_RENAME_TARGET, we are doing a rename, so the
parent inodes are also locked.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v4:
  - Drop useless inline declaration (Eric)
  - Refactor to drop extra indentation (Christian)
  - Discuss d_instantiate
Changes since v3:
  - Add comment regarding creation (Eric)
  - Reorder checks to clarify !flags meaning (Eric)
  - Add commit message explanation of the inode read lock wrt.
    __d_move. (Eric)
Changes since v2:
  - Add comments to all rejection cases (Eric)
  - Safeguard against filesystems creating dentries without LOOKUP flags
---
 fs/libfs.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

Comments

Eric Biggers Aug. 12, 2023, 2:41 a.m. UTC | #1
On Fri, Aug 11, 2023 at 08:41:42PM -0400, Gabriel Krisman Bertazi wrote:
> +	/*
> +	 * Filesystems will call into d_revalidate without setting
> +	 * LOOKUP_ flags even for file creation (see lookup_one*
> +	 * variants).  Reject negative dentries in this case, since we
> +	 * can't know for sure it won't be used for creation.
> +	 */
> +	if (!flags)
> +		return 0;
> +
> +	/*
> +	 * If the lookup is for creation, then a negative dentry can
> +	 * only be reused if it's a case-sensitive match, not just a
> +	 * case-insensitive one.  This is needed to make the new file be
> +	 * created with the name the user specified, preserving case.
> +	 */
> +	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> +		/*
> +		 * ->d_name won't change from under us in the creation
> +		 * path only, since d_revalidate during creation and
> +		 * renames is always called with the parent inode
> +		 * locked.  It isn't the case for all lookup callpaths,
> +		 * so ->d_name must not be touched outside
> +		 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
> +		 */
> +		if (dentry->d_name.len != name->len ||
> +		    memcmp(dentry->d_name.name, name->name, name->len))
> +			return 0;
> +	}

This is still really confusing to me.  Can you consider the below?  The code is
the same except for the reordering, but the explanation is reworked to be much
clearer (IMO).  Anything I am misunderstanding?

	/*
	 * If the lookup is for creation, then a negative dentry can only be
	 * reused if it's a case-sensitive match, not just a case-insensitive
	 * one.  This is needed to make the new file be created with the name
	 * the user specified, preserving case.
	 *
	 * LOOKUP_CREATE or LOOKUP_RENAME_TARGET cover most creations.  In these
	 * cases, ->d_name is stable and can be compared to 'name' without
	 * taking ->d_lock because the caller holds dir->i_rwsem for write.
	 * (This is because the directory lock blocks the dentry from being
	 * concurrently instantiated, and negative dentries are never moved.)
	 *
	 * All other creations actually use flags==0.  These come from the edge
	 * case of filesystems calling functions like lookup_one() that do a
	 * lookup without setting the lookup flags at all.  Such lookups might
	 * or might not be for creation, and if not don't guarantee stable
	 * ->d_name.  Therefore, invalidate all negative dentries when flags==0.
	 */
	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
		if (dentry->d_name.len != name->len ||
		    memcmp(dentry->d_name.name, name->name, name->len))
			return 0;
	}
	if (!flags)
		return 0;
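
The claim that the two versions differ only in ordering can be checked
exhaustively in userspace. This is a minimal sketch under stated
assumptions: the SK_* flag values are placeholders rather than the
kernel's definitions, and a boolean stands in for the len/memcmp
comparison of ->d_name against 'name'.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this sketch; the kernel defines its own. */
#define SK_LOOKUP_CREATE        0x1u
#define SK_LOOKUP_RENAME_TARGET 0x2u
#define SK_LOOKUP_OTHER         0x4u	/* any non-creation lookup flag */

/* Order used in the v5 patch: reject flags == 0 first. */
static int tail_v5(unsigned int flags, bool exact)
{
	if (!flags)
		return 0;
	if (flags & (SK_LOOKUP_CREATE | SK_LOOKUP_RENAME_TARGET)) {
		if (!exact)
			return 0;
	}
	return 1;
}

/* Reordered variant: compare names first, then reject flags == 0. */
static int tail_reordered(unsigned int flags, bool exact)
{
	if (flags & (SK_LOOKUP_CREATE | SK_LOOKUP_RENAME_TARGET)) {
		if (!exact)
			return 0;
	}
	if (!flags)
		return 0;
	return 1;
}

/* True if both orderings agree for every flag/name-match combination. */
static bool orderings_agree(void)
{
	for (unsigned int flags = 0; flags < 8; flags++)
		for (int exact = 0; exact <= 1; exact++)
			if (tail_v5(flags, exact) != tail_reordered(flags, exact))
				return false;
	return true;
}
```

The two checks are mutually exclusive (the name comparison only runs
when a creation flag is set, and the rejection only when no flag is
set), which is why swapping them cannot change the result.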
Gabriel Krisman Bertazi Aug. 14, 2023, 2:50 p.m. UTC | #2
Eric Biggers <ebiggers@kernel.org> writes:

> On Fri, Aug 11, 2023 at 08:41:42PM -0400, Gabriel Krisman Bertazi wrote:
>> +	/*
>> +	 * Filesystems will call into d_revalidate without setting
>> +	 * LOOKUP_ flags even for file creation (see lookup_one*
>> +	 * variants).  Reject negative dentries in this case, since we
>> +	 * can't know for sure it won't be used for creation.
>> +	 */
>> +	if (!flags)
>> +		return 0;
>> +
>> +	/*
>> +	 * If the lookup is for creation, then a negative dentry can
>> +	 * only be reused if it's a case-sensitive match, not just a
>> +	 * case-insensitive one.  This is needed to make the new file be
>> +	 * created with the name the user specified, preserving case.
>> +	 */
>> +	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
>> +		/*
>> +		 * ->d_name won't change from under us in the creation
>> +		 * path only, since d_revalidate during creation and
>> +		 * renames is always called with the parent inode
>> +		 * locked.  It isn't the case for all lookup callpaths,
>> +		 * so ->d_name must not be touched outside
>> +		 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
>> +		 */
>> +		if (dentry->d_name.len != name->len ||
>> +		    memcmp(dentry->d_name.name, name->name, name->len))
>> +			return 0;
>> +	}
>
> This is still really confusing to me.  Can you consider the below?  The code is
> the same except for the reordering, but the explanation is reworked to be much
> clearer (IMO).  Anything I am misunderstanding?
>
> 	/*
> 	 * If the lookup is for creation, then a negative dentry can only be
> 	 * reused if it's a case-sensitive match, not just a case-insensitive
> 	 * one.  This is needed to make the new file be created with the name
> 	 * the user specified, preserving case.
> 	 *
> 	 * LOOKUP_CREATE or LOOKUP_RENAME_TARGET cover most creations.  In these
> 	 * cases, ->d_name is stable and can be compared to 'name' without
> 	 * taking ->d_lock because the caller holds dir->i_rwsem for write.
> 	 * (This is because the directory lock blocks the dentry from being
> 	 * concurrently instantiated, and negative dentries are never moved.)
> 	 *
> 	 * All other creations actually use flags==0.  These come from the edge
> 	 * case of filesystems calling functions like lookup_one() that do a
> 	 * lookup without setting the lookup flags at all.  Such lookups might
> 	 * or might not be for creation, and if not don't guarantee stable
> 	 * ->d_name.  Therefore, invalidate all negative dentries when flags==0.
> 	 */
> 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> 		if (dentry->d_name.len != name->len ||
> 		    memcmp(dentry->d_name.name, name->name, name->len))
> 			return 0;
> 	}
> 	if (!flags)
> 		return 0;

I don't see it as particularly better or less confusing than the
original, but I also don't mind taking it into the next iteration.
Eric Biggers Aug. 14, 2023, 6:42 p.m. UTC | #3
On Mon, Aug 14, 2023 at 10:50:13AM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> > On Fri, Aug 11, 2023 at 08:41:42PM -0400, Gabriel Krisman Bertazi wrote:
> >> +	/*
> >> +	 * Filesystems will call into d_revalidate without setting
> >> +	 * LOOKUP_ flags even for file creation (see lookup_one*
> >> +	 * variants).  Reject negative dentries in this case, since we
> >> +	 * can't know for sure it won't be used for creation.
> >> +	 */
> >> +	if (!flags)
> >> +		return 0;
> >> +
> >> +	/*
> >> +	 * If the lookup is for creation, then a negative dentry can
> >> +	 * only be reused if it's a case-sensitive match, not just a
> >> +	 * case-insensitive one.  This is needed to make the new file be
> >> +	 * created with the name the user specified, preserving case.
> >> +	 */
> >> +	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> >> +		/*
> >> +		 * ->d_name won't change from under us in the creation
> >> +		 * path only, since d_revalidate during creation and
> >> +		 * renames is always called with the parent inode
> >> +		 * locked.  It isn't the case for all lookup callpaths,
> >> +		 * so ->d_name must not be touched outside
> >> +		 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
> >> +		 */
> >> +		if (dentry->d_name.len != name->len ||
> >> +		    memcmp(dentry->d_name.name, name->name, name->len))
> >> +			return 0;
> >> +	}
> >
> > This is still really confusing to me.  Can you consider the below?  The code is
> > the same except for the reordering, but the explanation is reworked to be much
> > clearer (IMO).  Anything I am misunderstanding?
> >
> > 	/*
> > 	 * If the lookup is for creation, then a negative dentry can only be
> > 	 * reused if it's a case-sensitive match, not just a case-insensitive
> > 	 * one.  This is needed to make the new file be created with the name
> > 	 * the user specified, preserving case.
> > 	 *
> > 	 * LOOKUP_CREATE or LOOKUP_RENAME_TARGET cover most creations.  In these
> > 	 * cases, ->d_name is stable and can be compared to 'name' without
> > 	 * taking ->d_lock because the caller holds dir->i_rwsem for write.
> > 	 * (This is because the directory lock blocks the dentry from being
> > 	 * concurrently instantiated, and negative dentries are never moved.)
> > 	 *
> > 	 * All other creations actually use flags==0.  These come from the edge
> > 	 * case of filesystems calling functions like lookup_one() that do a
> > 	 * lookup without setting the lookup flags at all.  Such lookups might
> > 	 * or might not be for creation, and if not don't guarantee stable
> > 	 * ->d_name.  Therefore, invalidate all negative dentries when flags==0.
> > 	 */
> > 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> > 		if (dentry->d_name.len != name->len ||
> > 		    memcmp(dentry->d_name.name, name->name, name->len))
> > 			return 0;
> > 	}
> > 	if (!flags)
> > 		return 0;
> 
> I don't see it as particularly better or less confusing than the
> original, but I also don't mind taking it into the next iteration.
> 

Your commit message is still much longer and covers some quite different details
which seem irrelevant to me.  So if you don't see my explanation as being much
different, I think we're still not on the same page.  I hope that I'm not
misunderstanding anything, in which case I believe that what I wrote above is a good
explanation and your commit message should be substantially simplified.
Remember, longer != better.  Keep things as simple as possible.

- Eric
Gabriel Krisman Bertazi Aug. 14, 2023, 7:21 p.m. UTC | #4
Eric Biggers <ebiggers@kernel.org> writes:

> On Mon, Aug 14, 2023 at 10:50:13AM -0400, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <ebiggers@kernel.org> writes:
>> 
>> > On Fri, Aug 11, 2023 at 08:41:42PM -0400, Gabriel Krisman Bertazi wrote:
>> >> +	/*
>> >> +	 * Filesystems will call into d_revalidate without setting
>> >> +	 * LOOKUP_ flags even for file creation (see lookup_one*
>> >> +	 * variants).  Reject negative dentries in this case, since we
>> >> +	 * can't know for sure it won't be used for creation.
>> >> +	 */
>> >> +	if (!flags)
>> >> +		return 0;
>> >> +
>> >> +	/*
>> >> +	 * If the lookup is for creation, then a negative dentry can
>> >> +	 * only be reused if it's a case-sensitive match, not just a
>> >> +	 * case-insensitive one.  This is needed to make the new file be
>> >> +	 * created with the name the user specified, preserving case.
>> >> +	 */
>> >> +	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
>> >> +		/*
>> >> +		 * ->d_name won't change from under us in the creation
>> >> +		 * path only, since d_revalidate during creation and
>> >> +		 * renames is always called with the parent inode
>> >> +		 * locked.  It isn't the case for all lookup callpaths,
>> >> +		 * so ->d_name must not be touched outside
>> >> +		 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
>> >> +		 */
>> >> +		if (dentry->d_name.len != name->len ||
>> >> +		    memcmp(dentry->d_name.name, name->name, name->len))
>> >> +			return 0;
>> >> +	}
>> >
>> > This is still really confusing to me.  Can you consider the below?  The code is
>> > the same except for the reordering, but the explanation is reworked to be much
>> > clearer (IMO).  Anything I am misunderstanding?
>> >
>> > 	/*
>> > 	 * If the lookup is for creation, then a negative dentry can only be
>> > 	 * reused if it's a case-sensitive match, not just a case-insensitive
>> > 	 * one.  This is needed to make the new file be created with the name
>> > 	 * the user specified, preserving case.
>> > 	 *
>> > 	 * LOOKUP_CREATE or LOOKUP_RENAME_TARGET cover most creations.  In these
>> > 	 * cases, ->d_name is stable and can be compared to 'name' without
>> > 	 * taking ->d_lock because the caller holds dir->i_rwsem for write.
>> > 	 * (This is because the directory lock blocks the dentry from being
>> > 	 * concurrently instantiated, and negative dentries are never moved.)
>> > 	 *
>> > 	 * All other creations actually use flags==0.  These come from the edge
>> > 	 * case of filesystems calling functions like lookup_one() that do a
>> > 	 * lookup without setting the lookup flags at all.  Such lookups might
>> > 	 * or might not be for creation, and if not don't guarantee stable
>> > 	 * ->d_name.  Therefore, invalidate all negative dentries when flags==0.
>> > 	 */
>> > 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
>> > 		if (dentry->d_name.len != name->len ||
>> > 		    memcmp(dentry->d_name.name, name->name, name->len))
>> > 			return 0;
>> > 	}
>> > 	if (!flags)
>> > 		return 0;
>> 
>> I don't see it as particularly better or less confusing than the
>> original, but I also don't mind taking it into the next iteration.
>> 
>
> Your commit message is still much longer and covers some quite different details
> which seem irrelevant to me.  So if you don't see my explanation as being much
> different, I think we're still not on the same page.  I hope that I'm not
> misunderstanding anything, in which case I believe that what I wrote above is a good
> explanation and your commit message should be substantially simplified.
> Remember, longer != better.  Keep things as simple as possible.

I think we are on the same page regarding the code.  I was under the
impression your suggestion was regarding the *code comment* you proposed
to replace, and not the commit message.  I don't see your code comment
to be much different than the original.

The commit message has information accumulated on previous discussions,
including the conclusions from the locking discussion Viro requested.
I'll reword it too for the next iteration to see if I can make it more
concise.

Thx
Eric Biggers Aug. 14, 2023, 7:26 p.m. UTC | #5
On Mon, Aug 14, 2023 at 03:21:33PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> > On Mon, Aug 14, 2023 at 10:50:13AM -0400, Gabriel Krisman Bertazi wrote:
> >> Eric Biggers <ebiggers@kernel.org> writes:
> >> 
> >> > On Fri, Aug 11, 2023 at 08:41:42PM -0400, Gabriel Krisman Bertazi wrote:
> >> >> +	/*
> >> >> +	 * Filesystems will call into d_revalidate without setting
> >> >> +	 * LOOKUP_ flags even for file creation (see lookup_one*
> >> >> +	 * variants).  Reject negative dentries in this case, since we
> >> >> +	 * can't know for sure it won't be used for creation.
> >> >> +	 */
> >> >> +	if (!flags)
> >> >> +		return 0;
> >> >> +
> >> >> +	/*
> >> >> +	 * If the lookup is for creation, then a negative dentry can
> >> >> +	 * only be reused if it's a case-sensitive match, not just a
> >> >> +	 * case-insensitive one.  This is needed to make the new file be
> >> >> +	 * created with the name the user specified, preserving case.
> >> >> +	 */
> >> >> +	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> >> >> +		/*
> >> >> +		 * ->d_name won't change from under us in the creation
> >> >> +		 * path only, since d_revalidate during creation and
> >> >> +		 * renames is always called with the parent inode
> >> >> +		 * locked.  It isn't the case for all lookup callpaths,
> >> >> +		 * so ->d_name must not be touched outside
> >> >> +		 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
> >> >> +		 */
> >> >> +		if (dentry->d_name.len != name->len ||
> >> >> +		    memcmp(dentry->d_name.name, name->name, name->len))
> >> >> +			return 0;
> >> >> +	}
> >> >
> >> > This is still really confusing to me.  Can you consider the below?  The code is
> >> > the same except for the reordering, but the explanation is reworked to be much
> >> > clearer (IMO).  Anything I am misunderstanding?
> >> >
> >> > 	/*
> >> > 	 * If the lookup is for creation, then a negative dentry can only be
> >> > 	 * reused if it's a case-sensitive match, not just a case-insensitive
> >> > 	 * one.  This is needed to make the new file be created with the name
> >> > 	 * the user specified, preserving case.
> >> > 	 *
> >> > 	 * LOOKUP_CREATE or LOOKUP_RENAME_TARGET cover most creations.  In these
> >> > 	 * cases, ->d_name is stable and can be compared to 'name' without
> >> > 	 * taking ->d_lock because the caller holds dir->i_rwsem for write.
> >> > 	 * (This is because the directory lock blocks the dentry from being
> >> > 	 * concurrently instantiated, and negative dentries are never moved.)
> >> > 	 *
> >> > 	 * All other creations actually use flags==0.  These come from the edge
> >> > 	 * case of filesystems calling functions like lookup_one() that do a
> >> > 	 * lookup without setting the lookup flags at all.  Such lookups might
> >> > 	 * or might not be for creation, and if not don't guarantee stable
> >> > 	 * ->d_name.  Therefore, invalidate all negative dentries when flags==0.
> >> > 	 */
> >> > 	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> >> > 		if (dentry->d_name.len != name->len ||
> >> > 		    memcmp(dentry->d_name.name, name->name, name->len))
> >> > 			return 0;
> >> > 	}
> >> > 	if (!flags)
> >> > 		return 0;
> >> 
> >> I don't see it as particularly better or less confusing than the
> >> original, but I also don't mind taking it into the next iteration.
> >> 
> >
> > Your commit message is still much longer and covers some quite different details
> > which seem irrelevant to me.  So if you don't see my explanation as being much
> > different, I think we're still not on the same page.  I hope that I'm not
> > misunderstanding anything, in which case I believe that what I wrote above is a good
> > explanation and your commit message should be substantially simplified.
> > Remember, longer != better.  Keep things as simple as possible.
> 
> I think we are on the same page regarding the code.  I was under the
> impression your suggestion was regarding the *code comment* you proposed
> to replace, and not the commit message.  I don't see your code comment
> to be much different than the original.
> 
> The commit message has information accumulated on previous discussions,
> including the conclusions from the locking discussion Viro requested.
> I'll reword it too for the next iteration to see if I can make it more
> concise.
> 

Yes, I was talking about the code comment, but the commit message is explaining
the same thing so it needs to be consistent (or have the commit message just
reference the code).  As-is they seem to be in contradiction.

- Eric

Patch

diff --git a/fs/libfs.c b/fs/libfs.c
index 8d0b64cfd5da..cb98c4721327 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1452,9 +1452,66 @@  static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 	return 0;
 }
 
+static int generic_ci_d_revalidate(struct dentry *dentry,
+				   const struct qstr *name,
+				   unsigned int flags)
+{
+	const struct dentry *parent;
+	const struct inode *dir;
+
+	if (!d_is_negative(dentry))
+		return 1;
+
+	parent = READ_ONCE(dentry->d_parent);
+	dir = READ_ONCE(parent->d_inode);
+
+	if (!dir || !dir_is_casefolded(dir))
+		return 1;
+
+	/*
+	 * Negative dentries created prior to turning the directory
+	 * case-insensitive cannot be trusted, since they don't ensure
+	 * any possible case version of the filename doesn't exist.
+	 */
+	if (!d_is_casefolded_name(dentry))
+		return 0;
+
+	/*
+	 * Filesystems will call into d_revalidate without setting
+	 * LOOKUP_ flags even for file creation (see lookup_one*
+	 * variants).  Reject negative dentries in this case, since we
+	 * can't know for sure it won't be used for creation.
+	 */
+	if (!flags)
+		return 0;
+
+	/*
+	 * If the lookup is for creation, then a negative dentry can
+	 * only be reused if it's a case-sensitive match, not just a
+	 * case-insensitive one.  This is needed to make the new file be
+	 * created with the name the user specified, preserving case.
+	 */
+	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
+		/*
+		 * ->d_name won't change from under us in the creation
+		 * path only, since d_revalidate during creation and
+		 * renames is always called with the parent inode
+		 * locked.  It isn't the case for all lookup callpaths,
+		 * so ->d_name must not be touched outside
+		 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
+		 */
+		if (dentry->d_name.len != name->len ||
+		    memcmp(dentry->d_name.name, name->name, name->len))
+			return 0;
+	}
+
+	return 1;
+}
+
 static const struct dentry_operations generic_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
+	.d_revalidate = generic_ci_d_revalidate,
 };
 #endif
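
The full decision sequence of the helper above can be modelled in
userspace to see which lookups keep a negative dentry. This is a sketch
under stated assumptions, not the kernel implementation: `struct
sk_dentry` and its boolean fields are hypothetical stand-ins for
d_is_negative(), dir_is_casefolded() and d_is_casefolded_name(), the
SK_* flag values are placeholders, and the NULL-parent-inode case is
folded into `dir_casefolded`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder flag values for this sketch; the kernel defines its own. */
#define SK_LOOKUP_CREATE        0x1u
#define SK_LOOKUP_RENAME_TARGET 0x2u
#define SK_LOOKUP_OTHER         0x4u	/* any non-creation lookup flag */

/* Hypothetical userspace model of the state the helper inspects. */
struct sk_dentry {
	bool negative;		/* models d_is_negative(dentry) */
	bool dir_casefolded;	/* models dir && dir_is_casefolded(dir) */
	bool ci_name;		/* models d_is_casefolded_name(dentry) */
	const char *name;	/* models dentry->d_name.name */
	size_t len;		/* models dentry->d_name.len */
};

/* Returns 1 to keep the dentry, 0 to invalidate it. */
static int sk_ci_revalidate(const struct sk_dentry *d, const char *name,
			    size_t len, unsigned int flags)
{
	assert(d && name);

	/* Positive dentries and case-sensitive directories are untouched. */
	if (!d->negative || !d->dir_casefolded)
		return 1;

	/* Created before the directory became case-insensitive. */
	if (!d->ci_name)
		return 0;

	/* No lookup flags: the lookup might be for creation. */
	if (!flags)
		return 0;

	/* Creation and rename targets need a byte-exact name match. */
	if (flags & (SK_LOOKUP_CREATE | SK_LOOKUP_RENAME_TARGET)) {
		if (d->len != len || memcmp(d->name, name, len))
			return 0;
	}

	return 1;
}
```

For a negative dentry cached as "FOO" in a casefolded directory, a plain
lookup of "foo" keeps the dentry, a creation lookup of "foo" invalidates
it, and a creation lookup of "FOO" keeps it.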