diff mbox series

[v4,3/7] libfs: Validate negative dentries in case-insensitive directories

Message ID 20230727172843.20542-4-krisman@suse.de
State Not Applicable
Headers show
Series Support negative dentries on case-insensitive ext4 and f2fs | expand

Commit Message

Gabriel Krisman Bertazi July 27, 2023, 5:28 p.m. UTC
From: Gabriel Krisman Bertazi <krisman@collabora.com>

Introduce a dentry revalidation helper to be used by case-insensitive
filesystems to check if it is safe to reuse a negative dentry.

A negative dentry is safe to be reused on a case-insensitive lookup if
it was created during a case-insensitive lookup and this is not a lookup
that will instantiate a dentry. If this is a creation lookup, we also
need to make sure the name matches sensitively the name under lookup in
order to assure the name preserving semantics.

dentry->d_name is only checked by the case-insensitive d_revalidate hook
in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
d_revalidate is always called with the parent inode read-locked, and
therefore the name cannot change from under us.

d_revalidate is only called in 4 places: lookup_dcache, __lookup_slow,
lookup_open and lookup_fast:

  - lookup_dcache always calls it with zeroed flags, with the exception
    of when coming from __lookup_hash, which needs the parent locked
    already, for instance in the open/creation path, which is locked in
    open_last_lookups.

  - In __lookup_slow, either the parent inode is read locked by the
    caller (lookup_slow), or it is called with no flags (lookup_one*).
    The read lock suffices to prevent ->d_name modifications, with the
    exception of one case: __d_unalias, will call __d_move to fix a
    directory accessible from multiple dentries, which effectively swaps
    ->d_name while holding only the shared read lock.  This happens
    through this flow:

    lookup_slow()  //LOOKUP_CREATE
      d_lookup()
        ->d_lookup()
          d_splice_alias()
            __d_unalias()
              __d_move()

    Nevertheless, this case is not a problem because negative dentries
    are not allowed to be moved with __d_move.

  - lookup_open also requires the parent to be locked in the creation
    case, which is done in open_last_lookups.

  - lookup_fast will indeed be called with the parent unlocked, but it
    shouldn't be called with LOOKUP_CREATE.  Either it is called in the
    link_path_walk, where nd->flags doesn't have LOOKUP_CREATE yet or in
    open_last_lookups. But, in this case, it also never has LOOKUP_CREATE,
    because it is only called on the !O_CREAT case, which means op->intent
    doesn't have LOOKUP_CREAT (set in build_open_flags only if O_CREAT is
    set).

Finally, for the LOOKUP_RENAME_TARGET, we are doing a rename, so the
parents inodes are also locked.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v3:
  - Add comment regarding creation (Eric)
  - Reorder checks to clarify !flags meaning (Eric)
  - Add commit message explanaton of the inode read lock wrt.
    __d_move. (Eric)
Changes since v2:
  - Add comments to all rejection cases (Eric)
  - safeguard against filesystem creating dentries without LOOKUP flags
---
 fs/libfs.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

Comments

Christian Brauner July 28, 2023, 1:06 p.m. UTC | #1
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
> From: Gabriel Krisman Bertazi <krisman@collabora.com>
> 
> Introduce a dentry revalidation helper to be used by case-insensitive
> filesystems to check if it is safe to reuse a negative dentry.
> 
> A negative dentry is safe to be reused on a case-insensitive lookup if
> it was created during a case-insensitive lookup and this is not a lookup
> that will instantiate a dentry. If this is a creation lookup, we also
> need to make sure the name matches sensitively the name under lookup in
> order to assure the name preserving semantics.
> 
> dentry->d_name is only checked by the case-insensitive d_revalidate hook
> in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
> d_revalidate is always called with the parent inode read-locked, and
> therefore the name cannot change from under us.
> 
> d_revalidate is only called in 4 places: lookup_dcache, __lookup_slow,
> lookup_open and lookup_fast:
> 
>   - lookup_dcache always calls it with zeroed flags, with the exception
>     of when coming from __lookup_hash, which needs the parent locked
>     already, for instance in the open/creation path, which is locked in
>     open_last_lookups.
> 
>   - In __lookup_slow, either the parent inode is read locked by the
>     caller (lookup_slow), or it is called with no flags (lookup_one*).
>     The read lock suffices to prevent ->d_name modifications, with the
>     exception of one case: __d_unalias, will call __d_move to fix a
>     directory accessible from multiple dentries, which effectively swaps
>     ->d_name while holding only the shared read lock.  This happens
>     through this flow:
> 
>     lookup_slow()  //LOOKUP_CREATE
>       d_lookup()
>         ->d_lookup()
>           d_splice_alias()
>             __d_unalias()
>               __d_move()
> 
>     Nevertheless, this case is not a problem because negative dentries
>     are not allowed to be moved with __d_move.
> 
>   - lookup_open also requires the parent to be locked in the creation
>     case, which is done in open_last_lookups.
> 
>   - lookup_fast will indeed be called with the parent unlocked, but it
>     shouldn't be called with LOOKUP_CREATE.  Either it is called in the
>     link_path_walk, where nd->flags doesn't have LOOKUP_CREATE yet or in
>     open_last_lookups. But, in this case, it also never has LOOKUP_CREATE,
>     because it is only called on the !O_CREAT case, which means op->intent
>     doesn't have LOOKUP_CREAT (set in build_open_flags only if O_CREAT is
>     set).
> 
> Finally, for the LOOKUP_RENAME_TARGET, we are doing a rename, so the
> parents inodes are also locked.
> 
> Reviewed-by: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> 
> ---
> Changes since v3:
>   - Add comment regarding creation (Eric)
>   - Reorder checks to clarify !flags meaning (Eric)
>   - Add commit message explanaton of the inode read lock wrt.
>     __d_move. (Eric)
> Changes since v2:
>   - Add comments to all rejection cases (Eric)
>   - safeguard against filesystem creating dentries without LOOKUP flags
> ---
>  fs/libfs.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 5b851315eeed..ed04c4dcc312 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1462,9 +1462,64 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
>  	return 0;
>  }
>  
> +static inline int generic_ci_d_revalidate(struct dentry *dentry,
> +					  const struct qstr *name,
> +					  unsigned int flags)
> +{
> +	if (d_is_negative(dentry)) {
> +		const struct dentry *parent = READ_ONCE(dentry->d_parent);
> +		const struct inode *dir = READ_ONCE(parent->d_inode);
> +
> +		if (dir && needs_casefold(dir)) {
> +			/*
> +			 * Negative dentries created prior to turning the
> +			 * directory case-insensitive cannot be trusted, since
> +			 * they don't ensure any possible case version of the
> +			 * filename doesn't exist.
> +			 */
> +			if (!d_is_casefold_lookup(dentry))
> +				return 0;
> +
> +			/*
> +			 * Filesystems will call into d_revalidate without
> +			 * setting LOOKUP_ flags even for file creation (see
> +			 * lookup_one* variants).  Reject negative dentries in
> +			 * this case, since we can't know for sure it won't be
> +			 * used for creation.
> +			 */
> +			if (!flags)
> +				return 0;
> +
> +			/*
> +			 * If the lookup is for creation, then a negative dentry
> +			 * can only be reused if it's a case-sensitive match,
> +			 * not just a case-insensitive one.  This is needed to
> +			 * make the new file be created with the name the user
> +			 * specified, preserving case.
> +			 */
> +			if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
> +				/*
> +				 * ->d_name won't change from under us in the
> +				 * creation path only, since d_revalidate during
> +				 * creation and renames is always called with
> +				 * the parent inode locked.  It isn't the case
> +				 * for all lookup callpaths, so ->d_name must
> +				 * not be touched outside
> +				 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
> +				 */
> +				if (dentry->d_name.len != name->len ||
> +				    memcmp(dentry->d_name.name, name->name, name->len))
> +					return 0;
> +			}
> +		}
> +	}
> +	return 1;
> +}
> +
>  static const struct dentry_operations generic_ci_dentry_ops = {
>  	.d_hash = generic_ci_d_hash,
>  	.d_compare = generic_ci_d_compare,
> +	.d_revalidate_name = generic_ci_d_revalidate,
>  };
>  #endif

Wouldn't it make sense to get rid of all this indentation?

	const struct dentry *parent;
	const struct inode *dir;

	if (!d_is_negative(dentry))
		return 1;

	parent = READ_ONCE(dentry->d_parent);
	dir = READ_ONCE(parent->d_inode);

	if (!dir)
		return 1;

	if (!needs_casefold(dir))
		return 1;

	/*
	 * Negative dentries created prior to turning the
	 * directory case-insensitive cannot be trusted, since
	 * they don't ensure any possible case version of the
	 * filename doesn't exist.
	 */
	if (!d_is_casefold_lookup(dentry))
		return 0;

	/*
	 * Filesystems will call into d_revalidate without
	 * setting LOOKUP_ flags even for file creation (see
	 * lookup_one* variants).  Reject negative dentries in
	 * this case, since we can't know for sure it won't be
	 * used for creation.
	 */
	if (!flags)
		return 0;

	/*
	 * If the lookup is for creation, then a negative dentry
	 * can only be reused if it's a case-sensitive match,
	 * not just a case-insensitive one.  This is needed to
	 * make the new file be created with the name the user
	 * specified, preserving case.
	 */
	if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
		/*
		 * ->d_name won't change from under us in the
		 * creation path only, since d_revalidate during
		 * creation and renames is always called with
		 * the parent inode locked.  It isn't the case
		 * for all lookup callpaths, so ->d_name must
		 * not be touched outside
		 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
		 */
		if (dentry->d_name.len != name->len ||
		    memcmp(dentry->d_name.name, name->name, name->len))
			return 0;
	}
	return 1;
Gabriel Krisman Bertazi July 28, 2023, 3:09 p.m. UTC | #2
Christian Brauner <brauner@kernel.org> writes:

> On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:

>
> Wouldn't it make sense to get rid of all this indentation?

I'm ok with making this change. I'll wait for more reviews and Eric
before sending a new version with this done.

Thanks!
Eric Biggers July 29, 2023, 4:18 a.m. UTC | #3
On Fri, Jul 28, 2023 at 11:09:46AM -0400, Gabriel Krisman Bertazi wrote:
> Christian Brauner <brauner@kernel.org> writes:
> 
> > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
> 
> >
> > Wouldn't it make sense to get rid of all this indentation?
> 
> I'm ok with making this change. I'll wait for more reviews and Eric
> before sending a new version with this done.
> 
> Thanks!
> 

Well, the issue is that with patch 4, all the 'return 1;' would need to change
to 'return fscrypt_d_revalidate(dentry, flags);'.

A helper function could be used, though, if you prefer:

static int generic_ci_d_revalidate(struct dentry *dentry,
				   const struct qstr *name, unsigned int flags)
{
	if (!ci_d_revalidate(dentry, name, flags))
		return 0;
	return fscrypt_d_revalidate(dentry, flags);
}

- Eric
Eric Biggers July 29, 2023, 4:20 a.m. UTC | #4
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>   - In __lookup_slow, either the parent inode is read locked by the
>     caller (lookup_slow), or it is called with no flags (lookup_one*).
>     The read lock suffices to prevent ->d_name modifications, with the
>     exception of one case: __d_unalias, will call __d_move to fix a
>     directory accessible from multiple dentries, which effectively swaps
>     ->d_name while holding only the shared read lock.  This happens
>     through this flow:
> 
>     lookup_slow()  //LOOKUP_CREATE
>       d_lookup()
>         ->d_lookup()
>           d_splice_alias()
>             __d_unalias()
>               __d_move()
> 
>     Nevertheless, this case is not a problem because negative dentries
>     are not allowed to be moved with __d_move.

Isn't it possible for a negative dentry to become a positive one concurrently?

- Eric
Eric Biggers July 29, 2023, 4:51 a.m. UTC | #5
On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
> dentry->d_name is only checked by the case-insensitive d_revalidate hook
> in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
> d_revalidate is always called with the parent inode read-locked, and
> therefore the name cannot change from under us.

"at least read-locked"?  Or do you actually mean write-locked?

> +static inline int generic_ci_d_revalidate(struct dentry *dentry,
> +					  const struct qstr *name,
> +					  unsigned int flags)

No need for inline here.

- Eric
Gabriel Krisman Bertazi Aug. 3, 2023, 4:56 p.m. UTC | #6
Eric Biggers <ebiggers@kernel.org> writes:

> On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>> dentry->d_name is only checked by the case-insensitive d_revalidate hook
>> in the LOOKUP_CREATE/LOOKUP_RENAME_TARGET case since, for these cases,
>> d_revalidate is always called with the parent inode read-locked, and
>> therefore the name cannot change from under us.
>
> "at least read-locked"?  Or do you actually mean write-locked?

No. I mean read-locked, as in holding the read-part of the inode lock.
This is the case for lookup_slow, which is safe, despite the d_add_ci
case we discussed in the previous iteration.  I'll reword to say "at
least read-locked and mention it is the case in lookup_slow".

>> +static inline int generic_ci_d_revalidate(struct dentry *dentry,
>> +					  const struct qstr *name,
>> +					  unsigned int flags)
>
> No need for inline here.

sorry, I missed the inline from your previuos review.  Will fix it up
for this one.


>
> - Eric
Gabriel Krisman Bertazi Aug. 3, 2023, 5:37 p.m. UTC | #7
Eric Biggers <ebiggers@kernel.org> writes:

> On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>>   - In __lookup_slow, either the parent inode is read locked by the
>>     caller (lookup_slow), or it is called with no flags (lookup_one*).
>>     The read lock suffices to prevent ->d_name modifications, with the
>>     exception of one case: __d_unalias, will call __d_move to fix a
>>     directory accessible from multiple dentries, which effectively swaps
>>     ->d_name while holding only the shared read lock.  This happens
>>     through this flow:
>> 
>>     lookup_slow()  //LOOKUP_CREATE
>>       d_lookup()
>>         ->d_lookup()
>>           d_splice_alias()
>>             __d_unalias()
>>               __d_move()
>> 
>>     Nevertheless, this case is not a problem because negative dentries
>>     are not allowed to be moved with __d_move.
>
> Isn't it possible for a negative dentry to become a positive one concurrently?

Do you mean d_splice_alias racing with a dentry instantiation and
__d_move being called on a negative dentry that is turning positive?

It is not possible for __d_move to be called with a negative dentry for
d_splice_alias, since the inode->i_lock is locked during __d_find_alias,
so it can't race with __d_instantiate or d_add. Then, __d_find_alias
can't find negative dentries in the first place, so we either have a
positive dentry, in which case __d_move is fine with regard to
d_revalidate_name, or we don't have any aliases and don't call
__d_move.

Can you clarify what problem you see here?
Eric Biggers Aug. 4, 2023, 4:41 a.m. UTC | #8
On Thu, Aug 03, 2023 at 01:37:45PM -0400, Gabriel Krisman Bertazi wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
> >>   - In __lookup_slow, either the parent inode is read locked by the
> >>     caller (lookup_slow), or it is called with no flags (lookup_one*).
> >>     The read lock suffices to prevent ->d_name modifications, with the
> >>     exception of one case: __d_unalias, will call __d_move to fix a
> >>     directory accessible from multiple dentries, which effectively swaps
> >>     ->d_name while holding only the shared read lock.  This happens
> >>     through this flow:
> >> 
> >>     lookup_slow()  //LOOKUP_CREATE
> >>       d_lookup()
> >>         ->d_lookup()
> >>           d_splice_alias()
> >>             __d_unalias()
> >>               __d_move()
> >> 
> >>     Nevertheless, this case is not a problem because negative dentries
> >>     are not allowed to be moved with __d_move.
> >
> > Isn't it possible for a negative dentry to become a positive one concurrently?
> 
> Do you mean d_splice_alias racing with a dentry instantiation and
> __d_move being called on a negative dentry that is turning positive?
> 
> It is not possible for __d_move to be called with a negative dentry for
> d_splice_alias, since the inode->i_lock is locked during __d_find_alias,
> so it can't race with __d_instantiate or d_add. Then, __d_find_alias
> can't find negative dentries in the first place, so we either have a
> positive dentry, in which case __d_move is fine with regard to
> d_revalidate_name, or we don't have any aliases and don't call
> __d_move.
> 
> Can you clarify what problem you see here?
> 

I agree that negative dentries can't be moved --- I pointed this out earlier
(https://lore.kernel.org/linux-fsdevel/20230720060657.GB2607@sol.localdomain).
The question is whether if ->d_revalidate sees a negative dentry, when can it
assume that it remains a negative dentry for the remainder of ->d_revalidate.
I'm not sure there is a problem, I just don't understand your explanation.

- Eric
Gabriel Krisman Bertazi Aug. 8, 2023, 1:33 a.m. UTC | #9
Eric Biggers <ebiggers@kernel.org> writes:

> On Thu, Aug 03, 2023 at 01:37:45PM -0400, Gabriel Krisman Bertazi wrote:
>> Eric Biggers <ebiggers@kernel.org> writes:
>> 
>> > On Thu, Jul 27, 2023 at 01:28:39PM -0400, Gabriel Krisman Bertazi wrote:
>> >>   - In __lookup_slow, either the parent inode is read locked by the
>> >>     caller (lookup_slow), or it is called with no flags (lookup_one*).
>> >>     The read lock suffices to prevent ->d_name modifications, with the
>> >>     exception of one case: __d_unalias, will call __d_move to fix a
>> >>     directory accessible from multiple dentries, which effectively swaps
>> >>     ->d_name while holding only the shared read lock.  This happens
>> >>     through this flow:
>> >> 
>> >>     lookup_slow()  //LOOKUP_CREATE
>> >>       d_lookup()
>> >>         ->d_lookup()
>> >>           d_splice_alias()
>> >>             __d_unalias()
>> >>               __d_move()
>> >> 
>> >>     Nevertheless, this case is not a problem because negative dentries
>> >>     are not allowed to be moved with __d_move.
>> >
>> > Isn't it possible for a negative dentry to become a positive one concurrently?
>> 
>> Do you mean d_splice_alias racing with a dentry instantiation and
>> __d_move being called on a negative dentry that is turning positive?
>> 
>> It is not possible for __d_move to be called with a negative dentry for
>> d_splice_alias, since the inode->i_lock is locked during __d_find_alias,
>> so it can't race with __d_instantiate or d_add. Then, __d_find_alias
>> can't find negative dentries in the first place, so we either have a
>> positive dentry, in which case __d_move is fine with regard to
>> d_revalidate_name, or we don't have any aliases and don't call
>> __d_move.
>> 
>> Can you clarify what problem you see here?
>> 
>
> I agree that negative dentries can't be moved --- I pointed this out earlier
> (https://lore.kernel.org/linux-fsdevel/20230720060657.GB2607@sol.localdomain).
> The question is whether if ->d_revalidate sees a negative dentry, when can it
> assume that it remains a negative dentry for the remainder of ->d_revalidate.
> I'm not sure there is a problem, I just don't understand your
> explanation.

I see. Thanks for clarifying, as I had previously misunderstood your
point.

So, first of all, if d_revalidate itself is not a creation, it doesn't
matter, because we won't touch ->d_name. We might invalidate a valid
dentry, but that is ok.  The problem would be limited to d_revalidate
being on the creation path, where the parent (read-)lock is held.  The
problem would be doing the memcmp(), while the dentry is turned positive
(d_instantiate), while someone else moves the name.

For the dentry to be turned positive during a d_revalidate, it would
then have to race with d_add or with d_instantiate.  d_add shouldn't be
possible since we are holding the parent inode lock (at least
read-side), which will serialize file creation.

From my understanding of the code, d_instantiate also can't race with
d_revalidate for the same reason - is also serialized by the parent
inode lock, which is acquired in filename_create. At least for all paths
in ext4/f2fs. In fact, I'm failing to find a case where the lock is not
taken when instantiating a dentry, but I'm unsure if this is a guarantee
or just an artifact of the code.

It seems to be safe in the current code, but I don't know if it is a
guarantee.  Can anyone comment on this?
diff mbox series

Patch

diff --git a/fs/libfs.c b/fs/libfs.c
index 5b851315eeed..ed04c4dcc312 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1462,9 +1462,64 @@  static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 	return 0;
 }
 
+static inline int generic_ci_d_revalidate(struct dentry *dentry,
+					  const struct qstr *name,
+					  unsigned int flags)
+{
+	if (d_is_negative(dentry)) {
+		const struct dentry *parent = READ_ONCE(dentry->d_parent);
+		const struct inode *dir = READ_ONCE(parent->d_inode);
+
+		if (dir && needs_casefold(dir)) {
+			/*
+			 * Negative dentries created prior to turning the
+			 * directory case-insensitive cannot be trusted, since
+			 * they don't ensure any possible case version of the
+			 * filename doesn't exist.
+			 */
+			if (!d_is_casefold_lookup(dentry))
+				return 0;
+
+			/*
+			 * Filesystems will call into d_revalidate without
+			 * setting LOOKUP_ flags even for file creation (see
+			 * lookup_one* variants).  Reject negative dentries in
+			 * this case, since we can't know for sure it won't be
+			 * used for creation.
+			 */
+			if (!flags)
+				return 0;
+
+			/*
+			 * If the lookup is for creation, then a negative dentry
+			 * can only be reused if it's a case-sensitive match,
+			 * not just a case-insensitive one.  This is needed to
+			 * make the new file be created with the name the user
+			 * specified, preserving case.
+			 */
+			if (flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)) {
+				/*
+				 * ->d_name won't change from under us in the
+				 * creation path only, since d_revalidate during
+				 * creation and renames is always called with
+				 * the parent inode locked.  It isn't the case
+				 * for all lookup callpaths, so ->d_name must
+				 * not be touched outside
+				 * (LOOKUP_CREATE|LOOKUP_RENAME_TARGET) context.
+				 */
+				if (dentry->d_name.len != name->len ||
+				    memcmp(dentry->d_name.name, name->name, name->len))
+					return 0;
+			}
+		}
+	}
+	return 1;
+}
+
 static const struct dentry_operations generic_ci_dentry_ops = {
 	.d_hash = generic_ci_d_hash,
 	.d_compare = generic_ci_d_compare,
+	.d_revalidate_name = generic_ci_d_revalidate,
 };
 #endif