diff mbox series

smb: client: avoid dentry leak by not overwriting cfid->dentry

Message ID 20250506223156.121141-1-henrique.carvalho@suse.com
State New
Headers show
Series smb: client: avoid dentry leak by not overwriting cfid->dentry | expand

Commit Message

Henrique Carvalho May 6, 2025, 10:31 p.m. UTC
A race, likely between lease break and open, can cause cfid->dentry to
be valid when open_cached_dir() tries to set it again. This overwrites
the old dentry without dput(), leaking it.

Skip assignment if cfid->dentry is already set.

Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
---
 fs/smb/client/cached_dir.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

Comments

Paul Aurich May 7, 2025, 6:16 a.m. UTC | #1
On 2025-05-06 19:31:56 -0300, Henrique Carvalho wrote:
>A race, likely between lease break and open, can cause cfid->dentry to
>be valid when open_cached_dir() tries to set it again. This overwrites
>the old dentry without dput(), leaking it.
>
>Skip assignment if cfid->dentry is already set.
>
>Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
>---
> fs/smb/client/cached_dir.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
>diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
>index 43228ec2424d..8c1f00a3fc29 100644
>--- a/fs/smb/client/cached_dir.c
>+++ b/fs/smb/client/cached_dir.c
>@@ -219,16 +219,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
> 		goto out;
> 	}
>
>-	if (!npath[0]) {
>-		dentry = dget(cifs_sb->root);
>-	} else {
>-		dentry = path_to_dentry(cifs_sb, npath);
>-		if (IS_ERR(dentry)) {
>-			rc = -ENOENT;
>-			goto out;
>+	/*
>+	 * BB: cfid->dentry should be NULL here; if not, we're likely racing with
>+	 * a lease break. This is a temporary workaround to avoid overwriting
>+	 * a valid dentry. Needs proper fix.
>+	 */

Ah ha. I think this is trying to address the same race as Shyam's 'cifs: do 
not return an invalidated cfid' [1].

What about modifying open_cached_dir to hold cfid_list_lock across the call to 
find_or_create_cached_dir through where it tests for validity, and then 
dropping the locking in find_or_create_cached_dir itself (see attached in case 
my text description isn't clear)?

That's the only way I can see that a pre-existing cfid could escape to the 
rest of open_cached_dir. I think.

~Paul

[1] https://lore.kernel.org/linux-cifs/20250502051517.10449-2-sprasad@microsoft.com/T/#u

>+	if (!cfid->dentry) {
>+		if (!npath[0]) {
>+			dentry = dget(cifs_sb->root);
>+		} else {
>+			dentry = path_to_dentry(cifs_sb, npath);
>+			if (IS_ERR(dentry)) {
>+				rc = -ENOENT;
>+				goto out;
>+			}
> 		}
>+		cfid->dentry = dentry;
> 	}
>-	cfid->dentry = dentry;
> 	cfid->tcon = tcon;
>
> 	/*
>-- 
>2.47.0
Henrique Carvalho May 7, 2025, 2:03 p.m. UTC | #2
On Tue, May 06, 2025 at 11:16:46PM -0700, Paul Aurich wrote:
> On 2025-05-06 19:31:56 -0300, Henrique Carvalho wrote:
> > A race, likely between lease break and open, can cause cfid->dentry to
> > be valid when open_cached_dir() tries to set it again. This overwrites
> > the old dentry without dput(), leaking it.
> > 
> > Skip assignment if cfid->dentry is already set.
> > 
> > Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
> > ---
> > fs/smb/client/cached_dir.c | 23 +++++++++++++++--------
> > 1 file changed, 15 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
> > index 43228ec2424d..8c1f00a3fc29 100644
> > --- a/fs/smb/client/cached_dir.c
> > +++ b/fs/smb/client/cached_dir.c
> > @@ -219,16 +219,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
> > 		goto out;
> > 	}
> > 
> > -	if (!npath[0]) {
> > -		dentry = dget(cifs_sb->root);
> > -	} else {
> > -		dentry = path_to_dentry(cifs_sb, npath);
> > -		if (IS_ERR(dentry)) {
> > -			rc = -ENOENT;
> > -			goto out;
> > +	/*
> > +	 * BB: cfid->dentry should be NULL here; if not, we're likely racing with
> > +	 * a lease break. This is a temporary workaround to avoid overwriting
> > +	 * a valid dentry. Needs proper fix.
> > +	 */
> 
> Ah ha. I think this is trying to address the same race as Shyam's 'cifs: do
> not return an invalidated cfid' [1].
> 
> What about modifying open_cached_dir to hold cfid_list_lock across the call
> to find_or_create_cached_dir through where it tests for validity, and then
> dropping the locking in find_or_create_cached_dir itself (see attached in
> case my text description isn't clear)?
> 
> That's the only way I can see that a pre-existing cfid could escape to the
> rest of open_cached_dir. I think.
> 
> ~Paul
> 
> [1] https://lore.kernel.org/linux-cifs/20250502051517.10449-2-sprasad@microsoft.com/T/#u
> 
> > +	if (!cfid->dentry) {
> > +		if (!npath[0]) {
> > +			dentry = dget(cifs_sb->root);
> > +		} else {
> > +			dentry = path_to_dentry(cifs_sb, npath);
> > +			if (IS_ERR(dentry)) {
> > +				rc = -ENOENT;
> > +				goto out;
> > +			}
> > 		}
> > +		cfid->dentry = dentry;
> > 	}
> > -	cfid->dentry = dentry;
> > 	cfid->tcon = tcon;
> > 
> > 	/*
> > -- 
> > 2.47.0

> >From 583824153dd901f1f7d63ce9364d32c9c249fd43 Mon Sep 17 00:00:00 2001
> From: Paul Aurich <paul@darkrain42.org>
> Date: Tue, 6 May 2025 22:28:09 -0700
> Subject: [PATCH] smb: client: Avoid race in open_cached_dir with lease breaks
> 
> A pre-existing valid cfid returned from find_or_create_cached_dir might
> race with a lease break, meaning open_cached_dir doesn't consider it
> valid, and thinks it's newly-constructed. This leaks a dentry reference
> if the allocation occurs before the queued lease break work runs.
> 
> Avoid the race by extending holding the cfid_list_lock across
> find_or_create_cached_dir and when the result is checked.
> 
> Signed-off-by: Paul Aurich <paul@darkrain42.org>
> Cc: stable@vger.kernel.org
> ---
>  fs/smb/client/cached_dir.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
> index 10aad87ce679..cb8477c18905 100644
> --- a/fs/smb/client/cached_dir.c
> +++ b/fs/smb/client/cached_dir.c
> @@ -27,38 +27,32 @@ static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
>  						    bool lookup_only,
>  						    __u32 max_cached_dirs)
>  {
>  	struct cached_fid *cfid;
>  
> -	spin_lock(&cfids->cfid_list_lock);
>  	list_for_each_entry(cfid, &cfids->entries, entry) {
>  		if (!strcmp(cfid->path, path)) {
>  			/*
>  			 * If it doesn't have a lease it is either not yet
>  			 * fully cached or it may be in the process of
>  			 * being deleted due to a lease break.
>  			 */
>  			if (!cfid->time || !cfid->has_lease) {
> -				spin_unlock(&cfids->cfid_list_lock);
>  				return NULL;
>  			}
>  			kref_get(&cfid->refcount);
> -			spin_unlock(&cfids->cfid_list_lock);
>  			return cfid;
>  		}
>  	}
>  	if (lookup_only) {
> -		spin_unlock(&cfids->cfid_list_lock);
>  		return NULL;
>  	}
>  	if (cfids->num_entries >= max_cached_dirs) {
> -		spin_unlock(&cfids->cfid_list_lock);
>  		return NULL;
>  	}
>  	cfid = init_cached_dir(path);
>  	if (cfid == NULL) {
> -		spin_unlock(&cfids->cfid_list_lock);
>  		return NULL;
>  	}
>  	cfid->cfids = cfids;
>  	cfids->num_entries++;
>  	list_add(&cfid->entry, &cfids->entries);
> @@ -72,11 +66,10 @@ static struct cached_fid *find_or_create_cached_dir(struct cached_fids *cfids,
>  	 * Concurrent processes won't be to use it yet due to @cfid->time being
>  	 * zero.
>  	 */
>  	cfid->has_lease = true;
>  
> -	spin_unlock(&cfids->cfid_list_lock);
>  	return cfid;
>  }
>  
>  static struct dentry *
>  path_to_dentry(struct cifs_sb_info *cifs_sb, const char *path)
> @@ -185,21 +178,22 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
>  
>  	utf16_path = cifs_convert_path_to_utf16(path, cifs_sb);
>  	if (!utf16_path)
>  		return -ENOMEM;
>  
> +	spin_lock(&cfids->cfid_list_lock);
>  	cfid = find_or_create_cached_dir(cfids, path, lookup_only, tcon->max_cached_dirs);
>  	if (cfid == NULL) {
> +		spin_unlock(&cfids->cfid_list_lock);
>  		kfree(utf16_path);
>  		return -ENOENT;
>  	}
>  	/*
>  	 * Return cached fid if it is valid (has a lease and has a time).
>  	 * Otherwise, it is either a new entry or laundromat worker removed it
>  	 * from @cfids->entries.  Caller will put last reference if the latter.
>  	 */
> -	spin_lock(&cfids->cfid_list_lock);
>  	if (cfid->has_lease && cfid->time) {
>  		spin_unlock(&cfids->cfid_list_lock);
>  		*ret_cfid = cfid;
>  		kfree(utf16_path);
>  		return 0;
> -- 
> 2.47.2
> 

Yes!

Closing this gap seems to be the only viable fix.

I'm running the tests, but LGTM.


Best,
Henrique
Steve French May 7, 2025, 5:01 p.m. UTC | #3
---------- Forwarded message ---------
From: Steve French <smfrench@gmail.com>
Date: Wed, May 7, 2025 at 11:31 AM
Subject: Re: [PATCH] smb: client: avoid dentry leak by not overwriting
cfid->dentry
To: Henrique Carvalho <henrique.carvalho@suse.com>, Enzo Matsumiya
<ematsumiya@suse.de>, Steve French <smfrench@gmail.com>, Shyam Prasad
N <sprasad@microsoft.com>, Bharath S M <bharathsm@microsoft.com>,
ronnie sahlberg <ronniesahlberg@gmail.com>, CIFS
<linux-cifs@vger.kernel.org>


I can try some test runs with Paul's patch.   I wasn't clear on
whether it obsoletes Henrique's patch or if both would still be needed
though.
Is it ok to run with both patches

237d73fd2428 (HEAD -> for-next, origin/for-next, origin/HEAD) smb:
client: Avoid race in open_cached_dir with lease breaks
0b68d50bb6aa smb: client: fix delay on concurrent opens
419408103208 smb: client: avoid dentry leak by not overwriting cfid->dentry
d90b023718a1 smb3 client: warn when parse contexts returns error on
compounded operation
92a09c47464d (tag: v6.15-rc5, linus/master) Linux 6.15-rc5

With Henrique's patch there is a hang it looks like it introduces in
generic/013 to Azure with multichannel (no directory leases) which
seems strange
            http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/443
but it does address the hang with directory leases after test
generic/241 with directory leases:
          http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/12/builds/48
and in the main test group, the only test which fails is the expected
one (the netfs regression) in generic/349 (I need to retry David's
updated netfs fixes for this)
          http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/3/builds/466

Thanks,

Steve

On Wed, May 7, 2025, 1:16 AM Paul Aurich <paul@darkrain42.org> wrote:
>
> On 2025-05-06 19:31:56 -0300, Henrique Carvalho wrote:
> >A race, likely between lease break and open, can cause cfid->dentry to
> >be valid when open_cached_dir() tries to set it again. This overwrites
> >the old dentry without dput(), leaking it.
> >
> >Skip assignment if cfid->dentry is already set.
> >
> >Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
> >---
> > fs/smb/client/cached_dir.c | 23 +++++++++++++++--------
> > 1 file changed, 15 insertions(+), 8 deletions(-)
> >
> >diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
> >index 43228ec2424d..8c1f00a3fc29 100644
> >--- a/fs/smb/client/cached_dir.c
> >+++ b/fs/smb/client/cached_dir.c
> >@@ -219,16 +219,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
> >               goto out;
> >       }
> >
> >-      if (!npath[0]) {
> >-              dentry = dget(cifs_sb->root);
> >-      } else {
> >-              dentry = path_to_dentry(cifs_sb, npath);
> >-              if (IS_ERR(dentry)) {
> >-                      rc = -ENOENT;
> >-                      goto out;
> >+      /*
> >+       * BB: cfid->dentry should be NULL here; if not, we're likely racing with
> >+       * a lease break. This is a temporary workaround to avoid overwriting
> >+       * a valid dentry. Needs proper fix.
> >+       */
>
> Ah ha. I think this is trying to address the same race as Shyam's 'cifs: do
> not return an invalidated cfid' [1].
>
> What about modifying open_cached_dir to hold cfid_list_lock across the call to
> find_or_create_cached_dir through where it tests for validity, and then
> dropping the locking in find_or_create_cached_dir itself (see attached in case
> my text description isn't clear)?
>
> That's the only way I can see that a pre-existing cfid could escape to the
> rest of open_cached_dir. I think.
>
> ~Paul
>
> [1] https://lore.kernel.org/linux-cifs/20250502051517.10449-2-sprasad@microsoft.com/T/#u
>
> >+      if (!cfid->dentry) {
> >+              if (!npath[0]) {
> >+                      dentry = dget(cifs_sb->root);
> >+              } else {
> >+                      dentry = path_to_dentry(cifs_sb, npath);
> >+                      if (IS_ERR(dentry)) {
> >+                              rc = -ENOENT;
> >+                              goto out;
> >+                      }
> >               }
> >+              cfid->dentry = dentry;
> >       }
> >-      cfid->dentry = dentry;
> >       cfid->tcon = tcon;
> >
> >       /*
> >--
> >2.47.0
Henrique Carvalho May 7, 2025, 7:38 p.m. UTC | #4
On Wed, May 07, 2025 at 02:01:08PM -0500, Steve French wrote:
> 
>    With both patches I still see the hang in generic/013 to azure
>    multichannel but it does still fix the directory lease crash
> 
>    [1]http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/buil
>    ders/5/builds/444

I cannot see the mount options for this test, does it mount with
nohandlecache?

> 
>    Thanks,
> 
>    Steve
> 
>    On Wed, May 7, 2025, 11:31 AM Steve French <[2]smfrench@gmail.com>
>    wrote:
> 
>    I can try some test runs with Paul's patch.   I wasn't clear on whether
>    it obsoletes Henrique's patch or if both would still be needed though.
>    Is it ok to run with both patches

I believe Paul's patch makes mine obsolete, unless I'm missing something
-- or unless we want to be extra cautious by keeping the dentry check as
an additional safeguard.
Steve French May 7, 2025, 11:56 p.m. UTC | #5
So far Paul's patch is working - it avoided the generic/241 crash (dir
lease related)

      http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/12/builds/50

 and the generic/013 multichannel hang

     http://smb311-linux-testing.southcentralus.cloudapp.azure.com/#/builders/5/builds/445

Since Paul's patch avoids the most urgent problems and is small
enough, the larger longer term locking improvements should probably
just build on this (but some may have to wait for 6.16-rc)

On Wed, May 7, 2025 at 1:16 AM Paul Aurich <paul@darkrain42.org> wrote:
>
> On 2025-05-06 19:31:56 -0300, Henrique Carvalho wrote:
> >A race, likely between lease break and open, can cause cfid->dentry to
> >be valid when open_cached_dir() tries to set it again. This overwrites
> >the old dentry without dput(), leaking it.
> >
> >Skip assignment if cfid->dentry is already set.
> >
> >Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
> >---
> > fs/smb/client/cached_dir.c | 23 +++++++++++++++--------
> > 1 file changed, 15 insertions(+), 8 deletions(-)
> >
> >diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
> >index 43228ec2424d..8c1f00a3fc29 100644
> >--- a/fs/smb/client/cached_dir.c
> >+++ b/fs/smb/client/cached_dir.c
> >@@ -219,16 +219,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
> >               goto out;
> >       }
> >
> >-      if (!npath[0]) {
> >-              dentry = dget(cifs_sb->root);
> >-      } else {
> >-              dentry = path_to_dentry(cifs_sb, npath);
> >-              if (IS_ERR(dentry)) {
> >-                      rc = -ENOENT;
> >-                      goto out;
> >+      /*
> >+       * BB: cfid->dentry should be NULL here; if not, we're likely racing with
> >+       * a lease break. This is a temporary workaround to avoid overwriting
> >+       * a valid dentry. Needs proper fix.
> >+       */
>
> Ah ha. I think this is trying to address the same race as Shyam's 'cifs: do
> not return an invalidated cfid' [1].
>
> What about modifying open_cached_dir to hold cfid_list_lock across the call to
> find_or_create_cached_dir through where it tests for validity, and then
> dropping the locking in find_or_create_cached_dir itself (see attached in case
> my text description isn't clear)?
>
> That's the only way I can see that a pre-existing cfid could escape to the
> rest of open_cached_dir. I think.
>
> ~Paul
>
> [1] https://lore.kernel.org/linux-cifs/20250502051517.10449-2-sprasad@microsoft.com/T/#u
>
> >+      if (!cfid->dentry) {
> >+              if (!npath[0]) {
> >+                      dentry = dget(cifs_sb->root);
> >+              } else {
> >+                      dentry = path_to_dentry(cifs_sb, npath);
> >+                      if (IS_ERR(dentry)) {
> >+                              rc = -ENOENT;
> >+                              goto out;
> >+                      }
> >               }
> >+              cfid->dentry = dentry;
> >       }
> >-      cfid->dentry = dentry;
> >       cfid->tcon = tcon;
> >
> >       /*
> >--
> >2.47.0
Shyam Prasad N May 8, 2025, 3:24 p.m. UTC | #6
On Wed, May 7, 2025 at 11:56 AM Paul Aurich <paul@darkrain42.org> wrote:
>
> On 2025-05-06 19:31:56 -0300, Henrique Carvalho wrote:
> >A race, likely between lease break and open, can cause cfid->dentry to
> >be valid when open_cached_dir() tries to set it again. This overwrites
> >the old dentry without dput(), leaking it.
> >
> >Skip assignment if cfid->dentry is already set.
> >
> >Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
> >---
> > fs/smb/client/cached_dir.c | 23 +++++++++++++++--------
> > 1 file changed, 15 insertions(+), 8 deletions(-)
> >
> >diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
> >index 43228ec2424d..8c1f00a3fc29 100644
> >--- a/fs/smb/client/cached_dir.c
> >+++ b/fs/smb/client/cached_dir.c
> >@@ -219,16 +219,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
> >               goto out;
> >       }
> >
> >-      if (!npath[0]) {
> >-              dentry = dget(cifs_sb->root);
> >-      } else {
> >-              dentry = path_to_dentry(cifs_sb, npath);
> >-              if (IS_ERR(dentry)) {
> >-                      rc = -ENOENT;
> >-                      goto out;
> >+      /*
> >+       * BB: cfid->dentry should be NULL here; if not, we're likely racing with
> >+       * a lease break. This is a temporary workaround to avoid overwriting
> >+       * a valid dentry. Needs proper fix.
> >+       */
>
> Ah ha. I think this is trying to address the same race as Shyam's 'cifs: do
> not return an invalidated cfid' [1].


Hi Paul,
Yes. One of my patch did this check to avoid leaking dentry.
However, without serializing threads in this codepath, it is not
possible to rule out all such races.

>
> What about modifying open_cached_dir to hold cfid_list_lock across the call to
> find_or_create_cached_dir through where it tests for validity, and then
> dropping the locking in find_or_create_cached_dir itself (see attached in case
> my text description isn't clear)?

We can do that. But holding a spinlock till the response comes back
from the server is not a good idea.
We could see high CPU utilization if response from the server takes longer.
My other patch introduced a mutex just for this purpose.

>
> That's the only way I can see that a pre-existing cfid could escape to the
> rest of open_cached_dir. I think.
>
> ~Paul
>
> [1] https://lore.kernel.org/linux-cifs/20250502051517.10449-2-sprasad@microsoft.com/T/#u
>
> >+      if (!cfid->dentry) {
> >+              if (!npath[0]) {
> >+                      dentry = dget(cifs_sb->root);
> >+              } else {
> >+                      dentry = path_to_dentry(cifs_sb, npath);
> >+                      if (IS_ERR(dentry)) {
> >+                              rc = -ENOENT;
> >+                              goto out;
> >+                      }
> >               }
> >+              cfid->dentry = dentry;
> >       }
> >-      cfid->dentry = dentry;
> >       cfid->tcon = tcon;
> >
> >       /*
> >--
> >2.47.0
Paul Aurich May 8, 2025, 3:53 p.m. UTC | #7
On 2025-05-08 20:54:34 +0530, Shyam Prasad N wrote:
>On Wed, May 7, 2025 at 11:56 AM Paul Aurich <paul@darkrain42.org> wrote:
>>
>> On 2025-05-06 19:31:56 -0300, Henrique Carvalho wrote:
>> >A race, likely between lease break and open, can cause cfid->dentry to
>> >be valid when open_cached_dir() tries to set it again. This overwrites
>> >the old dentry without dput(), leaking it.
>> >
>> >Skip assignment if cfid->dentry is already set.
>> >
>> >Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
>> >---
>> > fs/smb/client/cached_dir.c | 23 +++++++++++++++--------
>> > 1 file changed, 15 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
>> >index 43228ec2424d..8c1f00a3fc29 100644
>> >--- a/fs/smb/client/cached_dir.c
>> >+++ b/fs/smb/client/cached_dir.c
>> >@@ -219,16 +219,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
>> >               goto out;
>> >       }
>> >
>> >-      if (!npath[0]) {
>> >-              dentry = dget(cifs_sb->root);
>> >-      } else {
>> >-              dentry = path_to_dentry(cifs_sb, npath);
>> >-              if (IS_ERR(dentry)) {
>> >-                      rc = -ENOENT;
>> >-                      goto out;
>> >+      /*
>> >+       * BB: cfid->dentry should be NULL here; if not, we're likely racing with
>> >+       * a lease break. This is a temporary workaround to avoid overwriting
>> >+       * a valid dentry. Needs proper fix.
>> >+       */
>>
>> Ah ha. I think this is trying to address the same race as Shyam's 'cifs: do
>> not return an invalidated cfid' [1].
>
>
>Hi Paul,
>Yes. One of my patch did this check to avoid leaking dentry.
>However, without serializing threads in this codepath, it is not
>possible to rule out all such races.
>
>>
>> What about modifying open_cached_dir to hold cfid_list_lock across the call to
>> find_or_create_cached_dir through where it tests for validity, and then
>> dropping the locking in find_or_create_cached_dir itself (see attached in case
>> my text description isn't clear)?
>
>We can do that. But holding a spinlock till the response comes back
>from the server is not a good idea.
>We could see high CPU utilization if response from the server takes longer.
>My other patch introduced a mutex just for this purpose.

I don't think my proposed patch extends locking over server-side 
communication or anything that can sleep/preempt. (To be clear about 'tests 
for validity', I just meant the 'if (cfid->has_lease && cfid->time)' test that 
occurs a few lines below the call to find_or_create_cached_dir)

Without my patch (i.e. before changes), find_or_create_cached_dir() holds 
cfid_list_lock over its entire execution, but drops the lock before returning.  
open_cached_dir() (the only caller of find_or_create_cached_dir) then the 
spinlock again and checks if the cfid is valid.

The fix just moves the locking out of find_or_create_cached_dir() so it's 
acquired _once_ and held across both searching for (or constructing a new) 
cfid and then checking the results.

I don't think there are any intervening operations that would sleep?

@@ -185,21 +178,22 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
  
  	utf16_path = cifs_convert_path_to_utf16(path, cifs_sb);
  	if (!utf16_path)
  		return -ENOMEM;
  
+	spin_lock(&cfids->cfid_list_lock);
  	cfid = find_or_create_cached_dir(cfids, path, lookup_only, tcon->max_cached_dirs);
  	if (cfid == NULL) {
+		spin_unlock(&cfids->cfid_list_lock);
  		kfree(utf16_path);
  		return -ENOENT;
  	}
  	/*
  	 * Return cached fid if it is valid (has a lease and has a time).
  	 * Otherwise, it is either a new entry or laundromat worker removed it
  	 * from @cfids->entries.  Caller will put last reference if the latter.
  	 */
-	spin_lock(&cfids->cfid_list_lock);
  	if (cfid->has_lease && cfid->time) {
  		spin_unlock(&cfids->cfid_list_lock);
  		*ret_cfid = cfid;
  		kfree(utf16_path);
  		return 0;


Full patch:
https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commitdiff;h=3ca02e63edccb78ef3659bebc68579c7224a6ca2

>
>>
>> That's the only way I can see that a pre-existing cfid could escape to the
>> rest of open_cached_dir. I think.
>>
>> ~Paul
>>
>> [1] https://lore.kernel.org/linux-cifs/20250502051517.10449-2-sprasad@microsoft.com/T/#u
>>
>> >+      if (!cfid->dentry) {
>> >+              if (!npath[0]) {
>> >+                      dentry = dget(cifs_sb->root);
>> >+              } else {
>> >+                      dentry = path_to_dentry(cifs_sb, npath);
>> >+                      if (IS_ERR(dentry)) {
>> >+                              rc = -ENOENT;
>> >+                              goto out;
>> >+                      }
>> >               }
>> >+              cfid->dentry = dentry;
>> >       }
>> >-      cfid->dentry = dentry;
>> >       cfid->tcon = tcon;
>> >
>> >       /*
>> >--
>> >2.47.0
>
>-- 
>Regards,
>Shyam
Steve French May 8, 2025, 4:09 p.m. UTC | #8
On Thu, May 8, 2025 at 10:53 AM Paul Aurich <paul@darkrain42.org> wrote:
>
> On 2025-05-08 20:54:34 +0530, Shyam Prasad N wrote:
> >On Wed, May 7, 2025 at 11:56 AM Paul Aurich <paul@darkrain42.org> wrote:
> >>
> >> On 2025-05-06 19:31:56 -0300, Henrique Carvalho wrote:
> >> >A race, likely between lease break and open, can cause cfid->dentry to
> >> >be valid when open_cached_dir() tries to set it again. This overwrites
> >> >the old dentry without dput(), leaking it.
> >> >
> >> >Skip assignment if cfid->dentry is already set.
> >> >
> >> >Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
> >> >---
> >> > fs/smb/client/cached_dir.c | 23 +++++++++++++++--------
> >> > 1 file changed, 15 insertions(+), 8 deletions(-)
> >> >
> >> >diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
> >> >index 43228ec2424d..8c1f00a3fc29 100644
> >> >--- a/fs/smb/client/cached_dir.c
> >> >+++ b/fs/smb/client/cached_dir.c
> >> >@@ -219,16 +219,23 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
> >> >               goto out;
> >> >       }
> >> >
> >> >-      if (!npath[0]) {
> >> >-              dentry = dget(cifs_sb->root);
> >> >-      } else {
> >> >-              dentry = path_to_dentry(cifs_sb, npath);
> >> >-              if (IS_ERR(dentry)) {
> >> >-                      rc = -ENOENT;
> >> >-                      goto out;
> >> >+      /*
> >> >+       * BB: cfid->dentry should be NULL here; if not, we're likely racing with
> >> >+       * a lease break. This is a temporary workaround to avoid overwriting
> >> >+       * a valid dentry. Needs proper fix.
> >> >+       */
> >>
> >> Ah ha. I think this is trying to address the same race as Shyam's 'cifs: do
> >> not return an invalidated cfid' [1].
> >
> >
> >Hi Paul,
> >Yes. One of my patch did this check to avoid leaking dentry.
> >However, without serializing threads in this codepath, it is not
> >possible to rule out all such races.
> >
> >>
> >> What about modifying open_cached_dir to hold cfid_list_lock across the call to
> >> find_or_create_cached_dir through where it tests for validity, and then
> >> dropping the locking in find_or_create_cached_dir itself (see attached in case
> >> my text description isn't clear)?
> >
> >We can do that. But holding a spinlock till the response comes back
> >from the server is not a good idea.
> >We could see high CPU utilization if response from the server takes longer.
> >My other patch introduced a mutex just for this purpose.
>
> I don't think my proposed patch extends locking over server-side
> communication or anything that can sleep/preempt. (To be clear about 'tests
> for validity', I just meant the 'if (cfid->has_lease && cfid->time)' test that
> occurs a few lines below the call to find_or_create_cached_dir)
>
> Without my patch (i.e. before changes), find_or_create_cached_dir() holds
> cfid_list_lock over its entire execution, but drops the lock before returning.
> open_cached_dir() (the only caller of find_or_create_cached_dir) then the
> spinlock again and checks if the cfid is valid.
>
> The fix just moves the locking out of find_or_create_cached_dir() so it's
> acquired _once_ and held across both searching for (or constructing a new)
> cfid and then checking the results.
>
> I don't think there are any intervening operations that would sleep?

I don't remember seeing any obvious slowdowns on individual xfstests
running with Paul's patch,
but we can rescan the test runs with and without his patch to make
sure we aren't slowing down
anything (and there is some perf randomness when running tests in the
cloud) - but I didn't see
anything suspicious from a perf perspective
diff mbox series

Patch

diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 43228ec2424d..8c1f00a3fc29 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -219,16 +219,23 @@  int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
 		goto out;
 	}
 
-	if (!npath[0]) {
-		dentry = dget(cifs_sb->root);
-	} else {
-		dentry = path_to_dentry(cifs_sb, npath);
-		if (IS_ERR(dentry)) {
-			rc = -ENOENT;
-			goto out;
+	/*
+	 * BB: cfid->dentry should be NULL here; if not, we're likely racing with
+	 * a lease break. This is a temporary workaround to avoid overwriting
+	 * a valid dentry. Needs proper fix.
+	 */
+	if (!cfid->dentry) {
+		if (!npath[0]) {
+			dentry = dget(cifs_sb->root);
+		} else {
+			dentry = path_to_dentry(cifs_sb, npath);
+			if (IS_ERR(dentry)) {
+				rc = -ENOENT;
+				goto out;
+			}
 		}
+		cfid->dentry = dentry;
 	}
-	cfid->dentry = dentry;
 	cfid->tcon = tcon;
 
 	/*