ext4: Fix possible corruption when moving a directory with RENAME_EXCHANGE

Message ID 20230523131408.13470-1-jack@suse.cz
State Superseded
Series ext4: Fix possible corruption when moving a directory with RENAME_EXCHANGE

Commit Message

Jan Kara May 23, 2023, 1:14 p.m. UTC
Commit 0813299c586b ("ext4: Fix possible corruption when moving a
directory") forgot that handling of RENAME_EXCHANGE renames needs the
protection of inode lock when changing directory parents for moved
directories. Add proper locking for that case as well.

CC: stable@vger.kernel.org
Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory")
Reported-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/namei.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

Comments

David Laight May 23, 2023, 1:50 p.m. UTC | #1
From: Jan Kara
> Sent: 23 May 2023 14:14
> 
> Commit 0813299c586b ("ext4: Fix possible corruption when moving a
> directory") forgot that handling of RENAME_EXCHANGE renames needs the
> protection of inode lock when changing directory parents for moved
> directories. Add proper locking for that case as well.
> 
> CC: stable@vger.kernel.org
> Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory")
> Reported-by: "Darrick J. Wong" <djwong@kernel.org>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/namei.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 45b579805c95..b91abea1c781 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -4083,10 +4083,25 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	if (retval)
>  		return retval;
> 
> +	/*
> +	 * We need to protect against old.inode and new.inode directory getting
> +	 * converted from inline directory format into a normal one. The lock
> +	 * ordering does not matter here as old and new are guaranteed to be
> +	 * incomparable in the directory hierarchy.
> +	 */
> +	if (S_ISDIR(old.inode->i_mode))
> +		inode_lock(old.inode);
> +	if (S_ISDIR(new.inode->i_mode))
> +		inode_lock_nested(new.inode, I_MUTEX_NONDIR2);
> +

What happens if there is another concurrent rename from new.inode
to old.inode?
That will try to acquire the locks in the other order.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Jan Kara May 24, 2023, 10:51 a.m. UTC | #2
On Tue 23-05-23 13:50:01, David Laight wrote:
> From: Jan Kara
> > Sent: 23 May 2023 14:14
> > 
> > Commit 0813299c586b ("ext4: Fix possible corruption when moving a
> > directory") forgot that handling of RENAME_EXCHANGE renames needs the
> > protection of inode lock when changing directory parents for moved
> > directories. Add proper locking for that case as well.
> > 
> > CC: stable@vger.kernel.org
> > Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory")
> > Reported-by: "Darrick J. Wong" <djwong@kernel.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/ext4/namei.c | 23 +++++++++++++++++++++--
> >  1 file changed, 21 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index 45b579805c95..b91abea1c781 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> > @@ -4083,10 +4083,25 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
> >  	if (retval)
> >  		return retval;
> > 
> > +	/*
> > +	 * We need to protect against old.inode and new.inode directory getting
> > +	 * converted from inline directory format into a normal one. The lock
> > +	 * ordering does not matter here as old and new are guaranteed to be
> > +	 * incomparable in the directory hierarchy.
> > +	 */
> > +	if (S_ISDIR(old.inode->i_mode))
> > +		inode_lock(old.inode);
> > +	if (S_ISDIR(new.inode->i_mode))
> > +		inode_lock_nested(new.inode, I_MUTEX_NONDIR2);
> > +
> 
> What happens if there is another concurrent rename from new.inode
> to old.inode?
> That will try to acquire the locks in the other order.

That is not really possible because these two renames cannot happen in
parallel due to VFS locking - either old & new share parent which is locked
by VFS (so there cannot be another rename in that directory) or they have
different parents which are also locked by VFS (so again it is not possible
to race with another rename in these two dirs).

								Honza
Amir Goldstein May 24, 2023, 1:11 p.m. UTC | #3
On Wed, May 24, 2023 at 2:27 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 23-05-23 13:50:01, David Laight wrote:
> > From: Jan Kara
> > > Sent: 23 May 2023 14:14
> > >
> > > Commit 0813299c586b ("ext4: Fix possible corruption when moving a
> > > directory") forgot that handling of RENAME_EXCHANGE renames needs the
> > > protection of inode lock when changing directory parents for moved
> > > directories. Add proper locking for that case as well.
> > >
> > > CC: stable@vger.kernel.org
> > > Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory")
> > > Reported-by: "Darrick J. Wong" <djwong@kernel.org>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  fs/ext4/namei.c | 23 +++++++++++++++++++++--
> > >  1 file changed, 21 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > > index 45b579805c95..b91abea1c781 100644
> > > --- a/fs/ext4/namei.c
> > > +++ b/fs/ext4/namei.c
> > > @@ -4083,10 +4083,25 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
> > >     if (retval)
> > >             return retval;
> > >
> > > +   /*
> > > +    * We need to protect against old.inode and new.inode directory getting
> > > +    * converted from inline directory format into a normal one. The lock
> > > +    * ordering does not matter here as old and new are guaranteed to be
> > > +    * incomparable in the directory hierarchy.
> > > +    */
> > > +   if (S_ISDIR(old.inode->i_mode))
> > > +           inode_lock(old.inode);
> > > +   if (S_ISDIR(new.inode->i_mode))
> > > +           inode_lock_nested(new.inode, I_MUTEX_NONDIR2);
> > > +
> >
> > What happens if there is another concurrent rename from new.inode
> > to old.inode?
> > That will try to acquire the locks in the other order.
>
> That is not really possible because these two renames cannot happen in
> parallel due to VFS locking - either old & new share parent which is locked
> by VFS (so there cannot be another rename in that directory) or they have
> different parents which are also locked by VFS (so again it is not possible
> to race with another rename in these two dirs).

Unless D1/A ; D1/B are hardlinks of D2/B ; D2/A respectively
and exchange (D1/A, D1/B) is racing with exchange (D2/B, D2/A)

There is a simple solution of course, same as xfs_lock_two_inodes()

Another possible deadlock (I think) is if D/A ; D/B are subdirs that
are exchanged and after taking inode_lock of D and A, rename comes
in D/B/foo => D/A/foo and lock_rename() tries to
lock_two_directories(B, A).

So it seems that both lock_two_directories() and the to-be-added helper
lock_two_inodes() need to order the two inodes by address?

Thanks,
Amir.
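
The ordering idea raised above can be illustrated outside the kernel. What
follows is only a userspace sketch, assuming pthread mutexes in place of the
inode lock; struct toy_inode, toy_first(), toy_lock_two() and
toy_unlock_two() are hypothetical names invented for this illustration and
are not kernel or XFS functions. It shows the general technique of always
taking the lower-addressed lock first, so that two callers locking the same
pair in opposite argument order cannot deadlock against each other.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; only the lock matters here. */
struct toy_inode {
	pthread_mutex_t lock;
};

/* Return whichever of the two objects has the lower address. */
static struct toy_inode *toy_first(struct toy_inode *a, struct toy_inode *b)
{
	return ((uintptr_t)a <= (uintptr_t)b) ? a : b;
}

/*
 * Lock both objects in ascending address order. Because the order depends
 * only on the addresses, toy_lock_two(x, y) and toy_lock_two(y, x) acquire
 * the locks in the same sequence. If both arguments are the same object,
 * it is locked once.
 */
static void toy_lock_two(struct toy_inode *a, struct toy_inode *b)
{
	struct toy_inode *first = toy_first(a, b);
	struct toy_inode *second = (first == a) ? b : a;

	pthread_mutex_lock(&first->lock);
	if (second != first)
		pthread_mutex_lock(&second->lock);
}

/* Release both locks; the unlock order does not affect deadlock safety. */
static void toy_unlock_two(struct toy_inode *a, struct toy_inode *b)
{
	pthread_mutex_unlock(&a->lock);
	if (a != b)
		pthread_mutex_unlock(&b->lock);
}
```

Note that ordering a pair by address only addresses the symmetric case where
both paths lock the same two objects; it does not by itself settle how such
locks should nest with the parent directory locks taken by lock_rename().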
Jan Kara May 24, 2023, 2:18 p.m. UTC | #4
On Wed 24-05-23 16:11:13, Amir Goldstein wrote:
> On Wed, May 24, 2023 at 2:27 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 23-05-23 13:50:01, David Laight wrote:
> > > From: Jan Kara
> > > > Sent: 23 May 2023 14:14
> > > >
> > > > Commit 0813299c586b ("ext4: Fix possible corruption when moving a
> > > > directory") forgot that handling of RENAME_EXCHANGE renames needs the
> > > > protection of inode lock when changing directory parents for moved
> > > > directories. Add proper locking for that case as well.
> > > >
> > > > CC: stable@vger.kernel.org
> > > > Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory")
> > > > Reported-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > ---
> > > >  fs/ext4/namei.c | 23 +++++++++++++++++++++--
> > > >  1 file changed, 21 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > > > index 45b579805c95..b91abea1c781 100644
> > > > --- a/fs/ext4/namei.c
> > > > +++ b/fs/ext4/namei.c
> > > > @@ -4083,10 +4083,25 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
> > > >     if (retval)
> > > >             return retval;
> > > >
> > > > +   /*
> > > > +    * We need to protect against old.inode and new.inode directory getting
> > > > +    * converted from inline directory format into a normal one. The lock
> > > > +    * ordering does not matter here as old and new are guaranteed to be
> > > > +    * incomparable in the directory hierarchy.
> > > > +    */
> > > > +   if (S_ISDIR(old.inode->i_mode))
> > > > +           inode_lock(old.inode);
> > > > +   if (S_ISDIR(new.inode->i_mode))
> > > > +           inode_lock_nested(new.inode, I_MUTEX_NONDIR2);
> > > > +
> > >
> > > What happens if there is another concurrent rename from new.inode
> > > to old.inode?
> > > That will try to acquire the locks in the other order.
> >
> > That is not really possible because these two renames cannot happen in
> > parallel due to VFS locking - either old & new share parent which is locked
> > by VFS (so there cannot be another rename in that directory) or they have
> > different parents which are also locked by VFS (so again it is not possible
> > to race with another rename in these two dirs).
> 
> Unless D1/A ; D1/B are hardlinks of D2/B ; D2/A respectively
> and exchange (D1/A, D1/B) is racing with exchange (D2/B, D2/A)

Well, but these are *directories*. So no hardlinks possible ;) I agree with
regular files we'd have to be more careful but then VFS would take care of
the locking anyway. I'm still convinced VFS should be taking care of
locking of directories as well but Al disagreed [1] and wants only filesystems
that need this to handle the directory locking.

> There is a simple solution of course, same as xfs_lock_two_inodes()
> 
> Another possible deadlock (I think) is if D/A ; D/B are subdirs that
> are exchanged and after taking inode_lock of D and A, rename comes
> in D/B/foo => D/A/foo and lock_rename() tries to
> lock_two_directories(B, A).
> 
> So it seems that both lock_two_directories() and the to-be-added helper
> lock_two_inodes() need to order the two inodes by address?

Right, so this case indeed looks possible and I didn't think about it.
Thanks for spotting this! Let me try to persuade Al again to do the
necessary locking in VFS as it is getting really hairy and needs VFS
changes anyway.

								Honza

[1] https://lore.kernel.org/all/Y8bTk1CsH9AaAnLt@ZenIV
David Laight May 24, 2023, 2:25 p.m. UTC | #5
From: Jan Kara
> Sent: 24 May 2023 15:19
....
> Right, so this case indeed looks possible and I didn't think about it.
> Thanks for spotting this! Let me try to persuade Al again to do the
> necessary locking in VFS as it is getting really hairy and needs VFS
> changes anyway.

I think it was NetBSD that started using a global lock for
non-trivial renames because otherwise it is all 'just too hard'.

	David


Patch

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 45b579805c95..b91abea1c781 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -4083,10 +4083,25 @@  static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (retval)
 		return retval;
 
+	/*
+	 * We need to protect against old.inode and new.inode directory getting
+	 * converted from inline directory format into a normal one. The lock
+	 * ordering does not matter here as old and new are guaranteed to be
+	 * incomparable in the directory hierarchy.
+	 */
+	if (S_ISDIR(old.inode->i_mode))
+		inode_lock(old.inode);
+	if (S_ISDIR(new.inode->i_mode))
+		inode_lock_nested(new.inode, I_MUTEX_NONDIR2);
+
 	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name,
 				 &old.de, &old.inlined);
-	if (IS_ERR(old.bh))
-		return PTR_ERR(old.bh);
+	if (IS_ERR(old.bh)) {
+		retval = PTR_ERR(old.bh);
+		old.bh = NULL;
+		goto end_rename;
+	}
+
 	/*
 	 *  Check for inode number is _not_ due to possible IO errors.
 	 *  We might rmdir the source, keep it as pwd of some process
@@ -4186,6 +4201,10 @@  static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	retval = 0;
 
 end_rename:
+	if (S_ISDIR(old.inode->i_mode))
+		inode_unlock(old.inode);
+	if (S_ISDIR(new.inode->i_mode))
+		inode_unlock(new.inode);
 	brelse(old.dir_bh);
 	brelse(new.dir_bh);
 	brelse(old.bh);