Patchwork ext4: fix races in ext4_sync_parent()

login
register
mail settings
Submitter Theodore Ts'o
Date July 27, 2011, 1:13 a.m.
Message ID <1311729192-30598-1-git-send-email-tytso@mit.edu>
Download mbox | patch
Permalink /patch/106952/
State Superseded
Headers show

Comments

Theodore Ts'o - July 27, 2011, 1:13 a.m.
Fix problems if fsync() races against a rename of a parent directory
as pointed out by Al Viro in his own inimitable way:

>While we are at it, could somebody please explain what the hell is ext4
>doing in
>static int ext4_sync_parent(struct inode *inode)
>{
>        struct writeback_control wbc;
>        struct dentry *dentry = NULL;
>        int ret = 0;
>
>        while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
>                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
>                dentry = list_entry(inode->i_dentry.next,
>                                    struct dentry, d_alias);
>                if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
>                        break;
>                inode = dentry->d_parent->d_inode;
>                ret = sync_mapping_buffers(inode->i_mapping);
>                ...
>Note that dentry obviously can't be NULL there.  dentry->d_parent is never
>NULL.  And dentry->d_parent would better not be negative, for crying out
>loud!  What's worse, there's no guarantees that dentry->d_parent will
>remain our parent over that sync_mapping_buffers() *and* that inode won't
>just be freed under us (after rename() and memory pressure leading to
>eviction of what used to be our dentry->d_parent)......

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/fsync.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)
Al Viro - July 27, 2011, 1:15 a.m.
On Tue, Jul 26, 2011 at 09:13:12PM -0400, Theodore Ts'o wrote:
> +	inode = igrab(inode);
> +	while (ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
>  		ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
> -		dentry = list_entry(inode->i_dentry.next,
> -				    struct dentry, d_alias);
> -		if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
> +		dentry = list_first_entry(&inode->i_dentry,
> +					  struct dentry, d_alias);

... and what if you don't have a dentry for that inode anymore?

> +		next = igrab(dentry->d_parent->d_inode);

what protects you against rename() (OK, d_mode()) here?
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - July 28, 2011, 12:34 a.m.
On Wed, Jul 27, 2011 at 02:15:54AM +0100, Al Viro wrote:
> On Tue, Jul 26, 2011 at 09:13:12PM -0400, Theodore Ts'o wrote:
> > +	inode = igrab(inode);
> > +	while (ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
> >  		ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
> > -		dentry = list_entry(inode->i_dentry.next,
> > -				    struct dentry, d_alias);
> > -		if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
> > +		dentry = list_first_entry(&inode->i_dentry,
> > +					  struct dentry, d_alias);
> 
> ... and what if you don't have a dentry for that inode anymore?

I thought you were complaining that dentry could never be NULL
earlier?  We earlier checked for NULL, but I figured you knew
something I didn't....

> > +		next = igrab(dentry->d_parent->d_inode);
> 
> what protects you against rename() (OK, d_mode()) here?

Does it matter?  We'll get either the old or the new parent directory,
but either way it will be a valid inode, right?  Or am I missing
something?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro - July 28, 2011, 1:11 a.m.
On Wed, Jul 27, 2011 at 08:34:22PM -0400, Ted Ts'o wrote:
> > > +		dentry = list_first_entry(&inode->i_dentry,
> > > +					  struct dentry, d_alias);
> > 
> > ... and what if you don't have a dentry for that inode anymore?
> 
> I thought you were complaining that dentry could never be NULL
> earlier?  We earlier checked for NULL, but I figured you knew
> something I didn't....

The pointer won't be NULL, of course, but what guarantees that inode->i_dentry
won't be *empty*?  IOW, you could get a perfectly non-NULL pointer, equal to
	(char *)inode + offsetof(struct inode, i_dentry)
	 - offsetof(struct dentry, d_alias)
and treating it as struct dentry * won't do you any good...

> > > +		next = igrab(dentry->d_parent->d_inode);
> > 
> > what protects you against rename() (OK, d_mode()) here?
> 
> Does it matter?  We'll get either the old or the new parent directory,
> but either way it will be a valid inode, right?  Or am I missing
> something?

Once d_move() has happened, there's nothing to protect the old parent
anymore...  Granted, it's a hell of a narrow race window, but you
need at least ->d_lock on your dentry...
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - July 30, 2011, 4:32 p.m.
On Thu, Jul 28, 2011 at 02:11:29AM +0100, Al Viro wrote:
> Once d_move() has happened, there's nothing to protect the old parent
> anymore...  Granted, it's a hell of a narrow race window, but you
> need at least ->d_lock on your dentry...

Right, got it.  So the following should be safe according to the
dcache locking protocols, right?

	while (ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
		ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
		dentry = NULL;
		spin_lock(&inode->i_lock);
		if (!list_empty(&inode->i_dentry)) {
			dentry = list_first_entry(&inode->i_dentry,
						  struct dentry, d_alias);
			dget(dentry);
		}
		spin_unlock(&inode->i_lock);
		if (!dentry)
			break;
		next = igrab(dentry->d_parent->d_inode);
		dput(dentry);
		if (!next)
			break;
		iput(inode);
		inode = next;
		...

Let me know if I've missed anything....

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro - July 30, 2011, 4:42 p.m.
On Sat, Jul 30, 2011 at 12:32:04PM -0400, Ted Ts'o wrote:
> 		spin_lock(&inode->i_lock);
> 		if (!list_empty(&inode->i_dentry)) {
> 			dentry = list_first_entry(&inode->i_dentry,
> 						  struct dentry, d_alias);
> 			dget(dentry);
> 		}
> 		spin_unlock(&inode->i_lock);

		dentry = d_find_alias(inode);
actually...

> 		if (!dentry)
> 			break;

> 		next = igrab(dentry->d_parent->d_inode);

and that one needs dentry->d_lock around it, to stabilize ->d_parent.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index ce66d2f..5dd17d5 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -129,15 +129,21 @@  static int ext4_sync_parent(struct inode *inode)
 {
 	struct writeback_control wbc;
 	struct dentry *dentry = NULL;
+	struct inode *next;
 	int ret = 0;
 
-	while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
+	if (!ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY))
+		return 0;
+	inode = igrab(inode);
+	while (ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
 		ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
-		dentry = list_entry(inode->i_dentry.next,
-				    struct dentry, d_alias);
-		if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
+		dentry = list_first_entry(&inode->i_dentry,
+					  struct dentry, d_alias);
+		next = igrab(dentry->d_parent->d_inode);
+		if (!next)
 			break;
-		inode = dentry->d_parent->d_inode;
+		iput(inode);
+		inode = next;
 		ret = sync_mapping_buffers(inode->i_mapping);
 		if (ret)
 			break;
@@ -148,6 +154,7 @@  static int ext4_sync_parent(struct inode *inode)
 		if (ret)
 			break;
 	}
+	iput(inode);
 	return ret;
 }