diff mbox

fs/ext{3,4}: fix potential race when setversion ioctl updates inode

Message ID 20120103013152.GA26455@dztty
State New, archived
Headers show

Commit Message

Djalal Harouni Jan. 3, 2012, 1:31 a.m. UTC
The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
this can lead to a race with the other operations that update the same
inode.

Patch tested.

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
---
 fs/ext3/ioctl.c |    6 +++++-
 fs/ext4/ioctl.c |    6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

Comments

Jan Kara Jan. 3, 2012, 12:46 p.m. UTC | #1
Hello,

On Tue 03-01-12 02:31:52, Djalal Harouni wrote:
> 
> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
> this can lead to a race with the other operations that update the same
> inode.
> 
> Patch tested.
  Thanks for the patch but I don't quite understand the problem.
i_generation is set when:
  a) inode is loaded from disk
  b) inode is allocated
  c) in SETVERSION ioctl

  The only thing that can race here seems to be c) against c) and that is
racy with i_mutex as well. So what problems do you exactly observe without
the patch?

								Honza
> Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
> ---
>  fs/ext3/ioctl.c |    6 +++++-
>  fs/ext4/ioctl.c |    6 +++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
> index ba1b54e..e7b2ed9 100644
> --- a/fs/ext3/ioctl.c
> +++ b/fs/ext3/ioctl.c
> @@ -134,10 +134,11 @@ flags_out:
>  			goto setversion_out;
>  		}
>  
> +		mutex_lock(&inode->i_mutex);
>  		handle = ext3_journal_start(inode, 1);
>  		if (IS_ERR(handle)) {
>  			err = PTR_ERR(handle);
> -			goto setversion_out;
> +			goto unlock_out;
>  		}
>  		err = ext3_reserve_inode_write(handle, inode, &iloc);
>  		if (err == 0) {
> @@ -146,6 +147,9 @@ flags_out:
>  			err = ext3_mark_iloc_dirty(handle, inode, &iloc);
>  		}
>  		ext3_journal_stop(handle);
> +
> +unlock_out:
> +		mutex_unlock(&inode->i_mutex);
>  setversion_out:
>  		mnt_drop_write(filp->f_path.mnt);
>  		return err;
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index a567968..46a8de6 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -158,10 +158,11 @@ flags_out:
>  			goto setversion_out;
>  		}
>  
> +		mutex_lock(&inode->i_mutex);
>  		handle = ext4_journal_start(inode, 1);
>  		if (IS_ERR(handle)) {
>  			err = PTR_ERR(handle);
> -			goto setversion_out;
> +			goto unlock_out;
>  		}
>  		err = ext4_reserve_inode_write(handle, inode, &iloc);
>  		if (err == 0) {
> @@ -170,6 +171,9 @@ flags_out:
>  			err = ext4_mark_iloc_dirty(handle, inode, &iloc);
>  		}
>  		ext4_journal_stop(handle);
> +
> +unlock_out:
> +		mutex_unlock(&inode->i_mutex);
>  setversion_out:
>  		mnt_drop_write(filp->f_path.mnt);
>  		return err;
> -- 
> 1.7.1
Djalal Harouni Jan. 3, 2012, 11:14 p.m. UTC | #2
On Tue, Jan 03, 2012 at 01:46:24PM +0100, Jan Kara wrote:
>   Hello,
> 
> On Tue 03-01-12 02:31:52, Djalal Harouni wrote:
> > 
> > The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
> > this can lead to a race with the other operations that update the same
> > inode.
> > 
> > Patch tested.
>   Thanks for the patch but I don't quite understand the problem.
> i_generation is set when:
>   a) inode is loaded from disk
>   b) inode is allocated
>   c) in SETVERSION ioctl
> 
>   The only thing that can race here seems to be c) against c) and that is
> racy with i_mutex as well. So what problems do you exactly observe without
> the patch?
Right, but what about the related i_ctime change ? (i_ctime is updated in
other places...)

The i_ctime update must reflect the _appropriate_ inode modification
operation. This is why IMHO we should protect them to avoid a lost update.

BTW the i_generation which is used by NFS and fuse filesystems is updated
even if the inode is marked immutable, is this the intended behaviour?


> 								Honza
Thanks for your response.
Jan Kara Jan. 4, 2012, 5:34 p.m. UTC | #3
On Wed 04-01-12 00:14:32, Djalal Harouni wrote:
> On Tue, Jan 03, 2012 at 01:46:24PM +0100, Jan Kara wrote:
> > On Tue 03-01-12 02:31:52, Djalal Harouni wrote:
> > > 
> > > The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
> > > this can lead to a race with the other operations that update the same
> > > inode.
> > > 
> > > Patch tested.
> >   Thanks for the patch but I don't quite understand the problem.
> > i_generation is set when:
> >   a) inode is loaded from disk
> >   b) inode is allocated
> >   c) in SETVERSION ioctl
> > 
> >   The only thing that can race here seems to be c) against c) and that is
> > racy with i_mutex as well. So what problems do you exactly observe without
> > the patch?
> Right, but what about the related i_ctime change ? (i_ctime is updated in
> other places...)
> 
> The i_ctime update must reflect the _appropriate_ inode modification
> operation. This is why IMHO we should protect them to avoid a lost update.
  Yes, but realistically even if we race with someone else updating
i_ctime, the times will be rather similar so there's hardly a real
difference.

Anyway, using i_mutex is consistent with how we handle permission changes
or timestamp changes and the ioctl isn't performance critical so I'll take
your patch. I was just wondering whether you observed a real problem or
whether it's more or less a theoretical concern.

> BTW the i_generation which is used by NFS and fuse filesystems is updated
> even if the inode is marked immutable, is this the intended behaviour?
  Well, I'd say it's unexpected that generation can be changed for
immutable inode so I'd be inclined to take a patch that would make ext3
refuse to do that. But it's a change in how the ioctl behaves so I'd like
to hear opinion of Ted or Andrew on this.

								Honza
Jan Kara Jan. 4, 2012, 5:46 p.m. UTC | #4
On Tue 03-01-12 02:31:52, Djalal Harouni wrote:
> 
> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
> this can lead to a race with the other operations that update the same
> inode.
> 
> Patch tested.
  OK, so I've taken the patch into my tree, just updated the changelog
which result of our discussion in this thread. I also took the ext4 part
since there is no risk of conflict and the patch looks obvious.

								Honza

> Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
> ---
>  fs/ext3/ioctl.c |    6 +++++-
>  fs/ext4/ioctl.c |    6 +++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
> index ba1b54e..e7b2ed9 100644
> --- a/fs/ext3/ioctl.c
> +++ b/fs/ext3/ioctl.c
> @@ -134,10 +134,11 @@ flags_out:
>  			goto setversion_out;
>  		}
>  
> +		mutex_lock(&inode->i_mutex);
>  		handle = ext3_journal_start(inode, 1);
>  		if (IS_ERR(handle)) {
>  			err = PTR_ERR(handle);
> -			goto setversion_out;
> +			goto unlock_out;
>  		}
>  		err = ext3_reserve_inode_write(handle, inode, &iloc);
>  		if (err == 0) {
> @@ -146,6 +147,9 @@ flags_out:
>  			err = ext3_mark_iloc_dirty(handle, inode, &iloc);
>  		}
>  		ext3_journal_stop(handle);
> +
> +unlock_out:
> +		mutex_unlock(&inode->i_mutex);
>  setversion_out:
>  		mnt_drop_write(filp->f_path.mnt);
>  		return err;
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index a567968..46a8de6 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -158,10 +158,11 @@ flags_out:
>  			goto setversion_out;
>  		}
>  
> +		mutex_lock(&inode->i_mutex);
>  		handle = ext4_journal_start(inode, 1);
>  		if (IS_ERR(handle)) {
>  			err = PTR_ERR(handle);
> -			goto setversion_out;
> +			goto unlock_out;
>  		}
>  		err = ext4_reserve_inode_write(handle, inode, &iloc);
>  		if (err == 0) {
> @@ -170,6 +171,9 @@ flags_out:
>  			err = ext4_mark_iloc_dirty(handle, inode, &iloc);
>  		}
>  		ext4_journal_stop(handle);
> +
> +unlock_out:
> +		mutex_unlock(&inode->i_mutex);
>  setversion_out:
>  		mnt_drop_write(filp->f_path.mnt);
>  		return err;
> -- 
> 1.7.1
Andreas Dilger Jan. 4, 2012, 11:15 p.m. UTC | #5
On 2012-01-04, at 10:46 AM, Jan Kara wrote:
> On Tue 03-01-12 02:31:52, Djalal Harouni wrote:
>> 
>> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
>> this can lead to a race with the other operations that update the same
>> inode.
>> 
>> Patch tested.
> 
>  OK, so I've taken the patch into my tree, just updated the changelog
> which result of our discussion in this thread. I also took the ext4 part
> since there is no risk of conflict and the patch looks obvious.

Actually, I'd like to hear more about whether this is a real problem, or
if it is just a theoretical problem found during code inspection or from
some static code analysis tool?

With the metadata checksum feature we were discussing using the inode
generation as part of the seed for the directory leaf block checksum, so
that it wasn't possible to incorrectly access stale directory blocks from
a previous incarnation of the same inode number.

We were discussing just disabling this ioctl on filesystems with metadata
checksums, and printing a deprecation warning for filesystems without that
feature enabled.  I'm not aware of any real-world use for this ioctl, since
NFS cannot use it to reconstruct handles because there's no API to allocate
an inode with a specific number, so setting the generation is pointless.

This ioctl (despite its confusing name) is completely different from the
NFSv4 inode version, which is stored in i_version (and i_version_hi).

Cheers, Andreas

>> Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
>> ---
>> fs/ext3/ioctl.c |    6 +++++-
>> fs/ext4/ioctl.c |    6 +++++-
>> 2 files changed, 10 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
>> index ba1b54e..e7b2ed9 100644
>> --- a/fs/ext3/ioctl.c
>> +++ b/fs/ext3/ioctl.c
>> @@ -134,10 +134,11 @@ flags_out:
>> 			goto setversion_out;
>> 		}
>> 
>> +		mutex_lock(&inode->i_mutex);
>> 		handle = ext3_journal_start(inode, 1);
>> 		if (IS_ERR(handle)) {
>> 			err = PTR_ERR(handle);
>> -			goto setversion_out;
>> +			goto unlock_out;
>> 		}
>> 		err = ext3_reserve_inode_write(handle, inode, &iloc);
>> 		if (err == 0) {
>> @@ -146,6 +147,9 @@ flags_out:
>> 			err = ext3_mark_iloc_dirty(handle, inode, &iloc);
>> 		}
>> 		ext3_journal_stop(handle);
>> +
>> +unlock_out:
>> +		mutex_unlock(&inode->i_mutex);
>> setversion_out:
>> 		mnt_drop_write(filp->f_path.mnt);
>> 		return err;
>> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
>> index a567968..46a8de6 100644
>> --- a/fs/ext4/ioctl.c
>> +++ b/fs/ext4/ioctl.c
>> @@ -158,10 +158,11 @@ flags_out:
>> 			goto setversion_out;
>> 		}
>> 
>> +		mutex_lock(&inode->i_mutex);
>> 		handle = ext4_journal_start(inode, 1);
>> 		if (IS_ERR(handle)) {
>> 			err = PTR_ERR(handle);
>> -			goto setversion_out;
>> +			goto unlock_out;
>> 		}
>> 		err = ext4_reserve_inode_write(handle, inode, &iloc);
>> 		if (err == 0) {
>> @@ -170,6 +171,9 @@ flags_out:
>> 			err = ext4_mark_iloc_dirty(handle, inode, &iloc);
>> 		}
>> 		ext4_journal_stop(handle);
>> +
>> +unlock_out:
>> +		mutex_unlock(&inode->i_mutex);
>> setversion_out:
>> 		mnt_drop_write(filp->f_path.mnt);
>> 		return err;
>> -- 
>> 1.7.1
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR


Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Jan. 4, 2012, 11:32 p.m. UTC | #6
On Wed 04-01-12 16:15:04, Andreas Dilger wrote:
> On 2012-01-04, at 10:46 AM, Jan Kara wrote:
> > On Tue 03-01-12 02:31:52, Djalal Harouni wrote:
> >> 
> >> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
> >> this can lead to a race with the other operations that update the same
> >> inode.
> >> 
> >> Patch tested.
> > 
> >  OK, so I've taken the patch into my tree, just updated the changelog
> > which result of our discussion in this thread. I also took the ext4 part
> > since there is no risk of conflict and the patch looks obvious.
> 
> Actually, I'd like to hear more about whether this is a real problem, or
> if it is just a theoretical problem found during code inspection or from
> some static code analysis tool?
  As far as I understood that was just a theoretical issue and I applied
the patch just on the grounds that it is more consistent to use i_mutex for
i_generation changes.

> With the metadata checksum feature we were discussing using the inode
> generation as part of the seed for the directory leaf block checksum, so
> that it wasn't possible to incorrectly access stale directory blocks from
> a previous incarnation of the same inode number.
> 
> We were discussing just disabling this ioctl on filesystems with metadata
> checksums, and printing a deprecation warning for filesystems without that
> feature enabled.  I'm not aware of any real-world use for this ioctl, since
> NFS cannot use it to reconstruct handles because there's no API to allocate
> an inode with a specific number, so setting the generation is pointless.
  OK, I didn't know this. I'm fine with deprecating the ioctl if it's
useless but since that's going to take a while I think the cleanup still
makes some sense.

								Honza
Andreas Dilger Jan. 4, 2012, 11:56 p.m. UTC | #7
On 2012-01-04, at 4:32 PM, Jan Kara wrote:
> On Wed 04-01-12 16:15:04, Andreas Dilger wrote:
>> On 2012-01-04, at 10:46 AM, Jan Kara wrote:
>>> On Tue 03-01-12 02:31:52, Djalal Harouni wrote:
>>>> 
>>>> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
>>>> this can lead to a race with the other operations that update the same
>>>> inode.
>>>> 
>>>> Patch tested.
>>> 
>>> OK, so I've taken the patch into my tree, just updated the changelog
>>> which result of our discussion in this thread. I also took the ext4 part
>>> since there is no risk of conflict and the patch looks obvious.
>> 
>> Actually, I'd like to hear more about whether this is a real problem, or
>> if it is just a theoretical problem found during code inspection or from
>> some static code analysis tool?
> 
> As far as I understood that was just a theoretical issue and I applied
> the patch just on the grounds that it is more consistent to use i_mutex for
> i_generation changes.
> 
>> With the metadata checksum feature we were discussing using the inode
>> generation as part of the seed for the directory leaf block checksum, so
>> that it wasn't possible to incorrectly access stale directory blocks from
>> a previous incarnation of the same inode number.
>> 
>> We were discussing just disabling this ioctl on filesystems with metadata
>> checksums, and printing a deprecation warning for filesystems without that
>> feature enabled.  I'm not aware of any real-world use for this ioctl, since
>> NFS cannot use it to reconstruct handles because there's no API to allocate
>> an inode with a specific number, so setting the generation is pointless.
> 
>  OK, I didn't know this. I'm fine with deprecating the ioctl if it's
> useless but since that's going to take a while I think the cleanup still
> makes some sense.

I'm not against landing the patch, and I agree that there is no question
about the performance impact of making this ioctl safe.  My real question
was whether there was a real-world use for this ioctl which might prevent
it from being deprecated.


Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Djalal Harouni Jan. 5, 2012, 12:40 a.m. UTC | #8
On Thu, Jan 05, 2012 at 12:32:54AM +0100, Jan Kara wrote:
> On Wed 04-01-12 16:15:04, Andreas Dilger wrote:
> > On 2012-01-04, at 10:46 AM, Jan Kara wrote:
> > > On Tue 03-01-12 02:31:52, Djalal Harouni wrote:
> > >> 
> > >> The EXT{3,4}_IOC_SETVERSION ioctl() updates the inode without i_mutex,
> > >> this can lead to a race with the other operations that update the same
> > >> inode.
> > >> 
> > >> Patch tested.
> > > 
> > >  OK, so I've taken the patch into my tree, just updated the changelog
> > > which result of our discussion in this thread. I also took the ext4 part
> > > since there is no risk of conflict and the patch looks obvious.
> > 
> > Actually, I'd like to hear more about whether this is a real problem, or
> > if it is just a theoretical problem found during code inspection or from
> > some static code analysis tool?
>   As far as I understood that was just a theoretical issue and I applied
> the patch just on the grounds that it is more consistent to use i_mutex for
> i_generation changes.
This was found using a static code analysis tool (currently a PoC) which
is a part of a research project at our university.

And later, code inspection confirms that i_ctime updates can be disturbed.

I should have specified this. Sorry.

> > With the metadata checksum feature we were discussing using the inode
> > generation as part of the seed for the directory leaf block checksum, so
> > that it wasn't possible to incorrectly access stale directory blocks from
> > a previous incarnation of the same inode number.
> > 
> > We were discussing just disabling this ioctl on filesystems with metadata
> > checksums, and printing a deprecation warning for filesystems without that
> > feature enabled.  I'm not aware of any real-world use for this ioctl, since
> > NFS cannot use it to reconstruct handles because there's no API to allocate
> > an inode with a specific number, so setting the generation is pointless.
>   OK, I didn't know this. I'm fine with deprecating the ioctl if it's
> useless but since that's going to take a while I think the cleanup still
> makes some sense.
Actually I've grepped this ioctl but did not found use cases, but as
ext{3,2} also support it, I did not say anything (this is old, there is
even the EXT4_IOC_SETVERSION_OLD ioctl ?). I don't know if this ioctl is
used or not.

Only the reiserfs and ext{2,3,4} filesystems support this ioctl. The reiserfs
do not use mutexes at all, even in the REISERFS_IOC_SETFLAGS ioctl which will
test and set _all_ the possible values of the i_flags field.
Perhaps I should also send a patch for this ?

And perhaps ext2 should also be updated.

> 								Honza
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
Thanks for the feedback.
Jan Kara Jan. 5, 2012, 11:42 a.m. UTC | #9
On Thu 05-01-12 01:40:09, Djalal Harouni wrote:
> On Thu, Jan 05, 2012 at 12:32:54AM +0100, Jan Kara wrote:
> > > With the metadata checksum feature we were discussing using the inode
> > > generation as part of the seed for the directory leaf block checksum, so
> > > that it wasn't possible to incorrectly access stale directory blocks from
> > > a previous incarnation of the same inode number.
> > > 
> > > We were discussing just disabling this ioctl on filesystems with metadata
> > > checksums, and printing a deprecation warning for filesystems without that
> > > feature enabled.  I'm not aware of any real-world use for this ioctl, since
> > > NFS cannot use it to reconstruct handles because there's no API to allocate
> > > an inode with a specific number, so setting the generation is pointless.
> >   OK, I didn't know this. I'm fine with deprecating the ioctl if it's
> > useless but since that's going to take a while I think the cleanup still
> > makes some sense.
> Actually I've grepped this ioctl but did not found use cases, but as
> ext{3,2} also support it, I did not say anything (this is old, there is
> even the EXT4_IOC_SETVERSION_OLD ioctl ?). I don't know if this ioctl is
> used or not.
> 
> Only the reiserfs and ext{2,3,4} filesystems support this ioctl. The reiserfs
> do not use mutexes at all, even in the REISERFS_IOC_SETFLAGS ioctl which will
> test and set _all_ the possible values of the i_flags field.
> Perhaps I should also send a patch for this ?
  Yes, possibly reiserfs should use i_mutex for that ioctl.

> And perhaps ext2 should also be updated.
  Sure. Send a patch my way when you have it.

								Honza
diff mbox

Patch

diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index ba1b54e..e7b2ed9 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -134,10 +134,11 @@  flags_out:
 			goto setversion_out;
 		}
 
+		mutex_lock(&inode->i_mutex);
 		handle = ext3_journal_start(inode, 1);
 		if (IS_ERR(handle)) {
 			err = PTR_ERR(handle);
-			goto setversion_out;
+			goto unlock_out;
 		}
 		err = ext3_reserve_inode_write(handle, inode, &iloc);
 		if (err == 0) {
@@ -146,6 +147,9 @@  flags_out:
 			err = ext3_mark_iloc_dirty(handle, inode, &iloc);
 		}
 		ext3_journal_stop(handle);
+
+unlock_out:
+		mutex_unlock(&inode->i_mutex);
 setversion_out:
 		mnt_drop_write(filp->f_path.mnt);
 		return err;
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index a567968..46a8de6 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -158,10 +158,11 @@  flags_out:
 			goto setversion_out;
 		}
 
+		mutex_lock(&inode->i_mutex);
 		handle = ext4_journal_start(inode, 1);
 		if (IS_ERR(handle)) {
 			err = PTR_ERR(handle);
-			goto setversion_out;
+			goto unlock_out;
 		}
 		err = ext4_reserve_inode_write(handle, inode, &iloc);
 		if (err == 0) {
@@ -170,6 +171,9 @@  flags_out:
 			err = ext4_mark_iloc_dirty(handle, inode, &iloc);
 		}
 		ext4_journal_stop(handle);
+
+unlock_out:
+		mutex_unlock(&inode->i_mutex);
 setversion_out:
 		mnt_drop_write(filp->f_path.mnt);
 		return err;