ext3: Fix error handling on inode bitmap corruption

Message ID

1323376115-23881-2-git-send-email-jack@suse.cz

State

Not Applicable, archived

Headers

From: Jan Kara <jack@suse.cz>
To: linux-ext4@vger.kernel.org
Cc: Ted Tso <tytso@mit.edu>, Jan Kara <jack@suse.cz>
Subject: [PATCH] ext3: Fix error handling on inode bitmap corruption
Date: Thu,  8 Dec 2011 21:28:35 +0100
Message-Id: <1323376115-23881-2-git-send-email-jack@suse.cz>
In-Reply-To: <1323376115-23881-1-git-send-email-jack@suse.cz>
References: <1323376115-23881-1-git-send-email-jack@suse.cz>
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk

Commit Message

Jan Kara Dec. 8, 2011, 8:28 p.m. UTC

When insert_inode_locked() fails in ext3_new_inode(), it most likely
means the inode bitmap got corrupted and we allocated an inode that is
already in use. Calling unlock_new_inode() during error recovery is
also wrong, since the inode does not have I_NEW set. Fix the problem by
jumping to fail: (instead of fail_drop:), which declares a filesystem
error and does not call unlock_new_inode().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext3/ialloc.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

Comments

Eric Sandeen Dec. 8, 2011, 8:46 p.m. UTC | #1

On 12/8/11 2:28 PM, Jan Kara wrote:
> When insert_inode_locked() fails in ext3_new_inode() it most likely
> means inode bitmap got corrupted and we allocated again inode which
> is already in use. Also doing unlock_new_inode() during error recovery
> is wrong since inode does not have I_NEW set. Fix the problem by jumping
> to fail: (instead of fail_drop:) which declares filesystem error and
> does not call unlock_new_inode().
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

I think ext2 could use the same treatment.

BTW, though, have you recently started seeing the issue?  We have
people hitting this when resuming after suspend; it seems likely
that the bitmap did get corrupted though, based on some other
things seen in similar bugs.

-Eric

> ---
>  fs/ext3/ialloc.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
> index 5c866e0..adae962 100644
> --- a/fs/ext3/ialloc.c
> +++ b/fs/ext3/ialloc.c
> @@ -525,8 +525,12 @@ got:
>  	if (IS_DIRSYNC(inode))
>  		handle->h_sync = 1;
>  	if (insert_inode_locked(inode) < 0) {
> -		err = -EINVAL;
> -		goto fail_drop;
> +		/*
> +		 * Likely a bitmap corruption causing inode to be allocated
> +		 * twice.
> +		 */
> +		err = -EIO;
> +		goto fail;
>  	}
>  	spin_lock(&sbi->s_next_gen_lock);
>  	inode->i_generation = sbi->s_next_generation++;

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jan Kara Dec. 8, 2011, 10:28 p.m. UTC | #2

On Thu 08-12-11 14:46:09, Eric Sandeen wrote:
> On 12/8/11 2:28 PM, Jan Kara wrote:
> > When insert_inode_locked() fails in ext3_new_inode() it most likely
> > means inode bitmap got corrupted and we allocated again inode which
> > is already in use. Also doing unlock_new_inode() during error recovery
> > is wrong since inode does not have I_NEW set. Fix the problem by jumping
> > to fail: (instead of fail_drop:) which declares filesystem error and
> > does not call unlock_new_inode().
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> 
> I think ext2 could use the same treatment.
> 
> BTW, though, have you recently started seeing the issue?  We have
> people hitting this when resuming after suspend; it seems likely
> that the bitmap did get corrupted though, based on some other
> things seen in similar bugs.
  Interesting. I've got a report from IBM testing ext3 on SLE11 SP2 kernel
(3.0 based). Their filesystem got damaged (might be HW issue, not sure yet)
and they also observed warnings from unlock_new_inode().

								Honza
> > ---
> >  fs/ext3/ialloc.c |    8 ++++++--
> >  1 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
> > index 5c866e0..adae962 100644
> > --- a/fs/ext3/ialloc.c
> > +++ b/fs/ext3/ialloc.c
> > @@ -525,8 +525,12 @@ got:
> >  	if (IS_DIRSYNC(inode))
> >  		handle->h_sync = 1;
> >  	if (insert_inode_locked(inode) < 0) {
> > -		err = -EINVAL;
> > -		goto fail_drop;
> > +		/*
> > +		 * Likely a bitmap corruption causing inode to be allocated
> > +		 * twice.
> > +		 */
> > +		err = -EIO;
> > +		goto fail;
> >  	}
> >  	spin_lock(&sbi->s_next_gen_lock);
> >  	inode->i_generation = sbi->s_next_generation++;
>

Eric Sandeen Dec. 8, 2011, 10:40 p.m. UTC | #3

On 12/8/11 4:28 PM, Jan Kara wrote:
> On Thu 08-12-11 14:46:09, Eric Sandeen wrote:
>> On 12/8/11 2:28 PM, Jan Kara wrote:
>>> When insert_inode_locked() fails in ext3_new_inode() it most likely
>>> means inode bitmap got corrupted and we allocated again inode which
>>> is already in use. Also doing unlock_new_inode() during error recovery
>>> is wrong since inode does not have I_NEW set. Fix the problem by jumping
>>> to fail: (instead of fail_drop:) which declares filesystem error and
>>> does not call unlock_new_inode().
>>>
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>
>> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
>>
>> I think ext2 could use the same treatment.
>>
>> BTW, though, have you recently started seeing the issue?  We have
>> people hitting this when resuming after suspend; it seems likely
>> that the bitmap did get corrupted though, based on some other
>> things seen in similar bugs.
>   Interesting. I've got a report from IBM testing ext3 on SLE11 SP2 kernel
> (3.0 based). Their filesystem got damaged (might be HW issue, not sure yet)
> and they also observed warnings from unlock_new_inode().

It may be that it has been failing in other ways, but now we get the WARN_ON
and the long backtrace so it's reported more frequently...

I think there might be a hibernate issue that is causing the underlying
corruption, trying to look into that now.

-Eric

> 								Honza
>>> ---
>>>  fs/ext3/ialloc.c |    8 ++++++--
>>>  1 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
>>> index 5c866e0..adae962 100644
>>> --- a/fs/ext3/ialloc.c
>>> +++ b/fs/ext3/ialloc.c
>>> @@ -525,8 +525,12 @@ got:
>>>  	if (IS_DIRSYNC(inode))
>>>  		handle->h_sync = 1;
>>>  	if (insert_inode_locked(inode) < 0) {
>>> -		err = -EINVAL;
>>> -		goto fail_drop;
>>> +		/*
>>> +		 * Likely a bitmap corruption causing inode to be allocated
>>> +		 * twice.
>>> +		 */
>>> +		err = -EIO;
>>> +		goto fail;
>>>  	}
>>>  	spin_lock(&sbi->s_next_gen_lock);
>>>  	inode->i_generation = sbi->s_next_generation++;
>>

Patch

diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
index 5c866e0..adae962 100644
--- a/fs/ext3/ialloc.c
+++ b/fs/ext3/ialloc.c
@@ -525,8 +525,12 @@  got:
 	if (IS_DIRSYNC(inode))
 		handle->h_sync = 1;
 	if (insert_inode_locked(inode) < 0) {
-		err = -EINVAL;
-		goto fail_drop;
+		/*
+		 * Likely a bitmap corruption causing inode to be allocated
+		 * twice.
+		 */
+		err = -EIO;
+		goto fail;
 	}
 	spin_lock(&sbi->s_next_gen_lock);
 	inode->i_generation = sbi->s_next_generation++;
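
For readers without the kernel tree at hand, the control-flow change can be
sketched in user space. This is only a toy model, not the ext3 code: the
names fake_inode, FAKE_I_NEW, insert_locked(), unlock_new(), new_inode_old()
and new_inode_fixed() are all hypothetical, and the cleanup done at each
label is heavily simplified. The sketch assumes only what the thread states:
a failed insert leaves the "new" flag unset, unlocking such an inode
triggers a warning, and the fail: path declares a filesystem error instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define TABLE_SIZE 16   /* hypothetical size of the toy inode table */
#define FAKE_I_NEW 0x1  /* stands in for the kernel's I_NEW flag */

struct fake_inode {
	unsigned int ino;
	unsigned int state;
};

static bool in_use[TABLE_SIZE]; /* inode numbers that are already live */
static int warnings;            /* times unlock_new() complained */
static int fs_errors;           /* times a filesystem error was declared */

/* Toy insert_inode_locked(): the "new" flag is set only on success. */
static int insert_locked(struct fake_inode *inode)
{
	assert(inode->ino < TABLE_SIZE);
	if (in_use[inode->ino])
		return -EBUSY;
	in_use[inode->ino] = true;
	inode->state |= FAKE_I_NEW;
	return 0;
}

/* Toy unlock_new_inode(): warns when the "new" flag is not set. */
static void unlock_new(struct fake_inode *inode)
{
	if (!(inode->state & FAKE_I_NEW)) {
		warnings++;
		fprintf(stderr, "WARNING: inode %u unlocked without new flag\n",
			inode->ino);
	}
	inode->state &= ~FAKE_I_NEW;
}

/* Flow before the patch: a failed insert jumps to fail_drop. */
static int new_inode_old(unsigned int ino)
{
	struct fake_inode inode = { .ino = ino, .state = 0 };
	int err;

	if (insert_locked(&inode) < 0) {
		err = -EINVAL;
		goto fail_drop;
	}
	unlock_new(&inode);
	return 0;

fail_drop:
	unlock_new(&inode);     /* wrong here: the flag was never set */
	return err;
}

/* Flow after the patch: a failed insert jumps to fail. */
static int new_inode_fixed(unsigned int ino)
{
	struct fake_inode inode = { .ino = ino, .state = 0 };
	int err;

	if (insert_locked(&inode) < 0) {
		/* Likely bitmap corruption: the number was handed out twice. */
		err = -EIO;
		goto fail;
	}
	unlock_new(&inode);
	return 0;

fail:
	fs_errors++;            /* stands in for declaring a filesystem error */
	return err;
}
```

Allocating the same number twice through new_inode_old() makes the second
call reach unlock_new() with the flag clear and bumps the warnings counter.
The same sequence through new_inode_fixed() skips the unlock and bumps
fs_errors instead.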
