EXT4 ENOSPC Bug

Message ID	20090216190001.GB11788@mini-me.lan
State	Accepted, archived
From	Theodore Tso <tytso@mit.edu>
To	Andres Freund <andres@anarazel.de>, Alex Buell <alex.buell@munted.org.uk>
Cc	adilger@sun.com, LKML <linux-kernel@vger.kernel.org>, linux-ext4@vger.kernel.org, Jonathan Bastien-Filiatrault <joe@x2a.org>, "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Date	Mon, 16 Feb 2009 14:00:01 -0500
Subject	Re: EXT4 ENOSPC Bug
List-ID	<linux-ext4.vger.kernel.org>

Theodore Ts'o Feb. 16, 2009, 7 p.m. UTC

On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>
> So, yes, seems to be an inode allocation problem.
>

Andres, Alex, others,

I'm pretty sure the ENOSPC problem which you both found is an inode
allocation problem.  Some of you seem to have an easier time
reproducing it than others; could you try this patch, and periodically
scan your system logs for the message "ext4: find_group_flex failed,
fallback succeeded"?  If the problem goes away for you, and you find
the occasional aforementioned message in your system log, that will
confirm what I suspect, which is that the bug is in fs/ext4/ialloc.c's
find_group_flex() function.  (If I'm wrong, the fallback code will
activate only when the filesystem is genuinely out of inodes, which
should be very rare.)

More comments are in the patch header.  My current long-term plan for
dealing with this is to enhance find_group_orlov() and
find_group_other() to understand flex_bg's.

							- Ted

commit 1012e25b371b203164e4766a98f1e696df68b56d
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Mon Feb 16 13:51:16 2009 -0500

    ext4: Add fallback for find_group_flex

    This is a workaround for find_group_flex() which badly needs to be
    replaced.  One of its problems (besides ignoring the Orlov algorithm)
    is that it is a bit hyperactive about returning failure under
    suspicious circumstances.  This can lead to spurious ENOSPC failures.
    Work around this for now by retrying the search using
    find_group_other() if find_group_flex() returns -1.  If
    find_group_other() succeeds when find_group_flex() has failed, log a warning
    message.  I can't quite find the motivation to spend effort working on
    fixing find_group_flex() given that I want to replace it all anyway
    (and in fact work on the replacement code is underway), so we may
    leave the workaround for as long as find_group_flex() stays in the
    kernel...

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
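
[Editorial sketch, not the patch itself.] The retry logic described in the commit message can be modelled in userspace C roughly as follows. Everything here is an assumption made for illustration: the `_model` functions, the three-group layout and the `free_inodes_model` array are hypothetical stand-ins, and the real kernel functions take different arguments. Only the log text and the "-1 means failure" convention come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NGROUPS_MODEL 3

/* Hypothetical free-inode counts per block group (two full, one not). */
int free_inodes_model[NGROUPS_MODEL] = { 0, 0, 55000 };

/* Stand-in for an allocator that gives up too eagerly: it reports
 * failure (-1) even though free inodes remain. */
int find_group_flex_model(void)
{
    return -1;
}

/* Stand-in for the simpler allocator: linear scan for any group that
 * still has a free inode; -1 only if every group is full. */
int find_group_other_model(void)
{
    for (int g = 0; g < NGROUPS_MODEL; g++)
        if (free_inodes_model[g] > 0)
            return g;
    return -1;
}

/* Try the primary allocator; on -1 retry with the fallback and log a
 * warning only when the fallback succeeds where the primary failed. */
int pick_group(unsigned long dir_ino, int *used_fallback)
{
    int group;

    assert(used_fallback != NULL);
    *used_fallback = 0;

    group = find_group_flex_model();
    if (group == -1) {
        group = find_group_other_model();
        if (group != -1) {
            *used_fallback = 1;
            fprintf(stderr, "ext4: find_group_flex failed, "
                    "fallback succeeded dir %lu\n", dir_ino);
        }
    }
    return group;
}
```

With the counts above, `pick_group(258402UL, &used)` returns group 2 and sets `used` to 1. If every group is full, both allocators fail, nothing is logged, and -1 is returned, which is the case where ENOSPC is genuine.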

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Alex Buell Feb. 17, 2009, 5:21 p.m. UTC | #1

On Mon, 16 Feb 2009 14:00:01 -0500, I waved a wand and this message
magically appears in front of Theodore Tso:

> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
> >
> > So, yes, seems to be an inode allocation problem.
> 
> Andres, Alex, others,
> 
> I'm pretty sure the ENOSPC problem which you both found is an inode
> allocation problem.  Some of you seem to have an easier time
> reproducing it than others; could you try this patch, and periodically
> scan your system logs for the message "ext4: find_group_flex failed,
> fallback succeeded"?  If the problem goes away for you, and you find
> the occasional aforementioned message in your system log, that will
> confirm what I suspect, which is that the bug is in fs/ext4/ialloc.c's
> find_group_flex() function.  (If I'm wrong, the fallback code will
> activate only when the filesystem is genuinely out of inodes, which
> should be very rare.)

OK, I had to go look through the archives on linux-ext4 mailing list to
see what the context was. For myself, this used to happen at least once
a week with 2.6.26, and less frequently with 2.6.27. I think that 2.6.28
with your patch should get rid of that problem altogether. I will of
course get in touch should I see any more of these find_group_flex
failures as that would mean your patch worked. 

Thanks for your work on tracking this one down!

Andres Freund Feb. 17, 2009, 5:36 p.m. UTC | #2

On 02/16/2009 08:00 PM, Theodore Tso wrote:
> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>> So, yes, seems to be an inode allocation problem.

> I'm pretty sure the ENOSPC problem which you both found is an inode
> allocation problem.  Some of you seem to have an easier time
> reproducing it than others; could you try this patch, and periodically
> scan your system logs for the message "ext4: find_group_flex failed,
> fallback succeeded"?  If the problem goes away for you, and you find
> the occasional aforementioned message in your system log, that will
> confirm what I suspect, which is that the bug is in fs/ext4/ialloc.c's
> find_group_flex() function.  (If I'm wrong, the fallback code will
> activate only when the filesystem is genuinely out of inodes, which
> should be very rare.)
>
> More comments are in the patch header.  My current long-term plan for
> dealing with this is to enhance find_group_orlov() and
> find_group_other() to understand flex_bg's.
Ok. I am now running with the patch enabled on two machines - but as the 
issue occurred only 2 times in nearly 2 months on two machines...

Andres

Eric Sandeen Feb. 17, 2009, 6:13 p.m. UTC | #3

Theodore Tso wrote:
> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>> So, yes, seems to be an inode allocation problem.
>>
> 
> Andres, Alex, others,
> 
> I'm pretty sure the ENOSPC problem which you both found is an inode
> allocation problem.  Some of you seem to have an easier time
> reproducing it than others; could you try this patch, and periodically
> scan your system logs for the message "ext4: find_group_flex failed,
> fallback succeeded"?  If the problem goes away for you, and you find
> the occasional aforementioned message in your system log, that will
> confirm what I suspect, which is that the bug is in fs/ext4/ialloc.c's
> find_group_flex() function.  (If I'm wrong, the fallback code will
> activate only when the filesystem is genuinely out of inodes, which
> should be very rare.)
> 
> More comments are in the patch header.  My current long-term plan for
> dealing with this is to enhance find_group_orlov() and
> find_group_other() to understand flex_bg's.

Ok, I finally got to where I can reliably hit this.  Just as I was about
to install an ext4 with this patch in place, and the bug was preventing
the new initrd creation ;)  But worked around that, and:

ext4: find_group_flex failed, fallback succeeded dir 258402
ext4: find_group_flex failed, fallback succeeded dir 258402
ext4: find_group_flex failed, fallback succeeded dir 258402
ext4: find_group_flex failed, fallback succeeded dir 258402
....

I'll see if I can dig a bit more as to why the find_group_flex failed,
if you think it's worth it, Ted.

-Eric

Eric Sandeen Feb. 17, 2009, 8:08 p.m. UTC | #4

Eric Sandeen wrote:
> Theodore Tso wrote:
>> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>>> So, yes, seems to be an inode allocation problem.
>>>
>> Andres, Alex, others,
>>
>> I'm pretty sure the ENOSPC problem which you both found is an inode
>> allocation problem.  Some of you seem to have an easier time
>> reproducing it than others; could you try this patch, and periodically
>> scan your system logs for the message "ext4: find_group_flex failed,
>> fallback succeeded"?  If the problem goes away for you, and you find
>> the occasional aforementioned message in your system log, that will
>> confirm what I suspect, which is that the bug is in fs/ext4/ialloc.c's
>> find_group_flex() function.  (If I'm wrong, the fallback code will
>> activate only when the filesystem is genuinely out of inodes, which
>> should be very rare.)
>>
>> More comments are in the patch header.  My current long-term plan for
>> dealing with this is to enhance find_group_orlov() and
>> find_group_other() to understand flex_bg's.
> 
> Ok, I finally got to where I can reliably hit this.  Just as I was about
> to install an ext4 with this patch in place, and the bug was preventing
> the new initrd creation ;)  But worked around that, and:
> 
> ext4: find_group_flex failed, fallback succeeded dir 258402
> ext4: find_group_flex failed, fallback succeeded dir 258402
> ext4: find_group_flex failed, fallback succeeded dir 258402
> ext4: find_group_flex failed, fallback succeeded dir 258402
> ....
> 
> I'll see if I can dig a bit more as to why the find_group_flex failed,
> if you think it's worth it, Ted.

FWIW my problem seems to be different than others have encountered; mine
persists past reboot, while other reporters have said that a reboot
(remount) makes the problem go away.

I seem to be encountering some silliness in find_group_flex when 2 out
of 3 groups are full (I "only" have 55k inodes left, all in the last group).

-Eric

Theodore Ts'o Feb. 17, 2009, 10 p.m. UTC | #5

On Tue, Feb 17, 2009 at 02:08:21PM -0600, Eric Sandeen wrote:
> FWIW my problem seems to be different than others have encountered; mine
> persists past reboot, while other reporters have said that a reboot
> (remount) makes the problem go away.

It might or might not be the same problem, since the reporters were
doing this on a mounted root partition, and on a filesystem quite a
bit larger than your test filesystem; so it could be that the act of
shutting down and rebooting created/deleted various pid files, and
perturbed the filesystem to make the problem go away.

The other possibility is that it is the flex_bg specific counters
which were introduced specifically for find_group_flex.  I'm not wild
about them since they mean we have to take an extra flex_bg specific
spin lock for every block and inode allocation.  The Orlov algorithm
only needs the information when allocating directories, and since
those are rarer than file allocations, I think it should be OK to
simply sum up the necessary fields at directory allocation time
instead of trying to maintain separate counters (which could possibly
get corrupted, although I couldn't see a way that they could be
getting out of sync with reality).

					- Ted
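
[Editorial sketch.] The alternative described above, summing per-group fields when a directory is allocated rather than maintaining separate flex_bg counters under a spin lock, might look roughly like this in userspace C. The struct layouts, the choice of fields (free inodes, free blocks, directory count) and all names are assumptions for illustration; the message does not say which fields are needed, and this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-group descriptor holding only what the sketch sums. */
struct group_desc_model {
    unsigned int free_inodes;
    unsigned int free_blocks;
    unsigned int used_dirs;
};

/* Totals for one flex group, computed on demand. */
struct flex_stats_model {
    unsigned long free_inodes;
    unsigned long free_blocks;
    unsigned long used_dirs;
};

/* Sum the descriptors of the block groups that make up flex group
 * flex_idx. Nothing is cached, so there is no counter that can drift
 * out of sync and no extra lock on the file allocation path. */
struct flex_stats_model sum_flex_group(const struct group_desc_model *groups,
                                       size_t ngroups,
                                       size_t groups_per_flex,
                                       size_t flex_idx)
{
    struct flex_stats_model s = { 0, 0, 0 };
    size_t first, last;

    assert(groups != NULL && groups_per_flex > 0);

    first = flex_idx * groups_per_flex;
    last = first + groups_per_flex;
    if (last > ngroups)
        last = ngroups;    /* the final flex group may be short */

    for (size_t g = first; g < last; g++) {
        s.free_inodes += groups[g].free_inodes;
        s.free_blocks += groups[g].free_blocks;
        s.used_dirs += groups[g].used_dirs;
    }
    return s;
}
```

The cost is a short loop per directory allocation, which fits the argument above that directory allocations are rarer than file allocations.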

Alex Buell Feb. 17, 2009, 10:30 p.m. UTC | #6

On Tue, 17 Feb 2009 14:08:21 -0600, I waved a wand and this message
magically appears in front of Eric Sandeen:

> FWIW my problem seems to be different than others have encountered;
> mine persists past reboot, while other reporters have said that a
> reboot (remount) makes the problem go away.
> 
> I seem to be encountering some silliness in find_group_flex when 2 out
> of 3 groups are full (I "only" have 55k inodes left, all in the last
> group).

I've discovered a forced fsck clears this. HTH.

Eric Sandeen Feb. 17, 2009, 10:56 p.m. UTC | #7

Alex Buell wrote:
> On Tue, 17 Feb 2009 14:08:21 -0600, I waved a wand and this message
> magically appears in front of Eric Sandeen:
> 
>> FWIW my problem seems to be different than others have encountered;
>> mine persists past reboot, while other reporters have said that a
>> reboot (remount) makes the problem go away.
>>
>> I seem to be encountering some silliness in find_group_flex when 2 out
>> of 3 groups are full (I "only" have 55k inodes left, all in the last
>> group).
> 
> I've discovered a forced fsck clears this. HTH.

Do you have the output of the fsck run?

-Eric

Alex Buell Feb. 17, 2009, 10:59 p.m. UTC | #8

On Tue, 17 Feb 2009 16:56:23 -0600, I waved a wand and this message
magically appears in front of Eric Sandeen:

> Alex Buell wrote:
> > On Tue, 17 Feb 2009 14:08:21 -0600, I waved a wand and this message
> > magically appears in front of Eric Sandeen:
> > 
> >> FWIW my problem seems to be different than others have encountered;
> >> mine persists past reboot, while other reporters have said that a
> >> reboot (remount) makes the problem go away.
> >>
> >> I seem to be encountering some silliness in find_group_flex when 2
> >> out of 3 groups are full (I "only" have 55k inodes left, all in
> >> the last group).
> > 
> > I've discovered a forced fsck clears this. HTH.
> 
> Do you have the output of the fsck run?

I'm afraid not, there's no way to save the output on a forced fsck
reboot.

Andres Freund Feb. 18, 2009, 9:18 p.m. UTC | #9

Hi All,

On 02/17/2009 06:36 PM, Andres Freund wrote:
> On 02/16/2009 08:00 PM, Theodore Tso wrote:
>> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>>> So, yes, seems to be an inode allocation problem.
>> I'm pretty sure the ENOSPC problem which you both found is an inode
>> allocation problem. Some of you seem to have an easier time
>> reproducing it than others; could you try this patch, and periodically
>> scan your system logs for the message "ext4: find_group_flex failed,
>> fallback succeeded"? If the problem goes away for you, and you find
>> the occasional aforementioned message in your system log, that will
>> confirm what I suspect, which is that the bug is in fs/ext4/ialloc.c's
>> find_group_flex() function. (If I'm wrong, the fallback code will
>> activate only when the filesystem is genuinely out of inodes, which
>> should be very rare.)
>> More comments are in the patch header. My current long-term plan for
>> dealing with this is to enhance find_group_orlov() and
>> find_group_other() to understand flex_bg's.
> Ok. I am now running with the patch enabled on two machines - but as the
> issue occurred only 2 times in nearly 2 months on two machines...
Didn't take that long:
On one of the machines I got several thousand of:

[10379.575904] ext4: find_group_flex failed, fallback succeeded dir 416319
[10379.576002] ext4: find_group_flex failed, fallback succeeded dir 416319
[10379.579981] ext4: find_group_flex failed, fallback succeeded dir 416319
[10379.580097] ext4: find_group_flex failed, fallback succeeded dir 416319
(with different directories)

No userspace visible behaviour.

So it seems you were right. It seems sensible to put that patch without 
printk in the kernel until the issue is fully solved...


Andres

Theodore Ts'o Feb. 18, 2009, 9:29 p.m. UTC | #10

On Wed, Feb 18, 2009 at 10:18:57PM +0100, Andres Freund wrote:
> On one of the machines I got several thousand of:
>
> [10379.575904] ext4: find_group_flex failed, fallback succeeded dir 416319
> [10379.576002] ext4: find_group_flex failed, fallback succeeded dir 416319
> [10379.579981] ext4: find_group_flex failed, fallback succeeded dir 416319
> [10379.580097] ext4: find_group_flex failed, fallback succeeded dir 416319
> (with different directories)

Ok, that's good.  Good to know the workaround works.  

Can you send me a dumpe2fs of the filesystem in question?  I'm curious
what was going on...

> No userspace visible behaviour.
>
> So it seems you were right. It seems sensible to put that patch without  
> printk in the kernel until the issue is fully solved...

Thanks for the report.  I'll push the workaround patch to Linus for
2.6.29 to avoid this problem for now.  I recently sent to linux-ext4
for comment a patch to revamp the Orlov allocator for flex_bg and to
use that instead of find_group_flex(), but no way that's going into
2.6.29 at this point....

							- Ted

Andres Freund Feb. 19, 2009, 2:18 a.m. UTC | #11

On 02/18/2009 10:29 PM, Theodore Tso wrote:
> Ok, that's good.  Good to know the workaround works.
> Can you send me a dumpe2fs of the filesystem in question?  I'm curious
> what was going on...
Will do as soon as I am at the same place as the machine. I guess that's
only interesting to you privately (size and so on)?

> Thanks for the report.  I'll push the workaround patch to Linus for
> 2.6.29 to avoid this problem for now.  I recently sent to linux-ext4
> for comment a patch to revamp the Orlov allocator for flex_bg and to
> use that instead of find_group_flex(), but no way that's going into
> 2.6.29 at this point....
Would it be helpful if I test that patch?

Andres

Theodore Ts'o Feb. 19, 2009, 3:22 a.m. UTC | #12

On Thu, Feb 19, 2009 at 03:18:45AM +0100, Andres Freund wrote:
> On 02/18/2009 10:29 PM, Theodore Tso wrote:
>> Ok, that's good.  Good to know the workaround works.
>> Can you send me a dumpe2fs of the filesystem in question?  I'm curious
>> what was going on...
> Will do as soon as I am at the same place as the machine. I guess that's
> only interesting to you privately (size and so on)?
>
>> Thanks for the report.  I'll push the workaround patch to Linus for
>> 2.6.29 to avoid this problem for now.  I recently sent to linux-ext4
>> for comment a patch to revamp the Orlov allocator for flex_bg and to
>> use that instead of find_group_flex(), but no way that's going into
>> 2.6.29 at this point....
> Would it be helpful if I test that patch?
>

Sure, I'll take all of the testing I can get.  :-)

The patch is in the ext4 patch queue, and I also sent it to the
linux-ext4 list.  The patch is also in patchwork:

http://patchwork.ozlabs.org/patch/23343/

The patch which I sent you earlier (available below) is a prerequisite:

http://patchwork.ozlabs.org/patch/23228/


							- Ted


Eric Sandeen Feb. 19, 2009, 3:46 p.m. UTC | #13

Theodore Tso wrote:
> On Thu, Feb 19, 2009 at 03:18:45AM +0100, Andres Freund wrote:
>> On 02/18/2009 10:29 PM, Theodore Tso wrote:
>>> Ok, that's good.  Good to know the workaround works.
>>> Can you send me a dumpe2fs of the filesystem in question?  I'm curious
>>> what was going on...
>> Will do as soon as I am at the same place as the machine. I guess that's
>> only interesting to you privately (size and so on)?
>>
>>> Thanks for the report.  I'll push the workaround patch to Linus for
>>> 2.6.29 to avoid this problem for now.  I recently sent to linux-ext4
>>> for comment a patch to revamp the Orlov allocator for flex_bg and to
>>> use that instead of find_group_flex(), but no way that's going into
>>> 2.6.29 at this point....
>> Would it be helpful if I test that patch?
>>
> 
> Sure, I'll take all of the testing I can get.  :-)
> 
> The patch is in the ext4 patch queue, and I also sent it to the
> linux-ext4 list.  The patch is also in patchwork:
> 
> http://patchwork.ozlabs.org/patch/23343/
> 
> The patch which I sent you earlier (available below) is a prerequisite:
> 
> http://patchwork.ozlabs.org/patch/23228/

Ted, I hope the printk will be removed or at least ratelimited before it
gets upstream?

Thanks,
-Eric

Theodore Ts'o Feb. 23, 2009, 2:02 a.m. UTC | #14

On Thu, Feb 19, 2009 at 09:46:51AM -0600, Eric Sandeen wrote:
> 
> Ted, I hope the printk will be removed or at least ratelimited before it
> gets upstream?
> 

Yes, I've added a ratelimit.

						- Ted
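
[Editorial sketch.] The thread does not show how the ratelimit was added. As a rough illustration of the idea only, here is a userspace model that allows at most `burst` messages per `interval` and counts the rest as suppressed. The struct, its field names and the explicit `now` parameter are all assumptions made so the example is deterministic; the kernel has its own helpers for this and this is not that code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limiter state; time units are whatever the caller uses. */
struct ratelimit_model {
    long interval;      /* length of one window */
    int burst;          /* messages allowed per window */
    long window_start;  /* start time of the current window */
    int printed;        /* messages let through in this window */
    int missed;         /* messages suppressed so far */
};

/* Returns 1 if a message may be emitted at time `now`, 0 if it should
 * be dropped. A new window starts once `interval` has elapsed. */
int ratelimit_ok(struct ratelimit_model *rl, long now)
{
    assert(rl != NULL);

    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    rl->missed++;
    return 0;
}
```

A caller would guard the warning with `if (ratelimit_ok(&rl, now))`, so that a burst of several thousand identical messages, as reported earlier in the thread, collapses to a handful per window.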

Andres Freund Feb. 27, 2009, 3:57 a.m. UTC | #15

On 02/18/2009 10:29 PM, Theodore Tso wrote:
>> [10379.575904] ext4: find_group_flex failed, fallback succeeded dir 416319
>> [10379.576002] ext4: find_group_flex failed, fallback succeeded dir 416319
>> [10379.579981] ext4: find_group_flex failed, fallback succeeded dir 416319
>> [10379.580097] ext4: find_group_flex failed, fallback succeeded dir 416319
>> (with different directories)
> Can you send me a dumpe2fs of the filesystem in question?  I'm curious
> what was going on...
Unfortunately the system was rebooted before I had the chance to do the
dump; since then the problem has not reemerged.
Would a dump after reboot still be useful?

Andres
