
EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan inode list

Message ID 4E66478E.90102@redhat.com
State Not Applicable, archived

Commit Message

Eric Sandeen Sept. 6, 2011, 4:17 p.m. UTC
On 9/2/11 4:00 PM, Christian Kujau wrote:
> Hi,
> 
> for some time now, the following message keeps popping up in my logs:
> 
>  > EXT4-fs (dm-1): Couldn't remount RDWR because of unprocessed orphan 
>  > inode list.  Please umount/remount instead
> 
> I don't know when it started. Maybe 2 months ago, I'd guess. The ext4 fs 
> is on top of a dm-crypt device, attached via firewire to a 1TB external 
> disk enclosure. The system (powerpc 32) is loosely following vanilla 
> kernels, currently running 3.1.0-rc4.
> 
> The filesystem is normally mounted r/o but remounted r/w every day to 
> receive backups, then remounted r/o again. Running e2fsck-1.41.12 with -n 
> on the r/o-mounted devices gives the output below.
> 
> I unmounted the disk some weeks ago, ran e2fsck for real, and it fixed 
> the errors. But now more errors seem to have occurred.
> 
> Anyone got an idea why this keeps happening? Bad memory? Bad cables? Disk? 
> No other hardware related errors are in the logs and the box is otherwise 
> quite stable.

It's probably not a bug or flaw; orphan inodes can occur for legitimate
reasons (fs goes down while someone is holding open an unlinked file),
and then they must be cleaned up.  If orphan inode processing was skipped
for some reason on the original mount, you can get this error.

Did you happen to also get a message like this on the original mount?

        if (bdev_read_only(sb->s_bdev)) {
                ext4_msg(sb, KERN_ERR, "write access "
                        "unavailable, skipping orphan cleanup");
                return;
        }

?
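
(As a side note: one way to check whether the superblock still records an
orphan list is dumpe2fs; as far as I remember it prints a "First orphan
inode:" line whenever s_last_orphan is nonzero, so on your device something
like this should show it even while it is mounted read-only:)

        dumpe2fs -h /dev/mapper/wdc0 2>/dev/null | grep -i orphan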

See also commit: 

commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Sat Feb 10 01:46:08 2007 -0800

    [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes
    
    In the rare case where we have skipped orphan inode processing due to a
    readonly block device, and the block device subsequently changes back to
    read-write, disallow a remount,rw transition of the filesystem when we
    have unprocessed orphan inodes, as this would corrupt the list.
    
    Ideally we should process the orphan inode list during the remount, but that's
    trickier, and this plugs the hole for now.
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Cc: "Stephen C. Tweedie" <sct@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


-Eric

> Thanks,
> Christian.
> 
> -------------------------------------------------
> # fsck.ext4 -vnf /dev/mapper/wdc0 
> e2fsck 1.41.12 (17-May-2010)
> Warning!  /dev/mapper/wdc0 is mounted.
> Pass 1: Checking inodes, blocks, and sizes
> Inodes that were part of a corrupted orphan linked list found.  Fix? no
> 
> Inode 16385 was part of the orphaned inode list.  IGNORED.
> Deleted inode 16439 has zero dtime.  Fix? no
> 
> Inode 2260993 was part of the orphaned inode list.  IGNORED.
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences:  -(9298--9299) -8921121
> Fix? no
> 
> Inode bitmap differences:  -16385 -16439 -2260993
> Fix? no
> 
> Directories count wrong for group #2 (72, counted=70).
> Fix? no
> 
> Directories count wrong for group #276 (76, counted=75).
> Fix? no
> 
> /dev/mapper/wdc0: ********** WARNING: Filesystem still has errors **********
> 
> 
>   562145 inodes used (0.92%)
>     3952 non-contiguous files (0.7%)
>     2544 non-contiguous directories (0.5%)
>          # of inodes with ind/dind/tind blocks: 0/0/0
>          Extent depth histogram: 555449/932
> 215226375 blocks used (88.14%)
>        0 bad blocks
>        5 large files
> 
>   353453 regular files
>   202351 directories
>        0 character device files
>        0 block device files
>        7 fifos
>  2491335 links
>     6235 symbolic links (5657 fast symbolic links)
>       87 sockets
> --------
>  3053468 files
> 


Comments

Christian Kujau Sept. 6, 2011, 4:37 p.m. UTC | #1
On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
> It's probably not a bug or flaw; orphan inodes can occur for legitimate
> reasons (fs goes down while someone is holding open an unlinked file),

The filesystem is being constantly accessed by an application, holding at 
least one file open (readonly). And then there is this mechanism trying to 
remount the filesystem rw and then ro again every day. I guess this equals
the scenario of "fs goes down (remount!) while someone is holding open a 
file"?

> Did you happen to also get a message like this on the original mount?
>                 ext4_msg(sb, KERN_ERR, "write access "
>                         "unavailable, skipping orphan cleanup");

I think I've seen this message before, but I'm not sure where, and it's 
not in the logs of this particular system.

> See also commit: 
>
> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
> Author: Eric Sandeen <sandeen@redhat.com>
> Date:   Sat Feb 10 01:46:08 2007 -0800
> 
>     [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes

Yes, I've seen this commit when I was searching where this message came
from. And I think I understand now why this is happening, but 
still...if I may ask: can't this be handled more elegantly? Do other 
filesystems have the same problem?

Right now the procedure is to pause the application, stop the nfs exports,
unmount, fsck, mount, start nfs exports and resume the application. And
every few days/weeks this has to be repeated, "just because" these daily
remounts occur (which are the main reason for this, I suppose).
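
Spelled out, that is roughly the following (the mount point is only a
placeholder, and the export handling is just sketched):

    # pause the application first, then:
    exportfs -ua                      # withdraw the nfs exports
    umount /mnt/backup
    fsck.ext4 -f /dev/mapper/wdc0     # this also processes the orphan list
    mount /mnt/backup
    exportfs -ra                      # re-export, then resume the application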

Thanks for replying,
Christian.
Eric Sandeen Sept. 6, 2011, 4:44 p.m. UTC | #2
On 9/6/11 11:37 AM, Christian Kujau wrote:
> On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
>> It's probably not a bug or flaw; orphan inodes can occur for legitimate
>> reasons (fs goes down while someone is holding open an unlinked file),
> 
> The filesystem is being constantly accessed by an application, holding at 
> least one file open (readonly). And then there is this mechanism trying to 
> remount the filesystem rw and then ro again every day. I guess this equals
> the scenario of "fs goes down (remount!) while someone is holding open a 
> file"?

well, no - "goes down" means "crashed or lost power"

>> Did you happen to also get a message like this on the original mount?
>>                 ext4_msg(sb, KERN_ERR, "write access "
>>                         "unavailable, skipping orphan cleanup");
> 
> I think I've seen this message before, but I'm nore sure where and it's 
> not in the logs of this particular system.
> 
>> See also commit: 
>>
>> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
>> Author: Eric Sandeen <sandeen@redhat.com>
>> Date:   Sat Feb 10 01:46:08 2007 -0800
>>
>>     [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes
> 
> Yes, I've seen this commit when I was searching where this message came
> from. And I think I understand now why this is happening, but 
> still...if I may ask: can't this be handled more elegantly? Do other 
> filesystems have the same problem?

well, as the commit said, it'd be nice to handle it in remount, yes... :(

> Right now the procedure is to pause the application, stop the nfs exports,
> unmount, fsck, mount, start nfs exports and resume the application. And
> every few days/weeks this has to be repeated, "just because" these daily
> remounts occur (which are the main reason for this, I suppose).

well, seems like you need to get to the root cause of the unprocessed
orphan inodes.

I don't yet have my post-vacation thinking cap back on... does cycling
rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?

-Eric

> Thanks for replying,
> Christian.

Christian Kujau Sept. 6, 2011, 6:14 p.m. UTC | #3
On Tue, 6 Sep 2011 at 11:44, Eric Sandeen wrote:
> > remount the filesystem rw and then ro again every day. I guess this equals
> > the scenario of "fs goes down (remount!) while someone is holding open a 
> > file"?
> 
> well, no - "goes down" means "crashed or lost power"

Hm, the machine and its storage are online all the time, and the messages 
occur in between downtimes.

> well, as the commit said, it'd be nice to handle it in remount, yes... :(

If my daily remounts are causing this, it's unfortunate. But it's nice to 
know that now. It'd be more worrying if something else were slowly 
corrupting the fs.

> well, seems like you need to get to the root cause of the unprocessed
> orphan inodes.
> 
> I don't yet have my post-vacation thinking cap back on... does cycling
> rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?

This is almost all I do on this fs. The whole process is (sketched in 
commands below the list):

1) fs is ro most of the time, while a remote application accesses it via 
   a readonly nfs mount.
2) once a day the fs gets remounted rw (the remote application does not 
   know this and is still accessing the fs via the same ro-nfs mount)
3) backups are being pushed to the fs (via rsync, using hardlinks a lot)
4) fs is remounted ro again
5) at some point the remote application notices that the nfs mount went
   stale and has to remount its readonly nfs-mount
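
In shell terms, steps 2) to 4) boil down to something like this (the mount
point below is only a placeholder):

    mount -o remount,rw /mnt/backup
    # ... the remote side now pushes its rsync backup (making heavy use
    #     of hardlinks) into /mnt/backup ...
    mount -o remount,ro /mnt/backup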

Thanks,
Christian.
Jan Kara Sept. 8, 2011, 6:51 p.m. UTC | #4
On Tue 06-09-11 11:44:45, Eric Sandeen wrote:
> On 9/6/11 11:37 AM, Christian Kujau wrote:
> > On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
> >> It's probably not a bug or flaw; orphan inodes can occur for legitimate
> >> reasons (fs goes down while someone is holding open an unlinked file),
> > 
> > The filesystem is being constantly accessed by an application, holding at 
> > least one file open (readonly). And then there is this mechanism trying to 
> > remount the filesystem rw and then ro again every day. I guess this equals
> > the scenario of "fs goes down (remount!) while someone is holding open a 
> > file"?
> 
> well, no - "goes down" means "crashed or lost power"
> 
> >> Did you happen to also get a message like this on the original mount?
> >>                 ext4_msg(sb, KERN_ERR, "write access "
> >>                         "unavailable, skipping orphan cleanup");
> > 
> > I think I've seen this message before, but I'm nore sure where and it's 
> > not in the logs of this particular system.
> > 
> >> See also commit: 
> >>
> >> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
> >> Author: Eric Sandeen <sandeen@redhat.com>
> >> Date:   Sat Feb 10 01:46:08 2007 -0800
> >>
> >>     [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes
> > 
> > Yes, I've seen this commit when I was searching where this message came
> > from. And I think I understand now why this is happening, but 
> > still...if I may ask: can't this be handled more elegantly? Do other 
> > filesystems have the same problem?
> 
> well, as the commit said, it'd be nice to handle it in remount, yes... :(
> 
> > Right now the procedure is to pause the application, stop the nfs exports,
> > unmount, fsck, mount, start nfs exports and resume the application. And
> > every few days/weeks this has to be repeated, "just because" these daily
> > remounts occur (which are the main reason for this, I suppose).
> 
> well, seems like you need to get to the root cause of the unprocessed
> orphan inodes.
> 
> I don't yet have my post-vacation thinking cap back on... does cycling
> rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?
  There's a race where the VFS remount code can race with unlink, and the
result will be an unlinked file on the orphan list of a read-only
filesystem. Christian seems to be hitting this race. Miklos Szeredi has
patches (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html)
to mostly close this hole, but they're waiting for Al to find time to look
at them / merge them, AFAIK.
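
  Schematically, the window looks something like this (simplified):

    unlink():  mnt_want_write()
               vfs_unlink()     - the inode goes onto the on-disk orphan list
               mnt_drop_write()
    remount:   mount -o remount,ro slips in here and succeeds
    unlink():  iput()           - the final removal can no longer run on the
                                  now read-only fs, so the orphan list entry
                                  is left behind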

								Honza
Christian Kujau Sept. 10, 2011, 1:11 a.m. UTC | #5
On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> There's race where VFS remount code can race with unlink and result will
> be unlinked file in orphan list on read-only filesystem. Christian seems to
> be hitting this race. Miklos Szeredi has patches
> (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> mostly close this hole but they're waiting for Al to find time to look at
> them / merge them AFAIK.

While these patches are still pending review, are they "dangerous" to 
apply? If not, I'd like to volunteer as a tester :-)

Christian.
Jan Kara Sept. 10, 2011, 8:04 p.m. UTC | #6
On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > There's race where VFS remount code can race with unlink and result will
> > be unlinked file in orphan list on read-only filesystem. Christian seems to
> > be hitting this race. Miklos Szeredi has patches
> > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> > mostly close this hole but they're waiting for Al to find time to look at
> > them / merge them AFAIK.
> 
> While these patches are still pending review, are they "dangerous" to 
> apply? If not, I'd like to volunteer as a tester :-)
  As far as I saw them, they should be pretty safe. So feel free to test
them.

									Honza
Christian Kujau Sept. 13, 2011, 4:52 a.m. UTC | #7
On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > There's race where VFS remount code can race with unlink and result will
> > > be unlinked file in orphan list on read-only filesystem. Christian seems to
> > > be hitting this race. Miklos Szeredi has patches
> > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> > > mostly close this hole but they're waiting for Al to find time to look at
> > > them / merge them AFAIK.
> > 
> > While these patches are still pending review, are they "dangerous" to 
> > apply? If not, I'd like to volunteer as a tester :-)
>   As far as I saw them, they should be pretty safe. So feel free to test
> them.

I've applied them to -rc5. It might take a few days until the message 
occurs. Or, until "nothing happens", since I have the patches applied :-)

Meanwhile I'm trying to reproduce this issue on an x86 machine, but 
haven't succeeded yet.

Thanks,
Christian.
Christian Kujau Sept. 16, 2011, 3:49 a.m. UTC | #8
On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
> On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > > There's race where VFS remount code can race with unlink and result will
> > > > be unlinked file in orphan list on read-only filesystem. Christian seems to
> > > > be hitting this race. Miklos Szeredi has patches
> > > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> > > > mostly close this hole but they're waiting for Al to find time to look at
> > > > them / merge them AFAIK.
> > > 
> > > While these patches are still pending review, are they "dangerous" to 
> > > apply? If not, I'd like to volunteer as a tester :-)
> >   As far as I saw them, they should be pretty safe. So feel free to test
> > them.
> 
> I've applied them to -rc5. It might take a few days untile the message 
> occurs. Or, until "nothing happens", since I have the patches applied :-)

With Miklos' patches applied to -rc5, this happened again just now :-(

> Meanwhile I'm trying to reproduce this issue on an x86 machine, but 
> haven't succeeded yet.

After ~3k remounts, with constant reading from the filesystem in 
question[0], I was still NOT able to reproduce this on an x86 VM :(

Any ideas?

Thanks,
Christian.

[0] http://nerdbynature.de/bits/3.1-rc4/ext4/
Amir Goldstein Sept. 16, 2011, 12:04 p.m. UTC | #9
On Fri, Sep 16, 2011 at 6:49 AM, Christian Kujau <lists@nerdbynature.de> wrote:
> On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
>> On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
>> > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
>> > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
>> > > > There's race where VFS remount code can race with unlink and result will
>> > > > be unlinked file in orphan list on read-only filesystem. Christian seems to
>> > > > be hitting this race. Miklos Szeredi has patches
>> > > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
>> > > > mostly close this hole but they're waiting for Al to find time to look at
>> > > > them / merge them AFAIK.
>> > >
>> > > While these patches are still pending review, are they "dangerous" to
>> > > apply? If not, I'd like to volunteer as a tester :-)
>> >   As far as I saw them, they should be pretty safe. So feel free to test
>> > them.
>>
>> I've applied them to -rc5. It might take a few days untile the message
>> occurs. Or, until "nothing happens", since I have the patches applied :-)
>
> With Miklos' patches applied to -rc5, this happend again just now :-(
>
>> Meanwhile I'm trying to reproduce this issue on an x86 machine, but
>> haven't succeeded yet.
>
> After a ~3k remounts with constantly reading from the filesystem in
> question[0], I still was NOT able to reproduce this on an x86 VM :(
>
> Any ideas?
>

This is just a shot in the dark, but are you using Ubuntu on your
production machine by any chance?
The reason I am asking is because I have been getting failures to
umount the fs while running xfstests on ext4 with Ubuntu for a long time,
and nobody else seems to share this problem.

I always suspected Ubuntu has some service that keeps open handles on
mounted fs, but I never got around to examining this.

Amir.
Christian Kujau Sept. 16, 2011, 12:17 p.m. UTC | #10
On Fri, 16 Sep 2011 at 15:04, Amir Goldstein wrote:
> This is just a shot in the dark, but are you using Ubuntu on your
> production machine by any chance?

No, I'm using Debian/stable on the "production" machine (the powerpc32 
box, where the error occurs). I was trying to reproduce this in an x86 
Ubuntu/10.04 VM, but could not.

> The reason I am asking is becasue I have been getting failures to 
> umount fs, while running xfstests on ext4
> with Ubuntu for a long time and nobody else seems to share this problem.

Is there a bug open for that?

> I always suspected Ubuntu has some service that keeps open handles on
> mounted fs, but I never got to examine this.

Yeah...their "server" version needs major surgery to disable all the 
bells and whistles before it becomes usable.

Christian.
Amir Goldstein Sept. 16, 2011, 12:36 p.m. UTC | #11
On Fri, Sep 16, 2011 at 3:17 PM, Christian Kujau <lists@nerdbynature.de> wrote:
> On Fri, 16 Sep 2011 at 15:04, Amir Goldstein wrote:
>> This is just a shot in the dark, but are you using Ubuntu on your
>> production machine by any chance?
>
> No, I'm using Debian/stable on the "production" machine (the powerpc32
> box, where the error occurs). I was trying to reproduce this in a x86
> Ubuntu/10.04 VM, but could not.

Actually, now I recall that Yongqiang did say he saw the same problem on Debian,
but I may be wrong.

>
>> The reason I am asking is becasue I have been getting failures to
>> umount fs, while running xfstests on ext4
>> with Ubuntu for a long time and nobody else seems to share this problem.
>
> Is there a bug open for that?
>

No, I couldn't find any trace of bug reports on this behavior, so I wrote it off
as a "misconfiguration" of my server.
I did write to the xfs list to ask if anyone else has seen this problem.
You could try to run xfstests on your server and see if the problem is
reproducible.

>> I always suspected Ubuntu has some service that keeps open handles on
>> mounted fs, but I never got to examine this.
>
> Yeah...their "server" version needs major surgery to disable all the
> bells and whistles before it's becoming usable.
>

And I installed the "desktop" version, so where does this leave me...

Amir.
Jan Kara Oct. 5, 2011, 6:03 p.m. UTC | #12
On Thu 15-09-11 20:49:19, Christian Kujau wrote:
> On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
> > On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> > > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > > > There's race where VFS remount code can race with unlink and result will
> > > > > be unlinked file in orphan list on read-only filesystem. Christian seems to
> > > > > be hitting this race. Miklos Szeredi has patches
> > > > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
> > > > > mostly close this hole but they're waiting for Al to find time to look at
> > > > > them / merge them AFAIK.
> > > > 
> > > > While these patches are still pending review, are they "dangerous" to 
> > > > apply? If not, I'd like to volunteer as a tester :-)
> > >   As far as I saw them, they should be pretty safe. So feel free to test
> > > them.
> > 
> > I've applied them to -rc5. It might take a few days untile the message 
> > occurs. Or, until "nothing happens", since I have the patches applied :-)
> 
> With Miklos' patches applied to -rc5, this happend again just now :-(
  Thanks for the careful testing! Hmm, since you are able to reproduce on ppc
but not on x86, there might be some memory ordering bug in Miklos' patches,
or it's simply a matter of different timing. Miklos, care to debug this
further?

> > Meanwhile I'm trying to reproduce this issue on an x86 machine, but 
> > haven't succeeded yet.
> 
> After a ~3k remounts with constantly reading from the filesystem in 
> question[0], I still was NOT able to reproduce this on an x86 VM :(
> 
> Any ideas?


								Honza
Christian Kujau Oct. 6, 2011, 1:34 a.m. UTC | #13
On Wed, 5 Oct 2011 at 20:03, Jan Kara wrote:
>> With Miklos' patches applied to -rc5, this happend again just now :-(
>>
> Thanks for careful testing! Hmm, since you are able to reproduce on ppc
> but not on x86 there might be some memory ordering bug in Miklos' patches
> or it's simply because of different timing. Miklos, care to debug this
> further?

Just to be clear: I'm still not entirely sure how to reproduce this at 
will. I *assumed* that it was the daily remount-rw-and-ro-again routine that 
left some inodes in limbo and eventually led to those "unprocessed orphan 
inodes". With that in mind I tried to reproduce this with the help of a 
test script (test-remount.sh, [0]) - but the message did not occur while 
the script was running.

I ran the script again today on the aforementioned powerpc machine, on a 
loop-mounted 500MB ext4 partition. But even after 100 iterations no
such message occurred.

So maybe it's caused by something else, or my test script just doesn't get 
the scenario right and there's something subtle about this whole 
remounting business that I haven't figured out yet, leading to those orphan 
inodes.

I'm at 3.1.0-rc9 now and will wait until the errors occur again.

Christian.

[0] nerdbynature.de/bits/3.1-rc4/ext4/
Toshiyuki Okajima Oct. 6, 2011, 10:12 a.m. UTC | #14
(2011/10/06 10:34), Christian Kujau wrote:
> On Wed, 5 Oct 2011 at 20:03, Jan Kara wrote:
>>> With Miklos' patches applied to -rc5, this happend again just now :-(
>>>
>> Thanks for careful testing! Hmm, since you are able to reproduce on ppc
>> but not on x86 there might be some memory ordering bug in Miklos' patches
>> or it's simply because of different timing. Miklos, care to debug this
>> further?
> 
> Just to be clear: I'm still not entirely sure how to reproduce this at 
> will. I *assumed* that the daily remount-rw-and-ro-again routine that left 
> some inodes in limbo and eventually lead to those "unprocessed orphan 
> inodes". With that in mind I tried to reproduce this with the help of a 
> test-script (test-remount.sh, [0]) - but the message did not occur while 
> the script was running.
> 
> I've ran the script again today on the said powerpc machine on a 
> loop-mounted 500MB ext4 partition. But even after 100 iterations no
> such message occured.
> 
> So maybe it's caused by something else or my test-script just doesn't get 
> the scenario right and there's something subtle to this whole 
> remounting-business I haven't figured out yet, leading to those orphan 
> inodes.
> 
> I'm at 3.1.0-rc9 now and will wait until the errors occur again.
> 
> Christian.
> 
> [0] nerdbynature.de/bits/3.1-rc4/ext4/

With Miklos' patches applied to -rc8, I could still trigger
"Couldn't remount RDWR because of unprocessed orphan inode list"
on my x86_64 machine with my reproducer.

This is because the actual removal happens outside the range covered by
mnt_want_write() and mnt_drop_write(): do_unlinkat() and do_rmdir() do call
mnt_want_write()/mnt_drop_write() to prevent the filesystem from being
remounted read-only, but the removal itself only starts at the later iput(),
after that protection has been dropped.

My reproducer is as follows:
-----------------------------------------------------------------------------
[1] go.sh
#!/bin/bash
# Create a ~1GB sparse ext4 image, loop-mount it on /mnt, start writer.sh
# against it, and then alternate ro and rw remounts once per second.
# bash is required for the "for (( ... ))" loops below.

dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1000k > /dev/null 2>&1
/sbin/mkfs.ext4 -Fq /tmp/img
mount -o loop /tmp/img /mnt
./writer.sh /mnt &
LOOP=1000000000
for ((i=0; i<LOOP; i++));
do
	echo "[$i]"
	if ((i%2 == 0));
	then
		mount -o ro,remount,loop /mnt
	else
		mount -o rw,remount,loop /mnt
	fi
	sleep 1
done

[2] writer.sh
#!/bin/bash
# In batches of 64, create small files with dd and unlink them again, all
# in the background, so the final removals race with the ro/rw remounts
# driven by go.sh.

dir=$1
for ((i=0;i<10000000;i++));
do
	for ((j=0;j<64;j++));
	do
		filename="$dir/file$((i*64 + j))"
		dd if=/dev/zero of=$filename bs=1k count=8 > /dev/null 2>&1 &
	done
	for ((j=0;j<64;j++));
	do
		filename="$dir/file$((i*64 + j))"
		rm -f $filename > /dev/null 2>&1 &
	done
	wait
	if ((i%100 == 0 && i > 0));
	then
		rm -f $dir/file*
	fi
done
exit

[step to run]
# ./go.sh
-----------------------------------------------------------------------------

Therefore, we need a mechanism to prevent a filesystem from being remounted
read-only until the actual removal finishes.

------------------------------------------------------------------------
[example fix]
 do_unlinkat() {
   ...
   mnt_want_write()
   vfs_unlink()
   if (inode && inode->i_nlink == 0) {              // remember that this inode
      atomic_inc(&inode->i_sb->s_unlink_count);     // still has its actual
      inode->i_deleting++;                          // removal pending
   }
   mnt_drop_write()
   ...
   iput() // usually, the actual removal starts here
   ...
 }

destroy_inode() {
  ...
  if (inode->i_deleting)
    atomic_dec(&inode->i_sb->s_unlink_count);
  ...
}

do_remount_sb() {
  ...
  else if (!fs_may_remount_ro(sb) || atomic_read(&sb->s_unlink_count))
     return -EBUSY;
  ...
}
------------------------------------------------------------------------

Besides, my reproducer also triggers the following message:
"Ext4-fs (xxx): ext4_da_writepages: jbd2_start: xxx pages, ino xx: err -30"

This is because ext4_remount() cannot guarantee that all ext4 filesystem
data has been written out, due to the delayed allocation feature
(ext4_da_writepages() fails after ext4_remount() sets MS_RDONLY in
sb->s_flags).

Therefore, we must write all delayed allocation buffers out before
ext4_remount() sets MS_RDONLY in sb->s_flags.

------------------------------------------------------------------------
[example fix] // This requires Miklos' patches. 

ext4_remount() {
  ...
  if (*flags & MS_RDONLY) {
      err = dquot_suspend(sb, -1);
      if (err < 0) 
         goto restore_opts;

      sync_filesystem(sb);  // write all delayed buffers out
      sb->s_flags |= MS_RDONLY;
  ...
}      
------------------------------------------------------------------------

Best Regards,
Toshiyuki Okajima

Miklos Szeredi Oct. 11, 2011, 8:45 a.m. UTC | #15
On Thu, 2011-10-06 at 19:12 +0900, Toshiyuki Okajima wrote:

> With Miklos' patches applies to -rc8, I could display
> "Couldn't remount RDWR because of unprocessed orphan inode list".
> on my x86_64 machine by my reproducer.
> 
> Because actual removal starts from over a range between mnt_want_write() and
> mnt_drop_write() even if do_unlinkat() or do_rmdir() calls mnt_want_write()
> and mnt_drop_write() to prevent a filesystem from re-mounting read-only.


Thanks for the reproducer.

I'm looking at this now...

Miklos



Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 486a641..463b52b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2419,6 +2419,22 @@  static int ext4_remount (struct super_block * sb, int * flags, char * data)
                                err = -EROFS;
                                goto restore_opts;
                        }
+
+                       /*
+                        * If we have an unprocessed orphan list hanging
+                        * around from a previously readonly bdev mount,
+                        * require a full umount/remount for now.
+                        */
+                       if (es->s_last_orphan) {
+                               printk(KERN_WARNING "EXT4-fs: %s: couldn't "
+                                      "remount RDWR because of unprocessed "
+                                      "orphan inode list.  Please "
+                                      "umount/remount instead.\n",
+                                      sb->s_id);
+                               err = -EINVAL;
+                               goto restore_opts;
+                       }
+
                        /*
                         * Mounting a RDONLY partition read-write, so reread
                         * and store the current valid flag.  (It may have