Message ID: 4E66478E.90102@redhat.com
State: Not Applicable, archived
On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
> It's probably not a bug or flaw; orphan inodes can occur for legitimate
> reasons (fs goes down while someone is holding open an unlinked file),

The filesystem is being constantly accessed by an application, holding at
least one file open (readonly). And then there is this mechanism trying to
remount the filesystem rw and then ro again every day. I guess this equals
the scenario of "fs goes down (remount!) while someone is holding open a
file"?

> Did you happen to also get a message like this on the original mount?
>         ext4_msg(sb, KERN_ERR, "write access "
>                  "unavailable, skipping orphan cleanup");

I think I've seen this message before, but I'm not sure where, and it's
not in the logs of this particular system.

> See also commit:
>
> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
> Author: Eric Sandeen <sandeen@redhat.com>
> Date:   Sat Feb 10 01:46:08 2007 -0800
>
>     [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes

Yes, I've seen this commit when I was searching for where this message came
from. And I think I understand now why this is happening, but still... if I
may ask: can't this be handled more elegantly? Do other filesystems have
the same problem?

Right now the procedure is to pause the application, stop the NFS exports,
unmount, fsck, mount, start the NFS exports and resume the application. And
every few days/weeks this has to be repeated, "just because" these daily
remounts occur (which are the main reason for this, I suppose).

Thanks for replying,
Christian.
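[Editor's note: the manual recovery sequence described above could be scripted roughly as follows. This is only a sketch: the device, mount point, and service-control steps are hypothetical placeholders, and DRY_RUN defaults to on so each step is merely printed, not executed.]

```shell
#!/bin/sh
# Sketch of the pause/unexport/umount/fsck/mount/re-export recovery cycle.
# DEV and MNT are hypothetical placeholders; adjust before any real use.
DEV=${DEV:-/dev/sdX1}
MNT=${MNT:-/srv/backup}
DRY_RUN=${DRY_RUN:-1}
STEPS=""

run() {
    STEPS="$STEPS$* ; "          # record each step for inspection
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"              # dry run: only show what would happen
    else
        "$@"
    fi
}

# (pause the application first, however it is managed)
run exportfs -ua                 # stop all NFS exports
run umount "$MNT"
run e2fsck -fy "$DEV"            # full check; processes the orphan inode list
run mount -o ro "$DEV" "$MNT"
run exportfs -ra                 # re-export, then resume the application
```

Running it with `DRY_RUN=0` (as root, with real device/mount values) would perform the actual sequence.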
On 9/6/11 11:37 AM, Christian Kujau wrote:
> On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
>> It's probably not a bug or flaw; orphan inodes can occur for legitimate
>> reasons (fs goes down while someone is holding open an unlinked file),
>
> The filesystem is being constantly accessed by an application, holding at
> least one file open (readonly). And then there is this mechanism trying to
> remount the filesystem rw and then ro again every day. I guess this equals
> the scenario of "fs goes down (remount!) while someone is holding open a
> file"?

well, no - "goes down" means "crashed or lost power"

>> Did you happen to also get a message like this on the original mount?
>>         ext4_msg(sb, KERN_ERR, "write access "
>>                  "unavailable, skipping orphan cleanup");
>
> I think I've seen this message before, but I'm not sure where, and it's
> not in the logs of this particular system.
>
>> See also commit:
>>
>> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
>> Author: Eric Sandeen <sandeen@redhat.com>
>> Date:   Sat Feb 10 01:46:08 2007 -0800
>>
>>     [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes
>
> Yes, I've seen this commit when I was searching for where this message came
> from. And I think I understand now why this is happening, but
> still... if I may ask: can't this be handled more elegantly? Do other
> filesystems have the same problem?

well, as the commit said, it'd be nice to handle it in remount, yes... :(

> Right now the procedure is to pause the application, stop the NFS exports,
> unmount, fsck, mount, start the NFS exports and resume the application. And
> every few days/weeks this has to be repeated, "just because" these daily
> remounts occur (which are the main reason for this, I suppose).

well, seems like you need to get to the root cause of the unprocessed
orphan inodes.

I don't yet have my post-vacation thinking cap back on... does cycling
rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?
-Eric

> Thanks for replying,
> Christian.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 6 Sep 2011 at 11:44, Eric Sandeen wrote:
> > remount the filesystem rw and then ro again every day. I guess this equals
> > the scenario of "fs goes down (remount!) while someone is holding open a
> > file"?
>
> well, no - "goes down" means "crashed or lost power"

Hm, the machine and its storage are online all the time, and the messages
occur in between, without any downtime.

> well, as the commit said, it'd be nice to handle it in remount, yes... :(

If my daily remounts are causing this, it's unfortunate. But it's nice to
know that now. It'd be more worrying if something else were slowly
corrupting the fs.

> well, seems like you need to get to the root cause of the unprocessed
> orphan inodes.
>
> I don't yet have my post-vacation thinking cap back on... does cycling
> rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?

This is almost all I do on this fs. The whole process is:

1) fs is ro most of the time, while a remote application accesses it via a
   readonly nfs mount.
2) once a day the fs gets remounted rw (the remote application does not
   know this and is still accessing the fs via the same ro-nfs mount)
3) backups are being pushed to the fs (via rsync, using hardlinks a lot)
4) fs is remounted ro again
5) at some point the remote application notices that the nfs mount went
   stale and has to remount its readonly nfs-mount

Thanks,
Christian.
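[Editor's note: the daily cycle in steps 1-5 above boils down to something like the following sketch. The paths and the rsync source are invented placeholders, and DRY_RUN defaults to on so the commands are only printed.]

```shell
#!/bin/sh
# Sketch of the daily rw/ro remount cycle (steps 2-4 above). The fs stays
# exported read-only via NFS the whole time, which is what eventually makes
# the remote client's mount go stale (step 5).
MNT=${MNT:-/srv/backup}          # hypothetical mount point
SRC=${SRC:-remotehost:/data}     # hypothetical rsync source
DRY_RUN=${DRY_RUN:-1}
LOG=""

run() {
    LOG="$LOG$* ; "              # record each command for inspection
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run mount -o remount,rw "$MNT"                           # step 2
run rsync -a --link-dest="$MNT/prev" "$SRC" "$MNT/cur"   # step 3: hardlink-heavy backup
run mount -o remount,ro "$MNT"                           # step 4
```

The point of the sketch is how short the rw window is: any file unlinked around the final `remount,ro` while still held open is a candidate for the orphan list.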
On Tue 06-09-11 11:44:45, Eric Sandeen wrote:
> On 9/6/11 11:37 AM, Christian Kujau wrote:
> > On Tue, 6 Sep 2011 at 11:17, Eric Sandeen wrote:
> >> It's probably not a bug or flaw; orphan inodes can occur for legitimate
> >> reasons (fs goes down while someone is holding open an unlinked file),
> >
> > The filesystem is being constantly accessed by an application, holding at
> > least one file open (readonly). And then there is this mechanism trying to
> > remount the filesystem rw and then ro again every day. I guess this equals
> > the scenario of "fs goes down (remount!) while someone is holding open a
> > file"?
>
> well, no - "goes down" means "crashed or lost power"
>
> >> Did you happen to also get a message like this on the original mount?
> >>         ext4_msg(sb, KERN_ERR, "write access "
> >>                  "unavailable, skipping orphan cleanup");
> >
> > I think I've seen this message before, but I'm not sure where, and it's
> > not in the logs of this particular system.
> >
> >> See also commit:
> >>
> >> commit ead6596b9e776ac32d82f7d1931d7638e6d4a7bd
> >> Author: Eric Sandeen <sandeen@redhat.com>
> >> Date:   Sat Feb 10 01:46:08 2007 -0800
> >>
> >>     [PATCH] ext4: refuse ro to rw remount of fs with orphan inodes
> >
> > Yes, I've seen this commit when I was searching for where this message came
> > from. And I think I understand now why this is happening, but
> > still... if I may ask: can't this be handled more elegantly? Do other
> > filesystems have the same problem?
>
> well, as the commit said, it'd be nice to handle it in remount, yes... :(
>
> > Right now the procedure is to pause the application, stop the NFS exports,
> > unmount, fsck, mount, start the NFS exports and resume the application. And
> > every few days/weeks this has to be repeated, "just because" these daily
> > remounts occur (which are the main reason for this, I suppose).
>
> well, seems like you need to get to the root cause of the unprocessed
> orphan inodes.
>
> I don't yet have my post-vacation thinking cap back on... does cycling
> rw/ro/rw/ro with open & unlinked files cause an orphan inode situation?

There's a race where the VFS remount code can race with unlink, and the
result will be an unlinked file on the orphan list of a read-only
filesystem. Christian seems to be hitting this race. Miklos Szeredi has
patches (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html)
to mostly close this hole, but they're waiting for Al to find time to
look at them / merge them AFAIK.

								Honza
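[Editor's note: the interaction Jan describes can be sketched as a toy state machine in plain shell. This is not kernel code; the variable names and the refusal message are only modeled on the real ones. A file unlinked while still open sits on the on-disk orphan list until the last close; if a ro remount slips in first, cleanup never runs, and the next rw remount is refused, as in commit ead6596b9e77.]

```shell
#!/bin/sh
# Toy model of the remount-vs-unlink race discussed above -- not kernel code.
orphans=0      # simulated count of entries on the on-disk orphan list
mode=ro
result=""

unlink_while_open() {
    # An unlinked-but-still-open file stays on the orphan list until the
    # last close; if the fs goes ro first, the entry is never processed.
    orphans=$((orphans + 1))
}

remount() {
    if [ "$1" = rw ] && [ "$orphans" -gt 0 ]; then
        result="refused"
        echo "couldn't remount RDWR: unprocessed orphan inode list"
        return 1
    fi
    mode=$1
    result="ok"
}

remount rw
unlink_while_open        # race window: unlink slips in around the remount
remount ro               # fs goes ro before the orphan entry is processed
remount rw || true       # now refused, mirroring the message in the logs
```

The sequence at the bottom reproduces, in miniature, exactly the state the thread is debugging: one stale orphan entry, a read-only fs, and a refused rw remount.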
On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> There's a race where the VFS remount code can race with unlink, and the
> result will be an unlinked file on the orphan list of a read-only
> filesystem. Christian seems to be hitting this race. Miklos Szeredi has
> patches (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html)
> to mostly close this hole, but they're waiting for Al to find time to
> look at them / merge them AFAIK.

While these patches are still pending review, are they "dangerous" to
apply? If not, I'd like to volunteer as a tester :-)

Christian.
On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > There's a race where the VFS remount code can race with unlink, and the
> > result will be an unlinked file on the orphan list of a read-only
> > filesystem. Christian seems to be hitting this race. Miklos Szeredi has
> > patches (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html)
> > to mostly close this hole, but they're waiting for Al to find time to
> > look at them / merge them AFAIK.
>
> While these patches are still pending review, are they "dangerous" to
> apply? If not, I'd like to volunteer as a tester :-)
As far as I saw them, they should be pretty safe. So feel free to test
them.

								Honza
On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > There's a race where the VFS remount code can race with unlink, and the
> > > result will be an unlinked file on the orphan list of a read-only
> > > filesystem. Christian seems to be hitting this race. Miklos Szeredi has
> > > patches (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html)
> > > to mostly close this hole, but they're waiting for Al to find time to
> > > look at them / merge them AFAIK.
> >
> > While these patches are still pending review, are they "dangerous" to
> > apply? If not, I'd like to volunteer as a tester :-)
> As far as I saw them, they should be pretty safe. So feel free to test
> them.

I've applied them to -rc5. It might take a few days until the message
occurs. Or, until "nothing happens", since I have the patches applied :-)

Meanwhile I'm trying to reproduce this issue on an x86 machine, but
haven't succeeded yet.

Thanks,
Christian.
On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
> On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > > There's a race where the VFS remount code can race with unlink, and the
> > > > result will be an unlinked file on the orphan list of a read-only
> > > > filesystem. Christian seems to be hitting this race. Miklos Szeredi has
> > > > patches (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html)
> > > > to mostly close this hole, but they're waiting for Al to find time to
> > > > look at them / merge them AFAIK.
> > >
> > > While these patches are still pending review, are they "dangerous" to
> > > apply? If not, I'd like to volunteer as a tester :-)
> > As far as I saw them, they should be pretty safe. So feel free to test
> > them.
>
> I've applied them to -rc5. It might take a few days until the message
> occurs. Or, until "nothing happens", since I have the patches applied :-)

With Miklos' patches applied to -rc5, this happened again just now :-(

> Meanwhile I'm trying to reproduce this issue on an x86 machine, but
> haven't succeeded yet.

After ~3k remounts with constant reading from the filesystem in
question[0], I still was NOT able to reproduce this on an x86 VM :(

Any ideas?

Thanks,
Christian.

[0] http://nerdbynature.de/bits/3.1-rc4/ext4/
On Fri, Sep 16, 2011 at 6:49 AM, Christian Kujau <lists@nerdbynature.de> wrote:
> On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
>> On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
>> > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
>> > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
>> > > > There's a race where the VFS remount code can race with unlink, and
>> > > > the result will be an unlinked file on the orphan list of a read-only
>> > > > filesystem. Christian seems to be hitting this race. Miklos Szeredi
>> > > > has patches
>> > > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html) to
>> > > > mostly close this hole, but they're waiting for Al to find time to
>> > > > look at them / merge them AFAIK.
>> > >
>> > > While these patches are still pending review, are they "dangerous" to
>> > > apply? If not, I'd like to volunteer as a tester :-)
>> > As far as I saw them, they should be pretty safe. So feel free to test
>> > them.
>>
>> I've applied them to -rc5. It might take a few days until the message
>> occurs. Or, until "nothing happens", since I have the patches applied :-)
>
> With Miklos' patches applied to -rc5, this happened again just now :-(
>
>> Meanwhile I'm trying to reproduce this issue on an x86 machine, but
>> haven't succeeded yet.
>
> After ~3k remounts with constant reading from the filesystem in
> question[0], I still was NOT able to reproduce this on an x86 VM :(
>
> Any ideas?
>

This is just a shot in the dark, but are you using Ubuntu on your
production machine by any chance?

The reason I am asking is because I have been getting failures to
unmount the fs while running xfstests on ext4 with Ubuntu for a long
time, and nobody else seems to share this problem.
I always suspected Ubuntu has some service that keeps open handles on
mounted fs, but I never got to examine this.

Amir.
On Fri, 16 Sep 2011 at 15:04, Amir Goldstein wrote:
> This is just a shot in the dark, but are you using Ubuntu on your
> production machine by any chance?

No, I'm using Debian/stable on the "production" machine (the powerpc32
box where the error occurs). I was trying to reproduce this in an x86
Ubuntu/10.04 VM, but could not.

> The reason I am asking is because I have been getting failures to
> unmount the fs while running xfstests on ext4 with Ubuntu for a long
> time, and nobody else seems to share this problem.

Is there a bug open for that?

> I always suspected Ubuntu has some service that keeps open handles on
> mounted fs, but I never got to examine this.

Yeah... their "server" version needs major surgery to disable all the
bells and whistles before it becomes usable.

Christian.
On Fri, Sep 16, 2011 at 3:17 PM, Christian Kujau <lists@nerdbynature.de> wrote:
> On Fri, 16 Sep 2011 at 15:04, Amir Goldstein wrote:
>> This is just a shot in the dark, but are you using Ubuntu on your
>> production machine by any chance?
>
> No, I'm using Debian/stable on the "production" machine (the powerpc32
> box where the error occurs). I was trying to reproduce this in an x86
> Ubuntu/10.04 VM, but could not.

Actually, now I recall that Yongqiang did say he saw the same problem
on Debian, but I may be wrong.

>> The reason I am asking is because I have been getting failures to
>> unmount the fs while running xfstests on ext4 with Ubuntu for a long
>> time, and nobody else seems to share this problem.
>
> Is there a bug open for that?
>

No, I couldn't find any trace of bug reports on this behavior, so I
wrote it off as a misconfiguration of my server. I did write to the xfs
list to ask if anyone else has seen this problem.
You could try to run xfstests on your server and see if the problem is
reproducible.

>> I always suspected Ubuntu has some service that keeps open handles on
>> mounted fs, but I never got to examine this.
>
> Yeah... their "server" version needs major surgery to disable all the
> bells and whistles before it becomes usable.
>

And I installed the "desktop" version, so where does this leave me...

Amir.
On Thu 15-09-11 20:49:19, Christian Kujau wrote:
> On Mon, 12 Sep 2011 at 21:52, Christian Kujau wrote:
> > On Sat, 10 Sep 2011 at 22:04, Jan Kara wrote:
> > > On Fri 09-09-11 18:11:26, Christian Kujau wrote:
> > > > On Thu, 8 Sep 2011 at 20:51, Jan Kara wrote:
> > > > > There's a race where the VFS remount code can race with unlink, and
> > > > > the result will be an unlinked file on the orphan list of a
> > > > > read-only filesystem. Christian seems to be hitting this race.
> > > > > Miklos Szeredi has patches
> > > > > (http://lkml.indiana.edu/hypermail/linux/kernel/1108.3/00169.html)
> > > > > to mostly close this hole, but they're waiting for Al to find time
> > > > > to look at them / merge them AFAIK.
> > > >
> > > > While these patches are still pending review, are they "dangerous" to
> > > > apply? If not, I'd like to volunteer as a tester :-)
> > > As far as I saw them, they should be pretty safe. So feel free to test
> > > them.
> >
> > I've applied them to -rc5. It might take a few days until the message
> > occurs. Or, until "nothing happens", since I have the patches applied :-)
>
> With Miklos' patches applied to -rc5, this happened again just now :-(

Thanks for careful testing! Hmm, since you are able to reproduce on ppc
but not on x86, there might be some memory ordering bug in Miklos' patches,
or it's simply because of different timing. Miklos, care to debug this
further?

> > Meanwhile I'm trying to reproduce this issue on an x86 machine, but
> > haven't succeeded yet.
>
> After ~3k remounts with constant reading from the filesystem in
> question[0], I still was NOT able to reproduce this on an x86 VM :(
>
> Any ideas?

								Honza
On Wed, 5 Oct 2011 at 20:03, Jan Kara wrote:
>> With Miklos' patches applied to -rc5, this happened again just now :-(
>>
> Thanks for careful testing! Hmm, since you are able to reproduce on ppc
> but not on x86, there might be some memory ordering bug in Miklos' patches,
> or it's simply because of different timing. Miklos, care to debug this
> further?

Just to be clear: I'm still not entirely sure how to reproduce this at
will. I *assumed* that it was the daily remount-rw-and-ro-again routine
that left some inodes in limbo and eventually led to those "unprocessed
orphan inodes". With that in mind I tried to reproduce this with the help
of a test script (test-remount.sh, [0]) - but the message did not occur
while the script was running.

I ran the script again today on the said powerpc machine on a
loop-mounted 500 MB ext4 partition. But even after 100 iterations no
such message occurred.

So maybe it's caused by something else, or my test script just doesn't get
the scenario right and there's something subtle to this whole
remounting business I haven't figured out yet, leading to those orphan
inodes.

I'm at 3.1.0-rc9 now and will wait until the errors occur again.

Christian.

[0] nerdbynature.de/bits/3.1-rc4/ext4/
(2011/10/06 10:34), Christian Kujau wrote:
> On Wed, 5 Oct 2011 at 20:03, Jan Kara wrote:
>>> With Miklos' patches applied to -rc5, this happened again just now :-(
>>>
>> Thanks for careful testing! Hmm, since you are able to reproduce on ppc
>> but not on x86, there might be some memory ordering bug in Miklos' patches,
>> or it's simply because of different timing. Miklos, care to debug this
>> further?
>
> Just to be clear: I'm still not entirely sure how to reproduce this at
> will. I *assumed* that it was the daily remount-rw-and-ro-again routine
> that left some inodes in limbo and eventually led to those "unprocessed
> orphan inodes". With that in mind I tried to reproduce this with the help
> of a test script (test-remount.sh, [0]) - but the message did not occur
> while the script was running.
>
> I ran the script again today on the said powerpc machine on a
> loop-mounted 500 MB ext4 partition. But even after 100 iterations no
> such message occurred.
>
> So maybe it's caused by something else, or my test script just doesn't get
> the scenario right and there's something subtle to this whole
> remounting business I haven't figured out yet, leading to those orphan
> inodes.
>
> I'm at 3.1.0-rc9 now and will wait until the errors occur again.
>
> Christian.
>
> [0] nerdbynature.de/bits/3.1-rc4/ext4/

With Miklos' patches applied to -rc8, I could trigger
"Couldn't remount RDWR because of unprocessed orphan inode list"
on my x86_64 machine with my reproducer.

This is because the actual removal can happen outside the range between
mnt_want_write() and mnt_drop_write(), even though do_unlinkat() and
do_rmdir() call mnt_want_write() and mnt_drop_write() to prevent the
filesystem from being remounted read-only.
My reproducer is as follows:
-----------------------------------------------------------------------------
[1] go.sh
#!/bin/sh

dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1000k > /dev/null 2>&1
/sbin/mkfs.ext4 -Fq /tmp/img
mount -o loop /tmp/img /mnt
./writer.sh /mnt &
LOOP=1000000000
for ((i=0; i<LOOP; i++)); do
	echo "[$i]"
	if ((i%2 == 0)); then
		mount -o ro,remount,loop /mnt
	else
		mount -o rw,remount,loop /mnt
	fi
	sleep 1
done

[2] writer.sh
#!/bin/sh

dir=$1
for ((i=0; i<10000000; i++)); do
	for ((j=0; j<64; j++)); do
		filename="$dir/file$((i*64 + j))"
		dd if=/dev/zero of=$filename bs=1k count=8 > /dev/null 2>&1 &
	done
	for ((j=0; j<64; j++)); do
		filename="$dir/file$((i*64 + j))"
		rm -f $filename > /dev/null 2>&1 &
	done
	wait
	if ((i%100 == 0 && i > 0)); then
		rm -f $dir/file*
	fi
done
exit

[step to run]
# ./go.sh
-----------------------------------------------------------------------------

Therefore, we need a mechanism to prevent a filesystem from being
remounted read-only until the actual removal finishes.

------------------------------------------------------------------------
[example fix]
do_unlinkat()
{
	...
	mnt_want_write()
	vfs_unlink()
	if (inode && inode->i_nlink == 0) {               //
		atomic_inc(&inode->i_sb->s_unlink_count); //
		inode->i_deleting++;                      //
	}                                                 //
	mnt_drop_write()                                  //
	...
	iput() // usually, the actual removal starts here
	...
}

destroy_inode()
{
	...
	if (inode->i_deleting)
		atomic_dec(&inode->i_sb->s_unlink_count);
	...
}

do_remount_sb()
{
	...
	else if (!fs_may_remount_ro(sb) ||
		 atomic_read(&sb->s_unlink_count))
		return -EBUSY;
	...
}
------------------------------------------------------------------------

Besides, my reproducer also detects the following message:
"EXT4-fs (xxx): ext4_da_writepages: jbd2_start: xxx pages, ino xx: err -30"
This is because ext4_remount() cannot guarantee that all ext4 filesystem
data is written out, due to the delayed allocation feature.
(ext4_da_writepages() fails after ext4_remount() sets MS_RDONLY in
sb->s_flags.)

Therefore, we must write all delayed allocation buffers out before
ext4_remount() sets MS_RDONLY in sb->s_flags.

------------------------------------------------------------------------
[example fix] // This requires Miklos' patches.
ext4_remount()
{
	...
	if (*flags & MS_RDONLY) {
		err = dquot_suspend(sb, -1);
		if (err < 0)
			goto restore_opts;
		sync_filesystem(sb); // write all delayed buffers out
		sb->s_flags |= MS_RDONLY;
		...
	}
------------------------------------------------------------------------

Best Regards,
Toshiyuki Okajima
On Thu, 2011-10-06 at 19:12 +0900, Toshiyuki Okajima wrote:
> With Miklos' patches applied to -rc8, I could trigger
> "Couldn't remount RDWR because of unprocessed orphan inode list"
> on my x86_64 machine with my reproducer.
>
> This is because the actual removal can happen outside the range between
> mnt_want_write() and mnt_drop_write(), even though do_unlinkat() and
> do_rmdir() call mnt_want_write() and mnt_drop_write() to prevent the
> filesystem from being remounted read-only.

Thanks for the reproducer. I'm looking at this now...

Miklos
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 486a641..463b52b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2419,6 +2419,22 @@ static int ext4_remount (struct super_block * sb, int * flags, char * data)
 			err = -EROFS;
 			goto restore_opts;
 		}
+
+		/*
+		 * If we have an unprocessed orphan list hanging
+		 * around from a previously readonly bdev mount,
+		 * require a full umount/remount for now.
+		 */
+		if (es->s_last_orphan) {
+			printk(KERN_WARNING "EXT4-fs: %s: couldn't "
+			       "remount RDWR because of unprocessed "
+			       "orphan inode list. Please "
+			       "umount/remount instead.\n",
+			       sb->s_id);
+			err = -EINVAL;
+			goto restore_opts;
+		}
+
 		/*
 		 * Mounting a RDONLY partition read-write, so reread
 		 * and store the current valid flag. (It may have