diff mbox

3.8.0-rc1: WARNING: at fs/ext4/page-io.c:232

Message ID 20121227062907.GA5001@gmail.com
State Superseded, archived

Commit Message

Zheng Liu Dec. 27, 2012, 6:29 a.m. UTC
On Thu, Dec 27, 2012 at 03:27:04AM +0300, Alexander Beregalov wrote:
> Hello
> 
> Let me know if you need more info
> 
> EXT4-fs (sda2): INFO: recovery required on readonly filesystem
> EXT4-fs (sda2): write access will be enabled during recovery
> EXT4-fs (sda2): orphan cleanup on readonly fs
> EXT4-fs (sda2): ext4_orphan_cleanup: truncating inode 841849 to 0 bytes
> ------------[ cut here ]------------
> WARNING: at fs/ext4/page-io.c:232 ext4_flush_unwritten_io+0x6b/0x80()
> Hardware name: P35-DS3
> Modules linked in:
> Pid: 1, comm: swapper/0 Not tainted 3.8.0-rc1-00004-g637704c #1
> Call Trace:
>  [<ffffffff81038f5a>] warn_slowpath_common+0x7a/0xb0
>  [<ffffffff81038fa5>] warn_slowpath_null+0x15/0x20
>  [<ffffffff81163b8b>] ext4_flush_unwritten_io+0x6b/0x80
>  [<ffffffff8117ac5c>] ext4_ext_truncate+0x2c/0x1f0
>  [<ffffffff8116c6d0>] ? ext4_msg+0x50/0x60
>  [<ffffffff8115e050>] ext4_truncate+0x70/0xb0
>  [<ffffffff8117190b>] ext4_fill_super+0x2bab/0x2ce0
>  [<ffffffff810ef02a>] mount_bdev+0x1aa/0x1f0
>  [<ffffffff8102aec9>] ? default_spin_lock_flags+0x9/0x10
>  [<ffffffff8116ed60>] ? ext4_calculate_overhead+0x3a0/0x3a0
>  [<ffffffff8116a870>] ext4_mount+0x10/0x20
>  [<ffffffff810ef96b>] mount_fs+0x1b/0xd0
>  [<ffffffff81108fc1>] vfs_kern_mount+0x71/0x110
>  [<ffffffff8110b096>] do_mount+0x386/0x980
>  [<ffffffff810b7f43>] ? strndup_user+0x53/0x70
>  [<ffffffff8110b71b>] sys_mount+0x8b/0xe0
>  [<ffffffff816afcc8>] mount_block_root+0xfe/0x298
>  [<ffffffff816afeb8>] mount_root+0x56/0x5a
>  [<ffffffff816afff0>] prepare_namespace+0x134/0x16d
>  [<ffffffff8147c9c6>] kernel_init+0x196/0x2a0
>  [<ffffffff816af539>] ? loglevel+0x31/0x31
>  [<ffffffff8147c830>] ? rest_init+0x80/0x80
>  [<ffffffff81482a7c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8147c830>] ? rest_init+0x80/0x80
> ---[ end trace 425942f4f0ed8d07 ]---
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 835709
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 835629
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 682715
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 832545
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 677529
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 838885
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 676342
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 832311
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 683216
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 828057
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 847476
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 834769
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 846534
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 842096
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 833886
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 688996
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 1085523
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 524364
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 12686
> EXT4-fs (sda2): 19 orphan inodes deleted
> EXT4-fs (sda2): 1 truncate cleaned up
> EXT4-fs (sda2): recovery complete
> EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)

Hi Alexander,

This warning comes from ext4_flush_unwritten_io(), which expects its
caller to hold the i_mutex lock and triggers a WARN_ON_ONCE()
otherwise.  Unfortunately, we don't take this lock in
ext4_orphan_cleanup(), which is why the warning shows up when the
orphan list is cleaned up.  Could you please test this patch?

Thanks,
                                        - Zheng

Subject: [PATCH] ext4: fixup a warning from ext4_flush_unwritten_io() in orphan list cleanup

From: Zheng Liu <wenqing.lz@taobao.com>

When ext4 tries to clean up the orphan list, we get the following warning from
ext4_flush_unwritten_io() because the i_mutex lock is not taken.

EXT4-fs (sda2): INFO: recovery required on readonly filesystem
EXT4-fs (sda2): write access will be enabled during recovery
EXT4-fs (sda2): orphan cleanup on readonly fs
EXT4-fs (sda2): ext4_orphan_cleanup: truncating inode 841849 to 0 bytes
------------[ cut here ]------------
WARNING: at fs/ext4/page-io.c:232 ext4_flush_unwritten_io+0x6b/0x80()
Hardware name: P35-DS3
Modules linked in:
Pid: 1, comm: swapper/0 Not tainted 3.8.0-rc1-00004-g637704c #1
Call Trace:
 [<ffffffff81038f5a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81038fa5>] warn_slowpath_null+0x15/0x20
 [<ffffffff81163b8b>] ext4_flush_unwritten_io+0x6b/0x80
 [<ffffffff8117ac5c>] ext4_ext_truncate+0x2c/0x1f0
 [<ffffffff8116c6d0>] ? ext4_msg+0x50/0x60
 [<ffffffff8115e050>] ext4_truncate+0x70/0xb0
 [<ffffffff8117190b>] ext4_fill_super+0x2bab/0x2ce0
 [<ffffffff810ef02a>] mount_bdev+0x1aa/0x1f0
 [<ffffffff8102aec9>] ? default_spin_lock_flags+0x9/0x10
 [<ffffffff8116ed60>] ? ext4_calculate_overhead+0x3a0/0x3a0
 [<ffffffff8116a870>] ext4_mount+0x10/0x20
 [<ffffffff810ef96b>] mount_fs+0x1b/0xd0
 [<ffffffff81108fc1>] vfs_kern_mount+0x71/0x110
 [<ffffffff8110b096>] do_mount+0x386/0x980
 [<ffffffff810b7f43>] ? strndup_user+0x53/0x70
 [<ffffffff8110b71b>] sys_mount+0x8b/0xe0
 [<ffffffff816afcc8>] mount_block_root+0xfe/0x298
 [<ffffffff816afeb8>] mount_root+0x56/0x5a
 [<ffffffff816afff0>] prepare_namespace+0x134/0x16d
 [<ffffffff8147c9c6>] kernel_init+0x196/0x2a0
 [<ffffffff816af539>] ? loglevel+0x31/0x31
 [<ffffffff8147c830>] ? rest_init+0x80/0x80
 [<ffffffff81482a7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8147c830>] ? rest_init+0x80/0x80
---[ end trace 425942f4f0ed8d07 ]---
EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 835709
EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 835629

Now we take the i_mutex lock around the truncate in orphan list cleanup, even
though it is not strictly needed in ext4_orphan_cleanup() because no one else
can write to this inode.  The WARN_ON_ONCE() is kept because this warning can
help us catch some critical errors.

CC: Dmitry Monakhov <dmonakhov@openvz.org>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 fs/ext4/super.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Dmitry Monakhov Dec. 27, 2012, 8:04 a.m. UTC | #1
On Thu, 27 Dec 2012 14:29:07 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> On Thu, Dec 27, 2012 at 03:27:04AM +0300, Alexander Beregalov wrote:
> > Hello
> > 
> > Let me know if you need more info
> > 
> > EXT4-fs (sda2): INFO: recovery required on readonly filesystem
> > EXT4-fs (sda2): write access will be enabled during recovery
> > EXT4-fs (sda2): orphan cleanup on readonly fs
> > EXT4-fs (sda2): ext4_orphan_cleanup: truncating inode 841849 to 0 bytes
> > ------------[ cut here ]------------
> > WARNING: at fs/ext4/page-io.c:232 ext4_flush_unwritten_io+0x6b/0x80()
> > Hardware name: P35-DS3
> > Modules linked in:
> > Pid: 1, comm: swapper/0 Not tainted 3.8.0-rc1-00004-g637704c #1
> > Call Trace:
> >  [<ffffffff81038f5a>] warn_slowpath_common+0x7a/0xb0
> >  [<ffffffff81038fa5>] warn_slowpath_null+0x15/0x20
> >  [<ffffffff81163b8b>] ext4_flush_unwritten_io+0x6b/0x80
> >  [<ffffffff8117ac5c>] ext4_ext_truncate+0x2c/0x1f0
> >  [<ffffffff8116c6d0>] ? ext4_msg+0x50/0x60
> >  [<ffffffff8115e050>] ext4_truncate+0x70/0xb0
> >  [<ffffffff8117190b>] ext4_fill_super+0x2bab/0x2ce0
> >  [<ffffffff810ef02a>] mount_bdev+0x1aa/0x1f0
> >  [<ffffffff8102aec9>] ? default_spin_lock_flags+0x9/0x10
> >  [<ffffffff8116ed60>] ? ext4_calculate_overhead+0x3a0/0x3a0
> >  [<ffffffff8116a870>] ext4_mount+0x10/0x20
> >  [<ffffffff810ef96b>] mount_fs+0x1b/0xd0
> >  [<ffffffff81108fc1>] vfs_kern_mount+0x71/0x110
> >  [<ffffffff8110b096>] do_mount+0x386/0x980
> >  [<ffffffff810b7f43>] ? strndup_user+0x53/0x70
> >  [<ffffffff8110b71b>] sys_mount+0x8b/0xe0
> >  [<ffffffff816afcc8>] mount_block_root+0xfe/0x298
> >  [<ffffffff816afeb8>] mount_root+0x56/0x5a
> >  [<ffffffff816afff0>] prepare_namespace+0x134/0x16d
> >  [<ffffffff8147c9c6>] kernel_init+0x196/0x2a0
> >  [<ffffffff816af539>] ? loglevel+0x31/0x31
> >  [<ffffffff8147c830>] ? rest_init+0x80/0x80
> >  [<ffffffff81482a7c>] ret_from_fork+0x7c/0xb0
> >  [<ffffffff8147c830>] ? rest_init+0x80/0x80
> > ---[ end trace 425942f4f0ed8d07 ]---
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 835709
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 835629
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 682715
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 832545
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 677529
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 838885
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 676342
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 832311
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 683216
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 828057
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 847476
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 834769
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 846534
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 842096
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 833886
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 688996
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 1085523
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 524364
> > EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 12686
> > EXT4-fs (sda2): 19 orphan inodes deleted
> > EXT4-fs (sda2): 1 truncate cleaned up
> > EXT4-fs (sda2): recovery complete
> > EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
> 
> Hi Alexander,
> 
> This warning comes from ext4_flush_unwritten_io(), which expects its
> caller to hold the i_mutex lock and triggers a WARN_ON_ONCE()
> otherwise.  Unfortunately, we don't take this lock in
> ext4_orphan_cleanup(), which is why the warning shows up when the
> orphan list is cleaned up.  Could you please test this patch?
> 
> Thanks,
>                                         - Zheng
> 
> Subject: [PATCH] ext4: fixup a warning from ext4_flush_unwritten_io() in orphan list cleanup
> 
> From: Zheng Liu <wenqing.lz@taobao.com>
> 
> When ext4 tries to clean up the orphan list, we get the following warning from
> ext4_flush_unwritten_io() because the i_mutex lock is not taken.
> 
> EXT4-fs (sda2): INFO: recovery required on readonly filesystem
> EXT4-fs (sda2): write access will be enabled during recovery
> EXT4-fs (sda2): orphan cleanup on readonly fs
> EXT4-fs (sda2): ext4_orphan_cleanup: truncating inode 841849 to 0 bytes
> ------------[ cut here ]------------
> WARNING: at fs/ext4/page-io.c:232 ext4_flush_unwritten_io+0x6b/0x80()
> Hardware name: P35-DS3
> Modules linked in:
> Pid: 1, comm: swapper/0 Not tainted 3.8.0-rc1-00004-g637704c #1
> Call Trace:
>  [<ffffffff81038f5a>] warn_slowpath_common+0x7a/0xb0
>  [<ffffffff81038fa5>] warn_slowpath_null+0x15/0x20
>  [<ffffffff81163b8b>] ext4_flush_unwritten_io+0x6b/0x80
>  [<ffffffff8117ac5c>] ext4_ext_truncate+0x2c/0x1f0
>  [<ffffffff8116c6d0>] ? ext4_msg+0x50/0x60
>  [<ffffffff8115e050>] ext4_truncate+0x70/0xb0
>  [<ffffffff8117190b>] ext4_fill_super+0x2bab/0x2ce0
>  [<ffffffff810ef02a>] mount_bdev+0x1aa/0x1f0
>  [<ffffffff8102aec9>] ? default_spin_lock_flags+0x9/0x10
>  [<ffffffff8116ed60>] ? ext4_calculate_overhead+0x3a0/0x3a0
>  [<ffffffff8116a870>] ext4_mount+0x10/0x20
>  [<ffffffff810ef96b>] mount_fs+0x1b/0xd0
>  [<ffffffff81108fc1>] vfs_kern_mount+0x71/0x110
>  [<ffffffff8110b096>] do_mount+0x386/0x980
>  [<ffffffff810b7f43>] ? strndup_user+0x53/0x70
>  [<ffffffff8110b71b>] sys_mount+0x8b/0xe0
>  [<ffffffff816afcc8>] mount_block_root+0xfe/0x298
>  [<ffffffff816afeb8>] mount_root+0x56/0x5a
>  [<ffffffff816afff0>] prepare_namespace+0x134/0x16d
>  [<ffffffff8147c9c6>] kernel_init+0x196/0x2a0
>  [<ffffffff816af539>] ? loglevel+0x31/0x31
>  [<ffffffff8147c830>] ? rest_init+0x80/0x80
>  [<ffffffff81482a7c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8147c830>] ? rest_init+0x80/0x80
> ---[ end trace 425942f4f0ed8d07 ]---
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 835709
> EXT4-fs (sda2): ext4_orphan_cleanup: deleting unreferenced inode 835629
> 
> Now we take the i_mutex lock around the truncate in orphan list cleanup, even
> though it is not strictly needed in ext4_orphan_cleanup() because no one else
> can write to this inode.  The WARN_ON_ONCE() is kept because this warning can
> help us catch some critical errors.
You can add  Acked-by: Dmitry Monakhov <dmonakhov@openvz.org>

In fact it is my fault that we still don't have an automated test for this.
I'm thinking of adding a crash test to xfstests which triggers a journal abort
and a forced umount.  A later step would then mount the FS again, which triggers
journal_replay and orphan_cleanup.
> 
> CC: Dmitry Monakhov <dmonakhov@openvz.org>
> Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> ---
>  fs/ext4/super.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 3cdb0a2..188d6f1 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2212,7 +2212,18 @@ static void ext4_orphan_cleanup(struct super_block *sb,
>  				__func__, inode->i_ino, inode->i_size);
>  			jbd_debug(2, "truncating inode %lu to %lld bytes\n",
>  				  inode->i_ino, inode->i_size);
> +			/*
> +			 * Actually we don't need to take i_mutex lock
> +			 * because in orphan list cleanup no one can write
> +			 * this inode.  We take it here because in calling
> +			 * ext4_flush_unwritten_io() this lock needs to be
> +			 * taken, and we don't want to remove this
> +			 * WARN_ON_ONCE().  It is useful for us to avoid some
> +			 * critical errors.
> +			 */
> +			mutex_lock(&inode->i_mutex);
>  			ext4_truncate(inode);
> +			mutex_unlock(&inode->i_mutex);
>  			nr_truncates++;
>  		} else {
>  			ext4_msg(sb, KERN_DEBUG,
> -- 
> 1.7.12.rc2.18.g61b472e
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu Dec. 27, 2012, 10:33 a.m. UTC | #2
On Thu, Dec 27, 2012 at 12:04:36PM +0400, Dmitry Monakhov wrote:
> On Thu, 27 Dec 2012 14:29:07 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > On Thu, Dec 27, 2012 at 03:27:04AM +0300, Alexander Beregalov wrote:
[cut...]
> > Now we take the i_mutex lock around the truncate in orphan list cleanup, even
> > though it is not strictly needed in ext4_orphan_cleanup() because no one else
> > can write to this inode.  The WARN_ON_ONCE() is kept because this warning can
> > help us catch some critical errors.
> You can add  Acked-by: Dmitry Monakhov <dmonakhov@openvz.org>
> 
> In fact it is my fault that we still don't have an automated test for this.
> I'm thinking of adding a crash test to xfstests which triggers a journal abort
> and a forced umount.  A later step would then mount the FS again, which triggers
> journal_replay and orphan_cleanup.

Cool!  That would be great if this test case can be added in xfstests.
:-)

Regards,
                                                - Zheng
Theodore Ts'o Dec. 27, 2012, 1:44 p.m. UTC | #3
On Thu, Dec 27, 2012 at 12:04:36PM +0400, Dmitry Monakhov wrote:
> In fact it is my fault that we still don't have an automated test for
> this.  I'm thinking of adding a crash test to xfstests which triggers a
> journal abort and a forced umount.  A later step would then mount the FS
> again, which triggers journal_replay and orphan_cleanup.

We could create some tests in xfstests which force a crash via
"echo b > /proc/sysrq-trigger", but the trick is that this would
require xfstests to install something in the /etc/rc scripts so
xfstests could resume right after the system came back --- and perhaps
to echo something to the console which automated test runners (such as
the one I use, which I've published at [1]) could capture so they would
know that they should restart the system.

[1] git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git

For now the simplest way to test this is to use the file system image
in tests/f_orphan_extents_inode/image.gz, and make this be an
ext4-specific test.  This is how I tested it when I created my fix (in
parallel with Zheng's patch).  The compressed file system image is
only 564 bytes --- and was made deliberately w/o a journal so it could
be that small --- and the lack of a journal was how I found the
infinite loop problem which was fixed in the 2/2 patch in my patches.
So including this compressed fs image in xfstests is probably the way
I would suggest for now.

							- Ted
Dave Chinner Dec. 29, 2012, 12:21 a.m. UTC | #4
On Thu, Dec 27, 2012 at 08:44:13AM -0500, Theodore Ts'o wrote:
> On Thu, Dec 27, 2012 at 12:04:36PM +0400, Dmitry Monakhov wrote:
> > In fact it is my fault that we still don't have an automated test for
> > this.  I'm thinking of adding a crash test to xfstests which triggers a
> > journal abort and a forced umount.  A later step would then mount the FS
> > again, which triggers journal_replay and orphan_cleanup.
> 
> We could create some tests in xfstests which force a crash via
> "echo b > /proc/sysrq-trigger", but the trick is that this would
> require xfstests to install something in the /etc/rc scripts so
> xfstests could resume right after the system came back --- and perhaps
> to echo something to the console which automated test runners (such as
> the one I use, which I've published at [1]) could capture so they would
> know that they should restart the system.
> 
> [1] git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git
> 
> For now the simplest way to test this is to use the file system image
> in tests/f_orphan_extents_inode/image.gz, and make this be an
> ext4-specific test.  This is how I tested it when I created my fix (in
> parallel with Zheng's patch).  The compressed file system image is
> only 564 bytes --- and was made deliberately w/o a journal so it could
> be that small --- and the lack of a journal was how I found the
> infinite loop problem which was fixed in the 2/2 patch in my patches.
> So including this compressed fs image in xfstests is probably the way
> I would suggest for now.

Just implement XFS_IOC_GOINGDOWN. That way xfstests will immediately
support shutting down the filesystem via the src/godown utility.
The default XFS behaviour is to freeze the filesystem, then do a
forced shutdown on it, though it can also just trigger shutdowns
with and without first flushing the journal.

i.e.  it sounds like test 121 is pretty much what you are describing
here...

Cheers,

Dave.
Dmitry Monakhov Dec. 29, 2012, 5:04 a.m. UTC | #5
On Sat, 29 Dec 2012 11:21:31 +1100, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Dec 27, 2012 at 08:44:13AM -0500, Theodore Ts'o wrote:
> > On Thu, Dec 27, 2012 at 12:04:36PM +0400, Dmitry Monakhov wrote:
> > > In fact it is my fault that we still don't have an automated test for
> > > this.  I'm thinking of adding a crash test to xfstests which triggers a
> > > journal abort and a forced umount.  A later step would then mount the FS
> > > again, which triggers journal_replay and orphan_cleanup.
> > 
> > We could create some tests in xfstests which force a crash via
> > "echo b > /proc/sysrq-trigger", but the trick is that this would
> > require xfstests to install something in the /etc/rc scripts so
> > xfstests could resume right after the system came back --- and perhaps
> > to echo something to the console which automated test runners (such as
> > the one I use, which I've published at [1]) could capture so they would
> > know that they should restart the system.
> > 
> > [1] git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git
> > 
> > For now the simplest way to test this is to use the file system image
> > in tests/f_orphan_extents_inode/image.gz, and make this be an
> > ext4-specific test.  This is how I tested it when I created my fix (in
> > parallel with Zheng's patch).  The compressed file system image is
> > only 564 bytes --- and was made deliberately w/o a journal so it could
> > be that small --- and the lack of a journal was how I found the
> > infinite loop problem which was fixed in the 2/2 patch in my patches.
> > So including this compressed fs image in xfstests is probably the way
> > I would suggest for now.
> 
> Just implement XFS_IOC_GOINGDOWN. That way xfstests will immediately
> support shutting down the filesystem via the src/godown utility.
> The default XFS behaviour is to freeze the filesystem, then do a
> forced shutdown on it, though it can also just trigger shutdowns
> with and without first flushing the journal.
Actually I want to emulate device failure, which would allow us to test
the following scenarios:
1) unsafe usb dongle unplug (test system survival)
2) power failure
Our 'improved' loop device (http://wiki.openvz.org/Ploop) has a
/sys/block/ploop0/make-it-fail knob which explicitly fails the block device.
Once failed, it returns EIO on all requests.  I would like to add this
feature to the generic loop device.
> 
> i.e.  it sounds like test 121 is pretty much what you are describing
> here...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner Dec. 29, 2012, 11:23 p.m. UTC | #6
[ add xfs@oss.sgi.com to cc list. ]

On Sat, Dec 29, 2012 at 09:04:49AM +0400, Dmitry Monakhov wrote:
> On Sat, 29 Dec 2012 11:21:31 +1100, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Dec 27, 2012 at 08:44:13AM -0500, Theodore Ts'o wrote:
> > > On Thu, Dec 27, 2012 at 12:04:36PM +0400, Dmitry Monakhov wrote:
> > > > In fact it is my fault that we still don't have an automated test for
> > > > this.  I'm thinking of adding a crash test to xfstests which triggers a
> > > > journal abort and a forced umount.  A later step would then mount the FS
> > > > again, which triggers journal_replay and orphan_cleanup.
> > > 
> > > We could create some tests in xfstests which force a crash via
> > > "echo b > /proc/sysrq-trigger", but the trick is that this would
> > > require xfstests to install something in the /etc/rc scripts so
> > > xfstests could resume right after the system came back --- and perhaps
> > > to echo something to the console which automated test runners (such as
> > > the one I use, which I've published at [1]) could capture so they would
> > > know that they should restart the system.
> > > 
> > > [1] git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git
> > > 
> > > For now the simplest way to test this is to use the file system image
> > > in tests/f_orphan_extents_inode/image.gz, and make this be an
> > > ext4-specific test.  This is how I tested it when I created my fix (in
> > > parallel with Zheng's patch).  The compressed file system image is
> > > only 564 bytes --- and was made deliberately w/o a journal so it could
> > > be that small --- and the lack of a journal was how I found the
> > > infinite loop problem which was fixed in the 2/2 patch in my patches.
> > > So including this compressed fs image in xfstests is probably the way
> > > I would suggest for now.
> > 
> > Just implement XFS_IOC_GOINGDOWN. That way xfstests will immediately
> > support shutting down the filesystem via the src/godown utility.
> > The default XFS behaviour is to freeze the filesystem, then do a
> > forced shutdown on it, though it can also just trigger shutdowns
> > with and without first flushing the journal.
> Actually I want to emulate device failure, which would allow us to test
> the following scenarios:
> 1) unsafe usb dongle unplug (test system survival)

This is the same as immediately returning EIO to any IO that is
started after the event, or in the case of a shutdown filesystem,
stopping any new IO from being submitted with an error.

XFS implements the latter as part of its shutdown infrastructure.
IOWs, ioctl(XFS_IOC_GOINGDOWN, XFS_FSOP_GOING_FLAGS_NOLOGFLUSH) is
exactly equivalent to pulling the plug out of the device from under
the filesystem - after the call, no new IO submission ever reaches
the disk, and IO in flight is marked as failed on completion...

As it is, just unplugging the device leads to unpredictable test
behaviour as it cannot be guaranteed to reproduce the required
filesystem state that the test requires. Hence test 121 uses
XFS_FSOP_GOING_FLAGS_LOGFLUSH, which means the log is completely
written on disk before the shutdown is initiated. This ensures that
recovery will see the unlinked files and process them appropriately.
A "device unplug" equivalent shutdown would likely cause the unlink
transactions never to make it to disk, and so the test would be
unreliable.
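
As a rough illustration of the ioctl call described above, here is a
minimal userspace sketch in C.  The helper name fs_shutdown() is made up
for this example, and the constants are assumed to mirror the definitions
in the XFS userspace header (xfs_fs.h); they are repeated here only so
the sketch builds without that header installed.  This is not the
src/godown utility from xfstests, just the bare call it is built around.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Assumption: these values mirror xfs_fs.h.  Prefer the real header
 * when it is available on the build system.
 */
#ifndef XFS_IOC_GOINGDOWN
#define XFS_IOC_GOINGDOWN		_IOR('X', 125, uint32_t)
#endif
#ifndef XFS_FSOP_GOING_FLAGS_DEFAULT
#define XFS_FSOP_GOING_FLAGS_DEFAULT	0x0	/* freeze, then shut down */
#define XFS_FSOP_GOING_FLAGS_LOGFLUSH	0x1	/* flush the log first */
#define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH	0x2	/* no log flush at all */
#endif

/*
 * Hypothetical helper: ask the filesystem mounted at @mountpoint to
 * shut itself down.  Returns 0 on success, -1 on error with errno set.
 */
static int fs_shutdown(const char *mountpoint, uint32_t flags)
{
	int fd = open(mountpoint, O_RDONLY);
	int ret, saved_errno;

	if (fd < 0)
		return -1;

	/* The ioctl takes a pointer to the 32-bit flags word. */
	ret = ioctl(fd, XFS_IOC_GOINGDOWN, &flags);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret < 0 ? -1 : 0;
}
```

With a scratch filesystem mounted at some path (say /mnt/scratch, a
placeholder), fs_shutdown("/mnt/scratch", XFS_FSOP_GOING_FLAGS_LOGFLUSH)
would request the log-flushing shutdown that test 121 relies on, and
XFS_FSOP_GOING_FLAGS_NOLOGFLUSH the "pulled plug" variant.  Only point
it at a filesystem you are prepared to lose access to until remount.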

> 2) power failure
> Our 'improved' loop device (http://wiki.openvz.org/Ploop) has a
> /sys/block/ploop0/make-it-fail knob which explicitly fails the block device.
> Once failed, it returns EIO on all requests.  I would like to add this
> feature to the generic loop device.

That's not the equivalent of a power failure. That's exactly the
same as pulling the plug. If you want robust power fail testing, you
need to use a device that emulates a volatile device cache which
causes IOs that have already been signalled as complete (without
errors) to the filesystem to then fail.

As it is, I'm pretty sure that the md-faulty/dm-flakey/scsi-debug
devices can already do this "return EIO to all new IOs" error
injection. We already use the scsi-debug module in xfstests, so I'd
suggest that it might be the best place to start for this sort of
device failure testing in xfstests....
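
The difference between the two failure models can be sketched with a toy
in-memory device in C.  Everything here is hypothetical (the toy_*
names, the block count, the integer "blocks"); it is not how ploop,
dm-flakey or scsi-debug are implemented.  It only shows that "return EIO
on new IO" leaves already acknowledged writes alone, while a power
failure on a device with a volatile cache loses writes that were already
reported as complete.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NBLOCKS 8

/* Hypothetical toy device; none of these names exist in the kernel. */
struct toy_dev {
	int media[TOY_NBLOCKS];		/* contents that survive power loss */
	int cache[TOY_NBLOCKS];		/* volatile write cache */
	bool dirty[TOY_NBLOCKS];	/* cached but not yet on media */
	bool failed;			/* set once the plug has been pulled */
};

/* A write is acknowledged as soon as it reaches the volatile cache. */
static int toy_write(struct toy_dev *d, int blk, int val)
{
	if (d->failed)
		return -EIO;
	if (blk < 0 || blk >= TOY_NBLOCKS)
		return -EINVAL;
	d->cache[blk] = val;
	d->dirty[blk] = true;
	return 0;
}

/* Cache flush: only now do acknowledged writes become durable. */
static int toy_flush(struct toy_dev *d)
{
	int i;

	if (d->failed)
		return -EIO;
	for (i = 0; i < TOY_NBLOCKS; i++) {
		if (d->dirty[i]) {
			d->media[i] = d->cache[i];
			d->dirty[i] = false;
		}
	}
	return 0;
}

/* "Pulled plug" model: every request from now on fails with EIO. */
static void toy_make_it_fail(struct toy_dev *d)
{
	d->failed = true;
}

/* Power failure model: unflushed cache contents silently disappear. */
static void toy_power_cycle(struct toy_dev *d)
{
	memset(d->cache, 0, sizeof(d->cache));
	memset(d->dirty, 0, sizeof(d->dirty));
	d->failed = false;
}

/* What a filesystem would find on the media after remounting. */
static int toy_read_media(const struct toy_dev *d, int blk)
{
	return d->media[blk];
}
```

In this model a write followed by toy_power_cycle() returns success to
the caller and is still missing from the media afterwards, which is the
case a make-it-fail style knob on its own cannot reproduce.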

What I'm trying to say here is that we already have mechanisms in
xfstests for exercising the functionality you are talking about
here. You don't need to re-invent the wheel or rely on an
out-of-tree device driver - just use the existing methods other
filesystems use for executing this sort of testing...

Cheers,

Dave.
Eric Sandeen Jan. 2, 2013, 3:17 p.m. UTC | #7
On 12/28/12 6:21 PM, Dave Chinner wrote:
> On Thu, Dec 27, 2012 at 08:44:13AM -0500, Theodore Ts'o wrote:
>> On Thu, Dec 27, 2012 at 12:04:36PM +0400, Dmitry Monakhov wrote:
>>> In fact it is my fault that we still don't have an automated test for
>>> this.  I'm thinking of adding a crash test to xfstests which triggers a
>>> journal abort and a forced umount.  A later step would then mount the FS
>>> again, which triggers journal_replay and orphan_cleanup.
>>
>> We could create some tests in xfstests which force a crash via
>> "echo b > /proc/sysrq-trigger", but the trick is that this would
>> require xfstests to install something in the /etc/rc scripts so
>> xfstests could resume right after the system came back --- and perhaps
>> to echo something to the console which automated test runners (such as
>> the one I use, which I've published at [1]) could capture so they would
>> know that they should restart the system.
>>
>> [1] git://git.kernel.org/pub/scm/fs/ext2/xfstests-bld.git
>>
>> For now the simplest way to test this is to use the file system image
>> in tests/f_orphan_extents_inode/image.gz, and make this be an
>> ext4-specific test.  This is how I tested it when I created my fix (in
>> parallel with Zheng's patch).  The compressed file system image is
>> only 564 bytes --- and was made deliberately w/o a journal so it could
>> be that small --- and the lack of a journal was how I found the
>> infinite loop problem which was fixed in the 2/2 patch in my patches.
>> So including this compressed fs image in xfstests is probably the way
>> I would suggest for now.
> 
> Just implement XFS_IOC_GOINGDOWN. That way xfstests will immediately
> support shutting down the filesystem via the src/godown utility.
> The default XFS behaviour is to freeze the filesystem, then do a
> forced shutdown on it, though it can also just trigger shutdowns
> with and without first flushing the journal.
> 
> i.e.  it sounds like test 121 is pretty much what you are describing
> here...
> 
> Cheers,
> 
> Dave.
> 

Agreed, if the xfs testing ioctl semantics make sense across other
filesystems, it would be great to just re-use that, and voila, instant
test framework becomes available.

-Eric
Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3cdb0a2..188d6f1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2212,7 +2212,18 @@  static void ext4_orphan_cleanup(struct super_block *sb,
 				__func__, inode->i_ino, inode->i_size);
 			jbd_debug(2, "truncating inode %lu to %lld bytes\n",
 				  inode->i_ino, inode->i_size);
+			/*
+			 * Actually we don't need to take i_mutex lock
+			 * because in orphan list cleanup no one can write
+			 * this inode.  We take it here because in calling
+			 * ext4_flush_unwritten_io() this lock needs to be
+			 * taken, and we don't want to remove this
+			 * WARN_ON_ONCE().  It is useful for us to avoid some
+			 * critical errors.
+			 */
+			mutex_lock(&inode->i_mutex);
 			ext4_truncate(inode);
+			mutex_unlock(&inode->i_mutex);
 			nr_truncates++;
 		} else {
 			ext4_msg(sb, KERN_DEBUG,
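
The pattern behind this patch (a callee that warns once when its caller
does not hold the expected lock, and a caller that takes the lock purely
to satisfy that check) can be sketched in plain userspace C.  All toy_*
names below are hypothetical stand-ins, the "mutex" is just a flag, and
the warning macro is a simplified imitation of WARN_ON_ONCE(); this is
not the ext4 code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel objects named in the patch. */
struct toy_mutex { bool locked; };
struct toy_inode { struct toy_mutex i_mutex; bool truncated; };

static int toy_warnings;	/* how many warnings were actually printed */

/* Simplified imitation of WARN_ON_ONCE(): fires once per call site. */
#define TOY_WARN_ON_ONCE(cond)						\
	do {								\
		static bool warned;					\
		if ((cond) && !warned) {				\
			warned = true;					\
			toy_warnings++;					\
			fprintf(stderr, "WARNING: %s\n", #cond);	\
		}							\
	} while (0)

static void toy_mutex_lock(struct toy_mutex *m)   { m->locked = true; }
static void toy_mutex_unlock(struct toy_mutex *m) { m->locked = false; }

/* The callee insists that its caller already holds the inode lock. */
static void toy_flush_unwritten_io(struct toy_inode *inode)
{
	TOY_WARN_ON_ONCE(!inode->i_mutex.locked);
}

static void toy_truncate(struct toy_inode *inode)
{
	toy_flush_unwritten_io(inode);
	inode->truncated = true;
}

/* Shape of the code before the patch: truncate without the lock. */
static void toy_cleanup_unlocked(struct toy_inode *inode)
{
	toy_truncate(inode);
}

/* Shape of the code after the patch: take the lock around truncate. */
static void toy_cleanup_locked(struct toy_inode *inode)
{
	toy_mutex_lock(&inode->i_mutex);
	toy_truncate(inode);
	toy_mutex_unlock(&inode->i_mutex);
}
```

Calling toy_cleanup_unlocked() prints the warning the first time only,
while toy_cleanup_locked() never triggers it.  Keeping the check and
fixing the caller, as the patch does, preserves the check for other
callers where a missing lock would be a real bug.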