xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel

Message ID 20160608125631.GA19589@quack2.suse.cz
State Superseded, archived

Commit Message

Jan Kara June 8, 2016, 12:56 p.m. UTC
On Fri 03-06-16 13:58:44, Jan Kara wrote:
> On Fri 03-06-16 18:16:12, Eryu Guan wrote:
> > On Thu, Jun 02, 2016 at 02:17:50PM +0200, Jan Kara wrote:
> > > 
> > > So I was trying but I could not reproduce the hang either. Can you find out
> > > which page the jbd2 thread is waiting for and dump page->index, page->flags,
> > > and also bh->b_state and bh->b_blocknr of all 4 buffer heads attached to it
> > > via page->private? Maybe that will shed some light...
> > 
> > I'm using crash on the live system when the hang happens, so I got the page
> > address from "bt -f":
> > 
> >  #6 [ffff880212343b40] wait_on_page_bit at ffffffff8119009e
> >     ffff880212343b48: ffffea0002c23600 000000000000000d 
> >     ffff880212343b58: 0000000000000000 0000000000000000 
> >     ffff880212343b68: ffff880213251480 ffffffff810cd000 
> >     ffff880212343b78: ffff88021ff27218 ffff88021ff27218 
> >     ffff880212343b88: 00000000c1b4a75a ffff880212343c68 
> >     ffff880212343b98: ffffffff811901bf
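
As a rough sketch (not taken from the thread), the dump being asked for above
boils down to walking the circular buffer-head list hanging off page->private;
dump_page_buffers() below is a hypothetical helper name, and in a crash session
the same fields can be read with "struct page" and "struct buffer_head" on the
addresses pulled from "bt -f":

#include <linux/mm.h>
#include <linux/buffer_head.h>
#include <linux/printk.h>

static void dump_page_buffers(struct page *page)
{
	struct buffer_head *head, *bh;

	pr_info("page index=%lu flags=%#lx\n", page->index, page->flags);

	if (!page_has_buffers(page))
		return;

	/* page->private points at the first bh; they form a circular list */
	bh = head = page_buffers(page);
	do {
		pr_info("  bh %p state=%#lx blocknr=%llu\n",
			bh, bh->b_state, (unsigned long long)bh->b_blocknr);
		bh = bh->b_this_page;
	} while (bh != head);
}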
> 
> Thanks for debugging! In the end I was able to reproduce the issue on my
> UML instance as well and I'm debugging what's going on.

Attached patch fixes the issue for me. I'll submit it once a full xfstests
run finishes for it (which may take a while as our server room is currently
moving to a different place).

								Honza

Comments

Holger Hoffstätte June 8, 2016, 2:23 p.m. UTC | #1
On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
(snip)
> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> run finishes for it (which may take a while as our server room is currently
> moving to a different place).
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Wed, 8 Jun 2016 10:01:45 +0200
> Subject: [PATCH] ext4: Fix deadlock during page writeback
> 
> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> deadlock in ext4_writepages() which was previously much harder to hit.
> After this commit xfstest generic/130 reproduces the deadlock on small
> filesystems.

Since you marked this for -stable, just a heads-up that the previous patch
for the data exposure was rejected from -stable (see [1]) because it
has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
until 4.6. I removed it locally but Greg probably wants an official patch.

So both this and the previous patch need to be submitted.

-h

[1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}

Nikola Pajkovsky June 9, 2016, 7:23 a.m. UTC | #2
Holger Hoffstätte <holger@applied-asynchrony.com> writes:

> On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> (snip)
>> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> run finishes for it (which may take a while as our server room is currently
>> moving to a different place).
>> 
>> 								Honza
>> -- 
>> Jan Kara <jack@suse.com>
>> SUSE Labs, CR
>> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> From: Jan Kara <jack@suse.cz>
>> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> 
>> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> deadlock in ext4_writepages() which was previously much harder to hit.
>> After this commit xfstest generic/130 reproduces the deadlock on small
>> filesystems.
>
> Since you marked this for -stable, just a heads-up that the previous patch
> for the data exposure was rejected from -stable (see [1]) because it
> has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> until 4.6. I removed it locally but Greg probably wants an official patch.
>
> So both this and the previous patch need to be submitted.
>
> [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}

I'm just wondering whether Jan's patch is related to the blocked
processes in the following trace. It is very hard to hit and I don't have
any reproducer.

kernel: INFO: task jbd2/vdc-8:4710 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: jbd2/vdc-8      D 0000000000000002     0  4710      2 0x00000080 severity=info
kernel: ffff88273a373b38 0000000000000046 ffff88273a373ae8 ffff88273a370010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273c76c7e0 ffff88273e322b20 severity=warning
kernel: ffff88273a373b38 ffff88277f092c40 ffff88273c76c7e0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffff8121021e>] __wait_on_buffer+0x2e/0x30 severity=warning
kernel: [<ffffffffa01180a1>] jbd2_journal_commit_transaction+0x861/0x16e0 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011d847>] kjournald2+0x127/0x340 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011d720>] ? jbd2_journal_clear_features+0x90/0x90 [jbd2] severity=warning
kernel: [<ffffffff8108b31e>] kthread+0xce/0xf0 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: [<ffffffff81618c58>] ret_from_fork+0x58/0x90 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: INFO: task postmaster:25406 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster      D 000000000000000f     0 25406  25395 0x00000080 severity=info
kernel: ffff882485e5f9a8 0000000000000086 ffff882485e5f958 ffff882485e5c010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882657c9d640 ffff88273e3b8000 severity=warning
kernel: ffff882485e5f9a8 ffff88277f3d2c40 ffff882657c9d640 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa0146c08>] ext4_reserve_inode_write+0x78/0xa0 [ext4] severity=warning
kernel: [<ffffffffa0146c82>] ext4_mark_inode_dirty+0x52/0x240 [ext4] severity=warning
kernel: [<ffffffffa0146eba>] ext4_dirty_inode+0x4a/0x70 [ext4] severity=warning
kernel: [<ffffffff8120644a>] __mark_inode_dirty+0x3a/0x290 severity=warning
kernel: [<ffffffff811f6f91>] update_time+0x81/0xc0 severity=warning
kernel: [<ffffffff811f7068>] file_update_time+0x98/0xe0 severity=warning
kernel: [<ffffffff811630b2>] __generic_file_write_iter+0x162/0x390 severity=warning
kernel: [<ffffffffa013deb9>] ext4_file_write_iter+0x119/0x430 [ext4] severity=warning
kernel: [<ffffffff811dbf22>] new_sync_write+0x92/0xd0 severity=warning
kernel: [<ffffffff811dc43e>] vfs_write+0xce/0x180 severity=warning
kernel: [<ffffffff811dca86>] SyS_write+0x56/0xd0 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task kworker/u36:2:20256 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: kworker/u36:2   D 0000000000000002     0 20256      2 0x00000080 severity=info
kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:32) severity=info
kernel: ffff881b4ef83448 0000000000000046 ffff881b4ef833f8 ffff881b4ef80010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273a238e60 ffff88273e322b20 severity=warning
kernel: ffff881b4ef83418 ffff88277f092c40 ffff88273a238e60 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa0175f37>] ? mb_mark_used+0x297/0x350 [ext4] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa013c61a>] ? ext4_read_block_bitmap+0x3a/0x60 [ext4] severity=warning
kernel: [<ffffffffa0177139>] ext4_mb_mark_diskspace_used+0x79/0x320 [ext4] severity=warning
kernel: [<ffffffffa017f341>] ext4_mb_new_blocks+0x331/0x510 [ext4] severity=warning
kernel: [<ffffffffa01722d7>] ext4_ext_map_blocks+0x877/0xae0 [ext4] severity=warning
kernel: [<ffffffff8116f9e1>] ? release_pages+0x1e1/0x250 severity=warning
kernel: [<ffffffffa0143fc6>] ext4_map_blocks+0x176/0x4b0 [ext4] severity=warning
kernel: [<ffffffffa01449c4>] ? mpage_prepare_extent_to_map+0x304/0x350 [ext4] severity=warning
kernel: [<ffffffffa0144591>] mpage_map_one_extent+0x71/0x1a0 [ext4] severity=warning
kernel: [<ffffffffa01484ab>] mpage_map_and_submit_extent+0x4b/0x220 [ext4] severity=warning
kernel: [<ffffffffa01746a4>] ? __ext4_journal_start_sb+0x74/0x100 [ext4] severity=warning
kernel: [<ffffffffa0148b4b>] ext4_writepages+0x4cb/0x630 [ext4] severity=warning
kernel: [<ffffffff812c37e5>] ? blk_mq_flush_plug_list+0x135/0x150 severity=warning
kernel: [<ffffffff810a4073>] ? set_next_entity+0x93/0xa0 severity=warning
kernel: [<ffffffff8116cb20>] do_writepages+0x20/0x40 severity=warning
kernel: [<ffffffff81205990>] __writeback_single_inode+0x40/0x220 severity=warning
kernel: [<ffffffff810ad72f>] ? wake_up_bit+0x2f/0x40 severity=warning
kernel: [<ffffffff81206c44>] writeback_sb_inodes+0x2b4/0x3c0 severity=warning
kernel: [<ffffffff811deef1>] ? put_super+0x31/0x40 severity=warning
kernel: [<ffffffff81206dee>] __writeback_inodes_wb+0x9e/0xd0 severity=warning
kernel: [<ffffffff81207063>] wb_writeback+0x243/0x2d0 severity=warning
kernel: [<ffffffff812071eb>] wb_do_writeback+0xfb/0x1e0 severity=warning
kernel: [<ffffffff81207340>] bdi_writeback_workfn+0x70/0x210 severity=warning
kernel: [<ffffffff810848c3>] process_one_work+0x143/0x4b0 severity=warning
kernel: [<ffffffff81085a33>] worker_thread+0x123/0x520 severity=warning
kernel: [<ffffffff81614095>] ? __schedule+0x375/0x750 severity=warning
kernel: [<ffffffff81085910>] ? maybe_create_worker+0x130/0x130 severity=warning
kernel: [<ffffffff8108b31e>] kthread+0xce/0xf0 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: [<ffffffff81618c58>] ret_from_fork+0x58/0x90 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: INFO: task postmaster:14855 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster      D 0000000000000009     0 14855  25395 0x00000080 severity=info
kernel: ffff8822bf3877b8 0000000000000082 ffff8822bf387768 ffff8822bf384010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff8825d71f5640 ffff88273e391cc0 severity=warning
kernel: ffff8822bf387788 ffff88277f252c40 ffff8825d71f5640 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff8120f780>] ? bh_lru_install+0x170/0x1a0 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa013c61a>] ? ext4_read_block_bitmap+0x3a/0x60 [ext4] severity=warning
kernel: [<ffffffffa01771ab>] ext4_mb_mark_diskspace_used+0xeb/0x320 [ext4] severity=warning
kernel: [<ffffffffa017f341>] ext4_mb_new_blocks+0x331/0x510 [ext4] severity=warning
kernel: [<ffffffffa01722d7>] ext4_ext_map_blocks+0x877/0xae0 [ext4] severity=warning
kernel: [<ffffffff8116f9e1>] ? release_pages+0x1e1/0x250 severity=warning
kernel: [<ffffffffa0143fc6>] ext4_map_blocks+0x176/0x4b0 [ext4] severity=warning
kernel: [<ffffffffa01449c4>] ? mpage_prepare_extent_to_map+0x304/0x350 [ext4] severity=warning
kernel: [<ffffffffa0144591>] mpage_map_one_extent+0x71/0x1a0 [ext4] severity=warning
kernel: [<ffffffffa01484ab>] mpage_map_and_submit_extent+0x4b/0x220 [ext4] severity=warning
kernel: [<ffffffffa0148b4b>] ext4_writepages+0x4cb/0x630 [ext4] severity=warning
kernel: [<ffffffff81193472>] ? handle_mm_fault+0xb2/0x1a0 severity=warning
kernel: [<ffffffff8116cb20>] do_writepages+0x20/0x40 severity=warning
kernel: [<ffffffff81160a69>] __filemap_fdatawrite_range+0x59/0x60 severity=warning
kernel: [<ffffffff81160b1a>] filemap_write_and_wait_range+0xaa/0x100 severity=warning
kernel: [<ffffffffa013eac3>] ext4_sync_file+0xf3/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14856 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster      D 0000000000000006     0 14856  25395 0x00000080 severity=info
kernel: ffff88262affb8a8 0000000000000086 ffff88262affb858 ffff88262aff8010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266aab1cc0 ffff88273e3264a0 severity=warning
kernel: ffff88225c97d6a0 ffff88277f192c40 ffff88266aab1cc0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa011576e>] ? start_this_handle+0x28e/0x520 [jbd2] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa01406d4>] __ext4_new_inode+0x3a4/0x1110 [ext4] severity=warning
kernel: [<ffffffff8123c960>] ? __dquot_initialize+0x30/0x1e0 severity=warning
kernel: [<ffffffffa0150bbb>] ext4_create+0xab/0x170 [ext4] severity=warning
kernel: [<ffffffff811e9118>] vfs_create+0xd8/0x100 severity=warning
kernel: [<ffffffff811e982e>] lookup_open+0x13e/0x1d0 severity=warning
kernel: [<ffffffff811ed006>] do_last+0x416/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff811e77a9>] ? putname+0x29/0x40 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff8110f3e6>] ? __audit_syscall_exit+0x246/0x2f0 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14857 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster      D 0000000000000010     0 14857  25395 0x00000080 severity=info
kernel: ffff880748203df8 0000000000000082 ffff8826a94e67c8 ffff880748200010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266aab5640 ffff88273e3b8e60 severity=warning
kernel: ffff880748203e08 00000000023bc691 ffff882737c2e800 ffff880748203e38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14859 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster      D 0000000000000001     0 14859  25395 0x00000080 severity=info
kernel: ffff8820d2017c38 0000000000000082 ffff882700000000 ffff8820d2014010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882657c99cc0 ffff88273e321cc0 severity=warning
kernel: ffff882657c99cc0 ffff8826feadc068 ffff882657c99cc0 ffff8826feadc06c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ecfec>] do_last+0x3fc/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff816165b6>] ? mutex_lock+0x16/0x40 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff8110f3e6>] ? __audit_syscall_exit+0x246/0x2f0 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14860 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster      D 000000000000000e     0 14860  25395 0x00000080 severity=info
kernel: ffff8820d22fbdf8 0000000000000086 ffff8826947ba4d8 ffff8820d22f8010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266a93e4a0 ffff88273e3964a0 severity=warning
kernel: ffff8820d22fbe08 00000000023bc691 ffff882737c2e800 ffff8820d22fbe38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14861 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster      D 0000000000000007     0 14861  25395 0x00000080 severity=info
kernel: ffff88041dfdfc38 0000000000000082 0000000000000000 ffff88041dfdc010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266a939cc0 ffff88273e390000 severity=warning
kernel: ffff88281ffd8d80 ffff8826feadc068 ffff88266a939cc0 ffff8826feadc06c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ecfec>] do_last+0x3fc/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: INFO: task postmaster:14868 blocked for more than 120 seconds. severity=err
kernel:      Not tainted 3.18.21-1.el6.gdc.x86_64 #1 severity=err
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. severity=err
kernel: postmaster      D 000000000000000a     0 14868  25395 0x00000080 severity=info
kernel: ffff8821bd68fdf8 0000000000000086 ffff88269980e7c8 ffff8821bd68c010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273dd68e60 ffff88273e392b20 severity=warning
kernel: ffff8821bd68fe08 00000000023bc691 ffff882737c2e800 ffff8821bd68fe38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: SysRq : Show Blocked State severity=info
kernel:  task                        PC stack   pid father severity=info
kernel: jbd2/vdc-8      D 0000000000000002     0  4710      2 0x00000080 severity=info
kernel: ffff88273a373b38 0000000000000046 ffff88273a373ae8 ffff88273a370010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273c76c7e0 ffff88273e322b20 severity=warning
kernel: ffff88273a373b38 ffff88277f092c40 ffff88273c76c7e0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffff8121021e>] __wait_on_buffer+0x2e/0x30 severity=warning
kernel: [<ffffffffa01180a1>] jbd2_journal_commit_transaction+0x861/0x16e0 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011d847>] kjournald2+0x127/0x340 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011d720>] ? jbd2_journal_clear_features+0x90/0x90 [jbd2] severity=warning
kernel: [<ffffffff8108b31e>] kthread+0xce/0xf0 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: [<ffffffff81618c58>] ret_from_fork+0x58/0x90 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: postmaster      D 000000000000000f     0 25406  25395 0x00000080 severity=info
kernel: ffff882485e5f9a8 0000000000000086 ffff882485e5f958 ffff882485e5c010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882657c9d640 ffff88273e3b8000 severity=warning
kernel: ffff882485e5f9a8 ffff88277f3d2c40 ffff882657c9d640 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa0146c08>] ext4_reserve_inode_write+0x78/0xa0 [ext4] severity=warning
kernel: [<ffffffffa0146c82>] ext4_mark_inode_dirty+0x52/0x240 [ext4] severity=warning
kernel: [<ffffffffa0146eba>] ext4_dirty_inode+0x4a/0x70 [ext4] severity=warning
kernel: [<ffffffff8120644a>] __mark_inode_dirty+0x3a/0x290 severity=warning
kernel: [<ffffffff811f6f91>] update_time+0x81/0xc0 severity=warning
kernel: [<ffffffff811f7068>] file_update_time+0x98/0xe0 severity=warning
kernel: [<ffffffff811630b2>] __generic_file_write_iter+0x162/0x390 severity=warning
kernel: [<ffffffffa013deb9>] ext4_file_write_iter+0x119/0x430 [ext4] severity=warning
kernel: [<ffffffff811dbf22>] new_sync_write+0x92/0xd0 severity=warning
kernel: [<ffffffff811dc43e>] vfs_write+0xce/0x180 severity=warning
kernel: [<ffffffff811dca86>] SyS_write+0x56/0xd0 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 000000000000000b     0 26139  25395 0x00000080 severity=info
kernel: ffff882451267e28 0000000000000082 ffff882737404020 ffff882451264010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882641ea8e60 ffff88273e393980 severity=warning
kernel: 0000000000000003 ffff8826feadc068 ffff882641ea8e60 ffff8826feadc06c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616694>] __mutex_lock_killable_slowpath+0xb4/0x140 severity=warning
kernel: [<ffffffff81616762>] mutex_lock_killable+0x42/0x50 severity=warning
kernel: [<ffffffff811efc77>] iterate_dir+0x77/0x140 severity=warning
kernel: [<ffffffff8110e0cc>] ? __audit_syscall_entry+0xac/0x110 severity=warning
kernel: [<ffffffff811efef1>] SyS_getdents+0xb1/0x100 severity=warning
kernel: [<ffffffff811effd0>] ? SyS_old_readdir+0x90/0x90 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: kworker/u36:2   D 0000000000000002     0 20256      2 0x00000080 severity=info
kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:32) severity=info
kernel: ffff881b4ef83448 0000000000000046 ffff881b4ef833f8 ffff881b4ef80010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273a238e60 ffff88273e322b20 severity=warning
kernel: ffff881b4ef83418 ffff88277f092c40 ffff88273a238e60 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa0175f37>] ? mb_mark_used+0x297/0x350 [ext4] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa013c61a>] ? ext4_read_block_bitmap+0x3a/0x60 [ext4] severity=warning
kernel: [<ffffffffa0177139>] ext4_mb_mark_diskspace_used+0x79/0x320 [ext4] severity=warning
kernel: [<ffffffffa017f341>] ext4_mb_new_blocks+0x331/0x510 [ext4] severity=warning
kernel: [<ffffffffa01722d7>] ext4_ext_map_blocks+0x877/0xae0 [ext4] severity=warning
kernel: [<ffffffff8116f9e1>] ? release_pages+0x1e1/0x250 severity=warning
kernel: [<ffffffffa0143fc6>] ext4_map_blocks+0x176/0x4b0 [ext4] severity=warning
kernel: [<ffffffffa01449c4>] ? mpage_prepare_extent_to_map+0x304/0x350 [ext4] severity=warning
kernel: [<ffffffffa0144591>] mpage_map_one_extent+0x71/0x1a0 [ext4] severity=warning
kernel: [<ffffffffa01484ab>] mpage_map_and_submit_extent+0x4b/0x220 [ext4] severity=warning
kernel: [<ffffffffa01746a4>] ? __ext4_journal_start_sb+0x74/0x100 [ext4] severity=warning
kernel: [<ffffffffa0148b4b>] ext4_writepages+0x4cb/0x630 [ext4] severity=warning
kernel: [<ffffffff812c37e5>] ? blk_mq_flush_plug_list+0x135/0x150 severity=warning
kernel: [<ffffffff810a4073>] ? set_next_entity+0x93/0xa0 severity=warning
kernel: [<ffffffff8116cb20>] do_writepages+0x20/0x40 severity=warning
kernel: [<ffffffff81205990>] __writeback_single_inode+0x40/0x220 severity=warning
kernel: [<ffffffff810ad72f>] ? wake_up_bit+0x2f/0x40 severity=warning
kernel: [<ffffffff81206c44>] writeback_sb_inodes+0x2b4/0x3c0 severity=warning
kernel: [<ffffffff811deef1>] ? put_super+0x31/0x40 severity=warning
kernel: [<ffffffff81206dee>] __writeback_inodes_wb+0x9e/0xd0 severity=warning
kernel: [<ffffffff81207063>] wb_writeback+0x243/0x2d0 severity=warning
kernel: [<ffffffff812071eb>] wb_do_writeback+0xfb/0x1e0 severity=warning
kernel: [<ffffffff81207340>] bdi_writeback_workfn+0x70/0x210 severity=warning
kernel: [<ffffffff810848c3>] process_one_work+0x143/0x4b0 severity=warning
kernel: [<ffffffff81085a33>] worker_thread+0x123/0x520 severity=warning
kernel: [<ffffffff81614095>] ? __schedule+0x375/0x750 severity=warning
kernel: [<ffffffff81085910>] ? maybe_create_worker+0x130/0x130 severity=warning
kernel: [<ffffffff8108b31e>] kthread+0xce/0xf0 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: [<ffffffff81618c58>] ret_from_fork+0x58/0x90 severity=warning
kernel: [<ffffffff8108b250>] ? kthread_freezable_should_stop+0x70/0x70 severity=warning
kernel: postmaster      D 0000000000000009     0 14855  25395 0x00000080 severity=info
kernel: ffff8822bf3877b8 0000000000000082 ffff8822bf387768 ffff8822bf384010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff8825d71f5640 ffff88273e391cc0 severity=warning
kernel: ffff8822bf387788 ffff88277f252c40 ffff8825d71f5640 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff8120f780>] ? bh_lru_install+0x170/0x1a0 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa013c61a>] ? ext4_read_block_bitmap+0x3a/0x60 [ext4] severity=warning
kernel: [<ffffffffa01771ab>] ext4_mb_mark_diskspace_used+0xeb/0x320 [ext4] severity=warning
kernel: [<ffffffffa017f341>] ext4_mb_new_blocks+0x331/0x510 [ext4] severity=warning
kernel: [<ffffffffa01722d7>] ext4_ext_map_blocks+0x877/0xae0 [ext4] severity=warning
kernel: [<ffffffff8116f9e1>] ? release_pages+0x1e1/0x250 severity=warning
kernel: [<ffffffffa0143fc6>] ext4_map_blocks+0x176/0x4b0 [ext4] severity=warning
kernel: [<ffffffffa01449c4>] ? mpage_prepare_extent_to_map+0x304/0x350 [ext4] severity=warning
kernel: [<ffffffffa0144591>] mpage_map_one_extent+0x71/0x1a0 [ext4] severity=warning
kernel: [<ffffffffa01484ab>] mpage_map_and_submit_extent+0x4b/0x220 [ext4] severity=warning
kernel: [<ffffffffa0148b4b>] ext4_writepages+0x4cb/0x630 [ext4] severity=warning
kernel: [<ffffffff81193472>] ? handle_mm_fault+0xb2/0x1a0 severity=warning
kernel: [<ffffffff8116cb20>] do_writepages+0x20/0x40 severity=warning
kernel: [<ffffffff81160a69>] __filemap_fdatawrite_range+0x59/0x60 severity=warning
kernel: [<ffffffff81160b1a>] filemap_write_and_wait_range+0xaa/0x100 severity=warning
kernel: [<ffffffffa013eac3>] ext4_sync_file+0xf3/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000006     0 14856  25395 0x00000080 severity=info
kernel: ffff88262affb8a8 0000000000000086 ffff88262affb858 ffff88262aff8010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266aab1cc0 ffff88273e3264a0 severity=warning
kernel: ffff88225c97d6a0 ffff88277f192c40 ffff88266aab1cc0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa011576e>] ? start_this_handle+0x28e/0x520 [jbd2] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa01406d4>] __ext4_new_inode+0x3a4/0x1110 [ext4] severity=warning
kernel: [<ffffffff8123c960>] ? __dquot_initialize+0x30/0x1e0 severity=warning
kernel: [<ffffffffa0150bbb>] ext4_create+0xab/0x170 [ext4] severity=warning
kernel: [<ffffffff811e9118>] vfs_create+0xd8/0x100 severity=warning
kernel: [<ffffffff811e982e>] lookup_open+0x13e/0x1d0 severity=warning
kernel: [<ffffffff811ed006>] do_last+0x416/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff811e77a9>] ? putname+0x29/0x40 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff8110f3e6>] ? __audit_syscall_exit+0x246/0x2f0 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000010     0 14857  25395 0x00000080 severity=info
kernel: ffff880748203df8 0000000000000082 ffff8826a94e67c8 ffff880748200010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266aab5640 ffff88273e3b8e60 severity=warning
kernel: ffff880748203e08 00000000023bc691 ffff882737c2e800 ffff880748203e38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000001     0 14859  25395 0x00000080 severity=info
kernel: ffff8820d2017c38 0000000000000082 ffff882700000000 ffff8820d2014010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882657c99cc0 ffff88273e321cc0 severity=warning
kernel: ffff882657c99cc0 ffff8826feadc068 ffff882657c99cc0 ffff8826feadc06c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ecfec>] do_last+0x3fc/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff816165b6>] ? mutex_lock+0x16/0x40 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff8110f3e6>] ? __audit_syscall_exit+0x246/0x2f0 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 000000000000000e     0 14860  25395 0x00000080 severity=info
kernel: ffff8820d22fbdf8 0000000000000086 ffff8826947ba4d8 ffff8820d22f8010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266a93e4a0 ffff88273e3964a0 severity=warning
kernel: ffff8820d22fbe08 00000000023bc691 ffff882737c2e800 ffff8820d22fbe38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000007     0 14861  25395 0x00000080 severity=info
kernel: ffff88041dfdfc38 0000000000000082 0000000000000000 ffff88041dfdc010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88266a939cc0 ffff88273e390000 severity=warning
kernel: ffff88281ffd8d80 ffff8826feadc068 ffff88266a939cc0 ffff8826feadc06c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ecfec>] do_last+0x3fc/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 000000000000000a     0 14868  25395 0x00000080 severity=info
kernel: ffff8821bd68fdf8 0000000000000086 ffff88269980e7c8 ffff8821bd68c010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273dd68e60 ffff88273e392b20 severity=warning
kernel: ffff8821bd68fe08 00000000023bc691 ffff882737c2e800 ffff8821bd68fe38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000001     0 14869  25395 0x00000080 severity=info
kernel: ffff880ed01d3df8 0000000000000082 ffff8826995ed418 ffff880ed01d0010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88273dd6d640 ffff88273e321cc0 severity=warning
kernel: ffff880ed01d3e08 00000000023bc691 ffff882737c2e800 ffff880ed01d3e38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 000000000000000b     0 14905  25395 0x00000080 severity=info
kernel: ffff8824d12979a8 0000000000000086 ffff8824d1297958 ffff8824d1294010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff8826317f3980 ffff88273e393980 severity=warning
kernel: 000000000000008d ffff88277f2d2c40 ffff8826317f3980 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffff81210a9a>] ? __getblk_gfp+0x3a/0x80 severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa0146c08>] ext4_reserve_inode_write+0x78/0xa0 [ext4] severity=warning
kernel: [<ffffffffa0146c82>] ext4_mark_inode_dirty+0x52/0x240 [ext4] severity=warning
kernel: [<ffffffffa0146eba>] ext4_dirty_inode+0x4a/0x70 [ext4] severity=warning
kernel: [<ffffffff8120644a>] __mark_inode_dirty+0x3a/0x290 severity=warning
kernel: [<ffffffff811f6f91>] update_time+0x81/0xc0 severity=warning
kernel: [<ffffffff811fcc63>] ? mntput+0x23/0x40 severity=warning
kernel: [<ffffffff811f7068>] file_update_time+0x98/0xe0 severity=warning
kernel: [<ffffffff811e6925>] ? terminate_walk+0x35/0x40 severity=warning
kernel: [<ffffffff811630b2>] __generic_file_write_iter+0x162/0x390 severity=warning
kernel: [<ffffffffa013deb9>] ext4_file_write_iter+0x119/0x430 [ext4] severity=warning
kernel: [<ffffffff81170c3b>] ? invalidate_mapping_pages+0x8b/0x210 severity=warning
kernel: [<ffffffff811dbf22>] new_sync_write+0x92/0xd0 severity=warning
kernel: [<ffffffff811dc43e>] vfs_write+0xce/0x180 severity=warning
kernel: [<ffffffff811dca86>] SyS_write+0x56/0xd0 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000000     0 14910  25395 0x00000080 severity=info
kernel: ffff882641f7bdf8 0000000000000086 ffff8802a90d7398 ffff882641f78010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff8800db8c0000 ffffffff81c1b4c0 severity=warning
kernel: ffff882641f7be08 00000000023bc691 ffff882737c2e800 ffff882641f7be38 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffffa011db45>] jbd2_log_wait_commit+0xe5/0x170 [jbd2] severity=warning
kernel: [<ffffffff810ad240>] ? woken_wake_function+0x20/0x20 severity=warning
kernel: [<ffffffffa011de47>] jbd2_complete_transaction+0x57/0x90 [jbd2] severity=warning
kernel: [<ffffffffa013ebcd>] ext4_sync_file+0x1fd/0x260 [ext4] severity=warning
kernel: [<ffffffff8120cf91>] vfs_fsync_range+0x21/0x30 severity=warning
kernel: [<ffffffff8120cfbc>] vfs_fsync+0x1c/0x20 severity=warning
kernel: [<ffffffff8120d1ad>] do_fsync+0x3d/0x70 severity=warning
kernel: [<ffffffff8120d210>] SyS_fsync+0x10/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: smokerd plugin  D 0000000000000009     0 15001  29979 0x00000084 severity=info
kernel: ffff88264c437b48 0000000000000082 ffff88264c437af8 ffff88264c434010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff8825d71f1cc0 ffff88273e391cc0 severity=warning
kernel: ffff8803362adc98 ffff88277f252c40 ffff8825d71f1cc0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa0116cbf>] ? jbd2_journal_dirty_metadata+0xff/0x250 [jbd2] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa014cb7f>] ext4_orphan_add+0xbf/0x2e0 [ext4] severity=warning
kernel: [<ffffffffa0146cb8>] ? ext4_mark_inode_dirty+0x88/0x240 [ext4] severity=warning
kernel: [<ffffffffa0151d3a>] ext4_unlink+0x2aa/0x2d0 [ext4] severity=warning
kernel: [<ffffffff811e8bc3>] vfs_unlink+0xd3/0x140 severity=warning
kernel: [<ffffffff811ebd06>] do_unlinkat+0x2d6/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000007     0 15029  25395 0x00000080 severity=info
kernel: ffff88206081f8a8 0000000000000086 ffff88206081f858 ffff88206081c010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff8825ab41ab20 ffff88273e390000 severity=warning
kernel: ffff88277ec03700 ffff88277f1d2c40 ffff8825ab41ab20 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa011576e>] ? start_this_handle+0x28e/0x520 [jbd2] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa0140ada>] __ext4_new_inode+0x7aa/0x1110 [ext4] severity=warning
kernel: [<ffffffff8123c960>] ? __dquot_initialize+0x30/0x1e0 severity=warning
kernel: [<ffffffff8115fece>] ? find_get_entry+0x1e/0xa0 severity=warning
kernel: [<ffffffffa0150bbb>] ext4_create+0xab/0x170 [ext4] severity=warning
kernel: [<ffffffff811e9118>] vfs_create+0xd8/0x100 severity=warning
kernel: [<ffffffff811e982e>] lookup_open+0x13e/0x1d0 severity=warning
kernel: [<ffffffff811ed006>] do_last+0x416/0x820 severity=warning
kernel: [<ffffffff811ed4d4>] path_openat+0xc4/0x480 severity=warning
kernel: [<ffffffff811ed9ca>] do_filp_open+0x4a/0xa0 severity=warning
kernel: [<ffffffff811e77a9>] ? putname+0x29/0x40 severity=warning
kernel: [<ffffffff811fa64c>] ? __alloc_fd+0xac/0x150 severity=warning
kernel: [<ffffffff811db9ea>] do_sys_open+0x11a/0x230 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811dbb3e>] SyS_open+0x1e/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000000     0 15175  25395 0x00000080 severity=info
kernel: ffff88057229bb48 0000000000000086 ffff88057229baf8 ffff880572298010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88245105c7e0 ffffffff81c1b4c0 severity=warning
kernel: ffff88057229bb38 ffff88277f012c40 ffff88245105c7e0 ffffffff81614e10 severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161469c>] io_schedule+0x8c/0xd0 severity=warning
kernel: [<ffffffff81614e3c>] bit_wait_io+0x2c/0x50 severity=warning
kernel: [<ffffffff81614bc5>] __wait_on_bit+0x65/0x90 severity=warning
kernel: [<ffffffffa0116cbf>] ? jbd2_journal_dirty_metadata+0xff/0x250 [jbd2] severity=warning
kernel: [<ffffffff81614e10>] ? bit_wait_timeout+0x70/0x70 severity=warning
kernel: [<ffffffff81614d18>] out_of_line_wait_on_bit+0x78/0x90 severity=warning
kernel: [<ffffffff810ad2c0>] ? wake_atomic_t_function+0x40/0x40 severity=warning
kernel: [<ffffffffa011676d>] do_get_write_access+0x1fd/0x4e0 [jbd2] severity=warning
kernel: [<ffffffffa0116ba1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] severity=warning
kernel: [<ffffffffa0174403>] __ext4_journal_get_write_access+0x43/0x90 [ext4] severity=warning
kernel: [<ffffffffa014cb7f>] ext4_orphan_add+0xbf/0x2e0 [ext4] severity=warning
kernel: [<ffffffffa0146cb8>] ? ext4_mark_inode_dirty+0x88/0x240 [ext4] severity=warning
kernel: [<ffffffffa0151d3a>] ext4_unlink+0x2aa/0x2d0 [ext4] severity=warning
kernel: [<ffffffff811e8bc3>] vfs_unlink+0xd3/0x140 severity=warning
kernel: [<ffffffff811ebd06>] do_unlinkat+0x2d6/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000003     0 15823  25395 0x00000084 severity=info
kernel: ffff880424173db8 0000000000000086 ffff880424173de8 ffff880424170010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882254f49cc0 ffff88273e323980 severity=warning
kernel: ffff880424173dd8 ffff8826e845acb8 ffff882254f49cc0 ffff8826e845acbc severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000007     0 15855  25395 0x00000080 severity=info
kernel: ffff88266a8c3db8 0000000000000086 ffff88266a8c3de8 ffff88266a8c0010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff8820f680c7e0 ffff88273e390000 severity=warning
kernel: ffff88266a8c3dd8 ffff8826e845acb8 ffff8820f680c7e0 ffff8826e845acbc severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000006     0 16102  25395 0x00000084 severity=info
kernel: ffff8820d23e7db8 0000000000000086 ffff8820d23e7de8 ffff8820d23e4010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff881b4d6147e0 ffff88273e3264a0 severity=warning
kernel: ffff8820d23e7dd8 ffff8826e845acb8 ffff881b4d6147e0 ffff8826e845acbc severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 000000000000000c     0 16227  25395 0x00000084 severity=info
kernel: ffff881b4d8efdb8 0000000000000082 ffff881b4d8efde8 ffff881b4d8ec010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88220a8fe4a0 ffff88273e3947e0 severity=warning
kernel: ffff881b4d8efdd8 ffff8826e845acb8 ffff88220a8fe4a0 ffff8826e845acbc severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 000000000000000d     0 16658  25395 0x00000084 severity=info
kernel: ffff88251e143db8 0000000000000082 ffff88251e143de8 ffff88251e140010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88265b268000 ffff88273e395640 severity=warning
kernel: ffff88251e143dd8 ffff8826e845acb8 ffff88265b268000 ffff8826e845acbc severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 0000000000000003     0 24648  25395 0x00000084 severity=info
kernel: ffff88266abb3db8 0000000000000086 ffff88266abb3de8 ffff88266abb0010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882254f4ab20 ffff88273e323980 severity=warning
kernel: ffff88266abb3dd8 ffff8826e845acb8 ffff882254f4ab20 ffff8826e845acbc severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: bash            D 0000000000000004     0 25435  25434 0x00000084 severity=info
kernel: ffff88265b2c3e28 0000000000000086 ffff882737404020 ffff88265b2c0010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff88217e4764a0 ffff88273e3247e0 severity=warning
kernel: 0000000000000003 ffff88270bfe3c78 ffff88217e4764a0 ffff88270bfe3c7c severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616694>] __mutex_lock_killable_slowpath+0xb4/0x140 severity=warning
kernel: [<ffffffff81616762>] mutex_lock_killable+0x42/0x50 severity=warning
kernel: [<ffffffff811efc77>] iterate_dir+0x77/0x140 severity=warning
kernel: [<ffffffff8110e0cc>] ? __audit_syscall_entry+0xac/0x110 severity=warning
kernel: [<ffffffff811efef1>] SyS_getdents+0xb1/0x100 severity=warning
kernel: [<ffffffff811effd0>] ? SyS_old_readdir+0x90/0x90 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
kernel: postmaster      D 000000000000000b     0 27811  25395 0x00000084 severity=info
kernel: ffff8822bf0e7db8 0000000000000086 ffff8822bf0e7de8 ffff8822bf0e4010 severity=warning
kernel: 0000000000012c40 0000000000012c40 ffff882739808000 ffff88273e393980 severity=warning
kernel: ffff8822bf0e7dd8 ffff8826e845acb8 ffff882739808000 ffff8826e845acbc severity=warning
kernel: Call Trace: severity=warning
kernel: [<ffffffff816145c9>] schedule+0x29/0x70 severity=warning
kernel: [<ffffffff8161472e>] schedule_preempt_disabled+0xe/0x10 severity=warning
kernel: [<ffffffff81616535>] __mutex_lock_slowpath+0x95/0x100 severity=warning
kernel: [<ffffffff816165c3>] mutex_lock+0x23/0x40 severity=warning
kernel: [<ffffffff811ebb58>] do_unlinkat+0x128/0x320 severity=warning
kernel: [<ffffffff81021c8c>] ? do_audit_syscall_entry+0x6c/0x70 severity=warning
kernel: [<ffffffff81021dc3>] ? syscall_trace_enter_phase1+0x133/0x150 severity=warning
kernel: [<ffffffff811ebd66>] SyS_unlink+0x16/0x20 severity=warning
kernel: [<ffffffff81618d09>] system_call_fastpath+0x12/0x17 severity=warning
Jan Kara June 9, 2016, 2:59 p.m. UTC | #3
Please use reply-to-all when replying to emails. I usually read mailing
lists much less frequently than directly sent email...

On Wed 08-06-16 14:23:49, Holger Hoffstätte wrote:
> On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> (snip)
> > Attached patch fixes the issue for me. I'll submit it once a full xfstests
> > run finishes for it (which may take a while as our server room is currently
> > moving to a different place).
> > 
> > 								Honza
> > -- 
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> > From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Wed, 8 Jun 2016 10:01:45 +0200
> > Subject: [PATCH] ext4: Fix deadlock during page writeback
> > 
> > Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> > deadlock in ext4_writepages() which was previously much harder to hit.
> > After this commit xfstest generic/130 reproduces the deadlock on small
> > filesystems.
> 
> Since you marked this for -stable, just a heads-up that the previous patch
> for the data exposure was rejected from -stable (see [1]) because it
> has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> until 4.6. I removed it locally but Greg probably wants an official patch.
> 
> So both this and the previous patch need to be submitted.

This patch is actually an independent fix from the previous one; the
previous patch just made the bug more visible. Regarding the backport - yeah, I
have it on my todo list.

								Honza
Jan Kara June 9, 2016, 3:04 p.m. UTC | #4
On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
> Holger Hoffstätte <holger@applied-asynchrony.com> writes:
> 
> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> > (snip)
> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> >> run finishes for it (which may take a while as our server room is currently
> >> moving to a different place).
> >> 
> >> 								Honza
> >> -- 
> >> Jan Kara <jack@suse.com>
> >> SUSE Labs, CR
> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> >> From: Jan Kara <jack@suse.cz>
> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
> >> 
> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> >> deadlock in ext4_writepages() which was previously much harder to hit.
> >> After this commit xfstest generic/130 reproduces the deadlock on small
> >> filesystems.
> >
> > Since you marked this for -stable, just a heads-up that the previous patch
> > for the data exposure was rejected from -stable (see [1]) because it
> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> > until 4.6. I removed it locally but Greg probably wants an official patch.
> >
> > So both this and the previous patch need to be submitted.
> >
> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
> 
> I'm just wondering if the Jan's patch is not related to blocked
> processes in following trace. It very hard to hit it and I don't have
> any reproducer.

This looks like a different issue. Does the machine recover by itself, or is it
a hard hang where you have to press the reset button?

								Honza
Nikola Pajkovsky June 10, 2016, 5:52 a.m. UTC | #5
Jan Kara <jack@suse.cz> writes:

> On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> Holger Hoffstätte <holger@applied-asynchrony.com> writes:
>> 
>> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> > (snip)
>> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> run finishes for it (which may take a while as our server room is currently
>> >> moving to a different place).
>> >> 
>> >> 								Honza
>> >> -- 
>> >> Jan Kara <jack@suse.com>
>> >> SUSE Labs, CR
>> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> From: Jan Kara <jack@suse.cz>
>> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >> 
>> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> filesystems.
>> >
>> > Since you marked this for -stable, just a heads-up that the previous patch
>> > for the data exposure was rejected from -stable (see [1]) because it
>> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >
>> > So both this and the previous patch need to be submitted.
>> >
>> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>> 
>> I'm just wondering if the Jan's patch is not related to blocked
>> processes in following trace. It very hard to hit it and I don't have
>> any reproducer.
>
> This looks like a different issue. Does the machine recover itself or is it
> a hard hang and you have to press a reset button?

The machine is a bit bigger than I pretended. It's 18 vCPUs with 160 GB of
RAM, and the machine has a dedicated mount point only for the PostgreSQL data.

Nevertheless, I was always able to ssh to the machine, so the machine itself
was not in a hard hang, and ext4 mostly recovered by itself (it took about
30 min). But I have seen situations where every process that touches the ext4
filesystem goes immediately into D state and does not recover even after an hour.
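
(Next time it happens I can also grab the list of stuck tasks with something
like this - just a sketch:)

   # list tasks in uninterruptible sleep together with what they are waiting on
   $ ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'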
Eryu Guan June 10, 2016, 8:37 a.m. UTC | #6
On Wed, Jun 08, 2016 at 02:56:31PM +0200, Jan Kara wrote:
> On Fri 03-06-16 13:58:44, Jan Kara wrote:
> > On Fri 03-06-16 18:16:12, Eryu Guan wrote:
> > > On Thu, Jun 02, 2016 at 02:17:50PM +0200, Jan Kara wrote:
> > > > 
> > > > So I was trying but I could not reproduce the hang either. Can you find out
> > > > which page is jbd2 thread waiting for and dump page->index, page->flags and
> > > > also bh->b_state, bh->b_blocknr of all 4 buffer heads attached to it via
> > > > page->private? Maybe that will shed some light...
> > > 
> > > I'm using crash on live system when the hang happens, so I got the page
> > > address from "bt -f"
> > > 
> > >  #6 [ffff880212343b40] wait_on_page_bit at ffffffff8119009e
> > >     ffff880212343b48: ffffea0002c23600 000000000000000d 
> > >     ffff880212343b58: 0000000000000000 0000000000000000 
> > >     ffff880212343b68: ffff880213251480 ffffffff810cd000 
> > >     ffff880212343b78: ffff88021ff27218 ffff88021ff27218 
> > >     ffff880212343b88: 00000000c1b4a75a ffff880212343c68 
> > >     ffff880212343b98: ffffffff811901bf
> > 
> > Thanks for debugging! In the end I was able to reproduce the issue on my
> > UML instance as well and I'm debugging what's going on.
> 
> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> run finishes for it (which may take a while as our server room is currently
> moving to a different place).

(Sorry for the late reply, I was on holiday yesterday)

Thanks for the fix! I'll give it a test as well.

Thanks,
Eryu
Eryu Guan June 12, 2016, 3:28 a.m. UTC | #7
On Fri, Jun 10, 2016 at 04:37:36PM +0800, Eryu Guan wrote:
> On Wed, Jun 08, 2016 at 02:56:31PM +0200, Jan Kara wrote:
> > On Fri 03-06-16 13:58:44, Jan Kara wrote:
> > > On Fri 03-06-16 18:16:12, Eryu Guan wrote:
> > > > On Thu, Jun 02, 2016 at 02:17:50PM +0200, Jan Kara wrote:
> > > > > 
> > > > > So I was trying but I could not reproduce the hang either. Can you find out
> > > > > which page is jbd2 thread waiting for and dump page->index, page->flags and
> > > > > also bh->b_state, bh->b_blocknr of all 4 buffer heads attached to it via
> > > > > page->private? Maybe that will shed some light...
> > > > 
> > > > I'm using crash on live system when the hang happens, so I got the page
> > > > address from "bt -f"
> > > > 
> > > >  #6 [ffff880212343b40] wait_on_page_bit at ffffffff8119009e
> > > >     ffff880212343b48: ffffea0002c23600 000000000000000d 
> > > >     ffff880212343b58: 0000000000000000 0000000000000000 
> > > >     ffff880212343b68: ffff880213251480 ffffffff810cd000 
> > > >     ffff880212343b78: ffff88021ff27218 ffff88021ff27218 
> > > >     ffff880212343b88: 00000000c1b4a75a ffff880212343c68 
> > > >     ffff880212343b98: ffffffff811901bf
> > > 
> > > Thanks for debugging! In the end I was able to reproduce the issue on my
> > > UML instance as well and I'm debugging what's going on.
> > 
> > Attached patch fixes the issue for me. I'll submit it once a full xfstests
> > run finishes for it (which may take a while as our server room is currently
> > moving to a different place).
> 
> (Sorry for the late reply, I was on holiday yesterday)
> 
> Thanks for the fix! I'll give it a test as well.

I tested this patch with xfstests on x86_64 and ppc64 hosts; all results
look fine, no regressions found. Test configurations were: 4k/2k/1k block
size ext4/ext3/ext2, and ext4 with data=journal and data=writeback.
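
(For reference, a 1k block size, data=journal ext4 run corresponds roughly to
an xfstests local.config like the one below - only a sketch, the device paths
are placeholders:)

   FSTYP=ext4
   TEST_DEV=/dev/sdb1            # placeholder test device
   TEST_DIR=/mnt/test
   SCRATCH_DEV=/dev/sdb2         # placeholder scratch device
   SCRATCH_MNT=/mnt/scratch
   MKFS_OPTIONS="-b 1024"
   MOUNT_OPTIONS="-o data=journal"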

Thanks,
Eryu
Jan Kara June 16, 2016, 1:26 p.m. UTC | #8
On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
> Jan Kara <jack@suse.cz> writes:
> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
> >> Holger Hoffstätte <holger@applied-asynchrony.com> writes:
> >> 
> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> >> > (snip)
> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> >> >> run finishes for it (which may take a while as our server room is currently
> >> >> moving to a different place).
> >> >> 
> >> >> 								Honza
> >> >> -- 
> >> >> Jan Kara <jack@suse.com>
> >> >> SUSE Labs, CR
> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> >> >> From: Jan Kara <jack@suse.cz>
> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
> >> >> 
> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
> >> >> filesystems.
> >> >
> >> > Since you marked this for -stable, just a heads-up that the previous patch
> >> > for the data exposure was rejected from -stable (see [1]) because it
> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
> >> >
> >> > So both this and the previous patch need to be submitted.
> >> >
> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
> >> 
> >> I'm just wondering if the Jan's patch is not related to blocked
> >> processes in following trace. It very hard to hit it and I don't have
> >> any reproducer.
> >
> > This looks like a different issue. Does the machine recover itself or is it
> > a hard hang and you have to press a reset button?
> 
> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
> ram and machine has dedicated mount point only for PostgreSQL data.
> 
> Nevertheless, I was able always to ssh to the machine, so machine itself
> was not in hard hang and ext4 mostly gets recover by itself (it took
> 30min). But I have seen situation, were every process who 'touch' the ext4
> goes immediately to D state and does not recover even after hour.

If such a situation happens, can you run 'echo w >/proc/sysrq-trigger' to
dump stuck processes, and also run 'iostat -x 1' for a while to see how much
IO is happening in the system? That should tell us more.
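
Something along these lines should be enough to capture both (untested, the
file names are arbitrary):

   $ echo w > /proc/sysrq-trigger        # blocked task stacks end up in dmesg
   $ dmesg > blocked-tasks.txt
   $ iostat -x 1 60 > iostat.txt         # one minute of extended per-device stats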

								Honza
Nikola Pajkovsky June 16, 2016, 2:42 p.m. UTC | #9
Jan Kara <jack@suse.cz> writes:

> On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
>> Jan Kara <jack@suse.cz> writes:
>> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> >> Holger Hoffstätte <holger@applied-asynchrony.com> writes:
>> >> 
>> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> >> > (snip)
>> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> >> run finishes for it (which may take a while as our server room is currently
>> >> >> moving to a different place).
>> >> >> 
>> >> >> 								Honza
>> >> >> -- 
>> >> >> Jan Kara <jack@suse.com>
>> >> >> SUSE Labs, CR
>> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> >> From: Jan Kara <jack@suse.cz>
>> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >> >> 
>> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> >> filesystems.
>> >> >
>> >> > Since you marked this for -stable, just a heads-up that the previous patch
>> >> > for the data exposure was rejected from -stable (see [1]) because it
>> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >> >
>> >> > So both this and the previous patch need to be submitted.
>> >> >
>> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>> >> 
>> >> I'm just wondering if the Jan's patch is not related to blocked
>> >> processes in following trace. It very hard to hit it and I don't have
>> >> any reproducer.
>> >
>> > This looks like a different issue. Does the machine recover itself or is it
>> > a hard hang and you have to press a reset button?
>> 
>> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
>> ram and machine has dedicated mount point only for PostgreSQL data.
>> 
>> Nevertheless, I was able always to ssh to the machine, so machine itself
>> was not in hard hang and ext4 mostly gets recover by itself (it took
>> 30min). But I have seen situation, were every process who 'touch' the ext4
>> goes immediately to D state and does not recover even after hour.
>
> If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
> dump stuck processes and also run 'iostat -x 1' for a while to see how much
> IO is happening in the system? That should tell us more.


The link to the 'echo w >/proc/sysrq-trigger' output is here, because it's a
bit too big to mail:

   http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html

I was running iotop and there was write traffic of roughly ~20 KB/s.

What was a bit more interesting was looking at

   cat /proc/vmstat | egrep "nr_dirty|nr_writeback"

nr_dirty had around 240 and was slowly counting up, but nr_writeback had
~8800 and was stuck for 120s.
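
(The counters were watched with a simple loop along these lines - just a
sketch:)

   $ while sleep 1; do
         date
         egrep "nr_dirty|nr_writeback" /proc/vmstat
     done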
Jan Kara June 20, 2016, 11:39 a.m. UTC | #10
On Thu 16-06-16 16:42:58, Nikola Pajkovsky wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> > On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
> >> Jan Kara <jack@suse.cz> writes:
> >> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
> >> >> Holger Hoffstätte <holger@applied-asynchrony.com> writes:
> >> >> 
> >> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> >> >> > (snip)
> >> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> >> >> >> run finishes for it (which may take a while as our server room is currently
> >> >> >> moving to a different place).
> >> >> >> 
> >> >> >> 								Honza
> >> >> >> -- 
> >> >> >> Jan Kara <jack@suse.com>
> >> >> >> SUSE Labs, CR
> >> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> >> >> >> From: Jan Kara <jack@suse.cz>
> >> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
> >> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
> >> >> >> 
> >> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> >> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
> >> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
> >> >> >> filesystems.
> >> >> >
> >> >> > Since you marked this for -stable, just a heads-up that the previous patch
> >> >> > for the data exposure was rejected from -stable (see [1]) because it
> >> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> >> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
> >> >> >
> >> >> > So both this and the previous patch need to be submitted.
> >> >> >
> >> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
> >> >> 
> >> >> I'm just wondering if the Jan's patch is not related to blocked
> >> >> processes in following trace. It very hard to hit it and I don't have
> >> >> any reproducer.
> >> >
> >> > This looks like a different issue. Does the machine recover itself or is it
> >> > a hard hang and you have to press a reset button?
> >> 
> >> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
> >> ram and machine has dedicated mount point only for PostgreSQL data.
> >> 
> >> Nevertheless, I was able always to ssh to the machine, so machine itself
> >> was not in hard hang and ext4 mostly gets recover by itself (it took
> >> 30min). But I have seen situation, were every process who 'touch' the ext4
> >> goes immediately to D state and does not recover even after hour.
> >
> > If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
> > dump stuck processes and also run 'iostat -x 1' for a while to see how much
> > IO is happening in the system? That should tell us more.
> 
> 
> Link to 'echo w >/proc/sysrq-trigger' is here, because it's bit bigger
> to mail it.
> 
>    http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html

Can you upload it again please? I only got to look at the file today
and it is already deleted. Thanks!

> I was running iotop and there was traffic roughly ~20 KB/s write.
> 
> What was bit more interesting, was looking at
> 
>    cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
> 
> nr_drity had around 240 and was slowly counting up, but nr_writeback had
> ~8800 and was stuck for 120s.

Hum, interesting. This would suggest IO completion got stuck for some
reason. Hopefully we'll see more from the stacktraces.

								Honza
Nikola Pajkovsky June 20, 2016, 12:59 p.m. UTC | #11
Jan Kara <jack@suse.cz> writes:

> On Thu 16-06-16 16:42:58, Nikola Pajkovsky wrote:
>> Jan Kara <jack@suse.cz> writes:
>> 
>> > On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
>> >> Jan Kara <jack@suse.cz> writes:
>> >> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> >> >> Holger Hoffstätte <holger@applied-asynchrony.com> writes:
>> >> >> 
>> >> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> >> >> > (snip)
>> >> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> >> >> run finishes for it (which may take a while as our server room is currently
>> >> >> >> moving to a different place).
>> >> >> >> 
>> >> >> >> 								Honza
>> >> >> >> -- 
>> >> >> >> Jan Kara <jack@suse.com>
>> >> >> >> SUSE Labs, CR
>> >> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> >> >> From: Jan Kara <jack@suse.cz>
>> >> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >> >> >> 
>> >> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> >> >> filesystems.
>> >> >> >
>> >> >> > Since you marked this for -stable, just a heads-up that the previous patch
>> >> >> > for the data exposure was rejected from -stable (see [1]) because it
>> >> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> >> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >> >> >
>> >> >> > So both this and the previous patch need to be submitted.
>> >> >> >
>> >> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>> >> >> 
>> >> >> I'm just wondering if the Jan's patch is not related to blocked
>> >> >> processes in following trace. It very hard to hit it and I don't have
>> >> >> any reproducer.
>> >> >
>> >> > This looks like a different issue. Does the machine recover itself or is it
>> >> > a hard hang and you have to press a reset button?
>> >> 
>> >> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
>> >> ram and machine has dedicated mount point only for PostgreSQL data.
>> >> 
>> >> Nevertheless, I was able always to ssh to the machine, so machine itself
>> >> was not in hard hang and ext4 mostly gets recover by itself (it took
>> >> 30min). But I have seen situation, were every process who 'touch' the ext4
>> >> goes immediately to D state and does not recover even after hour.
>> >
>> > If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
>> > dump stuck processes and also run 'iostat -x 1' for a while to see how much
>> > IO is happening in the system? That should tell us more.
>> 
>> 
>> Link to 'echo w >/proc/sysrq-trigger' is here, because it's bit bigger
>> to mail it.
>> 
>>    http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html
>
> Can you upload it again please? I've got to looking at the file only today
> and it is already deleted. Thanks!

http://expirebox.com/download/c010e712e55938435c446cdc01a0b523.html

>> I was running iotop and there was traffic roughly ~20 KB/s write.
>> 
>> What was bit more interesting, was looking at
>> 
>>    cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
>> 
>> nr_drity had around 240 and was slowly counting up, but nr_writeback had
>> ~8800 and was stuck for 120s.
>
> Hum, interesting. This would suggest like IO completion got stuck for some
> reason. We'll see more from the stacktraces hopefully.

I have monitored /sys/kernel/debug/bdi/253:32/stats for 10 minutes, once per
second. The values are all the same, as follows:

    --[ Sun Jun 19 06:11:08 CEST 2016
    BdiWriteback:            15840 kB
    BdiReclaimable:          32320 kB
    BdiDirtyThresh:              0 kB
    DirtyThresh:           1048576 kB
    BackgroundThresh:       131072 kB
    BdiDirtied:         6131163680 kB
    BdiWritten:         6130214880 kB
    BdiWriteBandwidth:      324948 kBps
    b_dirty:                     2
    b_io:                        3
    b_more_io:                   0
    bdi_list:                    1
    state:                       c

Maybe those values can cause the issue by kicking in writeback too often and
blocking everyone else.

   $ sysctl -a | grep dirty | grep -v ratio
   vm.dirty_background_bytes = 134217728
   vm.dirty_bytes = 1073741824
   vm.dirty_expire_centisecs = 1500
   vm.dirty_writeback_centisecs = 500

I even have the output of the following command, if you're interested:

  $ trace-cmd record -e ext4 -e jbd2 -e writeback -e block sleep 600
Jan Kara June 21, 2016, 10:11 a.m. UTC | #12
On Mon 20-06-16 14:59:57, Nikola Pajkovsky wrote:
> Jan Kara <jack@suse.cz> writes:
> > On Thu 16-06-16 16:42:58, Nikola Pajkovsky wrote:
> >> Jan Kara <jack@suse.cz> writes:
> >> 
> >> > On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
> >> >> Jan Kara <jack@suse.cz> writes:
> >> >> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
> >> >> >> Holger Hoffstätte <holger@applied-asynchrony.com> writes:
> >> >> >> 
> >> >> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
> >> >> >> > (snip)
> >> >> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
> >> >> >> >> run finishes for it (which may take a while as our server room is currently
> >> >> >> >> moving to a different place).
> >> >> >> >> 
> >> >> >> >> 								Honza
> >> >> >> >> -- 
> >> >> >> >> Jan Kara <jack@suse.com>
> >> >> >> >> SUSE Labs, CR
> >> >> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
> >> >> >> >> From: Jan Kara <jack@suse.cz>
> >> >> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
> >> >> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
> >> >> >> >> 
> >> >> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
> >> >> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
> >> >> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
> >> >> >> >> filesystems.
> >> >> >> >
> >> >> >> > Since you marked this for -stable, just a heads-up that the previous patch
> >> >> >> > for the data exposure was rejected from -stable (see [1]) because it
> >> >> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
> >> >> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
> >> >> >> >
> >> >> >> > So both this and the previous patch need to be submitted.
> >> >> >> >
> >> >> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
> >> >> >> 
> >> >> >> I'm just wondering if the Jan's patch is not related to blocked
> >> >> >> processes in following trace. It very hard to hit it and I don't have
> >> >> >> any reproducer.
> >> >> >
> >> >> > This looks like a different issue. Does the machine recover itself or is it
> >> >> > a hard hang and you have to press a reset button?
> >> >> 
> >> >> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
> >> >> ram and machine has dedicated mount point only for PostgreSQL data.
> >> >> 
> >> >> Nevertheless, I was able always to ssh to the machine, so machine itself
> >> >> was not in hard hang and ext4 mostly gets recover by itself (it took
> >> >> 30min). But I have seen situation, were every process who 'touch' the ext4
> >> >> goes immediately to D state and does not recover even after hour.
> >> >
> >> > If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
> >> > dump stuck processes and also run 'iostat -x 1' for a while to see how much
> >> > IO is happening in the system? That should tell us more.
> >> 
> >> 
> >> Link to 'echo w >/proc/sysrq-trigger' is here, because it's bit bigger
> >> to mail it.
> >> 
> >>    http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html
> >
> > Can you upload it again please? I've got to looking at the file only today
> > and it is already deleted. Thanks!
> 
> http://expirebox.com/download/c010e712e55938435c446cdc01a0b523.html

OK, I had a look at the traces and the JBD2 thread just waits for the buffers
it has submitted for IO to complete. The rest is just blocked on that. From
the message "INFO: task jbd2/vdc-8:4710 blocked for more than 120 seconds.
severity=err" we can see that the JBD2 process has been waiting for a
significant amount of time. Now the question is why it takes so long for
the IO to complete - likely not a fs problem but something below - the block
layer or the storage itself.
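
(If it hangs again, the journal thread's stack can also be sampled live - a
quick sketch, reusing the PID from the log above; needs root:)

   # sample the stack of jbd2/vdc-8 a few times while it is stuck
   $ for i in 1 2 3; do cat /proc/4710/stack; echo ----; sleep 5; done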

What is the underlying storage? And what IO scheduler do you use? Seeing
that the device is 'vdc' - that suggests you are running in a guest - is
there anything interesting happening on the host at that moment? Is IO from
other guests / the host stalled at that moment as well?

> >> I was running iotop and there was traffic roughly ~20 KB/s write.
> >> 
> >> What was bit more interesting, was looking at
> >> 
> >>    cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
> >> 
> >> nr_drity had around 240 and was slowly counting up, but nr_writeback had
> >> ~8800 and was stuck for 120s.
> >
> > Hum, interesting. This would suggest like IO completion got stuck for some
> > reason. We'll see more from the stacktraces hopefully.
> 
> I have monitor /sys/kernel/debug/bdi/253:32/stats for 10 mins per 1 sec.
> Values are all same as follows:
> 
>     --[ Sun Jun 19 06:11:08 CEST 2016
>     BdiWriteback:            15840 kB
>     BdiReclaimable:          32320 kB
>     BdiDirtyThresh:              0 kB
>     DirtyThresh:           1048576 kB
>     BackgroundThresh:       131072 kB
>     BdiDirtied:         6131163680 kB
>     BdiWritten:         6130214880 kB
>     BdiWriteBandwidth:      324948 kBps
>     b_dirty:                     2
>     b_io:                        3
>     b_more_io:                   0
>     bdi_list:                    1
>     state:                       c

OK, so all the IO looks stalled for that period.

> Maybe those values can cause issue and kicks in writeback to often and
> block everyone else.
> 
>    $ sysctl -a | grep dirty | grep -v ratio
>    vm.dirty_background_bytes = 134217728
>    vm.dirty_bytes = 1073741824
>    vm.dirty_expire_centisecs = 1500
>    vm.dirty_writeback_centisecs = 500

This looks healthy.

> I even have output of command, if you're interested.
> 
>   $ trace-cmd record -e ext4 -e jbd2 -e writeback -e block sleep 600

Traces from the block layer may be interesting, but you'd need the trace
started before the hang starts so that you can see what happened to the IO
that jbd2/vdc-8:4710 is waiting for.
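
E.g. something like this left running before the load starts should do
(untested, the duration is arbitrary):

   $ trace-cmd record -e block -e jbd2 -e writeback sleep 3600
   $ trace-cmd report > trace.txt       # post-process once the stall has been caught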

								Honza
Nikola Pajkovsky June 22, 2016, 8:55 a.m. UTC | #13
Jan Kara <jack@suse.cz> writes:

> On Mon 20-06-16 14:59:57, Nikola Pajkovsky wrote:
>> Jan Kara <jack@suse.cz> writes:
>> > On Thu 16-06-16 16:42:58, Nikola Pajkovsky wrote:
>> >> Jan Kara <jack@suse.cz> writes:
>> >> 
>> >> > On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
>> >> >> Jan Kara <jack@suse.cz> writes:
>> >> >> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> >> >> >> Holger Hoffstätte <holger@applied-asynchrony.com> writes:
>> >> >> >> 
>> >> >> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> >> >> >> > (snip)
>> >> >> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> >> >> >> run finishes for it (which may take a while as our server room is currently
>> >> >> >> >> moving to a different place).
>> >> >> >> >> 
>> >> >> >> >> 								Honza
>> >> >> >> >> -- 
>> >> >> >> >> Jan Kara <jack@suse.com>
>> >> >> >> >> SUSE Labs, CR
>> >> >> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> >> >> >> From: Jan Kara <jack@suse.cz>
>> >> >> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> >> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >> >> >> >> 
>> >> >> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> >> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> >> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> >> >> >> filesystems.
>> >> >> >> >
>> >> >> >> > Since you marked this for -stable, just a heads-up that the previous patch
>> >> >> >> > for the data exposure was rejected from -stable (see [1]) because it
>> >> >> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> >> >> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >> >> >> >
>> >> >> >> > So both this and the previous patch need to be submitted.
>> >> >> >> >
>> >> >> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>> >> >> >> 
>> >> >> >> I'm just wondering if the Jan's patch is not related to blocked
>> >> >> >> processes in following trace. It very hard to hit it and I don't have
>> >> >> >> any reproducer.
>> >> >> >
>> >> >> > This looks like a different issue. Does the machine recover itself or is it
>> >> >> > a hard hang and you have to press a reset button?
>> >> >> 
>> >> >> The machine is bit bigger than I have pretend. It's 18 vcpu with 160 GB
>> >> >> ram and machine has dedicated mount point only for PostgreSQL data.
>> >> >> 
>> >> >> Nevertheless, I was able always to ssh to the machine, so machine itself
>> >> >> was not in hard hang and ext4 mostly gets recover by itself (it took
>> >> >> 30min). But I have seen situation, were every process who 'touch' the ext4
>> >> >> goes immediately to D state and does not recover even after hour.
>> >> >
>> >> > If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
>> >> > dump stuck processes and also run 'iostat -x 1' for a while to see how much
>> >> > IO is happening in the system? That should tell us more.
>> >> 
>> >> 
>> >> Link to 'echo w >/proc/sysrq-trigger' is here, because it's bit bigger
>> >> to mail it.
>> >> 
>> >>    http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html
>> >
>> > Can you upload it again please? I've got to looking at the file only today
>> > and it is already deleted. Thanks!
>> 
>> http://expirebox.com/download/c010e712e55938435c446cdc01a0b523.html
>
> OK, I had a look into the traces and JBD2 thread just waits for the buffers
> is has submitted for IO to complete. The rest is just blocked on that. From
> the message "INFO: task jbd2/vdc-8:4710 blocked for more than 120 seconds.
> severity=err" we can see that the JBD2 process has been waiting for a
> significant amount of time. Now the question is why it takes so long for
> the IO to complete - likely not a fs problem but somewhere below - block
> layer or the storage itself.
>
> What is the underlying storage? And what IO scheduler do you use? Seeing
> that the device is 'vdc' - that suggests you are running in a guest - is
> there anything interesting happening on the host at that moment? Is IO from
> other guests / the host stalled at that moment as well?

The underlying storage is 24 disks in hardware RAID6 with a 64k stripe. LVM
is used to manage partitions for the virt guests. The guests just see a block
device, which is formatted in the guest with jsize=2048 and mounted with
rw,noatime,nodiratime,user_xattr,acl.

Two guests run 3.18.34 with virtio-blk and hence use multiqueue, and if I
remember correctly, no IO scheduler is used for multiqueue. Each has 18 vCPUs
and 160 GB of RAM, and only PostgreSQL uses /dev/vdc.
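
(The in-guest view can be double-checked with something like the following -
just a sketch; for a blk-mq device the scheduler file typically reports
'none':)

   $ cat /sys/block/vdc/queue/scheduler
   $ ls /sys/block/vdc/mq/ 2>/dev/null && echo "blk-mq in use"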

There are two other, much smaller guests running the standard RHEL6 kernel
with the deadline IO scheduler.

We're plotting IO and read/write throughput from the host, and when processes
get blocked in one guest, we don't see any traffic going down to the
host/RAID6. The other guests are running just fine; we don't get any blocked
processes from them.

>> >> I was running iotop and there was traffic roughly ~20 KB/s write.
>> >> 
>> >> What was bit more interesting, was looking at
>> >> 
>> >>    cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
>> >> 
>> >> nr_drity had around 240 and was slowly counting up, but nr_writeback had
>> >> ~8800 and was stuck for 120s.
>> >
>> > Hum, interesting. This would suggest like IO completion got stuck for some
>> > reason. We'll see more from the stacktraces hopefully.
>> 
>> I have monitor /sys/kernel/debug/bdi/253:32/stats for 10 mins per 1 sec.
>> Values are all same as follows:
>> 
>>     --[ Sun Jun 19 06:11:08 CEST 2016
>>     BdiWriteback:            15840 kB
>>     BdiReclaimable:          32320 kB
>>     BdiDirtyThresh:              0 kB
>>     DirtyThresh:           1048576 kB
>>     BackgroundThresh:       131072 kB
>>     BdiDirtied:         6131163680 kB
>>     BdiWritten:         6130214880 kB
>>     BdiWriteBandwidth:      324948 kBps
>>     b_dirty:                     2
>>     b_io:                        3
>>     b_more_io:                   0
>>     bdi_list:                    1
>>     state:                       c
>
> OK, so all the IO looks stalled for that period of it.
>
>> Maybe those values can cause issue and kicks in writeback to often and
>> block everyone else.
>> 
>>    $ sysctl -a | grep dirty | grep -v ratio
>>    vm.dirty_background_bytes = 134217728
>>    vm.dirty_bytes = 1073741824
>>    vm.dirty_expire_centisecs = 1500
>>    vm.dirty_writeback_centisecs = 500
>
> This looks healthy.
>
>> I even have output of command, if you're interested.
>> 
>>   $ trace-cmd record -e ext4 -e jbd2 -e writeback -e block sleep 600
>
> Traces from block layer may be interesting but you'd need the trace started
> before the hang starts so that you see what happened with the IO that
> jbd2/vdc-8:4710 is waiting for.
>
> 								Honza

Patch

From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 8 Jun 2016 10:01:45 +0200
Subject: [PATCH] ext4: Fix deadlock during page writeback

Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
deadlock in ext4_writepages() which was previously much harder to hit.
After this commit xfstest generic/130 reproduces the deadlock on small
filesystems.

The problem happens when ext4_do_update_inode() sets LARGE_FILE feature
and marks current inode handle as synchronous. That subsequently results
in ext4_journal_stop() called from ext4_writepages() to block waiting for
transaction commit while still holding page locks, reference to io_end,
and some prepared bio in mpd structure each of which can possibly block
transaction commit from completing and thus results in deadlock.

Fix the problem by releasing page locks, io_end reference, and
submitting prepared bio before calling ext4_journal_stop().

Reported-by: Eryu Guan <eguan@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f7140ca66e3b..ba04d57656d4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2748,13 +2748,27 @@  retry:
 				done = true;
 			}
 		}
-		ext4_journal_stop(handle);
 		/* Submit prepared bio */
 		ext4_io_submit(&mpd.io_submit);
 		/* Unlock pages we didn't use */
 		mpage_release_unused_pages(&mpd, give_up_on_write);
-		/* Drop our io_end reference we got from init */
-		ext4_put_io_end(mpd.io_submit.io_end);
+		/*
+		 * Drop our io_end reference we got from init. We have to be
+		 * careful and use deferred io_end finishing as we can release
+		 * the last reference to io_end which may end up doing unwritten
+		 * extent conversion which we cannot do while holding
+		 * transaction handle.
+		 */
+		ext4_put_io_end_defer(mpd.io_submit.io_end);
+		/*
+		 * Caution: ext4_journal_stop() can wait for transaction commit
+		 * to finish which may depend on writeback of pages to complete
+		 * or on page lock to be released. So we can call it only
+		 * after we have submitted all the IO, released page locks
+		 * we hold, and dropped io_end reference (for extent conversion
+		 * to be able to complete).
+		 */
+		ext4_journal_stop(handle);
 
 		if (ret == -ENOSPC && sbi->s_journal) {
 			/*
-- 
2.6.6