diff mbox

[BUG] fatal hang untarring 90GB file, possibly writeback related.

Message ID 1303993705-sup-5213@think
State Not Applicable, archived
Headers show

Commit Message

Chris Mason April 28, 2011, 12:29 p.m. UTC
Excerpts from Colin Ian King's message of 2011-04-28 07:36:30 -0400:
> One more data point to add, I've been looking at an identical issue when
> copying large amounts of data.  I bisected this - and the lockups occur
> with commit 
> 3e7d344970673c5334cf7b5bb27c8c0942b06126 - before that I don't see the
> issue. With this commit, my file copy test locks up after ~8-10
> iterations, before this commit I can copy > 100 times and don't see the
> lockup.

Well, that's really interesting.  I tried with compaction on here and
couldn't trigger it, but this (very very lightly) tested patch might
help.

It moves the writeout throttle before the goto restart, and also makes
sure we do at least one cond_resched before we loop.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Colin Ian King April 28, 2011, 1:42 p.m. UTC | #1
On Thu, 2011-04-28 at 08:29 -0400, Chris Mason wrote:
> Excerpts from Colin Ian King's message of 2011-04-28 07:36:30 -0400:
> > One more data point to add, I've been looking at an identical issue when
> > copying large amounts of data.  I bisected this - and the lockups occur
> > with commit 
> > 3e7d344970673c5334cf7b5bb27c8c0942b06126 - before that I don't see the
> > issue. With this commit, my file copy test locks up after ~8-10
> > iterations, before this commit I can copy > 100 times and don't see the
> > lockup.
> 
> Well, that's really interesting.  I tried with compaction on here and
> couldn't trigger it, but this (very very lightly) tested patch might
> help.
> 
Thanks Chris,

I've given this a soak test but I still see the same lockup.

> It moves the writeout throttle before the goto restart, and also makes
> sure we do at least one cond_resched before we loop.
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6771ea7..cb08b41 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1934,12 +1934,14 @@ restart:
>  	if (inactive_anon_is_low(zone, sc))
>  		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>  
> +	throttle_vm_writeout(sc->gfp_mask);
> +
>  	/* reclaim/compaction might need reclaim to continue */
>  	if (should_continue_reclaim(zone, nr_reclaimed,
> -					sc->nr_scanned - nr_scanned, sc))
> +					sc->nr_scanned - nr_scanned, sc)) {
> +		cond_resched();
>  		goto restart;
> -
> -	throttle_vm_writeout(sc->gfp_mask);
> +	}
>  }
>  
>  /*


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason April 28, 2011, 1:45 p.m. UTC | #2
Excerpts from Colin Ian King's message of 2011-04-28 09:42:20 -0400:
> 
> On Thu, 2011-04-28 at 08:29 -0400, Chris Mason wrote:
> > Excerpts from Colin Ian King's message of 2011-04-28 07:36:30 -0400:
> > > One more data point to add, I've been looking at an identical issue when
> > > copying large amounts of data.  I bisected this - and the lockups occur
> > > with commit 
> > > 3e7d344970673c5334cf7b5bb27c8c0942b06126 - before that I don't see the
> > > issue. With this commit, my file copy test locks up after ~8-10
> > > iterations, before this commit I can copy > 100 times and don't see the
> > > lockup.
> > 
> > Well, that's really interesting.  I tried with compaction on here and
> > couldn't trigger it, but this (very very lightly) tested patch might
> > help.
> > 
> Thanks Chris,
> 
> I've given this a soak test but I still see the same lockup.

Could you post the soft lockups you're seeing?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Colin Ian King April 28, 2011, 2:01 p.m. UTC | #3
> Could you post the soft lockups you're seeing?

As requested, attached

Colin
> 
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
Chris Mason April 28, 2011, 2:04 p.m. UTC | #4
Excerpts from Colin Ian King's message of 2011-04-28 10:01:22 -0400:
> 
> > Could you post the soft lockups you're seeing?
> 
> As requested, attached

These are not good, but they aren't the lockup James was seeing.  Were
these messages with my patch?  If yes, please post the messages from
without my patch.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara April 28, 2011, 2:25 p.m. UTC | #5
On Thu 28-04-11 15:01:22, Colin Ian King wrote:
> 
> > Could you post the soft lockups you're seeing?
> 
> As requested, attached
  Hum, what keeps puzzling me is that in all the cases of hangs I've seen
so far, we are stuck waiting for IO to finish for a long time - e.g. in the
traces below kjournald waits for PageWriteback bit to get cleared. Also we
are stuck waiting for page locks which might be because those pages are
being read in? All in all it seems that the IO is just incredibly slow.

But it's not clear to me what pushes us into that situation (especially
since ext4 refuses to do any IO from ->writepage (i.e. kswapd) when the
underlying blocks are not already allocated.

									Honza

[  287.088371] INFO: task rs:main Q:Reg:749 blocked for more than 30 seconds.
[  287.088374] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  287.088376] rs:main Q:Reg   D 0000000000000000     0   749      1 0x00000000
[  287.088381]  ffff880072c17b68 0000000000000082 ffff880072c17fd8 ffff880072c16000
[  287.088392]  0000000000013d00 ffff88003591b178 ffff880072c17fd8 0000000000013d00
[  287.088396]  ffffffff81a0b020 ffff88003591adc0 ffff88001fffc3e8 ffff88001fc13d00
[  287.088400] Call Trace:
[  287.088404]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
[  287.088408]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
[  287.088411]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
[  287.088414]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
[  287.088418]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
[  287.088421]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
[  287.088425]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
[  287.088428]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
[  287.088431]  [<ffffffff8112d144>] __do_fault+0x54/0x520
[  287.088434]  [<ffffffff81134a43>] ? unmap_region+0x113/0x170
[  287.088437]  [<ffffffff812ded90>] ? prio_tree_insert+0x150/0x1c0
[  287.088440]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
[  287.088442]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
[  287.088446]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
[  287.088448]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
[  287.088451]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
[  287.088454]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
[  287.088457]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
[  287.088460]  [<ffffffff81137127>] ? sys_mmap_pgoff+0x167/0x230
[  287.088463]  [<ffffffff815c34d5>] page_fault+0x25/0x30
[  287.088466] INFO: task NetworkManager:764 blocked for more than 30 seconds.
[  287.088468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  287.088470] NetworkManager  D 0000000000000002     0   764      1 0x00000000
[  287.088473]  ffff880074ffbb68 0000000000000082 ffff880074ffbfd8 ffff880074ffa000
[  287.088477]  0000000000013d00 ffff880036051a98 ffff880074ffbfd8 0000000000013d00
[  287.088481]  ffff8801005badc0 ffff8800360516e0 ffff88001ffef128 ffff88001fc53d00
[  287.088484] Call Trace:
[  287.088488]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
[  287.088491]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
[  287.088494]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
[  287.088497]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
[  287.088500]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
[  287.088503]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
[  287.088506]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
[  287.088509]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
[  287.088513]  [<ffffffff81177110>] ? pollwake+0x0/0x60
[  287.088516]  [<ffffffff8112d144>] __do_fault+0x54/0x520
[  287.088519]  [<ffffffff81177110>] ? pollwake+0x0/0x60
[  287.088522]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
[  287.088525]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
[  287.088527]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
[  287.088530]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
[  287.088533]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
[  287.088537]  [<ffffffff81013859>] ? read_tsc+0x9/0x20
[  287.088540]  [<ffffffff81092eb1>] ? ktime_get_ts+0xb1/0xf0
[  287.088543]  [<ffffffff811776d2>] ? poll_select_set_timeout+0x82/0x90
[  287.088546]  [<ffffffff815c34d5>] page_fault+0x25/0x30
[  287.088559] INFO: task unity-panel-ser:1521 blocked for more than 30 seconds.
[  287.088561] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  287.088562] unity-panel-ser D 0000000000000000     0  1521      1 0x00000000
[  287.088566]  ffff880061f37b68 0000000000000082 ffff880061f37fd8 ffff880061f36000
[  287.088570]  0000000000013d00 ffff880068c7c858 ffff880061f37fd8 0000000000013d00
[  287.088573]  ffff88003591c4a0 ffff880068c7c4a0 ffff88001fff0c88 ffff88001fc13d00
[  287.088577] Call Trace:
[  287.088581]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
[  287.088583]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
[  287.088587]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
[  287.088589]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
[  287.088593]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
[  287.088596]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
[  287.088599]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
[  287.088602]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
[  287.088605]  [<ffffffff8112d144>] __do_fault+0x54/0x520
[  287.088608]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
[  287.088610]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
[  287.088613]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
[  287.088616]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
[  287.088619]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
[  287.088622]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
[  287.088625]  [<ffffffff815c34d5>] page_fault+0x25/0x30
[  287.088629] INFO: task jbd2/sda4-8:1845 blocked for more than 30 seconds.
[  287.088630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  287.088632] jbd2/sda4-8     D 0000000000000000     0  1845      2 0x00000000
[  287.088636]  ffff880068f6baf0 0000000000000046 ffff880068f6bfd8 ffff880068f6a000
[  287.088639]  0000000000013d00 ffff880061d603b8 ffff880068f6bfd8 0000000000013d00
[  287.088643]  ffff88003591c4a0 ffff880061d60000 ffff88001fff8548 ffff88001fc13d00
[  287.088647] Call Trace:
[  287.088650]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
[  287.088653]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
[  287.088656]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
[  287.088659]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
[  287.088662]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
[  287.088665]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
[  287.088668]  [<ffffffff8110c41d>] filemap_fdatawait_range+0xfd/0x190
[  287.088672]  [<ffffffff8110c4db>] filemap_fdatawait+0x2b/0x30
[  287.088675]  [<ffffffff81242a93>] journal_finish_inode_data_buffers+0x63/0x170
[  287.088678]  [<ffffffff81243284>] jbd2_journal_commit_transaction+0x6e4/0x1190
[  287.088682]  [<ffffffff81076185>] ? try_to_del_timer_sync+0x85/0xe0
[  287.088685]  [<ffffffff81247e9b>] kjournald2+0xbb/0x220
[  287.088688]  [<ffffffff81087f30>] ? autoremove_wake_function+0x0/0x40
[  287.088691]  [<ffffffff81247de0>] ? kjournald2+0x0/0x220
[  287.088694]  [<ffffffff810877e6>] kthread+0x96/0xa0
[  287.088697]  [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
[  287.088700]  [<ffffffff81087750>] ? kthread+0x0/0xa0
[  287.088703]  [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
[  287.088705] INFO: task dirname:5969 blocked for more than 30 seconds.
[  287.088707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  287.088709] dirname         D 0000000000000002     0  5969   5214 0x00000000
[  287.088712]  ffff88005bd9d8b8 0000000000000086 ffff88005bd9dfd8 ffff88005bd9c000
[  287.088716]  0000000000013d00 ffff88005d65b178 ffff88005bd9dfd8 0000000000013d00
[  287.088720]  ffff8801005e5b80 ffff88005d65adc0 ffff88001ffe5228 ffff88001fc53d00
[  287.088723] Call Trace:
[  287.088726]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
[  287.088729]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
[  287.088732]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
[  287.088735]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
[  287.088738]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
[  287.088741]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
[  287.088744]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
[  287.088747]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
[  287.088750]  [<ffffffff8112d144>] __do_fault+0x54/0x520
[  287.088753]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
[  287.088756]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
[  287.088759]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
[  287.088761]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
[  287.088764]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
[  287.088767]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
[  287.088770]  [<ffffffff81136947>] ? mmap_region+0x1f7/0x500
[  287.088773]  [<ffffffff8112db06>] ? free_pgd_range+0x356/0x4a0
[  287.088776]  [<ffffffff815c34d5>] page_fault+0x25/0x30
[  287.088779]  [<ffffffff812e6d5f>] ? __clear_user+0x3f/0x70
[  287.088782]  [<ffffffff812e6d41>] ? __clear_user+0x21/0x70
[  287.088786]  [<ffffffff812e6dc6>] clear_user+0x36/0x40
[  287.088788]  [<ffffffff811b0b6d>] padzero+0x2d/0x40
[  287.088791]  [<ffffffff811b2c7a>] load_elf_binary+0x95a/0xe00
[  287.088794]  [<ffffffff8116aa8a>] search_binary_handler+0xda/0x300
[  287.088797]  [<ffffffff811b2320>] ? load_elf_binary+0x0/0xe00
[  287.088800]  [<ffffffff8116c49c>] do_execve+0x24c/0x2d0
[  287.088802]  [<ffffffff8101521a>] sys_execve+0x4a/0x80
[  287.088805]  [<ffffffff8100c45c>] stub_execve+0x6c/0xc0
Jan Kara April 28, 2011, 2:33 p.m. UTC | #6
On Thu 28-04-11 16:25:51, Jan Kara wrote:
> On Thu 28-04-11 15:01:22, Colin Ian King wrote:
> > 
> > > Could you post the soft lockups you're seeing?
> > 
> > As requested, attached
>   Hum, what keeps puzzling me is that in all the cases of hangs I've seen
> so far, we are stuck waiting for IO to finish for a long time - e.g. in the
> traces below kjournald waits for PageWriteback bit to get cleared. Also we
> are stuck waiting for page locks which might be because those pages are
> being read in? All in all it seems that the IO is just incredibly slow.
> 
> But it's not clear to me what pushes us into that situation (especially
> since ext4 refuses to do any IO from ->writepage (i.e. kswapd) when the
> underlying blocks are not already allocated.
  Hmm, maybe because the system is under memory pressure (and kswapd is not
able to get rid of dirty pages), we page out clean pages. Thus also pages
of executables which need to be paged in soon anyway thus putting heavy
read load on the system which makes writes crawl? I'm not sure why
compaction should make this any worse but maybe it can.

James, Colin, can you capture output of 'vmstat 1' while you do the
copying? Thanks.

								Honza

> [  287.088371] INFO: task rs:main Q:Reg:749 blocked for more than 30 seconds.
> [  287.088374] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  287.088376] rs:main Q:Reg   D 0000000000000000     0   749      1 0x00000000
> [  287.088381]  ffff880072c17b68 0000000000000082 ffff880072c17fd8 ffff880072c16000
> [  287.088392]  0000000000013d00 ffff88003591b178 ffff880072c17fd8 0000000000013d00
> [  287.088396]  ffffffff81a0b020 ffff88003591adc0 ffff88001fffc3e8 ffff88001fc13d00
> [  287.088400] Call Trace:
> [  287.088404]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> [  287.088408]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> [  287.088411]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> [  287.088414]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> [  287.088418]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> [  287.088421]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> [  287.088425]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> [  287.088428]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> [  287.088431]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> [  287.088434]  [<ffffffff81134a43>] ? unmap_region+0x113/0x170
> [  287.088437]  [<ffffffff812ded90>] ? prio_tree_insert+0x150/0x1c0
> [  287.088440]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> [  287.088442]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
> [  287.088446]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
> [  287.088448]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
> [  287.088451]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> [  287.088454]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> [  287.088457]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
> [  287.088460]  [<ffffffff81137127>] ? sys_mmap_pgoff+0x167/0x230
> [  287.088463]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> [  287.088466] INFO: task NetworkManager:764 blocked for more than 30 seconds.
> [  287.088468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  287.088470] NetworkManager  D 0000000000000002     0   764      1 0x00000000
> [  287.088473]  ffff880074ffbb68 0000000000000082 ffff880074ffbfd8 ffff880074ffa000
> [  287.088477]  0000000000013d00 ffff880036051a98 ffff880074ffbfd8 0000000000013d00
> [  287.088481]  ffff8801005badc0 ffff8800360516e0 ffff88001ffef128 ffff88001fc53d00
> [  287.088484] Call Trace:
> [  287.088488]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> [  287.088491]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> [  287.088494]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> [  287.088497]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> [  287.088500]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> [  287.088503]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> [  287.088506]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> [  287.088509]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> [  287.088513]  [<ffffffff81177110>] ? pollwake+0x0/0x60
> [  287.088516]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> [  287.088519]  [<ffffffff81177110>] ? pollwake+0x0/0x60
> [  287.088522]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> [  287.088525]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
> [  287.088527]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
> [  287.088530]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> [  287.088533]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> [  287.088537]  [<ffffffff81013859>] ? read_tsc+0x9/0x20
> [  287.088540]  [<ffffffff81092eb1>] ? ktime_get_ts+0xb1/0xf0
> [  287.088543]  [<ffffffff811776d2>] ? poll_select_set_timeout+0x82/0x90
> [  287.088546]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> [  287.088559] INFO: task unity-panel-ser:1521 blocked for more than 30 seconds.
> [  287.088561] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  287.088562] unity-panel-ser D 0000000000000000     0  1521      1 0x00000000
> [  287.088566]  ffff880061f37b68 0000000000000082 ffff880061f37fd8 ffff880061f36000
> [  287.088570]  0000000000013d00 ffff880068c7c858 ffff880061f37fd8 0000000000013d00
> [  287.088573]  ffff88003591c4a0 ffff880068c7c4a0 ffff88001fff0c88 ffff88001fc13d00
> [  287.088577] Call Trace:
> [  287.088581]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> [  287.088583]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> [  287.088587]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> [  287.088589]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> [  287.088593]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> [  287.088596]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> [  287.088599]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> [  287.088602]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> [  287.088605]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> [  287.088608]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> [  287.088610]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
> [  287.088613]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
> [  287.088616]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> [  287.088619]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> [  287.088622]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
> [  287.088625]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> [  287.088629] INFO: task jbd2/sda4-8:1845 blocked for more than 30 seconds.
> [  287.088630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  287.088632] jbd2/sda4-8     D 0000000000000000     0  1845      2 0x00000000
> [  287.088636]  ffff880068f6baf0 0000000000000046 ffff880068f6bfd8 ffff880068f6a000
> [  287.088639]  0000000000013d00 ffff880061d603b8 ffff880068f6bfd8 0000000000013d00
> [  287.088643]  ffff88003591c4a0 ffff880061d60000 ffff88001fff8548 ffff88001fc13d00
> [  287.088647] Call Trace:
> [  287.088650]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> [  287.088653]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> [  287.088656]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> [  287.088659]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> [  287.088662]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> [  287.088665]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> [  287.088668]  [<ffffffff8110c41d>] filemap_fdatawait_range+0xfd/0x190
> [  287.088672]  [<ffffffff8110c4db>] filemap_fdatawait+0x2b/0x30
> [  287.088675]  [<ffffffff81242a93>] journal_finish_inode_data_buffers+0x63/0x170
> [  287.088678]  [<ffffffff81243284>] jbd2_journal_commit_transaction+0x6e4/0x1190
> [  287.088682]  [<ffffffff81076185>] ? try_to_del_timer_sync+0x85/0xe0
> [  287.088685]  [<ffffffff81247e9b>] kjournald2+0xbb/0x220
> [  287.088688]  [<ffffffff81087f30>] ? autoremove_wake_function+0x0/0x40
> [  287.088691]  [<ffffffff81247de0>] ? kjournald2+0x0/0x220
> [  287.088694]  [<ffffffff810877e6>] kthread+0x96/0xa0
> [  287.088697]  [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
> [  287.088700]  [<ffffffff81087750>] ? kthread+0x0/0xa0
> [  287.088703]  [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
> [  287.088705] INFO: task dirname:5969 blocked for more than 30 seconds.
> [  287.088707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  287.088709] dirname         D 0000000000000002     0  5969   5214 0x00000000
> [  287.088712]  ffff88005bd9d8b8 0000000000000086 ffff88005bd9dfd8 ffff88005bd9c000
> [  287.088716]  0000000000013d00 ffff88005d65b178 ffff88005bd9dfd8 0000000000013d00
> [  287.088720]  ffff8801005e5b80 ffff88005d65adc0 ffff88001ffe5228 ffff88001fc53d00
> [  287.088723] Call Trace:
> [  287.088726]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> [  287.088729]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> [  287.088732]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> [  287.088735]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> [  287.088738]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> [  287.088741]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> [  287.088744]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> [  287.088747]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> [  287.088750]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> [  287.088753]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> [  287.088756]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
> [  287.088759]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
> [  287.088761]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
> [  287.088764]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> [  287.088767]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> [  287.088770]  [<ffffffff81136947>] ? mmap_region+0x1f7/0x500
> [  287.088773]  [<ffffffff8112db06>] ? free_pgd_range+0x356/0x4a0
> [  287.088776]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> [  287.088779]  [<ffffffff812e6d5f>] ? __clear_user+0x3f/0x70
> [  287.088782]  [<ffffffff812e6d41>] ? __clear_user+0x21/0x70
> [  287.088786]  [<ffffffff812e6dc6>] clear_user+0x36/0x40
> [  287.088788]  [<ffffffff811b0b6d>] padzero+0x2d/0x40
> [  287.088791]  [<ffffffff811b2c7a>] load_elf_binary+0x95a/0xe00
> [  287.088794]  [<ffffffff8116aa8a>] search_binary_handler+0xda/0x300
> [  287.088797]  [<ffffffff811b2320>] ? load_elf_binary+0x0/0xe00
> [  287.088800]  [<ffffffff8116c49c>] do_execve+0x24c/0x2d0
> [  287.088802]  [<ffffffff8101521a>] sys_execve+0x4a/0x80
> [  287.088805]  [<ffffffff8100c45c>] stub_execve+0x6c/0xc0
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley April 28, 2011, 2:49 p.m. UTC | #7
On Thu, 2011-04-28 at 08:29 -0400, Chris Mason wrote:
> Excerpts from Colin Ian King's message of 2011-04-28 07:36:30 -0400:
> > One more data point to add, I've been looking at an identical issue when
> > copying large amounts of data.  I bisected this - and the lockups occur
> > with commit 
> > 3e7d344970673c5334cf7b5bb27c8c0942b06126 - before that I don't see the
> > issue. With this commit, my file copy test locks up after ~8-10
> > iterations, before this commit I can copy > 100 times and don't see the
> > lockup.
> 
> Well, that's really interesting.  I tried with compaction on here and
> couldn't trigger it, but this (very very lightly) tested patch might
> help.
> 
> It moves the writeout throttle before the goto restart, and also makes
> sure we do at least one cond_resched before we loop.

It seems to take longer, but with a PREEMPT kernel, kswapd eventually
shoots up to 99% during the tar.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Colin Ian King April 28, 2011, 2:58 p.m. UTC | #8
On Thu, 2011-04-28 at 16:33 +0200, Jan Kara wrote:
> On Thu 28-04-11 16:25:51, Jan Kara wrote:
> > On Thu 28-04-11 15:01:22, Colin Ian King wrote:
> > > 
> > > > Could you post the soft lockups you're seeing?
> > > 
> > > As requested, attached
> >   Hum, what keeps puzzling me is that in all the cases of hangs I've seen
> > so far, we are stuck waiting for IO to finish for a long time - e.g. in the
> > traces below kjournald waits for PageWriteback bit to get cleared. Also we
> > are stuck waiting for page locks which might be because those pages are
> > being read in? All in all it seems that the IO is just incredibly slow.
> > 
> > But it's not clear to me what pushes us into that situation (especially
> > since ext4 refuses to do any IO from ->writepage (i.e. kswapd) when the
> > underlying blocks are not already allocated.
>   Hmm, maybe because the system is under memory pressure (and kswapd is not
> able to get rid of dirty pages), we page out clean pages. Thus also pages
> of executables which need to be paged in soon anyway thus putting heavy
> read load on the system which makes writes crawl? I'm not sure why
> compaction should make this any worse but maybe it can.
> 
> James, Colin, can you capture output of 'vmstat 1' while you do the
> copying? Thanks.

Attached.


> 
> 								Honza
> 
> > [  287.088371] INFO: task rs:main Q:Reg:749 blocked for more than 30 seconds.
> > [  287.088374] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  287.088376] rs:main Q:Reg   D 0000000000000000     0   749      1 0x00000000
> > [  287.088381]  ffff880072c17b68 0000000000000082 ffff880072c17fd8 ffff880072c16000
> > [  287.088392]  0000000000013d00 ffff88003591b178 ffff880072c17fd8 0000000000013d00
> > [  287.088396]  ffffffff81a0b020 ffff88003591adc0 ffff88001fffc3e8 ffff88001fc13d00
> > [  287.088400] Call Trace:
> > [  287.088404]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > [  287.088408]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > [  287.088411]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > [  287.088414]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > [  287.088418]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > [  287.088421]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > [  287.088425]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > [  287.088428]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > [  287.088431]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > [  287.088434]  [<ffffffff81134a43>] ? unmap_region+0x113/0x170
> > [  287.088437]  [<ffffffff812ded90>] ? prio_tree_insert+0x150/0x1c0
> > [  287.088440]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > [  287.088442]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
> > [  287.088446]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
> > [  287.088448]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
> > [  287.088451]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > [  287.088454]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > [  287.088457]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
> > [  287.088460]  [<ffffffff81137127>] ? sys_mmap_pgoff+0x167/0x230
> > [  287.088463]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > [  287.088466] INFO: task NetworkManager:764 blocked for more than 30 seconds.
> > [  287.088468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  287.088470] NetworkManager  D 0000000000000002     0   764      1 0x00000000
> > [  287.088473]  ffff880074ffbb68 0000000000000082 ffff880074ffbfd8 ffff880074ffa000
> > [  287.088477]  0000000000013d00 ffff880036051a98 ffff880074ffbfd8 0000000000013d00
> > [  287.088481]  ffff8801005badc0 ffff8800360516e0 ffff88001ffef128 ffff88001fc53d00
> > [  287.088484] Call Trace:
> > [  287.088488]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > [  287.088491]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > [  287.088494]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > [  287.088497]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > [  287.088500]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > [  287.088503]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > [  287.088506]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > [  287.088509]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > [  287.088513]  [<ffffffff81177110>] ? pollwake+0x0/0x60
> > [  287.088516]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > [  287.088519]  [<ffffffff81177110>] ? pollwake+0x0/0x60
> > [  287.088522]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > [  287.088525]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
> > [  287.088527]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
> > [  287.088530]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > [  287.088533]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > [  287.088537]  [<ffffffff81013859>] ? read_tsc+0x9/0x20
> > [  287.088540]  [<ffffffff81092eb1>] ? ktime_get_ts+0xb1/0xf0
> > [  287.088543]  [<ffffffff811776d2>] ? poll_select_set_timeout+0x82/0x90
> > [  287.088546]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > [  287.088559] INFO: task unity-panel-ser:1521 blocked for more than 30 seconds.
> > [  287.088561] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  287.088562] unity-panel-ser D 0000000000000000     0  1521      1 0x00000000
> > [  287.088566]  ffff880061f37b68 0000000000000082 ffff880061f37fd8 ffff880061f36000
> > [  287.088570]  0000000000013d00 ffff880068c7c858 ffff880061f37fd8 0000000000013d00
> > [  287.088573]  ffff88003591c4a0 ffff880068c7c4a0 ffff88001fff0c88 ffff88001fc13d00
> > [  287.088577] Call Trace:
> > [  287.088581]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > [  287.088583]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > [  287.088587]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > [  287.088589]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > [  287.088593]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > [  287.088596]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > [  287.088599]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > [  287.088602]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > [  287.088605]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > [  287.088608]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > [  287.088610]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
> > [  287.088613]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
> > [  287.088616]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > [  287.088619]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > [  287.088622]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
> > [  287.088625]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > [  287.088629] INFO: task jbd2/sda4-8:1845 blocked for more than 30 seconds.
> > [  287.088630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  287.088632] jbd2/sda4-8     D 0000000000000000     0  1845      2 0x00000000
> > [  287.088636]  ffff880068f6baf0 0000000000000046 ffff880068f6bfd8 ffff880068f6a000
> > [  287.088639]  0000000000013d00 ffff880061d603b8 ffff880068f6bfd8 0000000000013d00
> > [  287.088643]  ffff88003591c4a0 ffff880061d60000 ffff88001fff8548 ffff88001fc13d00
> > [  287.088647] Call Trace:
> > [  287.088650]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > [  287.088653]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > [  287.088656]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > [  287.088659]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > [  287.088662]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > [  287.088665]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > [  287.088668]  [<ffffffff8110c41d>] filemap_fdatawait_range+0xfd/0x190
> > [  287.088672]  [<ffffffff8110c4db>] filemap_fdatawait+0x2b/0x30
> > [  287.088675]  [<ffffffff81242a93>] journal_finish_inode_data_buffers+0x63/0x170
> > [  287.088678]  [<ffffffff81243284>] jbd2_journal_commit_transaction+0x6e4/0x1190
> > [  287.088682]  [<ffffffff81076185>] ? try_to_del_timer_sync+0x85/0xe0
> > [  287.088685]  [<ffffffff81247e9b>] kjournald2+0xbb/0x220
> > [  287.088688]  [<ffffffff81087f30>] ? autoremove_wake_function+0x0/0x40
> > [  287.088691]  [<ffffffff81247de0>] ? kjournald2+0x0/0x220
> > [  287.088694]  [<ffffffff810877e6>] kthread+0x96/0xa0
> > [  287.088697]  [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
> > [  287.088700]  [<ffffffff81087750>] ? kthread+0x0/0xa0
> > [  287.088703]  [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
> > [  287.088705] INFO: task dirname:5969 blocked for more than 30 seconds.
> > [  287.088707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  287.088709] dirname         D 0000000000000002     0  5969   5214 0x00000000
> > [  287.088712]  ffff88005bd9d8b8 0000000000000086 ffff88005bd9dfd8 ffff88005bd9c000
> > [  287.088716]  0000000000013d00 ffff88005d65b178 ffff88005bd9dfd8 0000000000013d00
> > [  287.088720]  ffff8801005e5b80 ffff88005d65adc0 ffff88001ffe5228 ffff88001fc53d00
> > [  287.088723] Call Trace:
> > [  287.088726]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > [  287.088729]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > [  287.088732]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > [  287.088735]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > [  287.088738]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > [  287.088741]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > [  287.088744]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > [  287.088747]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > [  287.088750]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > [  287.088753]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > [  287.088756]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
> > [  287.088759]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
> > [  287.088761]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
> > [  287.088764]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > [  287.088767]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > [  287.088770]  [<ffffffff81136947>] ? mmap_region+0x1f7/0x500
> > [  287.088773]  [<ffffffff8112db06>] ? free_pgd_range+0x356/0x4a0
> > [  287.088776]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > [  287.088779]  [<ffffffff812e6d5f>] ? __clear_user+0x3f/0x70
> > [  287.088782]  [<ffffffff812e6d41>] ? __clear_user+0x21/0x70
> > [  287.088786]  [<ffffffff812e6dc6>] clear_user+0x36/0x40
> > [  287.088788]  [<ffffffff811b0b6d>] padzero+0x2d/0x40
> > [  287.088791]  [<ffffffff811b2c7a>] load_elf_binary+0x95a/0xe00
> > [  287.088794]  [<ffffffff8116aa8a>] search_binary_handler+0xda/0x300
> > [  287.088797]  [<ffffffff811b2320>] ? load_elf_binary+0x0/0xe00
> > [  287.088800]  [<ffffffff8116c49c>] do_execve+0x24c/0x2d0
> > [  287.088802]  [<ffffffff8101521a>] sys_execve+0x4a/0x80
> > [  287.088805]  [<ffffffff8100c45c>] stub_execve+0x6c/0xc0
> > -- 
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
Colin Ian King April 28, 2011, 3:23 p.m. UTC | #9
On Thu, 2011-04-28 at 10:04 -0400, Chris Mason wrote:
> Excerpts from Colin Ian King's message of 2011-04-28 10:01:22 -0400:
> > 
> > > Could you post the soft lockups you're seeing?
> > 
> > As requested, attached
> 
> These are not good, but they aren't the lockup James was seeing.  Were
> these messages with my patch?  If yes, please post the messages from
> without my patch.

Attached are the messages without your patch.

Colin
> 
> -chris
James Bottomley April 28, 2011, 4:11 p.m. UTC | #10
On Thu, 2011-04-28 at 16:33 +0200, Jan Kara wrote:
> James, Colin, can you capture output of 'vmstat 1' while you do the
> copying? Thanks.

Sure; this is with a PREEMPT kernel so the tar goes to completion
without locking the system.

I started using a memory balloon to clear out the page cache.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0  98708 1676100   4112  39556    0   45  1566  1879  392  168 14 17 59 10  0
 0  0  98708 1676100   4112  39604    0    0     0     0   37   35  0  0 100  0  0
 0  0  98708 1676100   4112  39652    0    0     0     0   31   35  0  0 100  0  0
 0  0  98708 1676100   4112  39652    0    0     0     0   26   27  0  0 100  0  0
 0  0  98708 1676100   4112  39652   32    0    32     0   50   48  0  0 100  0  0
 1  1  98708 1673984   4616  40576  192    0  1812     0  286  117  0  0 92  8  0
 0  1  98708 1590540   4792 124236    0    0 41972     0  901  822  0  4 67 29  0
 0  1  98708 1499524   4808 214820    0    0 45328     0  917  890  0  4 77 20  0
 0  1  98708 1404532   4808 308316   64    0 46784     0  965 1033  0  3 77 20  0
 0  1  98708 1312020   5612 399008    0    0 44836     0 1274 1075  0  6 77 18  0
 0  1  98708 1304540   5628 406324    0    0  3600 47260  289  115  0  2 68 30  0
 0  1  98708 1230760   5672 479020    0    0 36260   712  914  854  0  3 79 18  0
 0  1  98708 1208696   5676 500016    0    0 10500 93888  421  251  0  3 67 31  0
 0  1  98708 1157632   5676 551264    0    0 25600     0  658  518  0  2 75 23  0
 0  1  98708 1052356   5680 655344    0    0 51972     0 1112 1112  0  4 78 19  0
 0  2  98708 1024960   5680 679080    0    0 11904 87304  477  277  0  3 54 43  0
 1  1  98708 1027288   5684 679432    0    0   128  2896  195   42  0  0 85 14  0
 0  1  98708 947124   5688 758528    0    0 39552     4  932  899  0  4 68 28  0
 0  1  98708 849792   5692 854272    0    0 47876  4468 1045  936  0  4 77 20  0
 0  1  98708 813000   5820 890492    0    0 18176 18300  512  386  0  3 66 32  0
 1  2  98708 794304   5820 908132    0    0  8832 95640  379  324  0  2 77 21  0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1  98708 794272   5824 908128    0    0     0 39456  212   58  0  1 73 26  0
 0  1  98708 792344   5832 910452    0    0  1156 44440  193   62  0  1 74 25  0
 0  1  98708 764392   5836 938288    0    0 13956     0  441  374  0  1 79 19  0
 0  1  98708 660828   5840 1040232    0    0 50948     0 1141  992  0  6 65 29  0
 0  1  98708 555156   5840 1144216    0    0 51968     0 1072 1008  0  4 76 20  0
 0  1  98708 458692   5856 1239212    0    0 47496   120  995 1046  0  3 78 19  0
 0  1  98708 411336   5856 1285896    0    0 23296 17764  536  463  0  3 71 26  0
 1  1  98708 367716   5856 1328904    0    0 21504 22272  541  463  0  3 78 20  0
 0  1  98708 330048   5864 1366044    0    0 18568 17524  529  515  0  2 78 20  0
 0  1  98708 301620   5864 1393912    0    0 13952 28148  382  309  0  2 68 30  0
 1  1  98708 245140   5872 1449576    0    0 27776 13224  610  567  0  3 73 24  0
 0  2  98708 226868   5872 1467000    0    0  8704 114804  375  322  0  2 77 21  0
 0  2  98708 226268   5876 1467552    0    0   260 51720  204   40  0  1 75 24  0
 0  1  98708 226548   5880 1467536    0    0     4 13180  149   29  0  1 70 30  0
 2  0  98708 159792   5880 1533844    0    0 33152     0  721  747  0  2 81 17  0
 0  1  98708  55252   5888 1636692    0    0 51456   104 1021  977  0  5 70 25  0
 1  1  98708  22804   2680 1671344    0    0 53632     0 1806 1188  0 20 59 21  0
 2  1  98708  22832   2672 1671372    0    0 40704 11196 1967 1126  0 22 60 17  0
 1  1  98708  22908   2676 1671196    0    0 24580 23040 1621  705  0 35 36 30  0
 1  1  98708  22968   2676 1670796    0    0 21504 22212 1576  674  0 22 60 18  0
 1  1  98708  22672   2684 1670996    0    0 17920 16968 1450  687  0 27 51 23  0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1  98708  22904   2452 1671344    0    0 24832 25780 1632  714  0 27 50 23  0
 1  2  98708  22764   2452 1670724    0    0 15488 96304 1459  531  0 37 33 30  0
 2  2  98708  22640   2452 1670724    0    0     0 60300 1219  356  0 20 45 35  0
 1  1  98708  22744   2452 1670688    0    0   768 29312 1200  253  0 29 36 34  0
 2  0  98708  22964   2460 1671040    0    0 10752    68 1345  445  0 23 56 21  0
 1  1  98708  22652   2536 1671560    0    0 51540     0 2113 1246  0 28 49 23  0
 1  1  98708  22676   2528 1671768    0    0 118276 14604 4875 2681  0 27 51 22  0
 1  1  98708  22644   2528 1671804    0    0 23680 18172 1539  774  0 29 44 27  0
 1  1  98708  22720   2536 1671464    0    0 21888 24664 1494  651  0 24 55 22  0
 2  1  98708  22672   2496 1671592    0    0 22528 18144 1493  643  0 30 42 28  0
 1  1  98708  22732   2488 1671508    0    0 22528 25192 1537  726  0 23 55 21  0
 1  2  98708  22884   2488 1670504    0    0 18816 89440 1464  581  0 31 37 32  0
 1  2  98708  22740   2488 1670764    0    0   768 75012 1196  254  0 22 55 23  0
 1  2  98708  22848   2492 1670600    0    0   256 15512 1225  358  0 29 27 44  0
 1  1  98708  22620   2500 1671628    0    0 19076 11408 1490  575  0 23 54 23  0
 1  1  98708  22988   2500 1671108    0    0 55040     0 2118 1203  0 27 51 22  0
 1  1  98708  22876   2488 1671044    0    0 53760     0 2138 1323  0 28 49 23  0
 1  1  98708  22904   2500 1671012    0    0 48012  5528 2049 1097  0 32 43 25  0
 1  1  98708  22856   2508 1670876    0    0 21504 18536 1500  646  0 23 55 22  0
 2  0  98708  22684   2504 1670972    0    0 23680 23552 1537  784  0 29 44 26  0
 1  1  98708  22960   2500 1670520    0    0 23808 25552 1529  677  0 24 55 21  0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1  98708  22820   2504 1670804    0    0 21764 25104 1507  647  0 30 43 26  0
 1  1  98708  22880   2504 1670812    0    0 24192 18900 1541  768  0 23 55 22  0
 2  3  98708  22716   2512 1670068    0    0  2816 132592 1257  303  0 30 30 40  0
 1  1  98708  22944   2512 1669940    0    0   512 40400 1165  247  0 23 57 20  0
 1  1  98708  22672   2512 1670624    0    0   512     0 1186  349  0 28 44 29  0
 1  1  98708  22788   2504 1670416    0    0 51200     0 2130 1158  0 25 53 22  0
 1  1  98708  22660   2508 1670688    0    0 50564     0 1972 1161  0 31 43 26  0
 1  1  98708  22780   2516 1670088    0    0 50176    64 2028 1204  0 24 56 20  0
 1  1  98708  22856   2508 1669884    0    0 32640 19040 1723  819  0 31 43 26  0
 3  0  98708  22808   2512 1669812    0    0 23812 19176 1535  670  0 23 55 22  0
 1  1  98708  22744   2504 1669728    0    0 21888 24872 1535  653  0 30 43 27  0
 1  1  98708  22696   2504 1669636    0    0 22784 25032 1529  647  0 23 55 21  0
 1  1  98708  22632   2512 1669596    0    0 21896 19228 1515  647  0 30 43 27  0
 1  2  98708  22676   2512 1668992    0    0  9600 74484 1366  418  0 24 43 33  0
 1  2  98708  22780   2512 1668772    0    0   256 107120 1211  245  0 29 42 30  0
 1  1  98708  23196   2512 1668920    0    0   256    40 1158  228  0 23 57 20  0
 2  1  98708  22968   2512 1669092    0    0 47360     0 2023 1097  0 26 49 25  0
 1  1  98708  22864   2704 1668784    0    0 39992   116 1878  994  0 27 49 24  0
 1  1  98708  22704   2708 1668872    0    0 53388     0 2166 1206  0 28 49 23  0
 1  1  98708  22796   2712 1668472    0    0 39940 12784 1873  966  0 27 52 21  0
 2  1  98708  22624   2712 1668508    0    0 21888 21904 1498  645  0 30 43 27  0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1  98708  22700   2704 1668688    0    0 24320 26240 1548  685  0 23 55 22  0
 1  1  98708  22776   2712 1668284    0    0 18048 17532 1438  586  0 30 43 27  0
 1  1  98708  22868   2712 1668336    0    0 24192 21764 1552  661  0 23 55 23  0
 2  2  98708  22772   2716 1667696    0    0 12420 88772 1433  494  0 30 41 28  0
 1  2  98708  23424   2720 1667584    0    0     4  9092 1181  246  0 23 40 37  0
 1  1  98708  23120   2724 1667584    0    0     4 86380 1262  269  0 29 42 29  0
 1  0  98708  23408   2984 1667708    0    0   364     0 1187  249  0 22 60 18  0
 2  0  98708  23416   2992 1667772    0    0     0   120 1045  246  0 25 74  1  0
 1  0  98708  23408   2992 1667720    0    0     0     0 1025  221  0 26 74  0  0
 1  0  98708  23408   2992 1667720    0    0     0     0 1030  225  0 28 72  0  0
 1  0  98708  23408   2992 1667720    0    0     0     0 1024  224  0 23 77  0  0
 1  0  98708  23408   2992 1667720    0    0     0     0 1034  226  0 28 72  0  0
 1  0  98708  23408   2992 1667720    0    0     0     0 1022  221  0 23 77  0  0
 1  0  98708  23408   2992 1667720    0    0     0     0 1032  227  0 28 72  0  0
 1  0  98708  23408   2992 1667720    0    0     0     0 1024  221  0 23 77  0  0
 1  0  98708  23408   2992 1667720    0    0     0     0 1034  228  0 28 72  0  0

kswapd goes up to 99% at nearly the end of the second page.  I ^C on the
tar but kswapd stays at 99% for the end.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara April 28, 2011, 10:40 p.m. UTC | #11
On Thu 28-04-11 15:58:21, Colin Ian King wrote:
> On Thu, 2011-04-28 at 16:33 +0200, Jan Kara wrote:
> > On Thu 28-04-11 16:25:51, Jan Kara wrote:
> > > On Thu 28-04-11 15:01:22, Colin Ian King wrote:
> > > > 
> > > > > Could you post the soft lockups you're seeing?
> > > > 
> > > > As requested, attached
> > >   Hum, what keeps puzzling me is that in all the cases of hangs I've seen
> > > so far, we are stuck waiting for IO to finish for a long time - e.g. in the
> > > traces below kjournald waits for PageWriteback bit to get cleared. Also we
> > > are stuck waiting for page locks which might be because those pages are
> > > being read in? All in all it seems that the IO is just incredibly slow.
> > > 
> > > But it's not clear to me what pushes us into that situation (especially
> > > since ext4 refuses to do any IO from ->writepage (i.e. kswapd) when the
> > > underlying blocks are not already allocated.
> >   Hmm, maybe because the system is under memory pressure (and kswapd is not
> > able to get rid of dirty pages), we page out clean pages. Thus also pages
> > of executables which need to be paged in soon anyway thus putting heavy
> > read load on the system which makes writes crawl? I'm not sure why
> > compaction should make this any worse but maybe it can.
> > 
> > James, Colin, can you capture output of 'vmstat 1' while you do the
> > copying? Thanks.
> 
> Attached.
  Thanks. So I there are a few interesting points in the vmstat output:
For first 30 seconds, we are happily copying data - relatively steady read
throughput (about 20-40 MB/s) and occasional peak from flusher thread
flushing dirty data. During this time free memory drops from about 1.4 GB
to about 22!!! MB - mm seems to like to really use the machine ;). Then
things get interesting:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0  83372   5228 1450776    0    0 39684 90132  450  918  0  4 74 22
 0  1      0  22416   5228 1509864    0    0 29452 48492  403  869  1  2 80 18
 2  0      0  20056   5384 1513996    0    0  2248  2116  434 1191  4  4 71 21
 0  1      0  19800   5932 1514600    0    0   644   104  454 1166  8  3 64 24
 1  0      0  21848   5940 1515244    0    0   292   380  468 1775 15  6 75 3
 1  0      0  20936   5940 1515876    0    0   296   296  496 1846 18  8 74 0
 1  0      0  17792   5940 1516580    0    0   356   356  484 1862 23  8 69 0
 1  0      0  17544   5940 1517364    0    0   412   412  482 1812 16  7 77 0
 4  0      0  18148   5948 1517968    0    0   288   344  436 1749 19  9 69 3
 0  2    220 137528   1616 1402468    0  220 74708  2164  849 1806  3  6 63 28
 0  3    224  36184   1628 1499648    0    4 50820 86204  532 1272  0  4 64 32
 0  2  19680  53688   1628 1484388   32 19456  6080 62972  242  287  0  2 63 34
 0  2  36928 1407432   1356 150980    0 17252  1564 17276  368  764  1  3 73 22

So when free memory reached about 20 MB, both read and write activity
almost stalled for 7 s (probably everybody waits for free memory). Then
mm manages to free 100 MB from page cache, things move on for two seconds,
then we swap out! about 36 MB and page reclaim also finally decides it
maybe has too much of page cache and reaps most of it (1.3 GB in one go).
Then things get going again, although there are still occasional stalls
such as this (about 30s later):
 1  3  36688 753192   1208 792344    0    0 35948 32768  435 6625  0  6 61 33
 0  2  36668 754996   1344 792760    0    0   252 58736 2832 16239  0  1 60 39
 0  2  36668 750132   1388 796688    0    0  2508  1524  325  959  1  3 68 28
 1  0  36668 751160   1400 797968    0    0   620   620  460 1470  6  6 50 38
 1  0  36668 750516   1400 798520    0    0   300   300  412 1764 17  8 75 1
 1  0  36668 750648   1408 799108    0    0   280   340  423 1816 18  6 73 3
 1  0  36668 748856   1408 799752    0    0   336   328  409 1788 18  8 75 0
 1  0  36668 748120   1416 800604    0    0   428   452  407 1723 14 10 75 2
 1  0  36668 750048   1416 801176    0    0   296   296  405 1779 18  7 75 1
 0  1  36668 650428   1420 897252    0    0 48100   556  658 1718 10  3 71 15
 0  2  36668 505444   1424 1037012    0    0 69888 90272  686 1491  1  4 68 27
 0  1  36668 479264   1428 1063372    0    0 12984 40896  324  674  1  1 76 23
...
I'm not sure what we were blocked on here since there is still plenty of
free memory (750 MB). These stalls repeat once in a while but things go on.
Then at about 350s, things just stop:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 3  1  75280  73564   1844 1503848    0    0 43396 81976  627 1061  0 25 42 32
 3  3  75280  73344   1852 1504256    0    0   256    20  240  149  0 26 25 49
 3  3  75280  73344   1852 1504268    0    0     0     0  265  140  0 29 13 58
 3  3  75280  73468   1852 1504232    0    0     0     0  265  132  0 22 34 44
 3  3  75280  73468   1852 1504232    0    0     0     0  339  283  0 25 26 49
 3  3  75280  73468   1852 1504232    0    0     0     0  362  327  0 25 25 50
 3  3  75280  73468   1852 1504232    0    0     0     0  317  320  0 26 25 49
 3  3  75280  73468   1852 1504232    0    0     0     0  361  343  0 26 25 50

and nothing really happens for 150 s, except more and more tasks blocking
in D state (second column).
 3  6  75272  73416   1872 1503872    0    0     0     0  445  700  0 25 25 50
 0  7  75264  67940   1884 1509008   64    0  5056    16  481  876  0 22 23 55
Then suddently things get going again:
 0  2  75104  76808   1892 1502552  352    0 14292 40456  459 14865  0  2 39 59
 0  1  75104  75704   1900 1503412    0    0   820    32  405  788  1  1 72 27
 1  0  75104  76512   1904 1505576    0    0  1068  1072  454 1586  8  7 74 11

I guess this 150 s stall is when kernel barfs the "task blocked for more
than 30 seconds" messages. And from the traces we know that everyone is
waiting for PageWriteback or page lock during this time. Also James's vmstat
report shows that IO is stalled when kswapd is spinning. Really strange.

James in the meantime identified that cgroups are somehow involved. Are you
using systemd by any chance? Maybe cgroup IO throttling screws us?

								Honza

> > > [  287.088371] INFO: task rs:main Q:Reg:749 blocked for more than 30 seconds.
> > > [  287.088374] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  287.088376] rs:main Q:Reg   D 0000000000000000     0   749      1 0x00000000
> > > [  287.088381]  ffff880072c17b68 0000000000000082 ffff880072c17fd8 ffff880072c16000
> > > [  287.088392]  0000000000013d00 ffff88003591b178 ffff880072c17fd8 0000000000013d00
> > > [  287.088396]  ffffffff81a0b020 ffff88003591adc0 ffff88001fffc3e8 ffff88001fc13d00
> > > [  287.088400] Call Trace:
> > > [  287.088404]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > [  287.088408]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > [  287.088411]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > [  287.088414]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > [  287.088418]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > [  287.088421]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > [  287.088425]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > > [  287.088428]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > > [  287.088431]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > > [  287.088434]  [<ffffffff81134a43>] ? unmap_region+0x113/0x170
> > > [  287.088437]  [<ffffffff812ded90>] ? prio_tree_insert+0x150/0x1c0
> > > [  287.088440]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > > [  287.088442]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
> > > [  287.088446]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
> > > [  287.088448]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
> > > [  287.088451]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > > [  287.088454]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > > [  287.088457]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
> > > [  287.088460]  [<ffffffff81137127>] ? sys_mmap_pgoff+0x167/0x230
> > > [  287.088463]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > > [  287.088466] INFO: task NetworkManager:764 blocked for more than 30 seconds.
> > > [  287.088468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  287.088470] NetworkManager  D 0000000000000002     0   764      1 0x00000000
> > > [  287.088473]  ffff880074ffbb68 0000000000000082 ffff880074ffbfd8 ffff880074ffa000
> > > [  287.088477]  0000000000013d00 ffff880036051a98 ffff880074ffbfd8 0000000000013d00
> > > [  287.088481]  ffff8801005badc0 ffff8800360516e0 ffff88001ffef128 ffff88001fc53d00
> > > [  287.088484] Call Trace:
> > > [  287.088488]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > [  287.088491]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > [  287.088494]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > [  287.088497]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > [  287.088500]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > [  287.088503]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > [  287.088506]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > > [  287.088509]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > > [  287.088513]  [<ffffffff81177110>] ? pollwake+0x0/0x60
> > > [  287.088516]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > > [  287.088519]  [<ffffffff81177110>] ? pollwake+0x0/0x60
> > > [  287.088522]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > > [  287.088525]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
> > > [  287.088527]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
> > > [  287.088530]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > > [  287.088533]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > > [  287.088537]  [<ffffffff81013859>] ? read_tsc+0x9/0x20
> > > [  287.088540]  [<ffffffff81092eb1>] ? ktime_get_ts+0xb1/0xf0
> > > [  287.088543]  [<ffffffff811776d2>] ? poll_select_set_timeout+0x82/0x90
> > > [  287.088546]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > > [  287.088559] INFO: task unity-panel-ser:1521 blocked for more than 30 seconds.
> > > [  287.088561] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  287.088562] unity-panel-ser D 0000000000000000     0  1521      1 0x00000000
> > > [  287.088566]  ffff880061f37b68 0000000000000082 ffff880061f37fd8 ffff880061f36000
> > > [  287.088570]  0000000000013d00 ffff880068c7c858 ffff880061f37fd8 0000000000013d00
> > > [  287.088573]  ffff88003591c4a0 ffff880068c7c4a0 ffff88001fff0c88 ffff88001fc13d00
> > > [  287.088577] Call Trace:
> > > [  287.088581]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > [  287.088583]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > [  287.088587]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > [  287.088589]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > [  287.088593]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > [  287.088596]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > [  287.088599]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > > [  287.088602]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > > [  287.088605]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > > [  287.088608]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > > [  287.088610]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
> > > [  287.088613]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
> > > [  287.088616]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > > [  287.088619]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > > [  287.088622]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
> > > [  287.088625]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > > [  287.088629] INFO: task jbd2/sda4-8:1845 blocked for more than 30 seconds.
> > > [  287.088630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  287.088632] jbd2/sda4-8     D 0000000000000000     0  1845      2 0x00000000
> > > [  287.088636]  ffff880068f6baf0 0000000000000046 ffff880068f6bfd8 ffff880068f6a000
> > > [  287.088639]  0000000000013d00 ffff880061d603b8 ffff880068f6bfd8 0000000000013d00
> > > [  287.088643]  ffff88003591c4a0 ffff880061d60000 ffff88001fff8548 ffff88001fc13d00
> > > [  287.088647] Call Trace:
> > > [  287.088650]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > [  287.088653]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > [  287.088656]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > [  287.088659]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > [  287.088662]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > [  287.088665]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > [  287.088668]  [<ffffffff8110c41d>] filemap_fdatawait_range+0xfd/0x190
> > > [  287.088672]  [<ffffffff8110c4db>] filemap_fdatawait+0x2b/0x30
> > > [  287.088675]  [<ffffffff81242a93>] journal_finish_inode_data_buffers+0x63/0x170
> > > [  287.088678]  [<ffffffff81243284>] jbd2_journal_commit_transaction+0x6e4/0x1190
> > > [  287.088682]  [<ffffffff81076185>] ? try_to_del_timer_sync+0x85/0xe0
> > > [  287.088685]  [<ffffffff81247e9b>] kjournald2+0xbb/0x220
> > > [  287.088688]  [<ffffffff81087f30>] ? autoremove_wake_function+0x0/0x40
> > > [  287.088691]  [<ffffffff81247de0>] ? kjournald2+0x0/0x220
> > > [  287.088694]  [<ffffffff810877e6>] kthread+0x96/0xa0
> > > [  287.088697]  [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
> > > [  287.088700]  [<ffffffff81087750>] ? kthread+0x0/0xa0
> > > [  287.088703]  [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
> > > [  287.088705] INFO: task dirname:5969 blocked for more than 30 seconds.
> > > [  287.088707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  287.088709] dirname         D 0000000000000002     0  5969   5214 0x00000000
> > > [  287.088712]  ffff88005bd9d8b8 0000000000000086 ffff88005bd9dfd8 ffff88005bd9c000
> > > [  287.088716]  0000000000013d00 ffff88005d65b178 ffff88005bd9dfd8 0000000000013d00
> > > [  287.088720]  ffff8801005e5b80 ffff88005d65adc0 ffff88001ffe5228 ffff88001fc53d00
> > > [  287.088723] Call Trace:
> > > [  287.088726]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > [  287.088729]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > [  287.088732]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > [  287.088735]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > [  287.088738]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > [  287.088741]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > [  287.088744]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > > [  287.088747]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > > [  287.088750]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > > [  287.088753]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > > [  287.088756]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
> > > [  287.088759]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
> > > [  287.088761]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
> > > [  287.088764]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > > [  287.088767]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > > [  287.088770]  [<ffffffff81136947>] ? mmap_region+0x1f7/0x500
> > > [  287.088773]  [<ffffffff8112db06>] ? free_pgd_range+0x356/0x4a0
> > > [  287.088776]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > > [  287.088779]  [<ffffffff812e6d5f>] ? __clear_user+0x3f/0x70
> > > [  287.088782]  [<ffffffff812e6d41>] ? __clear_user+0x21/0x70
> > > [  287.088786]  [<ffffffff812e6dc6>] clear_user+0x36/0x40
> > > [  287.088788]  [<ffffffff811b0b6d>] padzero+0x2d/0x40
> > > [  287.088791]  [<ffffffff811b2c7a>] load_elf_binary+0x95a/0xe00
> > > [  287.088794]  [<ffffffff8116aa8a>] search_binary_handler+0xda/0x300
> > > [  287.088797]  [<ffffffff811b2320>] ? load_elf_binary+0x0/0xe00
> > > [  287.088800]  [<ffffffff8116c49c>] do_execve+0x24c/0x2d0
> > > [  287.088802]  [<ffffffff8101521a>] sys_execve+0x4a/0x80
> > > [  287.088805]  [<ffffffff8100c45c>] stub_execve+0x6c/0xc0
> > > -- 
> > > Jan Kara <jack@suse.cz>
> > > SUSE Labs, CR
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
James Bottomley April 28, 2011, 10:44 p.m. UTC | #12
On Fri, 2011-04-29 at 00:40 +0200, Jan Kara wrote:
> James in the meantime identified that cgroups are somehow involved.
> Are you
> using systemd by any chance? Maybe cgroup IO throttling screws us?

Yes, this is a FC15 system with systemd as the default.

See related post: it looks like disabling the memory controller is what
makes this go away.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Colin Ian King May 3, 2011, 6:55 p.m. UTC | #13
On Fri, 2011-04-29 at 00:40 +0200, Jan Kara wrote:
> On Thu 28-04-11 15:58:21, Colin Ian King wrote:
> > On Thu, 2011-04-28 at 16:33 +0200, Jan Kara wrote:
> > > On Thu 28-04-11 16:25:51, Jan Kara wrote:
> > > > On Thu 28-04-11 15:01:22, Colin Ian King wrote:
> > > > > 
> > > > > > Could you post the soft lockups you're seeing?
> > > > > 
> > > > > As requested, attached
> > > >   Hum, what keeps puzzling me is that in all the cases of hangs I've seen
> > > > so far, we are stuck waiting for IO to finish for a long time - e.g. in the
> > > > traces below kjournald waits for PageWriteback bit to get cleared. Also we
> > > > are stuck waiting for page locks which might be because those pages are
> > > > being read in? All in all it seems that the IO is just incredibly slow.
> > > > 
> > > > But it's not clear to me what pushes us into that situation (especially
> > > > since ext4 refuses to do any IO from ->writepage (i.e. kswapd) when the
> > > > underlying blocks are not already allocated.
> > >   Hmm, maybe because the system is under memory pressure (and kswapd is not
> > > able to get rid of dirty pages), we page out clean pages. Thus also pages
> > > of executables which need to be paged in soon anyway thus putting heavy
> > > read load on the system which makes writes crawl? I'm not sure why
> > > compaction should make this any worse but maybe it can.
> > > 
> > > James, Colin, can you capture output of 'vmstat 1' while you do the
> > > copying? Thanks.
> > 
> > Attached.
>   Thanks. So I there are a few interesting points in the vmstat output:
> For first 30 seconds, we are happily copying data - relatively steady read
> throughput (about 20-40 MB/s) and occasional peak from flusher thread
> flushing dirty data. During this time free memory drops from about 1.4 GB
> to about 22!!! MB - mm seems to like to really use the machine ;). Then
> things get interesting:
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  1      0  83372   5228 1450776    0    0 39684 90132  450  918  0  4 74 22
>  0  1      0  22416   5228 1509864    0    0 29452 48492  403  869  1  2 80 18
>  2  0      0  20056   5384 1513996    0    0  2248  2116  434 1191  4  4 71 21
>  0  1      0  19800   5932 1514600    0    0   644   104  454 1166  8  3 64 24
>  1  0      0  21848   5940 1515244    0    0   292   380  468 1775 15  6 75 3
>  1  0      0  20936   5940 1515876    0    0   296   296  496 1846 18  8 74 0
>  1  0      0  17792   5940 1516580    0    0   356   356  484 1862 23  8 69 0
>  1  0      0  17544   5940 1517364    0    0   412   412  482 1812 16  7 77 0
>  4  0      0  18148   5948 1517968    0    0   288   344  436 1749 19  9 69 3
>  0  2    220 137528   1616 1402468    0  220 74708  2164  849 1806  3  6 63 28
>  0  3    224  36184   1628 1499648    0    4 50820 86204  532 1272  0  4 64 32
>  0  2  19680  53688   1628 1484388   32 19456  6080 62972  242  287  0  2 63 34
>  0  2  36928 1407432   1356 150980    0 17252  1564 17276  368  764  1  3 73 22
> 
> So when free memory reached about 20 MB, both read and write activity
> almost stalled for 7 s (probably everybody waits for free memory). Then
> mm manages to free 100 MB from page cache, things move on for two seconds,
> then we swap out! about 36 MB and page reclaim also finally decides it
> maybe has too much of page cache and reaps most of it (1.3 GB in one go).

> Then things get going again, although there are still occasional stalls
> such as this (about 30s later):
>  1  3  36688 753192   1208 792344    0    0 35948 32768  435 6625  0  6 61 33
>  0  2  36668 754996   1344 792760    0    0   252 58736 2832 16239  0  1 60 39
>  0  2  36668 750132   1388 796688    0    0  2508  1524  325  959  1  3 68 28
>  1  0  36668 751160   1400 797968    0    0   620   620  460 1470  6  6 50 38
>  1  0  36668 750516   1400 798520    0    0   300   300  412 1764 17  8 75 1
>  1  0  36668 750648   1408 799108    0    0   280   340  423 1816 18  6 73 3
>  1  0  36668 748856   1408 799752    0    0   336   328  409 1788 18  8 75 0
>  1  0  36668 748120   1416 800604    0    0   428   452  407 1723 14 10 75 2
>  1  0  36668 750048   1416 801176    0    0   296   296  405 1779 18  7 75 1
>  0  1  36668 650428   1420 897252    0    0 48100   556  658 1718 10  3 71 15
>  0  2  36668 505444   1424 1037012    0    0 69888 90272  686 1491  1  4 68 27
>  0  1  36668 479264   1428 1063372    0    0 12984 40896  324  674  1  1 76 23
> ...
> I'm not sure what we were blocked on here since there is still plenty of
> free memory (750 MB). These stalls repeat once in a while but things go on.
> Then at about 350s, things just stop:
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  3  1  75280  73564   1844 1503848    0    0 43396 81976  627 1061  0 25 42 32
>  3  3  75280  73344   1852 1504256    0    0   256    20  240  149  0 26 25 49
>  3  3  75280  73344   1852 1504268    0    0     0     0  265  140  0 29 13 58
>  3  3  75280  73468   1852 1504232    0    0     0     0  265  132  0 22 34 44
>  3  3  75280  73468   1852 1504232    0    0     0     0  339  283  0 25 26 49
>  3  3  75280  73468   1852 1504232    0    0     0     0  362  327  0 25 25 50
>  3  3  75280  73468   1852 1504232    0    0     0     0  317  320  0 26 25 49
>  3  3  75280  73468   1852 1504232    0    0     0     0  361  343  0 26 25 50
> 
> and nothing really happens for 150 s, except more and more tasks blocking
> in D state (second column).
>  3  6  75272  73416   1872 1503872    0    0     0     0  445  700  0 25 25 50
>  0  7  75264  67940   1884 1509008   64    0  5056    16  481  876  0 22 23 55
> Then suddently things get going again:
>  0  2  75104  76808   1892 1502552  352    0 14292 40456  459 14865  0  2 39 59
>  0  1  75104  75704   1900 1503412    0    0   820    32  405  788  1  1 72 27
>  1  0  75104  76512   1904 1505576    0    0  1068  1072  454 1586  8  7 74 11
> 
> I guess this 150 s stall is when kernel barfs the "task blocked for more
> than 30 seconds" messages. And from the traces we know that everyone is
> waiting for PageWriteback or page lock during this time. Also James's vmstat
> report shows that IO is stalled when kswapd is spinning. Really strange.

Just to add, this machine has relatively low amount of memory (1GB).
I've re-run the tests today with cgroups disabled and it ran for 47
'copy' cycles, 27 'copy' cycles and then 35 'copy' cycles.

One extra data point, with maxcpus=1 I get a lockup after 2 'copy'
cycles every time, so it's more predictable than the default 4 processor
configuration.

> 
> James in the meantime identified that cgroups are somehow involved. Are you
> using systemd by any chance?

No, I'm using upstart.

>  Maybe cgroup IO throttling screws us?
> 
> 								Honza
> 
> > > > [  287.088371] INFO: task rs:main Q:Reg:749 blocked for more than 30 seconds.
> > > > [  287.088374] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [  287.088376] rs:main Q:Reg   D 0000000000000000     0   749      1 0x00000000
> > > > [  287.088381]  ffff880072c17b68 0000000000000082 ffff880072c17fd8 ffff880072c16000
> > > > [  287.088392]  0000000000013d00 ffff88003591b178 ffff880072c17fd8 0000000000013d00
> > > > [  287.088396]  ffffffff81a0b020 ffff88003591adc0 ffff88001fffc3e8 ffff88001fc13d00
> > > > [  287.088400] Call Trace:
> > > > [  287.088404]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > > [  287.088408]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > > [  287.088411]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > > [  287.088414]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > > [  287.088418]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > > [  287.088421]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > > [  287.088425]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > > > [  287.088428]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > > > [  287.088431]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > > > [  287.088434]  [<ffffffff81134a43>] ? unmap_region+0x113/0x170
> > > > [  287.088437]  [<ffffffff812ded90>] ? prio_tree_insert+0x150/0x1c0
> > > > [  287.088440]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > > > [  287.088442]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
> > > > [  287.088446]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
> > > > [  287.088448]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
> > > > [  287.088451]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > > > [  287.088454]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > > > [  287.088457]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
> > > > [  287.088460]  [<ffffffff81137127>] ? sys_mmap_pgoff+0x167/0x230
> > > > [  287.088463]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > > > [  287.088466] INFO: task NetworkManager:764 blocked for more than 30 seconds.
> > > > [  287.088468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [  287.088470] NetworkManager  D 0000000000000002     0   764      1 0x00000000
> > > > [  287.088473]  ffff880074ffbb68 0000000000000082 ffff880074ffbfd8 ffff880074ffa000
> > > > [  287.088477]  0000000000013d00 ffff880036051a98 ffff880074ffbfd8 0000000000013d00
> > > > [  287.088481]  ffff8801005badc0 ffff8800360516e0 ffff88001ffef128 ffff88001fc53d00
> > > > [  287.088484] Call Trace:
> > > > [  287.088488]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > > [  287.088491]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > > [  287.088494]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > > [  287.088497]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > > [  287.088500]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > > [  287.088503]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > > [  287.088506]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > > > [  287.088509]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > > > [  287.088513]  [<ffffffff81177110>] ? pollwake+0x0/0x60
> > > > [  287.088516]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > > > [  287.088519]  [<ffffffff81177110>] ? pollwake+0x0/0x60
> > > > [  287.088522]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > > > [  287.088525]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
> > > > [  287.088527]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
> > > > [  287.088530]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > > > [  287.088533]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > > > [  287.088537]  [<ffffffff81013859>] ? read_tsc+0x9/0x20
> > > > [  287.088540]  [<ffffffff81092eb1>] ? ktime_get_ts+0xb1/0xf0
> > > > [  287.088543]  [<ffffffff811776d2>] ? poll_select_set_timeout+0x82/0x90
> > > > [  287.088546]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > > > [  287.088559] INFO: task unity-panel-ser:1521 blocked for more than 30 seconds.
> > > > [  287.088561] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [  287.088562] unity-panel-ser D 0000000000000000     0  1521      1 0x00000000
> > > > [  287.088566]  ffff880061f37b68 0000000000000082 ffff880061f37fd8 ffff880061f36000
> > > > [  287.088570]  0000000000013d00 ffff880068c7c858 ffff880061f37fd8 0000000000013d00
> > > > [  287.088573]  ffff88003591c4a0 ffff880068c7c4a0 ffff88001fff0c88 ffff88001fc13d00
> > > > [  287.088577] Call Trace:
> > > > [  287.088581]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > > [  287.088583]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > > [  287.088587]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > > [  287.088589]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > > [  287.088593]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > > [  287.088596]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > > [  287.088599]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > > > [  287.088602]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > > > [  287.088605]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > > > [  287.088608]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > > > [  287.088610]  [<ffffffff8111561d>] ? __free_pages+0x2d/0x40
> > > > [  287.088613]  [<ffffffff8112de4f>] ? __pte_alloc+0xdf/0x100
> > > > [  287.088616]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > > > [  287.088619]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > > > [  287.088622]  [<ffffffff81136f85>] ? do_mmap_pgoff+0x335/0x370
> > > > [  287.088625]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > > > [  287.088629] INFO: task jbd2/sda4-8:1845 blocked for more than 30 seconds.
> > > > [  287.088630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [  287.088632] jbd2/sda4-8     D 0000000000000000     0  1845      2 0x00000000
> > > > [  287.088636]  ffff880068f6baf0 0000000000000046 ffff880068f6bfd8 ffff880068f6a000
> > > > [  287.088639]  0000000000013d00 ffff880061d603b8 ffff880068f6bfd8 0000000000013d00
> > > > [  287.088643]  ffff88003591c4a0 ffff880061d60000 ffff88001fff8548 ffff88001fc13d00
> > > > [  287.088647] Call Trace:
> > > > [  287.088650]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > > [  287.088653]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > > [  287.088656]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > > [  287.088659]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > > [  287.088662]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > > [  287.088665]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > > [  287.088668]  [<ffffffff8110c41d>] filemap_fdatawait_range+0xfd/0x190
> > > > [  287.088672]  [<ffffffff8110c4db>] filemap_fdatawait+0x2b/0x30
> > > > [  287.088675]  [<ffffffff81242a93>] journal_finish_inode_data_buffers+0x63/0x170
> > > > [  287.088678]  [<ffffffff81243284>] jbd2_journal_commit_transaction+0x6e4/0x1190
> > > > [  287.088682]  [<ffffffff81076185>] ? try_to_del_timer_sync+0x85/0xe0
> > > > [  287.088685]  [<ffffffff81247e9b>] kjournald2+0xbb/0x220
> > > > [  287.088688]  [<ffffffff81087f30>] ? autoremove_wake_function+0x0/0x40
> > > > [  287.088691]  [<ffffffff81247de0>] ? kjournald2+0x0/0x220
> > > > [  287.088694]  [<ffffffff810877e6>] kthread+0x96/0xa0
> > > > [  287.088697]  [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
> > > > [  287.088700]  [<ffffffff81087750>] ? kthread+0x0/0xa0
> > > > [  287.088703]  [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
> > > > [  287.088705] INFO: task dirname:5969 blocked for more than 30 seconds.
> > > > [  287.088707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [  287.088709] dirname         D 0000000000000002     0  5969   5214 0x00000000
> > > > [  287.088712]  ffff88005bd9d8b8 0000000000000086 ffff88005bd9dfd8 ffff88005bd9c000
> > > > [  287.088716]  0000000000013d00 ffff88005d65b178 ffff88005bd9dfd8 0000000000013d00
> > > > [  287.088720]  ffff8801005e5b80 ffff88005d65adc0 ffff88001ffe5228 ffff88001fc53d00
> > > > [  287.088723] Call Trace:
> > > > [  287.088726]  [<ffffffff8110c070>] ? sync_page+0x0/0x50
> > > > [  287.088729]  [<ffffffff815c0990>] io_schedule+0x70/0xc0
> > > > [  287.088732]  [<ffffffff8110c0b0>] sync_page+0x40/0x50
> > > > [  287.088735]  [<ffffffff815c130f>] __wait_on_bit+0x5f/0x90
> > > > [  287.088738]  [<ffffffff8110c278>] wait_on_page_bit+0x78/0x80
> > > > [  287.088741]  [<ffffffff81087f70>] ? wake_bit_function+0x0/0x50
> > > > [  287.088744]  [<ffffffff8110dffd>] __lock_page_or_retry+0x3d/0x70
> > > > [  287.088747]  [<ffffffff8110e3c7>] filemap_fault+0x397/0x4a0
> > > > [  287.088750]  [<ffffffff8112d144>] __do_fault+0x54/0x520
> > > > [  287.088753]  [<ffffffff811309da>] handle_pte_fault+0xfa/0x210
> > > > [  287.088756]  [<ffffffff810442a7>] ? pte_alloc_one+0x37/0x50
> > > > [  287.088759]  [<ffffffff815c2cce>] ? _raw_spin_lock+0xe/0x20
> > > > [  287.088761]  [<ffffffff8112de25>] ? __pte_alloc+0xb5/0x100
> > > > [  287.088764]  [<ffffffff81131d5d>] handle_mm_fault+0x16d/0x250
> > > > [  287.088767]  [<ffffffff815c6a47>] do_page_fault+0x1a7/0x540
> > > > [  287.088770]  [<ffffffff81136947>] ? mmap_region+0x1f7/0x500
> > > > [  287.088773]  [<ffffffff8112db06>] ? free_pgd_range+0x356/0x4a0
> > > > [  287.088776]  [<ffffffff815c34d5>] page_fault+0x25/0x30
> > > > [  287.088779]  [<ffffffff812e6d5f>] ? __clear_user+0x3f/0x70
> > > > [  287.088782]  [<ffffffff812e6d41>] ? __clear_user+0x21/0x70
> > > > [  287.088786]  [<ffffffff812e6dc6>] clear_user+0x36/0x40
> > > > [  287.088788]  [<ffffffff811b0b6d>] padzero+0x2d/0x40
> > > > [  287.088791]  [<ffffffff811b2c7a>] load_elf_binary+0x95a/0xe00
> > > > [  287.088794]  [<ffffffff8116aa8a>] search_binary_handler+0xda/0x300
> > > > [  287.088797]  [<ffffffff811b2320>] ? load_elf_binary+0x0/0xe00
> > > > [  287.088800]  [<ffffffff8116c49c>] do_execve+0x24c/0x2d0
> > > > [  287.088802]  [<ffffffff8101521a>] sys_execve+0x4a/0x80
> > > > [  287.088805]  [<ffffffff8100c45c>] stub_execve+0x6c/0xc0
> > > > -- 
> > > > Jan Kara <jack@suse.cz>
> > > > SUSE Labs, CR
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6771ea7..cb08b41 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1934,12 +1934,14 @@  restart:
 	if (inactive_anon_is_low(zone, sc))
 		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
 
+	throttle_vm_writeout(sc->gfp_mask);
+
 	/* reclaim/compaction might need reclaim to continue */
 	if (should_continue_reclaim(zone, nr_reclaimed,
-					sc->nr_scanned - nr_scanned, sc))
+					sc->nr_scanned - nr_scanned, sc)) {
+		cond_resched();
 		goto restart;
-
-	throttle_vm_writeout(sc->gfp_mask);
+	}
 }
 
 /*