Message ID:   20111225010238.GA6013@htj.dyndns.org
State:        Not Applicable
Delegated to: David Miller
On 2011-12-25 02:02, Tejun Heo wrote:
> While fixing io_context creation / task exit race condition,
> 6e736be7f2 "block: make ioc get/put interface more conventional and
> fix race on alloction" also prevented an exiting (%PF_EXITING) task
> from creating its own io_context.  This is incorrect as exit path may
> issue IOs, e.g. from exit_files(), and if those IOs are the first ones
> issued by the task, io_context needs to be created to process the IOs.
>
> Combined with the existing problem of io_context / io_cq creation
> failure having the possibility of stalling IO, this problem results in
> deterministic full IO lockup with certain workloads.
>
> Fix it by allowing io_context creation regardless of %PF_EXITING for
> %current.

Thanks, applied.
On Sun, 25 Dec 2011 14:29:34 +0100 Jens Axboe <axboe@kernel.dk> wrote:
> On 2011-12-25 02:02, Tejun Heo wrote:
> > While fixing io_context creation / task exit race condition,
> > 6e736be7f2 "block: make ioc get/put interface more conventional and
> > fix race on alloction" also prevented an exiting (%PF_EXITING) task
> > from creating its own io_context.  This is incorrect as exit path may
> > issue IOs, e.g. from exit_files(), and if those IOs are the first ones
> > issued by the task, io_context needs to be created to process the IOs.
> >
> > Combined with the existing problem of io_context / io_cq creation
> > failure having the possibility of stalling IO, this problem results in
> > deterministic full IO lockup with certain workloads.
> >
> > Fix it by allowing io_context creation regardless of %PF_EXITING for
> > %current.

The patch works for me.

> Thanks, applied.

So we get another great big bisection hole in mainline.  I feel duty
bound to rewhine about this :(

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Sat, 24 Dec 2011, Tejun Heo wrote:
> While fixing io_context creation / task exit race condition,
> 6e736be7f2 "block: make ioc get/put interface more conventional and
> fix race on alloction" also prevented an exiting (%PF_EXITING) task
> from creating its own io_context.  This is incorrect as exit path may
> issue IOs, e.g. from exit_files(), and if those IOs are the first ones
> issued by the task, io_context needs to be created to process the IOs.
>
> Combined with the existing problem of io_context / io_cq creation
> failure having the possibility of stalling IO, this problem results in
> deterministic full IO lockup with certain workloads.
>
> Fix it by allowing io_context creation regardless of %PF_EXITING for
> %current.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> Reported-by: Hugh Dickins <hughd@google.com>

Thanks, I think I've now built enough kernels on -next plus your patch
to say that it does indeed solve that problem.

However, there are a couple of other unhealthy symptoms I've noticed
under load in -next's block/cfq layer, both with and without your patch.

One is kernel BUG at block/cfq-iosched.c:2585!
	BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));

cfq_dispatch_request+0x1a
cfq_dispatch_requests+0x5c
blk_peek_request+0x195
scsi_request_fn+0x6a
__blk_run_queue+0x16
scsi_run_queue+0x18a
scsi_next_command+0x36
scsi_io_completion+0x426
scsi_finish_command+0xaf
scsi_softirq_done+0xdd
blk_done_softirq+0x6c
__do_softirq+0x80
call_softirq+0x1c
do_softirq+0x33
irq_exit+0x3f
do_IRQ+0x97
ret_from_intr

I've had that one four times now on different machines; but quicker
to reproduce are these warnings from CONFIG_DEBUG_LIST=y:

------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
Hardware name: 4174AY9
list_del corruption. prev->next should be ffff880005aa1380, but was 6b6b6b6b6b6b6b6b
Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device
Pid: 29241, comm: cc1 Tainted: G        W    3.2.0-rc6-next-20111222 #18
Call Trace:
 <IRQ>  [<ffffffff810544b4>] warn_slowpath_common+0x80/0x98
 [<ffffffff81054560>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff811fc1a1>] __list_del_entry+0x8d/0x98
 [<ffffffff811df8ab>] cfq_remove_request+0x3b/0xdf
 [<ffffffff811df989>] cfq_dispatch_insert+0x3a/0x87
 [<ffffffff811dfb3b>] cfq_dispatch_request+0x65/0x92
 [<ffffffff811dfbc4>] cfq_dispatch_requests+0x5c/0x133
 [<ffffffff812e103e>] ? scsi_request_fn+0x3b6/0x3d3
 [<ffffffff811d3069>] blk_peek_request+0x195/0x1a6
 [<ffffffff812e103e>] ? scsi_request_fn+0x3b6/0x3d3
 [<ffffffff812e0cf5>] scsi_request_fn+0x6d/0x3d3
 [<ffffffff811d0730>] __blk_run_queue+0x19/0x1b
 [<ffffffff811d0bfd>] blk_run_queue+0x21/0x35
 [<ffffffff812e08c4>] scsi_run_queue+0x11f/0x1b9
 [<ffffffff812e205c>] scsi_next_command+0x36/0x46
 [<ffffffff812e24dc>] scsi_io_completion+0x426/0x4a9
 [<ffffffff812dc0b2>] scsi_finish_command+0xaf/0xb8
 [<ffffffff812e200c>] scsi_softirq_done+0xdd/0xe5
 [<ffffffff811d79c6>] blk_done_softirq+0x76/0x8a
 [<ffffffff8105a28d>] __do_softirq+0x98/0x136
 [<ffffffff814e649c>] call_softirq+0x1c/0x30
 [<ffffffff8102f187>] do_softirq+0x38/0x81
 [<ffffffff8105a596>] irq_exit+0x4e/0xb6
 [<ffffffff8102ee9e>] do_IRQ+0x97/0xae
 [<ffffffff814e49f0>] common_interrupt+0x70/0x70
 <EOI>  [<ffffffff814e4a8e>] ? retint_swapgs+0xe/0x13
---[ end trace 61fdaa1b260613d1 ]---

Hugh
Hello, Hugh.

On Wed, Dec 28, 2011 at 12:33:01AM -0800, Hugh Dickins wrote:
> Thanks, I think I've now built enough kernels on -next plus your patch
> to say that it does indeed solve that problem.

Awesome, thanks for verifying the fix.

> However, there are a couple of other unhealthy symptoms I've noticed
> under load in -next's block/cfq layer, both with and without your patch.
>
> One is kernel BUG at block/cfq-iosched.c:2585!
> 	BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
>
> cfq_dispatch_request+0x1a
> cfq_dispatch_requests+0x5c
> blk_peek_request+0x195
> scsi_request_fn+0x6a
> __blk_run_queue+0x16
> scsi_run_queue+0x18a
> scsi_next_command+0x36
> scsi_io_completion+0x426
> scsi_finish_command+0xaf
> scsi_softirq_done+0xdd
> blk_done_softirq+0x6c
> __do_softirq+0x80
> call_softirq+0x1c
> do_softirq+0x33
> irq_exit+0x3f
> do_IRQ+0x97
> ret_from_intr
>
> I've had that one four times now on different machines; but quicker
> to reproduce are these warnings from CONFIG_DEBUG_LIST=y:
>
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
> Hardware name: 4174AY9
> list_del corruption. prev->next should be ffff880005aa1380, but was 6b6b6b6b6b6b6b6b
> Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device
> Pid: 29241, comm: cc1 Tainted: G        W    3.2.0-rc6-next-20111222 #18
> Call Trace:
>  <IRQ>  [<ffffffff810544b4>] warn_slowpath_common+0x80/0x98
>  [<ffffffff81054560>] warn_slowpath_fmt+0x41/0x43
>  [<ffffffff811fc1a1>] __list_del_entry+0x8d/0x98
>  [<ffffffff811df8ab>] cfq_remove_request+0x3b/0xdf
>  [<ffffffff811df989>] cfq_dispatch_insert+0x3a/0x87
>  [<ffffffff811dfb3b>] cfq_dispatch_request+0x65/0x92
>  [<ffffffff811dfbc4>] cfq_dispatch_requests+0x5c/0x133
>  [<ffffffff812e103e>] ? scsi_request_fn+0x3b6/0x3d3
>  [<ffffffff811d3069>] blk_peek_request+0x195/0x1a6
>  [<ffffffff812e103e>] ? scsi_request_fn+0x3b6/0x3d3
>  [<ffffffff812e0cf5>] scsi_request_fn+0x6d/0x3d3
>  [<ffffffff811d0730>] __blk_run_queue+0x19/0x1b
>  [<ffffffff811d0bfd>] blk_run_queue+0x21/0x35
>  [<ffffffff812e08c4>] scsi_run_queue+0x11f/0x1b9
>  [<ffffffff812e205c>] scsi_next_command+0x36/0x46
>  [<ffffffff812e24dc>] scsi_io_completion+0x426/0x4a9
>  [<ffffffff812dc0b2>] scsi_finish_command+0xaf/0xb8
>  [<ffffffff812e200c>] scsi_softirq_done+0xdd/0xe5
>  [<ffffffff811d79c6>] blk_done_softirq+0x76/0x8a
>  [<ffffffff8105a28d>] __do_softirq+0x98/0x136
>  [<ffffffff814e649c>] call_softirq+0x1c/0x30
>  [<ffffffff8102f187>] do_softirq+0x38/0x81
>  [<ffffffff8105a596>] irq_exit+0x4e/0xb6
>  [<ffffffff8102ee9e>] do_IRQ+0x97/0xae
>  [<ffffffff814e49f0>] common_interrupt+0x70/0x70
>  <EOI>  [<ffffffff814e4a8e>] ? retint_swapgs+0xe/0x13
> ---[ end trace 61fdaa1b260613d1 ]---

Hmm... that looks like cfqq being freed before unlinked.  I'll try to
reproduce it.  Is there any particular workload you were running?

Thanks.
On Wed, 28 Dec 2011, Tejun Heo wrote:
> On Wed, Dec 28, 2011 at 12:33:01AM -0800, Hugh Dickins wrote:
> >
> > However, there are a couple of other unhealthy symptoms I've noticed
> > under load in -next's block/cfq layer, both with and without your patch.
> >
> > One is kernel BUG at block/cfq-iosched.c:2585!
> > 	BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
> >
> > cfq_dispatch_request+0x1a
> > cfq_dispatch_requests+0x5c
> > blk_peek_request+0x195
> > scsi_request_fn+0x6a
> > __blk_run_queue+0x16
> > scsi_run_queue+0x18a
> > scsi_next_command+0x36
> > scsi_io_completion+0x426
> > scsi_finish_command+0xaf
> > scsi_softirq_done+0xdd
> > blk_done_softirq+0x6c
> > __do_softirq+0x80
> > call_softirq+0x1c
> > do_softirq+0x33
> > irq_exit+0x3f
> > do_IRQ+0x97
> > ret_from_intr
> >
> > I've had that one four times now on different machines; but quicker
> > to reproduce are these warnings from CONFIG_DEBUG_LIST=y:
> >
> > ------------[ cut here ]------------
> > WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
> > Hardware name: 4174AY9
> > list_del corruption. prev->next should be ffff880005aa1380, but was 6b6b6b6b6b6b6b6b
> > Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device
> > Pid: 29241, comm: cc1 Tainted: G        W    3.2.0-rc6-next-20111222 #18
> > Call Trace:
> >  <IRQ>  [<ffffffff810544b4>] warn_slowpath_common+0x80/0x98
> >  [<ffffffff81054560>] warn_slowpath_fmt+0x41/0x43
> >  [<ffffffff811fc1a1>] __list_del_entry+0x8d/0x98
> >  [<ffffffff811df8ab>] cfq_remove_request+0x3b/0xdf
> >  [<ffffffff811df989>] cfq_dispatch_insert+0x3a/0x87
> >  [<ffffffff811dfb3b>] cfq_dispatch_request+0x65/0x92
> >  [<ffffffff811dfbc4>] cfq_dispatch_requests+0x5c/0x133
> >  [<ffffffff812e103e>] ? scsi_request_fn+0x3b6/0x3d3
> >  [<ffffffff811d3069>] blk_peek_request+0x195/0x1a6
> >  [<ffffffff812e103e>] ? scsi_request_fn+0x3b6/0x3d3
> >  [<ffffffff812e0cf5>] scsi_request_fn+0x6d/0x3d3
> >  [<ffffffff811d0730>] __blk_run_queue+0x19/0x1b
> >  [<ffffffff811d0bfd>] blk_run_queue+0x21/0x35
> >  [<ffffffff812e08c4>] scsi_run_queue+0x11f/0x1b9
> >  [<ffffffff812e205c>] scsi_next_command+0x36/0x46
> >  [<ffffffff812e24dc>] scsi_io_completion+0x426/0x4a9
> >  [<ffffffff812dc0b2>] scsi_finish_command+0xaf/0xb8
> >  [<ffffffff812e200c>] scsi_softirq_done+0xdd/0xe5
> >  [<ffffffff811d79c6>] blk_done_softirq+0x76/0x8a
> >  [<ffffffff8105a28d>] __do_softirq+0x98/0x136
> >  [<ffffffff814e649c>] call_softirq+0x1c/0x30
> >  [<ffffffff8102f187>] do_softirq+0x38/0x81
> >  [<ffffffff8105a596>] irq_exit+0x4e/0xb6
> >  [<ffffffff8102ee9e>] do_IRQ+0x97/0xae
> >  [<ffffffff814e49f0>] common_interrupt+0x70/0x70
> >  <EOI>  [<ffffffff814e4a8e>] ? retint_swapgs+0xe/0x13
> > ---[ end trace 61fdaa1b260613d1 ]---
>
> Hmm... that looks like cfqq being freed before unlinked.  I'll try to
> reproduce it.  Is there any particular workload you were running?

"It's the tmpfs swapping test that I've been running, with variations,
for years.  System booted with mem=700M and 1.5G swap, two repetitious
make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
the balance of built to unbuilt source grows smaller with later kernels),
one directly in a tmpfs, the other in a 1k-block ext2 (that I drive with
ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file."

I doubt much of that (quoted from an older mail to someone else about
one of the many other bugs it's found) is relevant: maybe just plenty
of file I/O and swapping.

Hugh
Hello,

On Wed, Dec 28, 2011 at 9:50 AM, Hugh Dickins <hughd@google.com> wrote:
> "It's the tmpfs swapping test that I've been running, with variations,
> for years.  System booted with mem=700M and 1.5G swap, two repetitious
> make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
> the balance of built to unbuilt source grows smaller with later kernels),
> one directly in a tmpfs, the other in a 1k-block ext2 (that I drive with
> ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file."
>
> I doubt much of that (quoted from an older mail to someone else about
> one of the many other bugs it's found) is relevant: maybe just plenty
> of file I/O and swapping.

Plain -j4 build isn't triggering anything.  I'll try to replicate the
condition.

Thanks.
On Wed, Dec 28, 2011 at 09:55:12AM -0800, Tejun Heo wrote:
> Hello,
>
> On Wed, Dec 28, 2011 at 9:50 AM, Hugh Dickins <hughd@google.com> wrote:
> > "It's the tmpfs swapping test that I've been running, with variations,
> > for years.  System booted with mem=700M and 1.5G swap, two repetitious
> > make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
> > the balance of built to unbuilt source grows smaller with later kernels),
> > one directly in a tmpfs, the other in a 1k-block ext2 (that I drive with
> > ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file."
> >
> > I doubt much of that (quoted from an older mail to someone else about
> > one of the many other bugs it's found) is relevant: maybe just plenty
> > of file I/O and swapping.
>
> Plain -j4 build isn't triggering anything.  I'll try to replicate the
> condition.

It's not too reliable but I can reproduce it with -j 22 allmodconfig
build inside qemu w/ 512M of memory.  I'll try to find out what's
going on.

Thanks.
Happy new year, guys.

On Wed, Dec 28, 2011 at 01:19:18PM -0800, Tejun Heo wrote:
> > On Wed, Dec 28, 2011 at 9:50 AM, Hugh Dickins <hughd@google.com> wrote:
> > > "It's the tmpfs swapping test that I've been running, with variations,
> > > for years.  System booted with mem=700M and 1.5G swap, two repetitious
> > > make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
> > > the balance of built to unbuilt source grows smaller with later kernels),
> > > one directly in a tmpfs, the other in a 1k-block ext2 (that I drive with
> > > ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file."
> > >
> > > I doubt much of that (quoted from an older mail to someone else about
> > > one of the many other bugs it's found) is relevant: maybe just plenty
> > > of file I/O and swapping.
> >
> > Plain -j4 build isn't triggering anything.  I'll try to replicate the
> > condition.
>
> It's not too reliable but I can reproduce it with -j 22 allmodconfig
> build inside qemu w/ 512M of memory.  I'll try to find out what's
> going on.

I misread the code, the problem is empty cfqq on the cfq prio tree.  I
don't think this is caused by recent io_context changes.  It looks
like somebody is forgetting to remove cfqq from the dispatch prio tree
after emptying a cfqq by removing a request from it.  Jens, any ideas?

Thanks.
Hello, again.

Adding Shaohua Li as he fixed a similar issue in 4a0b75c7d0 "block,
cfq: fix empty queue crash caused by request merge".  The original
thread can be read from

  http://thread.gmane.org/gmane.linux.kernel.next/20064/focus=20159

On Tue, Jan 03, 2012 at 09:35:00AM -0800, Tejun Heo wrote:
> Happy new year, guys.
>
> On Wed, Dec 28, 2011 at 01:19:18PM -0800, Tejun Heo wrote:
> > > On Wed, Dec 28, 2011 at 9:50 AM, Hugh Dickins <hughd@google.com> wrote:
> > > > "It's the tmpfs swapping test that I've been running, with variations,
> > > > for years.  System booted with mem=700M and 1.5G swap, two repetitious
> > > > make -j20 kernel builds (of a 2.6.24 kernel: I stuck with that because
> > > > the balance of built to unbuilt source grows smaller with later kernels),
> > > > one directly in a tmpfs, the other in a 1k-block ext2 (that I drive with
> > > > ext4's CONFIG_EXT4_USE_FOR_EXT23) on /dev/loop0 on a 450MB tmpfs file."
> > > >
> > > > I doubt much of that (quoted from an older mail to someone else about
> > > > one of the many other bugs it's found) is relevant: maybe just plenty
> > > > of file I/O and swapping.
> > >
> > > Plain -j4 build isn't triggering anything.  I'll try to replicate the
> > > condition.
> >
> > It's not too reliable but I can reproduce it with -j 22 allmodconfig
> > build inside qemu w/ 512M of memory.  I'll try to find out what's
> > going on.
>
> I misread the code, the problem is empty cfqq on the cfq prio tree.  I
> don't think this is caused by recent io_context changes.  It looks
> like somebody is forgetting to remove cfqq from the dispatch prio tree
> after emptying a cfqq by removing a request from it.  Jens, any ideas?

That should have been service tree.  I couldn't find more missing
removals other than the one Shaohua's patch already fixed.  Close
cooperator selection in cfq_select_queue() seems suspicious tho.  I
can't see what prevents it from returning an empty cooperator cfqq.
I'm trying to verify whether that's the case.  Will update when I know
more.

Thanks.
On Tue, Jan 03, 2012 at 09:59:22AM -0800, Tejun Heo wrote:
> That should have been service tree.  I couldn't find more missing
> removals other than the one Shaohua's patch already fixed.  Close
> cooperator selection in cfq_select_queue() seems suspicious tho.  I
> can't see what prevents it from returning an empty cooperator cfqq.
> I'm trying to verify whether that's the case.  Will update when I know
> more.

While testing, found another bug.

  Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
  Last user: [<ffffffff813a82ee>](cfq_put_queue+0x7e/0xd0)
  070: e8 32 ab 1d 00 88 ff ff e8 32 ab 1d 00 88 ff ff  .2.......2......
  Prev obj: start=ffff88001dab3178, len=232
  Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
  Last user: [<ffffffff813a82ee>](cfq_put_queue+0x7e/0xd0)
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  Next obj: start=ffff88001dab3378, len=232
  Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
  Last user: [<ffffffff813a8e53>](cfq_get_queue+0x153/0x670)
  000: 02 00 00 00 21 01 00 00 e0 c9 b1 1d 00 88 ff ff  ....!...........
  010: 89 96 ae 18 00 88 ff ff 00 00 00 00 00 00 00 00  ................

The field at 0x70 which is being updated after being freed is
cfqq->fifo.  Interestingly, it didn't lead to any visible failure.

Thanks.
On 2012-01-03 21:09, Tejun Heo wrote:
> On Tue, Jan 03, 2012 at 09:59:22AM -0800, Tejun Heo wrote:
> > That should have been service tree.  I couldn't find more missing
> > removals other than the one Shaohua's patch already fixed.  Close
> > cooperator selection in cfq_select_queue() seems suspicious tho.  I
> > can't see what prevents it from returning an empty cooperator cfqq.
> > I'm trying to verify whether that's the case.  Will update when I know
> > more.
>
> While testing, found another bug.
>
>   Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
>   Last user: [<ffffffff813a82ee>](cfq_put_queue+0x7e/0xd0)
>   070: e8 32 ab 1d 00 88 ff ff e8 32 ab 1d 00 88 ff ff  .2.......2......
>   Prev obj: start=ffff88001dab3178, len=232
>   Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
>   Last user: [<ffffffff813a82ee>](cfq_put_queue+0x7e/0xd0)
>   000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
>   010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
>   Next obj: start=ffff88001dab3378, len=232
>   Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
>   Last user: [<ffffffff813a8e53>](cfq_get_queue+0x153/0x670)
>   000: 02 00 00 00 21 01 00 00 e0 c9 b1 1d 00 88 ff ff  ....!...........
>   010: 89 96 ae 18 00 88 ff ff 00 00 00 00 00 00 00 00  ................
>
> The field at 0x70 which is being updated after being freed is
> cfqq->fifo.  Interestingly, it didn't lead to any visible failure.

That's pretty odd.  Given Hugh's report as well, it sure does sound like
we now have some life time issues with cfqq's.
On Tue, Jan 03, 2012 at 09:20:44PM +0100, Jens Axboe wrote:
> On 2012-01-03 21:09, Tejun Heo wrote:
> > On Tue, Jan 03, 2012 at 09:59:22AM -0800, Tejun Heo wrote:
> > > That should have been service tree.  I couldn't find more missing
> > > removals other than the one Shaohua's patch already fixed.  Close
> > > cooperator selection in cfq_select_queue() seems suspicious tho.  I
> > > can't see what prevents it from returning an empty cooperator cfqq.
> > > I'm trying to verify whether that's the case.  Will update when I know
> > > more.
> >
> > While testing, found another bug.
> >
> >   Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> >   Last user: [<ffffffff813a82ee>](cfq_put_queue+0x7e/0xd0)
> >   070: e8 32 ab 1d 00 88 ff ff e8 32 ab 1d 00 88 ff ff  .2.......2......
> >   Prev obj: start=ffff88001dab3178, len=232
> >   Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> >   Last user: [<ffffffff813a82ee>](cfq_put_queue+0x7e/0xd0)
> >   000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> >   010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> >   Next obj: start=ffff88001dab3378, len=232
> >   Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> >   Last user: [<ffffffff813a8e53>](cfq_get_queue+0x153/0x670)
> >   000: 02 00 00 00 21 01 00 00 e0 c9 b1 1d 00 88 ff ff  ....!...........
> >   010: 89 96 ae 18 00 88 ff ff 00 00 00 00 00 00 00 00  ................
> >
> > The field at 0x70 which is being updated after being freed is
> > cfqq->fifo.  Interestingly, it didn't lead to any visible failure.
>
> That's pretty odd.  Given Hugh's report as well, it sure does sound like
> we now have some life time issues with cfqq's.

Hmmm... I disabled cfqq merge logic (commented out
cfq_close_cooperator() and the following cfq_setup_merge() calls) in
cfq_select_queue() and neither is triggering for quite a while now.
Maybe cfqq refcnt is getting borked over cfqq merging / splitting?  It
would also explain the low frequency of the issue too.  I'll try to
further isolate it but it would be awesome if someone more familiar
with the logic can go over that part.

Thanks.
On Tue, Jan 03, 2012 at 02:13:01PM -0800, Tejun Heo wrote:
> > That's pretty odd.  Given Hugh's report as well, it sure does sound like
> > we now have some life time issues with cfqq's.
>
> Hmmm... I disabled cfqq merge logic (commented out
> cfq_close_cooperator() and the following cfq_setup_merge() calls) in
> cfq_select_queue() and neither is triggering for quite a while now.
> Maybe cfqq refcnt is getting borked over cfqq merging / splitting?  It
> would also explain the low frequency of the issue too.  I'll try to
> further isolate it but it would be awesome if someone more familiar
> with the logic can go over that part.

Scrap that.  It triggered and yeah cfq_get_next_queue() is retrieving
empty cfqq from the service tree.

Thanks.
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index ce9b35a..33fae7d 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -281,9 +281,16 @@ void create_io_context_slowpath(struct task_struct *task, gfp_t gfp_flags,
 	INIT_HLIST_HEAD(&ioc->icq_list);
 	INIT_WORK(&ioc->release_work, ioc_release_fn);
 
-	/* try to install, somebody might already have beaten us to it */
+	/*
+	 * Try to install.  ioc shouldn't be installed if someone else
+	 * already did or @task, which isn't %current, is exiting.  Note
+	 * that we need to allow ioc creation on exiting %current as exit
+	 * path may issue IOs from e.g. exit_files().  The exit path is
+	 * responsible for not issuing IO after exit_io_context().
+	 */
 	task_lock(task);
-	if (!task->io_context && !(task->flags & PF_EXITING))
+	if (!task->io_context &&
+	    (task == current || !(task->flags & PF_EXITING)))
 		task->io_context = ioc;
 	else
 		kmem_cache_free(iocontext_cachep, ioc);
While fixing io_context creation / task exit race condition,
6e736be7f2 "block: make ioc get/put interface more conventional and
fix race on alloction" also prevented an exiting (%PF_EXITING) task
from creating its own io_context.  This is incorrect as exit path may
issue IOs, e.g. from exit_files(), and if those IOs are the first ones
issued by the task, io_context needs to be created to process the IOs.

Combined with the existing problem of io_context / io_cq creation
failure having the possibility of stalling IO, this problem results in
deterministic full IO lockup with certain workloads.

Fix it by allowing io_context creation regardless of %PF_EXITING for
%current.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Hugh Dickins <hughd@google.com>
---
Thanks a lot for the hint, Hugh.  My testing stuff (fio, dd and some
adhoc rawio testing programs) was issuing IOs before exiting, so I
didn't hit the problem, and I suspect the reason why I didn't see the
boot failure Andrew was seeing was because of systemd - boot process
used to be dominated by lots of short-lived programs, many of which
touching/modifying files, and thus it triggered the first IO on exit
paths with Andrew's old userland.  With systemd, most of those are
gone, so...

Thanks.

 block/blk-ioc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)