Patchwork [RESEND,0/1] AHCI: Optimize interrupt processing

Submitter Mike Christie
Date July 20, 2013, 2:48 p.m.
Message ID <51EAA33C.9010405@fusionio.com>
Permalink /patch/260460/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Mike Christie - July 20, 2013, 2:48 p.m.
On 07/19/2013 11:56 PM, Nicholas A. Bellinger wrote:
> On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
>> On Fri, 2013-07-19 at 08:33 -0700, James Bottomley wrote:
>>> On Thu, 2013-07-18 at 23:34 -0700, Nicholas A. Bellinger wrote:
>>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>>> index 0101af5..191bc15 100644
>>>> --- a/drivers/ata/libata-scsi.c
>>>> +++ b/drivers/ata/libata-scsi.c
>>>> @@ -1144,7 +1144,11 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
>>>>                         "sector_size=%u > PAGE_SIZE, PIO may malfunction\n",
>>>>                         sdev->sector_size);
>>>>  
>>>> -       blk_queue_update_dma_alignment(q, sdev->sector_size - 1);
>>>> +       if (!q->mq_ops) {
>>>> +               blk_queue_update_dma_alignment(q, sdev->sector_size - 1);
>>>> +       } else {
>>>> +               printk("Skipping dma_alignment for libata w/ scsi-mq\n");
>>>> +       }
>>>
>>> Amazingly enough there is a reason for the dma alignment, and it wasn't
>>> just to annoy you, so you can't blindly do this.
>>>
>>> The email thread is probably lost in the mists of time, but if I
>>> remember correctly the problem is that some ahci DMA controllers barf if
>>> the sector they're doing DMA on crosses a page boundary.  Some are
>>> annoying enough to actually cause silent data corruption.  You won't
>>> find every ahci DMA controller doing this, so the change will work for
>>> some, but it will be hard to identify those it won't work for until
>>> people start losing data.
>>
>> Thanks for the extra background.
>>
>> So at least from what I gather thus far this shouldn't be an issue for
>> initial testing with scsi-mq <-> libata w/ ata_piix.
>>
>>>
>>> The correct fix, obviously, is to do the bio copy on the kernel path for
>>> unaligned data.  It is OK to assume that REQ_TYPE_FS data is correctly
>>> aligned (because of the block to page alignment).
>>>
>>
>> Indeed.  Looking into the bio_copy_kern() breakage next..
>>
> 
> OK, after further investigation, the root cause is actually a missing
> bio->bio_end_io() -> bio_copy_kern_endio() -> bio_put() from the
> blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
> currently using.
> 
> Including the following patch into the scsi-mq working branch now, and
> reverting the libata dma_alignment=0x03 hack.
> 
> Alexander, care to give this a try..?
> 
> --nab
> 
> diff --git a/block/blk-exec.c b/block/blk-exec.c
> index 0761c89..70303d2 100644
> --- a/block/blk-exec.c
> +++ b/block/blk-exec.c
> @@ -25,7 +25,10 @@ static void blk_end_sync_rq(struct request *rq, int error)
>         struct completion *waiting = rq->end_io_data;
>  
>         rq->end_io_data = NULL;
> -       if (!rq->q->mq_ops) {
> +       if (rq->q->mq_ops) {
> +               if (rq->bio)
> +                       bio_endio(rq->bio, error);
> +       } else {
>                 __blk_put_request(rq->q, rq);
>         }
> 


This does not handle requests with multiple bios, and for the mq style
passthrough insertion completions you actually want to call
blk_mq_finish_request in scsi_execute. The same goes for all the other
passthrough code in your scsi-mq tree. That is your root bug. Instead of
doing that, though, I think we want to have the block layer free the
bios like before.

For non-mq calls, blk_end_request-style calls will complete the
bios when blk_finish_request is called from that path. It will then
invoke the rq's end_io callback.

I think the blk-mq code assumes that if the end_io callback is set, the
end_io function will do the bio cleanup. See __blk_mq_end_io. Also see
how blk_mq_execute_rq calls blk_mq_finish_request for an example of how
rq passthrough execution and cleanup is done in the mq paths.

Now with the scsi-mq changes, when blk_execute_rq_nowait calls
blk_mq_insert_request, it passes an old non-mq style end_io
function that does not complete the bios.

What about the attached, only-compile-tested patch? The patch has the
mq block code work like the non-mq code for bio cleanups.
blk-mq: blk-mq should free bios in pass through case

For non mq calls, the block layer will free the bios when
blk_finish_request is called.

For mq calls, the blk mq code wants the caller to do this.

This patch has the blk mq code work like the non mq code
and has the block layer free the bios.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Jens Axboe - July 20, 2013, 11:57 p.m.
On Sat, Jul 20 2013, Mike Christie wrote:
> blk-mq: blk-mq should free bios in pass through case
> 
> For non mq calls, the block layer will free the bios when
> blk_finish_request is called.
> 
> For mq calls, the blk mq code wants the caller to do this.
> 
> This patch has the blk mq code work like the non mq code
> and has the block layer free the bios.

Thanks Mike, looks good, applied.
Alexander Gordeev - Aug. 9, 2013, 7:15 p.m.
On Sat, Jul 20, 2013 at 09:48:28AM -0500, Mike Christie wrote:
> What about the attached only compile tested patch. The patch has the mq
> block code work like the non mq code for bio cleanups.

Not sure if it is related to the patch or not, but it never returns from
wait_for_completion_io(&wait) in blkdev_issue_flush():

# ps axl | awk '$10 ~ /D\+/'
4     0   938   879  20   0 111216   656 blkdev D+   pts/1      0:00 fdisk /dev/sda
#
# cat /proc/938/stack
[<ffffffff812a8a5c>] blkdev_issue_flush+0xfc/0x160
[<ffffffff811ac606>] blkdev_fsync+0x96/0xc0
[<ffffffff811a2f4d>] do_fsync+0x5d/0x90
[<ffffffff811a3330>] SyS_fsync+0x10/0x20
[<ffffffff81611582>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Any ideas?

Thanks!
Nicholas A. Bellinger - Aug. 9, 2013, 8:17 p.m.
On Fri, 2013-08-09 at 21:15 +0200, Alexander Gordeev wrote:
> On Sat, Jul 20, 2013 at 09:48:28AM -0500, Mike Christie wrote:
> > What about the attached only compile tested patch. The patch has the mq
> > block code work like the non mq code for bio cleanups.
> 
> Not sure if it is related to the patch or not, but it never returns from
> wait_for_completion_io(&wait) in blkdev_issue_flush():
> 
> # ps axl | awk '$10 ~ /D\+/'
> 4     0   938   879  20   0 111216   656 blkdev D+   pts/1      0:00 fdisk /dev/sda
> #
> # cat /proc/938/stack
> [<ffffffff812a8a5c>] blkdev_issue_flush+0xfc/0x160
> [<ffffffff811ac606>] blkdev_fsync+0x96/0xc0
> [<ffffffff811a2f4d>] do_fsync+0x5d/0x90
> [<ffffffff811a3330>] SyS_fsync+0x10/0x20
> [<ffffffff81611582>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Any ideas?
> 

Mmmm, I'm able to reproduce over here with ahci + scsi-mq, and it
appears to be a bug related to using sdev->sdev_md_req.queue_depth=1,
which ends up causing blkdev_issue_flush() to wait forever because
blk_mq_wait_for_tags() never gets the single tag back for the
WRITE_FLUSH bio -> SYNCHRONIZE_CACHE cdb.

Here's the echo w > /proc/sysrq-trigger output:

[  282.620140] SysRq : Show Blocked State
[  282.620958]   task                        PC stack   pid father
[  282.622228] kworker/2:1H    D 0000000000000002     0   532      2 0x00000000
[  282.623607] Workqueue: kblockd mq_flush_work
[  282.624027]  ffff880037869c98 0000000000000046 ffff880037868010 0000000000011380
[  282.624027]  ffff88007d255910 0000000000011380 ffff880037869fd8 0000000000011380
[  282.624027]  ffff880037869fd8 0000000000011380 ffff88007d06f0d0 ffff88007d255910
[  282.624027] Call Trace:
[  282.624027]  [<ffffffff8125b4fd>] ? do_rw_taskfile+0x2ab/0x2bf
[  282.624027]  [<ffffffff810235c4>] ? kvm_clock_read+0x1f/0x21
[  282.624027]  [<ffffffff81054fbc>] ? update_curr+0x4f/0xcd
[  282.624027]  [<ffffffff810235c4>] ? kvm_clock_read+0x1f/0x21
[  282.624027]  [<ffffffff810235cf>] ? kvm_clock_get_cycles+0x9/0xb
[  282.624027]  [<ffffffff81383946>] schedule+0x5f/0x61
[  282.624027]  [<ffffffff813839cf>] io_schedule+0x87/0xca
[  282.624027]  [<ffffffff81192402>] wait_on_tags+0x10f/0x146
[  282.624027]  [<ffffffff81192462>] blk_mq_wait_for_tags+0x29/0x3b
[  282.624027]  [<ffffffff8119132d>] blk_mq_alloc_request_pinned+0xcf/0xe5
[  282.624027]  [<ffffffff811913a6>] blk_mq_alloc_request+0x2d/0x34
[  282.624027]  [<ffffffff8118c60f>] mq_flush_work+0x1a/0x3d
[  282.624027]  [<ffffffff8104474b>] process_one_work+0x257/0x368
[  282.624027]  [<ffffffff81044a4a>] worker_thread+0x1ee/0x34b
[  282.624027]  [<ffffffff8104485c>] ? process_one_work+0x368/0x368
[  282.624027]  [<ffffffff81049771>] kthread+0xb0/0xb8
[  282.624027]  [<ffffffff810496c1>] ? kthread_freezable_should_stop+0x60/0x60
[  282.624027]  [<ffffffff8138a07c>] ret_from_fork+0x7c/0xb0
[  282.624027]  [<ffffffff810496c1>] ? kthread_freezable_should_stop+0x60/0x60
[  282.624027] fdisk           D 0000000000000002     0  1947   1930 0x00000000
[  282.624027]  ffff880037bd9d48 0000000000000082 ffff880037bd8010 0000000000011380
[  282.624027]  ffff88007ca223a0 0000000000011380 ffff880037bd9fd8 0000000000011380
[  282.624027]  ffff880037bd9fd8 0000000000011380 ffff88007d06bb60 ffff88007ca223a0
[  282.624027] Call Trace:
[  282.624027]  [<ffffffff813835e8>] ? __schedule+0x687/0x726
[  282.624027]  [<ffffffff81383946>] schedule+0x5f/0x61
[  282.624027]  [<ffffffff81381d18>] schedule_timeout+0x24/0x183
[  282.624027]  [<ffffffff810235c4>] ? kvm_clock_read+0x1f/0x21
[  282.624027]  [<ffffffff810235cf>] ? kvm_clock_get_cycles+0x9/0xb
[  282.624027]  [<ffffffff8105d2bd>] ? ktime_get_ts+0x53/0xc7
[  282.624027]  [<ffffffff81382e01>] io_schedule_timeout+0x93/0xe4
[  282.624027]  [<ffffffff81051fc7>] ? __cond_resched+0x25/0x31
[  282.624027]  [<ffffffff81383c46>] T.1554+0x8e/0xfc
[  282.624027]  [<ffffffff81054159>] ? try_to_wake_up+0x222/0x222
[  282.624027]  [<ffffffff81383cc7>] wait_for_completion_io+0x13/0x15
[  282.624027]  [<ffffffff8118cbde>] blkdev_issue_flush+0xfb/0x145
[  282.624027]  [<ffffffff810f067a>] blkdev_fsync+0x30/0x3d
[  282.624027]  [<ffffffff810e9259>] vfs_fsync_range+0x18/0x21
[  282.624027]  [<ffffffff810e9279>] vfs_fsync+0x17/0x19
[  282.624027]  [<ffffffff810e942e>] do_fsync+0x35/0x53
[  282.624027]  [<ffffffff810d5574>] ? SyS_ioctl+0x47/0x69
[  282.624027]  [<ffffffff810e9469>] SyS_fsync+0xb/0xf
[  282.624027]  [<ffffffff8138a129>] system_call_fastpath+0x16/0x1b
[  282.624027] blkid           D 0000000000000001     0  1952      1 0x00000000
[  282.679428]  ffff8800371638a8 0000000000000082 ffff880037162010 0000000000011380
[  282.679428]  ffff88007ca205f0 0000000000011380 ffff880037163fd8 0000000000011380
[  282.679428]  ffff880037163fd8 0000000000011380 ffff88007d06b570 ffff88007ca205f0
[  282.679428] Call Trace:
[  282.679428]  [<ffffffff810677a9>] ? generic_exec_single+0x75/0x93
[  282.679428]  [<ffffffff8119212a>] ? blk_mq_tag_busy_iter+0x116/0x116
[  282.679428]  [<ffffffff8106797f>] ? smp_call_function_single+0xf9/0x111
[  282.679428]  [<ffffffff81383946>] schedule+0x5f/0x61
[  282.679428]  [<ffffffff813839cf>] io_schedule+0x87/0xca
[  282.679428]  [<ffffffff81192402>] wait_on_tags+0x10f/0x146
[  282.679428]  [<ffffffff81192462>] blk_mq_wait_for_tags+0x29/0x3b
[  282.679428]  [<ffffffff8119132d>] blk_mq_alloc_request_pinned+0xcf/0xe5
[  282.679428]  [<ffffffff811916b9>] blk_mq_make_request+0x14d/0x2dc
[  282.679428]  [<ffffffff810978c4>] ? mempool_alloc_slab+0x10/0x12
[  282.679428]  [<ffffffff8118951e>] generic_make_request+0x9c/0xdf
[  282.679428]  [<ffffffff81189648>] submit_bio+0xe7/0xf2
[  282.679428]  [<ffffffff810eaaeb>] _submit_bh+0x1b0/0x1d3
[  282.679428]  [<ffffffff810eab19>] submit_bh+0xb/0xd
[  282.679428]  [<ffffffff810ed6e5>] block_read_full_page+0x24d/0x26d
[  282.679428]  [<ffffffff810ef905>] ? I_BDEV+0xd/0xd
[  282.679428]  [<ffffffff810a7624>] ? __inc_zone_page_state+0x1e/0x20
[  282.679428]  [<ffffffff81096188>] ? add_to_page_cache_locked+0x78/0xb0
[  282.679428]  [<ffffffff810f04a5>] blkdev_readpage+0x13/0x15
[  282.679428]  [<ffffffff8109de8d>] __do_page_cache_readahead+0x194/0x1d0
[  282.679428]  [<ffffffff81381f41>] ? __wait_on_bit_lock+0x79/0x8a
[  282.679428]  [<ffffffff8109df50>] force_page_cache_readahead+0x67/0x8d
[  282.679428]  [<ffffffff8109e29a>] page_cache_sync_readahead+0x26/0x3a
[  282.679428]  [<ffffffff81097510>] generic_file_aio_read+0x265/0x5cd
[  282.679428]  [<ffffffff810efaae>] blkdev_aio_read+0x57/0x5e
[  282.679428]  [<ffffffff810c6b8d>] do_sync_read+0x79/0x9f
[  282.679428]  [<ffffffff810c7db7>] vfs_read+0xab/0x130
[  282.679428]  [<ffffffff810c7f06>] SyS_read+0x4f/0x79
[  282.679428]  [<ffffffff8138a129>] system_call_fastpath+0x16/0x1b
[  282.679428] blkid           D 0000000000000003     0  1992    927 0x00000000
[  282.679428]  ffff88007848f8a8 0000000000000086 ffff88007848e010 0000000000011380
[  282.679428]  ffff88007d250000 0000000000011380 ffff88007848ffd8 0000000000011380
[  282.679428]  ffff88007848ffd8 0000000000011380 ffff88007d06c150 ffff88007d250000
[  282.679428] Call Trace:
[  282.679428]  [<ffffffff8109b9b6>] ? __alloc_pages_nodemask+0xf7/0x5eb
[  282.679428]  [<ffffffff81383946>] schedule+0x5f/0x61
[  282.679428]  [<ffffffff813839cf>] io_schedule+0x87/0xca
[  282.679428]  [<ffffffff81192402>] wait_on_tags+0x10f/0x146
[  282.679428]  [<ffffffff81192462>] blk_mq_wait_for_tags+0x29/0x3b
[  282.679428]  [<ffffffff8119132d>] blk_mq_alloc_request_pinned+0xcf/0xe5
[  282.679428]  [<ffffffff811916b9>] blk_mq_make_request+0x14d/0x2dc
[  282.679428]  [<ffffffff8118d81f>] ? create_task_io_context+0xa6/0xf5
[  282.679428]  [<ffffffff8118951e>] generic_make_request+0x9c/0xdf
[  282.679428]  [<ffffffff81189648>] submit_bio+0xe7/0xf2
[  282.679428]  [<ffffffff810eaaeb>] _submit_bh+0x1b0/0x1d3
[  282.679428]  [<ffffffff810eab19>] submit_bh+0xb/0xd
[  282.679428]  [<ffffffff810ed6e5>] block_read_full_page+0x24d/0x26d
[  282.679428]  [<ffffffff810ef905>] ? I_BDEV+0xd/0xd
[  282.679428]  [<ffffffff810f04a5>] blkdev_readpage+0x13/0x15
[  282.679428]  [<ffffffff8109de8d>] __do_page_cache_readahead+0x194/0x1d0
[  282.679428]  [<ffffffff8109df50>] force_page_cache_readahead+0x67/0x8d
[  282.679428]  [<ffffffff8109e29a>] page_cache_sync_readahead+0x26/0x3a
[  282.679428]  [<ffffffff81097510>] generic_file_aio_read+0x265/0x5cd
[  282.679428]  [<ffffffff810efaae>] blkdev_aio_read+0x57/0x5e
[  282.679428]  [<ffffffff810c6b8d>] do_sync_read+0x79/0x9f
[  282.679428]  [<ffffffff810c7db7>] vfs_read+0xab/0x130
[  282.679428]  [<ffffffff810c7f06>] SyS_read+0x4f/0x79
[  282.679428]  [<ffffffff8138a129>] system_call_fastpath+0x16/0x1b

Bumping queue_depth=2 seems to work around the issue, but AFAICT it's a
genuine tag starvation bug with queue_depth=1 and WRITE_FLUSH..

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig - Oct. 3, 2013, 11:06 a.m.
On Sat, Jul 20, 2013 at 09:48:28AM -0500, Mike Christie wrote:
> What about the attached only compile tested patch. The patch has the mq
> block code work like the non mq code for bio cleanups.
> 
> 

> blk-mq: blk-mq should free bios in pass through case
> 
> For non mq calls, the block layer will free the bios when
> blk_finish_request is called.
>
> For mq calls, the blk mq code wants the caller to do this.
> 
> This patch has the blk mq code work like the non mq code
> and has the block layer free the bios.
> 
> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>

This patch breaks booting for me in the current blk multiqueue tree,
with an apparent double free of a bio when using virtio-blk in writeback
mode (cache=writeback or cache=none in qemu):

[   15.253608] ------------[ cut here ]------------
[   15.256422] kernel BUG at /work/hch/linux/fs/bio.c:498!
[   15.256879] invalid opcode: 0000 [#1] SMP 
[   15.256879] Modules linked in:
[   15.256879] CPU: 3 PID: 353 Comm: kblockd Not tainted 3.11.0+ #25
[   15.256879] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   15.256879] task: ffff88007d75e0c0 ti: ffff88007d676000 task.ti: ffff88007d676000
[   15.256879] RIP: 0010:[<ffffffff811b470a>]  [<ffffffff811b470a>] bio_put+0x8a/0x90
[   15.256879] RSP: 0018:ffff88007fd83b50  EFLAGS: 00010046
[   15.256879] RAX: 0000000000000000 RBX: ffff88007d713080 RCX: 0000000000000035
[   15.256879] RDX: 0000000000000002 RSI: ffff88007ad50338 RDI: ffff88007d713080
[   15.256879] RBP: ffff88007fd83b60 R08: 7010000000000000 R09: 007ad50338080000
[   15.256879] R10: ff672b1b7d38ce02 R11: 000000000000028b R12: 0000000000000000
[   15.256879] R13: 0000000000000000 R14: ffff88007b4c36c0 R15: ffff88007b40d608
[   15.256879] FS:  0000000000000000(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000
[   15.256879] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   15.256879] CR2: 0000000000000138 CR3: 0000000002124000 CR4: 00000000000006e0
[   15.256879] Stack:
[   15.256879]  ffff88007d713080 0000000000000000 ffff88007fd83b80 ffffffff811ae8a3
[   15.256879]  ffff88007fd83bf0 0000000000001000 ffff88007fd83b90 ffffffff811b3268
[   15.256879]  ffff88007fd83bc0 ffffffff816ac847 ffff88007b4c36c0 ffff88007fd99d00
[   15.256879] Call Trace:
[   15.256879]  <IRQ> 
[   15.256879]  [<ffffffff811ae8a3>] end_bio_bh_io_sync+0x33/0x50
[   15.256879]  [<ffffffff811b3268>] bio_endio+0x18/0x30
[   15.256879]  [<ffffffff816ac847>] blk_mq_complete_request+0x47/0xd0
[   15.256879]  [<ffffffff816ac8e9>] __blk_mq_end_io+0x19/0x20
[   15.256879]  [<ffffffff816ac958>] blk_mq_end_io+0x68/0xd0
[   15.256879]  [<ffffffff816a6162>] blk_flush_complete_seq+0xe2/0x370
[   15.256879]  [<ffffffff816a653b>] flush_end_io+0x11b/0x200
[   15.256879]  [<ffffffff816ac875>] blk_mq_complete_request+0x75/0xd0
[   15.256879]  [<ffffffff816ac8e9>] __blk_mq_end_io+0x19/0x20
[   15.256879]  [<ffffffff816ac958>] blk_mq_end_io+0x68/0xd0
[   15.256879]  [<ffffffff81844c2f>] virtblk_done+0xef/0x260
[   15.256879]  [<ffffffff81753cc0>] vring_interrupt+0x30/0x60
[   15.256879]  [<ffffffff81103724>] handle_irq_event_percpu+0x54/0x1f0
[   15.256879]  [<ffffffff81103903>] handle_irq_event+0x43/0x70
[   15.256879]  [<ffffffff8110609f>] handle_edge_irq+0x6f/0x120
[   15.256879]  [<ffffffff810445b8>] handle_irq+0x58/0x140
[   15.256879]  [<ffffffff81094bbf>] ? irq_enter+0x4f/0x90
[   15.256879]  [<ffffffff810440b5>] do_IRQ+0x55/0xd0
[   15.256879]  [<ffffffff81bd3972>] common_interrupt+0x72/0x72
[   15.256879]  [<ffffffff810c5135>] ? sched_clock_local+0x25/0xa0
[   15.256879]  [<ffffffff81094960>] ? __do_softirq+0xb0/0x250
[   15.256879]  [<ffffffff81094959>] ? __do_softirq+0xa9/0x250
[   15.256879]  [<ffffffff81094cae>] irq_exit+0xae/0xd0
[   15.256879]  [<ffffffff8106dcd5>] smp_apic_timer_interrupt+0x45/0x60
[   15.256879]  [<ffffffff81bdc772>] apic_timer_interrupt+0x72/0x80
[   15.256879]  <EOI> 
[   15.256879]  [<ffffffff81bd3a33>] ? retint_restore_args+0x13/0x13
[   15.256879]  [<ffffffff81bd3502>] ? _raw_spin_unlock_irq+0x32/0x40
[   15.256879]  [<ffffffff81bd34fb>] ? _raw_spin_unlock_irq+0x2b/0x40
[   15.256879]  [<ffffffff810ac0c4>] rescuer_thread+0xe4/0x2f0
[   15.256879]  [<ffffffff810abfe0>] ? process_scheduled_works+0x40/0x40
[   15.256879]  [<ffffffff810b3916>] kthread+0xd6/0xe0
[   15.256879]  [<ffffffff81bd34fb>] ? _raw_spin_unlock_irq+0x2b/0x40
[   15.256879]  [<ffffffff810b3840>] ? __init_kthread_worker+0x70/0x70
[   15.256879]  [<ffffffff81bdbabc>] ret_from_fork+0x7c/0xb0
[   15.256879]  [<ffffffff810b3840>] ? __init_kthread_worker+0x70/0x70
[   15.256879] Code: ff 41 8b 44 24 08 48 89 df 49 8b 74 24 10 48 29 c7 e8 cb 88 f8 ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 90 48 89 df e8 b8 5c fc ff eb 9b <0f> 0b 0f 1f 40 00 55 48 89 e5 41 57 45 31 ff 41 56 41 55 41 54 
[   15.256879] RIP  [<ffffffff811b470a>] bio_put+0x8a/0x90
[   15.256879]  RSP <ffff88007fd83b50>
[   15.256879] ---[ end trace 1f201608bfddfca7 ]---
[   15.256879] Kernel panic - not syncing: Fatal exception in interrupt
[   15.256879] Shutting down cpus with NMI

Alexander Gordeev - Oct. 7, 2013, 2:44 p.m.
On Thu, Oct 03, 2013 at 04:06:51AM -0700, Christoph Hellwig wrote:
> On Sat, Jul 20, 2013 at 09:48:28AM -0500, Mike Christie wrote:
> > What about the attached only compile tested patch. The patch has the mq
> > block code work like the non mq code for bio cleanups.
> > 
> > 
> 
> > blk-mq: blk-mq should free bios in pass through case
> > 
> > For non mq calls, the block layer will free the bios when
> > blk_finish_request is called.
> >
> > For mq calls, the blk mq code wants the caller to do this.
> > 
> > This patch has the blk mq code work like the non mq code
> > and has the block layer free the bios.
> > 
> > Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
> 
> This patch breaks booting for me in the current blk multiqueue tree,
> with an apparent double free of a bio when using virtio-blk in writeback
> mode (cache=writeback or cache=none in qemu):

I am not sure if the root cause is the same, but the panic I experience
when mounting an ahci device (and the double-tag usage described in
another thread) is somehow similar:


[  181.184510] general protection fault: 0000 [#1] SMP 
[  181.184546] Modules linked in: lockd sunrpc snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm mperf snd_page_alloc i5000_edac coretemp snd_timer edac_core iTCO_wdt snd kvm_intel iTCO_vendor_support lpc_ich mfd_core igb dca i5k_amb ppdev soundcore hp_wmi tg3 kvm sparse_keymap serio_raw ptp microcode pcspkr rfkill pps_core shpchp parport_pc parport mptsas scsi_transport_sas mptscsih mptbase floppy nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core
[  181.184550] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W    3.10.0-rc5.debug+ #180
[  181.184552] Hardware name: Hewlett-Packard HP xw6400 Workstation/0A04h, BIOS 786D4 v02.31 03/14/2008
[  181.184554] task: ffff88007b1a8000 ti: ffff88007b19c000 task.ti: ffff88007b19c000
[  181.184563] RIP: 0010:[<ffffffff811fa97b>]  [<ffffffff811fa97b>] bio_endio+0x1b/0x40
[  181.184565] RSP: 0018:ffff88007d203a28  EFLAGS: 00010002
[  181.184567] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: dead000000200200
[  181.184568] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880068e29200
[  181.184570] RBP: ffff88007d203a28 R08: ffffe8ffff201240 R09: 0000000000000000
[  181.184571] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880074d86000
[  181.184572] R13: 6b6b6b6b6b6b6b6b R14: 000000006b6b6b6b R15: 0000000000000001
[  181.184575] FS:  0000000000000000(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000
[  181.184576] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  181.184578] CR2: 00007f8afac8c45c CR3: 000000005f431000 CR4: 00000000000007e0
[  181.184580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  181.184581] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  181.184582] Stack:
[  181.184587]  ffff88007d203a78 ffffffff813202aa ffff88007d203a98 0000000000000046
[  181.184591]  0000000000000046 ffff880072d08000 0000000000000000 ffff880074d860d8
[  181.184594]  0000000000000000 0000000000000001 ffff88007d203a88 ffffffff81320445
[  181.184595] Call Trace:
[  181.184597]  <IRQ> 
[  181.184603]  [<ffffffff813202aa>] blk_mq_complete_request+0x5a/0x1d0
[  181.184607]  [<ffffffff81320445>] __blk_mq_end_io+0x25/0x30
[  181.184609]  [<ffffffff81320535>] blk_mq_end_io+0xe5/0xf0
[  181.184613]  [<ffffffff81319754>] blk_flush_complete_seq+0xf4/0x360
[  181.184616]  [<ffffffff81319a4b>] ? flush_end_io+0x4b/0x210
[  181.184619]  [<ffffffff81319b2a>] flush_end_io+0x12a/0x210
[  181.184622]  [<ffffffff813202da>] blk_mq_complete_request+0x8a/0x1d0
[  181.184626]  [<ffffffff8147b9fd>] ? scsi_device_unbusy+0x9d/0xd0
[  181.184629]  [<ffffffff81320445>] __blk_mq_end_io+0x25/0x30
[  181.184632]  [<ffffffff81320535>] blk_mq_end_io+0xe5/0xf0
[  181.184635]  [<ffffffff8147cae5>] scsi_mq_end_request+0x15/0x20
[  181.184638]  [<ffffffff8147bf20>] scsi_io_completion+0xa0/0x650
[  181.184643]  [<ffffffff810bbc3d>] ? trace_hardirqs_off+0xd/0x10
[  181.184648]  [<ffffffff814722f7>] scsi_finish_command+0x87/0xe0
[  181.184650]  [<ffffffff8147bccf>] scsi_softirq_done+0x13f/0x160
[  181.184653]  [<ffffffff8147cba5>] scsi_mq_done+0x15/0x20
[  181.184658]  [<ffffffff81495a73>] ata_scsi_qc_complete+0x63/0x470
[  181.184661]  [<ffffffff8148fad0>] __ata_qc_complete+0x90/0x140
[  181.184664]  [<ffffffff8148fc1d>] ata_qc_complete+0x9d/0x230
[  181.184667]  [<ffffffff8148fe51>] ata_qc_complete_multiple+0xa1/0xe0
[  181.184673]  [<ffffffff814aa449>] ahci_handle_port_interrupt+0x109/0x560
[  181.184676]  [<ffffffff814ab63f>] ahci_port_intr+0x2f/0x40
[  181.184678]  [<ffffffff814ab6f1>] ahci_interrupt+0xa1/0x100
[  181.184683]  [<ffffffff810ff7b5>] handle_irq_event_percpu+0x75/0x3d0
[  181.184686]  [<ffffffff810ffb58>] handle_irq_event+0x48/0x70
[  181.184689]  [<ffffffff81102d9e>] ? handle_fasteoi_irq+0x1e/0x100
[  181.184692]  [<ffffffff81102dda>] handle_fasteoi_irq+0x5a/0x100
[  181.184696]  [<ffffffff81004320>] handle_irq+0x60/0x150
[  181.184702]  [<ffffffff816ff846>] ? atomic_notifier_call_chain+0x16/0x20
[  181.184706]  [<ffffffff81705f7a>] do_IRQ+0x5a/0xe0
[  181.184710]  [<ffffffff816fb52f>] common_interrupt+0x6f/0x6f
[  181.184712]  <EOI> 
[  181.184716]  [<ffffffff8100aa45>] ? default_idle+0x25/0x280
[  181.184719]  [<ffffffff8100aa43>] ? default_idle+0x23/0x280
[  181.184722]  [<ffffffff8100b4f6>] arch_cpu_idle+0x26/0x30
[  181.184726]  [<ffffffff810afe66>] cpu_startup_entry+0x96/0x3e0
[  181.184729]  [<ffffffff810b7ad5>] ? clockevents_register_device+0xb5/0x120
[  181.184734]  [<ffffffff816e67ea>] start_secondary+0x27a/0x27c
[  181.184767] Code: 47 c1 c1 ea 03 83 c2 01 39 d0 0f 47 c2 c3 66 90 66 66 66 66 90 55 85 f6 48 89 e5 74 1b f0 80 67 18 fe 48 8b 47 40 48 85 c0 74 02 <ff> d0 5d 66 90 c3 0f 1f 80 00 00 00 00 48 8b 47 18 a8 01 b8 fb 
[  181.184770] RIP  [<ffffffff811fa97b>] bio_endio+0x1b/0x40
[  181.184772]  RSP <ffff88007d203a28>
[  181.184777] ---[ end trace 5e8fd083b8562c3c ]---
[  181.184779] Kernel panic - not syncing: Fatal exception in interrupt
[  181.185483] drm_kms_helper: panic occurred, switching back to text console

Patch

diff --git a/block/blk-flush.c b/block/blk-flush.c
index c56c37d..3e4cc9c 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,7 +231,7 @@  static void flush_end_io(struct request *flush_rq, int error)
 	unsigned long flags = 0;
 
 	if (q->mq_ops) {
-		blk_mq_finish_request(flush_rq, error);
+		blk_mq_free_request(flush_rq);
 		spin_lock_irqsave(&q->mq_flush_lock, flags);
 	}
 	running = &q->flush_queue[q->flush_running_idx];
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 799d305..5489b5a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -270,7 +270,7 @@  void blk_mq_free_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_free_request);
 
-void blk_mq_finish_request(struct request *rq, int error)
+static void blk_mq_finish_request(struct request *rq, int error)
 {
 	struct bio *bio = rq->bio;
 	unsigned int bytes = 0;
@@ -286,22 +286,17 @@  void blk_mq_finish_request(struct request *rq, int error)
 
 	blk_account_io_completion(rq, bytes);
 	blk_account_io_done(rq);
-	blk_mq_free_request(rq);
 }
 
 void blk_mq_complete_request(struct request *rq, int error)
 {
 	trace_block_rq_complete(rq->q, rq);
+	blk_mq_finish_request(rq, error);
 
-	/*
-	 * If ->end_io is set, it's responsible for doing the rest of the
-	 * completion.
-	 */
 	if (rq->end_io)
 		rq->end_io(rq, error);
 	else
-		blk_mq_finish_request(rq, error);
-
+		blk_mq_free_request(rq);
 }
 
 void __blk_mq_end_io(struct request *rq, int error)
@@ -973,8 +968,7 @@  int blk_mq_execute_rq(struct request_queue *q, struct request *rq)
 	if (rq->errors)
 		err = -EIO;
 
-	blk_mq_finish_request(rq, rq->errors);
-
+	blk_mq_free_request(rq);
 	return err;
 }
 EXPORT_SYMBOL(blk_mq_execute_rq);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 42d0110..52bf1f9 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,6 @@  void blk_mq_complete_request(struct request *rq, int error);
 void blk_mq_run_request(struct request *rq, bool run_queue, bool async);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_init_flush(struct request_queue *q);
-void blk_mq_finish_request(struct request *rq, int error);
 
 /*
  * CPU hotplug helpers