Message ID | 1374296162.7397.1098.camel@haakon3.risingtidesystems.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
On Fri, Jul 19, 2013 at 09:56:02PM -0700, Nicholas A. Bellinger wrote: > On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote: > OK, after further investigation the root cause is a actually a missing > bio->bio_end_io() -> bio_copy_kern_endio() -> bio_put() from the > blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is > currently using. Yes, missing bio_copy_kern_endio() callback is exactly the reason I turned to blk_mq_execute_rq() initially. I should have been more specific on this :| I will try Mike's and your other change, hopefully soon (sorry, constantly getting distracted). > Including the following patch into the scsi-mq working branch now, and > reverting the libata dma_alignment=0x03 hack.
On Mon, 2013-07-22 at 17:03 +0200, Alexander Gordeev wrote: > On Fri, Jul 19, 2013 at 09:56:02PM -0700, Nicholas A. Bellinger wrote: > > On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote: > > OK, after further investigation the root cause is a actually a missing > > bio->bio_end_io() -> bio_copy_kern_endio() -> bio_put() from the > > blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is > > currently using. > > Yes, missing bio_copy_kern_endio() callback is exactly the reason I > turned to blk_mq_execute_rq() initially. I should have been more > specific on this :| > > I will try Mike's and your other change, hopefully soon (sorry, > constantly getting distracted). > Np. FYI, you'll want to use the latest commit e7827b351 HEAD from target-pending/scsi-mq, which now has functioning scsi-generic support. Also, your scsi_times_out patch from earlier has not been included just yet, but that should be the only extra patch you need to apply in order to get scsi-mq enabled libata/ata_piix running. --nab -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote: > Np. FYI, you'll want to use the latest commit e7827b351 HEAD from > target-pending/scsi-mq, which now has functioning scsi-generic support. Survives a boot, a kernel build and the build's result :)
On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote: > On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote: > > Np. FYI, you'll want to use the latest commit e7827b351 HEAD from > > target-pending/scsi-mq, which now has functioning scsi-generic support. > > Survives a boot, a kernel build and the build's result :) Great. Thanks for the feedback Alexander! So the next step on my end is to enable -mq for ahci, and verify initial correctness using QEMU/KVM hardware emulation. Btw, I've been looking at enabling the SHT->cmd_size for struct ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors are already all pre-allocated by libata and obtained via ata_qc_new() -> __ata_qc_from_tag() during ata_scsi_queuecmd(). So that said, with the struct request + struct scsi_cmnd pre-allocations already provided by blk-mq -> scsi-mq code, all memory allocations should have already been eliminated from I/O fast path. --nab -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 25 2013, Nicholas A. Bellinger wrote: > On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote: > > On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote: > > > Np. FYI, you'll want to use the latest commit e7827b351 HEAD from > > > target-pending/scsi-mq, which now has functioning scsi-generic support. > > > > Survives a boot, a kernel build and the build's result :) > > Great. Thanks for the feedback Alexander! > > So the next step on my end is to enable -mq for ahci, and verify initial > correctness using QEMU/KVM hardware emulation. > > Btw, I've been looking at enabling the SHT->cmd_size for struct > ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors > are already all pre-allocated by libata and obtained via ata_qc_new() -> > __ata_qc_from_tag() during ata_scsi_queuecmd(). Might still not be a bad idea to do it: - Cleans up a driver, getting rid of the need to alloc, maintain, and free those structures. - Should be some cache locality benefits to having it all sequential. > So that said, with the struct request + struct scsi_cmnd pre-allocations > already provided by blk-mq -> scsi-mq code, all memory allocations > should have already been eliminated from I/O fast path. Nice!
On Thu, 2013-07-25 at 20:09 -0600, Jens Axboe wrote: > On Thu, Jul 25 2013, Nicholas A. Bellinger wrote: > > On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote: > > > On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote: > > > > Np. FYI, you'll want to use the latest commit e7827b351 HEAD from > > > > target-pending/scsi-mq, which now has functioning scsi-generic support. > > > > > > Survives a boot, a kernel build and the build's result :) > > > > Great. Thanks for the feedback Alexander! > > > > So the next step on my end is to enable -mq for ahci, and verify initial > > correctness using QEMU/KVM hardware emulation. > > > > Btw, I've been looking at enabling the SHT->cmd_size for struct > > ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors > > are already all pre-allocated by libata and obtained via ata_qc_new() -> > > __ata_qc_from_tag() during ata_scsi_queuecmd(). > > Might still not be a bad idea to do it: > > - Cleans up a driver, getting rid of the need to alloc, maintain, and > free those structures. > > - Should be some cache locality benefits to having it all sequential. > Looking at this some more, there are a number of locations outside of the main blk_mq_ops->queue_rq() -> SHT->queuecommand_mq() dispatch that use *ata_qc_from_tag() to obtain *ata_queued_cmd, and a few without a associated struct scsi_cmnd like libata-core.c:ata_exec_internal_sg() for example.. So I don't think (completely) getting rid of ata_port->qcmds[] will be possible, and just converting the ata_scsi_queuecmd() path to use the extra SHT->cmd_size pre-allocation for *ata_queued_cmd might end up being more trouble that it's worth. Still undecided on that part.. Tejun, do you have any thoughts + input here..? --nab -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/26/2013 04:09 AM, Jens Axboe wrote: > On Thu, Jul 25 2013, Nicholas A. Bellinger wrote: >> On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote: >>> On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote: >>>> Np. FYI, you'll want to use the latest commit e7827b351 HEAD from >>>> target-pending/scsi-mq, which now has functioning scsi-generic support. >>> >>> Survives a boot, a kernel build and the build's result :) >> >> Great. Thanks for the feedback Alexander! >> >> So the next step on my end is to enable -mq for ahci, and verify initial >> correctness using QEMU/KVM hardware emulation. >> >> Btw, I've been looking at enabling the SHT->cmd_size for struct >> ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors >> are already all pre-allocated by libata and obtained via ata_qc_new() -> >> __ata_qc_from_tag() during ata_scsi_queuecmd(). > > Might still not be a bad idea to do it: > > - Cleans up a driver, getting rid of the need to alloc, maintain, and > free those structures. > > - Should be some cache locality benefits to having it all sequential. > >> So that said, with the struct request + struct scsi_cmnd pre-allocations >> already provided by blk-mq -> scsi-mq code, all memory allocations >> should have already been eliminated from I/O fast path. > > Nice! > Hmm. I'm trying to work out if it would be possible to move multipath handling over to scsi-mq. However, when doing so I would need to reconfigure 'nr_hw_queues' on the fly. Now with all the static cmd preallocation going on this is going to be tricky. This leaves me with two choices: - Tear down the command pool altogether whenever I need to reconfigure the device (which is going to be painful) - Allocate some max nr_hw_queues, and mark the superfluous ones as 'unused' or something. Seeing the a sane max nr_hw_queues will be possibly the number of cpus this might end up hogging quite some memory. Would you accept patches moving the static command allocation over to pools or is this a desired feature? Cheers, Hannes
Hello, On Fri, Jul 26, 2013 at 02:14:36PM -0700, Nicholas A. Bellinger wrote: > So I don't think (completely) getting rid of ata_port->qcmds[] will be > possible, and just converting the ata_scsi_queuecmd() path to use the > extra SHT->cmd_size pre-allocation for *ata_queued_cmd might end up > being more trouble that it's worth. Still undecided on that part.. > > Tejun, do you have any thoughts + input here..? libata exception handling which includes probing doesn't go through SCSI at all. It all works inside libata proper using ata_queuecmds and only the result is exposed to SCSI. Most of those SCSI semantics need to be emulated anyway, so this makes things a lot easier than going through SCSI for each command. As it currently stands, it'd be a lot of effort to try to embed ata_qc's into higher layer construct. Given how it's used, I don't think it's a high priority task. One thing which would probably be worthwhile tho is getting rid of the bitmap based qc tag allocator in libata. That one is just borderline stupid to keep around on any setup which is supposed to be scalable. Thanks.
On 07/29/2013 05:46 AM, Tejun Heo wrote: > Hello, > > On Fri, Jul 26, 2013 at 02:14:36PM -0700, Nicholas A. Bellinger wrote: >> So I don't think (completely) getting rid of ata_port->qcmds[] will be >> possible, and just converting the ata_scsi_queuecmd() path to use the >> extra SHT->cmd_size pre-allocation for *ata_queued_cmd might end up >> being more trouble that it's worth. Still undecided on that part.. >> >> Tejun, do you have any thoughts + input here..? > > libata exception handling which includes probing doesn't go through > SCSI at all. It all works inside libata proper using ata_queuecmds > and only the result is exposed to SCSI. Most of those SCSI semantics > need to be emulated anyway, so this makes things a lot easier than > going through SCSI for each command. As it currently stands, it'd be > a lot of effort to try to embed ata_qc's into higher layer construct. > Given how it's used, I don't think it's a high priority task. > > One thing which would probably be worthwhile tho is getting rid of the > bitmap based qc tag allocator in libata. That one is just borderline > stupid to keep around on any setup which is supposed to be scalable. Your border might be wider than mine :-). Yes, the bitmap should definitely go.
diff --git a/block/blk-exec.c b/block/blk-exec.c index 0761c89..70303d2 100644 --- a/block/blk-exec.c +++ b/block/blk-exec.c @@ -25,7 +25,10 @@ static void blk_end_sync_rq(struct request *rq, int error) struct completion *waiting = rq->end_io_data; rq->end_io_data = NULL; - if (!rq->q->mq_ops) { + if (rq->q->mq_ops) { + if (rq->bio) + bio_endio(rq->bio, error); + } else { __blk_put_request(rq->q, rq); }