Patchwork [RESEND,0/1] AHCI: Optimize interrupt processing

login
register
mail settings
Submitter Nicholas A. Bellinger
Date July 20, 2013, 4:56 a.m.
Message ID <1374296162.7397.1098.camel@haakon3.risingtidesystems.com>
Download mbox | patch
Permalink /patch/260402/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Nicholas A. Bellinger - July 20, 2013, 4:56 a.m.
On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2013-07-19 at 08:33 -0700, James Bottomley wrote:
> > On Thu, 2013-07-18 at 23:34 -0700, Nicholas A. Bellinger wrote:
> > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> > > index 0101af5..191bc15 100644
> > > --- a/drivers/ata/libata-scsi.c
> > > +++ b/drivers/ata/libata-scsi.c
> > > @@ -1144,7 +1144,11 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
> > >                         "sector_size=%u > PAGE_SIZE, PIO may malfunction\n",
> > >                         sdev->sector_size);
> > >  
> > > -       blk_queue_update_dma_alignment(q, sdev->sector_size - 1);
> > > +       if (!q->mq_ops) {
> > > +               blk_queue_update_dma_alignment(q, sdev->sector_size - 1);
> > > +       } else {
> > > +               printk("Skipping dma_alignment for libata w/ scsi-mq\n");
> > > +       }
> > 
> > Amazingly enough there is a reason for the dma alignment, and it wasn't
> > just to annoy you, so you can't blindly do this.
> > 
> > The email thread is probably lost in the mists of time, but if I
> > remember correctly the problem is that some ahci DMA controllers barf if
> > the sector they're doing DMA on crosses a page boundary.  Some are
> > annoying enough to actually cause silent data corruption.  You won't
> > find every ahci DMA controller doing this, so the change will work for
> > some, but it will be hard to identify those it won't work for until
> > people start losing data.
> 
> Thanks for the extra background.
> 
> So at least from what I gather thus far this shouldn't be an issue for
> initial testing with scsi-mq <-> libata w/ ata_piix.
> 
> > 
> > The correct fix, obviously, is to do the bio copy on the kernel path for
> > unaligned data.  It is OK to assume that REQ_TYPE_FS data is correctly
> > aligned (because of the block to page alignment).
> > 
> 
> Indeed.  Looking into the bio_copy_kern() breakage next..
> 

OK, after further investigation the root cause is a actually a missing
bio->bio_end_io() -> bio_copy_kern_endio() -> bio_put() from the
blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
currently using.

Including the following patch into the scsi-mq working branch now, and
reverting the libata dma_alignment=0x03 hack.

Alexander, care to give this a try..?

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Gordeev - July 22, 2013, 3:03 p.m.
On Fri, Jul 19, 2013 at 09:56:02PM -0700, Nicholas A. Bellinger wrote:
> On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
> OK, after further investigation the root cause is a actually a missing
> bio->bio_end_io() -> bio_copy_kern_endio() -> bio_put() from the
> blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
> currently using.

Yes, missing bio_copy_kern_endio() callback is exactly the reason I
turned to blk_mq_execute_rq() initially. I should have been more
specific on this :|

I will try Mike's and your other change, hopefully soon (sorry,
constantly getting distracted).

> Including the following patch into the scsi-mq working branch now, and
> reverting the libata dma_alignment=0x03 hack.
Nicholas A. Bellinger - July 22, 2013, 9:10 p.m.
On Mon, 2013-07-22 at 17:03 +0200, Alexander Gordeev wrote:
> On Fri, Jul 19, 2013 at 09:56:02PM -0700, Nicholas A. Bellinger wrote:
> > On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
> > OK, after further investigation the root cause is a actually a missing
> > bio->bio_end_io() -> bio_copy_kern_endio() -> bio_put() from the
> > blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
> > currently using.
> 
> Yes, missing bio_copy_kern_endio() callback is exactly the reason I
> turned to blk_mq_execute_rq() initially. I should have been more
> specific on this :|
> 
> I will try Mike's and your other change, hopefully soon (sorry,
> constantly getting distracted).
> 

Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
target-pending/scsi-mq, which now has functioning scsi-generic support.

Also, your scsi_times_out patch from earlier has not been included just
yet, but that should be the only extra patch you need to apply in order
to get scsi-mq enabled libata/ata_piix running.

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Gordeev - July 25, 2013, 10:16 a.m.
On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
> Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
> target-pending/scsi-mq, which now has functioning scsi-generic support.

Survives a boot, a kernel build and the build's result :)
Nicholas A. Bellinger - July 25, 2013, 10:08 p.m.
On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
> On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
> > Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
> > target-pending/scsi-mq, which now has functioning scsi-generic support.
> 
> Survives a boot, a kernel build and the build's result :)

Great.  Thanks for the feedback Alexander!

So the next step on my end is to enable -mq for ahci, and verify initial
correctness using QEMU/KVM hardware emulation.

Btw, I've been looking at enabling the SHT->cmd_size for struct
ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
are already all pre-allocated by libata and obtained via ata_qc_new() ->
__ata_qc_from_tag() during ata_scsi_queuecmd().

So that said, with the struct request + struct scsi_cmnd pre-allocations
already provided by blk-mq -> scsi-mq code, all memory allocations
should have already been eliminated from I/O fast path.

--nab








--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe - July 26, 2013, 2:09 a.m.
On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
> On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
> > On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
> > > Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
> > > target-pending/scsi-mq, which now has functioning scsi-generic support.
> > 
> > Survives a boot, a kernel build and the build's result :)
> 
> Great.  Thanks for the feedback Alexander!
> 
> So the next step on my end is to enable -mq for ahci, and verify initial
> correctness using QEMU/KVM hardware emulation.
> 
> Btw, I've been looking at enabling the SHT->cmd_size for struct
> ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
> are already all pre-allocated by libata and obtained via ata_qc_new() ->
> __ata_qc_from_tag() during ata_scsi_queuecmd().

Might still not be a bad idea to do it:

- Cleans up a driver, getting rid of the need to alloc, maintain, and
  free those structures.

- Should be some cache locality benefits to having it all sequential.

> So that said, with the struct request + struct scsi_cmnd pre-allocations
> already provided by blk-mq -> scsi-mq code, all memory allocations
> should have already been eliminated from I/O fast path.

Nice!
Nicholas A. Bellinger - July 26, 2013, 9:14 p.m.
On Thu, 2013-07-25 at 20:09 -0600, Jens Axboe wrote:
> On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
> > On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
> > > On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
> > > > Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
> > > > target-pending/scsi-mq, which now has functioning scsi-generic support.
> > > 
> > > Survives a boot, a kernel build and the build's result :)
> > 
> > Great.  Thanks for the feedback Alexander!
> > 
> > So the next step on my end is to enable -mq for ahci, and verify initial
> > correctness using QEMU/KVM hardware emulation.
> > 
> > Btw, I've been looking at enabling the SHT->cmd_size for struct
> > ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
> > are already all pre-allocated by libata and obtained via ata_qc_new() ->
> > __ata_qc_from_tag() during ata_scsi_queuecmd().
> 
> Might still not be a bad idea to do it:
> 
> - Cleans up a driver, getting rid of the need to alloc, maintain, and
>   free those structures.
> 
> - Should be some cache locality benefits to having it all sequential.
> 

Looking at this some more, there are a number of locations outside of
the main blk_mq_ops->queue_rq() -> SHT->queuecommand_mq() dispatch that
use *ata_qc_from_tag() to obtain *ata_queued_cmd, and a few without a
associated struct scsi_cmnd like libata-core.c:ata_exec_internal_sg()
for example..

So I don't think (completely) getting rid of ata_port->qcmds[] will be
possible, and just converting the ata_scsi_queuecmd() path to use the
extra SHT->cmd_size pre-allocation for *ata_queued_cmd might end up
being more trouble that it's worth.  Still undecided on that part..

Tejun, do you have any thoughts + input here..?

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Reinecke - July 29, 2013, 7:28 a.m.
On 07/26/2013 04:09 AM, Jens Axboe wrote:
> On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
>> On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
>>> On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
>>>> Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
>>>> target-pending/scsi-mq, which now has functioning scsi-generic support.
>>>
>>> Survives a boot, a kernel build and the build's result :)
>>
>> Great.  Thanks for the feedback Alexander!
>>
>> So the next step on my end is to enable -mq for ahci, and verify initial
>> correctness using QEMU/KVM hardware emulation.
>>
>> Btw, I've been looking at enabling the SHT->cmd_size for struct
>> ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
>> are already all pre-allocated by libata and obtained via ata_qc_new() ->
>> __ata_qc_from_tag() during ata_scsi_queuecmd().
> 
> Might still not be a bad idea to do it:
> 
> - Cleans up a driver, getting rid of the need to alloc, maintain, and
>   free those structures.
> 
> - Should be some cache locality benefits to having it all sequential.
> 
>> So that said, with the struct request + struct scsi_cmnd pre-allocations
>> already provided by blk-mq -> scsi-mq code, all memory allocations
>> should have already been eliminated from I/O fast path.
> 
> Nice!
> 
Hmm.

I'm trying to work out if it would be possible to move multipath
handling over to scsi-mq.
However, when doing so I would need to reconfigure 'nr_hw_queues' on
the fly. Now with all the static cmd preallocation going on this is
going to be tricky.

This leaves me with two choices:
- Tear down the command pool altogether whenever I need to
  reconfigure the device (which is going to be painful)
- Allocate some max nr_hw_queues, and mark the superfluous
  ones as 'unused' or something. Seeing the a sane max nr_hw_queues
  will be possibly the number of cpus this might end up hogging
  quite some memory.

Would you accept patches moving the static command allocation over
to pools or is this a desired feature?

Cheers,

Hannes
Tejun Heo - July 29, 2013, 11:46 a.m.
Hello,

On Fri, Jul 26, 2013 at 02:14:36PM -0700, Nicholas A. Bellinger wrote:
> So I don't think (completely) getting rid of ata_port->qcmds[] will be
> possible, and just converting the ata_scsi_queuecmd() path to use the
> extra SHT->cmd_size pre-allocation for *ata_queued_cmd might end up
> being more trouble that it's worth.  Still undecided on that part..
> 
> Tejun, do you have any thoughts + input here..?

libata exception handling which includes probing doesn't go through
SCSI at all.  It all works inside libata proper using ata_queuecmds
and only the result is exposed to SCSI.  Most of those SCSI semantics
need to be emulated anyway, so this makes things a lot easier than
going through SCSI for each command.  As it currently stands, it'd be
a lot of effort to try to embed ata_qc's into higher layer construct.
Given how it's used, I don't think it's a high priority task.

One thing which would probably be worthwhile tho is getting rid of the
bitmap based qc tag allocator in libata.  That one is just borderline
stupid to keep around on any setup which is supposed to be scalable.

Thanks.
Jens Axboe - July 29, 2013, 2:03 p.m.
On 07/29/2013 05:46 AM, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jul 26, 2013 at 02:14:36PM -0700, Nicholas A. Bellinger wrote:
>> So I don't think (completely) getting rid of ata_port->qcmds[] will be
>> possible, and just converting the ata_scsi_queuecmd() path to use the
>> extra SHT->cmd_size pre-allocation for *ata_queued_cmd might end up
>> being more trouble that it's worth.  Still undecided on that part..
>>
>> Tejun, do you have any thoughts + input here..?
> 
> libata exception handling which includes probing doesn't go through
> SCSI at all.  It all works inside libata proper using ata_queuecmds
> and only the result is exposed to SCSI.  Most of those SCSI semantics
> need to be emulated anyway, so this makes things a lot easier than
> going through SCSI for each command.  As it currently stands, it'd be
> a lot of effort to try to embed ata_qc's into higher layer construct.
> Given how it's used, I don't think it's a high priority task.
> 
> One thing which would probably be worthwhile tho is getting rid of the
> bitmap based qc tag allocator in libata.  That one is just borderline
> stupid to keep around on any setup which is supposed to be scalable.

Your border might be wider than mine :-). Yes, the bitmap should
definitely go.

Patch

diff --git a/block/blk-exec.c b/block/blk-exec.c
index 0761c89..70303d2 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -25,7 +25,10 @@  static void blk_end_sync_rq(struct request *rq, int error)
        struct completion *waiting = rq->end_io_data;
 
        rq->end_io_data = NULL;
-       if (!rq->q->mq_ops) {
+       if (rq->q->mq_ops) {
+               if (rq->bio)
+                       bio_endio(rq->bio, error);
+       } else {
                __blk_put_request(rq->q, rq);
        }