Message ID | 1647945585-197349-2-git-send-email-john.garry@huawei.com |
---|---|
State | New |
Headers | show |
Series | blk-mq/libata/scsi: SCSI driver tagging improvements | expand |
On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote: > Add an API to allocate a request queue which accepts a custom set of > blk_mq_ops for that request queue. > > The reason which we may want custom ops is for queuing requests which we > don't want to go through the normal queuing path. Eww. I really do not think we should do separate ops per queue, as that is going to get us into a deep mess eventually.
On 22/03/2022 11:18, Christoph Hellwig wrote: > On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote: >> Add an API to allocate a request queue which accepts a custom set of >> blk_mq_ops for that request queue. >> >> The reason which we may want custom ops is for queuing requests which we >> don't want to go through the normal queuing path. > > Eww. I really do not think we should do separate ops per queue, as that > is going to get us into a deep mess eventually. > Yeah... so far (here) it works out quite nicely, as we don't need to change the SCSI blk mq ops nor allocate a scsi_device - everything is just separate. The other method mentioned previously was to add the request "reserved" flag and add new paths in scsi_queue_rq() et al to handle this, but that gets messy. Any other ideas ...? Cheers, John
On 3/22/22 12:33, John Garry wrote: > On 22/03/2022 11:18, Christoph Hellwig wrote: >> On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote: >>> Add an API to allocate a request queue which accepts a custom set of >>> blk_mq_ops for that request queue. >>> >>> The reason which we may want custom ops is for queuing requests which we >>> don't want to go through the normal queuing path. >> >> Eww. I really do not think we should do separate ops per queue, as that >> is going to get us into a deep mess eventually. >> > > Yeah... so far (here) it works out quite nicely, as we don't need to > change the SCSI blk mq ops nor allocate a scsi_device - everything is > just separate. > > The other method mentioned previously was to add the request "reserved" > flag and add new paths in scsi_queue_rq() et al to handle this, but that > gets messy. > > Any other ideas ...? > As outlined in the other mail, I think might be useful is to have a _third_ type of requests (in addition to the normal and the reserved ones). That one would be allocated from the normal I/O pool (and hence could fail if the pool is exhausted), but would be able to carry a different payload (type) than the normal requests. And we could have a separate queue_rq for these requests, as we can differentiate them in the block layer. Cheers, Hannes
On 22/03/2022 12:16, Hannes Reinecke wrote: > On 3/22/22 12:33, John Garry wrote: >> On 22/03/2022 11:18, Christoph Hellwig wrote: >>> On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote: >>>> Add an API to allocate a request queue which accepts a custom set of >>>> blk_mq_ops for that request queue. >>>> >>>> The reason which we may want custom ops is for queuing requests >>>> which we >>>> don't want to go through the normal queuing path. >>> >>> Eww. I really do not think we should do separate ops per queue, as that >>> is going to get us into a deep mess eventually. >>> >> >> Yeah... so far (here) it works out quite nicely, as we don't need to >> change the SCSI blk mq ops nor allocate a scsi_device - everything is >> just separate. >> >> The other method mentioned previously was to add the request >> "reserved" flag and add new paths in scsi_queue_rq() et al to handle >> this, but that gets messy. >> >> Any other ideas ...? >> > > As outlined in the other mail, I think might be useful is to have a > _third_ type of requests (in addition to the normal and the reserved ones). > That one would be allocated from the normal I/O pool (and hence could > fail if the pool is exhausted), but would be able to carry a different > payload (type) than the normal requests. As mentioned in the cover letter response, it just seems best to keep the normal scsi_cmnd payload but have other means to add on the internal command data, like using host_scribble or scsi_cmnd priv data. > And we could have a separate queue_rq for these requests, as we can > differentiate them in the block layer. I don't know, let me think about it. Maybe we could add an "internal" blk flag, which uses a separate "internal" queue_rq callback. Thanks, John
On 3/22/22 13:30, John Garry wrote: > On 22/03/2022 12:16, Hannes Reinecke wrote: >> On 3/22/22 12:33, John Garry wrote: >>> On 22/03/2022 11:18, Christoph Hellwig wrote: >>>> On Tue, Mar 22, 2022 at 06:39:35PM +0800, John Garry wrote: >>>>> Add an API to allocate a request queue which accepts a custom set of >>>>> blk_mq_ops for that request queue. >>>>> >>>>> The reason which we may want custom ops is for queuing requests >>>>> which we >>>>> don't want to go through the normal queuing path. >>>> >>>> Eww. I really do not think we should do separate ops per queue, as >>>> that >>>> is going to get us into a deep mess eventually. >>>> >>> >>> Yeah... so far (here) it works out quite nicely, as we don't need to >>> change the SCSI blk mq ops nor allocate a scsi_device - everything is >>> just separate. >>> >>> The other method mentioned previously was to add the request >>> "reserved" flag and add new paths in scsi_queue_rq() et al to handle >>> this, but that gets messy. >>> >>> Any other ideas ...? >>> >> >> As outlined in the other mail, I think might be useful is to have a >> _third_ type of requests (in addition to the normal and the reserved >> ones). >> That one would be allocated from the normal I/O pool (and hence could >> fail if the pool is exhausted), but would be able to carry a different >> payload (type) than the normal requests. > > As mentioned in the cover letter response, it just seems best to keep > the normal scsi_cmnd payload but have other means to add on the internal > command data, like using host_scribble or scsi_cmnd priv data. > Well; I found that most drivers I had been looking at the scsi command payload isn't used at all; the drivers primarily cared about the (driver-provided) payload, and were completely ignoring the scsi command payload. Similar for ATA/libsas: you basically never issue real scsi commands, but either 'raw' ATA requests or SCSI TMFs. None of which are scsi commands, so providing them is a bit of a waste. (And causes irritations, too, as a scsi command requires associated pointers like ->device etc to be set up. Which makes it tricky to use for the initial device setup.) >> And we could have a separate queue_rq for these requests, as we can >> differentiate them in the block layer. > > I don't know, let me think about it. Maybe we could add an "internal" > blk flag, which uses a separate "internal" queue_rq callback. > Yeah, that's what I had in mind. Cheers, Hannes
On 22/03/2022 14:03, Hannes Reinecke wrote: >> >> As mentioned in the cover letter response, it just seems best to keep >> the normal scsi_cmnd payload but have other means to add on the >> internal command data, like using host_scribble or scsi_cmnd priv data. >> > Well; I found that most drivers I had been looking at the scsi command > payload isn't used at all; the drivers primarily cared about the > (driver-provided) payload, and were completely ignoring the scsi command > payload. > > Similar for ATA/libsas: you basically never issue real scsi commands, > but either 'raw' ATA requests or SCSI TMFs. None of which are scsi > commands, so providing them is a bit of a waste. > > (And causes irritations, too, as a scsi command requires associated > pointers like ->device etc to be set up. Which makes it tricky to use > for the initial device setup.) A problem I see is that in scsi_mq_init_request() we allocate memories like sense_buffer and prot_sdb and store the pointers in the scsi_cmnd payload. If we then reuse a scsi_cmnd payload as an "internal" command payload then this data may be lost. It might be possible to reuse the scsi cmnd payload for the "internal", but I would rather not get hung up on it now if possible. Thanks, John
On 3/22/22 16:17, John Garry wrote: > On 22/03/2022 14:03, Hannes Reinecke wrote: >>> >>> As mentioned in the cover letter response, it just seems best to keep >>> the normal scsi_cmnd payload but have other means to add on the >>> internal command data, like using host_scribble or scsi_cmnd priv data. >>> >> Well; I found that most drivers I had been looking at the scsi command >> payload isn't used at all; the drivers primarily cared about the >> (driver-provided) payload, and were completely ignoring the scsi >> command payload. >> >> Similar for ATA/libsas: you basically never issue real scsi commands, >> but either 'raw' ATA requests or SCSI TMFs. None of which are scsi >> commands, so providing them is a bit of a waste. >> >> (And causes irritations, too, as a scsi command requires associated >> pointers like ->device etc to be set up. Which makes it tricky to use >> for the initial device setup.) > > A problem I see is that in scsi_mq_init_request() we allocate memories > like sense_buffer and prot_sdb and store the pointers in the scsi_cmnd > payload. If we then reuse a scsi_cmnd payload as an "internal" command > payload then this data may be lost. > > It might be possible to reuse the scsi cmnd payload for the "internal", > but I would rather not get hung up on it now if possible. > Or, keep the payload as is, and use the 'internal' marker to indicate that the scsi payload is not valid. That would save us quite some checks during endio processing. Cheers, Hannes
On 3/22/22 03:39, John Garry wrote: > Add an API to allocate a request queue which accepts a custom set of > blk_mq_ops for that request queue. > > The reason which we may want custom ops is for queuing requests which we > don't want to go through the normal queuing path. Custom ops shouldn't be required for this. See e.g. how tmf_queue is used in the UFS driver for an example of a queue implementation with custom operations and that does not require changes of the block layer core. Thanks, Bart.
On 23/03/2022 02:57, Bart Van Assche wrote: > On 3/22/22 03:39, John Garry wrote: >> Add an API to allocate a request queue which accepts a custom set of >> blk_mq_ops for that request queue. >> >> The reason which we may want custom ops is for queuing requests which we >> don't want to go through the normal queuing path. > Hi Bart, > Custom ops shouldn't be required for this. See e.g. how tmf_queue > is used in the UFS driver for an example of a queue implementation > with custom operations and that does not require changes of the block > layer core. The UFS code uses a private tagset (in ufs_hba.tmf_tag_set) for only management of TMF tags/memories. This tagset does not really have any custom operations. All it has is a stub of .queue_rq CB in ufshcd_queue_tmf() and that is because this CB is compulsory. As for the idea of having multiple tagsets per shost with real custom operations, this idea was mentioned before, but I think managing multiple tagsets could be trouble. For a start, it would mean that we need a distinct allocation of reserved and regular tags, and sometimes we don't want this - as Hannes mentioned earlier, many HBAs have low queue depth and cannot afford to permanently carve out a bunch of reserved tags. Thanks, John
diff --git a/block/blk-mq.c b/block/blk-mq.c index f3bf3358a3bb..8ea3447339ca 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3858,7 +3858,7 @@ void blk_mq_release(struct request_queue *q) } static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set, - void *queuedata) + void *queuedata, const struct blk_mq_ops *ops) { struct request_queue *q; int ret; @@ -3867,27 +3867,35 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set, if (!q) return ERR_PTR(-ENOMEM); q->queuedata = queuedata; - ret = blk_mq_init_allocated_queue(set, q); + ret = blk_mq_init_allocated_queue(set, q, ops); if (ret) { blk_cleanup_queue(q); return ERR_PTR(ret); } + return q; } struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set) { - return blk_mq_init_queue_data(set, NULL); + return blk_mq_init_queue_data(set, NULL, NULL); } EXPORT_SYMBOL(blk_mq_init_queue); +struct request_queue *blk_mq_init_queue_ops(struct blk_mq_tag_set *set, + const struct blk_mq_ops *custom_ops) +{ + return blk_mq_init_queue_data(set, NULL, custom_ops); +} +EXPORT_SYMBOL(blk_mq_init_queue_ops); + struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata, struct lock_class_key *lkclass) { struct request_queue *q; struct gendisk *disk; - q = blk_mq_init_queue_data(set, queuedata); + q = blk_mq_init_queue_data(set, queuedata, NULL); if (IS_ERR(q)) return ERR_CAST(q); @@ -4010,13 +4018,16 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, } int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, - struct request_queue *q) + struct request_queue *q, const struct blk_mq_ops *custom_ops) { WARN_ON_ONCE(blk_queue_has_srcu(q) != !!(set->flags & BLK_MQ_F_BLOCKING)); /* mark the queue as mq asap */ - q->mq_ops = set->ops; + if (custom_ops) + q->mq_ops = custom_ops; + else + q->mq_ops = set->ops; q->poll_cb = blk_stat_alloc_callback(blk_mq_poll_stats_fn, blk_mq_poll_stats_bkt, diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index 3907950a0ddc..9d93f72a3eec 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -560,7 +560,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t) if (err) goto out_kfree_tag_set; - err = blk_mq_init_allocated_queue(md->tag_set, md->queue); + err = blk_mq_init_allocated_queue(md->tag_set, md->queue, NULL); if (err) goto out_tag_set; return 0; diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index d319ffa59354..e12d17c86c52 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -688,8 +688,11 @@ struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata, __blk_mq_alloc_disk(set, queuedata, &__key); \ }) struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *); +struct request_queue *blk_mq_init_queue_ops(struct blk_mq_tag_set *, + const struct blk_mq_ops *custom_ops); + int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, - struct request_queue *q); + struct request_queue *q, const struct blk_mq_ops *custom_ops); void blk_mq_unregister_dev(struct device *, struct request_queue *); int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set);
Add an API to allocate a request queue which accepts a custom set of blk_mq_ops for that request queue. The reason which we may want custom ops is for queuing requests which we don't want to go through the normal queuing path. Signed-off-by: John Garry <john.garry@huawei.com> --- block/blk-mq.c | 23 +++++++++++++++++------ drivers/md/dm-rq.c | 2 +- include/linux/blk-mq.h | 5 ++++- 3 files changed, 22 insertions(+), 8 deletions(-)