
[v3,1/2] block: Add iocontext priority to request

Message ID 1476218427-4021-2-git-send-email-adam.manzanares@hgst.com
State Not Applicable
Delegated to: David Miller

Commit Message

Adam Manzanares Oct. 11, 2016, 8:40 p.m. UTC
Patch adds an association between the iocontext ioprio and the ioprio of
a request. This feature is only enabled if a queue flag is set to
indicate that requests should have an ioprio associated with them. The
queue flag is exposed as the rq_ioc_prio queue sysfs entry.

Signed-off-by: Adam Manzanares <adam.manzanares@hgst.com>
---
 Documentation/block/queue-sysfs.txt | 12 ++++++++++++
 block/blk-core.c                    |  5 +++++
 block/blk-sysfs.c                   | 32 ++++++++++++++++++++++++++++++++
 include/linux/blkdev.h              |  3 +++
 4 files changed, 52 insertions(+)

Comments

Jens Axboe Oct. 12, 2016, 2:49 p.m. UTC | #1
On 10/11/2016 02:40 PM, Adam Manzanares wrote:
> Patch adds an association between the iocontext ioprio and the ioprio of
> a request. This feature is only enabled if a queue flag is set to
> indicate that requests should have an ioprio associated with them. The
> queue flag is exposed as the rq_ioc_prio queue sysfs entry.

Honestly, I don't get this patchset. For the normal file system path, we
inherit the iocontext priority into the bio. That in turn gets
inherited by the request. Why is this any different?

It'd be much cleaner to just have 'rq' inherit the IO priority from the
io context when the 'rq' is allocated, and ensure that a priority set in
the bio overrides it instead. It should already work like that.

And in no way should we add some sysfs file to control this, that is
nuts.
Adam Manzanares Oct. 12, 2016, 5:58 p.m. UTC | #2
On 10/12/2016 08:49, Jens Axboe wrote:
> On 10/11/2016 02:40 PM, Adam Manzanares wrote:
> >Patch adds an association between the iocontext ioprio and the ioprio of
> >a request. This feature is only enabled if a queue flag is set to
> >indicate that requests should have an ioprio associated with them. The
> >queue flag is exposed as the rq_ioc_prio queue sysfs entry.
> 
> Honestly, I don't get this patchset. For the normal file system path, we
> inherit the iocontext priority into the bio. That in turn gets
> inherited by the request. Why is this any different?
>
I was hoping this was true before I started looking into this, but the
iocontext priority is not set on the bio. I did look into setting the
iocontext priority on the bio, and this would have to be done close to
init_request_from_bio in the call stack, so I chose to set it here.

If I missed where this occurs I would appreciate it if you pointed me to the 
code where the bio gets the iopriority from the iocontext. 

> It'd be much cleaner to just have 'rq' inherit the IO priority from the
> io context when the 'rq' is allocated, and ensure that a priority set in
> the bio overrides it instead. It should already work like that.

I looked into the request allocation code and the only place I see where
the iocontext is associated with the request is through an icq. The
documentation of the icq states that the elevator's icq_size has to be set
in order for the block core to manage this. The only scheduler currently
using this is the cfq scheduler, and I think prioritized requests should
be independent of the scheduler used.

I agree that this should make it into the code where the rq is allocated.

Again if I have missed something please point me to the relevant code and 
I will modify the patch as necessary.

> 
> And in no way should we add some sysfs file to control this, that is
> nuts.

My concern is that we will now be exposing priority information to lower
layers, and if there happens to be code that already uses the priority it
will now actually have an impact. An example is the fusion mptsas driver.

The second issue I foresee is that lower level drivers will need a sysfs 
file to control whether or not we send prioritized commands to the device.
We are wary of sending prioritized commands by default when we are unsure
of how the device will respond to these prioritized commands.

The reason I propose a sysfs file in the queue is that it solves these two 
issues at the same time. 

I would appreciate it if you could suggest an alternative fix for these issues
or an explanation of why these concerns are not valid.

> 
> -- 
> Jens Axboe
> 

Take care,
Adam
Jens Axboe Oct. 12, 2016, 9:05 p.m. UTC | #3
On 10/12/2016 11:58 AM, Adam Manzanares wrote:
> On 10/12/2016 08:49, Jens Axboe wrote:
>> On 10/11/2016 02:40 PM, Adam Manzanares wrote:
>>> Patch adds an association between the iocontext ioprio and the ioprio of
>>> a request. This feature is only enabled if a queue flag is set to
>>> indicate that requests should have an ioprio associated with them. The
>>> queue flag is exposed as the rq_ioc_prio queue sysfs entry.
>>
>> Honestly, I don't get this patchset. For the normal file system path, we
>> inherit the iocontext priority into the bio. That in turn gets
>> inherited by the request. Why is this any different?
>>
> I was hoping this was true before I started looking into this, but the
> iocontext priority is not set on the bio. I did look into setting the
> iocontext priority on the bio, and this would have to be done close to
> init_request_from_bio in the call stack, so I chose to set it here.
>
> If I missed where this occurs I would appreciate it if you pointed me
> to the code where the bio gets the iopriority from the iocontext.

It currently happens in CFQ, since that is what deals with priorities.
But we should just make that the case generically. The rule is that if a
priority is set in the bio, that is what we'll use. But if not, we'll use
what the application has set in general, and that is available in the
io context.
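
Roughly, the generic version could look something like the below in
init_request_from_bio() - an untested sketch of that rule, not the
actual patch:

	struct io_context *ioc = rq_ioc(bio);

	/* a priority carried by the bio wins ... */
	if (ioprio_valid(bio_prio(bio)))
		req->ioprio = bio_prio(bio);
	else if (ioc)
		/* ... otherwise fall back to the submitter's io context */
		req->ioprio = ioc->ioprio;
	else
		req->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);

No queue flag involved.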

>> It'd be much cleaner to just have 'rq' inherit the IO priority from the
>> io context when the 'rq' is allocated, and ensure that a priority set in
>> the bio overrides it instead. It should already work like that.
>
> I looked into the request allocation code and the only place I see where
> the iocontext is associated with the request is through an icq. The
> documentation of the icq states that the elevator's icq_size has to be set
> in order for the block core to manage this. The only scheduler currently
> using this is the cfq scheduler, and I think prioritized requests should
> be independent of the scheduler used.

Agree, see above, it should just happen generically.

>> And in no way should we add some sysfs file to control this, that is
>> nuts.
>
> My concern is that we will now be exposing priority information to lower
> layers, and if there happens to be code that already uses the priority it
> will now actually have an impact. An example is the fusion mptsas driver.
>
> The second issue I foresee is that lower level drivers will need a sysfs
> file to control whether or not we send prioritized commands to the device.
> We are wary of sending prioritized commands by default when we are unsure
> of how the device will respond to these prioritized commands.
>
> The reason I propose a sysfs file in the queue is that it solves these two
> issues at the same time.
>
> I would appreciate it if you could suggest an alternative fix for
> these issues or an explanation of why these concerns are not valid.

At that point it should be a driver knob; it's not something that should
be part of the generic block layer.
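
For reference, the application-set priority mentioned above is whatever
ioprio_set(2) (typically invoked through ionice) stored in the task's io
context. A minimal, illustrative userspace example - the IOPRIO_* constants
are defined locally here since there is no glibc wrapper:

	#include <sys/syscall.h>
	#include <unistd.h>

	#define IOPRIO_CLASS_SHIFT	13
	#define IOPRIO_PRIO_VALUE(class, data) \
		(((class) << IOPRIO_CLASS_SHIFT) | (data))
	#define IOPRIO_WHO_PROCESS	1
	#define IOPRIO_CLASS_RT		1

	int main(void)
	{
		/* RT I/O class, level 0, for the calling process
		 * (the RT class needs CAP_SYS_ADMIN) */
		return syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
			       IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0));
	}

Any I/O the process issues afterwards carries that priority in its io
context, which is what a request would then inherit.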

Patch

diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt
index 2a39040..3ca4e8f 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -144,6 +144,18 @@  For storage configurations that need to maximize distribution of completion
 processing setting this option to '2' forces the completion to run on the
 requesting cpu (bypassing the "group" aggregation logic).
 
+rq_ioc_prio (RW)
+----------------
+If this option is '1', and there is a valid iocontext associated with the
+issuing context, and the bio we are processing does not have a valid
+prio, then we save the prio value from the iocontext with the request.
+
+This feature can be combined with device drivers that are aware of prio
+values and handle them accordingly. For example, the ata layer can
+recognize the prio, create high-priority ata commands and send them to
+the device. If the hardware supports command priorities, this has the
+potential to speed up response times for high-priority IO.
+
 scheduler (RW)
 --------------
 When read, this file will display the current and available IO schedulers
diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..2e740c4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -33,6 +33,7 @@ 
 #include <linux/ratelimit.h>
 #include <linux/pm_runtime.h>
 #include <linux/blk-cgroup.h>
+#include <linux/ioprio.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/block.h>
@@ -1648,6 +1649,7 @@  unsigned int blk_plug_queued_count(struct request_queue *q)
 
 void init_request_from_bio(struct request *req, struct bio *bio)
 {
+	struct io_context *ioc = rq_ioc(bio);
 	req->cmd_type = REQ_TYPE_FS;
 
 	req->cmd_flags |= bio->bi_opf & REQ_COMMON_MASK;
@@ -1657,6 +1659,9 @@  void init_request_from_bio(struct request *req, struct bio *bio)
 	req->errors = 0;
 	req->__sector = bio->bi_iter.bi_sector;
 	req->ioprio = bio_prio(bio);
+	if (blk_queue_rq_ioc_prio(req->q) && !ioprio_valid(req->ioprio) && ioc)
+		req->ioprio = ioc->ioprio;
+
 	blk_rq_bio_prep(req->q, req, bio);
 }
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..a9c5105 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -384,6 +384,31 @@  static ssize_t queue_dax_show(struct request_queue *q, char *page)
 	return queue_var_show(blk_queue_dax(q), page);
 }
 
+static ssize_t queue_rq_ioc_prio_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(blk_queue_rq_ioc_prio(q), page);
+}
+
+static ssize_t queue_rq_ioc_prio_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long rq_ioc_prio_on;
+	ssize_t ret;
+
+	ret = queue_var_store(&rq_ioc_prio_on, page, count);
+	if (ret < 0)
+		return ret;
+
+	spin_lock_irq(q->queue_lock);
+	if (rq_ioc_prio_on)
+		queue_flag_set(QUEUE_FLAG_RQ_IOC_PRIO, q);
+	else
+		queue_flag_clear(QUEUE_FLAG_RQ_IOC_PRIO, q);
+	spin_unlock_irq(q->queue_lock);
+
+	return ret;
+}
+
 static struct queue_sysfs_entry queue_requests_entry = {
 	.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_requests_show,
@@ -526,6 +551,12 @@  static struct queue_sysfs_entry queue_dax_entry = {
 	.show = queue_dax_show,
 };
 
+static struct queue_sysfs_entry queue_rq_ioc_prio_entry = {
+	.attr = {.name = "rq_ioc_prio", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_rq_ioc_prio_show,
+	.store = queue_rq_ioc_prio_store,
+};
+
 static struct attribute *default_attrs[] = {
 	&queue_requests_entry.attr,
 	&queue_ra_entry.attr,
@@ -553,6 +584,7 @@  static struct attribute *default_attrs[] = {
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
 	&queue_dax_entry.attr,
+	&queue_rq_ioc_prio_entry.attr,
 	NULL,
 };
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..63b842a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@  struct request_queue {
 #define QUEUE_FLAG_FUA	       24	/* device supports FUA writes */
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
+#define QUEUE_FLAG_RQ_IOC_PRIO 27	/* Use iocontext ioprio */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -595,6 +596,8 @@  static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_queue_secure_erase(q) \
 	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_rq_ioc_prio(q) \
+	test_bit(QUEUE_FLAG_RQ_IOC_PRIO, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \