From patchwork Thu Jan 10 03:12:12 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1022676
From: Mauricio Faria de Oliveira
To: kernel-team@lists.ubuntu.com
Subject: [SRU B][PATCH 06/13] blk-rq-qos: refactor out common elements of blk-wbt
Date: Thu, 10 Jan 2019 01:12:12 -0200
Message-Id: <20190110031219.30676-7-mfo@canonical.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190110031219.30676-1-mfo@canonical.com>
References: <20190110031219.30676-1-mfo@canonical.com>
List-Id: Kernel team discussions

From: Josef Bacik

BugLink: https://bugs.launchpad.net/bugs/1810998

blkcg-qos is going to do essentially what wbt does, only on a cgroup
basis.  Break out the common code that will be shared between blkcg-qos
and wbt into blk-rq-qos.* so they can both utilize the same
infrastructure.

Signed-off-by: Josef Bacik
Signed-off-by: Jens Axboe
(backported from commit a79050434b45959f397042080fd1d70ffa9bd9df)
[mfo: backport:
 - blk-core.c:
   - hunk 4: refresh last context line due to lack of commit
     c3036021c7bd ("block: use GFP_NOIO instead of __GFP_DIRECT_RECLAIM"),
     not required nor its dependencies, for this fix.
   - hunk 5: refresh upper 2 context lines and last context line due to
     lack of commit 544ccc8dc904 ("block: get rid of struct blk_issue_stat")
     and commit e14575b3d457 ("block: convert REQ_ATOM_COMPLETE to stealing
     rq->__deadline bit"), not required.
   - hunk 6: refresh top context line due to lack of commit 522a777566f
     ("block: consolidate struct request timestamp fields"), not required.
 - blk-mq.c:
   - hunk 2: refresh top context line due to lack of commit 522a777566f
     ("block: consolidate struct request timestamp fields"), not required.
   - hunk 3: update upper context lines due to lack of commit 544ccc8dc904
     ("block: get rid of struct blk_issue_stat"), not required, and the
     lower context lines due to lack of commit 1d9bd5161ba3 ("blk-mq:
     replace timeout synchronization with a RCU and generation based
     scheme"), not required either for this fix.
   - hunk 4: refresh lower context lines due to lack of commit 1d9bd5161ba3
     ("blk-mq: replace timeout synchronization with a RCU and generation
     based scheme"), plus commit 5a61c36398d06 ("blk-mq: remove
     REQ_ATOM_STARTED"), not required either.
   - hunk 5: refresh upper context lines due to lack of commit 544ccc8dc904
     ("block: get rid of struct blk_issue_stat"), not required.
 - blkdev.h:
   - hunk 2: refresh upper context lines due to lack of commit 97889f9ac24f
     ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()"),
     not required.]
Signed-off-by: Mauricio Faria de Oliveira
---
 block/Makefile         |   2 +-
 block/blk-core.c       |  12 +-
 block/blk-mq.c         |  12 +-
 block/blk-rq-qos.c     | 178 ++++++++++++++++++++++
 block/blk-rq-qos.h     | 106 ++++++++++++++
 block/blk-settings.c   |   4 +-
 block/blk-sysfs.c      |  22 ++-
 block/blk-wbt.c        | 326 ++++++++++++++++++-----------------------
 block/blk-wbt.h        |  63 +++-----
 include/linux/blkdev.h |   4 +-
 10 files changed, 478 insertions(+), 251 deletions(-)
 create mode 100644 block/blk-rq-qos.c
 create mode 100644 block/blk-rq-qos.h

diff --git a/block/Makefile b/block/Makefile
index 6a56303b9925..981605042cad 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,7 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
 			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
 			blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
 			genhd.o partition-generic.o ioprio.o \
-			badblocks.o partitions/
+			badblocks.o partitions/ blk-rq-qos.o
 
 obj-$(CONFIG_BOUNCE)	+= bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST)	+= scsi_ioctl.o
diff --git a/block/blk-core.c b/block/blk-core.c
index 6ed5c0eb29fc..5659da721708 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1553,7 +1553,7 @@ void blk_requeue_request(struct request_queue *q, struct request *rq)
 	blk_delete_timer(rq);
 	blk_clear_rq_complete(rq);
 	trace_block_rq_requeue(q, rq);
-	wbt_requeue(q->rq_wb, rq);
+	rq_qos_requeue(q, rq);
 
 	if (rq->rq_flags & RQF_QUEUED)
 		blk_queue_end_tag(q, rq);
@@ -1659,7 +1659,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 	/* this is a bio leak */
 	WARN_ON(req->bio != NULL);
 
-	wbt_done(q->rq_wb, req);
+	rq_qos_done(q, req);
 
 	/*
 	 * Request may not have originated from ll_rw_blk. if not,
@@ -1951,7 +1951,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	}
 
 get_rq:
-	wb_acct = wbt_wait(q->rq_wb, bio, q->queue_lock);
+	wb_acct = rq_qos_throttle(q, bio, q->queue_lock);
 
 	/*
 	 * Grab a free request. This is might sleep but can not fail.
@@ -1961,7 +1961,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	req = get_request(q, bio->bi_opf, bio, 0);
 	if (IS_ERR(req)) {
 		blk_queue_exit(q);
-		__wbt_done(q->rq_wb, wb_acct);
+		rq_qos_cleanup(q, wb_acct);
 		if (PTR_ERR(req) == -ENOMEM)
 			bio->bi_status = BLK_STS_RESOURCE;
 		else
@@ -2849,7 +2849,7 @@ void blk_start_request(struct request *req)
 	if (test_bit(QUEUE_FLAG_STATS, &req->q->queue_flags)) {
 		blk_stat_set_issue(&req->issue_stat, blk_rq_sectors(req));
 		req->rq_flags |= RQF_STATS;
-		wbt_issue(req->q->rq_wb, req);
+		rq_qos_issue(req->q, req);
 	}
 
 	BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));
@@ -3068,7 +3068,7 @@ void blk_finish_request(struct request *req, blk_status_t error)
 	blk_account_io_done(req);
 
 	if (req->end_io) {
-		wbt_done(req->q->rq_wb, req);
+		rq_qos_done(q, req);
 		req->end_io(req, error);
 	} else {
 		if (blk_bidi_rq(req))
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7172ae4b9993..b741cf6b4c62 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -499,7 +499,7 @@ void blk_mq_free_request(struct request *rq)
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->backing_dev_info);
 
-	wbt_done(q->rq_wb, rq);
+	rq_qos_done(q, rq);
 
 	if (blk_rq_rl(rq))
 		blk_put_rl(blk_rq_rl(rq));
@@ -520,7 +520,7 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 	blk_account_io_done(rq);
 
 	if (rq->end_io) {
-		wbt_done(rq->q->rq_wb, rq);
+		rq_qos_done(rq->q, rq);
 		rq->end_io(rq, error);
 	} else {
 		if (unlikely(blk_bidi_rq(rq)))
@@ -614,7 +614,7 @@ void blk_mq_start_request(struct request *rq)
 	if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
 		blk_stat_set_issue(&rq->issue_stat, blk_rq_sectors(rq));
 		rq->rq_flags |= RQF_STATS;
-		wbt_issue(q->rq_wb, rq);
+		rq_qos_issue(q, rq);
 	}
 
 	blk_add_timer(rq);
@@ -673,7 +673,7 @@ static void __blk_mq_requeue_request(struct request *rq)
 	blk_mq_put_driver_tag(rq);
 
 	trace_block_rq_requeue(q, rq);
-	wbt_requeue(q->rq_wb, rq);
+	rq_qos_requeue(q, rq);
 
 	if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
 		if (q->dma_drain_size && blk_rq_bytes(rq))
@@ -1758,13 +1758,13 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	if (blk_mq_sched_bio_merge(q, bio))
 		return BLK_QC_T_NONE;
 
-	wb_acct = wbt_wait(q->rq_wb, bio, NULL);
+	wb_acct = rq_qos_throttle(q, bio, NULL);
 
 	trace_block_getrq(q, bio, bio->bi_opf);
 
 	rq = blk_mq_get_request(q, bio, bio->bi_opf, &data);
 	if (unlikely(!rq)) {
-		__wbt_done(q->rq_wb, wb_acct);
+		rq_qos_cleanup(q, wb_acct);
 		if (bio->bi_opf & REQ_NOWAIT)
 			bio_wouldblock_error(bio);
 		return BLK_QC_T_NONE;
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
new file mode 100644
index 000000000000..d2f2af8aa10c
--- /dev/null
+++ b/block/blk-rq-qos.c
@@ -0,0 +1,178 @@
+#include "blk-rq-qos.h"
+
+#include "blk-wbt.h"
+
+/*
+ * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
+ * false if 'v' + 1 would be bigger than 'below'.
+ */
+static bool atomic_inc_below(atomic_t *v, int below)
+{
+	int cur = atomic_read(v);
+
+	for (;;) {
+		int old;
+
+		if (cur >= below)
+			return false;
+		old = atomic_cmpxchg(v, cur, cur + 1);
+		if (old == cur)
+			break;
+		cur = old;
+	}
+
+	return true;
+}
+
+bool rq_wait_inc_below(struct rq_wait *rq_wait, int limit)
+{
+	return atomic_inc_below(&rq_wait->inflight, limit);
+}
+
+void rq_qos_cleanup(struct request_queue *q, enum wbt_flags wb_acct)
+{
+	struct rq_qos *rqos;
+
+	for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
+		if (rqos->ops->cleanup)
+			rqos->ops->cleanup(rqos, wb_acct);
+	}
+}
+
+void rq_qos_done(struct request_queue *q, struct request *rq)
+{
+	struct rq_qos *rqos;
+
+	for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
+		if (rqos->ops->done)
+			rqos->ops->done(rqos, rq);
+	}
+}
+
+void rq_qos_issue(struct request_queue *q, struct request *rq)
+{
+	struct rq_qos *rqos;
+
+	for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
+		if (rqos->ops->issue)
+			rqos->ops->issue(rqos, rq);
+	}
+}
+
+void rq_qos_requeue(struct request_queue *q, struct request *rq)
+{
+	struct rq_qos *rqos;
+
+	for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
+		if (rqos->ops->requeue)
+			rqos->ops->requeue(rqos, rq);
+	}
+}
+
+enum wbt_flags rq_qos_throttle(struct request_queue *q, struct bio *bio,
+			       spinlock_t *lock)
+{
+	struct rq_qos *rqos;
+	enum wbt_flags flags = 0;
+
+	for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
+		if (rqos->ops->throttle)
+			flags |= rqos->ops->throttle(rqos, bio, lock);
+	}
+	return flags;
+}
+
+/*
+ * Return true, if we can't increase the depth further by scaling
+ */
+bool rq_depth_calc_max_depth(struct rq_depth *rqd)
+{
+	unsigned int depth;
+	bool ret = false;
+
+	/*
+	 * For QD=1 devices, this is a special case. It's important for those
+	 * to have one request ready when one completes, so force a depth of
+	 * 2 for those devices. On the backend, it'll be a depth of 1 anyway,
+	 * since the device can't have more than that in flight. If we're
+	 * scaling down, then keep a setting of 1/1/1.
+	 */
+	if (rqd->queue_depth == 1) {
+		if (rqd->scale_step > 0)
+			rqd->max_depth = 1;
+		else {
+			rqd->max_depth = 2;
+			ret = true;
+		}
+	} else {
+		/*
+		 * scale_step == 0 is our default state. If we have suffered
+		 * latency spikes, step will be > 0, and we shrink the
+		 * allowed write depths. If step is < 0, we're only doing
+		 * writes, and we allow a temporarily higher depth to
+		 * increase performance.
+		 */
+		depth = min_t(unsigned int, rqd->default_depth,
+			      rqd->queue_depth);
+		if (rqd->scale_step > 0)
+			depth = 1 + ((depth - 1) >> min(31, rqd->scale_step));
+		else if (rqd->scale_step < 0) {
+			unsigned int maxd = 3 * rqd->queue_depth / 4;
+
+			depth = 1 + ((depth - 1) << -rqd->scale_step);
+			if (depth > maxd) {
+				depth = maxd;
+				ret = true;
+			}
+		}
+
+		rqd->max_depth = depth;
+	}
+
+	return ret;
+}
+
+void rq_depth_scale_up(struct rq_depth *rqd)
+{
+	/*
+	 * Hit max in previous round, stop here
+	 */
+	if (rqd->scaled_max)
+		return;
+
+	rqd->scale_step--;
+
+	rqd->scaled_max = rq_depth_calc_max_depth(rqd);
+}
+
+/*
+ * Scale rwb down. If 'hard_throttle' is set, do it quicker, since we
+ * had a latency violation.
+ */
+void rq_depth_scale_down(struct rq_depth *rqd, bool hard_throttle)
+{
+	/*
+	 * Stop scaling down when we've hit the limit. This also prevents
+	 * ->scale_step from going to crazy values, if the device can't
+	 * keep up.
+	 */
+	if (rqd->max_depth == 1)
+		return;
+
+	if (rqd->scale_step < 0 && hard_throttle)
+		rqd->scale_step = 0;
+	else
+		rqd->scale_step++;
+
+	rqd->scaled_max = false;
+	rq_depth_calc_max_depth(rqd);
+}
+
+void rq_qos_exit(struct request_queue *q)
+{
+	while (q->rq_qos) {
+		struct rq_qos *rqos = q->rq_qos;
+		q->rq_qos = rqos->next;
+		rqos->ops->exit(rqos);
+	}
+}
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
new file mode 100644
index 000000000000..f9a39bd6ece3
--- /dev/null
+++ b/block/blk-rq-qos.h
@@ -0,0 +1,106 @@
+#ifndef RQ_QOS_H
+#define RQ_QOS_H
+
+#include
+#include
+#include
+#include
+#include
+
+enum rq_qos_id {
+	RQ_QOS_WBT,
+	RQ_QOS_CGROUP,
+};
+
+struct rq_wait {
+	wait_queue_head_t wait;
+	atomic_t inflight;
+};
+
+struct rq_qos {
+	struct rq_qos_ops *ops;
+	struct request_queue *q;
+	enum rq_qos_id id;
+	struct rq_qos *next;
+};
+
+struct rq_qos_ops {
+	enum wbt_flags (*throttle)(struct rq_qos *, struct bio *,
+				   spinlock_t *);
+	void (*issue)(struct rq_qos *, struct request *);
+	void (*requeue)(struct rq_qos *, struct request *);
+	void (*done)(struct rq_qos *, struct request *);
+	void (*cleanup)(struct rq_qos *, enum wbt_flags);
+	void (*exit)(struct rq_qos *);
+};
+
+struct rq_depth {
+	unsigned int max_depth;
+
+	int scale_step;
+	bool scaled_max;
+
+	unsigned int queue_depth;
+	unsigned int default_depth;
+};
+
+static inline struct rq_qos *rq_qos_id(struct request_queue *q,
+				       enum rq_qos_id id)
+{
+	struct rq_qos *rqos;
+	for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
+		if (rqos->id == id)
+			break;
+	}
+	return rqos;
+}
+
+static inline struct rq_qos *wbt_rq_qos(struct request_queue *q)
+{
+	return rq_qos_id(q, RQ_QOS_WBT);
+}
+
+static inline struct rq_qos *blkcg_rq_qos(struct request_queue *q)
+{
+	return rq_qos_id(q, RQ_QOS_CGROUP);
+}
+
+static inline void rq_wait_init(struct rq_wait *rq_wait)
+{
+	atomic_set(&rq_wait->inflight, 0);
+	init_waitqueue_head(&rq_wait->wait);
+}
+
+static inline void rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
+{
+	rqos->next = q->rq_qos;
+	q->rq_qos = rqos;
+}
+
+static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
+{
+	struct rq_qos *cur, *prev = NULL;
+	for (cur = q->rq_qos; cur; cur = cur->next) {
+		if (cur == rqos) {
+			if (prev)
+				prev->next = rqos->next;
+			else
+				q->rq_qos = cur->next;
+			break;
+		}
+		prev = cur;
+	}
+}
+
+bool rq_wait_inc_below(struct rq_wait *rq_wait, int limit);
+void rq_depth_scale_up(struct rq_depth *rqd);
+void rq_depth_scale_down(struct rq_depth *rqd, bool hard_throttle);
+bool rq_depth_calc_max_depth(struct rq_depth *rqd);
+
+void rq_qos_cleanup(struct request_queue *, enum wbt_flags);
+void rq_qos_done(struct request_queue *, struct request *);
+void rq_qos_issue(struct request_queue *, struct request *);
+void rq_qos_requeue(struct request_queue *, struct request *);
+enum wbt_flags rq_qos_throttle(struct request_queue *, struct bio *, spinlock_t *);
+void rq_qos_exit(struct request_queue *);
+#endif
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 48ebe6be07b7..93982d81e6d2 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -877,7 +877,7 @@ EXPORT_SYMBOL_GPL(blk_queue_flush_queueable);
 void blk_set_queue_depth(struct request_queue *q, unsigned int depth)
 {
 	q->queue_depth = depth;
-	wbt_set_queue_depth(q->rq_wb, depth);
+	wbt_set_queue_depth(q, depth);
 }
 EXPORT_SYMBOL(blk_set_queue_depth);
 
@@ -902,7 +902,7 @@ void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
 		queue_flag_clear(QUEUE_FLAG_FUA, q);
 	spin_unlock_irq(q->queue_lock);
 
-	wbt_set_write_cache(q->rq_wb, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
+	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
 }
 EXPORT_SYMBOL_GPL(blk_queue_write_cache);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index c29ec0cfab1d..d3eb9c734062 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -426,16 +426,16 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
 
 static ssize_t queue_wb_lat_show(struct request_queue *q, char *page)
 {
-	if (!q->rq_wb)
+	if (!wbt_rq_qos(q))
 		return -EINVAL;
 
-	return sprintf(page, "%llu\n", div_u64(q->rq_wb->min_lat_nsec, 1000));
+	return sprintf(page, "%llu\n", div_u64(wbt_get_min_lat(q), 1000));
 }
 
 static ssize_t queue_wb_lat_store(struct request_queue *q, const char *page,
 				  size_t count)
 {
-	struct rq_wb *rwb;
+	struct rq_qos *rqos;
 	ssize_t ret;
 	s64 val;
 
@@ -445,23 +445,21 @@ static ssize_t queue_wb_lat_store(struct request_queue *q, const char *page,
 	if (val < -1)
 		return -EINVAL;
 
-	rwb = q->rq_wb;
-	if (!rwb) {
+	rqos = wbt_rq_qos(q);
+	if (!rqos) {
 		ret = wbt_init(q);
 		if (ret)
 			return ret;
 	}
 
-	rwb = q->rq_wb;
 	if (val == -1)
-		rwb->min_lat_nsec = wbt_default_latency_nsec(q);
+		val = wbt_default_latency_nsec(q);
 	else if (val >= 0)
-		rwb->min_lat_nsec = val * 1000ULL;
+		val *= 1000ULL;
 
-	if (rwb->enable_state == WBT_STATE_ON_DEFAULT)
-		rwb->enable_state = WBT_STATE_ON_MANUAL;
+	wbt_set_min_lat(q, val);
 
-	wbt_update_limits(rwb);
+	wbt_update_limits(q);
 
 	return count;
 }
@@ -964,7 +962,7 @@ void blk_unregister_queue(struct gendisk *disk)
 	kobject_del(&q->kobj);
 	blk_trace_remove_sysfs(disk_to_dev(disk));
 
-	wbt_exit(q);
+	rq_qos_exit(q);
 
 	mutex_lock(&q->sysfs_lock);
 	if (q->request_fn || (q->mq_ops && q->elevator))
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index efc7f9fa2346..4a5dc6d8d9ef 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -25,6 +25,7 @@
 #include
 
 #include "blk-wbt.h"
+#include "blk-rq-qos.h"
 
 #define CREATE_TRACE_POINTS
 #include
@@ -78,28 +79,6 @@ static inline bool rwb_enabled(struct rq_wb *rwb)
 	return rwb && rwb->wb_normal != 0;
 }
 
-/*
- * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
- * false if 'v' + 1 would be bigger than 'below'.
- */
-static bool atomic_inc_below(atomic_t *v, int below)
-{
-	int cur = atomic_read(v);
-
-	for (;;) {
-		int old;
-
-		if (cur >= below)
-			return false;
-		old = atomic_cmpxchg(v, cur, cur + 1);
-		if (old == cur)
-			break;
-		cur = old;
-	}
-
-	return true;
-}
-
 static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
 {
 	if (rwb_enabled(rwb)) {
@@ -116,7 +95,7 @@ static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
  */
 static bool wb_recent_wait(struct rq_wb *rwb)
 {
-	struct bdi_writeback *wb = &rwb->queue->backing_dev_info->wb;
+	struct bdi_writeback *wb = &rwb->rqos.q->backing_dev_info->wb;
 
 	return time_before(jiffies, wb->dirty_sleep + HZ);
 }
@@ -142,8 +121,9 @@ static void rwb_wake_all(struct rq_wb *rwb)
 	}
 }
 
-void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct)
+static void __wbt_done(struct rq_qos *rqos, enum wbt_flags wb_acct)
 {
+	struct rq_wb *rwb = RQWB(rqos);
 	struct rq_wait *rqw;
 	int inflight, limit;
 
@@ -189,10 +169,9 @@ void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct)
  * Called on completion of a request. Note that it's also called when
  * a request is merged, when the request gets freed.
  */
-void wbt_done(struct rq_wb *rwb, struct request *rq)
+static void wbt_done(struct rq_qos *rqos, struct request *rq)
 {
-	if (!rwb)
-		return;
+	struct rq_wb *rwb = RQWB(rqos);
 
 	if (!wbt_is_tracked(rq)) {
 		if (rwb->sync_cookie == rq) {
@@ -204,72 +183,11 @@ void wbt_done(struct rq_wb *rwb, struct request *rq)
 			wb_timestamp(rwb, &rwb->last_comp);
 	} else {
 		WARN_ON_ONCE(rq == rwb->sync_cookie);
-		__wbt_done(rwb, wbt_flags(rq));
+		__wbt_done(rqos, wbt_flags(rq));
 	}
 	wbt_clear_state(rq);
 }
 
-/*
- * Return true, if we can't increase the depth further by scaling
- */
-static bool calc_wb_limits(struct rq_wb *rwb)
-{
-	unsigned int depth;
-	bool ret = false;
-
-	if (!rwb->min_lat_nsec) {
-		rwb->wb_max = rwb->wb_normal = rwb->wb_background = 0;
-		return false;
-	}
-
-	/*
-	 * For QD=1 devices, this is a special case. It's important for those
-	 * to have one request ready when one completes, so force a depth of
-	 * 2 for those devices. On the backend, it'll be a depth of 1 anyway,
-	 * since the device can't have more than that in flight. If we're
-	 * scaling down, then keep a setting of 1/1/1.
-	 */
-	if (rwb->queue_depth == 1) {
-		if (rwb->scale_step > 0)
-			rwb->wb_max = rwb->wb_normal = 1;
-		else {
-			rwb->wb_max = rwb->wb_normal = 2;
-			ret = true;
-		}
-		rwb->wb_background = 1;
-	} else {
-		/*
-		 * scale_step == 0 is our default state. If we have suffered
-		 * latency spikes, step will be > 0, and we shrink the
-		 * allowed write depths. If step is < 0, we're only doing
-		 * writes, and we allow a temporarily higher depth to
-		 * increase performance.
-		 */
-		depth = min_t(unsigned int, RWB_DEF_DEPTH, rwb->queue_depth);
-		if (rwb->scale_step > 0)
-			depth = 1 + ((depth - 1) >> min(31, rwb->scale_step));
-		else if (rwb->scale_step < 0) {
-			unsigned int maxd = 3 * rwb->queue_depth / 4;
-
-			depth = 1 + ((depth - 1) << -rwb->scale_step);
-			if (depth > maxd) {
-				depth = maxd;
-				ret = true;
-			}
-		}
-
-		/*
-		 * Set our max/normal/bg queue depths based on how far
-		 * we have scaled down (->scale_step).
-		 */
-		rwb->wb_max = depth;
-		rwb->wb_normal = (rwb->wb_max + 1) / 2;
-		rwb->wb_background = (rwb->wb_max + 3) / 4;
-	}
-
-	return ret;
-}
-
 static inline bool stat_sample_valid(struct blk_rq_stat *stat)
 {
 	/*
@@ -302,7 +220,8 @@ enum {
 
 static int latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
 {
-	struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
+	struct backing_dev_info *bdi = rwb->rqos.q->backing_dev_info;
+	struct rq_depth *rqd = &rwb->rq_depth;
 	u64 thislat;
 
 	/*
@@ -346,7 +265,7 @@ static int latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
 		return LAT_EXCEEDED;
 	}
 
-	if (rwb->scale_step)
+	if (rqd->scale_step)
 		trace_wbt_stat(bdi, stat);
 
 	return LAT_OK;
@@ -354,58 +273,48 @@ static int latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
 
 static void rwb_trace_step(struct rq_wb *rwb, const char *msg)
 {
-	struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
+	struct backing_dev_info *bdi = rwb->rqos.q->backing_dev_info;
+	struct rq_depth *rqd = &rwb->rq_depth;
 
-	trace_wbt_step(bdi, msg, rwb->scale_step, rwb->cur_win_nsec,
-			rwb->wb_background, rwb->wb_normal, rwb->wb_max);
+	trace_wbt_step(bdi, msg, rqd->scale_step, rwb->cur_win_nsec,
+			rwb->wb_background, rwb->wb_normal, rqd->max_depth);
 }
 
-static void scale_up(struct rq_wb *rwb)
+static void calc_wb_limits(struct rq_wb *rwb)
 {
-	/*
-	 * Hit max in previous round, stop here
-	 */
-	if (rwb->scaled_max)
-		return;
+	if (rwb->min_lat_nsec == 0) {
+		rwb->wb_normal = rwb->wb_background = 0;
+	} else if (rwb->rq_depth.max_depth <= 2) {
+		rwb->wb_normal = rwb->rq_depth.max_depth;
+		rwb->wb_background = 1;
+	} else {
+		rwb->wb_normal = (rwb->rq_depth.max_depth + 1) / 2;
+		rwb->wb_background = (rwb->rq_depth.max_depth + 3) / 4;
+	}
+}
 
-	rwb->scale_step--;
+static void scale_up(struct rq_wb *rwb)
+{
+	rq_depth_scale_up(&rwb->rq_depth);
+	calc_wb_limits(rwb);
 	rwb->unknown_cnt = 0;
-
-	rwb->scaled_max = calc_wb_limits(rwb);
-
-	rwb_wake_all(rwb);
-
-	rwb_trace_step(rwb, "step up");
+	rwb_trace_step(rwb, "scale up");
 }
 
-/*
- * Scale rwb down. If 'hard_throttle' is set, do it quicker, since we
- * had a latency violation.
- */
 static void scale_down(struct rq_wb *rwb, bool hard_throttle)
 {
-	/*
-	 * Stop scaling down when we've hit the limit. This also prevents
-	 * ->scale_step from going to crazy values, if the device can't
-	 * keep up.
-	 */
-	if (rwb->wb_max == 1)
-		return;
-
-	if (rwb->scale_step < 0 && hard_throttle)
-		rwb->scale_step = 0;
-	else
-		rwb->scale_step++;
-
-	rwb->scaled_max = false;
-	rwb->unknown_cnt = 0;
+	rq_depth_scale_down(&rwb->rq_depth, hard_throttle);
 	calc_wb_limits(rwb);
-	rwb_trace_step(rwb, "step down");
+	rwb->unknown_cnt = 0;
+	rwb_wake_all(rwb);
+	rwb_trace_step(rwb, "scale down");
 }
 
 static void rwb_arm_timer(struct rq_wb *rwb)
 {
-	if (rwb->scale_step > 0) {
+	struct rq_depth *rqd = &rwb->rq_depth;
+
+	if (rqd->scale_step > 0) {
 		/*
 		 * We should speed this up, using some variant of a fast
 		 * integer inverse square root calculation. Since we only do
@@ -413,7 +322,7 @@ static void rwb_arm_timer(struct rq_wb *rwb)
 		 * though.
 		 */
 		rwb->cur_win_nsec = div_u64(rwb->win_nsec << 4,
-					int_sqrt((rwb->scale_step + 1) << 8));
+					int_sqrt((rqd->scale_step + 1) << 8));
 	} else {
 		/*
 		 * For step < 0, we don't want to increase/decrease the
@@ -428,12 +337,13 @@ static void rwb_arm_timer(struct rq_wb *rwb)
 static void wb_timer_fn(struct blk_stat_callback *cb)
 {
 	struct rq_wb *rwb = cb->data;
+	struct rq_depth *rqd = &rwb->rq_depth;
 	unsigned int inflight = wbt_inflight(rwb);
 	int status;
 
 	status = latency_exceeded(rwb, cb->stat);
 
-	trace_wbt_timer(rwb->queue->backing_dev_info, status, rwb->scale_step,
+	trace_wbt_timer(rwb->rqos.q->backing_dev_info, status, rqd->scale_step,
 			inflight);
 
 	/*
@@ -464,9 +374,9 @@ static void wb_timer_fn(struct blk_stat_callback *cb)
 		 * currently don't have a valid read/write sample. For that
 		 * case, slowly return to center state (step == 0).
 		 */
-		if (rwb->scale_step > 0)
+		if (rqd->scale_step > 0)
 			scale_up(rwb);
-		else if (rwb->scale_step < 0)
+		else if (rqd->scale_step < 0)
 			scale_down(rwb, false);
 		break;
 	default:
@@ -476,19 +386,50 @@ static void wb_timer_fn(struct blk_stat_callback *cb)
 	/*
 	 * Re-arm timer, if we have IO in flight
 	 */
-	if (rwb->scale_step || inflight)
+	if (rqd->scale_step || inflight)
 		rwb_arm_timer(rwb);
 }
 
-void wbt_update_limits(struct rq_wb *rwb)
+static void __wbt_update_limits(struct rq_wb *rwb)
 {
-	rwb->scale_step = 0;
-	rwb->scaled_max = false;
+	struct rq_depth *rqd = &rwb->rq_depth;
+
+	rqd->scale_step = 0;
+	rqd->scaled_max = false;
+
+	rq_depth_calc_max_depth(rqd);
 	calc_wb_limits(rwb);
 
 	rwb_wake_all(rwb);
 }
 
+void wbt_update_limits(struct request_queue *q)
+{
+	struct rq_qos *rqos = wbt_rq_qos(q);
+	if (!rqos)
+		return;
+	__wbt_update_limits(RQWB(rqos));
+}
+
+u64 wbt_get_min_lat(struct request_queue *q)
+{
+	struct rq_qos *rqos = wbt_rq_qos(q);
+	if (!rqos)
+		return 0;
+	return RQWB(rqos)->min_lat_nsec;
+}
+
+void wbt_set_min_lat(struct request_queue *q, u64 val)
+{
+	struct rq_qos *rqos = wbt_rq_qos(q);
+	if (!rqos)
+		return;
+	RQWB(rqos)->min_lat_nsec = val;
+	RQWB(rqos)->enable_state = WBT_STATE_ON_MANUAL;
+	__wbt_update_limits(RQWB(rqos));
+}
+
+
 static bool close_io(struct rq_wb *rwb)
 {
 	const unsigned long now = jiffies;
@@ -512,7 +453,7 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw)
 	 * IO for a bit.
 	 */
 	if ((rw & REQ_HIPRIO) || wb_recent_wait(rwb) || current_is_kswapd())
-		limit = rwb->wb_max;
+		limit = rwb->rq_depth.max_depth;
 	else if ((rw & REQ_BACKGROUND) || close_io(rwb)) {
 		/*
 		 * If less than 100ms since we completed unrelated IO,
@@ -546,7 +487,7 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
 	    rqw->wait.head.next != &wait->entry)
 		return false;
 
-	return atomic_inc_below(&rqw->inflight, get_limit(rwb, rw));
+	return rq_wait_inc_below(rqw, get_limit(rwb, rw));
 }
 
 /*
@@ -607,8 +548,10 @@ static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)
  * in an irq held spinlock, if it holds one when calling this function.
  * If we do sleep, we'll release and re-grab it.
  */
-enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
+static enum wbt_flags wbt_wait(struct rq_qos *rqos, struct bio *bio,
+			       spinlock_t *lock)
 {
+	struct rq_wb *rwb = RQWB(rqos);
 	enum wbt_flags ret = 0;
 
 	if (!rwb_enabled(rwb))
@@ -634,8 +577,10 @@ enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
 	return ret | WBT_TRACKED;
 }
 
-void wbt_issue(struct rq_wb *rwb, struct request *rq)
+void wbt_issue(struct rq_qos *rqos, struct request *rq)
 {
+	struct rq_wb *rwb = RQWB(rqos);
+
 	if (!rwb_enabled(rwb))
 		return;
 
@@ -652,8 +597,9 @@ void wbt_issue(struct rq_wb *rwb, struct request *rq)
 	}
 }
 
-void wbt_requeue(struct rq_wb *rwb, struct request *rq)
+void wbt_requeue(struct rq_qos *rqos, struct request *rq)
 {
+	struct rq_wb *rwb = RQWB(rqos);
 	if (!rwb_enabled(rwb))
 		return;
 	if (rq == rwb->sync_cookie) {
@@ -662,39 +608,30 @@ void wbt_requeue(struct rq_wb *rwb, struct request *rq)
 	}
 }
 
-void wbt_set_queue_depth(struct rq_wb *rwb, unsigned int depth)
+void wbt_set_queue_depth(struct request_queue *q, unsigned int depth)
 {
-	if (rwb) {
-		rwb->queue_depth = depth;
-		wbt_update_limits(rwb);
+	struct rq_qos *rqos = wbt_rq_qos(q);
+	if (rqos) {
+		RQWB(rqos)->rq_depth.queue_depth = depth;
+		__wbt_update_limits(RQWB(rqos));
 	}
 }
 
-void wbt_set_write_cache(struct rq_wb *rwb, bool write_cache_on)
-{
-	if (rwb)
-		rwb->wc = write_cache_on;
-}
-
-/*
- * Disable wbt, if enabled by default.
- */
-void wbt_disable_default(struct request_queue *q)
+void wbt_set_write_cache(struct request_queue *q, bool write_cache_on)
 {
-	struct rq_wb *rwb = q->rq_wb;
-
-	if (rwb && rwb->enable_state == WBT_STATE_ON_DEFAULT)
-		wbt_exit(q);
+	struct rq_qos *rqos = wbt_rq_qos(q);
+	if (rqos)
+		RQWB(rqos)->wc = write_cache_on;
 }
-EXPORT_SYMBOL_GPL(wbt_disable_default);
 
 /*
  * Enable wbt if defaults are configured that way
  */
 void wbt_enable_default(struct request_queue *q)
 {
+	struct rq_qos *rqos = wbt_rq_qos(q);
 	/* Throttling already enabled? */
-	if (q->rq_wb)
+	if (rqos)
 		return;
 
 	/* Queue not registered? Maybe shutting down... */
@@ -732,6 +669,41 @@ static int wbt_data_dir(const struct request *rq)
 	return -1;
 }
 
+static void wbt_exit(struct rq_qos *rqos)
+{
+	struct rq_wb *rwb = RQWB(rqos);
+	struct request_queue *q = rqos->q;
+
+	blk_stat_remove_callback(q, rwb->cb);
+	blk_stat_free_callback(rwb->cb);
+	kfree(rwb);
+}
+
+/*
+ * Disable wbt, if enabled by default.
+ */
+void wbt_disable_default(struct request_queue *q)
+{
+	struct rq_qos *rqos = wbt_rq_qos(q);
+	struct rq_wb *rwb;
+	if (!rqos)
+		return;
+	rwb = RQWB(rqos);
+	if (rwb->enable_state == WBT_STATE_ON_DEFAULT)
+		rwb->wb_normal = 0;
+}
+EXPORT_SYMBOL_GPL(wbt_disable_default);
+
+
+static struct rq_qos_ops wbt_rqos_ops = {
+	.throttle = wbt_wait,
+	.issue = wbt_issue,
+	.requeue = wbt_requeue,
+	.done = wbt_done,
+	.cleanup = __wbt_done,
+	.exit = wbt_exit,
+};
+
 int wbt_init(struct request_queue *q)
 {
 	struct rq_wb *rwb;
@@ -749,39 +721,29 @@ int wbt_init(struct request_queue *q)
 		return -ENOMEM;
 	}
 
-	for (i = 0; i < WBT_NUM_RWQ; i++) {
-		atomic_set(&rwb->rq_wait[i].inflight, 0);
-		init_waitqueue_head(&rwb->rq_wait[i].wait);
-	}
+	for (i = 0; i < WBT_NUM_RWQ; i++)
+		rq_wait_init(&rwb->rq_wait[i]);
 
+	rwb->rqos.id = RQ_QOS_WBT;
+	rwb->rqos.ops = &wbt_rqos_ops;
+	rwb->rqos.q = q;
 	rwb->last_comp = rwb->last_issue = jiffies;
-	rwb->queue = q;
 	rwb->win_nsec = RWB_WINDOW_NSEC;
 	rwb->enable_state = WBT_STATE_ON_DEFAULT;
-	wbt_update_limits(rwb);
+	rwb->wc = 1;
+	rwb->rq_depth.default_depth = RWB_DEF_DEPTH;
+	__wbt_update_limits(rwb);
 
 	/*
 	 * Assign rwb and add the stats callback.
 	 */
-	q->rq_wb = rwb;
+	rq_qos_add(q, &rwb->rqos);
 	blk_stat_add_callback(q, rwb->cb);

 	rwb->min_lat_nsec = wbt_default_latency_nsec(q);

-	wbt_set_queue_depth(rwb, blk_queue_depth(q));
-	wbt_set_write_cache(rwb, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
+	wbt_set_queue_depth(q, blk_queue_depth(q));
+	wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));

 	return 0;
 }
-
-void wbt_exit(struct request_queue *q)
-{
-	struct rq_wb *rwb = q->rq_wb;
-
-	if (rwb) {
-		blk_stat_remove_callback(q, rwb->cb);
-		blk_stat_free_callback(rwb->cb);
-		q->rq_wb = NULL;
-		kfree(rwb);
-	}
-}
diff --git a/block/blk-wbt.h b/block/blk-wbt.h
index 5cf2398caf49..eeb031545acb 100644
--- a/block/blk-wbt.h
+++ b/block/blk-wbt.h
@@ -9,6 +9,7 @@
 #include

 #include "blk-stat.h"
+#include "blk-rq-qos.h"

 enum wbt_flags {
 	WBT_TRACKED = 1, /* write, tracked for throttling */
@@ -33,20 +34,12 @@ enum {
 	WBT_STATE_ON_MANUAL = 2,
 };

-struct rq_wait {
-	wait_queue_head_t wait;
-	atomic_t inflight;
-};
-
 struct rq_wb {
 	/*
 	 * Settings that govern how we throttle
 	 */
 	unsigned int wb_background; /* background writeback */
 	unsigned int wb_normal; /* normal writeback */
-	unsigned int wb_max; /* max throughput writeback */
-	int scale_step;
-	bool scaled_max;

 	short enable_state; /* WBT_STATE_* */

@@ -65,15 +58,20 @@ struct rq_wb {
 	void *sync_cookie;

 	unsigned int wc;
-	unsigned int queue_depth;

 	unsigned long last_issue; /* last non-throttled issue */
 	unsigned long last_comp; /* last non-throttled comp */
 	unsigned long min_lat_nsec;
-	struct request_queue *queue;
+	struct rq_qos rqos;
 	struct rq_wait rq_wait[WBT_NUM_RWQ];
+	struct rq_depth rq_depth;
 };

+static inline struct rq_wb *RQWB(struct rq_qos *rqos)
+{
+	return container_of(rqos, struct rq_wb, rqos);
+}
+
 static inline unsigned int wbt_inflight(struct rq_wb *rwb)
 {
 	unsigned int i, ret = 0;
@@ -84,6 +82,7 @@ static inline unsigned int wbt_inflight(struct rq_wb *rwb)
 	return ret;
 }

+
 #ifdef CONFIG_BLK_WBT

 static inline void wbt_track(struct request *rq, enum wbt_flags flags)
@@ -91,19 +90,16 @@ static inline void wbt_track(struct request *rq, enum wbt_flags flags)
 	rq->issue_stat.stat |= ((u64)flags) << BLK_STAT_RES_SHIFT;
 }

-void __wbt_done(struct rq_wb *, enum wbt_flags);
-void wbt_done(struct rq_wb *, struct request *);
-enum wbt_flags wbt_wait(struct rq_wb *, struct bio *, spinlock_t *);
 int wbt_init(struct request_queue *);
-void wbt_exit(struct request_queue *);
-void wbt_update_limits(struct rq_wb *);
-void wbt_requeue(struct rq_wb *, struct request *);
-void wbt_issue(struct rq_wb *, struct request *);
+void wbt_update_limits(struct request_queue *);
 void wbt_disable_default(struct request_queue *);
 void wbt_enable_default(struct request_queue *);

-void wbt_set_queue_depth(struct rq_wb *, unsigned int);
-void wbt_set_write_cache(struct rq_wb *, bool);
+u64 wbt_get_min_lat(struct request_queue *q);
+void wbt_set_min_lat(struct request_queue *q, u64 val);
+
+void wbt_set_queue_depth(struct request_queue *, unsigned int);
+void wbt_set_write_cache(struct request_queue *, bool);

 u64 wbt_default_latency_nsec(struct request_queue *);

@@ -112,43 +108,30 @@ u64 wbt_default_latency_nsec(struct request_queue *);
 static inline void wbt_track(struct request *rq, enum wbt_flags flags)
 {
 }
-static inline void __wbt_done(struct rq_wb *rwb, enum wbt_flags flags)
-{
-}
-static inline void wbt_done(struct rq_wb *rwb, struct request *rq)
-{
-}
-static inline enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio,
-				      spinlock_t *lock)
-{
-	return 0;
-}
 static inline int wbt_init(struct request_queue *q)
 {
 	return -EINVAL;
 }
-static inline void wbt_exit(struct request_queue *q)
-{
-}
-static inline void wbt_update_limits(struct rq_wb *rwb)
+static inline void wbt_update_limits(struct request_queue *q)
 {
 }
-static inline void wbt_requeue(struct rq_wb *rwb, struct request *rq)
+static inline void wbt_disable_default(struct request_queue *q)
 {
 }
-static inline void wbt_issue(struct rq_wb *rwb, struct request *rq)
+static inline void wbt_enable_default(struct request_queue *q)
 {
 }
-static inline void wbt_disable_default(struct request_queue *q)
+static inline void wbt_set_queue_depth(struct request_queue *q, unsigned int depth)
 {
 }
-static inline void wbt_enable_default(struct request_queue *q)
+static inline void wbt_set_write_cache(struct request_queue *q, bool wc)
 {
 }
-static inline void wbt_set_queue_depth(struct rq_wb *rwb, unsigned int depth)
+static inline u64 wbt_get_min_lat(struct request_queue *q)
 {
+	return 0;
 }
-static inline void wbt_set_write_cache(struct rq_wb *rwb, bool wc)
+static inline void wbt_set_min_lat(struct request_queue *q, u64 val)
 {
 }
 static inline u64 wbt_default_latency_nsec(struct request_queue *q)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50c81c2b9dea..ad7625b18fcc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -40,7 +40,7 @@ struct bsg_job;
 struct blkcg_gq;
 struct blk_flush_queue;
 struct pr_ops;
-struct rq_wb;
+struct rq_qos;
 struct blk_queue_stats;
 struct blk_stat_callback;

@@ -415,7 +415,7 @@ struct request_queue {
 	atomic_t shared_hctx_restart;

 	struct blk_queue_stats *stats;
-	struct rq_wb *rq_wb;
+	struct rq_qos *rq_qos;

 	/*
 	 * If blkcg is not used, @q->root_rl serves all requests. If blkcg