From patchwork Wed Nov 27 20:18:06 2019
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1201786
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 01/15] blk-mq: quiesce queue during switching io sched and updating nr_requests
Date: Wed, 27 Nov 2019 17:18:06 -0300
Message-Id: <20191127201820.32174-2-marcelo.cerri@canonical.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Ming Lei

BugLink: https://bugs.launchpad.net/bugs/1848739

Dispatch may still be in progress after the queue is frozen, so we have to quiesce the queue
before switching the IO scheduler and updating nr_requests.

Also, when switching IO schedulers, blk_mq_run_hw_queue() may still be
called from elsewhere (such as from nvme_reset_work()) while the IO
scheduler's per-hctx data may not be set up yet, causing an oops even
inside blk_mq_hctx_has_pending(). For example, it can run just between:

        ret = e->ops.mq.init_sched(q, e);
AND
        ret = e->ops.mq.init_hctx(hctx, i)

inside blk_mq_init_sched().

This basically reverts commit 7a148c2fcff8330 ("block: don't call
blk_mq_quiesce_queue() after queue is frozen") and makes sure
blk_mq_hctx_has_pending() won't be called if the queue is quiesced.

Reviewed-by: Christoph Hellwig
Fixes: 7a148c2fcff83309 ("block: don't call blk_mq_quiesce_queue() after queue is frozen")
Reported-by: Yi Zhang
Tested-by: Yi Zhang
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
(cherry picked from commit 24f5a90f0d13a97b51aa79f468143fafea4246bb)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq.c   | 27 ++++++++++++++++++++++++++-
 block/elevator.c |  2 ++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8a4268a0f6e2..a5322d5052fd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1360,7 +1360,30 @@ EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
 
 bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-	if (blk_mq_hctx_has_pending(hctx)) {
+	int srcu_idx;
+	bool need_run;
+
+	/*
+	 * When queue is quiesced, we may be switching io scheduler, or
+	 * updating nr_hw_queues, or other things, and we can't run queue
+	 * any more, even __blk_mq_hctx_has_pending() can't be called safely.
+	 *
+	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
+	 * quiesced.
+	 */
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+		rcu_read_lock();
+		need_run = !blk_queue_quiesced(hctx->queue) &&
+			blk_mq_hctx_has_pending(hctx);
+		rcu_read_unlock();
+	} else {
+		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
+		need_run = !blk_queue_quiesced(hctx->queue) &&
+			blk_mq_hctx_has_pending(hctx);
+		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
+	}
+
+	if (need_run) {
 		__blk_mq_delay_run_hw_queue(hctx, async, 0);
 		return true;
 	}
@@ -2802,6 +2825,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		return 0;
 
 	blk_mq_freeze_queue(q);
+	blk_mq_quiesce_queue(q);
 
 	ret = 0;
 	queue_for_each_hw_ctx(q, hctx, i) {
@@ -2825,6 +2849,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	if (!ret)
 		q->nr_requests = nr;
 
+	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
 
 	return ret;
diff --git a/block/elevator.c b/block/elevator.c
index 4dead1ae1270..40804c0bf4b4 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -968,6 +968,7 @@ static int elevator_switch_mq(struct request_queue *q,
 	int ret;
 
 	blk_mq_freeze_queue(q);
+	blk_mq_quiesce_queue(q);
 
 	if (q->elevator) {
 		if (q->elevator->registered)
@@ -994,6 +995,7 @@ static int elevator_switch_mq(struct request_queue *q,
 	blk_add_trace_msg(q, "elv switch: none");
 
 out:
+	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
 	return ret;
 }

From patchwork Wed Nov 27 20:18:07 2019
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1201787
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 02/15] blk-mq: move hctx lock/unlock into a helper
Date: Wed, 27 Nov 2019 17:18:07 -0300
Message-Id: <20191127201820.32174-3-marcelo.cerri@canonical.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Jens Axboe

BugLink: https://bugs.launchpad.net/bugs/1848739

Move the RCU vs SRCU logic into lock/unlock helpers, which makes the
actual functional bits within the locked region much easier to read.

tj: Reordered in front of timeout revamp patches and added the missing
blk_mq_run_hw_queue() conversion.
Signed-off-by: Jens Axboe
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe
(cherry picked from commit 04ced159cec863f9bc27015d6b970bb13cfa6176)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq.c | 66 ++++++++++++++++++++++++--------------------------
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a5322d5052fd..0d46183a0bd5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -577,6 +577,22 @@ static void __blk_mq_complete_request(struct request *rq)
 	put_cpu();
 }
 
+static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
+{
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+		rcu_read_unlock();
+	else
+		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
+}
+
+static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
+{
+	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+		rcu_read_lock();
+	else
+		*srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
+}
+
 /**
  * blk_mq_complete_request - end I/O on a request
  * @rq: the request being processed
@@ -1263,17 +1279,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	 */
 	WARN_ON_ONCE(in_interrupt());
 
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
-		rcu_read_lock();
-		blk_mq_sched_dispatch_requests(hctx);
-		rcu_read_unlock();
-	} else {
-		might_sleep();
+	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
 
-		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		blk_mq_sched_dispatch_requests(hctx);
-		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
-	}
+	hctx_lock(hctx, &srcu_idx);
+	blk_mq_sched_dispatch_requests(hctx);
+	hctx_unlock(hctx, srcu_idx);
 }
 
 static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
@@ -1371,17 +1381,10 @@ bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
 	 * quiesced.
 	 */
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
-		rcu_read_lock();
-		need_run = !blk_queue_quiesced(hctx->queue) &&
-			blk_mq_hctx_has_pending(hctx);
-		rcu_read_unlock();
-	} else {
-		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		need_run = !blk_queue_quiesced(hctx->queue) &&
-			blk_mq_hctx_has_pending(hctx);
-		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
-	}
+	hctx_lock(hctx, &srcu_idx);
+	need_run = !blk_queue_quiesced(hctx->queue) &&
+		blk_mq_hctx_has_pending(hctx);
+	hctx_unlock(hctx, srcu_idx);
 
 	if (need_run) {
 		__blk_mq_delay_run_hw_queue(hctx, async, 0);
@@ -1691,7 +1694,7 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 
 static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 					struct request *rq,
-					blk_qc_t *cookie, bool may_sleep)
+					blk_qc_t *cookie)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
@@ -1741,25 +1744,20 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	}
 
 insert:
-	blk_mq_sched_insert_request(rq, false, run_queue, false, may_sleep);
+	blk_mq_sched_insert_request(rq, false, run_queue, false,
+					hctx->flags & BLK_MQ_F_BLOCKING);
 }
 
 static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, blk_qc_t *cookie)
 {
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
-		rcu_read_lock();
-		__blk_mq_try_issue_directly(hctx, rq, cookie, false);
-		rcu_read_unlock();
-	} else {
-		unsigned int srcu_idx;
+	int srcu_idx;
 
-		might_sleep();
+	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
 
-		srcu_idx = srcu_read_lock(hctx->queue_rq_srcu);
-		__blk_mq_try_issue_directly(hctx, rq, cookie, true);
-		srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx);
-	}
+	hctx_lock(hctx, &srcu_idx);
+	__blk_mq_try_issue_directly(hctx, rq, cookie);
+	hctx_unlock(hctx, srcu_idx);
 }
 
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)

From patchwork Wed Nov 27 20:18:08 2019
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1201788
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 03/15] blk-mq: factor out a few helpers from __blk_mq_try_issue_directly
Date: Wed, 27 Nov 2019 17:18:08 -0300
Message-Id: <20191127201820.32174-4-marcelo.cerri@canonical.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Mike Snitzer

BugLink: https://bugs.launchpad.net/bugs/1848739

No functional change. Just makes code flow more logically.

In following commit, __blk_mq_try_issue_directly() will be used to
return the dispatch result (blk_status_t) to DM.
DM needs this information to improve IO merging.

Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
(cherry picked from commit 0f95549c0ea1e8075ae049202088b2c6a0cb40ad)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq.c | 79 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 27 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0d46183a0bd5..5f12f803af10 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1692,9 +1692,9 @@ static blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx, struct request *rq)
 	return blk_tag_to_qc_t(rq->internal_tag, hctx->queue_num, true);
 }
 
-static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
-					struct request *rq,
-					blk_qc_t *cookie)
+static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
+					struct request *rq,
+					blk_qc_t *cookie)
 {
 	struct request_queue *q = rq->q;
 	struct blk_mq_queue_data bd = {
@@ -1703,6 +1703,43 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	};
 	blk_qc_t new_cookie;
 	blk_status_t ret;
+
+	new_cookie = request_to_qc_t(hctx, rq);
+
+	/*
+	 * For OK queue, we are done. For error, caller may kill it.
+	 * Any other error (busy), just add it to our list as we
+	 * previously would have done.
+	 */
+	ret = q->mq_ops->queue_rq(hctx, &bd);
+	switch (ret) {
+	case BLK_STS_OK:
+		*cookie = new_cookie;
+		break;
+	case BLK_STS_RESOURCE:
+		__blk_mq_requeue_request(rq);
+		break;
+	default:
+		*cookie = BLK_QC_T_NONE;
+		break;
+	}
+
+	return ret;
+}
+
+static void __blk_mq_fallback_to_insert(struct blk_mq_hw_ctx *hctx,
+					struct request *rq,
+					bool run_queue)
+{
+	blk_mq_sched_insert_request(rq, false, run_queue, false,
+					hctx->flags & BLK_MQ_F_BLOCKING);
+}
+
+static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+						struct request *rq,
+						blk_qc_t *cookie)
+{
+	struct request_queue *q = rq->q;
 	bool run_queue = true;
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -1722,41 +1759,29 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
 	}
 
-	new_cookie = request_to_qc_t(hctx, rq);
-
-	/*
-	 * For OK queue, we are done. For error, kill it. Any other
-	 * error (busy), just add it to our list as we previously
-	 * would have done
-	 */
-	ret = q->mq_ops->queue_rq(hctx, &bd);
-	switch (ret) {
-	case BLK_STS_OK:
-		*cookie = new_cookie;
-		return;
-	case BLK_STS_RESOURCE:
-		__blk_mq_requeue_request(rq);
-		goto insert;
-	default:
-		*cookie = BLK_QC_T_NONE;
-		blk_mq_end_request(rq, ret);
-		return;
-	}
-
+	return __blk_mq_issue_directly(hctx, rq, cookie);
 insert:
-	blk_mq_sched_insert_request(rq, false, run_queue, false,
-					hctx->flags & BLK_MQ_F_BLOCKING);
+	__blk_mq_fallback_to_insert(hctx, rq, run_queue);
+
+	return BLK_STS_OK;
 }
 
 static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, blk_qc_t *cookie)
 {
+	blk_status_t ret;
 	int srcu_idx;
 
 	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
 
 	hctx_lock(hctx, &srcu_idx);
-	__blk_mq_try_issue_directly(hctx, rq, cookie);
+
+	ret = __blk_mq_try_issue_directly(hctx, rq, cookie);
+	if (ret == BLK_STS_RESOURCE)
+		__blk_mq_fallback_to_insert(hctx, rq, true);
+	else if (ret != BLK_STS_OK)
+		blk_mq_end_request(rq, ret);
+
+	hctx_unlock(hctx, srcu_idx);
 }

From patchwork Wed Nov 27 20:18:09 2019
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1201789
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 04/15] blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback
Date: Wed, 27 Nov 2019 17:18:09 -0300
Message-Id: <20191127201820.32174-5-marcelo.cerri@canonical.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Ming Lei

BugLink: https://bugs.launchpad.net/bugs/1848739

blk_insert_cloned_request() is called in the fast path of a dm-rq driver
(e.g. blk-mq request-based DM mpath). blk_insert_cloned_request() uses
blk_mq_request_bypass_insert() to directly append the request to the
blk-mq hctx->dispatch_list of the underlying queue.

1) This way isn't efficient enough because the hctx spinlock is always
used.

2) With blk_insert_cloned_request(), we completely bypass the underlying
queue's elevator and depend on the upper-level dm-rq driver's elevator
to schedule IO. But dm-rq currently can't get the underlying queue's
dispatch feedback at all. Without knowing whether a request was issued
or not (e.g. due to the underlying queue being busy), the dm-rq elevator
will not be able to provide effective IO merging (as a side-effect,
dm-rq currently blindly destages a request from its elevator only to
requeue it after a delay, which kills any opportunity for merging). This
obviously causes very bad sequential IO performance.

Fix this by updating blk_insert_cloned_request() to use
blk_mq_request_direct_issue(). blk_mq_request_direct_issue() allows a
request to be issued directly to the underlying queue and returns the
dispatch feedback (blk_status_t). If blk_mq_request_direct_issue()
returns BLK_STS_RESOURCE, the dm-rq driver will now use DM_MAPIO_REQUEUE
to _not_ destage the request, thereby preserving the opportunity to
merge IO.

With this, request-based DM's blk-mq sequential IO performance is vastly
improved (as much as 3X in mpath/virtio-scsi testing).
Signed-off-by: Ming Lei [blk-mq.c changes heavily influenced by Ming Lei's initial solution, but they were refactored to make them less fragile and easier to read/review] Signed-off-by: Mike Snitzer Signed-off-by: Jens Axboe (cherry picked from commit 396eaf21ee17c476e8f66249fb1f4a39003d0ab4) Signed-off-by: Marcelo Henrique Cerri --- block/blk-core.c | 3 +-- block/blk-mq.c | 37 +++++++++++++++++++++++++++++-------- block/blk-mq.h | 3 +++ drivers/md/dm-rq.c | 19 ++++++++++++++++--- 4 files changed, 49 insertions(+), 13 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 6cce18587bdd..1ae88e0a7ed4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2507,8 +2507,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * * bypass a potential scheduler on the bottom device for * insert. */ - blk_mq_request_bypass_insert(rq, true); - return BLK_STS_OK; + return blk_mq_request_direct_issue(rq); } spin_lock_irqsave(q->queue_lock, flags); diff --git a/block/blk-mq.c b/block/blk-mq.c index 5f12f803af10..90050e6ac9bd 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1729,15 +1729,19 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, static void __blk_mq_fallback_to_insert(struct blk_mq_hw_ctx *hctx, struct request *rq, - bool run_queue) + bool run_queue, bool bypass_insert) { - blk_mq_sched_insert_request(rq, false, run_queue, false, - hctx->flags & BLK_MQ_F_BLOCKING); + if (!bypass_insert) + blk_mq_sched_insert_request(rq, false, run_queue, false, + hctx->flags & BLK_MQ_F_BLOCKING); + else + blk_mq_request_bypass_insert(rq, run_queue); } static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, struct request *rq, - blk_qc_t *cookie) + blk_qc_t *cookie, + bool bypass_insert) { struct request_queue *q = rq->q; bool run_queue = true; @@ -1748,7 +1752,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, goto insert; } - if (q->elevator) + if (q->elevator && 
!bypass_insert) goto insert; if (!blk_mq_get_dispatch_budget(hctx)) @@ -1761,7 +1765,9 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, return __blk_mq_issue_directly(hctx, rq, cookie); insert: - __blk_mq_fallback_to_insert(hctx, rq, run_queue); + __blk_mq_fallback_to_insert(hctx, rq, run_queue, bypass_insert); + if (bypass_insert) + return BLK_STS_RESOURCE; return BLK_STS_OK; } @@ -1776,15 +1782,30 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, hctx_lock(hctx, &srcu_idx); - ret = __blk_mq_try_issue_directly(hctx, rq, cookie); + ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false); if (ret == BLK_STS_RESOURCE) - __blk_mq_fallback_to_insert(hctx, rq, true); + __blk_mq_fallback_to_insert(hctx, rq, true, false); else if (ret != BLK_STS_OK) blk_mq_end_request(rq, ret); hctx_unlock(hctx, srcu_idx); } +blk_status_t blk_mq_request_direct_issue(struct request *rq) +{ + blk_status_t ret; + int srcu_idx; + blk_qc_t unused_cookie; + struct blk_mq_ctx *ctx = rq->mq_ctx; + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); + + hctx_lock(hctx, &srcu_idx); + ret = __blk_mq_try_issue_directly(hctx, rq, &unused_cookie, true); + hctx_unlock(hctx, srcu_idx); + + return ret; +} + static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio) { const int is_sync = op_is_sync(bio->bi_opf); diff --git a/block/blk-mq.h b/block/blk-mq.h index 7c528c07fe07..0daa9f2c3d61 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -60,6 +60,9 @@ void blk_mq_request_bypass_insert(struct request *rq, bool run_queue); void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct list_head *list); +/* Used by blk_insert_cloned_request() to issue request directly */ +blk_status_t blk_mq_request_direct_issue(struct request *rq); + /* * CPU -> queue mappings */ diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index 6f3c88b26d55..ecbf1f422f94 100644 --- a/drivers/md/dm-rq.c +++ 
b/drivers/md/dm-rq.c @@ -395,7 +395,7 @@ static void end_clone_request(struct request *clone, blk_status_t error) dm_complete_request(tio->orig, error); } -static void dm_dispatch_clone_request(struct request *clone, struct request *rq) +static blk_status_t dm_dispatch_clone_request(struct request *clone, struct request *rq) { blk_status_t r; @@ -404,9 +404,10 @@ static void dm_dispatch_clone_request(struct request *clone, struct request *rq) clone->start_time = jiffies; r = blk_insert_cloned_request(clone->q, clone); - if (r) + if (r != BLK_STS_OK && r != BLK_STS_RESOURCE) /* must complete clone in terms of original request */ dm_complete_request(rq, r); + return r; } static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig, @@ -476,8 +477,10 @@ static int map_request(struct dm_rq_target_io *tio) struct mapped_device *md = tio->md; struct request *rq = tio->orig; struct request *clone = NULL; + blk_status_t ret; r = ti->type->clone_and_map_rq(ti, rq, &tio->info, &clone); +check_again: switch (r) { case DM_MAPIO_SUBMITTED: /* The target has taken the I/O to submit by itself later */ @@ -492,7 +495,17 @@ static int map_request(struct dm_rq_target_io *tio) /* The target has remapped the I/O so dispatch it */ trace_block_rq_remap(clone->q, clone, disk_devt(dm_disk(md)), blk_rq_pos(rq)); - dm_dispatch_clone_request(clone, rq); + ret = dm_dispatch_clone_request(clone, rq); + if (ret == BLK_STS_RESOURCE) { + blk_rq_unprep_clone(clone); + tio->ti->type->release_clone_rq(clone); + tio->clone = NULL; + if (!rq->q->mq_ops) + r = DM_MAPIO_DELAY_REQUEUE; + else + r = DM_MAPIO_REQUEUE; + goto check_again; + } break; case DM_MAPIO_REQUEUE: /* The target wants to requeue the I/O */ From patchwork Wed Nov 27 20:18:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Henrique Cerri X-Patchwork-Id: 1201790 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: 
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 05/15] dm mpath: fix missing call of path selector type->end_io
Date: Wed, 27 Nov 2019 17:18:10 -0300
Message-Id: <20191127201820.32174-6-marcelo.cerri@canonical.com>
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Yufen Yu

BugLink: https://bugs.launchpad.net/bugs/1848739

After commit 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback"), map_request() will requeue the tio when the issued clone request returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. Thus, if the device driver status is an error, a tio may be requeued multiple times until the return value is not DM_MAPIO_REQUEUE.
That means type->start_io may be called multiple times, while type->end_io is only called when the IO completes. In fact, even without commit 396eaf21ee17, a setup_clone() failure can also cause a tio requeue and an associated missed call to type->end_io. The service-time path selector selects a path based on in_flight_size, which is increased by st_start_io() and decreased by st_end_io(). Missed calls to st_end_io() lead to an in_flight_size accounting error and cause the selector to make the wrong choice. The queue-length path selector is affected in the same way. To fix the problem, call type->end_io in ->release_clone_rq before the tio is requeued. map_info is passed to ->release_clone_rq() for the map_request() error path that results in a requeue. Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Cc: stable@vger.kernel.org Signed-off-by: Yufen Yu Signed-off-by: Mike Snitzer (cherry picked from commit 5de719e3d01b4abe0de0d7b857148a880ff2a90b) [marcelo.cerri@canonical.com: This patch was already partially applied before via upstream stable updates. This patch adds the remaining changes after the inclusion of 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback").]
Signed-off-by: Marcelo Henrique Cerri --- drivers/md/dm-rq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index ecbf1f422f94..134cd855ba35 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -498,7 +498,7 @@ static int map_request(struct dm_rq_target_io *tio) ret = dm_dispatch_clone_request(clone, rq); if (ret == BLK_STS_RESOURCE) { blk_rq_unprep_clone(clone); - tio->ti->type->release_clone_rq(clone); + tio->ti->type->release_clone_rq(clone, &tio->info); tio->clone = NULL; if (!rq->q->mq_ops) r = DM_MAPIO_DELAY_REQUEUE;
From patchwork Wed Nov 27 20:18:11 2019
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1201791
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 06/15] blk-mq-sched: remove unused 'can_block' arg from blk_mq_sched_insert_request
Date: Wed, 27 Nov 2019 17:18:11 -0300
Message-Id: <20191127201820.32174-7-marcelo.cerri@canonical.com>
X-Mailer: git-send-email 2.20.1 In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com> References: <20191127201820.32174-1-marcelo.cerri@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Mike Snitzer BugLink: https://bugs.launchpad.net/bugs/1848739 After commit: 923218f6166a ("blk-mq: don't allocate driver tag upfront for flush rq") we no longer use the 'can_block' argument in blk_mq_sched_insert_request(). Kill it. Signed-off-by: Mike Snitzer Added actual commit message as to why it's being removed. Signed-off-by: Jens Axboe (cherry picked from commit 9e97d2951a7e6ee6e204f87f6bda4ff754a8cede) [marcelo.cerri@canonical.com: fixed conflict in blk_mq_requeue_work() because the commit aef1897cd36d ("blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue") was already applied] Signed-off-by: Marcelo Henrique Cerri --- block/blk-exec.c | 2 +- block/blk-mq-sched.c | 2 +- block/blk-mq-sched.h | 2 +- block/blk-mq.c | 16 +++++++--------- 4 files changed, 10 insertions(+), 12 deletions(-) diff --git a/block/blk-exec.c b/block/blk-exec.c index 5c0f3dc446dc..f7b292f12449 100644 --- a/block/blk-exec.c +++ b/block/blk-exec.c @@ -61,7 +61,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk, * be reused after dying flag is set */ if (q->mq_ops) { - blk_mq_sched_insert_request(rq, at_head, true, false, false); + blk_mq_sched_insert_request(rq, at_head, true, false); return; } diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index fc64558241c9..f3380331e5f3 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -429,7 +429,7 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx) } void blk_mq_sched_insert_request(struct request *rq, bool at_head, - bool 
run_queue, bool async, bool can_block) + bool run_queue, bool async) { struct request_queue *q = rq->q; struct elevator_queue *e = q->elevator; diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h index ba1d1418a96d..1e9c9018ace1 100644 --- a/block/blk-mq-sched.h +++ b/block/blk-mq-sched.h @@ -18,7 +18,7 @@ bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq); void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx); void blk_mq_sched_insert_request(struct request *rq, bool at_head, - bool run_queue, bool async, bool can_block); + bool run_queue, bool async); void blk_mq_sched_insert_requests(struct request_queue *q, struct blk_mq_ctx *ctx, struct list_head *list, bool run_queue_async); diff --git a/block/blk-mq.c b/block/blk-mq.c index 90050e6ac9bd..9abc5cbb58f1 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -733,13 +733,13 @@ static void blk_mq_requeue_work(struct work_struct *work) if (rq->rq_flags & RQF_DONTPREP) blk_mq_request_bypass_insert(rq, false); else - blk_mq_sched_insert_request(rq, true, false, false, true); + blk_mq_sched_insert_request(rq, true, false, false); } while (!list_empty(&rq_list)) { rq = list_entry(rq_list.next, struct request, queuelist); list_del_init(&rq->queuelist); - blk_mq_sched_insert_request(rq, false, false, false, true); + blk_mq_sched_insert_request(rq, false, false, false); } blk_mq_run_hw_queues(q, false); @@ -1727,13 +1727,11 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, return ret; } -static void __blk_mq_fallback_to_insert(struct blk_mq_hw_ctx *hctx, - struct request *rq, +static void __blk_mq_fallback_to_insert(struct request *rq, bool run_queue, bool bypass_insert) { if (!bypass_insert) - blk_mq_sched_insert_request(rq, false, run_queue, false, - hctx->flags & BLK_MQ_F_BLOCKING); + blk_mq_sched_insert_request(rq, false, run_queue, false); else blk_mq_request_bypass_insert(rq, run_queue); } @@ -1765,7 +1763,7 @@ static blk_status_t 
__blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, return __blk_mq_issue_directly(hctx, rq, cookie); insert: - __blk_mq_fallback_to_insert(hctx, rq, run_queue, bypass_insert); + __blk_mq_fallback_to_insert(rq, run_queue, bypass_insert); if (bypass_insert) return BLK_STS_RESOURCE; @@ -1784,7 +1782,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false); if (ret == BLK_STS_RESOURCE) - __blk_mq_fallback_to_insert(hctx, rq, true, false); + __blk_mq_fallback_to_insert(rq, true, false); else if (ret != BLK_STS_OK) blk_mq_end_request(rq, ret); @@ -1914,7 +1912,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio) } else if (q->elevator) { blk_mq_put_ctx(data.ctx); blk_mq_bio_to_request(rq, bio); - blk_mq_sched_insert_request(rq, false, true, true, true); + blk_mq_sched_insert_request(rq, false, true, true); } else { blk_mq_put_ctx(data.ctx); blk_mq_bio_to_request(rq, bio); From patchwork Wed Nov 27 20:18:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Henrique Cerri X-Patchwork-Id: 1201792 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47NXCv0jh1z9sR8; Thu, 28 Nov 2019 07:18:51 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 07/15] blk-mq: don't dispatch request in blk_mq_request_direct_issue if queue is busy
Date: Wed, 27 Nov 2019 17:18:12 -0300
Message-Id: <20191127201820.32174-8-marcelo.cerri@canonical.com>

From: Ming Lei

BugLink: https://bugs.launchpad.net/bugs/1848739

When blk_mq_request_direct_issue() finds the queue busy, we don't want to put the request on hctx->dispatch_list; instead we need to return the queue-busy status to the caller, so that the caller can deal with it.
Fixes: 396eaf21ee ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Reported-by: Laurence Oberman Reviewed-by: Mike Snitzer Signed-off-by: Ming Lei Signed-off-by: Jens Axboe (cherry picked from commit 23d4ee19e789ae3dce3e04bd24e3d1537965475f) Signed-off-by: Marcelo Henrique Cerri --- block/blk-mq.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 9abc5cbb58f1..d4945ffaf034 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1727,15 +1727,6 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, return ret; } -static void __blk_mq_fallback_to_insert(struct request *rq, - bool run_queue, bool bypass_insert) -{ - if (!bypass_insert) - blk_mq_sched_insert_request(rq, false, run_queue, false); - else - blk_mq_request_bypass_insert(rq, run_queue); -} - static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, struct request *rq, blk_qc_t *cookie, @@ -1744,9 +1735,16 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, struct request_queue *q = rq->q; bool run_queue = true; - /* RCU or SRCU read lock is needed before checking quiesced flag */ + /* + * RCU or SRCU read lock is needed before checking quiesced flag. + * + * When queue is stopped or quiesced, ignore 'bypass_insert' from + * blk_mq_request_direct_issue(), and return BLK_STS_OK to caller, + * and avoid driver to try to dispatch again. 
+ */ if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)) { run_queue = false; + bypass_insert = false; goto insert; } @@ -1763,10 +1761,10 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, return __blk_mq_issue_directly(hctx, rq, cookie); insert: - __blk_mq_fallback_to_insert(rq, run_queue, bypass_insert); if (bypass_insert) return BLK_STS_RESOURCE; + blk_mq_sched_insert_request(rq, false, run_queue, false); return BLK_STS_OK; } @@ -1782,7 +1780,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false); if (ret == BLK_STS_RESOURCE) - __blk_mq_fallback_to_insert(rq, true, false); + blk_mq_sched_insert_request(rq, false, true, false); else if (ret != BLK_STS_OK) blk_mq_end_request(rq, ret); From patchwork Wed Nov 27 20:18:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Henrique Cerri X-Patchwork-Id: 1201793 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47NXCy6Z0Cz9sRG; Thu, 28 Nov 2019 07:18:54 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1ia3lw-00038P-BH; Wed, 27 Nov 2019 20:18:48 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps 
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject:
[xenial:linux-azure][PATCH 08/15] blk-mq: introduce BLK_STS_DEV_RESOURCE
Date: Wed, 27 Nov 2019 17:18:13 -0300
Message-Id: <20191127201820.32174-9-marcelo.cerri@canonical.com>

From: Ming Lei

BugLink: https://bugs.launchpad.net/bugs/1848739

This status is returned from the driver to the block layer if a device-related resource is unavailable, but the driver can guarantee that IO dispatch will be triggered again once the resource becomes available. Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if the driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun the queue after a delay (BLK_MQ_RESOURCE_DELAY) to avoid IO stalls. BLK_MQ_RESOURCE_DELAY is 3 ms because both scsi-mq and nvmefc already use that magic value.

If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because:

1) if all in-flight IOs complete before SCHED_RESTART is examined in blk_mq_dispatch_rq_list(), SCHED_RESTART must have been cleared, so the queue is run immediately by blk_mq_dispatch_rq_list();

2) if there is any in-flight IO after/when SCHED_RESTART is examined in blk_mq_dispatch_rq_list():
   - if SCHED_RESTART isn't set, the queue is run immediately, as in 1);
   - otherwise, this request will be dispatched after the in-flight IO completes, via blk_mq_sched_restart();

3) if SCHED_RESTART is set concurrently because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() covers the above two cases and makes sure an IO hang is avoided.

One invariant is that the queue will be rerun if SCHED_RESTART is set.
Suggested-by: Jens Axboe Tested-by: Laurence Oberman Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer Signed-off-by: Jens Axboe (cherry picked from commit 86ff7c2a80cd357f6156a53b354f6a0b357dc0c9) [marcelo.cerri@canonical.com: Fixed context in include/linux/blk_types.h, the missing context is from commit 9111e5686c8c ("block: Provide blk_status_t decoding for path errors") which is not necessary] Signed-off-by: Marcelo Henrique Cerri --- block/blk-core.c | 1 + block/blk-mq.c | 20 ++++++++++++++++---- drivers/block/null_blk.c | 2 +- drivers/block/virtio_blk.c | 2 +- drivers/block/xen-blkfront.c | 2 +- drivers/md/dm-rq.c | 5 ++--- drivers/nvme/host/fc.c | 12 ++---------- drivers/scsi/scsi_lib.c | 6 +++--- include/linux/blk_types.h | 18 ++++++++++++++++++ 9 files changed, 45 insertions(+), 23 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 1ae88e0a7ed4..30a3cb2eca5a 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -143,6 +143,7 @@ static const struct { [BLK_STS_MEDIUM] = { -ENODATA, "critical medium" }, [BLK_STS_PROTECTION] = { -EILSEQ, "protection" }, [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, + [BLK_STS_DEV_RESOURCE] = { -EBUSY, "device resource" }, [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, /* device mapper special case, should not leak out: */ diff --git a/block/blk-mq.c b/block/blk-mq.c index d4945ffaf034..e035215b1546 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1118,6 +1118,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx, } } +#define BLK_MQ_RESOURCE_DELAY 3 /* ms units */ + bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, bool got_budget) { @@ -1125,6 +1127,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, struct request *rq, *nxt; bool no_tag = false; int errors, queued; + blk_status_t ret = BLK_STS_OK; if (list_empty(list)) return false; @@ -1137,7 +1140,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct 
list_head *list, errors = queued = 0; do { struct blk_mq_queue_data bd; - blk_status_t ret; rq = list_first_entry(list, struct request, queuelist); @@ -1181,7 +1183,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, } ret = q->mq_ops->queue_rq(hctx, &bd); - if (ret == BLK_STS_RESOURCE) { + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { /* * If an I/O scheduler has been configured and we got a * driver tag for the next request already, free it @@ -1212,6 +1214,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, * that is where we will continue on next queue run. */ if (!list_empty(list)) { + bool needs_restart; + spin_lock(&hctx->lock); list_splice_init(list, &hctx->dispatch); spin_unlock(&hctx->lock); @@ -1235,10 +1239,17 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, * - Some but not all block drivers stop a queue before * returning BLK_STS_RESOURCE. Two exceptions are scsi-mq * and dm-rq. + * + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART + * bit is set, run queue after a delay to avoid IO stalls + * that could otherwise occur if the queue is idle. 
*/ - if (!blk_mq_sched_needs_restart(hctx) || + needs_restart = blk_mq_sched_needs_restart(hctx); + if (!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) blk_mq_run_hw_queue(hctx, true); + else if (needs_restart && (ret == BLK_STS_RESOURCE)) + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY); } return (queued + errors) != 0; @@ -1717,6 +1728,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, *cookie = new_cookie; break; case BLK_STS_RESOURCE: + case BLK_STS_DEV_RESOURCE: __blk_mq_requeue_request(rq); break; default: @@ -1779,7 +1791,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, hctx_lock(hctx, &srcu_idx); ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false); - if (ret == BLK_STS_RESOURCE) + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) blk_mq_sched_insert_request(rq, false, true, false); else if (ret != BLK_STS_OK) blk_mq_end_request(rq, ret); diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index b7d9528c3b9d..79b896f07b01 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -1239,7 +1239,7 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd) return BLK_STS_OK; } else /* requeue request */ - return BLK_STS_RESOURCE; + return BLK_STS_DEV_RESOURCE; } } diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 8767401f75e0..78dbaab47633 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -276,7 +276,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, /* Out of mem doesn't actually happen, since we fall back * to direct descriptors */ if (err == -ENOMEM || err == -ENOSPC) - return BLK_STS_RESOURCE; + return BLK_STS_DEV_RESOURCE; return BLK_STS_IOERR; } diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 32ac5f551e55..a98f78b9c4c6 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -912,7 +912,7 @@ static 
 blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
 out_busy:
 	blk_mq_stop_hw_queue(hctx);
 	spin_unlock_irqrestore(&rinfo->ring_lock, flags);
-	return BLK_STS_RESOURCE;
+	return BLK_STS_DEV_RESOURCE;
 }
 
 static void blkif_complete_rq(struct request *rq)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 134cd855ba35..2af020c6bea2 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -404,7 +404,7 @@ static blk_status_t dm_dispatch_clone_request(struct request *clone, struct requ
 	clone->start_time = jiffies;
 	r = blk_insert_cloned_request(clone->q, clone);
-	if (r != BLK_STS_OK && r != BLK_STS_RESOURCE)
+	if (r != BLK_STS_OK && r != BLK_STS_RESOURCE && r != BLK_STS_DEV_RESOURCE)
 		/* must complete clone in terms of original request */
 		dm_complete_request(rq, r);
 	return r;
@@ -496,7 +496,7 @@ static int map_request(struct dm_rq_target_io *tio)
 		trace_block_rq_remap(clone->q, clone, disk_devt(dm_disk(md)),
 				     blk_rq_pos(rq));
 		ret = dm_dispatch_clone_request(clone, rq);
-		if (ret == BLK_STS_RESOURCE) {
+		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
 			blk_rq_unprep_clone(clone);
 			tio->ti->type->release_clone_rq(clone, &tio->info);
 			tio->clone = NULL;
@@ -771,7 +771,6 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		/* Undo dm_start_request() before requeuing */
 		rq_end_stats(md, rq);
 		rq_completed(md, rq_data_dir(rq), false);
-		blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
 		return BLK_STS_RESOURCE;
 	}
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 25356a55cae2..636765f6d394 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -35,8 +35,6 @@ enum nvme_fc_queue_flags {
 	NVME_FC_Q_LIVE,
 };
 
-#define NVMEFC_QUEUE_DELAY	3	/* ms units */
-
 #define NVME_FC_DEFAULT_DEV_LOSS_TMO	60	/* seconds */
 
 struct nvme_fc_queue {
@@ -2231,7 +2229,7 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue,
 	 * the target device is present
 	 */
 	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
-		goto busy;
+		return BLK_STS_RESOURCE;
 
 	if (!nvme_fc_ctrl_get(ctrl))
 		return BLK_STS_IOERR;
@@ -2311,16 +2309,10 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue,
 		    ret != -EBUSY)
 			return BLK_STS_IOERR;
 
-		goto busy;
+		return BLK_STS_RESOURCE;
 	}
 
 	return BLK_STS_OK;
-
-busy:
-	if (!(op->flags & FCOP_FLAGS_AEN) && queue->hctx)
-		blk_mq_delay_run_hw_queue(queue->hctx, NVMEFC_QUEUE_DELAY);
-
-	return BLK_STS_RESOURCE;
 }
 
 static inline blk_status_t nvme_fc_is_ready(struct nvme_fc_queue *queue,
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 92b82c52c27b..896a077fee07 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2069,9 +2069,9 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case BLK_STS_OK:
 		break;
 	case BLK_STS_RESOURCE:
-		if (atomic_read(&sdev->device_busy) == 0 &&
-		    !scsi_device_blocked(sdev))
-			blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
+		if (atomic_read(&sdev->device_busy) ||
+		    scsi_device_blocked(sdev))
+			ret = BLK_STS_DEV_RESOURCE;
 		break;
 	default:
 		if (unlikely(!scsi_device_online(sdev)))
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 3c1b51f67914..297978a0e486 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -44,6 +44,24 @@ typedef u8 __bitwise blk_status_t;
 
 #define BLK_STS_AGAIN		((__force blk_status_t)12)
 
+/*
+ * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
+ * device related resources are unavailable, but the driver can guarantee
+ * that the queue will be rerun in the future once resources become
+ * available again. This is typically the case for device specific
+ * resources that are consumed for IO. If the driver fails allocating these
+ * resources, we know that inflight (or pending) IO will free these
+ * resource upon completion.
+ *
+ * This is different from BLK_STS_RESOURCE in that it explicitly references
+ * a device specific resource. For resources of wider scope, allocation
+ * failure can happen without having pending IO. This means that we can't
+ * rely on request completions freeing these resources, as IO may not be in
+ * flight. Examples of that are kernel memory allocations, DMA mappings, or
+ * any other system wide resources.
+ */
+#define BLK_STS_DEV_RESOURCE	((__force blk_status_t)13)
+
 struct blk_issue_stat {
 	u64 stat;
 };
20:18:45 +0000 Received: by mail-qt1-f198.google.com with SMTP id e37so1167548qtk.7 for ; Wed, 27 Nov 2019 12:18:45 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bPlezNudUp4i514SlB0nN/TTczJAiHDMf1dDJpBD3qA=; b=QBEQGXZ5rpzxtw+mq7oBTjPt23ddNmMyE265XsL8dPWYGl3tTUYB5MsSwgg3oORry4 srwIBKbrbPjfM1SW/XKW0jAj14CLp5ZidR4MBycpX7nHunGeXbQR8MpVCj6p2v4/ryBs /nJOb69Mn+TG6zVrhuoa4o/PM1rDbgL0lk/yglcrjxLkcF2bTAzNQCHqrH60N3jM92Iv EDyTqkCDrSsp06r71FW+F6ARawpd7CI+lv8FFBrMWkfzmW8hthxW4xXFjXop5FpZ3isp 0oN4BfljZmxIAiimp62QBmirv1MTa7orOtJinRSNKtgOQWDrNFLiQ1qIDdClktEI9jwI kJ6g== X-Gm-Message-State: APjAAAVDZnmqBhBX09bx/7zKKFYGTl7N3gUlyc2NdboQ/rd0M33iQEpa cJMCgtaTxaH3RcjXLXv21pPEo64VyHklbzrU6Kkpc5jbfoeBuJpKw5X93/uBl+m+1bj480Lepxp Q/ElBydXsky4PvwnOT5El/Bm3PHkN4NzF30VDbYe6 X-Received: by 2002:ad4:52c8:: with SMTP id p8mr7274376qvs.84.1574885924204; Wed, 27 Nov 2019 12:18:44 -0800 (PST) X-Google-Smtp-Source: APXvYqxINfjrC2xyA8IsCJ9H5deaql5eVFJ4QxlK4evPklkdPEsUlTZ7dyaQejrhEnIxcS4zBAfg6g== X-Received: by 2002:ad4:52c8:: with SMTP id p8mr7274337qvs.84.1574885923819; Wed, 27 Nov 2019 12:18:43 -0800 (PST) Received: from gallifrey.lan ([2804:14c:4e6:1bc:4960:b0eb:4714:41f]) by smtp.gmail.com with ESMTPSA id o13sm8284524qto.96.2019.11.27.12.18.42 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Nov 2019 12:18:43 -0800 (PST) From: Marcelo Henrique Cerri To: kernel-team@lists.ubuntu.com Subject: [xenial:linux-azure][PATCH 09/15] blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly() Date: Wed, 27 Nov 2019 17:18:14 -0300 Message-Id: <20191127201820.32174-10-marcelo.cerri@canonical.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com> References: <20191127201820.32174-1-marcelo.cerri@canonical.com> MIME-Version: 1.0 X-BeenThere: 
kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Bart Van Assche BugLink: https://bugs.launchpad.net/bugs/1848739 Most blk-mq functions have a name that follows the pattern blk_mq_${action}. However, the function name blk_mq_request_direct_issue is an exception. Hence rename this function. This patch does not change any functionality. Reviewed-by: Mike Snitzer Signed-off-by: Bart Van Assche Signed-off-by: Jens Axboe (cherry picked from commit c77ff7fd03ddca8face268c4cf093c0edf4bcf1f) Signed-off-by: Marcelo Henrique Cerri --- block/blk-core.c | 2 +- block/blk-mq.c | 4 ++-- block/blk-mq.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 30a3cb2eca5a..f885b65324c2 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2508,7 +2508,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request * * bypass a potential scheduler on the bottom device for * insert. */ - return blk_mq_request_direct_issue(rq); + return blk_mq_request_issue_directly(rq); } spin_lock_irqsave(q->queue_lock, flags); diff --git a/block/blk-mq.c b/block/blk-mq.c index e035215b1546..3145221ca824 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1751,7 +1751,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, * RCU or SRCU read lock is needed before checking quiesced flag. * * When queue is stopped or quiesced, ignore 'bypass_insert' from - * blk_mq_request_direct_issue(), and return BLK_STS_OK to caller, + * blk_mq_request_issue_directly(), and return BLK_STS_OK to caller, * and avoid driver to try to dispatch again. 
*/ if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)) { @@ -1799,7 +1799,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, hctx_unlock(hctx, srcu_idx); } -blk_status_t blk_mq_request_direct_issue(struct request *rq) +blk_status_t blk_mq_request_issue_directly(struct request *rq) { blk_status_t ret; int srcu_idx; diff --git a/block/blk-mq.h b/block/blk-mq.h index 0daa9f2c3d61..c11c627ebd6d 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -61,7 +61,7 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct list_head *list); /* Used by blk_insert_cloned_request() to issue request directly */ -blk_status_t blk_mq_request_direct_issue(struct request *rq); +blk_status_t blk_mq_request_issue_directly(struct request *rq); /* * CPU -> queue mappings From patchwork Wed Nov 27 20:18:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Henrique Cerri X-Patchwork-Id: 1201795 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47NXD41wBCz9sR8; Thu, 28 Nov 2019 07:19:00 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1ia3m2-0003CZ-2a; Wed, 27 Nov 2019 20:18:54 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with 
From patchwork Wed Nov 27 20:18:15 2019
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 10/15] blk-mq: don't queue more if we get a busy return
Date: Wed, 27 Nov 2019 17:18:15 -0300
Message-Id: <20191127201820.32174-11-marcelo.cerri@canonical.com>
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Jens Axboe

BugLink: https://bugs.launchpad.net/bugs/1848739

Some devices have different queue limits depending on the type of IO. A
classic case is SATA NCQ, where some commands can queue, but others
cannot. If we have NCQ commands inflight and encounter a non-queueable
command, the driver returns busy. Currently we attempt to dispatch more
from the scheduler if we were able to queue some commands. But for the
case where we ended up stopping due to BUSY, we should not attempt to
retrieve more from the scheduler. If we do, we can get into a situation
where we attempt to queue a non-queueable command, get BUSY, then
successfully retrieve more commands from the scheduler and queue those.
This can repeat forever, starving the non-queueable command indefinitely.

Fix this by NOT attempting to pull more commands from the scheduler if
we get a BUSY return. This should also be more optimal in terms of
letting requests stay in the scheduler for as long as possible, if we
get a BUSY due to the regular out-of-tags condition.
Reviewed-by: Omar Sandoval
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe
(cherry picked from commit 1f57f8d442f8017587eeebd8617913bfc3661d3d)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3145221ca824..913157d52b92 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1120,6 +1120,9 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
 
 #define BLK_MQ_RESOURCE_DELAY	3		/* ms units */
 
+/*
+ * Returns true if we did some work AND can potentially do more.
+ */
 bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			     bool got_budget)
 {
@@ -1250,8 +1253,17 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 			blk_mq_run_hw_queue(hctx, true);
 		else if (needs_restart && (ret == BLK_STS_RESOURCE))
 			blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
+
+		return false;
 	}
 
+	/*
+	 * If the host/device is unable to accept more work, inform the
+	 * caller of that.
+	 */
+	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
+		return false;
+
 	return (queued + errors) != 0;
 }
From patchwork Wed Nov 27 20:18:16 2019
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 11/15] blk-mq: dequeue request one by one from sw queue if hctx is busy
Date: Wed, 27 Nov 2019 17:18:16 -0300
Message-Id: <20191127201820.32174-12-marcelo.cerri@canonical.com>
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Ming Lei

BugLink: https://bugs.launchpad.net/bugs/1848739

Dequeuing requests one by one from the sw queue is not efficient, but
we have to do that when the queue is busy to get better merge
performance. This patch uses an Exponential Weighted Moving Average
(EWMA) to figure out whether the queue is busy, and dequeues requests
one by one from the sw queue only in that case.

Fixes: b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")
Cc: Kashyap Desai
Cc: Laurence Oberman
Cc: Omar Sandoval
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Hannes Reinecke
Reported-by: Kashyap Desai
Tested-by: Kashyap Desai
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
(cherry picked from commit 6e768717304bdbe8d2897ca8298f6b58863fdc41)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq-debugfs.c |  9 +++++++++
 block/blk-mq-sched.c   | 11 ++---------
 block/blk-mq.c         | 33 ++++++++++++++++++++++++++++++++-
 include/linux/blk-mq.h |  3 ++-
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 54bd8c31b822..ead271fb641e 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -607,6 +607,14 @@ static int hctx_active_show(void *data, struct seq_file *m)
 	return 0;
 }
 
+static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
+{
+	struct blk_mq_hw_ctx *hctx = data;
+
+	seq_printf(m, "%u\n", hctx->dispatch_busy);
+	return 0;
+}
+
 static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
 	__acquires(&ctx->lock)
 {
@@ -776,6 +784,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
 	{"queued", 0600, hctx_queued_show, hctx_queued_write},
 	{"run", 0600, hctx_run_show, hctx_run_write},
 	{"active", 0400, hctx_active_show},
+	{"dispatch_busy", 0400, hctx_dispatch_busy_show},
 	{},
 };
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index f3380331e5f3..1518c794a78c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -220,15 +220,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		}
 	} else if (has_sched_dispatch) {
 		blk_mq_do_dispatch_sched(hctx);
-	} else if (q->mq_ops->get_budget) {
-		/*
-		 * If we need to get budget before queuing request, we
-		 * dequeue request one by one from sw queue for avoiding
-		 * to mess up I/O merge when dispatch runs out of resource.
-		 *
-		 * TODO: get more budgets, and dequeue more requests in
-		 * one time.
-		 */
+	} else if (hctx->dispatch_busy) {
+		/* dequeue request one by one from sw queue if queue is busy */
 		blk_mq_do_dispatch_ctx(hctx);
 	} else {
 		blk_mq_flush_busy_ctxs(hctx, &rq_list);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 913157d52b92..691ed5f8f6d9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1118,6 +1118,35 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
 	}
 }
 
+#define BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT	8
+#define BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR	4
+/*
+ * Update dispatch busy with the Exponential Weighted Moving Average(EWMA):
+ * - EWMA is one simple way to compute running average value
+ * - weight(7/8 and 1/8) is applied so that it can decrease exponentially
+ * - take 4 as factor for avoiding to get too small(0) result, and this
+ *   factor doesn't matter because EWMA decreases exponentially
+ */
+static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
+{
+	unsigned int ewma;
+
+	if (hctx->queue->elevator)
+		return;
+
+	ewma = hctx->dispatch_busy;
+
+	if (!ewma && !busy)
+		return;
+
+	ewma *= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT - 1;
+	if (busy)
+		ewma += 1 << BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR;
+	ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT;
+
+	hctx->dispatch_busy = ewma;
+}
+
 #define BLK_MQ_RESOURCE_DELAY	3		/* ms units */
 
 /*
@@ -1254,8 +1283,10 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 		else if (needs_restart && (ret == BLK_STS_RESOURCE))
 			blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
 
+		blk_mq_update_dispatch_busy(hctx, true);
 		return false;
-	}
+	} else
+		blk_mq_update_dispatch_busy(hctx, false);
 
 	/*
 	 * If the host/device is unable to accept more work, inform the
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 95c9a5c862e2..f3188bf2acee 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -32,9 +32,10 @@ struct blk_mq_hw_ctx {
 	struct sbitmap		ctx_map;
 
 	struct blk_mq_ctx	*dispatch_from;
+	unsigned int		dispatch_busy;
 
-	struct blk_mq_ctx	**ctxs;
 	unsigned int		nr_ctx;
+	struct blk_mq_ctx	**ctxs;
 
 	wait_queue_entry_t	dispatch_wait;
 	atomic_t		wait_index;
From patchwork Wed Nov 27 20:18:17 2019
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 12/15] blk-mq: issue directly if hw queue isn't busy in case of 'none'
Date: Wed, 27 Nov 2019 17:18:17 -0300
Message-Id: <20191127201820.32174-13-marcelo.cerri@canonical.com>
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Ming Lei

BugLink: https://bugs.launchpad.net/bugs/1848739

In case of the 'none' io scheduler, when the hw queue isn't busy, it
isn't necessary to enqueue a request to the sw queue and dequeue it
again, because the request can be submitted to the hw queue right away
without extra cost. Meanwhile there shouldn't be many requests in the
sw queue, so we don't need to worry about the effect on IO merging.

There are still some single-hw-queue SCSI HBAs (HPSA, megaraid_sas, ...)
which may be connected to high performance devices, so 'none' is often
required for obtaining good performance.

This patch improves IOPS and decreases CPU utilization on megaraid_sas,
per Kashyap's test.

Cc: Kashyap Desai
Cc: Laurence Oberman
Cc: Omar Sandoval
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Hannes Reinecke
Reported-by: Kashyap Desai
Tested-by: Kashyap Desai
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
(cherry picked from commit 6ce3dd6eec114930cf2035a8bcb1e80477ed79a8)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq-sched.c | 13 ++++++++++++-
 block/blk-mq.c       | 23 ++++++++++++++++++++++-
 block/blk-mq.h       |  2 ++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 1518c794a78c..45d8e861fe55 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -465,8 +465,19 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
 
 	if (e && e->type->ops.mq.insert_requests)
 		e->type->ops.mq.insert_requests(hctx, list, false);
-	else
+	else {
+		/*
+		 * try to issue requests directly if the hw queue isn't
+		 * busy in case of 'none' scheduler, and this way may save
+		 * us one extra enqueue & dequeue to sw queue.
+		 */
+		if (!hctx->dispatch_busy && !e && !run_queue_async) {
+			blk_mq_try_issue_list_directly(hctx, list);
+			if (list_empty(list))
+				return;
+		}
 		blk_mq_insert_requests(hctx, ctx, list);
+	}
 
 	blk_mq_run_hw_queue(hctx, run_queue_async);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 691ed5f8f6d9..ea3feeab1fd0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1768,13 +1768,16 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 	ret = q->mq_ops->queue_rq(hctx, &bd);
 	switch (ret) {
 	case BLK_STS_OK:
+		blk_mq_update_dispatch_busy(hctx, false);
 		*cookie = new_cookie;
 		break;
 	case BLK_STS_RESOURCE:
 	case BLK_STS_DEV_RESOURCE:
+		blk_mq_update_dispatch_busy(hctx, true);
 		__blk_mq_requeue_request(rq);
 		break;
 	default:
+		blk_mq_update_dispatch_busy(hctx, false);
 		*cookie = BLK_QC_T_NONE;
 		break;
 	}
@@ -1857,6 +1860,23 @@ blk_status_t blk_mq_request_issue_directly(struct request *rq)
 	return ret;
 }
 
+void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
+		struct list_head *list)
+{
+	while (!list_empty(list)) {
+		blk_status_t ret;
+		struct request *rq = list_first_entry(list, struct request,
+				queuelist);
+
+		list_del_init(&rq->queuelist);
+		ret = blk_mq_request_issue_directly(rq);
+		if (ret != BLK_STS_OK) {
+			list_add(&rq->queuelist, list);
+			break;
+		}
+	}
+}
+
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 {
 	const int is_sync = op_is_sync(bio->bi_opf);
@@ -1958,7 +1978,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 			blk_mq_try_issue_directly(data.hctx, same_queue_rq,
 					&cookie);
 		}
-	} else if (q->nr_hw_queues > 1 && is_sync) {
+	} else if ((q->nr_hw_queues > 1 && is_sync) || (!q->elevator &&
+			!data.hctx->dispatch_busy)) {
 		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index c11c627ebd6d..b78cdcad7d7f 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -62,6 +62,8 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 
 /* Used by blk_insert_cloned_request() to issue request directly */
 blk_status_t blk_mq_request_issue_directly(struct request *rq);
+void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
+				    struct list_head *list);
 
 /*
  * CPU -> queue mappings
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 13/15] blk-mq: fix corruption with direct issue
Date: Wed, 27 Nov 2019 17:18:18 -0300
Message-Id: <20191127201820.32174-14-marcelo.cerri@canonical.com>
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>
From: Jens Axboe

BugLink: https://bugs.launchpad.net/bugs/1848739

If we attempt a direct issue to a SCSI device, and it returns BUSY, then
we queue the request up normally. However, the SCSI layer may have
already setup SG tables etc for this particular command. If we later
merge with this request, then the old tables are no longer valid. Once
we issue the IO, we only read/write the original part of the request,
not the new state of it. This causes data corruption, and is most often
noticed with the file system complaining about the just read data being
invalid:

[  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)

because most of it is garbage...

This doesn't happen from the normal issue path, as we will simply defer
the request to the hardware queue dispatch list if we fail. Once it's on
the dispatch list, we never merge with it.

Fix this from the direct issue path by flagging the request as
REQ_NOMERGE so we don't change the size of it before issue.

See also: https://bugzilla.kernel.org/show_bug.cgi?id=201685

Tested-by: Guenter Roeck
Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
(cherry picked from commit ffe81d45322cc3cb140f0db080a4727ea284661e)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ea3feeab1fd0..f539357f5d3b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1773,6 +1773,15 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 		break;
 	case BLK_STS_RESOURCE:
 	case BLK_STS_DEV_RESOURCE:
+		/*
+		 * If direct dispatch fails, we cannot allow any merging on
+		 * this IO. Drivers (like SCSI) may have set up permanent state
+		 * for this request, like SG tables and mappings, and if we
+		 * merge to it later on then we'll still only do IO to the
+		 * original part.
+		 */
+		rq->cmd_flags |= REQ_NOMERGE;
+
 		blk_mq_update_dispatch_busy(hctx, true);
 		__blk_mq_requeue_request(rq);
 		break;
@@ -1785,6 +1794,18 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
+/*
+ * Don't allow direct dispatch of anything but regular reads/writes,
+ * as some of the other commands can potentially share request space
+ * with data we need for the IO scheduler. If we attempt a direct dispatch
+ * on those and fail, we can't safely add it to the scheduler afterwards
+ * without potentially overwriting data that the driver has already written.
+ */
+static bool blk_rq_can_direct_dispatch(struct request *rq)
+{
+	return req_op(rq) == REQ_OP_READ || req_op(rq) == REQ_OP_WRITE;
+}
+
 static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 						struct request *rq,
 						blk_qc_t *cookie,
@@ -1806,7 +1827,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
 	}
 
-	if (q->elevator && !bypass_insert)
+	if (!blk_rq_can_direct_dispatch(rq) || (q->elevator && !bypass_insert))
 		goto insert;
 
 	if (!blk_mq_get_dispatch_budget(hctx))
@@ -1868,6 +1889,9 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq = list_first_entry(list, struct request,
 				queuelist);
 
+		if (!blk_rq_can_direct_dispatch(rq))
+			break;
+
 		list_del_init(&rq->queuelist);
 		ret = blk_mq_request_issue_directly(rq);
 		if (ret != BLK_STS_OK) {

From patchwork Wed Nov 27 20:18:19 2019
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1201800
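The hazard this patch closes — a driver preps per-request state such as SG tables, the direct issue returns busy, and a later merge grows the request beyond what was prepped — can be sketched as a small userspace model. Everything here (the `request` struct, `try_direct_issue()`, `attempt_merge()`) is illustrative scaffolding, not kernel API; only the REQ_NOMERGE idea is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define REQ_NOMERGE (1u << 0)

struct request {
	unsigned int cmd_flags;
	unsigned int nr_bytes;	/* size the driver prepped state for */
};

enum issue_status { STS_OK, STS_RESOURCE };

/* Model of a driver that preps per-request state, then reports busy. */
static enum issue_status try_direct_issue(struct request *rq, bool device_busy)
{
	if (device_busy) {
		/* The fix: forbid merges once the driver has seen this rq. */
		rq->cmd_flags |= REQ_NOMERGE;
		return STS_RESOURCE;
	}
	return STS_OK;
}

/* A merge must be refused once REQ_NOMERGE is set; otherwise the request
 * would grow past the state the driver already set up for it. */
static bool attempt_merge(struct request *rq, unsigned int extra_bytes)
{
	if (rq->cmd_flags & REQ_NOMERGE)
		return false;
	rq->nr_bytes += extra_bytes;
	return true;
}
```

Without the flag, the merge would succeed and the eventual IO would cover only the original `nr_bytes` — the corruption the EXT4 error above reports.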
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 14/15] blk-mq: fail the request in case issue failure
Date: Wed, 27 Nov 2019 17:18:19 -0300
Message-Id: <20191127201820.32174-15-marcelo.cerri@canonical.com>
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Ming Lei

BugLink: https://bugs.launchpad.net/bugs/1848739

Inside blk_mq_try_issue_list_directly(), if issuing a request fails with
a real error, we shouldn't try to issue it again; otherwise the warning
in blk_mq_start_request() will be triggered. This matches the behaviour
of the other request issue & dispatch paths.
Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
Cc: Kashyap Desai
Cc: Laurence Oberman
Cc: Omar Sandoval
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Hannes Reinecke
Cc: kernel test robot
Cc: LKP
Reported-by: kernel test robot
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
(cherry picked from commit 8824f62246bef288173a6624a363352f0d4d3b09)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f539357f5d3b..6a1b7e3af232 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1895,8 +1895,12 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		list_del_init(&rq->queuelist);
 		ret = blk_mq_request_issue_directly(rq);
 		if (ret != BLK_STS_OK) {
-			list_add(&rq->queuelist, list);
-			break;
+			if (ret == BLK_STS_RESOURCE ||
+			    ret == BLK_STS_DEV_RESOURCE) {
+				list_add(&rq->queuelist, list);
+				break;
+			}
+			blk_mq_end_request(rq, ret);
 		}
 	}
 }

From patchwork Wed Nov 27 20:18:20 2019
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1201799
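The retry policy this patch introduces — requeue only on out-of-resource results, end the request on any other failure instead of retrying it — can be sketched in userspace. `issue_list()` and its counters are hypothetical stand-ins for the kernel loop, not kernel API; only the BLK_STS_* names and the branch structure mirror the patch.

```c
#include <assert.h>
#include <stddef.h>

enum blk_status { BLK_STS_OK, BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE, BLK_STS_IOERR };

struct issue_result {
	int issued;	/* dispatched to the driver */
	int requeued;	/* left on the list for a later retry */
	int failed;	/* completed with an error, never retried */
};

/* Model of the loop: ret[i] is the status the i-th request's direct
 * issue would return. */
static struct issue_result issue_list(const enum blk_status *ret, size_t n)
{
	struct issue_result r = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (ret[i] == BLK_STS_OK) {
			r.issued++;
			continue;
		}
		if (ret[i] == BLK_STS_RESOURCE || ret[i] == BLK_STS_DEV_RESOURCE) {
			/* out of resources: put it back and stop for now */
			r.requeued = (int)(n - i);
			break;
		}
		/* hard failure: end this request, keep issuing the rest */
		r.failed++;
	}
	return r;
}
```

The key distinction is that a resource shortage is transient (the whole remainder of the list waits for a retry), while a hard error is final: re-issuing such a request would hit the started-twice warning in blk_mq_start_request().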
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 15/15] blk-mq: punt failed direct issue to dispatch list
Date: Wed, 27 Nov 2019 17:18:20 -0300
Message-Id: <20191127201820.32174-16-marcelo.cerri@canonical.com>
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Jens Axboe

BugLink: https://bugs.launchpad.net/bugs/1848739

After the direct dispatch corruption fix, we permanently disallow direct
dispatch of non read/write requests. This works fine off the normal IO
path, as they will be retried like any other failed direct dispatch
request. But for the blk_insert_cloned_request() that only DM uses to
bypass the bottom level scheduler, we always first attempt direct
dispatch. For some types of requests, that's now a permanent failure,
and no amount of retrying will make that succeed. This results in a
livelock.

Instead of making special cases for what we can direct issue, and now
having to deal with DM solving the livelock while still retaining a BUSY
condition feedback loop, always just add a request that has been through
->queue_rq() to the hardware queue dispatch list. These are safe to use
as no merging can take place there. Additionally, if requests do have
prepped data from drivers, we aren't dependent on them not sharing space
in the request structure to safely add them to the IO scheduler lists.
This basically reverts ffe81d45322c and is based on a patch from Ming,
but with the list insert case covered as well.

Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
Cc: stable@vger.kernel.org
Suggested-by: Ming Lei
Reported-by: Bart Van Assche
Tested-by: Ming Lei
Acked-by: Mike Snitzer
Signed-off-by: Jens Axboe
(cherry picked from commit c616cbee97aed4bc6178f148a7240206dcdb85a6)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq.c | 33 +++++----------------------------
 1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6a1b7e3af232..8ef75eba264d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1773,15 +1773,6 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 		break;
 	case BLK_STS_RESOURCE:
 	case BLK_STS_DEV_RESOURCE:
-		/*
-		 * If direct dispatch fails, we cannot allow any merging on
-		 * this IO. Drivers (like SCSI) may have set up permanent state
-		 * for this request, like SG tables and mappings, and if we
-		 * merge to it later on then we'll still only do IO to the
-		 * original part.
-		 */
-		rq->cmd_flags |= REQ_NOMERGE;
-
 		blk_mq_update_dispatch_busy(hctx, true);
 		__blk_mq_requeue_request(rq);
 		break;
@@ -1794,18 +1785,6 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
-/*
- * Don't allow direct dispatch of anything but regular reads/writes,
- * as some of the other commands can potentially share request space
- * with data we need for the IO scheduler. If we attempt a direct dispatch
- * on those and fail, we can't safely add it to the scheduler afterwards
- * without potentially overwriting data that the driver has already written.
- */
-static bool blk_rq_can_direct_dispatch(struct request *rq)
-{
-	return req_op(rq) == REQ_OP_READ || req_op(rq) == REQ_OP_WRITE;
-}
-
 static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 						struct request *rq,
 						blk_qc_t *cookie,
@@ -1827,7 +1806,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
 	}
 
-	if (!blk_rq_can_direct_dispatch(rq) || (q->elevator && !bypass_insert))
+	if (q->elevator && !bypass_insert)
 		goto insert;
 
 	if (!blk_mq_get_dispatch_budget(hctx))
@@ -1843,7 +1822,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (bypass_insert)
 		return BLK_STS_RESOURCE;
 
-	blk_mq_sched_insert_request(rq, false, run_queue, false);
+	blk_mq_request_bypass_insert(rq, run_queue);
 	return BLK_STS_OK;
 }
 
@@ -1859,7 +1838,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
 	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
-		blk_mq_sched_insert_request(rq, false, true, false);
+		blk_mq_request_bypass_insert(rq, true);
 	else if (ret != BLK_STS_OK)
 		blk_mq_end_request(rq, ret);
 
@@ -1889,15 +1868,13 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq = list_first_entry(list, struct request,
 				queuelist);
 
-		if (!blk_rq_can_direct_dispatch(rq))
-			break;
-
 		list_del_init(&rq->queuelist);
 		ret = blk_mq_request_issue_directly(rq);
 		if (ret != BLK_STS_OK) {
 			if (ret == BLK_STS_RESOURCE ||
 			    ret == BLK_STS_DEV_RESOURCE) {
-				list_add(&rq->queuelist, list);
+				blk_mq_request_bypass_insert(rq,
+							list_empty(list));
 				break;
 			}
 			blk_mq_end_request(rq, ret);
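Why punting to the hardware dispatch list resolves the livelock can be sketched with a two-queue userspace model: requests on the dispatch list are retried when the queue is run again, but are never merge candidates, so any driver-prepped state stays valid — and DM's bypass path never hits a permanent failure. The `queues` struct and function names below are illustrative, not the kernel's data structures; the comment names the real kernel call they stand in for.

```c
#include <assert.h>
#include <stdbool.h>

enum blk_status { BLK_STS_OK, BLK_STS_RESOURCE };

struct queues {
	int sched_cnt;	   /* IO-scheduler list: requests here may be merged */
	int dispatch_cnt;  /* hctx dispatch list: retried, never merged */
};

/* Model of ->queue_rq() failing with a resource shortage. */
static enum blk_status direct_issue(bool busy)
{
	return busy ? BLK_STS_RESOURCE : BLK_STS_OK;
}

/* After this patch: a request that has been through ->queue_rq() and
 * failed is bypass-inserted onto the dispatch list, never the scheduler
 * list, so no merge can ever touch it. Returns true if issued. */
static bool try_issue_directly(struct queues *q, bool busy)
{
	if (direct_issue(busy) == BLK_STS_RESOURCE) {
		q->dispatch_cnt++;	/* blk_mq_request_bypass_insert() */
		return false;
	}
	return true;
}
```

Because the dispatch list is retried on the next queue run, the BUSY feedback loop is preserved for DM, while the scheduler list (where merging happens) never sees a request the driver has already prepped.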