From patchwork Wed Nov 27 20:18:16 2019
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 1201796
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [xenial:linux-azure][PATCH 11/15] blk-mq: dequeue request one by one from sw queue if hctx is busy
Date: Wed, 27 Nov 2019 17:18:16 -0300
Message-Id: <20191127201820.32174-12-marcelo.cerri@canonical.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20191127201820.32174-1-marcelo.cerri@canonical.com>
References: <20191127201820.32174-1-marcelo.cerri@canonical.com>

From: Ming Lei

BugLink: https://bugs.launchpad.net/bugs/1848739

It won't be efficient to dequeue request one by one from sw queue,
but we have to do that when queue is busy for better merge
performance.

This patch takes the Exponential Weighted Moving Average(EWMA) to
figure out if queue is busy, then only dequeue request one by one
from sw queue when queue is busy.

Fixes: b347689ffbca ("blk-mq-sched: improve dispatching from sw queue")
Cc: Kashyap Desai
Cc: Laurence Oberman
Cc: Omar Sandoval
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Hannes Reinecke
Reported-by: Kashyap Desai
Tested-by: Kashyap Desai
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
(cherry picked from commit 6e768717304bdbe8d2897ca8298f6b58863fdc41)
Signed-off-by: Marcelo Henrique Cerri
---
 block/blk-mq-debugfs.c |  9 +++++++++
 block/blk-mq-sched.c   | 11 ++---------
 block/blk-mq.c         | 33 ++++++++++++++++++++++++++++++++-
 include/linux/blk-mq.h |  3 ++-
 4 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 54bd8c31b822..ead271fb641e 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -607,6 +607,14 @@ static int hctx_active_show(void *data, struct seq_file *m)
         return 0;
 }
 
+static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
+{
+        struct blk_mq_hw_ctx *hctx = data;
+
+        seq_printf(m, "%u\n", hctx->dispatch_busy);
+        return 0;
+}
+
 static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
         __acquires(&ctx->lock)
 {
@@ -776,6 +784,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
         {"queued", 0600, hctx_queued_show, hctx_queued_write},
         {"run", 0600, hctx_run_show, hctx_run_write},
         {"active", 0400, hctx_active_show},
+        {"dispatch_busy", 0400, hctx_dispatch_busy_show},
         {},
 };
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index f3380331e5f3..1518c794a78c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -220,15 +220,8 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
                 }
         } else if (has_sched_dispatch) {
                 blk_mq_do_dispatch_sched(hctx);
-        } else if (q->mq_ops->get_budget) {
-                /*
-                 * If we need to get budget before queuing request, we
-                 * dequeue request one by one from sw queue for avoiding
-                 * to mess up I/O merge when dispatch runs out of resource.
-                 *
-                 * TODO: get more budgets, and dequeue more requests in
-                 * one time.
-                 */
+        } else if (hctx->dispatch_busy) {
+                /* dequeue request one by one from sw queue if queue is busy */
                 blk_mq_do_dispatch_ctx(hctx);
         } else {
                 blk_mq_flush_busy_ctxs(hctx, &rq_list);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 913157d52b92..691ed5f8f6d9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1118,6 +1118,35 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
         }
 }
 
+#define BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT 8
+#define BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR 4
+/*
+ * Update dispatch busy with the Exponential Weighted Moving Average(EWMA):
+ * - EWMA is one simple way to compute running average value
+ * - weight(7/8 and 1/8) is applied so that it can decrease exponentially
+ * - take 4 as factor for avoiding to get too small(0) result, and this
+ *   factor doesn't matter because EWMA decreases exponentially
+ */
+static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
+{
+        unsigned int ewma;
+
+        if (hctx->queue->elevator)
+                return;
+
+        ewma = hctx->dispatch_busy;
+
+        if (!ewma && !busy)
+                return;
+
+        ewma *= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT - 1;
+        if (busy)
+                ewma += 1 << BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR;
+        ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT;
+
+        hctx->dispatch_busy = ewma;
+}
+
 #define BLK_MQ_RESOURCE_DELAY        3                /* ms units */
 
 /*
@@ -1254,8 +1283,10 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
                 else if (needs_restart && (ret == BLK_STS_RESOURCE))
                         blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
 
+                blk_mq_update_dispatch_busy(hctx, true);
                 return false;
-        }
+        } else
+                blk_mq_update_dispatch_busy(hctx, false);
 
         /*
          * If the host/device is unable to accept more work, inform the
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 95c9a5c862e2..f3188bf2acee 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -32,9 +32,10 @@ struct blk_mq_hw_ctx {
         struct sbitmap          ctx_map;
 
         struct blk_mq_ctx       *dispatch_from;
+        unsigned int            dispatch_busy;
 
-        struct blk_mq_ctx       **ctxs;
         unsigned int            nr_ctx;
+        struct blk_mq_ctx       **ctxs;
 
         wait_queue_entry_t      dispatch_wait;
         atomic_t                wait_index;
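
Note (not part of the upstream commit): with the chosen WEIGHT (8) and FACTOR (4), the update above
reduces to ewma = (7 * ewma + (busy ? 16 : 0)) / 8 in integer arithmetic. Below is a minimal
user-space sketch of that arithmetic with an illustrative trace; the ewma_update() helper and the
printed output are for illustration only and do not exist in the kernel.

#include <stdbool.h>
#include <stdio.h>

#define BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT 8
#define BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR 4

/* Same integer EWMA as blk_mq_update_dispatch_busy(), minus the hctx plumbing. */
static unsigned int ewma_update(unsigned int ewma, bool busy)
{
        if (!ewma && !busy)
                return 0;

        ewma *= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT - 1;
        if (busy)
                ewma += 1 << BLK_MQ_DISPATCH_BUSY_EWMA_FACTOR;
        ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT;

        return ewma;
}

int main(void)
{
        unsigned int ewma = 0;
        int i;

        /* Sustained "busy" dispatches: from 0 the value climbs 2, 3, 4, ... and levels off at 9. */
        for (i = 0; i < 10; i++) {
                ewma = ewma_update(ewma, true);
                printf("busy #%d: dispatch_busy = %u\n", i + 1, ewma);
        }

        /* Idle dispatches: the value decays by roughly 7/8 per step and hits 0 within ~8 updates. */
        for (i = 0; i < 10; i++) {
                ewma = ewma_update(ewma, false);
                printf("idle #%d: dispatch_busy = %u\n", i + 1, ewma);
        }

        return 0;
}

Any non-zero dispatch_busy makes blk_mq_sched_dispatch_requests() fall back to dequeuing one
request at a time via blk_mq_do_dispatch_ctx(); once the value decays to zero, the batch path
through blk_mq_flush_busy_ctxs() is used again. With the debugfs hunk applied, the current value
can also be read from the dispatch_busy attribute in each hctx directory of the block debugfs tree.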