From patchwork Fri Jan 11 11:07:57 2019
X-Patchwork-Submitter: Mauricio Faria de Oliveira
X-Patchwork-Id: 1023516
From: Mauricio Faria de Oliveira
To: kernel-team@lists.ubuntu.com
Subject: [SRU B][PATCH v2 7/7] blk-wbt: improve waking of tasks
Date: Fri, 11 Jan 2019 09:07:57 -0200
Message-Id: <20190111110757.17936-8-mfo@canonical.com>
In-Reply-To: <20190111110757.17936-1-mfo@canonical.com>
References: <20190111110757.17936-1-mfo@canonical.com>
List-Id: Kernel team discussions
Sender: "kernel-team" <kernel-team-bounces@lists.ubuntu.com>

From: Jens Axboe

BugLink: https://bugs.launchpad.net/bugs/1810998

We have two potential issues:

1) After commit 2887e41b910b, we only wake one process at a time when
   we finish an IO. We really want to wake up as many tasks as can
   queue IO.
   Before this commit, we woke up everyone, which could cause a
   thundering herd issue.

2) A task can potentially consume two wakeups, causing us to (in
   practice) miss a wakeup.

Fix both by providing our own wakeup function, which stops
__wake_up_common() from waking up more tasks if we fail to get a
queueing token. With the strict ordering we have on the wait list, this
wakes the right tasks and the right number of tasks.

Based on a patch from Jianchao Wang.

Tested-by: Agarwal, Anchal
Signed-off-by: Jens Axboe
(backported from commit 38cfb5a45ee013bfab5d1ae4c4738815e744b440)
[mfo: backport:
 - hunk 2: s/rq_wait_inc_below(data->rqw/atomic_inc_below(&data->rqw->inflight/
 - hunk 3: s/rq_wait_inc_below(rqw/atomic_inc_below(&rqw->inflight/]
Signed-off-by: Mauricio Faria de Oliveira
---
 block/blk-wbt.c | 63 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 56 insertions(+), 7 deletions(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index d891b4f80c2f..742f0cb4b1cf 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -157,7 +157,7 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
 		int diff = limit - inflight;

 		if (!inflight || diff >= rwb->wb_background / 2)
-			wake_up(&rqw->wait);
+			wake_up_all(&rqw->wait);
 	}
 }

@@ -519,6 +519,34 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw)
 	return limit;
 }

+struct wbt_wait_data {
+	struct wait_queue_entry wq;
+	struct task_struct *task;
+	struct rq_wb *rwb;
+	struct rq_wait *rqw;
+	unsigned long rw;
+	bool got_token;
+};
+
+static int wbt_wake_function(struct wait_queue_entry *curr, unsigned int mode,
+			     int wake_flags, void *key)
+{
+	struct wbt_wait_data *data = container_of(curr, struct wbt_wait_data,
+						  wq);
+
+	/*
+	 * If we fail to get a budget, return -1 to interrupt the wake up
+	 * loop in __wake_up_common.
+	 */
+	if (!atomic_inc_below(&data->rqw->inflight, get_limit(data->rwb, data->rw)))
+		return -1;
+
+	data->got_token = true;
+	list_del_init(&curr->entry);
+	wake_up_process(data->task);
+	return 1;
+}
+
 /*
  * Block if we will exceed our limit, or if we are currently waiting for
  * the timer to kick off queuing again.
@@ -529,19 +557,40 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
 	__acquires(lock)
 {
 	struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
-	DECLARE_WAITQUEUE(wait, current);
+	struct wbt_wait_data data = {
+		.wq = {
+			.func	= wbt_wake_function,
+			.entry	= LIST_HEAD_INIT(data.wq.entry),
+		},
+		.task = current,
+		.rwb = rwb,
+		.rqw = rqw,
+		.rw = rw,
+	};
 	bool has_sleeper;

 	has_sleeper = wq_has_sleeper(&rqw->wait);
 	if (!has_sleeper && atomic_inc_below(&rqw->inflight, get_limit(rwb, rw)))
 		return;

-	add_wait_queue_exclusive(&rqw->wait, &wait);
+	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
 	do {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (data.got_token)
+			break;

-		if (!has_sleeper && atomic_inc_below(&rqw->inflight, get_limit(rwb, rw)))
+		if (!has_sleeper &&
+		    atomic_inc_below(&rqw->inflight, get_limit(rwb, rw))) {
+			finish_wait(&rqw->wait, &data.wq);
+
+			/*
+			 * We raced with wbt_wake_function() getting a token,
+			 * which means we now have two. Put our local token
+			 * and wake anyone else potentially waiting for one.
+			 */
+			if (data.got_token)
+				wbt_rqw_done(rwb, rqw, wb_acct);
 			break;
+		}

 		if (lock) {
 			spin_unlock_irq(lock);
@@ -549,11 +598,11 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
 			spin_lock_irq(lock);
 		} else
 			io_schedule();
+
+		has_sleeper = false;
 	} while (1);

-	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(&rqw->wait, &wait);
+	finish_wait(&rqw->wait, &data.wq);
 }

 static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)