From patchwork Fri Jan 11 11:08:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 1023523 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43bg9T2fCjz9sBQ; Fri, 11 Jan 2019 22:09:17 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1ghugZ-0007Z3-UQ; Fri, 11 Jan 2019 11:09:11 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1ghugX-0007Vh-8y for kernel-team@lists.ubuntu.com; Fri, 11 Jan 2019 11:09:09 +0000 Received: from mail-qt1-f200.google.com ([209.85.160.200]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1ghugV-0004xn-IV for kernel-team@lists.ubuntu.com; Fri, 11 Jan 2019 11:09:07 +0000 Received: by mail-qt1-f200.google.com with SMTP id p24so16281150qtl.2 for ; Fri, 11 Jan 2019 03:09:07 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=GVOmrPwO8WL645o9xrcZTWQyxZiEejVnM4fHxG6XxkU=; b=plnO4+lCq44HgdOjgdREVKdX5hKZmGpn3FTzCq7oADCFJNkgAt8QErkrm0SuyYIcDS wKNKjC4AWVZXhq+YCvTbV7P3BPqJa0qANu1T447X9lVe4mTh5X6Kxru3/2b2lyj+5mru eKDYBObnw4ScH5rtnZNv31EHPqYxu0icbgHOixXJX3XOswaHNHsrO/IXUvthqfrBwdCg TM8pMY51Jk59iLy2lsxTAzxFM6+8bN+xFh5an1i12kNjmrLcYaXZqN4CzXuPMTKW4n/p VsNOSNHoP4wI9HfXZr324lD1kC28RAQUbQ6tjpx286bkh3VA57wj6xJFaCnA3eqoGCY7 DaUw== X-Gm-Message-State: AJcUukc5gNp/9O38q8gSJbb2PDNYa1ggZjkm9CnyINL5RZLDHCIkfafv FJ45Iz5j/garczCdTm5fLQETmYIrEyCvi0NZe+EHAViIsLlM6SWxZ/9TW3F97ddfluJ948f8mOv Aev7qepMAqVmY8Z8dlfiGmqMynjtCZLR2AoyGqCfUIQ== X-Received: by 2002:a37:553:: with SMTP id 80mr13009954qkf.200.1547204946513; Fri, 11 Jan 2019 03:09:06 -0800 (PST) X-Google-Smtp-Source: ALg8bN6IsrR69q0BEV0gz9GgP1/YwyDEByF53loSgYP7PL1Q55+iqYOjw3XACWWLKBv+gMZZQbLUYw== X-Received: by 2002:a37:553:: with SMTP id 80mr13009942qkf.200.1547204946296; Fri, 11 Jan 2019 03:09:06 -0800 (PST) Received: from localhost.localdomain ([177.181.227.3]) by smtp.gmail.com with ESMTPSA id x202sm35844345qka.67.2019.01.11.03.09.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 11 Jan 2019 03:09:05 -0800 (PST) From: Mauricio Faria de Oliveira To: kernel-team@lists.ubuntu.com Subject: [SRU C][PATCH v2 6/6] blk-wbt: improve waking of tasks Date: Fri, 11 Jan 2019 09:08:43 -0200 Message-Id: <20190111110843.18042-7-mfo@canonical.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190111110843.18042-1-mfo@canonical.com> References: <20190111110843.18042-1-mfo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jens Axboe BugLink: https://bugs.launchpad.net/bugs/1810998 We have two potential issues: 1) After commit 2887e41b910b, we only wake one process at the time when we finish an IO. We really want to wake up as many tasks as can queue IO. Before this commit, we woke up everyone, which could cause a thundering herd issue. 2) A task can potentially consume two wakeups, causing us to (in practice) miss a wakeup. Fix both by providing our own wakeup function, which stops __wake_up_common() from waking up more tasks if we fail to get a queueing token. With the strict ordering we have on the wait list, this wakes the right tasks and the right amount of tasks. Based on a patch from Jianchao Wang . Tested-by: Agarwal, Anchal Signed-off-by: Jens Axboe (backported from commit 38cfb5a45ee013bfab5d1ae4c4738815e744b440) [mfo: backport: - hunk 2: s/rq_wait_inc_below(data->rqw/atomic_inc_below(&data->rqw->inflight/ - hunk 3: s/rq_wait_inc_below(rqw/atomic_inc_below(&rqw->inflight/] Signed-off-by: Mauricio Faria de Oliveira --- block/blk-wbt.c | 63 +++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 56 insertions(+), 7 deletions(-) diff --git a/block/blk-wbt.c b/block/blk-wbt.c index fe20486bd9b4..e9efcfc3a0d5 100644 --- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -182,7 +182,7 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw, int diff = limit - inflight; if (!inflight || diff >= rwb->wb_background / 2) - wake_up(&rqw->wait); + wake_up_all(&rqw->wait); } } @@ -547,6 +547,34 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw) return limit; } +struct wbt_wait_data { + struct wait_queue_entry wq; + struct task_struct *task; + struct rq_wb *rwb; + struct rq_wait *rqw; + unsigned long rw; + bool got_token; +}; + +static int wbt_wake_function(struct wait_queue_entry *curr, unsigned int mode, + int wake_flags, void *key) +{ + struct wbt_wait_data *data = container_of(curr, struct wbt_wait_data, + wq); + + /* + * If we fail to get a budget, return -1 to interrupt the wake up + * loop in __wake_up_common. + */ + if (!atomic_inc_below(&data->rqw->inflight, get_limit(data->rwb, data->rw))) + return -1; + + data->got_token = true; + list_del_init(&curr->entry); + wake_up_process(data->task); + return 1; +} + /* * Block if we will exceed our limit, or if we are currently waiting for * the timer to kick off queuing again. @@ -557,19 +585,40 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct, __acquires(lock) { struct rq_wait *rqw = get_rq_wait(rwb, wb_acct); - DECLARE_WAITQUEUE(wait, current); + struct wbt_wait_data data = { + .wq = { + .func = wbt_wake_function, + .entry = LIST_HEAD_INIT(data.wq.entry), + }, + .task = current, + .rwb = rwb, + .rqw = rqw, + .rw = rw, + }; bool has_sleeper; has_sleeper = wq_has_sleeper(&rqw->wait); if (!has_sleeper && atomic_inc_below(&rqw->inflight, get_limit(rwb, rw))) return; - add_wait_queue_exclusive(&rqw->wait, &wait); + prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE); do { - set_current_state(TASK_UNINTERRUPTIBLE); + if (data.got_token) + break; - if (!has_sleeper && atomic_inc_below(&rqw->inflight, get_limit(rwb, rw))) + if (!has_sleeper && + atomic_inc_below(&rqw->inflight, get_limit(rwb, rw))) { + finish_wait(&rqw->wait, &data.wq); + + /* + * We raced with wbt_wake_function() getting a token, + * which means we now have two. Put our local token + * and wake anyone else potentially waiting for one. + */ + if (data.got_token) + wbt_rqw_done(rwb, rqw, wb_acct); break; + } if (lock) { spin_unlock_irq(lock); @@ -577,11 +626,11 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct, spin_lock_irq(lock); } else io_schedule(); + has_sleeper = false; } while (1); - __set_current_state(TASK_RUNNING); - remove_wait_queue(&rqw->wait, &wait); + finish_wait(&rqw->wait, &data.wq); } static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)