From patchwork Mon Jun 5 05:46:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gerald Yang
X-Patchwork-Id: 1790222
From: Gerald Yang
To: kernel-team@lists.ubuntu.com
Subject: [SRU][K][PATCH 8/8] sbitmap: fix lockup while swapping
Date: Mon, 5 Jun 2023 13:46:01 +0800
Message-Id: <20230605054601.1410517-9-gerald.yang@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230605054601.1410517-1-gerald.yang@canonical.com>
References: <20230605054601.1410517-1-gerald.yang@canonical.com>

From: Hugh Dickins

Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting") is a
big improvement: without it, I had to revert to before commit
040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup") to
avoid the high system time and freezes which that had introduced.

Now okay on the NVMe laptop, but 4acb83417cad is a disaster for heavy
swapping (kernel builds in low memory) on another machine: it soon locks
up in sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined),
cycling around with waitqueue_active() but wait_cnt 0.

Here is a backtrace, showing the common pattern of the outer
sbitmap_queue_wake_up() interrupted before setting wait_cnt 0 back to
wake_batch (in some cases other CPUs are idle, in other cases they're
spinning for a lock in dd_bio_merge()):

sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
__blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request <
scsi_end_request < scsi_io_completion < scsi_finish_command <
scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq <
__irq_exit_rcu < irq_exit_rcu < common_interrupt <
asm_common_interrupt < _raw_spin_unlock_irqrestore <
__wake_up_common_lock < __wake_up < sbitmap_queue_wake_up <
sbitmap_queue_clear < blk_mq_put_tag < __blk_mq_free_request <
blk_mq_free_request < dd_bio_merge < blk_mq_sched_bio_merge <
blk_mq_attempt_bio_merge < blk_mq_submit_bio < __submit_bio <
submit_bio_noacct_nocheck < submit_bio_noacct < submit_bio <
__swap_writepage < swap_writepage < pageout < shrink_folio_list <
evict_folios < lru_gen_shrink_lruvec < shrink_lruvec < shrink_node <
do_try_to_free_pages < try_to_free_pages < __alloc_pages_slowpath <
__alloc_pages < folio_alloc < vma_alloc_folio < do_anonymous_page <
__handle_mm_fault < handle_mm_fault < do_user_addr_fault <
exc_page_fault < asm_exc_page_fault

See how the process-context sbitmap_queue_wake_up() has been
interrupted after bringing wait_cnt down to 0 (and in this example,
after doing its wakeups), but before advancing wake_index and
refilling wait_cnt: an interrupt-context sbitmap_queue_wake_up() of
the same sbq then gets stuck.
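To make that interleaving concrete, here is a minimal userspace model
of the stuck state. It is not kernel code: ws_model, wake_ptr_old and
wake_ptr_new are illustrative stand-ins for sbq->ws[], the old and new
sbq_wake_ptr() checks. It assumes a single populated waitqueue whose
refill was interrupted, so waitqueue_active() is still true but
wait_cnt has been driven to 0 and not yet reset to wake_batch:

/*
 * Userspace sketch of the lockup.  All names are illustrative,
 * not kernel API.  Build with: cc -std=c11 sketch.c
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define SBQ_WAIT_QUEUES 8

struct ws_model {
	bool active;            /* models waitqueue_active(&ws->wait) */
	atomic_int wait_cnt;    /* models ws->wait_cnt */
};

static struct ws_model ws[SBQ_WAIT_QUEUES];

/* Pre-patch selection: returns any queue with waiters,
 * even one whose wait_cnt is already 0. */
static struct ws_model *wake_ptr_old(void)
{
	for (int i = 0; i < SBQ_WAIT_QUEUES; i++)
		if (ws[i].active)
			return &ws[i];
	return NULL;
}

/* Patched selection: additionally requires a non-zero wait_cnt. */
static struct ws_model *wake_ptr_new(void)
{
	for (int i = 0; i < SBQ_WAIT_QUEUES; i++)
		if (ws[i].active && atomic_load(&ws[i].wait_cnt))
			return &ws[i];
	return NULL;
}

int main(void)
{
	/* State left behind by the interrupted process-context wakeup:
	 * waiters present, wait_cnt stranded at 0. */
	ws[0].active = true;
	atomic_store(&ws[0].wait_cnt, 0);

	/* The interrupt-context wakeup retries while it gets a ws back.
	 * With the old check it gets ws[0] forever and makes no progress
	 * (the real kernel spins here); with the new check it gets NULL
	 * and returns, breaking the lockup. */
	printf("old check: %s\n", wake_ptr_old() ? "returns ws[0], spins" : "NULL");
	printf("new check: %s\n", wake_ptr_new() ? "returns ws[0], spins" : "NULL, returns");
	return 0;
}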
I have almost no grasp of all the possible sbitmap races, and their
consequences: but __sbq_wake_up() can do nothing useful while wait_cnt
is 0, so it is better if sbq_wake_ptr() skips on to the next ws in that
case: which fixes the lockup and shows no adverse consequence for me.

The check for wait_cnt being 0 is obviously racy, and ultimately can
lead to lost wakeups: for example, when there is only a single
waitqueue with waiters. However, lost wakeups are unlikely to matter in
these cases, and a proper fix requires redesign (and benchmarking) of
the batched wakeup code: so let's plug the hole with this bandaid for
now.

Signed-off-by: Hugh Dickins
Reviewed-by: Jan Kara
Reviewed-by: Keith Busch
Link: https://lore.kernel.org/r/9c2038a7-cdc5-5ee-854c-fbc6168bf16@google.com
Signed-off-by: Jens Axboe
Signed-off-by: Gerald Yang
---
 lib/sbitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 624fa7f118d1..a8108a962dfd 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -587,7 +587,7 @@ static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
 	for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
 		struct sbq_wait_state *ws = &sbq->ws[wake_index];
 
-		if (waitqueue_active(&ws->wait)) {
+		if (waitqueue_active(&ws->wait) && atomic_read(&ws->wait_cnt)) {
 			if (wake_index != atomic_read(&sbq->wake_index))
 				atomic_set(&sbq->wake_index, wake_index);
 			return ws;
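For reference, here is how the selection loop reads with the change
applied. This is a sketch reconstructed from the hunk context above:
the ws_active early-out and the sbq_index_inc() advance are assumptions
based on the surrounding upstream lib/sbitmap.c of this era and are not
part of this diff, so check them against the target tree.

/*
 * Sketch of sbq_wake_ptr() after this patch; reconstructed, not the
 * verbatim tree.
 */
static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
{
	int i, wake_index;

	/* No one is waiting on any queue: nothing to wake. */
	if (!atomic_read(&sbq->ws_active))
		return NULL;

	wake_index = atomic_read(&sbq->wake_index);
	for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
		struct sbq_wait_state *ws = &sbq->ws[wake_index];

		/*
		 * New with this patch: also require a non-zero wait_cnt,
		 * so a queue stranded at 0 by an interrupted wakeup is
		 * skipped instead of being handed back forever.
		 */
		if (waitqueue_active(&ws->wait) && atomic_read(&ws->wait_cnt)) {
			if (wake_index != atomic_read(&sbq->wake_index))
				atomic_set(&sbq->wake_index, wake_index);
			return ws;
		}

		/* Advance round-robin to the next wait queue. */
		wake_index = sbq_index_inc(wake_index);
	}

	return NULL;
}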