From patchwork Mon Jun 5 05:46:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gerald Yang
X-Patchwork-Id: 1790222
From: Gerald Yang
To: kernel-team@lists.ubuntu.com
Subject: [SRU][K][PATCH 8/8] sbitmap: fix lockup while swapping
Date: Mon, 5 Jun 2023 13:46:01 +0800
Message-Id: <20230605054601.1410517-9-gerald.yang@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230605054601.1410517-1-gerald.yang@canonical.com>
References: <20230605054601.1410517-1-gerald.yang@canonical.com>

From: Hugh Dickins

Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting") is a
big improvement: without it, I had to revert to before commit
040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup") to
avoid the high system time and freezes which that had introduced.

Now okay on the NVMe laptop, but 4acb83417cad is a disaster for heavy
swapping (kernel builds in low memory) on another machine: it soon locks
up in sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined),
cycling around with waitqueue_active() but wait_cnt 0.

Here is a backtrace, showing the common pattern of the outer
sbitmap_queue_wake_up() interrupted before setting wait_cnt 0 back to
wake_batch (in some cases other CPUs are idle, in other cases they're
spinning for a lock in dd_bio_merge()):

sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
__blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request <
scsi_end_request < scsi_io_completion < scsi_finish_command <
scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq <
__irq_exit_rcu < irq_exit_rcu < common_interrupt <
asm_common_interrupt < _raw_spin_unlock_irqrestore <
__wake_up_common_lock < __wake_up < sbitmap_queue_wake_up <
sbitmap_queue_clear < blk_mq_put_tag < __blk_mq_free_request <
blk_mq_free_request < dd_bio_merge < blk_mq_sched_bio_merge <
blk_mq_attempt_bio_merge < blk_mq_submit_bio < __submit_bio <
submit_bio_noacct_nocheck < submit_bio_noacct < submit_bio <
__swap_writepage < swap_writepage < pageout < shrink_folio_list <
evict_folios < lru_gen_shrink_lruvec < shrink_lruvec < shrink_node <
do_try_to_free_pages < try_to_free_pages < __alloc_pages_slowpath <
__alloc_pages < folio_alloc < vma_alloc_folio < do_anonymous_page <
__handle_mm_fault < handle_mm_fault < do_user_addr_fault <
exc_page_fault < asm_exc_page_fault

See how the process-context sbitmap_queue_wake_up() has been
interrupted after bringing wait_cnt down to 0 (and in this example,
after doing its wakeups), but before advancing wake_index and
refilling wait_cnt: an interrupt-context sbitmap_queue_wake_up() of
the same sbq then gets stuck.
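To make that interleaving concrete, here is a minimal userspace model
of the stuck state. It is not kernel code: ws_model, wake_ptr_old and
wake_ptr_new are illustrative stand-ins for sbq->ws[], the old and new
sbq_wake_ptr() checks. It assumes a single populated waitqueue whose
refill was interrupted, so waitqueue_active() is still true but
wait_cnt has been driven to 0 and not yet reset to wake_batch:

/*
 * Userspace sketch of the lockup.  All names are illustrative,
 * not kernel API.  Build with: cc -std=c11 sketch.c
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define SBQ_WAIT_QUEUES 8

struct ws_model {
	bool active;            /* models waitqueue_active(&ws->wait) */
	atomic_int wait_cnt;    /* models ws->wait_cnt */
};

static struct ws_model ws[SBQ_WAIT_QUEUES];

/* Pre-patch selection: returns any queue with waiters,
 * even one whose wait_cnt is already 0. */
static struct ws_model *wake_ptr_old(void)
{
	for (int i = 0; i < SBQ_WAIT_QUEUES; i++)
		if (ws[i].active)
			return &ws[i];
	return NULL;
}

/* Patched selection: additionally requires a non-zero wait_cnt. */
static struct ws_model *wake_ptr_new(void)
{
	for (int i = 0; i < SBQ_WAIT_QUEUES; i++)
		if (ws[i].active && atomic_load(&ws[i].wait_cnt))
			return &ws[i];
	return NULL;
}

int main(void)
{
	/* State left behind by the interrupted process-context wakeup:
	 * waiters present, wait_cnt stranded at 0. */
	ws[0].active = true;
	atomic_store(&ws[0].wait_cnt, 0);

	/* The interrupt-context wakeup retries while it gets a ws back.
	 * With the old check it gets ws[0] forever and makes no progress
	 * (the real kernel spins here); with the new check it gets NULL
	 * and returns, breaking the lockup. */
	printf("old check: %s\n", wake_ptr_old() ? "returns ws[0], spins" : "NULL");
	printf("new check: %s\n", wake_ptr_new() ? "returns ws[0], spins" : "NULL, returns");
	return 0;
}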
I have almost no grasp of all the possible sbitmap races, and their
consequences: but __sbq_wake_up() can do nothing useful while wait_cnt
is 0, so it is better if sbq_wake_ptr() skips on to the next ws in that
case: which fixes the lockup and shows no adverse consequence for me.

The check for wait_cnt being 0 is obviously racy, and ultimately can
lead to lost wakeups: for example, when there is only a single
waitqueue with waiters. However, lost wakeups are unlikely to matter in
these cases, and a proper fix requires redesign (and benchmarking) of
the batched wakeup code: so let's plug the hole with this bandaid for
now.

Signed-off-by: Hugh Dickins
Reviewed-by: Jan Kara
Reviewed-by: Keith Busch
Link: https://lore.kernel.org/r/9c2038a7-cdc5-5ee-854c-fbc6168bf16@google.com
Signed-off-by: Jens Axboe
Signed-off-by: Gerald Yang
---
 lib/sbitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 624fa7f118d1..a8108a962dfd 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -587,7 +587,7 @@ static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
 	for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
 		struct sbq_wait_state *ws = &sbq->ws[wake_index];
 
-		if (waitqueue_active(&ws->wait)) {
+		if (waitqueue_active(&ws->wait) && atomic_read(&ws->wait_cnt)) {
 			if (wake_index != atomic_read(&sbq->wake_index))
 				atomic_set(&sbq->wake_index, wake_index);
 			return ws;
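For reference, here is how the selection loop reads with the change
applied. This is a sketch reconstructed from the hunk context above:
the ws_active early-out and the sbq_index_inc() advance are assumptions
based on the surrounding upstream lib/sbitmap.c of this era and are not
part of this diff, so check them against the target tree.

/*
 * Sketch of sbq_wake_ptr() after this patch; reconstructed, not the
 * verbatim tree.
 */
static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
{
	int i, wake_index;

	/* No one is waiting on any queue: nothing to wake. */
	if (!atomic_read(&sbq->ws_active))
		return NULL;

	wake_index = atomic_read(&sbq->wake_index);
	for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
		struct sbq_wait_state *ws = &sbq->ws[wake_index];

		/*
		 * New with this patch: also require a non-zero wait_cnt,
		 * so a queue stranded at 0 by an interrupted wakeup is
		 * skipped instead of being handed back forever.
		 */
		if (waitqueue_active(&ws->wait) && atomic_read(&ws->wait_cnt)) {
			if (wake_index != atomic_read(&sbq->wake_index))
				atomic_set(&sbq->wake_index, wake_index);
			return ws;
		}

		/* Advance round-robin to the next wait queue. */
		wake_index = sbq_index_inc(wake_index);
	}

	return NULL;
}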