From patchwork Tue Jun 6 10:28:23 2023
X-Patchwork-Submitter: Gerald Yang
X-Patchwork-Id: 1791051
From: Gerald Yang
To: kernel-team@lists.ubuntu.com
Subject: [SRU][K][PATCH 1/6] sbitmap: fix possible io hung due to lost wakeup
Date: Tue, 6 Jun 2023 18:28:23 +0800
Message-Id: <20230606102828.218014-2-gerald.yang@canonical.com>
In-Reply-To: <20230606102828.218014-1-gerald.yang@canonical.com>
References: <20230606102828.218014-1-gerald.yang@canonical.com>

From: Yu Kuai

BugLink: https://bugs.launchpad.net/bugs/2022318

There are two problems that can lead to lost wakeups:

1) Invalid wakeup on the wrong waitqueue.

For example, 2 * wake_batch tags are put, while only wake_batch threads
are woken:

__sbq_wake_up
 atomic_cmpxchg -> reset wait_cnt
			__sbq_wake_up -> decrease wait_cnt
			...
			__sbq_wake_up -> wait_cnt is decreased to 0 again
			 atomic_cmpxchg
			 sbq_index_atomic_inc -> increase wake_index
			 wake_up_nr -> wake up and waitqueue might be empty
 sbq_index_atomic_inc -> increase again, one waitqueue is skipped
 wake_up_nr -> invalid wake up because old waitqueue might be empty

To fix the problem, increase 'wake_index' before resetting 'wait_cnt'.
2) 'wait_cnt' can be decreased while the waitqueue is empty.

As pointed out by Jan Kara, the following race is possible:

CPU1				CPU2
__sbq_wake_up			__sbq_wake_up
 sbq_wake_ptr()			 sbq_wake_ptr() -> the same
 wait_cnt = atomic_dec_return()
 /* decreased to 0 */		 sbq_index_atomic_inc()
				 /* move to next waitqueue */
				 atomic_set()
				 /* reset wait_cnt */
				 wake_up_nr()
				 /* wake up on the old waitqueue */
wait_cnt = atomic_dec_return()
/*
 * decrease wait_cnt in the old
 * waitqueue, while it can be
 * empty.
 */

Fix the problem by waking up before updating 'wake_index' and
'wait_cnt'. With this patch, note that 'wait_cnt' is still decreased
in the old empty waitqueue; however, the wakeup is redirected to an
active waitqueue, and the extra decrement on the old empty waitqueue
is not handled.

Fixes: 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library")
Signed-off-by: Yu Kuai
Reviewed-by: Jan Kara
Link: https://lore.kernel.org/r/20220803121504.212071-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe
(cherry picked from commit 040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4)
Signed-off-by: Gerald Yang
---
 lib/sbitmap.c | 55 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 29eb0484215a..1f31147872e6 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -611,32 +611,43 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
 		return false;
 
 	wait_cnt = atomic_dec_return(&ws->wait_cnt);
-	if (wait_cnt <= 0) {
-		int ret;
+	/*
+	 * For concurrent callers of this, callers should call this function
+	 * again to wakeup a new batch on a different 'ws'.
+	 */
+	if (wait_cnt < 0 || !waitqueue_active(&ws->wait))
+		return true;
 
-		wake_batch = READ_ONCE(sbq->wake_batch);
+	if (wait_cnt > 0)
+		return false;
 
-		/*
-		 * Pairs with the memory barrier in sbitmap_queue_resize() to
-		 * ensure that we see the batch size update before the wait
-		 * count is reset.
-		 */
-		smp_mb__before_atomic();
+	wake_batch = READ_ONCE(sbq->wake_batch);
 
-		/*
-		 * For concurrent callers of this, the one that failed the
-		 * atomic_cmpxhcg() race should call this function again
-		 * to wakeup a new batch on a different 'ws'.
-		 */
-		ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
-		if (ret == wait_cnt) {
-			sbq_index_atomic_inc(&sbq->wake_index);
-			wake_up_nr(&ws->wait, wake_batch);
-			return false;
-		}
+	/*
+	 * Wake up first in case that concurrent callers decrease wait_cnt
+	 * while waitqueue is empty.
+	 */
+	wake_up_nr(&ws->wait, wake_batch);
 
-		return true;
-	}
+	/*
+	 * Pairs with the memory barrier in sbitmap_queue_resize() to
+	 * ensure that we see the batch size update before the wait
+	 * count is reset.
+	 *
+	 * Also pairs with the implicit barrier between decrementing wait_cnt
+	 * and checking for waitqueue_active() to make sure waitqueue_active()
+	 * sees result of the wakeup if atomic_dec_return() has seen the result
+	 * of atomic_set().
+	 */
+	smp_mb__before_atomic();
+
+	/*
+	 * Increase wake_index before updating wait_cnt, otherwise concurrent
+	 * callers can see valid wait_cnt in old waitqueue, which can cause
+	 * invalid wakeup on the old waitqueue.
+	 */
+	sbq_index_atomic_inc(&sbq->wake_index);
+	atomic_set(&ws->wait_cnt, wake_batch);
 
 	return false;
 }