[{"id":3668953,"web_url":"http://patchwork.ozlabs.org/comment/3668953/","msgid":"<B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>","list_archive_url":null,"date":"2026-03-25T10:15:42","subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","submitter":{"id":4514,"url":"http://patchwork.ozlabs.org/api/people/4514/","name":"Andreas Dilger","email":"adilger@dilger.ca"},"content":"On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:\n> \n> From: Diangang Li <lidiangang@bytedance.com>\n> \n> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read fails,\n> the buffer remains !Uptodate. With concurrent callers, each waiter can\n> retry the same failing read after the previous holder drops BH_Lock. This\n> amplifies device retry latency and may trigger hung tasks.\n> \n> In the normal read path the block driver already performs its own retries.\n> Once the retries keep failing, re-submitting the same metadata read from\n> the filesystem just amplifies the latency by serializing waiters on\n> BH_Lock.\n> \n> Remember read failures on buffer_head and fail fast for ext4 metadata reads\n> once a buffer has already failed to read. Clear the flag on successful\n> read/write completion so the buffer can recover. ext4 read-ahead uses\n> ext4_read_bh_nowait(), so it does not set the failure flag and remains\n> best-effort.\n\nNot that the patch is bad, but if the BH_Read_EIO flag is set on a buffer\nand it prevents other tasks from reading that block again, how would the\nbuffer ever become Uptodate to clear the flag?  
There isn't enough state\nin a 1-bit flag to have any kind of expiry and later retry.\n\nCheers, Andreas","headers":{"Return-Path":"\n <SRS0=bW4J=BZ=vger.kernel.org=linux-ext4+bounces-15362-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=dilger-ca.20230601.gappssmtp.com\n header.i=@dilger-ca.20230601.gappssmtp.com header.a=rsa-sha256\n header.s=20230601 header.b=KZfZlDPw;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=150.107.74.76; helo=mail.ozlabs.org;\n envelope-from=srs0=bw4j=bz=vger.kernel.org=linux-ext4+bounces-15362-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)","gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=\"2600:3c09:e001:a7::12fc:5321\"\n arc.chain=subspace.kernel.org","gandalf.ozlabs.org;\n dmarc=none (p=none dis=none) header.from=dilger.ca","gandalf.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=dilger-ca.20230601.gappssmtp.com\n header.i=@dilger-ca.20230601.gappssmtp.com header.a=rsa-sha256\n header.s=20230601 header.b=KZfZlDPw;\n\tdkim-atps=neutral","gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c09:e001:a7::12fc:5321; helo=sto.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15362-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=dilger-ca.20230601.gappssmtp.com\n header.i=@dilger-ca.20230601.gappssmtp.com header.b=\"KZfZlDPw\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=209.85.210.179","smtp.subspace.kernel.org;\n dmarc=none (p=none dis=none) header.from=dilger.ca","smtp.subspace.kernel.org;\n spf=pass 
smtp.mailfrom=dilger.ca"],"Received":["from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1 raw public key)\n server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fgjbQ4lS4z1xy1\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 25 Mar 2026 21:20:33 +1100 (AEDT)","from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fgjbP4F99z4wM0\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 25 Mar 2026 21:20:33 +1100 (AEDT)","by gandalf.ozlabs.org (Postfix)\n\tid 4fgjbP4819z4wHf; Wed, 25 Mar 2026 21:20:33 +1100 (AEDT)","from sto.lore.kernel.org (sto.lore.kernel.org\n [IPv6:2600:3c09:e001:a7::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fgjbK4v5wz4wM0\n\tfor <patchwork-incoming@ozlabs.org>; Wed, 25 Mar 2026 21:20:29 +1100 (AEDT)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sto.lore.kernel.org (Postfix) with ESMTP id 1C39A3088B39\n\tfor <patchwork-incoming@ozlabs.org>; Wed, 25 Mar 2026 10:16:58 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id E06F33B3892;\n\tWed, 25 Mar 2026 10:15:56 +0000 (UTC)","from mail-pf1-f179.google.com (mail-pf1-f179.google.com\n [209.85.210.179])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id E0D023AEF50\n\tfor <linux-ext4@vger.kernel.org>; Wed, 25 Mar 2026 10:15:54 +0000 (UTC)","by mail-pf1-f179.google.com with SMTP id\n d2e1a72fcca58-82c2239140aso1835609b3a.0\n        for <linux-ext4@vger.kernel.org>;\n Wed, 25 Mar 2026 03:15:54 
-0700 (PDT)","from smtpclient.apple (S01068c763f81ca4b.cg.shawcable.net.\n [70.77.200.158])\n        by smtp.gmail.com with ESMTPSA id\n d2e1a72fcca58-82b03bc1e59sm15996602b3a.13.2026.03.25.03.15.53\n        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);\n        Wed, 25 Mar 2026 03:15:53 -0700 (PDT)"],"ARC-Seal":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707; t=1774434033; cv=pass;\n\tb=U2EXAJQyVeUwa24EAp1D/2e5+Ld9T42V/hUFPnvFDYv0Wi5Lyvtb1u/dt2w0nt9ghIgZlLxujXyLz9x5j9bTKDBeeVKKFLjMVZGmI1M+boxaUJfXs+g2CjOKW4J9EiZhaxM9ho2eRUNJEZXCdMNL89gzvqSffSN4m2M1m5OkDaOHXPEGsPW+MRILaxoQoGnGzYpSkDlWDxs+37JNSW2aa60v86MlrH/WnuKn8SbCSUcu/LLjjvW4Lb49Q5x6jeDWbEwmBD1BosG0DjnZkvERSnSXd2jR1OXSNpaRnwpm+PRXZj0D8NMTby+kFG9L/W+vCWs7GBPfu6+Y1gBA8krUhg==","i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1774433756; cv=none;\n b=gShJW6gTLY7OfBBoDBTs53MsWV0Ve6qyR2GdViN7uDXlspFXqkMGvnicZEmk/M2OZa9dt5A9kH3pHV7H/acTlH2+40dD92jAoABmNSx4zH91vQdOM/oFOImV4/bQjS6EUuD88+yTWsuOK8z4RFEoLIJlXdhg8oaU9uz4GTDT0DU="],"ARC-Message-Signature":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1774434033; c=relaxed/relaxed;\n\tbh=mAbulfAiOEAPrh906GbXLOqJOMJAMSMFlPNota4Sbhc=;\n\th=Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc:\n\t Message-Id:References:To;\n b=C3ZKuZprPQ9Myzg7wvQRVNxegDr7AjHzsLJNqLxYgWjqePdXD7Zv8jizK/dcEh/UTpL1JAOdsFG1CDFIc0dau1HG0XU1Ov0FgefuL5/+91vS/a70X/9fMgbnp0yYBZf3XeYp2U879HxL/Ub8g0sVtKRbOya0wZvyCFu5P3P+js60ojwoX6PM3SYY1vbRFWtX7OYdjexgQpajxBtkV5e1i6/e8f8sm5F1TxMUds9If18GB4eCjzBz7rqo8pmwavI7mTz4uu1458JNlDZ4Cto37LQgyk0qULQbUedy7YWczg1WZVgdBlyhgRwgki+OuM8OYPNTJFDXAm9Ra2jtCEkVAw==","i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1774433756; c=relaxed/simple;\n\tbh=dTbxFPey7xSpEexAbeZMQA6ed4XE8QfZT3WtybmGSMY=;\n\th=Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc:\n\t Message-Id:References:To;\n 
b=mw5TBMrL842TMz7aOgom/X6Knm8Vd2gihlW2YmqTopVKWPVwg4r3zJgywSWWtnMv7AqGiNkdIvgSUpe5Fp6IuzA3/w0j8y/1kQ9WWtVXtpUxn9oW2V5biFN/KAcxtQIyy4fVgA8/aYTZWlkW7WegG1Kyet/ojeM7lKa5njZsRrs="],"ARC-Authentication-Results":["i=2; gandalf.ozlabs.org;\n dmarc=none (p=none dis=none) header.from=dilger.ca; dkim=pass (2048-bit key;\n unprotected) header.d=dilger-ca.20230601.gappssmtp.com\n header.i=@dilger-ca.20230601.gappssmtp.com header.a=rsa-sha256\n header.s=20230601 header.b=KZfZlDPw; dkim-atps=neutral;\n spf=pass (client-ip=2600:3c09:e001:a7::12fc:5321; helo=sto.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15362-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org","i=1; smtp.subspace.kernel.org;\n dmarc=none (p=none dis=none) header.from=dilger.ca;\n spf=pass smtp.mailfrom=dilger.ca;\n dkim=pass (2048-bit key) header.d=dilger-ca.20230601.gappssmtp.com\n header.i=@dilger-ca.20230601.gappssmtp.com header.b=KZfZlDPw;\n arc=none smtp.client-ip=209.85.210.179"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n        d=dilger-ca.20230601.gappssmtp.com; s=20230601; t=1774433754;\n x=1775038554; darn=vger.kernel.org;\n        h=to:references:message-id:content-transfer-encoding:cc:date\n         :in-reply-to:from:subject:mime-version:from:to:cc:subject:date\n         :message-id:reply-to;\n        bh=mAbulfAiOEAPrh906GbXLOqJOMJAMSMFlPNota4Sbhc=;\n        b=KZfZlDPw0qnVjy/dNdph6mtgtzPAy8k4qrpyhZC4GsDW6UNt6FUjA1O207xSlcZyNU\n         2e0TSTM48MDmZ9s53fRvrHXIDnuQHs/mjdegBE1q1CbA6cxfH6KstCVKlLrjMDsdoieC\n         XcXcraLctLcPyP9hVOLoXKztjng9dUS5cpa6jVqdRyKyfdWvyI0u41BgVluYvv7FE3kU\n         Cafyqb+R/EcmP4eDjhC5FtoifCUexYsbHTwtZmQszYgYYefN2epoMQEDmYPdj9TfCHFr\n         0eq5fvdOQM8GNF181Ep1Ttun0pP4O/GnYTJ+EPWf+Is5w6LMggZOnzDx0RvFHlnzj7Ae\n         lRpg==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n        d=1e100.net; s=20251104; t=1774433754; x=1775038554;\n        
h=to:references:message-id:content-transfer-encoding:cc:date\n         :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state\n         :from:to:cc:subject:date:message-id:reply-to;\n        bh=mAbulfAiOEAPrh906GbXLOqJOMJAMSMFlPNota4Sbhc=;\n        b=KYEWfClAErd5qS/18x5XgOKqdxrH6RRcXZIk57YK+sYLDkznAPQUGNAFkorB1wX/OR\n         h+7+Z3RICueWKLNnOI6Bg2EEvhb3sUwIbVqkiOfuZV/LxhWG7cO8fbXMB+xeoXcjEasl\n         5dPFppUdTxLRif/Iqic4/p2G/PcmFehX849lYRl852GB9Z8inhpKGGCdgpZaYwb+DZ4A\n         Vcdjep7Dxu/fR6ZNrMrLJ29xKRnk+Z0lU0nlONx8cMWHBYBpBHXngbOjprAf/MRDXNy0\n         HoawvTS0iyrsd59anJMj9nt4r616Yo8R31/nb7HPzm4AU7DyOXLYESFdb9rBfsjGQZBR\n         2g6A==","X-Forwarded-Encrypted":"i=1;\n AJvYcCUSsUh17Pxf0T3JKho0yMuKZGMwEZ9RPne3txdrpg9eg9xcCao1rV0U+7LiAS0P+677Y35b+xL8Gvbi@vger.kernel.org","X-Gm-Message-State":"AOJu0YyhSESL+i7kkr3zP+YEsyjATmPvSEX7KdO8GNqWokwhQgftMRJg\n\tP4UZ3+vfK5qEDoLmDPrc0fBfq7En8m3RLvmcSXbcI27aVoRk9Dy3Sa53FEGaPOQg92lBM1Qh/pS\n\tJOQa6t3c=","X-Gm-Gg":"ATEYQzwjvRQfP1nmRmJJO1icNmDUHpb82K/BLSNwujQQXZMINwFg7Ik/IC63TOwfmUI\n\txmJmSH8R+gUNzuDsoS7NHWW+P9LrQD6VdsnMr8NQB/CcFzUneIyw2UtdSbS19GPtrh4/DXZNO4N\n\tJoQ6VYvsztBnNCvb+Lw1XcREDbvp/aisB2SMlIroxEqJnjZGs4XjlzJLSqKm+OUf1VdvIqLQn6+\n\tMMcGLKUxMPlllkYN3uQWVIDzbCiGPCh3EQG4QaC4/QwpbP3KxqsYFeNw97dBdS46FQxCdVYIxkZ\n\tzZUYJ3Edh/1C1GOBCNmDvQq29lCfCxUTaDXovOZXDEb0hPoYUGGzXwTcAFBQoFCnAVSuxAsfFnV\n\tGgkpo08X1iZcFF5sRiPUkFNo6RD7C2pO04LTwugePeGnfqI7PjCqjfMOnTqXwOmjgzIhs1eZ2Av\n\tieoQMROdn8X5+PExbVL6VoKcu8jja0m7odA3Xe9vJdaW/cZalrxtHXMgHT8YSO/ZxHJG1BhnEb0\n\tJDQ4A==","X-Received":"by 2002:a05:6a00:4398:b0:829:800b:9fe with SMTP id\n d2e1a72fcca58-82c6dfb397emr2776396b3a.39.1774433754335;\n        Wed, 25 Mar 2026 03:15:54 -0700 
(PDT)","Content-Type":"text/plain;\n\tcharset=us-ascii","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","Mime-Version":"1.0 (Mac OS X Mail 16.0 \\(3864.100.1.1.5\\))","Subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","From":"Andreas Dilger <adilger@dilger.ca>","In-Reply-To":"<20260325093349.630193-2-diangangli@gmail.com>","Date":"Wed, 25 Mar 2026 04:15:42 -0600","Cc":"tytso@mit.edu,\n linux-ext4@vger.kernel.org,\n linux-fsdevel@vger.kernel.org,\n linux-kernel@vger.kernel.org,\n changfengnan@bytedance.com,\n Diangang Li <lidiangang@bytedance.com>","Content-Transfer-Encoding":"7bit","Message-Id":"<B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>","References":"<20260325093349.630193-1-diangangli@gmail.com>\n <20260325093349.630193-2-diangangli@gmail.com>","To":"Diangang Li <diangangli@gmail.com>","X-Mailer":"Apple Mail (2.3864.100.1.1.5)","X-Spam-Status":"No, score=-1.1 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DMARC_MISSING,HEADER_FROM_DIFFERENT_DOMAINS,\n\tMAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=disabled\n\tversion=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}},{"id":3669006,"web_url":"http://patchwork.ozlabs.org/comment/3669006/","msgid":"<c6f4b982-c6e4-4f77-a16d-0c381c1e25f0@bytedance.com>","list_archive_url":null,"date":"2026-03-25T11:13:21","subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","submitter":{"id":90455,"url":"http://patchwork.ozlabs.org/api/people/90455/","name":"Diangang Li","email":"lidiangang@bytedance.com"},"content":"Hi Andreas,\n\nBH_Read_EIO is cleared on successful read or write.\n\nIn practice bad blocks are typically repaired/remapped on write, so we\nexpect recovery after a successful rewrite. 
If the block is never\nrewritten, repeatedly issuing the same failing read does not help.\n\nWe clear the flag on successful reads so the buffer can recover \nimmediately if the error was transient. Since read-ahead reads are not \nblocked, a later successful read-ahead will clear the flag and allow \nsubsequent synchronous readers to proceed normally.\n\nBest,\nDiangang\n\nOn 3/25/26 6:15 PM, Andreas Dilger wrote:\n> On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:\n>>\n>> From: Diangang Li <lidiangang@bytedance.com>\n>>\n>> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read fails,\n>> the buffer remains !Uptodate. With concurrent callers, each waiter can\n>> retry the same failing read after the previous holder drops BH_Lock. This\n>> amplifies device retry latency and may trigger hung tasks.\n>>\n>> In the normal read path the block driver already performs its own retries.\n>> Once the retries keep failing, re-submitting the same metadata read from\n>> the filesystem just amplifies the latency by serializing waiters on\n>> BH_Lock.\n>>\n>> Remember read failures on buffer_head and fail fast for ext4 metadata reads\n>> once a buffer has already failed to read. Clear the flag on successful\n>> read/write completion so the buffer can recover. ext4 read-ahead uses\n>> ext4_read_bh_nowait(), so it does not set the failure flag and remains\n>> best-effort.\n> \n> Not that the patch is bad, but if the BH_Read_EIO flag is set on a buffer\n> and it prevents other tasks from reading that block again, how would the\n> buffer ever become Uptodate to clear the flag?  
There isn't enough state\n> in a 1-bit flag to have any kind of expiry and later retry.\n> \n> Cheers, Andreas","headers":{"Return-Path":"\n <SRS0=okxo=BZ=vger.kernel.org=linux-ext4+bounces-15365-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=bytedance.com header.i=@bytedance.com\n header.a=rsa-sha256 header.s=2212171451 header.b=lp7VoRpX;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=2404:9400:2221:ea00::3; helo=mail.ozlabs.org;\n envelope-from=srs0=okxo=bz=vger.kernel.org=linux-ext4+bounces-15365-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)","gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=172.105.105.114 arc.chain=subspace.kernel.org","gandalf.ozlabs.org;\n dmarc=pass (p=quarantine dis=none) header.from=bytedance.com","gandalf.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=bytedance.com header.i=@bytedance.com\n header.a=rsa-sha256 header.s=2212171451 header.b=lp7VoRpX;\n\tdkim-atps=neutral","gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=172.105.105.114; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15365-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com\n header.b=\"lp7VoRpX\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=209.127.231.115","smtp.subspace.kernel.org;\n dmarc=pass (p=quarantine dis=none) header.from=bytedance.com","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=bytedance.com"],"Received":["from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\t(using TLSv1.3 
with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fgl123tZ3z1xy1\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 25 Mar 2026 22:24:21 +1100 (AEDT)","from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fgl113X7Jz4wSJ\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 25 Mar 2026 22:24:21 +1100 (AEDT)","by gandalf.ozlabs.org (Postfix)\n\tid 4fgl113RLhz4wSK; Wed, 25 Mar 2026 22:24:21 +1100 (AEDT)","from tor.lore.kernel.org (tor.lore.kernel.org [172.105.105.114])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fgl0y03jyz4wSJ\n\tfor <patchwork-incoming@ozlabs.org>; Wed, 25 Mar 2026 22:24:17 +1100 (AEDT)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby tor.lore.kernel.org (Postfix) with ESMTP id ABE0D30F775B\n\tfor <patchwork-incoming@ozlabs.org>; Wed, 25 Mar 2026 11:13:52 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 176303CB2CA;\n\tWed, 25 Mar 2026 11:13:50 +0000 (UTC)","from va-2-115.ptr.blmpb.com (va-2-115.ptr.blmpb.com\n [209.127.231.115])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 0BC87285041\n\tfor <linux-ext4@vger.kernel.org>; Wed, 25 Mar 2026 11:13:46 +0000 (UTC)"],"ARC-Seal":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707; t=1774437861; 
cv=pass;\n\tb=b5gaQ3ZKVJLF4z3S6prBfrK2Wa9sxwumB2bpNiwDppmp0sKhvScjnyDApN1Bc/TX+aywQ2VyJQE/1L6sqnqMww+u2BNb786OMGeKddRQJfoYKlLAi+/O0yBZoI859Gi6llXAWuWdWm6bZmf1yKihhJXDwKpEcCWC3j7o4Z+L57K/ER9L2Bkak6U1w26H0QX6BNtorT4y2QVJ3ORcQaCp6tPS2carjx67zZXsb2Gn0ocJnBGkVXl4UdWiPKeGw0pG3VQERqQidiISK8cJwr4DEL/bD73lf86Iy//Vff91yePcgRkMXg26bTtxxXwxJgnpThmr0TzMnVectn0uUf/gRg==","i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1774437229; cv=none;\n b=uM7ak3HCTEWRGr+VqrYKyJeMNzhH+648mXjHjIayisbYoRHD/oGHB+gXkc85KLHCatvidyoo25wa0bcERh796CSJEkH9MbN3QPX2u1pDCEc/h0UarEIu9hr3775PSp0/o9r88QGH2NhDx+91xEsokgA40ThOl8IkqSTV4uqO9kA="],"ARC-Message-Signature":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1774437861; c=relaxed/relaxed;\n\tbh=/2uLAgm3FI5zhNajV9t6fx76xgfhgjkuo6m+NR3j7To=;\n\th=Message-Id:To:Cc:Subject:Mime-Version:References:From:Date:\n\t Content-Type:In-Reply-To;\n b=wv+uRkZR9oUxJGowT1adL9Qc+zCrHZ8MswhploGW8U/Mq/l2N/MmFhtbMnX4oqYrIfrRTrUPVDyvSljXU4SOhOWZjvTquwA6Mx6XkgFozSi6oSRD7lhtahNjIcLRYu4pJtE6uaDw2kLvKrtYKmGkUmN0IUL3z9VqgJOavW+RkkYoPZqhbKURq+xEmO+lBiDzlVrIAcjWi61ekS+YGhdBHeA7dxue+8CnlxWPM5z87wHV5uDRNeJn/IFNXGLPU2GtDQYsha32a2W08I3GjSQDmZjreWBYdUmR0gtsbJh2XARmWpV3BliLz+ydjHpMUeecMeM6qsJD2RYtuj3R09QPQw==","i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1774437229; c=relaxed/simple;\n\tbh=SFHCUZ67nBPcXWqe4pxF+iah/BBWuCNThzxz4J6SuqU=;\n\th=Message-Id:To:Cc:Subject:Mime-Version:References:From:Date:\n\t Content-Type:In-Reply-To;\n b=UIjoUGSiT+8Hv/jL+NlvWcvPvFVdGoOf6yxF87ku4XNpprenWViRjr7fGnZ2YjSYppWl2GNBMmpk7iga5NOJYbyz8v0VMw2+EGcjor1HFwr4azoRlsFfJz21AXmSpjvC7nffzOrPM+UBIRX27AHvJGrcsfvrjNy/JvnSwOpI6q8="],"ARC-Authentication-Results":["i=2; gandalf.ozlabs.org;\n dmarc=pass (p=quarantine dis=none) header.from=bytedance.com;\n dkim=pass (2048-bit key;\n unprotected) header.d=bytedance.com header.i=@bytedance.com\n header.a=rsa-sha256 header.s=2212171451 header.b=lp7VoRpX; dkim-atps=neutral;\n spf=pass (client-ip=172.105.105.114; 
helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15365-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org","i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=quarantine dis=none) header.from=bytedance.com;\n spf=pass smtp.mailfrom=bytedance.com;\n dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com\n header.b=lp7VoRpX; arc=none smtp.client-ip=209.127.231.115"],"DKIM-Signature":"v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;\n s=2212171451; d=bytedance.com; t=1774437216; h=from:subject:\n mime-version:from:date:message-id:subject:to:cc:reply-to:content-type:\n mime-version:in-reply-to:message-id;\n bh=/2uLAgm3FI5zhNajV9t6fx76xgfhgjkuo6m+NR3j7To=;\n b=lp7VoRpX9+VtgW0e0EYqD10+zbp8UcEA3tsnrK0jjQ1DSvQHk4BRcfg0QX8NhDE0ESbU4I\n 4SazBM4BV1ZXKr9VlJyFtBkHIglUlJrlZfdOpE1oEE/5Lxg5qUBASPWBMLt1pmpxdtTLPY\n +UXzNKqUjPhasZENl/BQfsv+iwvmHHjWGMUpmC8uLFpDkXrNIFTp5wFoPmv3vnfZDkUMJe\n MWhdvlKKodsvds3d4u6hJ30EEugsWfv8RPZJLuWX+8WzbWfBJHpCZJxMj4wHWhcVV6tOso\n uiliJxBc4EkMtl60u0jgT42EapGxptTwrINUIPWMCCIGZGi8eqxzDDHvIxts3Q==","Message-Id":"<c6f4b982-c6e4-4f77-a16d-0c381c1e25f0@bytedance.com>","X-Original-From":"Diangang Li <lidiangang@bytedance.com>","User-Agent":"Mozilla Thunderbird","To":"\"Andreas Dilger\" <adilger@dilger.ca>,\n\t\"Diangang Li\" <diangangli@gmail.com>","Cc":"<tytso@mit.edu>, <linux-ext4@vger.kernel.org>,\n\t<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,\n\t<changfengnan@bytedance.com>","Subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","Content-Transfer-Encoding":"7bit","X-Lms-Return-Path":"\n 
<lba+269c3c35e+b21b58+vger.kernel.org+lidiangang@bytedance.com>","Content-Language":"en-US","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","Mime-Version":"1.0","References":"<20260325093349.630193-1-diangangli@gmail.com>\n <20260325093349.630193-2-diangangli@gmail.com>\n <B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>","From":"\"Diangang Li\" <lidiangang@bytedance.com>","Date":"Wed, 25 Mar 2026 19:13:21 +0800","Content-Type":"text/plain; charset=UTF-8","In-Reply-To":"<B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>","X-Spam-Status":"No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tHEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,\n\tSPF_PASS autolearn=disabled version=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}},{"id":3669129,"web_url":"http://patchwork.ozlabs.org/comment/3669129/","msgid":"<e5c657e6-ffbd-4327-adaf-ae52cb50b96d@gmail.com>","list_archive_url":null,"date":"2026-03-25T14:27:13","subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","submitter":{"id":90700,"url":"http://patchwork.ozlabs.org/api/people/90700/","name":"Zhang Yi","email":"yizhang089@gmail.com"},"content":"Hi, Diangang,\n\nOn 3/25/2026 7:13 PM, Diangang Li wrote:\n> Hi Andreas,\n> \n> BH_Read_EIO is cleared on successful read or write.\n\nI think what Andreas means is, since you modified the ext4_read_bh() \ninterface, if the bh to be read already has the Read_EIO flag set, then \nsubsequent read operations through this interface will directly return \nfailure without issuing a read I/O. At the same time, because its state \nis also not uptodate, for an existing block, a write request will not be \nissued either. 
How can we clear this Read_EIO flag? IIRC, relying solely \non ext4_read_bh_nowait() doesn't seem sufficient to achieve this.\n\nThanks,\nYi.\n\n> \n> In practice bad blocks are typically repaired/remapped on write, so we\n> expect recovery after a successful rewrite. If the block is never\n> rewritten, repeatedly issuing the same failing read does not help.\n> \n> We clear the flag on successful reads so the buffer can recover\n> immediately if the error was transient. Since read-ahead reads are not\n> blocked, a later successful read-ahead will clear the flag and allow\n> subsequent synchronous readers to proceed normally.\n> \n> Best,\n> Diangang\n> \n> On 3/25/26 6:15 PM, Andreas Dilger wrote:\n>> On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:\n>>>\n>>> From: Diangang Li <lidiangang@bytedance.com>\n>>>\n>>> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read fails,\n>>> the buffer remains !Uptodate. With concurrent callers, each waiter can\n>>> retry the same failing read after the previous holder drops BH_Lock. This\n>>> amplifies device retry latency and may trigger hung tasks.\n>>>\n>>> In the normal read path the block driver already performs its own retries.\n>>> Once the retries keep failing, re-submitting the same metadata read from\n>>> the filesystem just amplifies the latency by serializing waiters on\n>>> BH_Lock.\n>>>\n>>> Remember read failures on buffer_head and fail fast for ext4 metadata reads\n>>> once a buffer has already failed to read. Clear the flag on successful\n>>> read/write completion so the buffer can recover. ext4 read-ahead uses\n>>> ext4_read_bh_nowait(), so it does not set the failure flag and remains\n>>> best-effort.\n>>\n>> Not that the patch is bad, but if the BH_Read_EIO flag is set on a buffer\n>> and it prevents other tasks from reading that block again, how would the\n>> buffer ever become Uptodate to clear the flag?  
There isn't enough state\n>> in a 1-bit flag to have any kind of expiry and later retry.\n>>\n>> Cheers, Andreas\n>","headers":{"Return-Path":"\n <SRS0=IXwM=BZ=vger.kernel.org=linux-ext4+bounces-15371-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 header.b=i4m+R0JT;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=2404:9400:2221:ea00::3; helo=mail.ozlabs.org;\n envelope-from=srs0=ixwm=bz=vger.kernel.org=linux-ext4+bounces-15371-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)","gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=172.105.105.114 arc.chain=subspace.kernel.org","gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com","gandalf.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 header.b=i4m+R0JT;\n\tdkim-atps=neutral","gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=172.105.105.114; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15371-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com\n header.b=\"i4m+R0JT\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=209.85.216.45","smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=gmail.com"],"Received":["from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 
bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fgqGP13RJz1xy1\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 26 Mar 2026 01:36:11 +1100 (AEDT)","from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fgqGM34DFz4wCH\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 26 Mar 2026 01:36:11 +1100 (AEDT)","by gandalf.ozlabs.org (Postfix)\n\tid 4fgqGM2ydSz4wHf; Thu, 26 Mar 2026 01:36:11 +1100 (AEDT)","from tor.lore.kernel.org (tor.lore.kernel.org [172.105.105.114])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fgqGJ08ndz4wCH\n\tfor <patchwork-incoming@ozlabs.org>; Thu, 26 Mar 2026 01:36:07 +1100 (AEDT)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby tor.lore.kernel.org (Postfix) with ESMTP id AFACA302DF63\n\tfor <patchwork-incoming@ozlabs.org>; Wed, 25 Mar 2026 14:27:32 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 1ED002989B5;\n\tWed, 25 Mar 2026 14:27:30 +0000 (UTC)","from mail-pj1-f45.google.com (mail-pj1-f45.google.com\n [209.85.216.45])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 64B133E0C5A\n\tfor <linux-ext4@vger.kernel.org>; Wed, 25 Mar 2026 14:27:26 +0000 (UTC)","by mail-pj1-f45.google.com with SMTP id\n 98e67ed59e1d1-35c1a131946so25247a91.0\n        for <linux-ext4@vger.kernel.org>;\n Wed, 25 Mar 2026 07:27:26 -0700 (PDT)","from ?IPV6:240e:390:a8f:6471:c002:22f6:23a0:e7b?\n ([240e:390:a8f:6471:c002:22f6:23a0:e7b])\n        by smtp.gmail.com with ESMTPSA id\n 98e67ed59e1d1-35c0ea59e43sm895702a91.11.2026.03.25.07.27.19\n        (version=TLS1_3 
cipher=TLS_AES_128_GCM_SHA256 bits=128/128);\n        Wed, 25 Mar 2026 07:27:25 -0700 (PDT)"],"ARC-Seal":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707; t=1774449371; cv=pass;\n\tb=VdsaWeMT1YYyLynwSO+WFhP239F22G7LoHkOJ31T1kpVzxX8oYyZM8Yt3J6DMDdPNgUiNQ0zquOHquySr9+xucO6Gub1QcrkEj5jYPs+P0aTjXhYXphBb8a61Ix1/bX5LhbNC+vq0BDIzaUGb/evQd7C+1xk9oYZ0RIm4dMS/oJqFYC6NVIAbSaZLwDe1eICIQynjRbY1hfTMV8KEZpy8KTXxPl2LNTyOxxFqLTSLMFiGtVVYmdUkRB+sRJptc7tEDBACbvnoSZuJiO/02rdENwFsMImbHpKp8zjY/B69aLm+bMFynxwoYM9XdEn5+smSuLod2bQlQ5/O+AqSDmZkg==","i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1774448849; cv=none;\n b=hbykKQ/USCaQbs0VHxeMPRR9r7ha0fqjHRSqWB+Hd+JY8O9tyJiz2pDtdJRAan/X+vxs7t+Pxgnw6POPsWJ9hTVPsXDhF9ef/9BBT1rMZfsGEJpvSrVxoOfdID9zkNMiq6Uw9XxCJA0TwzFTS/H/OkFMePfp1CLIn7V+Wh7UhyY="],"ARC-Message-Signature":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1774449371; c=relaxed/relaxed;\n\tbh=+HbLmM1zEuc8AHm8qtNLH+mQFIhDfBmY8SlHP2vRU+U=;\n\th=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:\n\t In-Reply-To:Content-Type;\n b=lQw72szG1+WEpciPVQMBsQwjOC6rxlZ68DR/BUfy2pupxwDrkKwbXhmCRhPcCCr4jmnCIpyWjzptTeqLXBrTgAhYdZ97ABXLAX4UU/SEXppRMoJNJI8Tx/Dv+K77G2YpCV2j5sS+JdsQF6xQ0qVuvQkGNL45oxVKuKg7TVNOztwkTQ/1vfdv6fqSNFa0HqmFfQ3LaqqZ94KMSH21ayMAqtlgb/ngIvO8m27x2sm0wSsaSY1xMXEml2kzNGLQ+cNyh17WVzdfdX3C+v23RCrF1KrblSItsw9kzwj8i2+cK1XbGIahIzxf3ncKgZpcczdWRDdGyujhBjbLvt8Yeccfsw==","i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1774448849; c=relaxed/simple;\n\tbh=DPgC4r4t9YgPeQc6G7vPdpzUoDB6zcmmHLwvynylhXo=;\n\th=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:\n\t In-Reply-To:Content-Type;\n b=DKVsWcIbCv3KhLNkH8NoyNGPe/LbuPkftKDsDWmK99v9Fhlfm1HwlTqj0zIbg/rs50byf8AaNCrVyNRwBnznKrlB1ymm0CeLDyeKw8Es0BH34pW4hPnfybzqDjov7eGVcvEGcYVf8s7N128kC3kbFKvuGwl/z+gbZaTefD6Q67M="],"ARC-Authentication-Results":["i=2; gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com; dkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com 
header.a=rsa-sha256\n header.s=20251104 header.b=i4m+R0JT; dkim-atps=neutral;\n spf=pass (client-ip=172.105.105.114; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15371-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org","i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com;\n spf=pass smtp.mailfrom=gmail.com;\n dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com\n header.b=i4m+R0JT; arc=none smtp.client-ip=209.85.216.45"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n        d=gmail.com; s=20251104; t=1774448845; x=1775053645;\n darn=vger.kernel.org;\n        h=content-transfer-encoding:in-reply-to:from:content-language\n         :references:cc:to:subject:user-agent:mime-version:date:message-id\n         :from:to:cc:subject:date:message-id:reply-to;\n        bh=+HbLmM1zEuc8AHm8qtNLH+mQFIhDfBmY8SlHP2vRU+U=;\n        b=i4m+R0JTVt7TRk93noS7bxsVxb7aWp/RNIBxXtf267iASQXYWEogiX58BzXf0TZNjG\n         6MvIt9iQqzH0VwhxNX8LUYACP8AUDETjEd7nnR08QjA8sTpJx4Br3CJ+GYBqExiLFMOT\n         NgyGuvCnu6Xm77ydpmmkx9htEyCULjx+WP44pst4kFcORl7s/U6TAOcMovcTGr3WuExp\n         o21Xu1cZJsB3DLBvZi5lP6SjrNumibqh7zhY4VeOUh1uJzGsfCRWSVHeMxu3ylNSDdKt\n         ZyM2QSfIBtT4b1rJkrSZ8WC02rMGEU+P31p3HMRBPGbX0me7F7ICBqLMWDhH/p6OG3g6\n         1F/g==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n        d=1e100.net; s=20251104; t=1774448845; x=1775053645;\n        h=content-transfer-encoding:in-reply-to:from:content-language\n         :references:cc:to:subject:user-agent:mime-version:date:message-id\n         :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id\n         :reply-to;\n        bh=+HbLmM1zEuc8AHm8qtNLH+mQFIhDfBmY8SlHP2vRU+U=;\n        b=fLpxIu3My1QXJVpMDHBzCCybJZdFJqlBMVAPggtP2W27jRQUqWyumTYqF6+CQSV0+8\n         Ii1L7sj/WCordrC/3Hf5ssz2HV5KnCVX+KT86u7siJnzcQQB5UP1O9pSnmDtzLRhtQ3S\n         
zvsayoX4U9wQ/LMQ3FGfvLLmFnFnWT9Gj/wbwsUH8ap63tkwOTbKQ5EHLnA/auk9gYic\n         xDbyGvJasdxo8rpFcNftj2nQ05nhElKTJIC9yRxPE5hbJspsWN5kmYPbHyT+6cCPi702\n         xqVSz4+KddoliFKYjkeyJaLLblK/8Sp1PI9hXwDVEqH0wOOeSV+uaUba1IVgVbkCZCOU\n         4O1A==","X-Forwarded-Encrypted":"i=1;\n AJvYcCUQXczHLCumUIB1RLRaKm2RlGoqSZVdLuVhW8khGnTQpyOii5Gji8BkpMuQlon5JI0yf8E/9IeGsYYk@vger.kernel.org","X-Gm-Message-State":"AOJu0YyU2g/6zMEgeVcYC5WXDKEHH2BlksuTUgrH44Pme2B4jGMdDOkH\n\t3ugBWVU70Xb9VbGpBwg37Dl40xV+g1pissEEbqtTtnbyzN0PmFxbJHta","X-Gm-Gg":"ATEYQzz/9XBPt+lZ9odaoXuslPyRKL0jIsDX9CtUx5Z8WpEwWbpLyQrObpcoQ2QCNny\n\tnn8AdOFecAhDcpOEQlq3yYd08uP2068kOfQtDhCuZsWxOLj0zlojZTNYSlHcV8nul6JQu2aZLrH\n\tKTR/EEUIPcDELLoFJqhRnn1/I6BrRqjn5t5OlBoCZ3wiAy8sSWjAi/twfqJtjhqHwrUT96nVnib\n\to7E16z49cxtMWb+AK0KOUlvedlTX+/OoySuGpDN9q4TB7OUghZIB9mdfwPQg5nbjkweWQ1v28SE\n\t9i9mHVc9FZpCVHsHmCCiwWiETsVlt0Xe7HfwZrwj7CO8q7leDDk/ro5lHJzb3yhK6TvDcu43t7H\n\tOjBAVAYAx+qLKZuOorbMRbfaVzrY7otWBXffcplBPXmNACiXDKtaD+yFpsgTtVLg7pr5hAvyopg\n\tIl+2b2jq7D/CedkbRUxZjdHVZGXQ8ubJpJVZtRUi2/EQ8TAjXK2CfcyUINb9swPfp+l6mngC0=","X-Received":"by 2002:a17:90b:57e3:b0:34a:be93:72ee with SMTP id\n 98e67ed59e1d1-35c0d1451d7mr3028367a91.8.1774448845550;\n        Wed, 25 Mar 2026 07:27:25 -0700 (PDT)","Message-ID":"<e5c657e6-ffbd-4327-adaf-ae52cb50b96d@gmail.com>","Date":"Wed, 25 Mar 2026 22:27:13 +0800","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","To":"Diangang Li <lidiangang@bytedance.com>, Andreas Dilger\n <adilger@dilger.ca>, Diangang Li <diangangli@gmail.com>","Cc":"tytso@mit.edu, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,\n linux-kernel@vger.kernel.org, 
changfengnan@bytedance.com","References":"<20260325093349.630193-1-diangangli@gmail.com>\n <20260325093349.630193-2-diangangli@gmail.com>\n <B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>\n <c6f4b982-c6e4-4f77-a16d-0c381c1e25f0@bytedance.com>","Content-Language":"en-US","From":"Zhang Yi <yizhang089@gmail.com>","In-Reply-To":"<c6f4b982-c6e4-4f77-a16d-0c381c1e25f0@bytedance.com>","Content-Type":"text/plain; charset=UTF-8; format=flowed","Content-Transfer-Encoding":"7bit","X-Spam-Status":"No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tFREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,\n\tMAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=disabled\n\tversion=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}},{"id":3669163,"web_url":"http://patchwork.ozlabs.org/comment/3669163/","msgid":"<acP5-v85CwUQZlMB@casper.infradead.org>","list_archive_url":null,"date":"2026-03-25T15:06:34","subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","submitter":{"id":70855,"url":"http://patchwork.ozlabs.org/api/people/70855/","name":"Matthew Wilcox","email":"willy@infradead.org"},"content":"On Wed, Mar 25, 2026 at 04:15:42AM -0600, Andreas Dilger wrote:\n> On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:\n> > \n> > From: Diangang Li <lidiangang@bytedance.com>\n> > \n> > ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read fails,\n> > the buffer remains !Uptodate. With concurrent callers, each waiter can\n> > retry the same failing read after the previous holder drops BH_Lock. 
This\n> > amplifies device retry latency and may trigger hung tasks.\n> > \n> > In the normal read path the block driver already performs its own retries.\n> > Once the retries keep failing, re-submitting the same metadata read from\n> > the filesystem just amplifies the latency by serializing waiters on\n> > BH_Lock.\n> > \n> > Remember read failures on buffer_head and fail fast for ext4 metadata reads\n> > once a buffer has already failed to read. Clear the flag on successful\n> > read/write completion so the buffer can recover. ext4 read-ahead uses\n> > ext4_read_bh_nowait(), so it does not set the failure flag and remains\n> > best-effort.\n> \n> Not that the patch is bad, but if the BH_Read_EIO flag is set on a buffer\n> and it prevents other tasks from reading that block again, how would the\n> buffer ever become Uptodate to clear the flag?  There isn't enough state\n> in a 1-bit flag to have any kind of expiry and later retry.\n\nI've been thinking about this problem too, albeit from a folio read\nperspective, not from a buffer_head read perspective.  You're quite\nright that one bit isn't enough.  The solution I was considering but\nhaven't implemented yet was to tell all the current waiters that\nthe IO has failed, but not set any kind of permanent error flag.\n\nI was thinking about starting with this:\n\n+++ b/include/linux/wait_bit.h\n@@ -10,6 +10,7 @@\n struct wait_bit_key {\n        unsigned long           *flags;\n        int                     bit_nr;\n+       int                     error;\n        unsigned long           timeout;\n };\n\n\nand then adding/changing various APIs to allow an error to be passed in\nand noticed by the woken task.\n\nWith this change, the thundering herd all wake up, see the error and\nreturn immediately instead of each submitting their own I/O.  
New reads\nwill retry the read, but each will only be held up for a maximum of\ntheir own timeout.","headers":{"Return-Path":"\n <SRS0=8YfK=BZ=vger.kernel.org=linux-ext4+bounces-15372-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n secure) header.d=infradead.org header.i=@infradead.org header.a=rsa-sha256\n header.s=casper.20170209 header.b=ZZt0Yymr;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=2404:9400:2221:ea00::3; helo=mail.ozlabs.org;\n envelope-from=srs0=8yfk=bz=vger.kernel.org=linux-ext4+bounces-15372-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)","gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=\"2600:3c0a:e001:db::12fc:5321\"\n arc.chain=subspace.kernel.org","gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=infradead.org","gandalf.ozlabs.org;\n\tdkim=pass (2048-bit key;\n secure) header.d=infradead.org header.i=@infradead.org header.a=rsa-sha256\n header.s=casper.20170209 header.b=ZZt0Yymr;\n\tdkim-atps=neutral","gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15372-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org\n header.b=\"ZZt0Yymr\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=90.155.50.34","smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=infradead.org","smtp.subspace.kernel.org;\n spf=none smtp.mailfrom=infradead.org"],"Received":["from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\t(using 
TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1 raw public key)\n server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fgrlv2KjQz1y1K\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 26 Mar 2026 02:43:23 +1100 (AEDT)","from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fgrls0m38z4wCH\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 26 Mar 2026 02:43:21 +1100 (AEDT)","by gandalf.ozlabs.org (Postfix)\n\tid 4fgrls0gnnz4wM6; Thu, 26 Mar 2026 02:43:21 +1100 (AEDT)","from sea.lore.kernel.org (sea.lore.kernel.org\n [IPv6:2600:3c0a:e001:db::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fgrln4ygXz4wCH\n\tfor <patchwork-incoming@ozlabs.org>; Thu, 26 Mar 2026 02:43:17 +1100 (AEDT)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id 6A9C132511CA\n\tfor <patchwork-incoming@ozlabs.org>; Wed, 25 Mar 2026 15:12:02 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 37E643EDAC6;\n\tWed, 25 Mar 2026 15:06:42 +0000 (UTC)","from casper.infradead.org (casper.infradead.org [90.155.50.34])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id E20EA3EAC88;\n\tWed, 25 Mar 2026 15:06:38 +0000 (UTC)","from willy by casper.infradead.org with local (Exim 4.98.2 #2 (Red\n Hat Linux))\n\tid 1w5PoY-0000000G2du-1PBr;\n\tWed, 25 Mar 2026 15:06:34 +0000"],"ARC-Seal":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707; t=1774453401; 
cv=pass;\n\tb=MxAR3Dnd4rJJPqp8afSjpL0KsMtfrq+BjSlZ0Bix32IZdxZ1k0BCY3U/PET0IrmV3JVqMWryvtAvVtYq5on25R+ttTfHAit3NyHvED0g8ntZH/EdC9zjiwoejepx28OfN31rmg8PqyVPUkWf9KxZWNp/9yjwY2udviJW/1CrwOG+ZeR98XPEMzaSQ1Gw5lQHlqifqXW7MwV1f29E92ZcE40/CNo+OdoV4WQrTINAS5Trqqiw2A8Vur/RxEUHJ8ngukgMs1NwcAeOIxifAQ5n8r0XLk/xfTZGQ2JPEZlTX5bfSNqPxYbjg8TFUuI9qWlCfqBKxt7XpzBTobDvswZ6uw==","i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1774451202; cv=none;\n b=t7Akrtu0bcd4zNozc2LT/a2RcBnIFNk6K3fccLrbmhPhrEG1LEH/r/C58N+tp7qaFuivzPbja9kZd3B+LHprCmJGV7yXj91o/nF9dT//vWlwfNgz4ct3ctqVksNI7LDcbfdrZgANwi+3wlcZykoOaWTQYMAxu5BHjIxXS+GpAO8="],"ARC-Message-Signature":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1774453401; c=relaxed/relaxed;\n\tbh=cYXwyEMIc1ILNL2kDIuCLRrloFnkyAEPOIgK7nUnE1Q=;\n\th=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:\n\t Content-Type:Content-Disposition:In-Reply-To;\n b=VjzYz5rwfDfA9XJdU7ZnLm/CpTAR3FcHiMWOik7Lyx18JCGU2Exe/J1OAkbSCL3qBghHIf82diLCs2s47awpbFiaMgErX5mHkPG2sTQJi9dYAwRvK7D7CjqAVSh7ou7ySjjG/P8ITOm7oHBeF0EMbvK6vjlc8INH4U52+WfLC52aY/2URwzuF5p0yiotkLxJiwXzJeOkN/k/Wdk7bPMLJMOIttTiykuYlQtt0OCunmU5WQsfGR1ahzVFaupfvdpPkGFKpJUs5J7OFYVaGBnFqnwtOQpfWX0af95mDkrg3WDCx6+uQOFizM+2Qy/XNgu//c1bi3Hhs7SCY3RBTOhwXA==","i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1774451202; c=relaxed/simple;\n\tbh=abjIJLR7w82rkCffSg5vDwCkeaoGvkn81T+GGKFIJeA=;\n\th=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:\n\t Content-Type:Content-Disposition:In-Reply-To;\n b=DgkVfdfkqYqrjaMP0Ea4LqASK7xgQq4sz5xb9K+551HanIdS+z8cRCfUDBg40Ow1L9S4PSw7Mkas6qHadPOTf9PUjp9WMqW8Gc76u+/rVsVDwh2XdaWdR/Oa706NX+KQPCAyimo10egIgVO/q/MXSQDhKlvJHnI/TBspM72XLMs="],"ARC-Authentication-Results":["i=2; gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=infradead.org;\n dkim=pass (2048-bit key;\n secure) header.d=infradead.org header.i=@infradead.org header.a=rsa-sha256\n header.s=casper.20170209 header.b=ZZt0Yymr; dkim-atps=neutral;\n spf=pass 
(client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15372-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org","i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=infradead.org;\n spf=none smtp.mailfrom=infradead.org;\n dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org\n header.b=ZZt0Yymr; arc=none smtp.client-ip=90.155.50.34"],"DKIM-Signature":"v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;\n\td=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version:\n\tReferences:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To:\n\tContent-Transfer-Encoding:Content-ID:Content-Description;\n\tbh=cYXwyEMIc1ILNL2kDIuCLRrloFnkyAEPOIgK7nUnE1Q=; b=ZZt0YymrAGmGOAlCDew140Csby\n\tSqro/tJ1OLbP+0eCHxcuBsYQLcChXM3MKErmu8pWHq3e6cRwelDRiiVEbe/qOrEd+lzp5XvGs6cJT\n\tNgnQCqOVoPkR4GWJhlzCOIZTar/5gg3Mo6kztithAhgAcbIEeThPm6IiKFzcrr2dMV3hV3dAxC1A1\n\thsCnaj6JDcdfpDOi+9LobAF5PIpRY5gaV6IUqI2yYcLIAAGneQXbB3eXTKTcT01MmaZyhaDdM5Dqw\n\t/CFhoLcKrzZMWMYw3tWja4X2i/kSk12PeVu4sM27yu5JQzMgcY99A0mrIx8/KN+hUQTRFuMEmkDL0\n\t3h7pPLDQ==;","Date":"Wed, 25 Mar 2026 15:06:34 +0000","From":"Matthew Wilcox <willy@infradead.org>","To":"Andreas Dilger <adilger@dilger.ca>","Cc":"Diangang Li <diangangli@gmail.com>, tytso@mit.edu,\n\tlinux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org, changfengnan@bytedance.com,\n\tDiangang Li <lidiangang@bytedance.com>","Subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","Message-ID":"<acP5-v85CwUQZlMB@casper.infradead.org>","References":"<20260325093349.630193-1-diangangli@gmail.com>\n <20260325093349.630193-2-diangangli@gmail.com>\n 
<B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>","X-Spam-Status":"No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tHEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,\n\tSPF_PASS autolearn=disabled version=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}},{"id":3669444,"web_url":"http://patchwork.ozlabs.org/comment/3669444/","msgid":"<d9210bcdf73fbe1ac8b6ec132865609a3ed68688.b75b68ec.808e.4625.9191.7f725153fe9d@bytedance.com>","list_archive_url":null,"date":"2026-03-26T02:26:16","subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","submitter":{"id":85004,"url":"http://patchwork.ozlabs.org/api/people/85004/","name":"changfengnan","email":"changfengnan@bytedance.com"},"content":"> From: \"Zhang Yi\"<yizhang089@gmail.com>\n> Date:  Wed, Mar 25, 2026, 22:27\n> Subject:  Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO failure\n> To: \"Diangang Li\"<lidiangang@bytedance.com>, \"Andreas Dilger\"<adilger@dilger.ca>, \"Diangang Li\"<diangangli@gmail.com>\n> Cc: <tytso@mit.edu>, <linux-ext4@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <changfengnan@bytedance.com>\n> Hi, Diangang,\n> \n> On 3/25/2026 7:13 PM, Diangang Li wrote:\n> > Hi Andreas,\n> > \n> > BH_Read_EIO is cleared on successful read or write.\n> \n> I think what Andreas means is, since you modified the ext4_read_bh() \n> interface, if the bh to be read already has the Read_EIO flag set, then \n> subsequent read 
operations through this interface will directly return \n> failure without issuing a read I/O. At the same time, because its state\n\nIMO, we first need to reach a consensus on whether we can expect a\nretry to succeed after a read failure. \nGiven that current SCSI and NVMe drivers already perform multiple\nretries for I/O errors.\nIMO, this depends on the specific error. If the block layer returns\nBLK_STS_RESOURCE or BLK_STS_AGAIN, we can retry; however, if\nit returns BLK_STS_MEDIUM or BLK_STS_IOERR, there is no need to retry.\nFor scenarios requiring a retry, we should also wait for a certain time\nwindow before retrying.\n\nThanks.\nFengnan.\n\n> is also not uptodate, for an existing block, a write request will not be \n> issued either. How can we clear this Read_EIO flag? IIRC, relying solely \n> on ext4_read_bh_nowait() doesn't seem sufficient to achieve this.\n> \n> Thanks,\n> Yi.\n> \n> > \n> > In practice bad blocks are typically repaired/remapped on write, so we\n> > expect recovery after a successful rewrite. If the block is never\n> > rewritten, repeatedly issuing the same failing read does not help.\n> > \n> > We clear the flag on successful reads so the buffer can recover\n> > immediately if the error was transient. Since read-ahead reads are not\n> > blocked, a later successful read-ahead will clear the flag and allow\n> > subsequent synchronous readers to proceed normally.\n> > \n> > Best,\n> > Diangang\n> > \n> > On 3/25/26 6:15 PM, Andreas Dilger wrote:\n> >> On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:\n> >>>\n> >>> From: Diangang Li <lidiangang@bytedance.com>\n> >>>\n> >>> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read fails,\n> >>> the buffer remains !Uptodate. With concurrent callers, each waiter can\n> >>> retry the same failing read after the previous holder drops BH_Lock. 
This\n> >>> amplifies device retry latency and may trigger hung tasks.\n> >>>\n> >>> In the normal read path the block driver already performs its own retries.\n> >>> Once the retries keep failing, re-submitting the same metadata read from\n> >>> the filesystem just amplifies the latency by serializing waiters on\n> >>> BH_Lock.\n> >>>\n> >>> Remember read failures on buffer_head and fail fast for ext4 metadata reads\n> >>> once a buffer has already failed to read. Clear the flag on successful\n> >>> read/write completion so the buffer can recover. ext4 read-ahead uses\n> >>> ext4_read_bh_nowait(), so it does not set the failure flag and remains\n> >>> best-effort.\n> >>\n> >> Not that the patch is bad, but if the BH_Read_EIO flag is set on a buffer\n> >> and it prevents other tasks from reading that block again, how would the\n> >> buffer ever become Uptodate to clear the flag?  There isn't enough state\n> >> in a 1-bit flag to have any kind of expiry and later retry.\n> >>\n> >> Cheers, Andreas\n> >\n>","headers":{"Return-Path":"\n <SRS0=kSBi=B2=vger.kernel.org=linux-ext4+bounces-15385-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=bytedance.com header.i=@bytedance.com\n header.a=rsa-sha256 header.s=2212171451 header.b=E5n+X4Q/;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=2404:9400:2221:ea00::3; helo=mail.ozlabs.org;\n envelope-from=srs0=ksbi=b2=vger.kernel.org=linux-ext4+bounces-15385-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)","gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=172.234.253.10 arc.chain=subspace.kernel.org","gandalf.ozlabs.org;\n dmarc=pass (p=quarantine dis=none) 
header.from=bytedance.com","gandalf.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=bytedance.com header.i=@bytedance.com\n header.a=rsa-sha256 header.s=2212171451 header.b=E5n+X4Q/;\n\tdkim-atps=neutral","gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=172.234.253.10; helo=sea.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15385-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com\n header.b=\"E5n+X4Q/\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=209.127.231.111","smtp.subspace.kernel.org;\n dmarc=pass (p=quarantine dis=none) header.from=bytedance.com","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=bytedance.com"],"Received":["from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1 raw public key)\n server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fh76D198sz1y1G\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 26 Mar 2026 13:30:11 +1100 (AEDT)","from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fh76C1KHjz4w9r\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 26 Mar 2026 13:30:11 +1100 (AEDT)","by gandalf.ozlabs.org (Postfix)\n\tid 4fh76C1CgCz4wHj; Thu, 26 Mar 2026 13:30:11 +1100 (AEDT)","from sea.lore.kernel.org (sea.lore.kernel.org [172.234.253.10])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fh7666GYkz4w9r\n\tfor <patchwork-incoming@ozlabs.org>; Thu, 26 Mar 2026 13:30:06 +1100 (AEDT)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby 
sea.lore.kernel.org (Postfix) with ESMTP id 7FC153041BC6\n\tfor <patchwork-incoming@ozlabs.org>; Thu, 26 Mar 2026 02:26:29 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id ECDFD3793BC;\n\tThu, 26 Mar 2026 02:26:28 +0000 (UTC)","from va-2-111.ptr.blmpb.com (va-2-111.ptr.blmpb.com\n [209.127.231.111])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 801AD22FE11\n\tfor <linux-ext4@vger.kernel.org>; Thu, 26 Mar 2026 02:26:26 +0000 (UTC)"],"ARC-Seal":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707; t=1774492211; cv=pass;\n\tb=sey0SaM2uR+k1gdlmaCsfyePfnbXyGcJsnDr07L8B678OAO5XJZoMJk8H5YK40BYo+q7ypygTWgrFv2Wt77Rpdp1Uz/syyhYZil1Jted5G4h46grpK3xw0ZGASbN0tJDlizrmwdcZ0s3UThZdpNTiXiLBK7rO17N31KWa24gyEHZxDo19PvpwCZRAoQ9B/WSqU+OqG+8UsnpxBj7X9VhSQfJaIzw5WdKIWeqORgr6A/vVdc6GwakLFqxUAgSc42sHhWxUco5biEE55vFB4kAxY5yvjhATSjezmczXON51jCQO4o+ybTovy+aV+i9VmtgKBzZVlJDZovwEJxi9/KHTQ==","i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1774491988; cv=none;\n b=d6hLT29DpiI0mR/V2D0eztIwrjCNbhDtQ3BQGKVFjRjkS/t9QpynCcU6d+62+vDJV9JCJzAshKDnQab04zRRBwyvY2bnxK36aDEhXG2ZDmsOHanHxpowH19r/hHpTnkKVZoBoORBbAYx7biypP0VHKEgtUgQKQRARG765rOaSck="],"ARC-Message-Signature":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1774492211; c=relaxed/relaxed;\n\tbh=ynby6JrDwn5Ax6abbGMCPRmuLTqO16MRFElp7xF6HjQ=;\n\th=From:References:In-Reply-To:Date:Message-Id:To:Cc:Subject:\n\t Mime-Version:Content-Type;\n b=m57o1Onm4U/jyb0cEfm9r9KIT66Dmccjy5qCFYV7yxa+9ZCwrLbRER7QbIJ4NwkGEvuKIdrR+LBIGdWT2K/A/xDnv1O7ZItfVT/EpEfgOiVWGEOX2XvrrFTKZ1i2+KPVLVQjYttpu60x8RZd7UGL9jQwAvxU7SsV9A1k6DUJOF0SElAHRYBbLZ5ai6OLXr+2SfyGRUc9iCPFiOtxqShMsyFRr4uGX5rbJsIcJCkJ4mOMbVqbjpcHw8rFqt84dy16yv5rbBDIIKkf6NhLx4miPSwCmIRzReKLcG9fHW+haAoon8btdG8G/HnYYihHrQJN9Y5VZolcL7354geyaZ+otg==","i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; 
t=1774491988; c=relaxed/simple;\n\tbh=ynby6JrDwn5Ax6abbGMCPRmuLTqO16MRFElp7xF6HjQ=;\n\th=From:References:In-Reply-To:Date:Message-Id:To:Cc:Subject:\n\t Mime-Version:Content-Type;\n b=Ct78pSYzPpAxKeikug/S8DD6PF8cKabcwfbAPeAAsZPM0LC98n5uwFhcNPImZV4IMf/7fGmjBYHyryNZOYjtlqz2SdtfCXJ/Z9uhgWPg3A2AGu7kepzON/ZhPIwGxLnztr4XzYjZCuzHFeL36Sz2FC+pj8l7Qzmt++YMtptyv3I="],"ARC-Authentication-Results":["i=2; gandalf.ozlabs.org;\n dmarc=pass (p=quarantine dis=none) header.from=bytedance.com;\n dkim=pass (2048-bit key;\n unprotected) header.d=bytedance.com header.i=@bytedance.com\n header.a=rsa-sha256 header.s=2212171451 header.b=E5n+X4Q/; dkim-atps=neutral;\n spf=pass (client-ip=172.234.253.10; helo=sea.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15385-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org","i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=quarantine dis=none) header.from=bytedance.com;\n spf=pass smtp.mailfrom=bytedance.com;\n dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com\n header.b=E5n+X4Q/; arc=none smtp.client-ip=209.127.231.111"],"DKIM-Signature":"v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;\n s=2212171451; d=bytedance.com; t=1774491981; h=from:subject:\n mime-version:from:date:message-id:subject:to:cc:reply-to:content-type:\n mime-version:in-reply-to:message-id;\n bh=ynby6JrDwn5Ax6abbGMCPRmuLTqO16MRFElp7xF6HjQ=;\n b=E5n+X4Q/L5G+c2DiV3CVngYaOn9sbMB7AIgJUv6IY7v9FVZ1RcFbsFwFgeFq8x7uyikqrQ\n TPKICGsDxpiYnpqHCAuYfQmp7c6qaeG6TC/0Sl96LjxHZ1AJRx61izL4RE6/Qy0Zuiowg9\n 8Eck6rrI3ozlqwKZeJFgrAqulfWIICwvdkiDGVV75vI6AL0G25UP1Sb80BI2tJMH8cB6wG\n hURQ+qW6w6hCnjJsOhbJ89362V+2/Ljbk18Zgiy6k94yInEXEqepHtF3b1R+ZYRX6sESd5\n yyGZKACfNiBG5KsVR3sK+xqpVqMEhUvtpwYdeM/TB/TsJctUeSBbylpvbGqsTQ==","From":"\"changfengnan\" <changfengnan@bytedance.com>","X-Lms-Return-Path":"\n 
<lba+169c4994b+c2d505+vger.kernel.org+changfengnan@bytedance.com>","Content-Transfer-Encoding":"quoted-printable","References":"<20260325093349.630193-1-diangangli@gmail.com>\n <20260325093349.630193-2-diangangli@gmail.com>\n <B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>\n <c6f4b982-c6e4-4f77-a16d-0c381c1e25f0@bytedance.com>\n\t<e5c657e6-ffbd-4327-adaf-ae52cb50b96d@gmail.com>","In-Reply-To":"<e5c657e6-ffbd-4327-adaf-ae52cb50b96d@gmail.com>","Date":"Thu, 26 Mar 2026 10:26:16 +0800","Message-Id":"\n <d9210bcdf73fbe1ac8b6ec132865609a3ed68688.b75b68ec.808e.4625.9191.7f725153fe9d@bytedance.com>","To":"\"Zhang Yi\" <yizhang089@gmail.com>","Cc":"\"Diangang Li\" <lidiangang@bytedance.com>,\n\t\"Andreas Dilger\" <adilger@dilger.ca>,\n\t\"Diangang Li\" <diangangli@gmail.com>, <tytso@mit.edu>,\n\t<linux-ext4@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,\n\t<linux-kernel@vger.kernel.org>","Subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","Mime-Version":"1.0","Content-Type":"text/plain; charset=UTF-8","X-Spam-Status":"No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tHEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,\n\tSPF_PASS autolearn=disabled version=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}},{"id":3669517,"web_url":"http://patchwork.ozlabs.org/comment/3669517/","msgid":"<aa1f5bb1-ffa0-4c85-a228-730593bd4ed0@bytedance.com>","list_archive_url":null,"date":"2026-03-26T07:42:57","subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","submitter":{"id":90455,"url":"http://patchwork.ozlabs.org/api/people/90455/","name":"Diangang 
Li","email":"lidiangang@bytedance.com"},"content":"Hi, Yi,\n\nThanks. Yes, for existing metadata blocks ext4 is read-modify-write, so \nwithout a successful read (Uptodate) there is no write path to update \nthat block.\n\nIn the case we're seeing, the read keeps failing (repeated I/O errors on \nthe same LBA), so the write never has a chance to run either. Given \nthat, would it make sense (as Fengnan suggested) to treat persistent \nmedia errors (e.g. MEDIUM ERROR / IO ERROR) as non-retryable at the \nfilesystem level, i.e. keep failing fast for that block? That would \navoid the BH_Lock thundering herd and prevent hung tasks.\n\nThanks,\nDiangang\n\nOn 3/25/26 10:27 PM, Zhang Yi wrote:\n> Hi, Diangang,\n> \n> On 3/25/2026 7:13 PM, Diangang Li wrote:\n>> Hi Andreas,\n>>\n>> BH_Read_EIO is cleared on successful read or write.\n> \n> I think what Andreas means is, since you modified the ext4_read_bh() \n> interface, if the bh to be read already has the Read_EIO flag set, then \n> subsequent read operations through this interface will directly return \n> failure without issuing a read I/O. At the same time, because its state \n> is also not uptodate, for an existing block, a write request will not be \n> issued either. How can we clear this Read_EIO flag? IIRC, relying solely \n> on ext4_read_bh_nowait() doesn't seem sufficient to achieve this.\n> \n> Thanks,\n> Yi.\n> \n>>\n>> In practice bad blocks are typically repaired/remapped on write, so we\n>> expect recovery after a successful rewrite. If the block is never\n>> rewritten, repeatedly issuing the same failing read does not help.\n>>\n>> We clear the flag on successful reads so the buffer can recover\n>> immediately if the error was transient. 
Since read-ahead reads are not\n>> blocked, a later successful read-ahead will clear the flag and allow\n>> subsequent synchronous readers to proceed normally.\n>>\n>> Best,\n>> Diangang\n>>\n>> On 3/25/26 6:15 PM, Andreas Dilger wrote:\n>>> On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:\n>>>>\n>>>> From: Diangang Li <lidiangang@bytedance.com>\n>>>>\n>>>> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read \n>>>> fails,\n>>>> the buffer remains !Uptodate. With concurrent callers, each waiter can\n>>>> retry the same failing read after the previous holder drops BH_Lock. \n>>>> This\n>>>> amplifies device retry latency and may trigger hung tasks.\n>>>>\n>>>> In the normal read path the block driver already performs its own \n>>>> retries.\n>>>> Once the retries keep failing, re-submitting the same metadata read \n>>>> from\n>>>> the filesystem just amplifies the latency by serializing waiters on\n>>>> BH_Lock.\n>>>>\n>>>> Remember read failures on buffer_head and fail fast for ext4 \n>>>> metadata reads\n>>>> once a buffer has already failed to read. Clear the flag on successful\n>>>> read/write completion so the buffer can recover. ext4 read-ahead uses\n>>>> ext4_read_bh_nowait(), so it does not set the failure flag and remains\n>>>> best-effort.\n>>>\n>>> Not that the patch is bad, but if the BH_Read_EIO flag is set on a \n>>> buffer\n>>> and it prevents other tasks from reading that block again, how would the\n>>> buffer ever become Uptodate to clear the flag?  
There isn't enough state\n>>> in a 1-bit flag to have any kind of expiry and later retry.\n>>>\n>>> Cheers, Andreas\n>>\n>","headers":{"Return-Path":"\n <SRS0=Ov4N=B2=vger.kernel.org=linux-ext4+bounces-15390-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Cc":"<tytso@mit.edu>, <linux-ext4@vger.kernel.org>,\n\t<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,\n\t<changfengnan@bytedance.com>","X-Lms-Return-Path":"\n <lba+269c4e38e+14f4dc+vger.kernel.org+lidiangang@bytedance.com>","Content-Transfer-Encoding":"quoted-printable","Date":"Thu, 26 Mar 2026 15:42:57 +0800","Content-Type":"text/plain; charset=UTF-8","To":"\"Zhang Yi\" <yizhang089@gmail.com>, \"Andreas Dilger\" <adilger@dilger.ca>,\n\t\"Diangang Li\" <diangangli@gmail.com>","User-Agent":"Mozilla Thunderbird","X-Original-From":"Diangang Li <lidiangang@bytedance.com>","References":"<20260325093349.630193-1-diangangli@gmail.com>\n <20260325093349.630193-2-diangangli@gmail.com>\n <B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>\n <c6f4b982-c6e4-4f77-a16d-0c381c1e25f0@bytedance.com>\n 
<e5c657e6-ffbd-4327-adaf-ae52cb50b96d@gmail.com>","Content-Language":"en-US","From":"\"Diangang Li\" <lidiangang@bytedance.com>","Subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","Message-Id":"<aa1f5bb1-ffa0-4c85-a228-730593bd4ed0@bytedance.com>","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","Mime-Version":"1.0","In-Reply-To":"<e5c657e6-ffbd-4327-adaf-ae52cb50b96d@gmail.com>","X-Spam-Status":"No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tHEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,\n\tSPF_PASS autolearn=disabled version=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}},{"id":3669656,"web_url":"http://patchwork.ozlabs.org/comment/3669656/","msgid":"<51472d04-507b-4797-80ca-5f8b50eb9d3d@huaweicloud.com>","list_archive_url":null,"date":"2026-03-26T11:09:20","subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","submitter":{"id":85428,"url":"http://patchwork.ozlabs.org/api/people/85428/","name":"Zhang Yi","email":"yi.zhang@huaweicloud.com"},"content":"On 3/26/2026 3:42 PM, Diangang Li wrote:\n> Hi, Yi,\n> \n> Thanks. Yes, for existing metadata blocks ext4 is read-modify-write, so \n> without a successful read (Uptodate) there is no write path to update \n> that block.\n> \n> In the case we're seeing, the read keeps failing (repeated I/O errors on \n> the same LBA), so the write never has a chance to run either. Given \n> that, would it make sense (as Fengnan suggested) to treat persistent \n> media errors (e.g. MEDIUM ERROR / IO ERROR) as non-retryable at the \n> filesystem level, i.e. keep failing fast for that block? 
That would \n> avoid the BH_Lock thundering herd and prevent hung tasks.\n> \n\nFYI, AFAICT, while this approach makes sense in theory, it actually\nfaces challenges in fault recovery. This is because these error codes\nare not always reliable (especially BLK_STS_IOERR). In some scenarios\nwhere reliability requirements are not very high, customers might not\nimmediately notice these errors due to transient faults on some storage\ndevices(such as some network storage scenarios), and these errors might\nresolve themselves after a certain period of time. However, after this,\nwe have to perform some heavy-weight operations, such as stopping\nservices and remounting the file system, to recover our services. I\nbelieve there will definitely be customers who will complain about\nthis.\n\nThanks,\nYi.\n\n> Thanks,\n> Diangang\n> \n> On 3/25/26 10:27 PM, Zhang Yi wrote:\n>> Hi, Diangang,\n>>\n>> On 3/25/2026 7:13 PM, Diangang Li wrote:\n>>> Hi Andreas,\n>>>\n>>> BH_Read_EIO is cleared on successful read or write.\n>>\n>> I think what Andreas means is, since you modified the ext4_read_bh() \n>> interface, if the bh to be read already has the Read_EIO flag set, then \n>> subsequent read operations through this interface will directly return \n>> failure without issuing a read I/O. At the same time, because its state \n>> is also not uptodate, for an existing block, a write request will not be \n>> issued either. How can we clear this Read_EIO flag? IIRC, relying solely \n>> on ext4_read_bh_nowait() doesn't seem sufficient to achieve this.\n>>\n>> Thanks,\n>> Yi.\n>>\n>>>\n>>> In practice bad blocks are typically repaired/remapped on write, so we\n>>> expect recovery after a successful rewrite. If the block is never\n>>> rewritten, repeatedly issuing the same failing read does not help.\n>>>\n>>> We clear the flag on successful reads so the buffer can recover\n>>> immediately if the error was transient. 
Since read-ahead reads are not\n>>> blocked, a later successful read-ahead will clear the flag and allow\n>>> subsequent synchronous readers to proceed normally.\n>>>\n>>> Best,\n>>> Diangang\n>>>\n>>> On 3/25/26 6:15 PM, Andreas Dilger wrote:\n>>>> On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:\n>>>>>\n>>>>> From: Diangang Li <lidiangang@bytedance.com>\n>>>>>\n>>>>> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read \n>>>>> fails,\n>>>>> the buffer remains !Uptodate. With concurrent callers, each waiter can\n>>>>> retry the same failing read after the previous holder drops BH_Lock. \n>>>>> This\n>>>>> amplifies device retry latency and may trigger hung tasks.\n>>>>>\n>>>>> In the normal read path the block driver already performs its own \n>>>>> retries.\n>>>>> Once the retries keep failing, re-submitting the same metadata read \n>>>>> from\n>>>>> the filesystem just amplifies the latency by serializing waiters on\n>>>>> BH_Lock.\n>>>>>\n>>>>> Remember read failures on buffer_head and fail fast for ext4 \n>>>>> metadata reads\n>>>>> once a buffer has already failed to read. Clear the flag on successful\n>>>>> read/write completion so the buffer can recover. ext4 read-ahead uses\n>>>>> ext4_read_bh_nowait(), so it does not set the failure flag and remains\n>>>>> best-effort.\n>>>>\n>>>> Not that the patch is bad, but if the BH_Read_EIO flag is set on a \n>>>> buffer\n>>>> and it prevents other tasks from reading that block again, how would the\n>>>> buffer ever become Uptodate to clear the flag?  
There isn't enough state\n>>>> in a 1-bit flag to have any kind of expiry and later retry.\n>>>>\n>>>> Cheers, Andreas\n>>>\n>>\n> \n>","headers":{"Return-Path":"\n <SRS0=IfsB=B2=vger.kernel.org=linux-ext4+bounces-15451-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Message-ID":"<51472d04-507b-4797-80ca-5f8b50eb9d3d@huaweicloud.com>","Date":"Thu, 26 Mar 2026 19:09:20 +0800","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","User-Agent":"Mozilla Thunderbird","Subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","To":"Diangang Li <lidiangang@bytedance.com>, Zhang Yi <yizhang089@gmail.com>,\n Andreas Dilger <adilger@dilger.ca>, Diangang Li <diangangli@gmail.com>","Cc":"tytso@mit.edu, linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,\n linux-kernel@vger.kernel.org, changfengnan@bytedance.com","References":"<20260325093349.630193-1-diangangli@gmail.com>\n <20260325093349.630193-2-diangangli@gmail.com>\n <B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>\n <c6f4b982-c6e4-4f77-a16d-0c381c1e25f0@bytedance.com>\n <e5c657e6-ffbd-4327-adaf-ae52cb50b96d@gmail.com>\n <aa1f5bb1-ffa0-4c85-a228-730593bd4ed0@bytedance.com>","Content-Language":"en-US","From":"Zhang Yi <yi.zhang@huaweicloud.com>","In-Reply-To":"<aa1f5bb1-ffa0-4c85-a228-730593bd4ed0@bytedance.com>","Content-Type":"text/plain; 
charset=UTF-8","Content-Transfer-Encoding":"8bit","X-Spam-Status":"No, score=-1.1 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDMARC_MISSING,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,\n\tSPF_HELO_NONE,SPF_PASS autolearn=disabled version=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}},{"id":3669682,"web_url":"http://patchwork.ozlabs.org/comment/3669682/","msgid":"<7792457f-453b-4534-95bb-f932cbc62d12@bytedance.com>","list_archive_url":null,"date":"2026-03-26T12:09:11","subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","submitter":{"id":90455,"url":"http://patchwork.ozlabs.org/api/people/90455/","name":"Diangang Li","email":"lidiangang@bytedance.com"},"content":"On 3/25/26 11:06 PM, Matthew Wilcox wrote:\n> On 
Wed, Mar 25, 2026 at 04:15:42AM -0600, Andreas Dilger wrote:\n>> On Mar 25, 2026, at 03:33, Diangang Li <diangangli@gmail.com> wrote:\n>>>\n>>> From: Diangang Li <lidiangang@bytedance.com>\n>>>\n>>> ext4 metadata reads serialize on BH_Lock (lock_buffer). If the read fails,\n>>> the buffer remains !Uptodate. With concurrent callers, each waiter can\n>>> retry the same failing read after the previous holder drops BH_Lock. This\n>>> amplifies device retry latency and may trigger hung tasks.\n>>>\n>>> In the normal read path the block driver already performs its own retries.\n>>> Once the retries keep failing, re-submitting the same metadata read from\n>>> the filesystem just amplifies the latency by serializing waiters on\n>>> BH_Lock.\n>>>\n>>> Remember read failures on buffer_head and fail fast for ext4 metadata reads\n>>> once a buffer has already failed to read. Clear the flag on successful\n>>> read/write completion so the buffer can recover. ext4 read-ahead uses\n>>> ext4_read_bh_nowait(), so it does not set the failure flag and remains\n>>> best-effort.\n>>\n>> Not that the patch is bad, but if the BH_Read_EIO flag is set on a buffer\n>> and it prevents other tasks from reading that block again, how would the\n>> buffer ever become Uptodate to clear the flag?  There isn't enough state\n>> in a 1-bit flag to have any kind of expiry and later retry.\n> \n> I've been thinking about this problem too, albeit from a folio read\n> perspective, not from a buffer_head read perspective.  You're quite\n> right that one bit isn't enough.  
The solution I was considering but\n> haven't implemented yet was to tell all the current waiters that\n> the IO has failed, but not set any kind of permanent error flag.\n> \n> I was thinking about starting with this:\n> \n> +++ b/include/linux/wait_bit.h\n> @@ -10,6 +10,7 @@\n>   struct wait_bit_key {\n>          unsigned long           *flags;\n>          int                     bit_nr;\n> +       int                     error;\n>          unsigned long           timeout;\n>   };\n> \n> \n> and then adding/changing various APIs to allow an error to be passed in\n> and noticed by the woken task.\n> \n> With this change, the thundering herd all wake up, see the error and\n> return immediately instead of each submitting their own I/O.  New reads\n> will retry the read, but each will only be held up for a maximum of\n> their own timeout.\n\nHi Matthew and all,\n\nThanks. The idea of waking the current waiters with an error makes a lot \nof sense.\n\nI’ve been considering a smaller change on the buffer_head side that \nmight get most of the benefit without touching the generic wait_bit \nAPIs. The idea is to tell whether taking BH_Lock required waiting. If we \nhad to wait, and the buffer is already marked with \nbuffer_read_io_error(), then just return -EIO and don’t submit another \nread. If we got the lock without waiting, still submit the read. That \nshould stop the thundering herd from reissuing the same failing IO.\n\nAnother option is a simple retry window. After a read failure, don’t\nretry for some period of time, and size that window by error type. For\npersistent media errors (e.g. MEDIUM ERROR, repeated IO ERROR) the \nwindow could be effectively infinite, while for transient cases (e.g. 
\nfew IO ERROR, BLK_STS_RESOURCE) it could be small.\n\nAny opinions on these two approaches, or other ideas for this problem?\n\nThanks,\nDiangang","headers":{"Return-Path":"\n <SRS0=+wP2=B2=vger.kernel.org=linux-ext4+bounces-15460-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","Mime-Version":"1.0","Content-Transfer-Encoding":"quoted-printable","In-Reply-To":"<acP5-v85CwUQZlMB@casper.infradead.org>","Message-Id":"<7792457f-453b-4534-95bb-f932cbc62d12@bytedance.com>","Content-Language":"en-US","User-Agent":"Mozilla Thunderbird","References":"<20260325093349.630193-1-diangangli@gmail.com>\n <20260325093349.630193-2-diangangli@gmail.com>\n <B53E253C-F314-4376-BD9D-58867FC8D3F6@dilger.ca>\n <acP5-v85CwUQZlMB@casper.infradead.org>","Cc":"\"Diangang Li\" <diangangli@gmail.com>, <tytso@mit.edu>,\n\t<linux-ext4@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,\n\t<linux-kernel@vger.kernel.org>, 
<changfengnan@bytedance.com>","Subject":"Re: [RFC 1/1] ext4: fail fast on repeated metadata reads after IO\n failure","Content-Type":"text/plain; charset=UTF-8","X-Original-From":"Diangang Li <lidiangang@bytedance.com>","To":"\"Matthew Wilcox\" <willy@infradead.org>,\n\t\"Andreas Dilger\" <adilger@dilger.ca>","Date":"Thu, 26 Mar 2026 20:09:11 +0800","X-Lms-Return-Path":"\n <lba+269c521f5+7d4f15+vger.kernel.org+lidiangang@bytedance.com>","From":"\"Diangang Li\" <lidiangang@bytedance.com>","X-Spam-Status":"No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tHEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,\n\tSPF_PASS autolearn=disabled version=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}}]