{"id":2222590,"url":"http://patchwork.ozlabs.org/api/1.1/covers/2222590/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-ext4/cover/20260413062500.1380307-1-diangangli@gmail.com/","project":{"id":8,"url":"http://patchwork.ozlabs.org/api/1.1/projects/8/?format=json","name":"Linux ext4 filesystem development","link_name":"linux-ext4","list_id":"linux-ext4.vger.kernel.org","list_email":"linux-ext4@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null},"msgid":"<20260413062500.1380307-1-diangangli@gmail.com>","date":"2026-04-13T06:24:59","name":"[RFC,v2,0/1] ext4: fail fast on repeated buffer_head reads after IO failure","submitter":{"id":92966,"url":"http://patchwork.ozlabs.org/api/1.1/people/92966/?format=json","name":"Diangang Li","email":"diangangli@gmail.com"},"mbox":"http://patchwork.ozlabs.org/project/linux-ext4/cover/20260413062500.1380307-1-diangangli@gmail.com/mbox/","series":[{"id":499650,"url":"http://patchwork.ozlabs.org/api/1.1/series/499650/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-ext4/list/?series=499650","date":"2026-04-13T06:24:59","name":"ext4: fail fast on repeated buffer_head reads after IO failure","version":2,"mbox":"http://patchwork.ozlabs.org/series/499650/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/covers/2222590/comments/","headers":{"Return-Path":"\n <SRS0=he3L=CM=vger.kernel.org=linux-ext4+bounces-15799-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 header.b=g5klmUEI;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=2404:9400:2221:ea00::3; helo=mail.ozlabs.org;\n 
envelope-from=srs0=he3l=cm=vger.kernel.org=linux-ext4+bounces-15799-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)","gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=172.105.105.114 arc.chain=subspace.kernel.org","gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com","gandalf.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 header.b=g5klmUEI;\n\tdkim-atps=neutral","gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=172.105.105.114; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15799-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com\n header.b=\"g5klmUEI\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=209.85.210.172","smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=gmail.com"],"Received":["from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1 raw public key)\n server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fvHV319M2z1yDF\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 16:26:02 +1000 (AEST)","from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fvHV1728Kz4wCG\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 16:26:01 +1000 (AEST)","by gandalf.ozlabs.org (Postfix)\n\tid 4fvHV16v10z4wHx; Mon, 13 Apr 2026 16:26:01 +1000 (AEST)","from tor.lore.kernel.org (tor.lore.kernel.org [172.105.105.114])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client 
certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fvHTy2ZwDz4wCG\n\tfor <patchwork-incoming@ozlabs.org>; Mon, 13 Apr 2026 16:25:58 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby tor.lore.kernel.org (Postfix) with ESMTP id DB6F73013739\n\tfor <patchwork-incoming@ozlabs.org>; Mon, 13 Apr 2026 06:25:49 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 2B7E23921D1;\n\tMon, 13 Apr 2026 06:25:44 +0000 (UTC)","from mail-pf1-f172.google.com (mail-pf1-f172.google.com\n [209.85.210.172])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 7AC73241690\n\tfor <linux-ext4@vger.kernel.org>; Mon, 13 Apr 2026 06:25:41 +0000 (UTC)","by mail-pf1-f172.google.com with SMTP id\n d2e1a72fcca58-82ce49785a0so1693040b3a.2\n        for <linux-ext4@vger.kernel.org>;\n Sun, 12 Apr 2026 23:25:41 -0700 (PDT)","from n37-098-250.byted.org ([115.190.40.14])\n        by smtp.gmail.com with ESMTPSA id\n d2e1a72fcca58-82f0c20aed6sm10035640b3a.0.2026.04.12.23.25.37\n        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n        Sun, 12 Apr 2026 23:25:40 -0700 (PDT)"],"ARC-Seal":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707; t=1776061561; cv=pass;\n\tb=ZT1jDfp4PbtHj00sJ88sYkBXPp+wgvsyuQ6gfpLg5AwXuOWgAPrPuf9Mc1vd+QqMnu6y7AO7asLeaZkfmXpTeZ27skZ/HyPLUddwH0FhdixYRDVoGQLuUrYx3njGTsPxbExCYGTthrL6t7u0bWyc5ITsxM1XGiRHJTBwD0GG2FN918MhnYcaDFNxbUZRHLJeCOEC00QNP1MN2HMvGEJ798RBxKDuw4UwQqSHg9EGlgqPVKFDl0NfujXdbllsmfVA6rNW3ikOQnwP0u3ZP4M4UqAp0C9JJ4xV00j/vXR8yngPXzo8FglknCg6Gvyyc7LVNYAXmDTc2dV6ZIYf7NWX/A==","i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1776061544; cv=none;\n 
b=YGkoz5CaO+qw7gY6DCDAlXnxw/imAd0ehYZlElUSFREcuiTX5aQWhmcfgRk/OPY07aR4QvSO8lAM94DHLXZE3q63xYIp69FHqoEiQ50MNLrd+Pt4ARqFvR9YzjR3Gk2T+MLdMx30JygNvHYG6kdRuk4m/X8ZoC5M/OOPolga+s4="],"ARC-Message-Signature":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1776061561; c=relaxed/relaxed;\n\tbh=YSBOAhViOuwEID8TtZg66KhyEdupsaPAGXgm7OuInv0=;\n\th=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:\n\t MIME-Version;\n b=nLm2Rfwubol+XgBh8CxG6DbCqnJG5vX5uSVYqmkn6LNCC38JwPQN0JJBxHsPm3geHmC40lT9crOcwjFkVzuPfj85rrU300xYrJgCWWpKZomXASX08TC2wxQetPm1sxf1mAe7k6uSCoYPp9ofCmqVWEG3SnQxpDxSK+NTdLv1Si8ia+cwrM5Xb3bMa6Q4usI8l9Sd71s9uDJlgcDa2dXHXB28FvNi/TjhRe+Q2VUg1NOrqWyvjVAfcte+hZjZ3RZXoZx4FhSapIdOPfQjJ++lf+JGTE3VQOhG9fa757lKENINsUha+rnxH2nJUHzLeM6ewW7ufQg/y1rSOKWFu2DBRw==","i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1776061544; c=relaxed/simple;\n\tbh=aJt4VQAGHE/DGXXmuCohzjPL5TZ8JVdErns35A0/b04=;\n\th=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:\n\t MIME-Version;\n b=ReeGxFX64BwhqFtyazo/nkUkXCoByf+ddhbRIpRjrxMDbOYKtrH5KZMs1Vam0Mn4uHRp4nCQJ3vk0ToNP6QraLwckTEusWUMonLSsLJkm82JE2jOB72HTwIjTm46KdZ0vPl7JadlgVcILtU+QsK3CWO1aD61jhR65wHxARuTmO4="],"ARC-Authentication-Results":["i=2; gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com; dkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 header.b=g5klmUEI; dkim-atps=neutral;\n spf=pass (client-ip=172.105.105.114; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15799-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org","i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com;\n spf=pass smtp.mailfrom=gmail.com;\n dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com\n header.b=g5klmUEI; arc=none smtp.client-ip=209.85.210.172"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n        d=gmail.com; s=20251104; 
t=1776061541; x=1776666341;\n darn=vger.kernel.org;\n        h=content-transfer-encoding:mime-version:references:in-reply-to\n         :message-id:date:subject:cc:to:from:from:to:cc:subject:date\n         :message-id:reply-to;\n        bh=YSBOAhViOuwEID8TtZg66KhyEdupsaPAGXgm7OuInv0=;\n        b=g5klmUEI5bgU9/qtZXpXuq+w5X4vF5p7pAPT91vNHdqeSu4kqK91TkTlbbQHREvxUy\n         V136FGFBpLsPmR0lasxUQv+j0tkfqpUwPMkmJJp2XRBmLZYQCLXMScYGgclerEdSj5Ux\n         9oafBXIeS2z2iLG6aY/UsmCf1IvfdxrOouJcZkQC2Q8m5Y0lZWB8YOcx8P3V+sh9tkEW\n         FpUZStstZsUbCdU2TsXWWAx/yJyWwLxkRUuEokGu91K9vEbh7VzTthOkqvwKQbSIlTkJ\n         u6suzLQ+A2Vtt5pdFl0aaHd5gQ1icjjuRsu0YSvX9Q7/dDHH6SRRJx+CUcXQJwRD//0t\n         6zGw==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n        d=1e100.net; s=20251104; t=1776061541; x=1776666341;\n        h=content-transfer-encoding:mime-version:references:in-reply-to\n         :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from\n         :to:cc:subject:date:message-id:reply-to;\n        bh=YSBOAhViOuwEID8TtZg66KhyEdupsaPAGXgm7OuInv0=;\n        b=s7ywVNHReiJYnZxqgltF42VlLVKAUReiClQOsC3/lXj7VgsNuvnGv/aEroy08+c/yh\n         +PdSKo1PVgci8qkD3mYvwuT4kAeuZFubzuhy+1evCYdJKNErN49/dWgNiV7L2vBUEz6v\n         DmOmR4HFGG+OyeZUVDtbRJXa4v24n44nEMZNPc39ClcMgQ6NJEaRaEInC0FoxliXzI8F\n         X2ZGzl8NwkN1NS0ZNFpzQgMZM1wzyvohbAIql2Aa9whDVHzDA0qe96J6ZeYG7RELjNm6\n         wHpzWgCEQgH4G/B6Cs3ZvzRSAqfLMo+4YnL/RT+Y+/N+I/hADnwRLLfDnGqATQdZMOEL\n         
d7Wg==","X-Gm-Message-State":"AOJu0YwxenpDpbYTJQ3xhk7013RmKtYzrOZtuGmA71JvyF68o3sCw1Ny\n\tMCa4RSwfktqDa4gM+uwvc4I+b8yNtVpTvqQ+MDsNyb8FcQOzBvvu3qKmq8v1ixCn","X-Gm-Gg":"AeBDietiymnR434rw/qM+2Mv50qTGV8i7gX970M34Luf/ryhQTV01vwcp96T81o8H7Y\n\tHckQ2JiyOcRgqr3iDcfDp06kH18Fcey6eDwoi67+hIZBUKwkq19MX2zZscPeTsmFOaKT0UvgvFJ\n\t+7Fb4oKh06PHrYP5JQzcY0c2rAUPV7z7v2H1kmOErN1RbAIga8fcLbm/e61tk68n4T10+slxbQn\n\tONxWMZiymauO2w+jm4AuJ3yOFckkyAeFeueA16bdWRazSlPDN/7OxUhTpQ6Mw26jXvFCH5WE5wT\n\tVlIlwOgcWuK1ZJQpa9hrc6aB7FjQ27bvMfEuyS/T6RhWxVg/y9cOWNsK6ehnF4cabtWSZNLflQI\n\te1a8OT++b1OtwMylzshn4sLMJTzIkPqiLYH3h7SGLCwM9GO/Rsr4bUr0qEboZjp47XT6fsrVUHF\n\ttwYJ/th3xI3PYR29fNW5w5CkcbjLt4oKVjDzKC","X-Received":"by 2002:a05:6a00:2388:b0:82c:66f2:1226 with SMTP id\n d2e1a72fcca58-82f0c25258fmr12342783b3a.38.1776061540812;\n        Sun, 12 Apr 2026 23:25:40 -0700 (PDT)","From":"Diangang Li <diangangli@gmail.com>","To":"tytso@mit.edu,\n\tadilger.kernel@dilger.ca","Cc":"linux-ext4@vger.kernel.org,\n\tlinux-fsdevel@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org,\n\tchangfengnan@bytedance.com,\n\tyizhang089@gmail.com,\n\twilly@infradead.org,\n\tDiangang Li <lidiangang@bytedance.com>","Subject":"[RFC v2 0/1] ext4: fail fast on repeated buffer_head reads after IO\n failure","Date":"Mon, 13 Apr 2026 14:24:59 +0800","Message-Id":"<20260413062500.1380307-1-diangangli@gmail.com>","X-Mailer":"git-send-email 2.39.5","In-Reply-To":"<20260325093349.630193-1-diangangli@gmail.com>","References":"<20260325093349.630193-1-diangangli@gmail.com>","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Transfer-Encoding":"8bit","X-Spam-Status":"No, score=-1.2 required=5.0 
tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tFREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,\n\tMAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=disabled\n\tversion=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"},"content":"From: Diangang Li <lidiangang@bytedance.com>\n\nA production system reported hung tasks blocked for 300s+ in ext4 buffer_head\npaths. Hung task reports were accompanied by disk IO errors, but profiling\nshowed that most individual reads completed (or failed) within 10s, with\nthe worst case around 60s.\n\nAt the same time, we observed a high rate of repeated reads to the same\ndisk LBAs. These reads frequently showed seconds-level latency and ended\nwith IO errors, e.g.:\n\n  [Tue Mar 24 14:16:24 2026] blk_update_request: I/O error, dev sdi,\n      sector 10704150288 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n  [Tue Mar 24 14:16:25 2026] blk_update_request: I/O error, dev sdi,\n      sector 10704488160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n  [Tue Mar 24 14:16:26 2026] blk_update_request: I/O error, dev sdi,\n      sector 10704382912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n\nWe also sampled repeated-LBA latency histograms on /dev/sdi and saw that\nthe same error-prone LBAs were re-submitted many times with ~1-4s latency:\n\n  LBA 10704488160 (count=22): 1-2s: 20, 2-4s: 2\n  LBA 10704382912 (count=21): 1-2s: 20, 2-4s: 1\n  LBA 10704150288 (count=21): 1-2s: 19, 2-4s: 2\n\nRoot cause\n==========\n\next4 buffer_head reads serialize IO via BH_Lock. When one read fails, the\nbuffer remains !Uptodate. With multiple threads concurrently accessing\nthe same buffer_head, each waiter wakes up after the previous owner drops\nBH_Lock, finds the buffer still !Uptodate, submits the same read again\nand waits again.
This makes the\nlatency grow linearly with the number of contending threads, leading to\n300s+ hung tasks.\n\nThe failing IOs are repeatedly issued to the same LBA. The observed 1s+\nper-IO latency most likely comes from device-side retry/error recovery. On\nSCSI the driver typically retries reads several times (e.g. 5 retries in\nour environment), so a single filesystem submission can easily accumulate\n5s+ of delay before failing. When multiple threads then re-submit the same\nfailing read and serialize on BH_Lock, the delay is amplified into 300s+\nhung tasks.\n\nSimilar behavior exists on other devices (e.g. NVMe devices with multiple\ninternal retries).\n\nExample hung stacks:\n\n  INFO: task toutiao.infra.t:3760933 blocked for more than 327 seconds.\n  Call Trace:\n   __schedule\n   io_schedule\n   __wait_on_bit_lock\n   bh_uptodate_or_lock\n   __read_extent_tree_block\n   ext4_find_extent\n   ext4_ext_map_blocks\n   ext4_map_blocks\n   ext4_getblk\n   ext4_bread\n   __ext4_read_dirblock\n   dx_probe\n   ext4_htree_fill_tree\n   ext4_readdir\n   iterate_dir\n   ksys_getdents64\n\n  INFO: task toutiao.infra.t:2724456 blocked for more than 327 seconds.\n  Call Trace:\n   __schedule\n   io_schedule\n   __wait_on_bit_lock\n   ext4_read_bh_lock\n   ext4_bread\n   __ext4_read_dirblock\n   htree_dirblock_to_tree\n   ext4_htree_fill_tree\n   ext4_readdir\n   iterate_dir\n   ksys_getdents64\n\nApproach\n========\n\nRecord read failures on buffer_head (BH_Read_EIO + b_err_timestamp). When a\nretry window is configured (sysfs: err_retry_sec), ext4 skips submitting\nanother read for a buffer_head whose previous read failed within that\nwindow, and instead unlocks the buffer and returns immediately. Clear the\nstate on successful completion so the buffer can recover if the error is\ntransient.\n\nerr_retry_sec defaults to 0, which keeps the current behavior: after a read\nerror, callers may keep retrying the same read.
Set it to a non-zero value\nto throttle repeated reads within the window.\n\nPatch summary\n=============\n\n  1) Add BH_Read_EIO, b_err_timestamp and a small helper for tracking read\n     failures on buffer_head.\n  2) Update end_buffer_read_sync() and end_buffer_write_sync() (success path)\n     to maintain that state.\n  3) Add ext4 sysfs knob err_retry_sec and throttle ext4 buffer_head reads\n     within the configured window.\n  4) Pass sb into ext4_read_bh_nowait(), ext4_read_bh() and ext4_read_bh_lock()\n     so __ext4_read_bh() can apply the per-sb retry window check.\n\nDiangang Li (1):\n  ext4: fail fast on repeated buffer_head reads after IO failure\n\n fs/buffer.c                 |  2 ++\n fs/ext4/balloc.c            |  2 +-\n fs/ext4/ext4.h              | 13 ++++++----\n fs/ext4/extents.c           |  2 +-\n fs/ext4/ialloc.c            |  3 ++-\n fs/ext4/indirect.c          |  2 +-\n fs/ext4/inode.c             | 10 ++++----\n fs/ext4/mmp.c               |  2 +-\n fs/ext4/move_extent.c       |  2 +-\n fs/ext4/resize.c            |  2 +-\n fs/ext4/super.c             | 51 +++++++++++++++++++++++++++----------\n fs/ext4/sysfs.c             |  2 ++\n include/linux/buffer_head.h | 16 ++++++++++++\n 13 files changed, 79 insertions(+), 30 deletions(-)"}