Cover Letter Detail
Show a cover letter.
GET /api/1.2/covers/2222590/?format=api
{ "id": 2222590, "url": "http://patchwork.ozlabs.org/api/1.2/covers/2222590/?format=api", "web_url": "http://patchwork.ozlabs.org/project/linux-ext4/cover/20260413062500.1380307-1-diangangli@gmail.com/", "project": { "id": 8, "url": "http://patchwork.ozlabs.org/api/1.2/projects/8/?format=api", "name": "Linux ext4 filesystem development", "link_name": "linux-ext4", "list_id": "linux-ext4.vger.kernel.org", "list_email": "linux-ext4@vger.kernel.org", "web_url": null, "scm_url": null, "webscm_url": null, "list_archive_url": "", "list_archive_url_format": "", "commit_url_format": "" }, "msgid": "<20260413062500.1380307-1-diangangli@gmail.com>", "list_archive_url": null, "date": "2026-04-13T06:24:59", "name": "[RFC,v2,0/1] ext4: fail fast on repeated buffer_head reads after IO failure", "submitter": { "id": 92966, "url": "http://patchwork.ozlabs.org/api/1.2/people/92966/?format=api", "name": "Diangang Li", "email": "diangangli@gmail.com" }, "mbox": "http://patchwork.ozlabs.org/project/linux-ext4/cover/20260413062500.1380307-1-diangangli@gmail.com/mbox/", "series": [ { "id": 499650, "url": "http://patchwork.ozlabs.org/api/1.2/series/499650/?format=api", "web_url": "http://patchwork.ozlabs.org/project/linux-ext4/list/?series=499650", "date": "2026-04-13T06:24:59", "name": "ext4: fail fast on repeated buffer_head reads after IO failure", "version": 2, "mbox": "http://patchwork.ozlabs.org/series/499650/mbox/" } ], "comments": "http://patchwork.ozlabs.org/api/covers/2222590/comments/", "headers": { "Return-Path": "\n <SRS0=he3L=CM=vger.kernel.org=linux-ext4+bounces-15799-patchwork-incoming=ozlabs.org@ozlabs.org>", "X-Original-To": [ "incoming@patchwork.ozlabs.org", "linux-ext4@vger.kernel.org" ], "Delivered-To": [ "patchwork-incoming@legolas.ozlabs.org", "patchwork-incoming@ozlabs.org" ], "Authentication-Results": [ "legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 
header.b=g5klmUEI;\n\tdkim-atps=neutral", "legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=2404:9400:2221:ea00::3; helo=mail.ozlabs.org;\n envelope-from=srs0=he3l=cm=vger.kernel.org=linux-ext4+bounces-15799-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)", "gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=172.105.105.114 arc.chain=subspace.kernel.org", "gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com", "gandalf.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 header.b=g5klmUEI;\n\tdkim-atps=neutral", "gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=172.105.105.114; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15799-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)", "smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com\n header.b=\"g5klmUEI\"", "smtp.subspace.kernel.org;\n arc=none smtp.client-ip=209.85.210.172", "smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com", "smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=gmail.com" ], "Received": [ "from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1 raw public key)\n server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fvHV319M2z1yDF\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 16:26:02 +1000 (AEST)", "from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fvHV1728Kz4wCG\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 16:26:01 +1000 (AEST)", "by gandalf.ozlabs.org (Postfix)\n\tid 4fvHV16v10z4wHx; Mon, 13 Apr 
2026 16:26:01 +1000 (AEST)", "from tor.lore.kernel.org (tor.lore.kernel.org [172.105.105.114])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fvHTy2ZwDz4wCG\n\tfor <patchwork-incoming@ozlabs.org>; Mon, 13 Apr 2026 16:25:58 +1000 (AEST)", "from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby tor.lore.kernel.org (Postfix) with ESMTP id DB6F73013739\n\tfor <patchwork-incoming@ozlabs.org>; Mon, 13 Apr 2026 06:25:49 +0000 (UTC)", "from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 2B7E23921D1;\n\tMon, 13 Apr 2026 06:25:44 +0000 (UTC)", "from mail-pf1-f172.google.com (mail-pf1-f172.google.com\n [209.85.210.172])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 7AC73241690\n\tfor <linux-ext4@vger.kernel.org>; Mon, 13 Apr 2026 06:25:41 +0000 (UTC)", "by mail-pf1-f172.google.com with SMTP id\n d2e1a72fcca58-82ce49785a0so1693040b3a.2\n for <linux-ext4@vger.kernel.org>;\n Sun, 12 Apr 2026 23:25:41 -0700 (PDT)", "from n37-098-250.byted.org ([115.190.40.14])\n by smtp.gmail.com with ESMTPSA id\n d2e1a72fcca58-82f0c20aed6sm10035640b3a.0.2026.04.12.23.25.37\n (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);\n Sun, 12 Apr 2026 23:25:40 -0700 (PDT)" ], "ARC-Seal": [ "i=2; a=rsa-sha256; d=ozlabs.org; s=201707; t=1776061561; cv=pass;\n\tb=ZT1jDfp4PbtHj00sJ88sYkBXPp+wgvsyuQ6gfpLg5AwXuOWgAPrPuf9Mc1vd+QqMnu6y7AO7asLeaZkfmXpTeZ27skZ/HyPLUddwH0FhdixYRDVoGQLuUrYx3njGTsPxbExCYGTthrL6t7u0bWyc5ITsxM1XGiRHJTBwD0GG2FN918MhnYcaDFNxbUZRHLJeCOEC00QNP1MN2HMvGEJ798RBxKDuw4UwQqSHg9EGlgqPVKFDl0NfujXdbllsmfVA6rNW3ikOQnwP0u3ZP4M4UqAp0C9JJ4xV00j/vXR8yngPXzo8FglknCg6Gvyyc7LVNYAXmDTc2dV6ZIYf7NWX/A==", "i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116;\n\tt=1776061544; cv=none;\n b=YGkoz5CaO+qw7gY6DCDAlXnxw/imAd0ehYZlElUSFREcuiTX5aQWhmcfgRk/OPY07aR4QvSO8lAM94DHLXZE3q63xYIp69FHqoEiQ50MNLrd+Pt4ARqFvR9YzjR3Gk2T+MLdMx30JygNvHYG6kdRuk4m/X8ZoC5M/OOPolga+s4=" ], "ARC-Message-Signature": [ "i=2; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1776061561; c=relaxed/relaxed;\n\tbh=YSBOAhViOuwEID8TtZg66KhyEdupsaPAGXgm7OuInv0=;\n\th=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:\n\t MIME-Version;\n b=nLm2Rfwubol+XgBh8CxG6DbCqnJG5vX5uSVYqmkn6LNCC38JwPQN0JJBxHsPm3geHmC40lT9crOcwjFkVzuPfj85rrU300xYrJgCWWpKZomXASX08TC2wxQetPm1sxf1mAe7k6uSCoYPp9ofCmqVWEG3SnQxpDxSK+NTdLv1Si8ia+cwrM5Xb3bMa6Q4usI8l9Sd71s9uDJlgcDa2dXHXB28FvNi/TjhRe+Q2VUg1NOrqWyvjVAfcte+hZjZ3RZXoZx4FhSapIdOPfQjJ++lf+JGTE3VQOhG9fa757lKENINsUha+rnxH2nJUHzLeM6ewW7ufQg/y1rSOKWFu2DBRw==", "i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1776061544; c=relaxed/simple;\n\tbh=aJt4VQAGHE/DGXXmuCohzjPL5TZ8JVdErns35A0/b04=;\n\th=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:\n\t MIME-Version;\n b=ReeGxFX64BwhqFtyazo/nkUkXCoByf+ddhbRIpRjrxMDbOYKtrH5KZMs1Vam0Mn4uHRp4nCQJ3vk0ToNP6QraLwckTEusWUMonLSsLJkm82JE2jOB72HTwIjTm46KdZ0vPl7JadlgVcILtU+QsK3CWO1aD61jhR65wHxARuTmO4=" ], "ARC-Authentication-Results": [ "i=2; gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com; dkim=pass (2048-bit key;\n unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256\n header.s=20251104 header.b=g5klmUEI; dkim-atps=neutral;\n spf=pass (client-ip=172.105.105.114; helo=tor.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15799-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org", "i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=gmail.com;\n spf=pass smtp.mailfrom=gmail.com;\n dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com\n header.b=g5klmUEI; arc=none smtp.client-ip=209.85.210.172" ], "DKIM-Signature": "v=1; a=rsa-sha256; 
c=relaxed/relaxed;\n d=gmail.com; s=20251104; t=1776061541; x=1776666341;\n darn=vger.kernel.org;\n h=content-transfer-encoding:mime-version:references:in-reply-to\n :message-id:date:subject:cc:to:from:from:to:cc:subject:date\n :message-id:reply-to;\n bh=YSBOAhViOuwEID8TtZg66KhyEdupsaPAGXgm7OuInv0=;\n b=g5klmUEI5bgU9/qtZXpXuq+w5X4vF5p7pAPT91vNHdqeSu4kqK91TkTlbbQHREvxUy\n V136FGFBpLsPmR0lasxUQv+j0tkfqpUwPMkmJJp2XRBmLZYQCLXMScYGgclerEdSj5Ux\n 9oafBXIeS2z2iLG6aY/UsmCf1IvfdxrOouJcZkQC2Q8m5Y0lZWB8YOcx8P3V+sh9tkEW\n FpUZStstZsUbCdU2TsXWWAx/yJyWwLxkRUuEokGu91K9vEbh7VzTthOkqvwKQbSIlTkJ\n u6suzLQ+A2Vtt5pdFl0aaHd5gQ1icjjuRsu0YSvX9Q7/dDHH6SRRJx+CUcXQJwRD//0t\n 6zGw==", "X-Google-DKIM-Signature": "v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20251104; t=1776061541; x=1776666341;\n h=content-transfer-encoding:mime-version:references:in-reply-to\n :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from\n :to:cc:subject:date:message-id:reply-to;\n bh=YSBOAhViOuwEID8TtZg66KhyEdupsaPAGXgm7OuInv0=;\n b=s7ywVNHReiJYnZxqgltF42VlLVKAUReiClQOsC3/lXj7VgsNuvnGv/aEroy08+c/yh\n +PdSKo1PVgci8qkD3mYvwuT4kAeuZFubzuhy+1evCYdJKNErN49/dWgNiV7L2vBUEz6v\n DmOmR4HFGG+OyeZUVDtbRJXa4v24n44nEMZNPc39ClcMgQ6NJEaRaEInC0FoxliXzI8F\n X2ZGzl8NwkN1NS0ZNFpzQgMZM1wzyvohbAIql2Aa9whDVHzDA0qe96J6ZeYG7RELjNm6\n wHpzWgCEQgH4G/B6Cs3ZvzRSAqfLMo+4YnL/RT+Y+/N+I/hADnwRLLfDnGqATQdZMOEL\n d7Wg==", "X-Gm-Message-State": "AOJu0YwxenpDpbYTJQ3xhk7013RmKtYzrOZtuGmA71JvyF68o3sCw1Ny\n\tMCa4RSwfktqDa4gM+uwvc4I+b8yNtVpTvqQ+MDsNyb8FcQOzBvvu3qKmq8v1ixCn", "X-Gm-Gg": 
"AeBDietiymnR434rw/qM+2Mv50qTGV8i7gX970M34Luf/ryhQTV01vwcp96T81o8H7Y\n\tHckQ2JiyOcRgqr3iDcfDp06kH18Fcey6eDwoi67+hIZBUKwkq19MX2zZscPeTsmFOaKT0UvgvFJ\n\t+7Fb4oKh06PHrYP5JQzcY0c2rAUPV7z7v2H1kmOErN1RbAIga8fcLbm/e61tk68n4T10+slxbQn\n\tONxWMZiymauO2w+jm4AuJ3yOFckkyAeFeueA16bdWRazSlPDN/7OxUhTpQ6Mw26jXvFCH5WE5wT\n\tVlIlwOgcWuK1ZJQpa9hrc6aB7FjQ27bvMfEuyS/T6RhWxVg/y9cOWNsK6ehnF4cabtWSZNLflQI\n\te1a8OT++b1OtwMylzshn4sLMJTzIkPqiLYH3h7SGLCwM9GO/Rsr4bUr0qEboZjp47XT6fsrVUHF\n\ttwYJ/th3xI3PYR29fNW5w5CkcbjLt4oKVjDzKC", "X-Received": "by 2002:a05:6a00:2388:b0:82c:66f2:1226 with SMTP id\n d2e1a72fcca58-82f0c25258fmr12342783b3a.38.1776061540812;\n Sun, 12 Apr 2026 23:25:40 -0700 (PDT)", "From": "Diangang Li <diangangli@gmail.com>", "To": "tytso@mit.edu,\n\tadilger.kernel@dilger.ca", "Cc": "linux-ext4@vger.kernel.org,\n\tlinux-fsdevel@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org,\n\tchangfengnan@bytedance.com,\n\tyizhang089@gmail.com,\n\twilly@infradead.org,\n\tDiangang Li <lidiangang@bytedance.com>", "Subject": "[RFC v2 0/1] ext4: fail fast on repeated buffer_head reads after IO\n failure", "Date": "Mon, 13 Apr 2026 14:24:59 +0800", "Message-Id": "<20260413062500.1380307-1-diangangli@gmail.com>", "X-Mailer": "git-send-email 2.39.5", "In-Reply-To": "<20260325093349.630193-1-diangangli@gmail.com>", "References": "<20260325093349.630193-1-diangangli@gmail.com>", "Precedence": "bulk", "X-Mailing-List": "linux-ext4@vger.kernel.org", "List-Id": "<linux-ext4.vger.kernel.org>", "List-Subscribe": "<mailto:linux-ext4+subscribe@vger.kernel.org>", "List-Unsubscribe": "<mailto:linux-ext4+unsubscribe@vger.kernel.org>", "MIME-Version": "1.0", "Content-Transfer-Encoding": "8bit", "X-Spam-Status": "No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tFREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,\n\tMAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=disabled\n\tversion=4.0.1", "X-Spam-Checker-Version": 
"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org" }, "content": "From: Diangang Li <lidiangang@bytedance.com>\n\nA production system reported hung tasks blocked for 300s+ in ext4 buffer_head\npaths. Hung task reports were accompanied by disk IO errors, but profiling\nshowed that most individual reads completed (or failed) within 10s, with\nthe worst case around 60s.\n\nAt the same time, we observed a high repeat rate to the same disk LBAs.\nThe repeated reads frequently showed seconds-level latency and ended with\nIO errors, e.g.:\n\n [Tue Mar 24 14:16:24 2026] blk_update_request: I/O error, dev sdi,\n sector 10704150288 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n [Tue Mar 24 14:16:25 2026] blk_update_request: I/O error, dev sdi,\n sector 10704488160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n [Tue Mar 24 14:16:26 2026] blk_update_request: I/O error, dev sdi,\n sector 10704382912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n\nWe also sampled repeated-LBA latency histograms on /dev/sdi and saw that\nthe same error-prone LBAs were re-submitted many times with ~1-4s latency:\n\n LBA 10704488160 (count=22): 1-2s: 20, 2-4s: 2\n LBA 10704382912 (count=21): 1-2s: 20, 2-4s: 1\n LBA 10704150288 (count=21): 1-2s: 19, 2-4s: 2\n\nRoot cause\n==========\n\next4 buffer_head reads serialize IO via BH_Lock. When one read fails, the\nbuffer remains !Uptodate. With multiple threads concurrently accessing\nthe same buffer_head, each waiter wakes up after the previous owner drops\nBH_Lock, then submits the same read again and waits again. This makes the\nlatency grow linearly with the number of contending threads, leading to\n300s+ hung tasks.\n\nThe failing IOs are repeatedly issued to the same LBA. The observed 1s+\nper-IO latency is likely from device-side retry/error recovery. On SCSI the\ndriver typically retries reads several times (e.g. 5 retries in our\nenvironment), so a single filesystem submission can easily accumulate 5s+\ndelay before failing. 
When multiple threads then re-submit the same failing\nread and serialize on BH_Lock, the delay is amplified into 300s+ hung tasks.\n\nSimilar behavior exists for other devices (e.g. NVMe with multiple internal\nretries).\n\nExample hung stacks:\n\n INFO: task toutiao.infra.t:3760933 blocked for more than 327 seconds.\n Call Trace:\n __schedule\n io_schedule\n __wait_on_bit_lock\n bh_uptodate_or_lock\n __read_extent_tree_block\n ext4_find_extent\n ext4_ext_map_blocks\n ext4_map_blocks\n ext4_getblk\n ext4_bread\n __ext4_read_dirblock\n dx_probe\n ext4_htree_fill_tree\n ext4_readdir\n iterate_dir\n ksys_getdents64\n\n INFO: task toutiao.infra.t:2724456 blocked for more than 327 seconds.\n Call Trace:\n __schedule\n io_schedule\n __wait_on_bit_lock\n ext4_read_bh_lock\n ext4_bread\n __ext4_read_dirblock\n htree_dirblock_to_tree\n ext4_htree_fill_tree\n ext4_readdir\n iterate_dir\n ksys_getdents64\n\nApproach\n========\n\nRecord read failures on buffer_head (BH_Read_EIO + b_err_timestamp). When a\nretry window is configured (sysfs: err_retry_sec), ext4 will skip submitting\nanother read for the buffer_head that already failed within the window and\nreturn/unlock immediately. Clear the state on successful completion so the\nbuffer can recover if the error is transient.\n\nerr_retry_sec defaults to 0, which keeps the current behavior: after a read\nerror, callers may keep retrying the same read. 
Set it to a non-zero value\nto throttle repeated reads within the window.\n\nPatch summary\n=============\n\n 1) Add BH_Read_EIO, b_err_timestamp and a small helper for tracking read\n failures on buffer_head.\n 2) Update end_buffer_read_sync() and end_buffer_write_sync() (success path)\n to maintain that state.\n 3) Add ext4 sysfs knob err_retry_sec and throttle ext4 buffer_head reads\n within the configured window.\n 4) Pass sb into ext4_read_bh_nowait(), ext4_read_bh() and ext4_read_bh_lock()\n so __ext4_read_bh() can apply the per-sb retry window check.\n\nDiangang Li (1):\n ext4: fail fast on repeated buffer_head reads after IO failure\n\n fs/buffer.c | 2 ++\n fs/ext4/balloc.c | 2 +-\n fs/ext4/ext4.h | 13 ++++++----\n fs/ext4/extents.c | 2 +-\n fs/ext4/ialloc.c | 3 ++-\n fs/ext4/indirect.c | 2 +-\n fs/ext4/inode.c | 10 ++++----\n fs/ext4/mmp.c | 2 +-\n fs/ext4/move_extent.c | 2 +-\n fs/ext4/resize.c | 2 +-\n fs/ext4/super.c | 51 +++++++++++++++++++++++++++----------\n fs/ext4/sysfs.c | 2 ++\n include/linux/buffer_head.h | 16 ++++++++++++\n 13 files changed, 79 insertions(+), 30 deletions(-)" }
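
The "Root cause" and "Approach" sections of the cover letter above can be illustrated with a small userspace model. This is only a sketch, not the patch itself: the cover letter does not include the kernel code, so the class `BufferHead`, the functions `read_bh` and `simulate`, the exact window comparison, and the figures of 60 waiters and 5 seconds per failed submission are all assumptions chosen for illustration. Only the names BH_Read_EIO, b_err_timestamp and err_retry_sec come from the cover letter.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BufferHead:
    """Toy stand-in for a kernel buffer_head (hypothetical model)."""
    uptodate: bool = False                 # models the Uptodate bit
    read_eio: bool = False                 # models BH_Read_EIO
    err_timestamp: Optional[float] = None  # models b_err_timestamp (seconds)


def read_bh(bh: BufferHead, now: float, err_retry_sec: float,
            io_latency: float, io_fails: bool) -> Tuple[float, bool]:
    """Model one caller that has just acquired BH_Lock.

    Returns (time after this caller finishes, whether a read was submitted).
    """
    if bh.uptodate:
        return now, False
    # Fail fast: a read already failed inside the retry window, so skip
    # the submission. err_retry_sec == 0 disables the check (assumed to
    # match the "defaults to 0, keeps the current behavior" description).
    if (err_retry_sec > 0 and bh.read_eio
            and now - bh.err_timestamp < err_retry_sec):
        return now, False
    now += io_latency  # the caller waits for the device to answer
    if io_fails:
        bh.read_eio = True
        bh.err_timestamp = now
    else:
        # Success clears the error state so the buffer can recover.
        bh.uptodate = True
        bh.read_eio = False
        bh.err_timestamp = None
    return now, True


def simulate(waiters: int, err_retry_sec: float,
             io_latency: float = 5.0) -> Tuple[float, int]:
    """Serialize `waiters` callers on one persistently failing buffer."""
    bh = BufferHead()
    now = 0.0
    submitted = 0
    for _ in range(waiters):
        now, did_io = read_bh(bh, now, err_retry_sec, io_latency,
                              io_fails=True)
        submitted += did_io
    return now, submitted


if __name__ == "__main__":
    print(simulate(60, 0))    # window disabled: every waiter resubmits
    print(simulate(60, 10))   # 10 s window: only the first waiter submits
```

In this model the last waiter's delay grows linearly with the number of contending callers when the window is disabled, which is the amplification the cover letter describes. With a non-zero window, only the first caller pays the device latency and the rest return immediately.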