[{"id":3676718,"web_url":"http://patchwork.ozlabs.org/comment/3676718/","msgid":"<20260413124703.GA20496@macsyma-wired.lan>","list_archive_url":null,"date":"2026-04-13T12:47:03","subject":"Re: [RFC v2 0/1] ext4: fail fast on repeated buffer_head reads after\n IO failure","submitter":{"id":350,"url":"http://patchwork.ozlabs.org/api/people/350/","name":"Theodore Tso","email":"tytso@mit.edu"},"content":"On Mon, Apr 13, 2026 at 02:24:59PM +0800, Diangang Li wrote:\n> From: Diangang Li <lidiangang@bytedance.com>\n> \n> A production system reported hung tasks blocked for 300s+ in ext4\n> buffer_head paths....\n> \n>   [Tue Mar 24 14:16:24 2026] blk_update_request: I/O error, dev sdi,\n>       sector 10704150288 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n>   [Tue Mar 24 14:16:25 2026] blk_update_request: I/O error, dev sdi,\n>       sector 10704488160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n>   [Tue Mar 24 14:16:26 2026] blk_update_request: I/O error, dev sdi,\n>       sector 10704382912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0\n\nI wonder whether the ext4 layer is the right place to handle this\nsort of issue.  For example, it could be handled by having a subsystem\nscanning dmesg (or by wiring up notifications so block device errors\nget sent to a userspace daemon), and when certain criteria are met, the\nmachine is automatically sent to hardware operations to run\ndiagnostics and (most likely) replace the failing disk.\n\nIt could also be handled in the driver or SCSI layer so the \"fail\nfast\" semantics are handled there, so that it supports all file\nsystems, not just ext4.  The SCSI layer also has more information\nabout the type of error; you might want to handle things like media\nerrors differently from Fibre Channel or iSCSI timeouts (which might\nbe something where \"fail fast\" is not appropriate).\n\nBy the time the error gets propagated up to the buffer head, we lose a\nlot of detail about why the error took place.  
Also, in the long term\nwe will hopefully be moving away from using buffer cache.\n\n   \t\t     \t    \t      \t    - Ted","headers":{"Return-Path":"\n <SRS0=SNOQ=CM=vger.kernel.org=linux-ext4+bounces-15809-patchwork-incoming=ozlabs.org@ozlabs.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ext4@vger.kernel.org"],"Delivered-To":["patchwork-incoming@legolas.ozlabs.org","patchwork-incoming@ozlabs.org"],"Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=mit.edu header.i=@mit.edu header.a=rsa-sha256\n header.s=outgoing header.b=Dr7NHjXJ;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=ozlabs.org\n (client-ip=150.107.74.76; helo=mail.ozlabs.org;\n envelope-from=srs0=snoq=cm=vger.kernel.org=linux-ext4+bounces-15809-patchwork-incoming=ozlabs.org@ozlabs.org;\n receiver=patchwork.ozlabs.org)","gandalf.ozlabs.org;\n arc=pass smtp.remote-ip=\"2600:3c0a:e001:db::12fc:5321\"\n arc.chain=subspace.kernel.org","gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=mit.edu","gandalf.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=mit.edu header.i=@mit.edu header.a=rsa-sha256\n header.s=outgoing header.b=Dr7NHjXJ;\n\tdkim-atps=neutral","gandalf.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15809-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=mit.edu header.i=@mit.edu\n header.b=\"Dr7NHjXJ\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=18.9.28.11","smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=mit.edu","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=mit.edu"],"Received":["from mail.ozlabs.org (gandalf.ozlabs.org [150.107.74.76])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t 
key-exchange x25519 server-signature ECDSA (secp384r1 raw public key)\n server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4fvS1Y2Lvnz1yDF\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 22:50:25 +1000 (AEST)","from mail.ozlabs.org (mail.ozlabs.org [IPv6:2404:9400:2221:ea00::3])\n\tby gandalf.ozlabs.org (Postfix) with ESMTP id 4fvS1W4mW6z4wBB\n\tfor <incoming@patchwork.ozlabs.org>; Mon, 13 Apr 2026 22:50:23 +1000 (AEST)","by gandalf.ozlabs.org (Postfix)\n\tid 4fvS1W4WFCz4wCp; Mon, 13 Apr 2026 22:50:23 +1000 (AEST)","from sea.lore.kernel.org (sea.lore.kernel.org\n [IPv6:2600:3c0a:e001:db::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby gandalf.ozlabs.org (Postfix) with ESMTPS id 4fvS1P513Xz4wBB\n\tfor <patchwork-incoming@ozlabs.org>; Mon, 13 Apr 2026 22:50:17 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id B29E03006B48\n\tfor <patchwork-incoming@ozlabs.org>; Mon, 13 Apr 2026 12:48:39 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 5E8323C7DF5;\n\tMon, 13 Apr 2026 12:48:39 +0000 (UTC)","from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id C0D2C3C73FB\n\tfor <linux-ext4@vger.kernel.org>; Mon, 13 Apr 2026 12:48:37 +0000 (UTC)","from macsyma.thunk.org (pool-173-48-113-10.bstnma.fios.verizon.net\n [173.48.113.10])\n\t(authenticated bits=0)\n        (User authenticated as tytso@ATHENA.MIT.EDU)\n\tby outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id 63DCm4lu013301\n\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);\n\tMon, 13 Apr 2026 
08:48:05 -0400","by macsyma.thunk.org (Postfix, from userid 15806)\n\tid EEB1962D9DC2; Mon, 13 Apr 2026 08:47:03 -0400 (EDT)"],"ARC-Seal":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707; t=1776084623; cv=pass;\n\tb=EpOy3QqRcTSmj5z+PnMR5tB23TAMirou0OEy0rGSRgjfXMkb3PGggUeFJzW5/9JI5oAqNXkeuTSsgrAPVtP2xpT/9tAEScDQ5NWcmi1nVG4Uuoh5i3eyvZv2rlpdmL5lY3MkytKUIq8KRpYOad/8lY0EWDxHdoLR7vpAy4EDUVCng9NPV9dVl7DPVCiTgfscPs1zFxBhBB5DuPHjAy+nTH9V1Bh60DTi57QjyIs8NLVuLKi9KigfPSwaKPhJ/s0X2vY/mu9jh1Kgof3utxMaWgPoYivRzAZEQzOTbS85wsemTGSmBj/3dnr8TGu4QIQI9N96tg7jFZtLAUfqmQBssA==","i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1776084519; cv=none;\n b=sL+3wdycTWFeWH8Hmxy6XKoDgmDWRNbTF6yN1ZMnVf9ozhGR6/iqC5uJAMu3L3aGn8poilYjmu1nyNR2lCKelLaz2IiFY/NnMXe5A0n8d77rkAEgqDtDFRSvoXABRj147lcn7mD4TIuHpCtfC51Z7XBlxUNLdb01fQdj1D+J9YA="],"ARC-Message-Signature":["i=2; a=rsa-sha256; d=ozlabs.org; s=201707;\n\tt=1776084623; c=relaxed/relaxed;\n\tbh=ew0N4RW/+mMo1okUz8IGgXrBvzrPN7l+s8J2ydS6XZI=;\n\th=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:\n\t Content-Type:Content-Disposition:In-Reply-To;\n b=RSQoGVQuAQL5+X0x6q8eirx2trI/Z0y44IIQ8t6tgKELo38YnySRT7yl+jmZhqvE/p6gwn4VXPWQAjXCr0VATZ72ihK+yyK/USXiAT1fMZlRlyWLFbCXg6zqZiLkf6zTgLE3qTQDjBtQP2BmQfrjO35HMdno5FICi1Gb82wzbstKt6N5tQFHn7+25bCuwMihNtUFO2zzAAgMuCVqJ/E9vEYBJHZVQ0nSxcMLpOf3L1Fbld3Rb5vQsY5To1bTUs5oKHsjYFKcgNqr4tTBu1oFAes0tVauxpk7qP6kf5DPwUJ2sGACRgYCD2FyfwHo2MMK8zM//uvI4YIrqBG6G/5ElA==","i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1776084519; c=relaxed/simple;\n\tbh=v7asg6qA7pO2JxM8JSNzirNqzT0RT3u69UBi0wGyXEE=;\n\th=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:\n\t Content-Type:Content-Disposition:In-Reply-To;\n b=D6ayqEpmIF0kBQPjGmTKBvvbBaWSF/pvefW8A1xEFWqT1GMzuw0Id0tZti3XSnWTYOwxCWof/Z9c3yruJWLUTsIzB3IFMl4zVfepHGB5SrP1zQoi/DBT8zo8f+p7LqgZRCiCLtcVIFYtPIHggk5NYq4YI1gHvIzIiz50//KMZMk="],"ARC-Authentication-Results":["i=2; gandalf.ozlabs.org;\n dmarc=pass (p=none dis=none) header.from=mit.edu; 
dkim=pass (2048-bit key;\n unprotected) header.d=mit.edu header.i=@mit.edu header.a=rsa-sha256\n header.s=outgoing header.b=Dr7NHjXJ; dkim-atps=neutral;\n spf=pass (client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=linux-ext4+bounces-15809-patchwork-incoming=ozlabs.org@vger.kernel.org;\n receiver=ozlabs.org) smtp.mailfrom=vger.kernel.org","i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=mit.edu;\n spf=pass smtp.mailfrom=mit.edu;\n dkim=pass (2048-bit key) header.d=mit.edu header.i=@mit.edu\n header.b=Dr7NHjXJ; arc=none smtp.client-ip=18.9.28.11"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=mit.edu; s=outgoing;\n\tt=1776084487; bh=ew0N4RW/+mMo1okUz8IGgXrBvzrPN7l+s8J2ydS6XZI=;\n\th=Date:From:Subject:Message-ID:MIME-Version:Content-Type;\n\tb=Dr7NHjXJPY+Ug6d5M4eVJih7sF68tlFdq1cMBnCxIwJJkQob4ee6dd0dXgT5zcXOs\n\t Aq4t9D3pjx4bp/bw7i3vf9AJdc2XGYB7eyq+0PKUaLa6ZurQunYb9bJLVHsJVoobxF\n\t bOwGP6sk/x8lXvfQib94symCoyaKSJTSYQlki3v0bsLZr7HjZ9gP3rbY3B17qvwo7K\n\t E6pChC1Yjfh1spZwnVOqZ4+o4sxXzEWLrHUvFzsqlknMx03hiSGq+Eh2SAeIuxlwkQ\n\t nZ2numKdP/Y+6lmfv89AUJY8cqA5FZLODZlW60zdD0KdNdRTwEd6TD43h3WJ4xbT+/\n\t 6c/5wrg0ukgPQ==","Date":"Mon, 13 Apr 2026 08:47:03 -0400","From":"\"Theodore Tso\" <tytso@mit.edu>","To":"Diangang Li <diangangli@gmail.com>","Cc":"adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,\n        linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,\n        changfengnan@bytedance.com, yizhang089@gmail.com, willy@infradead.org,\n        Diangang Li <lidiangang@bytedance.com>","Subject":"Re: [RFC v2 0/1] ext4: fail fast on repeated buffer_head reads after\n IO failure","Message-ID":"<20260413124703.GA20496@macsyma-wired.lan>","References":"<20260325093349.630193-1-diangangli@gmail.com>\n 
<20260413062500.1380307-1-diangangli@gmail.com>","Precedence":"bulk","X-Mailing-List":"linux-ext4@vger.kernel.org","List-Id":"<linux-ext4.vger.kernel.org>","List-Subscribe":"<mailto:linux-ext4+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ext4+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20260413062500.1380307-1-diangangli@gmail.com>","X-Spam-Status":"No, score=-1.2 required=5.0 tests=ARC_SIGNED,ARC_VALID,\n\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DMARC_PASS,\n\tHEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,\n\tSPF_PASS autolearn=disabled version=4.0.1","X-Spam-Checker-Version":"SpamAssassin 4.0.1 (2024-03-25) on gandalf.ozlabs.org"}}]