{"id":2230893,"url":"http://patchwork.ozlabs.org/api/1.1/patches/2230893/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-ide/patch/20260430073417.1803833-1-wdhh6@aliyun.com/","project":{"id":13,"url":"http://patchwork.ozlabs.org/api/1.1/projects/13/?format=json","name":"Linux IDE development","link_name":"linux-ide","list_id":"linux-ide.vger.kernel.org","list_email":"linux-ide@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null},"msgid":"<20260430073417.1803833-1-wdhh6@aliyun.com>","date":"2026-04-30T07:34:16","name":"libata: disable device after repeated media errors","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"a59c93e5e73ac57cf556770cd4fe63ef96b95ad9","submitter":{"id":90742,"url":"http://patchwork.ozlabs.org/api/1.1/people/90742/?format=json","name":"Chaohai Chen","email":"wdhh6@aliyun.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/linux-ide/patch/20260430073417.1803833-1-wdhh6@aliyun.com/mbox/","series":[{"id":502226,"url":"http://patchwork.ozlabs.org/api/1.1/series/502226/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-ide/list/?series=502226","date":"2026-04-30T07:34:16","name":"libata: disable device after repeated media errors","version":1,"mbox":"http://patchwork.ozlabs.org/series/502226/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/2230893/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2230893/checks/","tags":{},"headers":{"Return-Path":"\n <linux-ide+bounces-5578-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ide@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=aliyun.com header.i=@aliyun.com header.a=rsa-sha256\n header.s=s1024 header.b=wtaYBWSY;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=linux-ide+bounces-5578-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (1024-bit key) header.d=aliyun.com header.i=@aliyun.com\n header.b=\"wtaYBWSY\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=115.124.30.70","smtp.subspace.kernel.org;\n dmarc=pass (p=reject dis=none) header.from=aliyun.com","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=aliyun.com"],"Received":["from sea.lore.kernel.org (sea.lore.kernel.org\n [IPv6:2600:3c0a:e001:db::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g5mDb0SYdz1yGq\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 30 Apr 2026 17:35:43 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id C7BFB30378B6\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 30 Apr 2026 07:34:49 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 7D0AD3DD528;\n\tThu, 30 Apr 2026 07:34:46 +0000 (UTC)","from out30-70.freemail.mail.aliyun.com\n (out30-70.freemail.mail.aliyun.com [115.124.30.70])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DD5A3D88FC;\n\tThu, 30 Apr 2026 07:34:39 +0000 (UTC)","from localhost.localdomain(mailfrom:wdhh6@aliyun.com\n fp:SMTPD_---0X2-Gogb_1777534468 cluster:ay36)\n          by smtp.aliyun-inc.com;\n          Thu, 30 Apr 2026 15:34:35 +0800"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1777534485; cv=none;\n b=JjiMGvDmreStyUg1Hc3z9FABAGZ2r3RjmTm7Dl4SDBhWBQMIg9/opffLvzRFsAJSnLs8bEl3hz6zhBBob0Q9yyBDWfTP6g27vQ8wx4oBVu5Umq+YIq+yvZJt/CkrZcFvL/y0Z/DnGDq7mymeErbxXqcB7HOmy2kz4ze4QDNzFFw=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1777534485; c=relaxed/simple;\n\tbh=P9sj3hk0mCZPWcHY9ILG/Es22mpMCVVmn9hdmfE2J5Q=;\n\th=From:To:Cc:Subject:Date:Message-ID:MIME-Version;\n b=ZfntwsKdAkQYxipD60VyiT9ygL6sKDl35b93ypddmJNQUf0MQWyKciKDtboBzWztU821j1L2G6/zdKX36R9/WgsnfjvDy4voVisIRO8M1o6iAhEXWkIfLUDg72n3utS00FEMap5S1k+l4EZSJ/uDUC1c+XyoFUmDAsOT5yRqbbU=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=reject dis=none) header.from=aliyun.com;\n spf=pass smtp.mailfrom=aliyun.com;\n dkim=pass (1024-bit key) header.d=aliyun.com header.i=@aliyun.com\n header.b=wtaYBWSY; arc=none smtp.client-ip=115.124.30.70","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=aliyun.com; s=s1024;\n\tt=1777534476; h=From:To:Subject:Date:Message-ID:MIME-Version;\n\tbh=Y9ydREAEDogILRxFScgPo36dDIpEsk5PId899GH/I9Y=;\n\tb=wtaYBWSYkSykMLWbjsLHCskIocpdKkfLuCM5d5xulX9Ygg3TsUI7r/W+u7fcsgBAGxSP9Yxw5W64M/b714yIPMbKXXa1Z5ZNqxlEUOjdnPVl+BSeRSgDhHQp/NTtG+ZeckyaPabfPWWC0kw5Rzapd1drhZ7jlgL/f3du+WeAsvs=","X-Alimail-AntiSpam":"\n AC=CONTINUE;BC=0.07357557|-1;CH=green;DM=|CONTINUE|false|;DS=CONTINUE|ham_alarm|0.00609031-0.000238013-0.993672;FP=18194540581231619322|0|0|0|0|-1|-1|-1;HT=maildocker-contentspam033045133197;MF=wdhh6@aliyun.com;NM=1;PH=DS;RN=5;RT=5;SR=0;TI=SMTPD_---0X2-Gogb_1777534468;","From":"Chaohai Chen <wdhh6@aliyun.com>","To":"dlemoal@kernel.org,\n\tcassel@kernel.org","Cc":"linux-ide@vger.kernel.org,\n\tlinux-kernel@vger.kernel.org,\n\tChaohai Chen <wdhh6@aliyun.com>","Subject":"[PATCH] libata: disable device after repeated media errors","Date":"Thu, 30 Apr 2026 15:34:16 +0800","Message-ID":"<20260430073417.1803833-1-wdhh6@aliyun.com>","X-Mailer":"git-send-email 2.43.7","Precedence":"bulk","X-Mailing-List":"linux-ide@vger.kernel.org","List-Id":"<linux-ide.vger.kernel.org>","List-Subscribe":"<mailto:linux-ide+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ide+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Transfer-Encoding":"8bit"},"content":"When a SATA device (particularly those behind SAS HBAs using libsas)\nhits unrecoverable media errors, it can trigger an infinite EH loop:\nthe device returns medium error, libata performs a hard reset (which\nsucceeds since the device is functional), then the upper layer retries\nthe read to the same bad sector, triggering another EH cycle.\n\nThis loop is particularly harmful for SATA devices behind SAS HBAs\nbecause all devices sharing the same Scsi_Host are blocked during\nSHOST_RECOVERY, not just the faulty device.\n\nFix this by tracking media error frequency per device. If a device\ntriggers more than media_err_limit (default 10) media errors within\na media_err_window (default 60 seconds), disable the device. This\nallows the SCSI layer to offline the faulty device and restore I/O\nto healthy devices on the same HBA.\n\nThe parameters are exposed via sysfs for runtime tuning:\n  /sys/class/ata_device/devX.Y/media_err_limit  (rw, 0=disable)\n  /sys/class/ata_device/devX.Y/media_err_window (rw, seconds)\n  /sys/class/ata_device/devX.Y/media_err_count  (ro, current count)\n\nSigned-off-by: Chaohai Chen <wdhh6@aliyun.com>\n---\n drivers/ata/libata-core.c      |  2 +\n drivers/ata/libata-eh.c        | 35 ++++++++++++++++\n drivers/ata/libata-transport.c | 76 ++++++++++++++++++++++++++++++++++\n include/linux/libata.h         | 12 ++++++\n 4 files changed, 125 insertions(+)","diff":"diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c\nindex e76d15411e2a..9ea32ed53156 100644\n--- a/drivers/ata/libata-core.c\n+++ b/drivers/ata/libata-core.c\n@@ -5559,6 +5559,8 @@ void ata_dev_init(struct ata_device *dev)\n \tdev->pio_mask = UINT_MAX;\n \tdev->mwdma_mask = UINT_MAX;\n \tdev->udma_mask = UINT_MAX;\n+\tdev->media_err_limit = ATA_EH_MEDIA_ERR_LIMIT;\n+\tdev->media_err_window = ATA_EH_MEDIA_ERR_WINDOW;\n }\n \n /**\ndiff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c\nindex 9a4b67b90b17..9fc78020ac7e 100644\n--- a/drivers/ata/libata-eh.c\n+++ b/drivers/ata/libata-eh.c\n@@ -2419,12 +2419,47 @@ static void ata_eh_link_autopsy(struct ata_link *link)\n \t\t      ata_dev_enabled(link->device))))\n \t    dev = link->device;\n \n+\t/*\n+\t * Track repeated media errors. If the same device hits media errors\n+\t * too many times within a configurable time window, disable it to\n+\t * prevent infinite EH loops that block other devices sharing the\n+\t * same Scsi_Host (particularly relevant for SATA devices behind\n+\t * SAS HBAs using libsas).\n+\t *\n+\t * media_err_limit == 0 means this feature is disabled.\n+\t */\n+\tif (dev && (all_err_mask & AC_ERR_MEDIA) && dev->media_err_limit) {\n+\t\tunsigned long now = jiffies;\n+\t\tunsigned long window = (unsigned long)dev->media_err_window * HZ;\n+\n+\t\tif (!dev->media_err_count ||\n+\t\t    time_after(now, dev->media_err_first_jiffies + window)) {\n+\t\t\tdev->media_err_count = 1;\n+\t\t\tdev->media_err_first_jiffies = now;\n+\t\t} else {\n+\t\t\tdev->media_err_count++;\n+\t\t}\n+\n+\t\tif (dev->media_err_count >= dev->media_err_limit) {\n+\t\t\tata_dev_err(dev,\n+\t\t\t\t\"too many media errors (%u in %u seconds), disabling device\\n\",\n+\t\t\t\tdev->media_err_count,\n+\t\t\t\tjiffies_to_msecs(now - dev->media_err_first_jiffies) / 1000);\n+\t\t\tata_dev_disable(dev);\n+\t\t\tdev->media_err_count = 0;\n+\t\t\t/* skip speed_down for disabled device */\n+\t\t\tgoto out_autopsy;\n+\t\t}\n+\t}\n+\n \tif (dev) {\n \t\tif (dev->flags & ATA_DFLAG_DUBIOUS_XFER)\n \t\t\teflags |= ATA_EFLAG_DUBIOUS_XFER;\n \t\tehc->i.action |= ata_eh_speed_down(dev, eflags, all_err_mask);\n \t\ttrace_ata_eh_link_autopsy(dev, ehc->i.action, all_err_mask);\n \t}\n+out_autopsy:\n+\treturn;\n }\n \n /**\ndiff --git a/drivers/ata/libata-transport.c b/drivers/ata/libata-transport.c\nindex 95862dc34419..c73a0bf0eeb7 100644\n--- a/drivers/ata/libata-transport.c\n+++ b/drivers/ata/libata-transport.c\n@@ -477,6 +477,79 @@ show_ata_dev_trim(struct device *dev,\n \n static DEVICE_ATTR(trim, S_IRUGO, show_ata_dev_trim, NULL);\n \n+static ssize_t\n+media_err_limit_show(struct device *dev,\n+\t\t     struct device_attribute *attr, char *buf)\n+{\n+\tstruct ata_device *ata_dev = transport_class_to_dev(dev);\n+\n+\treturn sysfs_emit(buf, \"%u\\n\", ata_dev->media_err_limit);\n+}\n+\n+static ssize_t\n+media_err_limit_store(struct device *dev,\n+\t\t      struct device_attribute *attr,\n+\t\t      const char *buf, size_t count)\n+{\n+\tstruct ata_device *ata_dev = transport_class_to_dev(dev);\n+\tunsigned int val;\n+\tint rc;\n+\n+\trc = kstrtouint(buf, 0, &val);\n+\tif (rc)\n+\t\treturn rc;\n+\n+\tata_dev->media_err_limit = val;\n+\tata_dev->media_err_count = 0;\n+\treturn count;\n+}\n+\n+static DEVICE_ATTR_RW(media_err_limit);\n+\n+static ssize_t\n+media_err_window_show(struct device *dev,\n+\t\t      struct device_attribute *attr, char *buf)\n+{\n+\tstruct ata_device *ata_dev = transport_class_to_dev(dev);\n+\n+\treturn sysfs_emit(buf, \"%u\\n\", ata_dev->media_err_window);\n+}\n+\n+static ssize_t\n+media_err_window_store(struct device *dev,\n+\t\t       struct device_attribute *attr,\n+\t\t       const char *buf, size_t count)\n+{\n+\tstruct ata_device *ata_dev = transport_class_to_dev(dev);\n+\tunsigned int val;\n+\tint rc;\n+\n+\trc = kstrtouint(buf, 0, &val);\n+\tif (rc)\n+\t\treturn rc;\n+\n+\t/* window=0 would prevent the counter from ever accumulating */\n+\tif (!val)\n+\t\treturn -EINVAL;\n+\n+\tata_dev->media_err_window = val;\n+\tata_dev->media_err_count = 0;\n+\treturn count;\n+}\n+\n+static DEVICE_ATTR_RW(media_err_window);\n+\n+static ssize_t\n+media_err_count_show(struct device *dev,\n+\t\t     struct device_attribute *attr, char *buf)\n+{\n+\tstruct ata_device *ata_dev = transport_class_to_dev(dev);\n+\n+\treturn sysfs_emit(buf, \"%u\\n\", ata_dev->media_err_count);\n+}\n+\n+static DEVICE_ATTR_RO(media_err_count);\n+\n static const struct attribute *const ata_device_attr_attrs[] = {\n \t&dev_attr_class.attr,\n \t&dev_attr_pio_mode.attr,\n@@ -487,6 +560,9 @@ static const struct attribute *const ata_device_attr_attrs[] = {\n \t&dev_attr_id.attr,\n \t&dev_attr_gscr.attr,\n \t&dev_attr_trim.attr,\n+\t&dev_attr_media_err_limit.attr,\n+\t&dev_attr_media_err_window.attr,\n+\t&dev_attr_media_err_count.attr,\n \tNULL\n };\n \ndiff --git a/include/linux/libata.h b/include/linux/libata.h\nindex 5c085ef4eda7..8715704e06a6 100644\n--- a/include/linux/libata.h\n+++ b/include/linux/libata.h\n@@ -419,6 +419,11 @@ enum {\n \tATA_EH_PMP_TRIES\t= 5,\n \tATA_EH_PMP_LINK_TRIES\t= 3,\n \n+\t/* default: disable device after this many media errors in time window */\n+\tATA_EH_MEDIA_ERR_LIMIT\t= 10,\n+\t/* default: time window in seconds */\n+\tATA_EH_MEDIA_ERR_WINDOW\t= 60,\n+\n \tSATA_PMP_RW_TIMEOUT\t= 3000,\t\t/* PMP read/write timeout */\n \n \t/* This should match the actual table size of\n@@ -786,6 +791,13 @@ struct ata_device {\n \n \t/* error history */\n \tint\t\t\tspdn_cnt;\n+\n+\t/* media error tracking for repeated EH */\n+\tunsigned int\t\tmedia_err_count;\n+\tunsigned long\t\tmedia_err_first_jiffies;\n+\tunsigned int\t\tmedia_err_limit;\n+\tunsigned int\t\tmedia_err_window;\n+\n \t/* ering is CLEAR_END, read comment above CLEAR_END */\n \tstruct ata_ering\tering;\n \n","prefixes":[]}