{"id":2175400,"url":"http://patchwork.ozlabs.org/api/1.0/patches/2175400/?format=json","project":{"id":13,"url":"http://patchwork.ozlabs.org/api/1.0/projects/13/?format=json","name":"Linux IDE development","link_name":"linux-ide","list_id":"linux-ide.vger.kernel.org","list_email":"linux-ide@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null},"msgid":"<20251217231712.490765-3-dlemoal@kernel.org>","date":"2025-12-17T23:17:12","name":"[2/2] ata: libata-scsi: avoid passthrough command starvation","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"7294984256f2a9f8ce03798069fd5aea4fe6c087","submitter":{"id":86188,"url":"http://patchwork.ozlabs.org/api/1.0/people/86188/?format=json","name":"Damien Le Moal","email":"dlemoal@kernel.org"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/linux-ide/patch/20251217231712.490765-3-dlemoal@kernel.org/mbox/","series":[{"id":485778,"url":"http://patchwork.ozlabs.org/api/1.0/series/485778/?format=json","date":"2025-12-17T23:17:10","name":"Prevent non-NCQ command starvation","version":1,"mbox":"http://patchwork.ozlabs.org/series/485778/mbox/"}],"check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2175400/checks/","tags":{},"headers":{"Return-Path":"\n <linux-ide+bounces-4799-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-ide@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256\n header.s=k20201202 header.b=ldg4gqyn;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=172.234.253.10; helo=sea.lore.kernel.org;\n envelope-from=linux-ide+bounces-4799-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=\"ldg4gqyn\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=10.30.226.201"],"Received":["from sea.lore.kernel.org (sea.lore.kernel.org [172.234.253.10])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4dWqZ149Y5z1xpw\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 18 Dec 2025 10:21:45 +1100 (AEDT)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id CAB8D302EF4F\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 17 Dec 2025 23:21:42 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id A6EF0330338;\n\tWed, 17 Dec 2025 23:21:42 +0000 (UTC)","from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org\n [10.30.226.201])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 63FB72253FF\n\tfor <linux-ide@vger.kernel.org>; Wed, 17 Dec 2025 23:21:42 +0000 (UTC)","by smtp.kernel.org (Postfix) with ESMTPSA id 715F7C4CEF5;\n\tWed, 17 Dec 2025 23:21:41 +0000 (UTC)"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1766013702; cv=none;\n b=jo7esOzHDbV7i4zl9OJjea7BLMxLnK1u2IMIYsPDFzqWR3jPuP6uZPMrCNI6weFlpSIvNvrXrL6sPgMRTtpyrH0x0zkQ4EuROitpFXmBIpQDJiwsOJCIW4uHK+Vmb1UpUTTSlM/+CzS5PUc9OVJrpeHIHRXLXHWTn9rNqJIOQh8=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1766013702; c=relaxed/simple;\n\tbh=6y9bcpnEzUPvrn5mzek6eF6SZPUB6/WP0oQ09lJ61Uw=;\n\th=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:\n\t MIME-Version;\n b=fbj1TM//7zlrM9XU6m3eyUb39eISkhY9MumqhO5P0ObJrUjRNtVR+AmuJZJnqwkhWe0M/rfIuB8Eq2K564roANBT3wWpvh32abN4DBvX1RDejufoj8vDh3hE0NRE399XNfxWp7GvkJAMfz6cZDDdtSp7wkzuO4IgnsfC0Um8+fI=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=ldg4gqyn; arc=none smtp.client-ip=10.30.226.201","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;\n\ts=k20201202; t=1766013701;\n\tbh=6y9bcpnEzUPvrn5mzek6eF6SZPUB6/WP0oQ09lJ61Uw=;\n\th=From:To:Cc:Subject:Date:In-Reply-To:References:From;\n\tb=ldg4gqynLYbsJN0fAfK8XmGwE0OvU43hRNqsGzrLdBMbU6iMMvVMsrQ4sWffdPxRe\n\t gObOj6Yl8EV+gfkJ9RTx0zZtUxNrjHqisPtgML031Yw1gBX/FWWzZ7gj+zBY5W/to6\n\t Dz7vVasulcFPCnLkkAHjJ0qi2HvMUBheQs09TU/0Q8OxaVX9njjf54Z4V4kpUAIRjq\n\t CE99bHd68EVLjrdDTkELpnKEAIno6i/NqExXZqjF7tj9OWY1awMDTYI0IEDlcov1zX\n\t +NU3j0ZpouvvexR1EWb+g0//c6k345jXKOVLht3aQSUEAotzp9GFP4RjPhUTBQnZmC\n\t Jsrg4hahfalbg==","From":"Damien Le Moal <dlemoal@kernel.org>","To":"linux-ide@vger.kernel.org,\n\tNiklas Cassel <cassel@kernel.org>","Cc":"Igor Pylypiv <ipylypiv@google.com>,\n\tXingui Yang <yangxingui@huawei.com>","Subject":"[PATCH 2/2] ata: libata-scsi: avoid passthrough command starvation","Date":"Thu, 18 Dec 2025 08:17:12 +0900","Message-ID":"<20251217231712.490765-3-dlemoal@kernel.org>","X-Mailer":"git-send-email 2.52.0","In-Reply-To":"<20251217231712.490765-1-dlemoal@kernel.org>","References":"<20251217231712.490765-1-dlemoal@kernel.org>","Precedence":"bulk","X-Mailing-List":"linux-ide@vger.kernel.org","List-Id":"<linux-ide.vger.kernel.org>","List-Subscribe":"<mailto:linux-ide+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-ide+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Transfer-Encoding":"8bit"},"content":"When a non-NCQ passthrough command is issued while NCQ commands are\nbeing executed, ata_scsi_defer() indicates to ata_scsi_translate() that\nata_qc_issue() should not be called for the passthrough command, and\ninstead returns SCSI_MLQUEUE_XXX_BUSY to defer the command execution.\nThis command deferring is correct and as mandated by the ACS\nspecifications since NCQ and non-NCQ commands cannot be mixed.\n\nHowever, in the case of a host adapter using multiple submission queues,\nwhen the target device is under a constant load of NCQ read or write\ncommands, there are no guarantees that requeueing the non-NCQ command\nwill lead to it not being deferred again repeatedly, since other\nsubmission queues can constantly issue NCQ commands from different CPUs\nahead of the non-NCQ command. This can lead to very long delays for the\nexecution of non-NCQ passthrough commands, and even complete starvation\nin the worst case scenario.\n\nSince the block layer and the SCSI layer do not distinguish between\nqueuable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT\nimplementation must ensure forward progress for non-NCQ commands in the\npresence of NCQ command traffic. This is similar to what SAS HBAs with a\nhardware/firmware based implementation do.\n\nImplement such forward progress guarantee by limiting requeueing of\nnon-NCQ commands: when a non-NCQ command is received and NCQ commands\nare in-flight, do not force a requeue of the non-NCQ command by\nreturning SCSI_MLQUEUE_XXX_BUSY in ata_scsi_translate() and instead\nhold on to the qc using the new deferred_qc field of struct ata_port.\n\nThis deferred qc will be issued using the work item deferred_qc_work\nrunning the function ata_scsi_deferred_qc_work() once all in-flight\ncommands complete, which is checked with the port qc_defer() callback\nindicating that no further delay is necessary. This check is done using\nthe helper function ata_scsi_schedule_deferred_qc() which is called from\nata_scsi_qc_complete(). This thus excludes this mechanism from all\ninternal non-NCQ commands issued by ATA EH.\n\nWhen a port deferred_qc is non NULL, that is, the port has a command\nwaiting for the device queue to drain, the issuing of all incoming\ncommands (both NCQ and non-NCQ) is deferred using the regular busy\nmechanism. This simplifies the code and also avoids potential denial of\nservice problems if a user issues too many non-NCQ passthrough commands.\n\nFinally, whenever ata EH is scheduled, regardless of the reason, a\ndeferred qc is always requeued so that it can be retried once EH\ncompletes. This is done by calling the function\nata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids\nthe need for any special processing for the deferred qc in case of NCQ\nerror, link or device reset, or device timeout.\n\nReported-by: Xingui Yang <yangxingui@huawei.com>\nReported-by: Igor Pylypiv <ipylypiv@google.com>\nFixes: 42f22fe36d51 (\"scsi: pm8001: Expose hardware queues for pm80xx\")\nCc: stable@vger.kernel.org\nSigned-off-by: Damien Le Moal <dlemoal@kernel.org>\n---\n drivers/ata/libata-core.c |  1 +\n drivers/ata/libata-eh.c   |  6 +++\n drivers/ata/libata-scsi.c | 93 ++++++++++++++++++++++++++++++++++++++-\n drivers/ata/libata.h      |  2 +\n include/linux/libata.h    |  3 ++\n 5 files changed, 104 insertions(+), 1 deletion(-)","diff":"diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c\nindex 0b24bd169d61..121f35115d33 100644\n--- a/drivers/ata/libata-core.c\n+++ b/drivers/ata/libata-core.c\n@@ -5558,6 +5558,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)\n \tmutex_init(&ap->scsi_scan_mutex);\n \tINIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);\n \tINIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);\n+\tINIT_WORK(&ap->deferred_qc_work, ata_scsi_deferred_qc_work);\n \tINIT_LIST_HEAD(&ap->eh_done_q);\n \tinit_waitqueue_head(&ap->eh_wait_q);\n \tinit_completion(&ap->park_req_pending);\ndiff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c\nindex 2586e77ebf45..b90b17f680f8 100644\n--- a/drivers/ata/libata-eh.c\n+++ b/drivers/ata/libata-eh.c\n@@ -917,6 +917,12 @@ static void ata_eh_set_pending(struct ata_port *ap, bool fastdrain)\n \n \tap->pflags |= ATA_PFLAG_EH_PENDING;\n \n+\t/*\n+\t * If we have a deferred qc, requeue it so that it is retried once EH\n+\t * completes.\n+\t */\n+\tata_scsi_requeue_deferred_qc(ap);\n+\n \tif (!fastdrain)\n \t\treturn;\n \ndiff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c\nindex 42d103542525..c5ebd2bab356 100644\n--- a/drivers/ata/libata-scsi.c\n+++ b/drivers/ata/libata-scsi.c\n@@ -1658,8 +1658,75 @@ static void ata_qc_done(struct ata_queued_cmd *qc)\n \tdone(cmd);\n }\n \n+void ata_scsi_deferred_qc_work(struct work_struct *work)\n+{\n+\tstruct ata_port *ap =\n+\t\tcontainer_of(work, struct ata_port, deferred_qc_work);\n+\tstruct ata_queued_cmd *qc;\n+\tunsigned long flags;\n+\n+\tspin_lock_irqsave(ap->lock, flags);\n+\n+\t/*\n+\t * If we still have a deferred QC and we are not in EH, issue it. In\n+\t * such case, we should not need any more deferring the QC, so warn if\n+\t * qc_defer() says otherwise.\n+\t */\n+\tqc = ap->deferred_qc;\n+\tif (qc && !ata_port_eh_scheduled(ap)) {\n+\t\tWARN_ON_ONCE(ap->ops->qc_defer(qc));\n+\t\tap->deferred_qc = NULL;\n+\t\tata_qc_issue(qc);\n+\t}\n+\n+\tspin_unlock_irqrestore(ap->lock, flags);\n+}\n+\n+void ata_scsi_requeue_deferred_qc(struct ata_port *ap)\n+{\n+\tstruct ata_queued_cmd *qc = ap->deferred_qc;\n+\n+\tlockdep_assert_held(ap->lock);\n+\n+\t/*\n+\t * If we have a differed QC when a reset occurs or NCQ commands fail, do\n+\t * not try to be smart about what to do with this deferred command and\n+\t * simply retry it by completing it with DID_SOFT_ERROR.\n+\t */\n+\tif (qc) {\n+\t\tstruct scsi_cmnd *scmd = qc->scsicmd;\n+\n+\t\tap->deferred_qc = NULL;\n+\t\tata_qc_free(qc);\n+\t\tscmd->result = (DID_SOFT_ERROR << 16);\n+\t\tscsi_done(scmd);\n+\t}\n+}\n+\n+static void ata_scsi_schedule_deferred_qc(struct ata_port *ap)\n+{\n+\tstruct ata_queued_cmd *qc = ap->deferred_qc;\n+\n+\tlockdep_assert_held(ap->lock);\n+\n+\t/*\n+\t * If we have a differed QC, then qc_defer() is defined and we can use\n+\t * this callback to determine if this QC is good to go, unless EH has\n+\t * been scheduled.\n+\t */\n+\tif (qc) {\n+\t\tif (ata_port_eh_scheduled(ap)) {\n+\t\t\tata_scsi_requeue_deferred_qc(ap);\n+\t\t\treturn;\n+\t\t}\n+\t\tif (!ap->ops->qc_defer(qc))\n+\t\t\tqueue_work(system_highpri_wq, &ap->deferred_qc_work);\n+\t}\n+}\n+\n static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)\n {\n+\tstruct ata_port *ap = qc->ap;\n \tstruct scsi_cmnd *cmd = qc->scsicmd;\n \tu8 *cdb = cmd->cmnd;\n \tbool have_sense = qc->flags & ATA_QCFLAG_SENSE_VALID;\n@@ -1689,12 +1756,22 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)\n \t}\n \n \tata_qc_done(qc);\n+\n+\tata_scsi_schedule_deferred_qc(ap);\n }\n \n static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)\n {\n \tint ret;\n \n+\t/*\n+\t * If we already have a deferred QC, then rely on the SCSI layer to\n+\t * defer and requeue all incoming commands until the deferred QC is\n+\t * processed, once all on-going commands are completed.\n+\t */\n+\tif (ap->deferred_qc)\n+\t\treturn SCSI_MLQUEUE_DEVICE_BUSY;\n+\n \tif (!ap->ops->qc_defer)\n \t\treturn 0;\n \n@@ -1702,6 +1779,17 @@ static int ata_scsi_defer(struct ata_port *ap, struct ata_queued_cmd *qc)\n \tif (!ret)\n \t\treturn 0;\n \n+\t/*\n+\t * We must defer this QC: if this is not an NCQ command, keep this QC as\n+\t * a deferred one and wait for all on-going NCQ commands to complete\n+\t * before issuing it with the deferred QC work.\n+\t */\n+\tif (!ata_is_ncq(qc->tf.protocol)) {\n+\t\tap->deferred_qc = qc;\n+\t\treturn SCSI_MLQUEUE_DEVICE_BUSY;\n+\t}\n+\n+\t/* Use the SCSI layer to defer and requeue the command. */\n \tata_qc_free(qc);\n \n \tswitch (ret) {\n@@ -1777,8 +1865,11 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,\n \t\tgoto done;\n \n \trc = ata_scsi_defer(ap, qc);\n-\tif (rc)\n+\tif (rc) {\n+\t\tif (qc == ap->deferred_qc)\n+\t\t\treturn 0;\n \t\treturn rc;\n+\t}\n \n \tata_qc_issue(qc);\n \ndiff --git a/drivers/ata/libata.h b/drivers/ata/libata.h\nindex 0e7ecac73680..60a675df61dc 100644\n--- a/drivers/ata/libata.h\n+++ b/drivers/ata/libata.h\n@@ -165,6 +165,8 @@ void ata_scsi_sdev_config(struct scsi_device *sdev);\n int ata_scsi_dev_config(struct scsi_device *sdev, struct queue_limits *lim,\n \t\tstruct ata_device *dev);\n int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);\n+void ata_scsi_deferred_qc_work(struct work_struct *work);\n+void ata_scsi_requeue_deferred_qc(struct ata_port *ap);\n \n /* libata-eh.c */\n extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);\ndiff --git a/include/linux/libata.h b/include/linux/libata.h\nindex 39534fafa36a..c5b27d97dfaf 100644\n--- a/include/linux/libata.h\n+++ b/include/linux/libata.h\n@@ -903,6 +903,9 @@ struct ata_port {\n \tu64\t\t\tqc_active;\n \tint\t\t\tnr_active_links; /* #links with active qcs */\n \n+\tstruct work_struct\tdeferred_qc_work;\n+\tstruct ata_queued_cmd\t*deferred_qc;\n+\n \tstruct ata_link\t\tlink;\t\t/* host default link */\n \tstruct ata_link\t\t*slave_link;\t/* see ata_slave_link_init() */\n \n","prefixes":["2/2"]}