From patchwork Tue Nov 29 00:41:45 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Uma Krishnan X-Patchwork-Id: 700245 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3tSPyX0XVDz9vDw for ; Tue, 29 Nov 2016 11:47:16 +1100 (AEDT) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 3tSPyW6vH5zDwGF for ; Tue, 29 Nov 2016 11:47:15 +1100 (AEDT) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3tSPrT1FcJzDvtr for ; Tue, 29 Nov 2016 11:42:01 +1100 (AEDT) Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id uAT0dAY6114524 for ; Mon, 28 Nov 2016 19:41:59 -0500 Received: from e38.co.us.ibm.com (e38.co.us.ibm.com [32.97.110.159]) by mx0a-001b2d01.pphosted.com with ESMTP id 270vsg7kma-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Mon, 28 Nov 2016 19:41:59 -0500 Received: from localhost by e38.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 28 Nov 2016 17:41:58 -0700 Received: from d03dlp01.boulder.ibm.com (9.17.202.177) by e38.co.us.ibm.com (192.168.1.138) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 28 Nov 2016 17:41:57 -0700 Received: from b03cxnp07028.gho.boulder.ibm.com (b03cxnp07028.gho.boulder.ibm.com [9.17.130.15]) by d03dlp01.boulder.ibm.com (Postfix) with ESMTP id 291F41FF0025; Mon, 28 Nov 2016 17:41:36 -0700 (MST) Received: from b03ledav001.gho.boulder.ibm.com (b03ledav001.gho.boulder.ibm.com [9.17.130.232]) by b03cxnp07028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id uAT0ft9f12452200; Mon, 28 Nov 2016 17:41:55 -0700 Received: from b03ledav001.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 99A2E6E035; Mon, 28 Nov 2016 17:41:55 -0700 (MST) Received: from p8tul1-build.aus.stglabs.ibm.com (unknown [9.3.141.206]) by b03ledav001.gho.boulder.ibm.com (Postfix) with ESMTP id 19A646E038; Mon, 28 Nov 2016 17:41:55 -0700 (MST) From: Uma Krishnan To: linux-scsi@vger.kernel.org, James Bottomley , "Martin K. Petersen" , "Matthew R. Ochs" , "Manoj N. Kumar" Subject: [PATCH v2 04/14] cxlflash: Avoid command room violation Date: Mon, 28 Nov 2016 18:41:45 -0600 X-Mailer: git-send-email 2.1.0 In-Reply-To: <1480379984-60114-1-git-send-email-ukrishn@linux.vnet.ibm.com> References: <1480379984-60114-1-git-send-email-ukrishn@linux.vnet.ibm.com> X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16112900-0028-0000-0000-000006241BF4 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006159; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000193; SDB=6.00786629; UDB=6.00380415; IPR=6.00564317; BA=6.00004921; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00013474; XFM=3.00000011; UTC=2016-11-29 00:41:58 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16112900-0029-0000-0000-00003136FCA2 Message-Id: <1480380105-60310-1-git-send-email-ukrishn@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-11-28_19:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1611290008 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Christophe Lombard , Frederic Barrat , Ian Munsie , Andrew Donnellan , Brian King , linuxppc-dev@lists.ozlabs.org Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" During test, a command room violation interrupt is occasionally seen for the master context when the CXL flash devices are stressed. After studying the code, there could be gaps in the way command room value is being cached in cxlflash. When the cached command room is zero the thread attempting to send becomes burdened with updating the cached value with the actual value from the AFU. Today, this is handled with an atomic set operation of the raw value read. Following the atomic update, the thread proceeds to send. This behavior is incorrect on two counts: - The update fails to take into account the current thread and its consumption of one of the hardware commands. - The update does not take into account other threads also atomically updating. Per design, a worker thread updates the cached value when a send thread times out. By not protecting the update with a lock, the cached value can be incorrectly clobbered. To correct these issues, the update of the cached command room has been simplified and also protected using a spin lock which is held until the MMIO is complete. This ensures the command room is properly consumed by the same thread. Update of cached value also takes into account the current thread consuming a hardware command. Signed-off-by: Uma Krishnan Acked-by: Matthew R. Ochs --- drivers/scsi/cxlflash/common.h | 4 +-- drivers/scsi/cxlflash/main.c | 66 ++++++++++++------------------------------ 2 files changed, 20 insertions(+), 50 deletions(-) diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h index 6e68155..ef2943d 100644 --- a/drivers/scsi/cxlflash/common.h +++ b/drivers/scsi/cxlflash/common.h @@ -173,8 +173,8 @@ struct afu { u64 *hrrq_end; u64 *hrrq_curr; bool toggle; - bool read_room; - atomic64_t room; + s64 room; + spinlock_t rrin_slock; /* Lock to rrin queuing and cmd_room updates */ u64 hb; u32 cmd_couts; /* Number of command checkouts */ u32 internal_lun; /* User-desired LUN mode for this AFU */ diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c index 6d33d8c..1292a74 100644 --- a/drivers/scsi/cxlflash/main.c +++ b/drivers/scsi/cxlflash/main.c @@ -306,59 +306,34 @@ static int send_cmd(struct afu *afu, struct afu_cmd *cmd) { struct cxlflash_cfg *cfg = afu->parent; struct device *dev = &cfg->dev->dev; - int nretry = 0; int rc = 0; - u64 room; - long newval; + s64 room; + ulong lock_flags; /* - * This routine is used by critical users such an AFU sync and to - * send a task management function (TMF). Thus we want to retry a - * bit before returning an error. To avoid the performance penalty - * of MMIO, we spread the update of 'room' over multiple commands. + * To avoid the performance penalty of MMIO, spread the update of + * 'room' over multiple commands. */ -retry: - newval = atomic64_dec_if_positive(&afu->room); - if (!newval) { - do { - room = readq_be(&afu->host_map->cmd_room); - atomic64_set(&afu->room, room); - if (room) - goto write_ioarrin; - udelay(1 << nretry); - } while (nretry++ < MC_ROOM_RETRY_CNT); - - dev_err(dev, "%s: no cmd_room to send 0x%X\n", - __func__, cmd->rcb.cdb[0]); - - goto no_room; - } else if (unlikely(newval < 0)) { - /* This should be rare. i.e. Only if two threads race and - * decrement before the MMIO read is done. In this case - * just benefit from the other thread having updated - * afu->room. - */ - if (nretry++ < MC_ROOM_RETRY_CNT) { - udelay(1 << nretry); - goto retry; + spin_lock_irqsave(&afu->rrin_slock, lock_flags); + if (--afu->room < 0) { + room = readq_be(&afu->host_map->cmd_room); + if (room <= 0) { + dev_dbg_ratelimited(dev, "%s: no cmd_room to send " + "0x%02X, room=0x%016llX\n", + __func__, cmd->rcb.cdb[0], room); + afu->room = 0; + rc = SCSI_MLQUEUE_HOST_BUSY; + goto out; } - - goto no_room; + afu->room = room - 1; } -write_ioarrin: writeq_be((u64)&cmd->rcb, &afu->host_map->ioarrin); out: + spin_unlock_irqrestore(&afu->rrin_slock, lock_flags); pr_devel("%s: cmd=%p len=%d ea=%p rc=%d\n", __func__, cmd, cmd->rcb.data_len, (void *)cmd->rcb.data_ea, rc); return rc; - -no_room: - afu->read_room = true; - kref_get(&cfg->afu->mapcount); - schedule_work(&cfg->work_q); - rc = SCSI_MLQUEUE_HOST_BUSY; - goto out; } /** @@ -1827,7 +1802,8 @@ static int init_afu(struct cxlflash_cfg *cfg) } afu_err_intr_init(cfg->afu); - atomic64_set(&afu->room, readq_be(&afu->host_map->cmd_room)); + spin_lock_init(&afu->rrin_slock); + afu->room = readq_be(&afu->host_map->cmd_room); /* Restore the LUN mappings */ cxlflash_restore_luntable(cfg); @@ -2399,7 +2375,6 @@ MODULE_DEVICE_TABLE(pci, cxlflash_pci_table); * Handles the following events: * - Link reset which cannot be performed on interrupt context due to * blocking up to a few seconds - * - Read AFU command room * - Rescan the host */ static void cxlflash_worker_thread(struct work_struct *work) @@ -2436,11 +2411,6 @@ static void cxlflash_worker_thread(struct work_struct *work) cfg->lr_state = LINK_RESET_COMPLETE; } - if (afu->read_room) { - atomic64_set(&afu->room, readq_be(&afu->host_map->cmd_room)); - afu->read_room = false; - } - spin_unlock_irqrestore(cfg->host->host_lock, lock_flags); if (atomic_dec_if_positive(&cfg->scan_host_needed) >= 0)