From patchwork Fri Oct 30 18:50:54 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 1391249
Subject: [PATCH v4 01/17] irqchip: Add IMS (Interrupt Message Store) driver
From: Dave Jiang
To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com,
 tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com,
 ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com,
 kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com,
 jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com,
 eric.auger@redhat.com, parav@mellanox.com, rafael@kernel.org,
 netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com,
 pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com
Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-pci@vger.kernel.org, kvm@vger.kernel.org
Date: Fri, 30 Oct 2020 11:50:54 -0700
Message-ID: <160408385476.912050.16205225207591080075.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com>

From: Thomas Gleixner

Generic IMS (Interrupt Message Store) irq chips and irq domain
implementations for IMS based devices which store the interrupt messages
in an array in device memory. Allocation and freeing of interrupts
happens via the generic msi_domain_alloc/free_irqs() interface. No
special-purpose IMS magic is required as long as the interrupt domain is
stored in the underlying device struct.
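A minimal driver-side sketch of the intended usage (illustrative only;
mmio_base, ims_offset, max_ims_slots and nvec are placeholders for
device-specific values):

    struct ims_array_info ims_info = {
        /* IMS slot array in device MMIO and its size */
        .slots     = mmio_base + ims_offset,
        .max_slots = max_ims_slots,
    };

    /* Create the IMS irq domain and attach it to the device */
    dev->msi_domain = pci_ims_array_create_msi_irq_domain(pdev, &ims_info);

    /* Allocate interrupts via the generic device MSI interface */
    rc = msi_domain_alloc_irqs(dev->msi_domain, dev, nvec);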
Provide storage and a setter for an Address Space Identifier. The identifier is stored in the top level irq_data and it only can be modified when the interrupt is not active. Add the necessary storage and helper functions and validate that interrupts which require an ASID have one assigned. [Megha : Fixed compile time errors Added necessary dependencies to IMS_MSI_ARRAY config Fixed polarity of IMS_VECTOR_CTRL Added reads after writes to flush writes to device Tested the IMS infrastructure with the IDXD driver] Signed-off-by: Thomas Gleixner Signed-off-by: Megha Dey Signed-off-by: Dave Jiang --- drivers/irqchip/Kconfig | 14 ++ drivers/irqchip/Makefile | 1 drivers/irqchip/irq-ims-msi.c | 204 +++++++++++++++++++++++++++++++++++ include/linux/interrupt.h | 2 include/linux/irq.h | 4 + include/linux/irqchip/irq-ims-msi.h | 68 ++++++++++++ kernel/irq/manage.c | 32 +++++ 7 files changed, 325 insertions(+) create mode 100644 drivers/irqchip/irq-ims-msi.c create mode 100644 include/linux/irqchip/irq-ims-msi.h diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig index c6098eee0c7c..862ea81a69a0 100644 --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -597,4 +597,18 @@ config MST_IRQ help Support MStar Interrupt Controller. +config IMS_MSI + depends on PCI + select DEVICE_MSI + bool + +config IMS_MSI_ARRAY + bool "IMS Interrupt Message Store MSI controller for device memory storage arrays" + depends on PCI + select IMS_MSI + select GENERIC_MSI_IRQ_DOMAIN + help + Support for IMS Interrupt Message Store MSI controller + with IMS slot storage in a slot array in device memory + endmenu diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile index 94c2885882ee..a7d54605060a 100644 --- a/drivers/irqchip/Makefile +++ b/drivers/irqchip/Makefile @@ -114,3 +114,4 @@ obj-$(CONFIG_LOONGSON_PCH_PIC) += irq-loongson-pch-pic.o obj-$(CONFIG_LOONGSON_PCH_MSI) += irq-loongson-pch-msi.o obj-$(CONFIG_MST_IRQ) += irq-mst-intc.o obj-$(CONFIG_SL28CPLD_INTC) += irq-sl28cpld.o +obj-$(CONFIG_IMS_MSI) += irq-ims-msi.o diff --git a/drivers/irqchip/irq-ims-msi.c b/drivers/irqchip/irq-ims-msi.c new file mode 100644 index 000000000000..d54a54f5fdcc --- /dev/null +++ b/drivers/irqchip/irq-ims-msi.c @@ -0,0 +1,204 @@ +// SPDX-License-Identifier: GPL-2.0 +// (C) Copyright 2020 Thomas Gleixner +/* + * Shared interrupt chips and irq domains for IMS devices + */ +#include +#include +#include +#include +#include + +#include + +#ifdef CONFIG_IMS_MSI_ARRAY + +struct ims_array_data { + struct ims_array_info info; + unsigned long map[0]; +}; + +static inline void iowrite32_and_flush(u32 value, void __iomem *addr) +{ + iowrite32(value, addr); + ioread32(addr); +} + +static void ims_array_mask_irq(struct irq_data *data) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + struct ims_slot __iomem *slot = desc->device_msi.priv_iomem; + u32 __iomem *ctrl = &slot->ctrl; + + iowrite32_and_flush(ioread32(ctrl) | IMS_CTRL_VECTOR_MASKBIT, ctrl); +} + +static void ims_array_unmask_irq(struct irq_data *data) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + struct ims_slot __iomem *slot = desc->device_msi.priv_iomem; + u32 __iomem *ctrl = &slot->ctrl; + + iowrite32_and_flush(ioread32(ctrl) & ~IMS_CTRL_VECTOR_MASKBIT, ctrl); +} + +static void ims_array_write_msi_msg(struct irq_data *data, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + struct ims_slot __iomem *slot = desc->device_msi.priv_iomem; + + iowrite32(msg->address_lo, &slot->address_lo); + 
iowrite32(msg->address_hi, &slot->address_hi); + iowrite32_and_flush(msg->data, &slot->data); +} + +static int ims_array_set_auxdata(struct irq_data *data, unsigned int which, + u64 auxval) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + struct ims_slot __iomem *slot = desc->device_msi.priv_iomem; + u32 val, __iomem *ctrl = &slot->ctrl; + + if (which != IMS_AUXDATA_CONTROL_WORD) + return -EINVAL; + if (auxval & ~(u64)IMS_CONTROL_WORD_AUXMASK) + return -EINVAL; + + val = ioread32(ctrl) & IMS_CONTROL_WORD_IRQMASK; + iowrite32_and_flush(val | (u32) auxval, ctrl); + return 0; +} + +static const struct irq_chip ims_array_msi_controller = { + .name = "IMS", + .irq_mask = ims_array_mask_irq, + .irq_unmask = ims_array_unmask_irq, + .irq_write_msi_msg = ims_array_write_msi_msg, + .irq_set_auxdata = ims_array_set_auxdata, + .irq_retrigger = irq_chip_retrigger_hierarchy, + .flags = IRQCHIP_SKIP_SET_WAKE, +}; + +static void ims_array_reset_slot(struct ims_slot __iomem *slot) +{ + iowrite32(0, &slot->address_lo); + iowrite32(0, &slot->address_hi); + iowrite32(0, &slot->data); + iowrite32_and_flush(IMS_CTRL_VECTOR_MASKBIT, &slot->ctrl); +} + +static void ims_array_free_msi_store(struct irq_domain *domain, + struct device *dev) +{ + struct msi_domain_info *info = domain->host_data; + struct ims_array_data *ims = info->data; + struct msi_desc *entry; + + for_each_msi_entry(entry, dev) { + if (entry->device_msi.priv_iomem) { + clear_bit(entry->device_msi.hwirq, ims->map); + ims_array_reset_slot(entry->device_msi.priv_iomem); + entry->device_msi.priv_iomem = NULL; + entry->device_msi.hwirq = 0; + } + } +} + +static int ims_array_alloc_msi_store(struct irq_domain *domain, + struct device *dev, int nvec) +{ + struct msi_domain_info *info = domain->host_data; + struct ims_array_data *ims = info->data; + struct msi_desc *entry; + + for_each_msi_entry(entry, dev) { + unsigned int idx; + + idx = find_first_zero_bit(ims->map, ims->info.max_slots); + if (idx >= ims->info.max_slots) + goto fail; + set_bit(idx, ims->map); + entry->device_msi.priv_iomem = &ims->info.slots[idx]; + ims_array_reset_slot(entry->device_msi.priv_iomem); + entry->device_msi.hwirq = idx; + } + return 0; + +fail: + ims_array_free_msi_store(domain, dev); + return -ENOSPC; +} + +struct ims_array_domain_template { + struct msi_domain_ops ops; + struct msi_domain_info info; +}; + +static const struct ims_array_domain_template ims_array_domain_template = { + .ops = { + .msi_alloc_store = ims_array_alloc_msi_store, + .msi_free_store = ims_array_free_msi_store, + }, + .info = { + .flags = MSI_FLAG_USE_DEF_DOM_OPS | + MSI_FLAG_USE_DEF_CHIP_OPS, + .handler = handle_edge_irq, + .handler_name = "edge", + }, +}; + +struct irq_domain * +pci_ims_array_create_msi_irq_domain(struct pci_dev *pdev, + struct ims_array_info *ims_info) +{ + struct ims_array_domain_template *info; + struct ims_array_data *data; + struct irq_domain *domain; + struct irq_chip *chip; + unsigned int size; + + /* Allocate new domain storage */ + info = kmemdup(&ims_array_domain_template, + sizeof(ims_array_domain_template), GFP_KERNEL); + if (!info) + return NULL; + /* Link the ops */ + info->info.ops = &info->ops; + + /* Allocate ims_info along with the bitmap */ + size = sizeof(*data); + size += BITS_TO_LONGS(ims_info->max_slots) * sizeof(unsigned long); + data = kzalloc(size, GFP_KERNEL); + if (!data) + goto err_info; + + data->info = *ims_info; + info->info.data = data; + + /* + * Allocate an interrupt chip because the core needs to be able to + * update it with default 
callbacks.
+	 */
+	chip = kmemdup(&ims_array_msi_controller,
+		       sizeof(ims_array_msi_controller), GFP_KERNEL);
+	if (!chip)
+		goto err_data;
+	info->info.chip = chip;
+
+	domain = pci_subdevice_msi_create_irq_domain(pdev, &info->info);
+	if (!domain)
+		goto err_chip;
+
+	return domain;
+
+err_chip:
+	kfree(chip);
+err_data:
+	kfree(data);
+err_info:
+	kfree(info);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_ims_array_create_msi_irq_domain);
+
+#endif /* CONFIG_IMS_MSI_ARRAY */
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index ee8299eb1f52..43a8d1e9647e 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -487,6 +487,8 @@ extern int irq_get_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
 extern int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
				 bool state);
 
+int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val);
+
 #ifdef CONFIG_IRQ_FORCED_THREADING
 # ifdef CONFIG_PREEMPT_RT
 # define force_irqthreads	(true)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index c54365309e97..fd162aea0c3f 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -491,6 +491,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
 *				irq_request_resources
 * @irq_compose_msi_msg:	optional to compose message content for MSI
 * @irq_write_msi_msg:	optional to write message content for MSI
+ * @irq_set_auxdata:	Optional function to update auxiliary data e.g. in
+ *			shared registers
 * @irq_get_irqchip_state:	return the internal state of an interrupt
 * @irq_set_irqchip_state:	set the internal state of a interrupt
 * @irq_set_vcpu_affinity:	optional to target a vCPU in a virtual machine
@@ -538,6 +540,8 @@ struct irq_chip {
	void		(*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
	void		(*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);
 
+	int		(*irq_set_auxdata)(struct irq_data *data, unsigned int which, u64 auxval);
+
	int		(*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state);
	int		(*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);
diff --git a/include/linux/irqchip/irq-ims-msi.h b/include/linux/irqchip/irq-ims-msi.h
new file mode 100644
index 000000000000..a9e43e1f7890
--- /dev/null
+++ b/include/linux/irqchip/irq-ims-msi.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) Copyright 2020 Thomas Gleixner */
+
+#ifndef _LINUX_IRQCHIP_IRQ_IMS_MSI_H
+#define _LINUX_IRQCHIP_IRQ_IMS_MSI_H
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+/**
+ * struct ims_slot - The hardware layout of an IMS based MSI message
+ * @address_lo:	Lower 32bit address
+ * @address_hi:	Upper 32bit address
+ * @data:	Message data
+ * @ctrl:	Control word
+ *
+ * This structure is used by both the device memory array and the queue
+ * memory variants of IMS.
+ */
+struct ims_slot {
+	u32	address_lo;
+	u32	address_hi;
+	u32	data;
+	u32	ctrl;
+} __packed;
+
+/*
+ * The IMS control word utilizes bits 0-2 for interrupt control. The remaining
+ * bits can contain auxiliary data.
+ */
+#define IMS_CONTROL_WORD_IRQMASK	GENMASK(2, 0)
+#define IMS_CONTROL_WORD_AUXMASK	GENMASK(31, 3)
+
+/* Bit to mask the interrupt in ims_slot::ctrl */
+#define IMS_CTRL_VECTOR_MASKBIT		BIT(0)
+
+/* Auxiliary control word data related defines */
+enum {
+	IMS_AUXDATA_CONTROL_WORD,
+};
+
+#define IMS_CTRL_PASID_ENABLE		BIT(3)
+#define IMS_CTRL_PASID_SHIFT		12
+
+static inline u32 ims_ctrl_pasid_aux(unsigned int pasid, bool enable)
+{
+	u32 auxval = pasid << IMS_CTRL_PASID_SHIFT;
+
+	return enable ? auxval | IMS_CTRL_PASID_ENABLE : auxval;
+}
+
+/**
+ * struct ims_array_info - Information to create an IMS array domain
+ * @slots:	Pointer to the start of the array
+ * @max_slots:	Maximum number of slots in the array
+ */
+struct ims_array_info {
+	struct ims_slot	__iomem *slots;
+	unsigned int	max_slots;
+};
+
+struct pci_dev;
+struct irq_domain;
+
+struct irq_domain *pci_ims_array_create_msi_irq_domain(struct pci_dev *pdev,
+						       struct ims_array_info *ims_info);
+
+#endif
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index dc65d90108db..d7bf2ae67170 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2752,3 +2752,35 @@ int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
	return err;
 }
 EXPORT_SYMBOL_GPL(irq_set_irqchip_state);
+
+/**
+ * irq_set_auxdata - Set auxiliary data
+ * @irq:	Interrupt to update
+ * @which:	Selector which data to update
+ * @val:	Auxiliary data value
+ *
+ * Function to update auxiliary data for an interrupt, e.g. to update data
+ * which is stored in a shared register or data storage (e.g. IMS).
+ */
+int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val)
+{
+	struct irq_desc *desc;
+	struct irq_data *data;
+	unsigned long flags;
+	int res = -ENODEV;
+
+	desc = irq_get_desc_buslock(irq, &flags, 0);
+	if (!desc)
+		return -EINVAL;
+
+	for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
+		if (data->chip->irq_set_auxdata) {
+			res = data->chip->irq_set_auxdata(data, which, val);
+			break;
+		}
+	}
+
+	irq_put_desc_busunlock(desc, flags);
+	return res;
+}
+EXPORT_SYMBOL_GPL(irq_set_auxdata);

From patchwork Fri Oct 30 18:51:01 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 1391246
Subject: [PATCH v4 02/17] iommu/vt-d: Add DEV-MSI support
From: Dave Jiang
Date: Fri, 30 Oct 2020 11:51:01 -0700
Message-ID: <160408386122.912050.7027904087316715077.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com>

From: Megha Dey

Add the required support in the interrupt remapping driver for devices
which generate dev-msi interrupts and use the Intel remapping domain as
the parent domain.
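For reference, a dev-msi allocation arriving at this driver is expected
to carry the new allocation type (a sketch assuming the dev-msi core
fills in the x86 irq_alloc_info this way; not taken from this patch):

    struct irq_alloc_info info;

    /* x86 helper: zero the info and set an optional affinity mask */
    init_irq_alloc_info(&info, NULL);
    /* New type accepted by intel_irq_remapping_alloc()/prepare_irte() */
    info.type = X86_IRQ_ALLOC_TYPE_DEV_MSI;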
Reviewed-by: Ashok Raj
Suggested-by: Thomas Gleixner
Signed-off-by: Megha Dey
Signed-off-by: Dave Jiang
---
 drivers/iommu/intel/irq_remapping.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 0cfce1d3b7bb..0e8d106d34c0 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1260,6 +1260,16 @@ static struct irq_chip intel_ir_chip = {
	.irq_set_vcpu_affinity = intel_ir_set_vcpu_affinity,
 };
 
+static void irte_prepare_msg(struct msi_msg *msg, int index, int subhandle)
+{
+	msg->address_hi = MSI_ADDR_BASE_HI;
+	msg->data = subhandle;
+	msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
+			  MSI_ADDR_IR_SHV |
+			  MSI_ADDR_IR_INDEX1(index) |
+			  MSI_ADDR_IR_INDEX2(index);
+}
+
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
					     struct irq_cfg *irq_cfg,
					     struct irq_alloc_info *info,
@@ -1301,19 +1311,18 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
		break;
 
	case X86_IRQ_ALLOC_TYPE_HPET:
+		set_hpet_sid(irte, info->devid);
+		irte_prepare_msg(msg, index, sub_handle);
+		break;
+
	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
-			set_hpet_sid(irte, info->devid);
-		else
-			set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
-
-		msg->address_hi = MSI_ADDR_BASE_HI;
-		msg->data = sub_handle;
-		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
-				  MSI_ADDR_IR_SHV |
-				  MSI_ADDR_IR_INDEX1(index) |
-				  MSI_ADDR_IR_INDEX2(index);
+		set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
+		irte_prepare_msg(msg, index, sub_handle);
+		break;
+
+	case X86_IRQ_ALLOC_TYPE_DEV_MSI:
+		irte_prepare_msg(msg, index, sub_handle);
		break;
 
	default:
@@ -1358,7 +1367,8 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
	if (!info || !iommu)
		return -EINVAL;
	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
-	    info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
+	    info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX &&
+	    info->type != X86_IRQ_ALLOC_TYPE_DEV_MSI)
		return -EINVAL;
 
	/*

From patchwork Fri Oct 30 18:51:09 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 1391247
Subject: [PATCH v4 03/17] dmaengine: idxd: add theory of operation documentation for idxd mdev
From: Dave Jiang
Date: Fri, 30 Oct 2020 11:51:09 -0700
Message-ID: <160408386942.912050.10453115629849166836.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com>

Add idxd VFIO mediated device theory of operation documentation.
Provide a description of the mdev design, usage, and why VFIO mdev was
chosen.

Reviewed-by: Ashok Raj
Reviewed-by: Kevin Tian
Signed-off-by: Dave Jiang
---
 Documentation/driver-api/vfio/mdev-idxd.rst | 404 +++++++++++++++++++++++++++
 MAINTAINERS                                 |   1
 2 files changed, 405 insertions(+)
 create mode 100644 Documentation/driver-api/vfio/mdev-idxd.rst

diff --git a/Documentation/driver-api/vfio/mdev-idxd.rst b/Documentation/driver-api/vfio/mdev-idxd.rst
new file mode 100644
index 000000000000..c75b7d88ef6b
--- /dev/null
+++ b/Documentation/driver-api/vfio/mdev-idxd.rst
@@ -0,0 +1,404 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+IDXD Overview
+=============
+IDXD (Intel Data Accelerator Driver) is the driver for the Intel Data
+Streaming Accelerator (DSA). Intel DSA is a high performance data copy
+and transformation accelerator. In addition to data move operations,
+the device also supports data fill, CRC generation, Data Integrity Field
+(DIF), and memory compare and delta generation. Intel DSA supports
+a variety of PCI-SIG defined capabilities such as Address Translation
+Services (ATS), Process Address Space ID (PASID), Page Request Interface
+(PRI), Message Signaled Interrupts Extended (MSI-X), and Advanced Error
+Reporting (AER). Some of those capabilities enable the device to support
+Shared Virtual Memory (SVM), also known as Shared Virtual Addressing
+(SVA). Intel DSA also supports Intel Scalable I/O Virtualization (SIOV)
+to improve scalability of device assignment.
+
+
+The Intel DSA device contains the following basic components:
+* Work queue (WQ)
+
+  A WQ is on-device storage used to queue descriptors to the device.
+  Requests are added to a WQ by using new CPU instructions
+  (MOVDIR64B and ENQCMD(S)) to write to the memory-mapped “portal”
+  associated with each WQ.
+
+* Engine
+
+  Operation unit that pulls descriptors from WQs and processes them.
+
+* Group
+
+  Abstract container to associate one or more engines with one or more WQs.
+
+
+Two types of WQs are supported:
+* Dedicated WQ (DWQ)
+
+  A single client owns this WQ exclusively and submits work to it. The
+  MOVDIR64B instruction is used to submit descriptors to this type of
+  WQ. The instruction is a posted write; therefore, the submitter must
+  ensure it does not exceed the WQ length on submission. The use of
+  PASID is optional with a DWQ. Multiple clients can submit to a DWQ,
+  but synchronization is required because, when the WQ is full,
+  submissions are silently dropped.
+
+* Shared WQ (SWQ)
+
+  Multiple clients can submit work to this WQ. The submitter must use
+  ENQCMDS (from supervisor mode) or ENQCMD (from user mode). These
+  instructions indicate via the EFLAGS.ZF bit whether a submission
+  succeeded. The use of PASID is mandatory to identify the address space
+  of each client.
+
+
+For more information about the new instructions, see [1] and [2].
+
+The IDXD driver is broken down into the following usages:
+* In-kernel interface through the dmaengine subsystem API.
+* Userspace DMA support through a character device. mmap(2) is utilized
+  to map directly to the MMIO addresses (or portals) for descriptor
+  submission.
+* VFIO mediated device (mdev) supporting device passthrough usages.
+
+
+=================================
+Assignable Device Interface (ADI)
+=================================
+The term ADI is used to represent the minimal unit of assignment for an
+Intel Scalable IOV device. Each ADI instance refers to the set of device
+backend resources that are allocated, configured and organized as an
+isolated unit.
+
+Intel DSA defines each WQ as an ADI. The MMIO registers of each work queue
+are partitioned into two categories:
+* MMIO registers accessed for data-path operations.
+* MMIO registers accessed for control-path operations.
+
+Data-path MMIO registers of each WQ are contained within
+one or more system page size aligned regions and can be mapped in the
+CPU page table for direct access from the guest. Control-path MMIO
+registers of all WQs are located together but segregated from data-path
+MMIO regions. Therefore, guest updates to control-path registers must
+be intercepted and then go through the host driver to be reflected in
+the device.
+
+Data-path MMIO registers of a DSA WQ are portals for submitting
+descriptors to the device. There are four portals per WQ, each being
+64 bytes in size and located on a separate 4KB page in BAR2. Each portal
+has different implications regarding interrupt message type (MSI vs. IMS)
+and occupancy control (limited vs. unlimited). It is not necessary to
+map all portals to the guest.
+
+Control-path MMIO registers of a DSA WQ include global configurations
+(shared by all WQs) and WQ-specific configurations. The owner
+(e.g. the guest) of the WQ is expected to only change WQ-specific
+configurations. The Intel DSA spec introduces a “Configuration Support”
+capability which, if cleared, indicates that some fields of the WQ
+configuration registers are read-only and the WQ configuration is
+pre-configured by the host.
+
+
+Interrupt Message Store (IMS)
+=============================
+The ADI utilizes Interrupt Message Store (IMS), a device-specific MSI
+implementation, instead of MSI-X, for guest interrupts.
+This preserves MSI-X for host usage and also allows a significantly
+larger number of interrupt vectors when supporting a large number of
+guests.
+
+The Intel DSA device implements IMS as on-device memory-mapped unified
+storage. Each interrupt message is stored as a DWORD-size data payload
+and a 64-bit address (same as MSI-X). Access to the IMS is through the
+host idxd driver.
+
+The idxd driver makes use of the generic IMS irq chip and domain which
+stores the interrupt messages in an array in device memory. Allocation and
+freeing of interrupts happens via the generic msi_domain_alloc/free_irqs()
+interface. The driver only needs to ensure the interrupt domain is stored
+in the underlying device struct.
+
+
+ADI Isolation
+=============
+Operations or functioning of one ADI must not affect the functioning
+of another ADI or the physical device. Upstream memory requests from
+different ADIs are distinguished using a Process Address Space Identifier
+(PASID). With the support of PASID-granular address translation in Intel
+VT-d, the address space targeted by a request from an ADI can be a Host
+Virtual Address (HVA), Host I/O Virtual Address (HIOVA), Guest Physical
+Address (GPA), Guest Virtual Address (GVA), Guest I/O Virtual Address
+(GIOVA), etc. The PASID identity for an ADI is expected to be accessed
+or modified by privileged software through the host driver.
+
+=========================
+Virtual DSA (vDSA) Device
+=========================
+The DSA WQ itself is not a PCI device and thus must be composed into a
+virtual DSA device presented to the guest.
+
+The composition logic needs to handle four main requirements:
+* Emulate PCI config space.
+* Map data-path portals for direct access from the guest.
+* Emulate control-path MMIO registers and selectively forward WQ
+  configuration requests through the host driver to the device.
+* Forward and emulate WQ interrupts to the guest.
+
+The composition logic tells the guest which aspects of the WQ are
+configurable through a combination of capability fields, e.g.:
+* Configuration Support (if cleared, most aspects are not modifiable).
+* WQ Mode Support (if cleared, cannot change between dedicated and
+  shared mode).
+* Dedicated Mode Support.
+* Shared Mode Support.
+* ...
+
+The virtual capability fields are set according to the vDSA
+type. Following are examples of vDSA types and related WQ configurability:
+* Type ‘1DWQ_v1’
+  * One DSA gen 1 WQ dedicated to this guest
+  * Guest cannot share the WQ between its clients (no guest SVA)
+  * Guest cannot change any WQ configuration
+* Type ‘1SWQ_v1’
+  * One DSA gen 1 WQ shared between multiple VMs
+  * Guest can further share the WQ between its clients (guest SVA is required)
+  * Guest cannot change any WQ configuration
+* Type ‘1WQfull_v1’
+  * One DSA gen 1 WQ dedicated to this guest
+  * Guest is allowed to do limited WQ configuration (through the WQCFG
+    register), including WQ mode (dedicated/shared), privilege,
+    threshold, PASID enable, PASID value, etc.
+
+In addition, the composition logic needs to serve administrative commands
+(through the virtual CMD register) via the host driver, including:
+* Drain/abort all descriptors submitted by this guest.
+* Drain/abort descriptors associated with a PASID.
+* Enable/disable/reset the WQ (when it’s not shared by multiple VMs).
+* Request interrupt handle.
+
+With this design, vDSA emulation is **greatly simplified**. Most
+registers are emulated in a simple READ-ONLY flavor, and handling of
+limited configurability is required only for a few registers.
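+
+As an illustration only (a sketch with hypothetical names such as
+"vidxd" and VIDXD_BAR0_SIZE, not the actual driver code), emulating a
+control-path register read can be as simple as a bounds-checked copy
+from a shadow register file maintained by the host driver::
+
+    static int vidxd_mmio_read(struct vdcm_idxd *vidxd, u32 offset,
+                               void *buf, unsigned int size)
+    {
+        /* Reads have no side effects; serve them from shadow state. */
+        if (offset + size > VIDXD_BAR0_SIZE)
+            return -EINVAL;
+
+        memcpy(buf, vidxd->bar0_shadow + offset, size);
+        return size;
+    }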
+
+===========================
+VFIO mdev vs. userspace DMA
+===========================
+There are two avenues to support vDSA composition:
+1. VFIO mediated device (mdev)
+2. Userspace DMA through the char device
+
+VFIO mdev provides a generic subdevice passthrough framework. Unified
+uAPIs are used for both device and subdevice passthrough, thus any
+userspace VMM which already supports VFIO device passthrough would
+naturally support mdev/subdevice passthrough. The implication of VFIO
+mdev is putting the emulation of the device interface in the kernel
+(as part of the host driver), which must be carefully scrutinized.
+Fortunately, vDSA composition includes only a small portion of emulation
+code, due to the fact that most registers are simply READ-ONLY to the
+guest. The majority of the logic for handling limited configurability
+and administrative commands must sit in the kernel anyway, regardless of
+which kernel uAPI is pursued. In this regard, VFIO mdev is a nice fit
+for vDSA composition.
+
+The IDXD driver provides a char device interface for applications to
+map the WQ portal and directly submit descriptors to do DMA. This
+interface provides only data-path access to userspace and relies on
+the host driver to handle control-path configuration. Expanding such an
+interface to support subdevice passthrough allows moving the emulation
+code to userspace. However, considerable work is required to grow it
+from an application-oriented interface into a passthrough-oriented one:
+new uAPIs to handle guest WQ configurability and administrative commands,
+and new uAPIs to handle passthrough-specific requirements (e.g. DMA map,
+guest SVA, live migration, posted interrupts, etc.). And once that is
+done, every userspace VMM has to explicitly bind to the IDXD-specific
+uAPI, even though the real user is in the guest (instead of the VMM
+itself) in the passthrough scenario.
+
+Although some generalization might be possible to reduce the work of
+handling passthrough, we feel the difference between userspace DMA
+and subdevice passthrough is substantial in IDXD. Therefore, we chose
+to build vDSA composition on top of the VFIO mdev framework and leave
+userspace DMA intact after discussion at LPC 2020.
+
+=============================
+Host Registration and Release
+=============================
+
+Intel DSA reports support for Intel Scalable IOV via a PCI Express
+Designated Vendor Specific Extended Capability (DVSEC). In addition,
+PASID-granular address translation capability is required in the
+IOMMU. During host initialization, the IDXD driver should check the
+presence of both capabilities before calling mdev_register_device()
+to register with the VFIO mdev framework and provide a set of ops
+(struct mdev_parent_ops). The IOMMU capability is indicated by the
+IOMMU_DEV_FEAT_AUX feature flag with iommu_dev_has_feature() and enabled
+with iommu_dev_enable_feature().
+
+On release, iommu_dev_disable_feature() is called after
+mdev_unregister_device() to disable the IOMMU_DEV_FEAT_AUX flag that
+the driver enabled during host initialization.
+
+The mdev_parent_ops data structure is filled out by the driver to provide
+a number of ops called by the VFIO mdev framework::
+
+  struct mdev_parent_ops {
+        .supported_type_groups
+        .create
+        .remove
+        .open
+        .release
+        .read
+        .write
+        .mmap
+        .ioctl
+  };
+
+Supported_type_groups
+---------------------
+At the moment only one vDSA type is supported.
+
+“1DWQ_v1”:
+  Single dedicated WQ (DSA 1.0) with read-only configuration exposed to
+  the guest.
+  On the guest kernel, a vDSA device shows up with a single
+  WQ that is pre-configured by the host. The WQ configuration is
+  entirely read-only and cannot be changed by the guest. There is no
+  support for guest SVA on this WQ.
+
+  Interrupt vector 0 is emulated by the driver to support admin
+  command completion and error reporting. A second interrupt vector is
+  bound to the IMS and used for I/O operations.
+
+
+create
+------
+API function to create the mdev. mdev_set_iommu_device() is called to
+associate the mdev with the parent PCI device. This function is where
+the driver sets up and initializes the resources to support a single
+mdev device. The creation is triggered through sysfs.
+
+remove
+------
+API function that mirrors the create() function and releases all the
+resources backing the mdev. This is also triggered through sysfs.
+
+open
+----
+API function that is called down from VFIO userspace to indicate to the
+driver that the upper layers are ready to claim and utilize the mdev. IMS
+entries are allocated and set up here.
+
+release
+-------
+The mirror function to open(); it is called when VFIO userspace releases
+the mdev.
+
+read / write
+------------
+This is where the Intel IDXD driver provides read/write emulation of
+PCI config space and MMIO registers. These are the “slow” paths of the
+mediated device, and emulation is used rather than direct access to the
+hardware resources. Typically, configuration and administrative commands
+go through this path. This allows the mdev to show up as a virtual PCI
+device on the guest kernel.
+
+The emulation of PCI config space is nothing special; it is simply
+copied from kvmgt. In the future this part might be consolidated to
+reduce duplication.
+
+Emulating MMIO reads is simply a memory copy. There are no side effects
+to be emulated upon guest reads.
+
+Emulating MMIO writes is required only for a few registers, due to the
+read-only configuration of the ‘1DWQ_v1’ type. The majority of the
+composition logic is hooked into the CMD register for performing
+administrative commands such as the WQ drain, abort, enable, disable and
+reset operations. The rest of the emulation is about handling errors
+(GENCTRL/SWERROR) and interrupts (INTCAUSE/MSIXPERM) on the vDSA
+device. Future mdev types might allow limited WQ configurability, which
+would then require additional emulation of the WQCFG register.
+
+mmap
+----
+This is the function that provides the setup to expose a portion of the
+hardware, also known as portals, for direct access for “fast” path
+operations through the mmap() syscall. A limited region of the hardware
+is mapped to the guest for direct I/O submission.
+
+There are four portals per WQ: unlimited MSI-X, limited MSI-X, unlimited
+IMS, limited IMS. Descriptors submitted to limited portals are subject
+to threshold configuration limitations for shared WQs. The MSI-X portals
+are used for host submissions, and the IMS portals are mapped to the VM
+for guest submission.
+
+ioctl
+-----
+This API function does several things:
+* Provides general device information to VFIO userspace.
+* Provides device region information (PCI, MMIO, etc.).
+* Provides interrupt information.
+* Sets up interrupts for the mediated device.
+* Performs mdev device reset.
+
+For the Intel idxd driver, Interrupt Message Store (IMS) vectors are used
+for mdev interrupts rather than MSI-X vectors. IMS provides additional
+interrupt vectors outside of the PCI MSI-X specification in order to
+support significantly more vectors.
+The emulated interrupt (vector 0) is connected through a kernel eventfd.
+When interrupt 0 needs to be asserted, the driver signals the eventfd to
+trigger the MSI-X vector 0 interrupt on the guest. The IMS interrupts
+are set up via eventfd as well. However, they utilize the irq bypass
+manager to inject the interrupt directly into the guest.
+
+To allocate IMS, we utilize the IMS array APIs. On host init, we need
+to create the MSI domain::
+
+        struct ims_array_info ims_info;
+        struct device *dev = &pci_dev->dev;
+
+
+        /* assign the device IMS size */
+        ims_info.max_slots = max_ims_size;
+        /* assign the MMIO base address for the IMS table */
+        ims_info.slots = mmio_base + ims_offset;
+        /* assign the MSI domain to the device */
+        dev->msi_domain = pci_ims_array_create_msi_irq_domain(pci_dev, &ims_info);
+
+When we are ready to allocate the interrupts::
+
+        struct device *dev = mdev_dev(mdev);
+
+        irq_domain = pci_dev->dev.msi_domain;
+        /* the irqs are allocated against device of mdev */
+        rc = msi_domain_alloc_irqs(irq_domain, dev, num_vecs);
+
+
+        /* we can retrieve the slot index from msi_entry */
+        for_each_msi_entry(entry, dev) {
+                slot_index = entry->device_msi.hwirq;
+                irq = entry->irq;
+        }
+
+        request_irq(irq, interrupt_handler_function, 0, "ims", context);
+
+
+The DSA device is structured such that MSI-X table entry 0 is used for
+admin command completion, error reporting, and other misc commands. The
+remaining MSI-X table entries are used for WQ completion. For VM support,
+the virtual device also presents a similar layout. Therefore, vector 0
+is emulated by the software. Additional vector(s) are associated with
+IMS.
+
+The index (slot) for the per-device IMS entry is managed by the MSI
+core. The index is the “interrupt handle” that the guest kernel
+needs to program into a DMA descriptor. That interrupt handle tells the
+hardware which IMS entry to use to generate the interrupt on the host.
+
+The virtual device presents an admin command called “request interrupt
+handle” that is not supported by the physical device. On probe of
+the DSA device on the guest kernel, the guest driver issues the
+“request interrupt handle” command in order to get the interrupt
+handle for descriptor programming. The host driver returns the assigned
+slot in the IMS entry table to the guest.
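+
+A sketch of the guest-side flow (illustrative only; the helper
+idxd_device_request_int_handle() is added later in this series, and
+int_handle is the interrupt handle field of the DSA descriptor)::
+
+        int handle, rc;
+
+        /* Ask the virtual device which IMS slot backs this vector */
+        rc = idxd_device_request_int_handle(idxd, msix_index, &handle,
+                                            IDXD_IRQ_MSIX);
+        if (rc < 0)
+                return rc;
+
+        /* Program the handle into each descriptor that requests a
+         * completion interrupt; the hardware uses it to pick the
+         * IMS entry to signal.
+         */
+        hw->int_handle = handle;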
+
+References
+==========
+[1] https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
+[2] https://software.intel.com/en-us/articles/intel-sdm
+[3] https://software.intel.com/sites/default/files/managed/cc/0e/intel-scalable-io-virtualization-technical-specification.pdf
+[4] https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
diff --git a/MAINTAINERS b/MAINTAINERS
index e73636b75f29..af04e674853c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8888,6 +8888,7 @@ INTEL IADX DRIVER
 M:	Dave Jiang
 L:	dmaengine@vger.kernel.org
 S:	Supported
+F:	Documentation/driver-api/vfio/mdev-idxd.rst
 F:	drivers/dma/idxd/*
 F:	include/uapi/linux/idxd.h

From patchwork Fri Oct 30 18:51:16 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 1391248
Subject: [PATCH v4 04/17] dmaengine: idxd: add support for readonly config devices
From: Dave Jiang
Date: Fri, 30 Oct 2020 11:51:16 -0700
Message-ID: <160408387598.912050.4521050687104233583.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com>

The VFIO mediated device for the idxd driver provides a virtual DSA
device by backing it with a workqueue. The virtual device is limited,
with the WQ configuration registers set to read-only. Add support and
helper functions for the handling of a DSA device whose configuration
registers are marked read-only.

Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/device.c | 116 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/dma/idxd/idxd.h   |   1
 drivers/dma/idxd/init.c   |   8 +++
 drivers/dma/idxd/sysfs.c  |  20 +++++---
 4 files changed, 137 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index d6f551dcbcb6..7003884cd8ad 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -778,3 +778,119 @@ int idxd_device_config(struct idxd_device *idxd)
 
	return 0;
 }
+
+static int idxd_wq_load_config(struct idxd_wq *wq)
+{
+	struct idxd_device *idxd = wq->idxd;
+	struct device *dev = &idxd->pdev->dev;
+	int wqcfg_offset;
+	int i;
+
+	wqcfg_offset = WQCFG_OFFSET(idxd, wq->id, 0);
+	memcpy_fromio(wq->wqcfg, idxd->reg_base + wqcfg_offset, idxd->wqcfg_size);
+
+	wq->size = wq->wqcfg->wq_size;
+	wq->threshold = wq->wqcfg->wq_thresh;
+	if (wq->wqcfg->priv)
+		wq->type = IDXD_WQT_KERNEL;
+
+	/* The driver does not support shared WQ mode in read-only config yet */
+	if (wq->wqcfg->mode == 0 || wq->wqcfg->pasid_en)
+		return -EOPNOTSUPP;
+
+	set_bit(WQ_FLAG_DEDICATED, &wq->flags);
+
+	wq->priority = wq->wqcfg->priority;
+
+	for (i = 0; i < WQCFG_STRIDES(idxd); i++) {
+		wqcfg_offset = WQCFG_OFFSET(idxd, wq->id, i);
+		dev_dbg(dev, "WQ[%d][%d][%#x]: %#x\n", wq->id, i, wqcfg_offset, wq->wqcfg->bits[i]);
+	}
+
+	return 0;
+}
+
+static void idxd_group_load_config(struct idxd_group *group)
+{
+	struct idxd_device *idxd = group->idxd;
+	struct device *dev = &idxd->pdev->dev;
+	int i, j, grpcfg_offset;
+
+	/*
+	 * Load WQS bit fields
+	 * Iterate through all 256 bits 64 bits at a time
+	 */
+	for (i = 0; i < GRPWQCFG_STRIDES; i++) {
+		struct idxd_wq *wq;
+
+		grpcfg_offset = GRPWQCFG_OFFSET(idxd, group->id, i);
+		group->grpcfg.wqs[i] = ioread64(idxd->reg_base + grpcfg_offset);
+		dev_dbg(dev, "GRPCFG wq[%d:%d: %#x]: %#llx\n",
+			group->id, i, grpcfg_offset, group->grpcfg.wqs[i]);
+
+		if (i * 64 >= idxd->max_wqs)
+			break;
+
+		/* Iterate through all 64 bits and check for wq set */
+		for (j = 0; j < 64; j++) {
+			int id = i * 64 + j;
+
+			/* No need to check beyond max wqs */
+			if (id >= idxd->max_wqs)
+				break;
+
+			/* Set group assignment for wq if wq bit is set */
+			if (group->grpcfg.wqs[i] & BIT(j)) {
+				wq = &idxd->wqs[id];
+				wq->group = group;
+			}
+		}
+	}
+
+	grpcfg_offset = GRPENGCFG_OFFSET(idxd, group->id);
+	group->grpcfg.engines = ioread64(idxd->reg_base + grpcfg_offset);
+	dev_dbg(dev, "GRPCFG engs[%d: %#x]: %#llx\n", group->id,
+		grpcfg_offset, group->grpcfg.engines);
+
+	/* Iterate through all 64 bits to check engines set */
+	for (i = 0; i < 64; i++) {
+		if (i >= idxd->max_engines)
+			break;
+
+		if (group->grpcfg.engines & BIT(i)) {
+			struct idxd_engine *engine = &idxd->engines[i];
+
+			engine->group = group;
+		}
+	}
+
+	grpcfg_offset = GRPFLGCFG_OFFSET(idxd, group->id);
+	group->grpcfg.flags.bits = ioread32(idxd->reg_base + grpcfg_offset);
+	dev_dbg(dev, "GRPFLAGS flags[%d: %#x]: %#x\n",
+		group->id, grpcfg_offset, group->grpcfg.flags.bits);
+}
+
+int idxd_device_load_config(struct idxd_device *idxd)
+{
+	union gencfg_reg reg;
+	int i, rc;
+
+	reg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+	idxd->token_limit = reg.token_limit;
+
+	for (i = 0; i < idxd->max_groups; i++) {
+		struct idxd_group *group = &idxd->groups[i];
+
+		idxd_group_load_config(group);
+	}
+
+	for (i = 0; i < idxd->max_wqs; i++) {
+		struct idxd_wq *wq = &idxd->wqs[i];
+
+		rc = idxd_wq_load_config(wq);
+		if (rc < 0)
+			return rc;
+	}
+
+	return 0;
+}
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 7e54209c433a..1afc34be4ed0 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -317,6 +317,7 @@ void idxd_device_cleanup(struct idxd_device *idxd);
 int idxd_device_config(struct idxd_device *idxd);
 void idxd_device_wqs_clear_state(struct idxd_device *idxd);
 void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid);
+int idxd_device_load_config(struct idxd_device *idxd);
 
 /* work queue control */
 int idxd_wq_alloc_resources(struct idxd_wq *wq);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 45b0eac640c3..98b1091181bb 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -349,6 +349,14 @@ static int idxd_probe(struct idxd_device *idxd)
	if (rc)
		goto err_setup;
 
+	/* If the configs are readonly, then load them from device */
+	if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
+		dev_dbg(dev, "Loading RO device config\n");
+		rc = idxd_device_load_config(idxd);
+		if (rc < 0)
+			goto err_setup;
+	}
+
	rc = idxd_setup_interrupts(idxd);
	if (rc)
		goto err_setup;
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 6d292eb79bf3..304eb2cf532e 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -102,7 +102,7 @@ static int idxd_config_bus_match(struct device *dev,
 
 static int idxd_config_bus_probe(struct device *dev)
 {
-	int rc;
+	int rc = 0;
	unsigned long flags;
 
	dev_dbg(dev, "%s called\n", __func__);
@@ -120,7 +120,8 @@ static int idxd_config_bus_probe(struct device *dev)
 
		/* Perform IDXD configuration and enabling */
		spin_lock_irqsave(&idxd->dev_lock, flags);
-		rc = idxd_device_config(idxd);
+		if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+			rc = idxd_device_config(idxd);
		spin_unlock_irqrestore(&idxd->dev_lock, flags);
		if (rc < 0) {
			module_put(THIS_MODULE);
@@ -207,7 +208,8 @@ static int idxd_config_bus_probe(struct device *dev)
		}
 
		spin_lock_irqsave(&idxd->dev_lock, flags);
-		rc = idxd_device_config(idxd);
+		if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+			rc = idxd_device_config(idxd);
		spin_unlock_irqrestore(&idxd->dev_lock, flags);
		if (rc < 0) {
			mutex_unlock(&wq->wq_lock);
@@ -328,12 +330,14 @@ static int idxd_config_bus_remove(struct device *dev)
		idxd_unregister_dma_device(idxd);
		rc = idxd_device_disable(idxd);
 
-		for (i = 0; i < idxd->max_wqs; i++) {
-			struct idxd_wq *wq = &idxd->wqs[i];
+		if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
+			for (i = 0; i < idxd->max_wqs; i++) {
+				struct idxd_wq *wq = &idxd->wqs[i];
 
-			mutex_lock(&wq->wq_lock);
-			idxd_wq_disable_cleanup(wq);
-			mutex_unlock(&wq->wq_lock);
+				mutex_lock(&wq->wq_lock);
+				idxd_wq_disable_cleanup(wq);
+				mutex_unlock(&wq->wq_lock);
+			}
		}
		module_put(THIS_MODULE);
		if (rc < 0)

From patchwork Fri Oct 30 18:51:25 2020
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 1391250
Subject: [PATCH v4 05/17] dmaengine: idxd: add interrupt handle request support
From: Dave Jiang
Date: Fri, 30 Oct 2020 11:51:25 -0700
Message-ID: <160408388556.912050.3620822104414694832.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com>

Add support for requesting an interrupt handle from the device. The
interrupt handle is put in the interrupt handle field of a descriptor
for the device to determine which interrupt vector to use, be it MSI-X
or IMS. On the host device, the interrupt handle is indexed to the
MSI-X table. This allows a descriptor to program the interrupt handle
1:1 with the MSI-X index without getting it from the request interrupt
handle device command.
For a guest device, the index can be any index that the host assigned
for the IMS table, and therefore it must be requested from the virtual
device during MSI-X setup by the driver running on the guest.

On the actual hardware, MSI-X vector 0 is the misc interrupt and handles
events such as administrative command completion, error reporting,
performance monitor overflow, etc. MSI-X vectors 1...N are used for
descriptor completion interrupts. On the guest kernel, the MSI-X
interrupts are backed by the mediated device through emulation or IMS
vectors. Vector 0 is handled through emulation by the host VDCM; it only
requires the host driver to send the signal to QEMU. Vector 1 (and
possibly more in the future) is backed by IMS.

Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/device.c    | 58 ++++++++++++++++++++++++++++++++++++
 drivers/dma/idxd/idxd.h      | 13 +++++++
 drivers/dma/idxd/init.c      | 48 +++++++++++++++++++++++++++++
 drivers/dma/idxd/registers.h |  9 ++++++-
 drivers/dma/idxd/submit.c    | 29 ++++++++++++++-----
 5 files changed, 149 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 7003884cd8ad..a9ae970db0a4 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -532,6 +532,64 @@ void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid)
	dev_dbg(dev, "pasid %d drained\n", pasid);
 }
 
+int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle,
+				   enum idxd_interrupt_type irq_type)
+{
+	struct device *dev = &idxd->pdev->dev;
+	u32 operand, status;
+
+	if (!(idxd->hw.cmd_cap & BIT(IDXD_CMD_REQUEST_INT_HANDLE)))
+		return -EOPNOTSUPP;
+
+	dev_dbg(dev, "get int handle, idx %d\n", idx);
+
+	operand = idx & GENMASK(15, 0);
+	if (irq_type == IDXD_IRQ_IMS)
+		operand |= CMD_INT_HANDLE_IMS;
+
+	dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_REQUEST_INT_HANDLE, operand);
+
+	idxd_cmd_exec(idxd, IDXD_CMD_REQUEST_INT_HANDLE, operand, &status);
+
+	if ((status & IDXD_CMDSTS_ERR_MASK) != IDXD_CMDSTS_SUCCESS) {
+		dev_dbg(dev, "request int handle failed: %#x\n", status);
+		return -ENXIO;
+	}
+
+	*handle = (status >> IDXD_CMDSTS_RES_SHIFT) & GENMASK(15, 0);
+
+	dev_dbg(dev, "int handle acquired: %u\n", *handle);
+	return 0;
+}
+
+int idxd_device_release_int_handle(struct idxd_device *idxd, int handle,
+				   enum idxd_interrupt_type irq_type)
+{
+	struct device *dev = &idxd->pdev->dev;
+	u32 operand, status;
+
+	if (!(idxd->hw.cmd_cap & BIT(IDXD_CMD_RELEASE_INT_HANDLE)))
+		return -EOPNOTSUPP;
+
+	dev_dbg(dev, "release int handle, handle %d\n", handle);
+
+	operand = handle & GENMASK(15, 0);
+	if (irq_type == IDXD_IRQ_IMS)
+		operand |= CMD_INT_HANDLE_IMS;
+
+	dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_RELEASE_INT_HANDLE, operand);
+
+	idxd_cmd_exec(idxd, IDXD_CMD_RELEASE_INT_HANDLE, operand, &status);
+
+	if ((status & IDXD_CMDSTS_ERR_MASK) != IDXD_CMDSTS_SUCCESS) {
+		dev_dbg(dev, "release int handle failed: %#x\n", status);
+		return -ENXIO;
+	}
+
+	dev_dbg(dev, "int handle released.\n");
+	return 0;
+}
+
 /* Device configuration bits */
 static void idxd_group_config_write(struct idxd_group *group)
 {
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 1afc34be4ed0..a506a16c83ee 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -140,6 +140,7 @@ struct idxd_hw {
	union group_cap_reg group_cap;
	union engine_cap_reg engine_cap;
	struct opcap opcap;
+	u32 cmd_cap;
 };
 
 enum idxd_device_state {
@@ -205,6 +206,8 @@ struct idxd_device {
	struct dma_device dma_dev;
	struct
workqueue_struct *wq; struct work_struct work; + + int *int_handles; }; /* IDXD software descriptor */ @@ -218,6 +221,7 @@ struct idxd_desc { struct list_head list; int id; int cpu; + unsigned int vector; struct idxd_wq *wq; }; @@ -253,6 +257,11 @@ enum idxd_portal_prot { IDXD_PORTAL_LIMITED, }; +enum idxd_interrupt_type { + IDXD_IRQ_MSIX = 0, + IDXD_IRQ_IMS, +}; + static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot) { return prot * 0x1000; @@ -318,6 +327,10 @@ int idxd_device_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); int idxd_device_load_config(struct idxd_device *idxd); +int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, + enum idxd_interrupt_type irq_type); +int idxd_device_release_int_handle(struct idxd_device *idxd, int handle, + enum idxd_interrupt_type irq_type); /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 98b1091181bb..c136216e19e8 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -133,6 +133,22 @@ static int idxd_setup_interrupts(struct idxd_device *idxd) } dev_dbg(dev, "Allocated idxd-msix %d for vector %d\n", i, msix->vector); + + if (idxd->hw.cmd_cap & BIT(IDXD_CMD_REQUEST_INT_HANDLE)) { + /* + * The MSIX vector enumeration starts at 1 with vector 0 being the + * misc interrupt that handles non I/O completion events. The + * interrupt handles are for IMS enumeration on guest. The misc + * interrupt vector does not require a handle and therefore we start + * the int_handles at index 0. Since 'i' starts at 1, the first + * int_handles index will be 0. + */ + rc = idxd_device_request_int_handle(idxd, i, &idxd->int_handles[i - 1], + IDXD_IRQ_MSIX); + if (rc < 0) + goto err_no_irq; + dev_dbg(dev, "int handle requested: %u\n", idxd->int_handles[i - 1]); + } } idxd_unmask_error_interrupts(idxd); @@ -160,6 +176,13 @@ static int idxd_setup_internals(struct idxd_device *idxd) int i; init_waitqueue_head(&idxd->cmd_waitq); + + if (idxd->hw.cmd_cap & BIT(IDXD_CMD_REQUEST_INT_HANDLE)) { + idxd->int_handles = devm_kcalloc(dev, idxd->max_wqs, sizeof(int), GFP_KERNEL); + if (!idxd->int_handles) + return -ENOMEM; + } + idxd->groups = devm_kcalloc(dev, idxd->max_groups, sizeof(struct idxd_group), GFP_KERNEL); if (!idxd->groups) @@ -233,6 +256,12 @@ static void idxd_read_caps(struct idxd_device *idxd) /* reading generic capabilities */ idxd->hw.gen_cap.bits = ioread64(idxd->reg_base + IDXD_GENCAP_OFFSET); dev_dbg(dev, "gen_cap: %#llx\n", idxd->hw.gen_cap.bits); + + if (idxd->hw.gen_cap.cmd_cap) { + idxd->hw.cmd_cap = ioread32(idxd->reg_base + IDXD_CMDCAP_OFFSET); + dev_dbg(dev, "cmd_cap: %#x\n", idxd->hw.cmd_cap); + } + idxd->max_xfer_bytes = 1ULL << idxd->hw.gen_cap.max_xfer_shift; dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes); idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift; @@ -471,6 +500,24 @@ static void idxd_flush_work_list(struct idxd_irq_entry *ie) } } +static void idxd_release_int_handles(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int i, rc; + + for (i = 0; i < idxd->num_wq_irqs; i++) { + if (idxd->hw.cmd_cap & BIT(IDXD_CMD_RELEASE_INT_HANDLE)) { + rc = idxd_device_release_int_handle(idxd, idxd->int_handles[i], + IDXD_IRQ_MSIX); + if (rc < 0) + dev_warn(dev, "irq handle %d release failed\n", + idxd->int_handles[i]); + else + dev_dbg(dev, "int 
handle released: %u\n", idxd->int_handles[i]); + } + } +} + static void idxd_shutdown(struct pci_dev *pdev) { struct idxd_device *idxd = pci_get_drvdata(pdev); @@ -495,6 +542,7 @@ static void idxd_shutdown(struct pci_dev *pdev) idxd_flush_work_list(irq_entry); } + idxd_release_int_handles(idxd); destroy_workqueue(idxd->wq); }
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index d29a58ee2651..d02fd59a8e39 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -23,8 +23,8 @@ union gen_cap_reg { u64 overlap_copy:1; u64 cache_control_mem:1; u64 cache_control_cache:1; + u64 cmd_cap:1; u64 rsvd:3; - u64 int_handle_req:1; u64 dest_readback:1; u64 drain_readback:1; u64 rsvd2:6; @@ -179,8 +179,11 @@ enum idxd_cmd { IDXD_CMD_DRAIN_PASID, IDXD_CMD_ABORT_PASID, IDXD_CMD_REQUEST_INT_HANDLE, + IDXD_CMD_RELEASE_INT_HANDLE, }; +#define CMD_INT_HANDLE_IMS 0x10000 + #define IDXD_CMDSTS_OFFSET 0xa8 union cmdsts_reg { struct { @@ -192,6 +195,8 @@ union cmdsts_reg { u32 bits; } __packed; #define IDXD_CMDSTS_ACTIVE 0x80000000 +#define IDXD_CMDSTS_ERR_MASK 0xff +#define IDXD_CMDSTS_RES_SHIFT 8 enum idxd_cmdsts_err { IDXD_CMDSTS_SUCCESS = 0, @@ -227,6 +232,8 @@ enum idxd_cmdsts_err { IDXD_CMDSTS_ERR_NO_HANDLE, }; +#define IDXD_CMDCAP_OFFSET 0xb0 + #define IDXD_SWERR_OFFSET 0xc0 #define IDXD_SWERR_VALID 0x00000001 #define IDXD_SWERR_OVERFLOW 0x00000002
diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c index efca5d8468a6..cdea5d37ef24 100644 --- a/drivers/dma/idxd/submit.c +++ b/drivers/dma/idxd/submit.c @@ -22,11 +22,17 @@ static struct idxd_desc *__get_desc(struct idxd_wq *wq, int idx, int cpu) desc->hw->pasid = idxd->pasid; /* - * Descriptor completion vectors are 1-8 for MSIX. We will round - * robin through the 8 vectors. + * Descriptor completion vectors are 1...N for MSIX. We will round + * robin through the N vectors. */ wq->vec_ptr = (wq->vec_ptr % idxd->num_wq_irqs) + 1; - desc->hw->int_handle = wq->vec_ptr; + if (!idxd->int_handles) { + desc->hw->int_handle = wq->vec_ptr; + } else { + desc->vector = wq->vec_ptr; + desc->hw->int_handle = idxd->int_handles[desc->vector]; + } + return desc; } @@ -79,7 +85,6 @@ void idxd_free_desc(struct idxd_wq *wq, struct idxd_desc *desc) int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc) { struct idxd_device *idxd = wq->idxd; - int vec = desc->hw->int_handle; void __iomem *portal; int rc; @@ -112,9 +117,19 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc) * Pending the descriptor to the lockless list for the irq_entry * that we designated the descriptor to. */ - if (desc->hw->flags & IDXD_OP_FLAG_RCI) - llist_add(&desc->llnode, - &idxd->irq_entries[vec].pending_llist); + if (desc->hw->flags & IDXD_OP_FLAG_RCI) { + int vec; + + /* + * On the host kernel, the interrupt handle is the index of + * the MSIX vector, so it can be used directly. On a guest, + * the int_handle cannot be used since it is the index into + * the IMS table for the entire device; the guest device's + * local index is used instead. + */ + vec = !idxd->int_handles ?
desc->hw->int_handle : desc->vector; + llist_add(&desc->llnode, &idxd->irq_entries[vec].pending_llist); + } return 0; }

From patchwork Fri Oct 30 18:51:32 2020
X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391259
Subject: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection
From: Dave Jiang
Date: Fri, 30 Oct 2020 11:51:32 -0700
Message-ID: <160408389228.912050.10790556040702698268.stgit@djiang5-desk3.ch.intel.com>

Intel Scalable I/O Virtualization (SIOV) enables sharing of I/O devices across isolated domains through PASID-based sub-device partitioning. Interrupt Message Storage (IMS) enables devices to store interrupt messages in a device-specific optimized manner without the scalability restrictions of the PCIe-defined MSI-X capability.
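[Editor's aside] For orientation, detection of these capabilities reduces to a walk of the PCIe Designated Vendor-Specific Extended Capability (DVSEC) chain. A minimal sketch follows; it is not part of the patch. The example_has_siov_dvsec() name is invented, PCI_DVSEC_ID_INTEL_SIOV is the constant this patch adds, and the rest are existing PCI core APIs and pci_regs.h defines.

#include <linux/pci.h>

static bool example_has_siov_dvsec(struct pci_dev *pdev)
{
	u16 vendor, id;
	int pos = 0;

	/* Walk every DVSEC instance and match on vendor + DVSEC id. */
	while ((pos = pci_find_next_ext_capability(pdev, pos,
						   PCI_EXT_CAP_ID_DVSEC))) {
		pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vendor);
		pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER2, &id);
		if (vendor == PCI_VENDOR_ID_INTEL &&
		    id == PCI_DVSEC_ID_INTEL_SIOV)
			return true;
	}

	return false;
}

The pci_find_dvsec() helper added by this patch generalizes exactly this loop.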
IMS is one of the features supported under SIOV. Move the SIOV detection code from the Intel iommu driver to the common PCI core. Making the detection code common allows supported accelerator drivers to query the PCI core for SIOV and IMS capabilities. The support code adds the ability to query the PCI DVSEC capabilities for the SIOV capability.

Suggested-by: Thomas Gleixner Cc: Baolu Lu Signed-off-by: Dave Jiang Reviewed-by: Ashok Raj --- drivers/iommu/intel/iommu.c | 31 ++----------------------- drivers/pci/Kconfig | 15 ++++++++++++ drivers/pci/Makefile | 2 ++ drivers/pci/dvsec.c | 40 +++++++++++++++++++++++++++++++++ drivers/pci/siov.c | 50 +++++++++++++++++++++++++++++++++++++++++ include/linux/pci-siov.h | 18 +++++++++++++++ include/linux/pci.h | 3 ++ include/uapi/linux/pci_regs.h | 4 +++ 8 files changed, 134 insertions(+), 29 deletions(-) create mode 100644 drivers/pci/dvsec.c create mode 100644 drivers/pci/siov.c create mode 100644 include/linux/pci-siov.h
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 3e77a88b236c..d9335f590b42 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -5883,34 +5884,6 @@ static int intel_iommu_disable_auxd(struct device *dev) return 0; } -/* - * A PCI express designated vendor specific extended capability is defined - * in the section 3.7 of Intel scalable I/O virtualization technical spec - * for system software and tools to detect endpoint devices supporting the - * Intel scalable IO virtualization without host driver dependency. - * - * Returns the address of the matching extended capability structure within - * the device's PCI configuration space or 0 if the device does not support - * it. - */ -static int siov_find_pci_dvsec(struct pci_dev *pdev) -{ - int pos; - u16 vendor, id; - - pos = pci_find_next_ext_capability(pdev, 0, 0x23); - while (pos) { - pci_read_config_word(pdev, pos + 4, &vendor); - pci_read_config_word(pdev, pos + 8, &id); - if (vendor == PCI_VENDOR_ID_INTEL && id == 5) - return pos; - - pos = pci_find_next_ext_capability(pdev, pos, 0x23); - } - - return 0; -} - static bool intel_iommu_dev_has_feat(struct device *dev, enum iommu_dev_features feat) { @@ -5925,7 +5898,7 @@ intel_iommu_dev_has_feat(struct device *dev, enum iommu_dev_features feat) if (ret < 0) return false; - return !!siov_find_pci_dvsec(to_pci_dev(dev)); + return pci_siov_supported(to_pci_dev(dev)); } if (feat == IOMMU_DEV_FEAT_SVA) {
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index 0c473d75e625..cf7f4d17d8cc 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -161,6 +161,21 @@ config PCI_PASID If unsure, say N. +config PCI_DVSEC + bool + +config PCI_SIOV + select PCI_PASID + select PCI_DVSEC + bool "PCI SIOV support" + help + Scalable I/O Virtualization enables sharing of I/O devices across isolated + domains through PASID based sub-device partitioning. One of the sub features + supported by SIOV is Interrupt Message Storage (IMS). Select this option if + you want to compile the support into your kernel. + + If unsure, say N.
+ config PCI_P2PDMA bool "PCI peer-to-peer transfer support" depends on ZONE_DEVICE diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile index 522d2b974e91..653a1d69b0fc 100644 --- a/drivers/pci/Makefile +++ b/drivers/pci/Makefile @@ -20,6 +20,8 @@ obj-$(CONFIG_PCI_QUIRKS) += quirks.o obj-$(CONFIG_HOTPLUG_PCI) += hotplug/ obj-$(CONFIG_PCI_MSI) += msi.o obj-$(CONFIG_PCI_ATS) += ats.o +obj-$(CONFIG_PCI_DVSEC) += dvsec.o +obj-$(CONFIG_PCI_SIOV) += siov.o obj-$(CONFIG_PCI_IOV) += iov.o obj-$(CONFIG_PCI_BRIDGE_EMUL) += pci-bridge-emul.o obj-$(CONFIG_PCI_LABEL) += pci-label.o diff --git a/drivers/pci/dvsec.c b/drivers/pci/dvsec.c new file mode 100644 index 000000000000..e49b079f0717 --- /dev/null +++ b/drivers/pci/dvsec.c @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * PCI DVSEC helper functions + * Copyright (C) 2020 Intel Corp. + */ + +#include +#include +#include +#include "pci.h" + +/** + * pci_find_dvsec - return position of DVSEC with provided vendor and dvsec id + * @dev: the PCI device + * @vendor: Vendor for the DVSEC + * @id: the DVSEC cap id + * + * Return the offset of DVSEC on success or -ENOTSUPP if not found + */ +int pci_find_dvsec(struct pci_dev *dev, u16 vendor, u16 id) +{ + u16 dev_vendor, dev_id; + int pos; + + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DVSEC); + if (!pos) + return -ENOTSUPP; + + while (pos) { + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER1, &dev_vendor); + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER2, &dev_id); + if (dev_vendor == vendor && dev_id == id) + return pos; + + pos = pci_find_next_ext_capability(dev, pos, PCI_EXT_CAP_ID_DVSEC); + } + + return -ENOTSUPP; +} +EXPORT_SYMBOL_GPL(pci_find_dvsec); diff --git a/drivers/pci/siov.c b/drivers/pci/siov.c new file mode 100644 index 000000000000..6147e6ae5832 --- /dev/null +++ b/drivers/pci/siov.c @@ -0,0 +1,50 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Scalable I/O Virtualization support + * Copyright (C) 2020 Intel Corp. + */ + +#include +#include +#include +#include +#include "pci.h" + +/* + * A PCI express designated vendor specific extended capability is defined + * in the section 3.7 of Intel scalable I/O virtualization technical spec + * for system software and tools to detect endpoint devices supporting the + * Intel scalable IO virtualization without host driver dependency. + */ + +/** + * pci_siov_supported - check if the device can use SIOV + * @dev: the PCI device + * + * Returns true if the device supports SIOV, false otherwise. + */ +bool pci_siov_supported(struct pci_dev *dev) +{ + return pci_find_dvsec(dev, PCI_VENDOR_ID_INTEL, PCI_DVSEC_ID_INTEL_SIOV) < 0 ? false : true; +} +EXPORT_SYMBOL_GPL(pci_siov_supported); + +/** + * pci_ims_supported - check if the device can use IMS + * @dev: the PCI device + * + * Returns true if the device supports IMS, false otherwise. + */ +bool pci_ims_supported(struct pci_dev *dev) +{ + int pos; + u32 caps; + + pos = pci_find_dvsec(dev, PCI_VENDOR_ID_INTEL, PCI_DVSEC_ID_INTEL_SIOV); + if (pos < 0) + return false; + + pci_read_config_dword(dev, pos + PCI_DVSEC_INTEL_SIOV_CAP, &caps); + return (caps & PCI_DVSEC_INTEL_SIOV_CAP_IMS) ? 
true : false; +} +EXPORT_SYMBOL_GPL(pci_ims_supported);
diff --git a/include/linux/pci-siov.h b/include/linux/pci-siov.h new file mode 100644 index 000000000000..a8a4eb5f4634 --- /dev/null +++ b/include/linux/pci-siov.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef LINUX_PCI_SIOV_H +#define LINUX_PCI_SIOV_H + +#include + +#ifdef CONFIG_PCI_SIOV +/* Scalable I/O Virtualization */ +bool pci_siov_supported(struct pci_dev *dev); +bool pci_ims_supported(struct pci_dev *dev); +#else /* CONFIG_PCI_SIOV */ +static inline bool pci_siov_supported(struct pci_dev *d) +{ return false; } +static inline bool pci_ims_supported(struct pci_dev *d) +{ return false; } +#endif /* CONFIG_PCI_SIOV */ + +#endif /* LINUX_PCI_SIOV_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h index 22207a79762c..4710f09b43b1 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1070,6 +1070,7 @@ int pci_find_next_ext_capability(struct pci_dev *dev, int pos, int cap); int pci_find_ht_capability(struct pci_dev *dev, int ht_cap); int pci_find_next_ht_capability(struct pci_dev *dev, int pos, int ht_cap); struct pci_bus *pci_find_next_bus(const struct pci_bus *from); +int pci_find_dvsec(struct pci_dev *dev, u16 vendor, u16 id); u64 pci_get_dsn(struct pci_dev *dev); @@ -1726,6 +1727,8 @@ static inline int pci_find_next_capability(struct pci_dev *dev, u8 post, { return 0; } static inline int pci_find_ext_capability(struct pci_dev *dev, int cap) { return 0; } +static inline int pci_find_dvsec(struct pci_dev *dev, u16 vendor, u16 id) +{ return -ENOTSUPP; } static inline u64 pci_get_dsn(struct pci_dev *dev) { return 0; }
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index 8f8bd2318c6c..3532528441ef 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -1071,6 +1071,10 @@ #define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */ #define PCI_DVSEC_HEADER2 0x8 /* Designated Vendor-Specific Header2 */ +#define PCI_DVSEC_ID_INTEL_SIOV 0x5 +#define PCI_DVSEC_INTEL_SIOV_CAP 0x14 +#define PCI_DVSEC_INTEL_SIOV_CAP_IMS 0x1 + /* Data Link Feature */ #define PCI_DLF_CAP 0x04 /* Capabilities Register */ #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */

From patchwork Fri Oct 30 18:51:38 2020
X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391243
Subject: [PATCH v4 07/17] dmaengine: idxd: add IMS support in base driver
From: Dave Jiang
Date: Fri, 30 Oct 2020 11:51:38 -0700
Message-ID: <160408389855.912050.5169538738792960557.stgit@djiang5-desk3.ch.intel.com>

In preparation for VFIO mediated device support in the idxd driver, add the enabling for Interrupt Message Store (IMS) interrupts to idxd. With IMS support the idxd driver can dynamically allocate interrupts on a per-mdev basis, based on how many IMS vectors are mapped to the mdev device. This commit only provides the support functions in the base driver, not the VFIO mdev code that utilizes them.

The commit has some portal-related changes. A "portal" is a special location within the MMIO BAR2 of the DSA device where descriptors are submitted via the CPU command MOVDIR64B or ENQCMD(S). The offset of the portal address determines whether the submitted descriptor generates an MSI-X or an IMS notification. See the Intel SIOV spec for more details: https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification

Signed-off-by: Dave Jiang --- Documentation/ABI/stable/sysfs-driver-dma-idxd | 6 ++++++ drivers/dma/idxd/cdev.c | 4 ++-- drivers/dma/idxd/idxd.h | 13 +++++++++---- drivers/dma/idxd/init.c | 19 +++++++++++++++++++ drivers/dma/idxd/submit.c | 10 ++++++++-- drivers/dma/idxd/sysfs.c | 9 +++++++++ 6 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd index 5ea81ffd3c1a..ed5aeecf7015 100644 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -129,6 +129,12 @@ KernelVersion: 5.10.0 Contact: dmaengine@vger.kernel.org Description: The last executed device administrative command's status/error.
+What: /sys/bus/dsa/devices/dsa/ims_size +Date: Oct 15, 2020 +KernelVersion: 5.11.0 +Contact: dmaengine@vger.kernel.org +Description: The total number of vectors available for Interrupt Message Store. + What: /sys/bus/dsa/devices/wq./block_on_fault Date: Oct 27, 2020 KernelVersion: 5.11.0 diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index 010b820d8f74..b774bf336347 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -204,8 +204,8 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma) return rc; vma->vm_flags |= VM_DONTCOPY; - pfn = (base + idxd_get_wq_portal_full_offset(wq->id, - IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT; + pfn = (base + idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED, + IDXD_IRQ_MSIX)) >> PAGE_SHIFT; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_private_data = ctx; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index a506a16c83ee..549426bfb443 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -154,6 +154,7 @@ enum idxd_device_flag { IDXD_FLAG_CONFIGURABLE = 0, IDXD_FLAG_CMD_RUNNING, IDXD_FLAG_PASID_ENABLED, + IDXD_FLAG_SIOV_SUPPORTED, }; struct idxd_device { @@ -181,6 +182,7 @@ struct idxd_device { int num_groups; + u32 ims_offset; u32 msix_perm_offset; u32 wqcfg_offset; u32 grpcfg_offset; @@ -188,6 +190,7 @@ struct idxd_device { u64 max_xfer_bytes; u32 max_batch_size; + int ims_size; int max_groups; int max_engines; int max_tokens; @@ -262,15 +265,17 @@ enum idxd_interrupt_type { IDXD_IRQ_IMS, }; -static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot) +static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return prot * 0x1000; + return prot * 0x1000 + irq_type * 0x2000; } static inline int idxd_get_wq_portal_full_offset(int wq_id, - enum idxd_portal_prot prot) + enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot); + return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot, irq_type); } static inline void idxd_set_type(struct idxd_device *idxd) diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index c136216e19e8..4a21c2a17a62 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include "../dmaengine.h" @@ -244,10 +245,27 @@ static void idxd_read_table_offsets(struct idxd_device *idxd) dev_dbg(dev, "IDXD Work Queue Config Offset: %#x\n", idxd->wqcfg_offset); idxd->msix_perm_offset = offsets.msix_perm * IDXD_TABLE_MULT; dev_dbg(dev, "IDXD MSIX Permission Offset: %#x\n", idxd->msix_perm_offset); + idxd->ims_offset = offsets.ims * IDXD_TABLE_MULT; + dev_dbg(dev, "IDXD IMS Offset: %#x\n", idxd->ims_offset); idxd->perfmon_offset = offsets.perfmon * IDXD_TABLE_MULT; dev_dbg(dev, "IDXD Perfmon Offset: %#x\n", idxd->perfmon_offset); } +static void idxd_check_siov(struct idxd_device *idxd) +{ + struct pci_dev *pdev = idxd->pdev; + + if (pci_ims_supported(idxd->pdev) && idxd->hw.gen_cap.max_ims_mult) { + idxd->ims_size = idxd->hw.gen_cap.max_ims_mult * 256ULL; + dev_dbg(&pdev->dev, "IMS size: %u\n", idxd->ims_size); + set_bit(IDXD_FLAG_SIOV_SUPPORTED, &idxd->flags); + dev_dbg(&pdev->dev, "IMS supported for device\n"); + return; + } + + dev_dbg(&pdev->dev, "SIOV unsupported for device\n"); +} + static void idxd_read_caps(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; @@ -266,6 
+284,7 @@ static void idxd_read_caps(struct idxd_device *idxd) dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes); idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift; dev_dbg(dev, "max batch size: %u\n", idxd->max_batch_size); + idxd_check_siov(idxd); if (idxd->hw.gen_cap.config_en) set_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags);
diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c index cdea5d37ef24..f76d154d1dbd 100644 --- a/drivers/dma/idxd/submit.c +++ b/drivers/dma/idxd/submit.c @@ -30,7 +30,13 @@ static struct idxd_desc *__get_desc(struct idxd_wq *wq, int idx, int cpu) desc->hw->int_handle = wq->vec_ptr; } else { desc->vector = wq->vec_ptr; - desc->hw->int_handle = idxd->int_handles[desc->vector]; + /* + * int_handles are only for descriptor completion. However for device + * MSIX enumeration, vec 0 is used for misc interrupts. Therefore even + * though we are rotating through 1...N for descriptor interrupts, we + * need to acquire the int_handles from 0..N-1. + */ + desc->hw->int_handle = idxd->int_handles[desc->vector - 1]; } return desc; @@ -91,7 +97,7 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc) if (idxd->state != IDXD_DEV_ENABLED) return -EIO; - portal = wq->portal + idxd_get_wq_portal_offset(IDXD_PORTAL_LIMITED); + portal = wq->portal + idxd_get_wq_portal_offset(IDXD_PORTAL_LIMITED, IDXD_IRQ_MSIX); /* * The wmb() flushes writes to coherent DMA data before
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 304eb2cf532e..17f13ebae028 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -1353,6 +1353,14 @@ static ssize_t numa_node_show(struct device *dev, } static DEVICE_ATTR_RO(numa_node); +static ssize_t ims_size_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct idxd_device *idxd = container_of(dev, struct idxd_device, conf_dev); + + return sprintf(buf, "%u\n", idxd->ims_size); +} +static DEVICE_ATTR_RO(ims_size); + static ssize_t max_batch_size_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1548,6 +1556,7 @@ static struct attribute *idxd_device_attributes[] = { &dev_attr_max_work_queues_size.attr, &dev_attr_max_engines.attr, &dev_attr_numa_node.attr, + &dev_attr_ims_size.attr, &dev_attr_max_batch_size.attr, &dev_attr_max_transfer_size.attr, &dev_attr_op_cap.attr,

From patchwork Fri Oct 30 18:51:44 2020
X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391260
Subject: [PATCH v4 08/17] dmaengine: idxd: add device support functions in prep for mdev
From: Dave Jiang
Date: Fri, 30 Oct 2020 11:51:44 -0700
Message-ID: <160408390489.912050.13504232395876662339.stgit@djiang5-desk3.ch.intel.com>

Add device support helper functions in preparation for adding VFIO mdev support.
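[Editor's aside] A short sketch of the WQCFG update pattern the helpers in this patch use; it is not part of the patch. The example_* names are invented; WQCFG_OFFSET(), WQCFG_PASID_IDX and the wqcfg shadow structure are the ones defined in registers.h and idxd.h.

static void example_wqcfg_write_word(struct idxd_wq *wq, int idx)
{
	struct idxd_device *idxd = wq->idxd;
	int offset = WQCFG_OFFSET(idxd, wq->id, idx);

	/* bits[] is the raw u32 view of the shadow wqcfg; flush one word. */
	iowrite32(wq->wqcfg->bits[idx], idxd->reg_base + offset);
}

static void example_set_wq_pasid(struct idxd_wq *wq, int pasid)
{
	/* The PASID lives in the third 32-bit word (index 2) of WQCFG. */
	wq->wqcfg->pasid = pasid;
	example_wqcfg_write_word(wq, WQCFG_PASID_IDX);
}

This mirrors what idxd_wq_setup_pasid() and idxd_wq_setup_priv() below do while holding the device lock.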
Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 61 ++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/idxd.h | 4 +++ drivers/dma/idxd/registers.h | 3 +- 3 files changed, 67 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index a9ae970db0a4..8aff07b1acb4 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -287,6 +287,30 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq) devm_iounmap(dev, wq->portal); } +int idxd_wq_abort(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + struct device *dev = &idxd->pdev->dev; + u32 operand, status; + + dev_dbg(dev, "Abort WQ %d\n", wq->id); + if (wq->state != IDXD_WQ_ENABLED) { + dev_dbg(dev, "WQ %d not active\n", wq->id); + return -ENXIO; + } + + operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status); + if (status != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ abort failed: %#x\n", status); + return -ENXIO; + } + + dev_dbg(dev, "WQ %d aborted\n", wq->id); + return 0; +} + int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) { struct idxd_device *idxd = wq->idxd; @@ -366,6 +390,32 @@ void idxd_wq_disable_cleanup(struct idxd_wq *wq) } } +void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* PASID fields are 8 bytes into the WQCFG register */ + offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PASID_IDX); + wq->wqcfg->pasid = pasid; + iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset); +} + +void idxd_wq_setup_priv(struct idxd_wq *wq, int priv) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* priv field is 8 bytes into the WQCFG register */ + offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIV_IDX); + wq->wqcfg->priv = !!priv; + iowrite32(wq->wqcfg->bits[WQCFG_PRIV_IDX], idxd->reg_base + offset); +} + /* Device control bits */ static inline bool idxd_is_enabled(struct idxd_device *idxd) { @@ -532,6 +582,17 @@ void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid) dev_dbg(dev, "pasid %d drained\n", pasid); } +void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid) +{ + struct device *dev = &idxd->pdev->dev; + u32 operand; + + operand = pasid; + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_PASID, operand); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_PASID, operand, NULL); + dev_dbg(dev, "pasid %d aborted\n", pasid); +} + int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, enum idxd_interrupt_type irq_type) { diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 549426bfb443..eb8552d32a0a 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -331,6 +331,7 @@ void idxd_device_cleanup(struct idxd_device *idxd); int idxd_device_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); +void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid); int idxd_device_load_config(struct idxd_device *idxd); int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, enum idxd_interrupt_type irq_type); @@ -348,6 +349,9 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq); void idxd_wq_disable_cleanup(struct idxd_wq *wq); int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); int 
idxd_wq_disable_pasid(struct idxd_wq *wq); +int idxd_wq_abort(struct idxd_wq *wq); +void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid); +void idxd_wq_setup_priv(struct idxd_wq *wq, int priv); /* submission */ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc);
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index d02fd59a8e39..acc071df48eb 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -345,7 +345,8 @@ union wqcfg { u32 bits[8]; } __packed; -#define WQCFG_PASID_IDX 2 +#define WQCFG_PASID_IDX 2 +#define WQCFG_PRIV_IDX 2 /* * This macro calculates the offset into the WQCFG register

From patchwork Fri Oct 30 18:51:51 2020
X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391258
Subject: [PATCH v4 09/17] dmaengine: idxd: add basic mdev registration and helper functions
From: Dave Jiang
Date: Fri, 30 Oct 2020 11:51:51 -0700
Message-ID: <160408391111.912050.4468475162485932156.stgit@djiang5-desk3.ch.intel.com>
Create a mediated device through the VFIO mediated device framework. The mdev framework allows the driver to create a mediated device from a portion of the device's resources. The driver emulates the slow path, such as the PCI config space, the MMIO bar, and the command registers. The descriptor submission portal(s) are mmapped to the guest so that descriptors can be submitted directly by the guest kernel or applications. The mediated device support code in the idxd driver is referred to as the Virtual Device Composition Module (vdcm). Add basic plumbing to fill out the mdev_parent_ops struct that VFIO mdev requires to support a mediated device.

Signed-off-by: Dave Jiang --- drivers/dma/Kconfig | 7 drivers/dma/idxd/Makefile | 2 drivers/dma/idxd/idxd.h | 14 + drivers/dma/idxd/init.c | 11 + drivers/dma/idxd/mdev.c | 968 +++++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/mdev.h | 115 +++++ drivers/dma/idxd/vdev.c | 75 +++ drivers/dma/idxd/vdev.h | 19 + 8 files changed, 1211 insertions(+) create mode 100644 drivers/dma/idxd/mdev.c create mode 100644 drivers/dma/idxd/mdev.h create mode 100644 drivers/dma/idxd/vdev.c create mode 100644 drivers/dma/idxd/vdev.h
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 6a908785a5f7..c5970e4a3a2c 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -306,6 +306,13 @@ config INTEL_IDXD_SVM depends on PCI_PASID depends on PCI_IOV +config INTEL_IDXD_MDEV + bool "IDXD VFIO Mediated Device Support" + depends on INTEL_IDXD + depends on VFIO_MDEV + depends on VFIO_MDEV_DEVICE + select PCI_SIOV + config INTEL_IOATDMA tristate "Intel I/OAT DMA support" depends on PCI && X86_64
diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile index 8978b898d777..30cad704a95a 100644 --- a/drivers/dma/idxd/Makefile +++ b/drivers/dma/idxd/Makefile @@ -1,2 +1,4 @@ obj-$(CONFIG_INTEL_IDXD) += idxd.o idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o + +idxd-$(CONFIG_INTEL_IDXD_MDEV) += mdev.o vdev.o
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index eb8552d32a0a..ab28a1bffb7c 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -8,6 +8,7 @@ #include #include #include +#include #include "registers.h" #define IDXD_DRIVER_VERSION "1.00" @@ -123,6 +124,7 @@ struct idxd_wq { char name[WQ_NAME_SIZE + 1]; u64 max_xfer_bytes; u32 max_batch_size; + struct list_head vdcm_list; }; struct idxd_engine { @@ -155,6 +157,7 @@ enum idxd_device_flag { IDXD_FLAG_CMD_RUNNING, IDXD_FLAG_PASID_ENABLED, IDXD_FLAG_SIOV_SUPPORTED, + IDXD_FLAG_MDEV_ENABLED, }; struct idxd_device { @@ -250,11 +253,17 @@ static inline bool device_pasid_enabled(struct idxd_device *idxd) return test_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags); } + static inline bool device_swq_supported(struct idxd_device *idxd) { return (support_enqcmd && device_pasid_enabled(idxd)); } +static inline bool device_mdev_enabled(struct idxd_device *idxd) +{ + return test_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags); +} + enum idxd_portal_prot { IDXD_PORTAL_UNLIMITED = 0, IDXD_PORTAL_LIMITED, @@ -375,4 +384,9 @@ int idxd_cdev_get_major(struct idxd_device *idxd); int idxd_wq_add_cdev(struct idxd_wq *wq); void idxd_wq_del_cdev(struct idxd_wq *wq); +/* mdev */ +int idxd_mdev_host_init(struct
idxd_device *idxd); +void idxd_mdev_host_release(struct idxd_device *idxd); +int idxd_mdev_get_pasid(struct mdev_device *mdev); + #endif diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 4a21c2a17a62..ab91293aedb9 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -218,6 +218,7 @@ static int idxd_setup_internals(struct idxd_device *idxd) wq->wqcfg = devm_kzalloc(dev, idxd->wqcfg_size, GFP_KERNEL); if (!wq->wqcfg) return -ENOMEM; + INIT_LIST_HEAD(&wq->vdcm_list); } for (i = 0; i < idxd->max_engines; i++) { @@ -479,6 +480,14 @@ static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) return -ENODEV; } + if (IS_ENABLED(CONFIG_INTEL_IDXD_MDEV)) { + rc = idxd_mdev_host_init(idxd); + if (rc < 0) + dev_warn(dev, "VFIO mdev not setup: %d\n", rc); + else + set_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags); + } + rc = idxd_setup_sysfs(idxd); if (rc) { dev_err(dev, "IDXD sysfs setup failed\n"); @@ -572,6 +581,8 @@ static void idxd_remove(struct pci_dev *pdev) dev_dbg(&pdev->dev, "%s called\n", __func__); idxd_cleanup_sysfs(idxd); idxd_shutdown(pdev); + if (IS_ENABLED(CONFIG_INTEL_IDXD_MDEV) && device_mdev_enabled(idxd)) + idxd_mdev_host_release(idxd); if (device_pasid_enabled(idxd)) idxd_disable_system_pasid(idxd); mutex_lock(&idxd_idr_lock); diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c new file mode 100644 index 000000000000..3b6febe22a0e --- /dev/null +++ b/drivers/dma/idxd/mdev.c @@ -0,0 +1,968 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" +#include "../../vfio/pci/vfio_pci_private.h" +#include "mdev.h" +#include "vdev.h" + +static u64 idxd_pci_config[] = { + 0x001000000b258086ULL, + 0x0080000008800000ULL, + 0x000000000000000cULL, + 0x000000000000000cULL, + 0x0000000000000000ULL, + 0x2010808600000000ULL, + 0x0000004000000000ULL, + 0x000000ff00000000ULL, + 0x0000060000015011ULL, /* MSI-X capability, hardcoded 2 entries, Encoded as N-1 */ + 0x0000070000000000ULL, + 0x0000000000920010ULL, /* PCIe capability */ + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, +}; + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index, + unsigned int start, unsigned int count, void *data); + +int idxd_mdev_get_pasid(struct mdev_device *mdev) +{ + struct iommu_domain *domain; + struct device *dev = mdev_dev(mdev); + + domain = iommu_get_domain_for_dev(dev); + if (!domain) + return -ENODEV; + + return iommu_aux_get_pasid(domain, dev->parent); +} + +static inline void reset_vconfig(struct vdcm_idxd *vidxd) +{ + memset(vidxd->cfg, 0, VIDXD_MAX_CFG_SPACE_SZ); + memcpy(vidxd->cfg, idxd_pci_config, sizeof(idxd_pci_config)); +} + +static inline void reset_vmmio(struct vdcm_idxd *vidxd) +{ + memset(&vidxd->bar0, 0, VIDXD_MAX_MMIO_SPACE_SZ); +} + +static void idxd_vdcm_init(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq = vidxd->wq; + + reset_vconfig(vidxd); + reset_vmmio(vidxd); + + vidxd->bar_size[0] = VIDXD_BAR0_SIZE; + vidxd->bar_size[1] = VIDXD_BAR2_SIZE; + + vidxd_mmio_init(vidxd); + + if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) + idxd_wq_disable(wq); +} + +static void 
idxd_vdcm_release(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "vdcm_idxd_release %d\n", vidxd->type->type); + mutex_lock(&vidxd->dev_lock); + if (!vidxd->refcount) + goto out; + + idxd_vdcm_set_irqs(vidxd, VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER, + VFIO_PCI_MSIX_IRQ_INDEX, 0, 0, NULL); + + vidxd_free_ims_entries(vidxd); + + /* Re-initialize the VIDXD to a pristine state for re-use */ + idxd_vdcm_init(vidxd); + vidxd->refcount--; + + out: + mutex_unlock(&vidxd->dev_lock); +} + +static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev, + struct vdcm_idxd_type *type) +{ + struct vdcm_idxd *vidxd; + struct idxd_wq *wq = NULL; + + /* PLACEHOLDER, wq matching comes later */ + + if (!wq) + return ERR_PTR(-ENODEV); + + vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); + if (!vidxd) + return ERR_PTR(-ENOMEM); + + mutex_init(&vidxd->dev_lock); + vidxd->idxd = idxd; + vidxd->vdev.mdev = mdev; + vidxd->wq = wq; + mdev_set_drvdata(mdev, vidxd); + vidxd->type = type; + vidxd->num_wqs = VIDXD_MAX_WQS; + + idxd_vdcm_init(vidxd); + mutex_lock(&wq->wq_lock); + idxd_wq_get(wq); + mutex_unlock(&wq->wq_lock); + + return vidxd; +} + +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES]; + +static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev, + const char *name) +{ + int i; + char dev_name[IDXD_MDEV_NAME_LEN]; + + for (i = 0; i < IDXD_MDEV_TYPES; i++) { + snprintf(dev_name, IDXD_MDEV_NAME_LEN, "idxd-%s", + idxd_mdev_types[i].name); + + if (!strncmp(name, dev_name, IDXD_MDEV_NAME_LEN)) + return &idxd_mdev_types[i]; + } + + return NULL; +} + +static int idxd_vdcm_create(struct kobject *kobj, struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd; + struct vdcm_idxd_type *type; + struct device *dev, *parent; + struct idxd_device *idxd; + struct idxd_wq *wq; + + parent = mdev_parent_dev(mdev); + idxd = dev_get_drvdata(parent); + dev = mdev_dev(mdev); + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + if (!type) { + dev_err(dev, "failed to find type %s to create\n", + kobject_name(kobj)); + return -EINVAL; + } + + vidxd = vdcm_vidxd_create(idxd, mdev, type); + if (IS_ERR(vidxd)) { + dev_err(dev, "failed to create vidxd: %ld\n", PTR_ERR(vidxd)); + return PTR_ERR(vidxd); + } + + wq = vidxd->wq; + mutex_lock(&wq->wq_lock); + list_add(&vidxd->list, &wq->vdcm_list); + mutex_unlock(&wq->wq_lock); + dev_dbg(dev, "mdev creation success: %s\n", dev_name(mdev_dev(mdev))); + + return 0; +} + +static int idxd_vdcm_remove(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + struct device *dev = &idxd->pdev->dev; + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s: removing for wq %d\n", __func__, vidxd->wq->id); + + mutex_lock(&wq->wq_lock); + list_del(&vidxd->list); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + + kfree(vidxd); + return 0; +} + +static int idxd_vdcm_open(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + int rc; + struct vdcm_idxd_type *type = vidxd->type; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s: type: %d\n", __func__, type->type); + + mutex_lock(&vidxd->dev_lock); + if (vidxd->refcount) + goto out; + + /* allocate and setup IMS entries */ + rc = vidxd_setup_ims_entries(vidxd); + if (rc < 0) + goto out; + + vidxd->refcount++; + mutex_unlock(&vidxd->dev_lock); + + return rc; + + out: + 
mutex_unlock(&vidxd->dev_lock); + return rc; +} + +static ssize_t idxd_vdcm_rw(struct mdev_device *mdev, char *buf, size_t count, loff_t *ppos, + enum idxd_vdcm_rw mode) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); + u64 pos = *ppos & VFIO_PCI_OFFSET_MASK; + struct device *dev = mdev_dev(mdev); + int rc = -EINVAL; + + if (index >= VFIO_PCI_NUM_REGIONS) { + dev_err(dev, "invalid index: %u\n", index); + return -EINVAL; + } + + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vidxd_cfg_write(vidxd, pos, buf, count); + else + rc = vidxd_cfg_read(vidxd, pos, buf, count); + break; + case VFIO_PCI_BAR0_REGION_INDEX: + case VFIO_PCI_BAR1_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vidxd_mmio_write(vidxd, vidxd->bar_val[0] + pos, buf, count); + else + rc = vidxd_mmio_read(vidxd, vidxd->bar_val[0] + pos, buf, count); + break; + case VFIO_PCI_BAR2_REGION_INDEX: + case VFIO_PCI_BAR3_REGION_INDEX: + case VFIO_PCI_BAR4_REGION_INDEX: + case VFIO_PCI_BAR5_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + case VFIO_PCI_ROM_REGION_INDEX: + default: + dev_err(dev, "unsupported region: %u\n", index); + } + + return rc == 0 ? count : rc; +} + +static ssize_t idxd_vdcm_read(struct mdev_device *mdev, char __user *buf, size_t count, + loff_t *ppos) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int done = 0; + int rc; + + mutex_lock(&vidxd->dev_lock); + while (count) { + size_t filled; + + if (count >= 4 && !(*ppos % 4)) { + u32 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 2; + } else { + u8 val; + + rc = idxd_vdcm_rw(mdev, &val, sizeof(val), ppos, + IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + mutex_unlock(&vidxd->dev_lock); + return done; + + read_err: + mutex_unlock(&vidxd->dev_lock); + return -EFAULT; +} + +static ssize_t idxd_vdcm_write(struct mdev_device *mdev, const char __user *buf, size_t count, + loff_t *ppos) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int done = 0; + int rc; + + mutex_lock(&vidxd->dev_lock); + while (count) { + size_t filled; + + if (count >= 4 && !(*ppos % 4)) { + u32 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, + sizeof(val), ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 2; + } else { + u8 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, &val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + mutex_unlock(&vidxd->dev_lock); + return done; + +write_err: + 
mutex_unlock(&vidxd->dev_lock); + return -EFAULT; +} + +static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma) +{ + if (vma->vm_end < vma->vm_start) + return -EINVAL; + if (!(vma->vm_flags & VM_SHARED)) + return -EINVAL; + + return 0; +} + +static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma) +{ + unsigned int wq_idx, rc; + unsigned long req_size, pgoff = 0, offset; + pgprot_t pg_prot; + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_wq *wq = vidxd->wq; + struct idxd_device *idxd = vidxd->idxd; + enum idxd_portal_prot virt_portal, phys_portal; + phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR); + struct device *dev = mdev_dev(mdev); + + rc = check_vma(wq, vma); + if (rc) + return rc; + + pg_prot = vma->vm_page_prot; + req_size = vma->vm_end - vma->vm_start; + vma->vm_flags |= VM_DONTCOPY; + + offset = (vma->vm_pgoff << PAGE_SHIFT) & + ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1); + + wq_idx = offset >> (PAGE_SHIFT + 2); + if (wq_idx >= 1) { + dev_err(dev, "mapping invalid wq %d off %lx\n", + wq_idx, offset); + return -EINVAL; + } + + /* + * Check and see if the guest wants to map to the limited or unlimited portal. + * The driver will allow mapping to the unlimited portal only if the wq is a + * dedicated wq. Otherwise, it goes to the limited portal. + */ + virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1; + phys_portal = IDXD_PORTAL_LIMITED; + if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq)) + phys_portal = IDXD_PORTAL_UNLIMITED; + + /* We always map IMS portals to the guest */ + pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal, + IDXD_IRQ_IMS)) >> PAGE_SHIFT; + + dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size, + pgprot_val(pg_prot)); + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_private_data = mdev; + vma->vm_pgoff = pgoff; + + return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot); +} + +static int idxd_vdcm_get_irq_count(struct vdcm_idxd *vidxd, int type) +{ + /* + * Even though the number of MSIX vectors supported is not tied to the number + * of wqs being exported, the current design is to allow 1 vector per WQ for guest. + * So here we end up with num of wqs plus 1 that handles the misc interrupts. + */ + if (type == VFIO_PCI_MSI_IRQ_INDEX || type == VFIO_PCI_MSIX_IRQ_INDEX) + return VIDXD_MAX_MSIX_VECS; + + return 0; +} + +static irqreturn_t idxd_guest_wq_completion(int irq, void *data) +{ + struct ims_irq_entry *irq_entry = data; + + /* + * WQ irq_entry 0 is actually MSIX vector 1 for guest. MSIX vector 0 + * is emulated.
+ */ + vidxd_send_interrupt(irq_entry->vidxd, irq_entry->id + 1); + return IRQ_HANDLED; +} + +static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct ims_irq_entry *irq_entry; + int rc; + + if (!vidxd->vdev.msix_trigger[index]) + return 0; + + dev_dbg(dev, "disable MSIX trigger %d\n", index); + if (index) { + u32 auxval; + + irq_entry = &vidxd->irq_entries[index - 1]; + if (irq_entry->irq_set) { + free_irq(irq_entry->entry->irq, irq_entry); + irq_entry->irq_set = false; + } + + auxval = ims_ctrl_pasid_aux(0, false); + rc = irq_set_auxdata(irq_entry->entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval); + if (rc) + return rc; + } + eventfd_ctx_put(vidxd->vdev.msix_trigger[index]); + vidxd->vdev.msix_trigger[index] = NULL; + + return 0; +} + +static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct ims_irq_entry *irq_entry; + struct eventfd_ctx *trigger; + int rc; + + rc = msix_trigger_unregister(vidxd, index); + if (rc < 0) + return rc; + + dev_dbg(dev, "enable MSIX trigger %d\n", index); + trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(trigger)) { + dev_warn(dev, "eventfd_ctx_fdget failed %d\n", index); + return PTR_ERR(trigger); + } + + /* + * The MSIX vector 0 is emulated by the mdev. Starting with vector 1 + * the interrupt is backed by IMS and needs to be set up, but we + * will be setting up entry 0 of the IMS vectors. So here we pass + * in i - 1 to the host setup and irq_entries. + */ + if (index) { + int pasid; + u32 auxval; + + irq_entry = &vidxd->irq_entries[index - 1]; + pasid = idxd_mdev_get_pasid(mdev); + if (pasid < 0) + return pasid; + + /* + * Program and enable the pasid field in the IMS entry. The programmed pasid and + * enabled field is checked against the pasid and enable field for the work queue + * configuration and the pasid for the descriptor. A mismatch will result in blocked + * IMS interrupt. + */ + auxval = ims_ctrl_pasid_aux(pasid, true); + rc = irq_set_auxdata(irq_entry->entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval); + if (rc < 0) + return rc; + + rc = request_irq(irq_entry->entry->irq, idxd_guest_wq_completion, 0, "idxd-ims", + irq_entry); + if (rc) { + dev_warn(dev, "failed to request ims irq\n"); + eventfd_ctx_put(trigger); + auxval = ims_ctrl_pasid_aux(0, false); + irq_set_auxdata(irq_entry->entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval); + return rc; + } + irq_entry->irq_set = true; + } + + vidxd->vdev.msix_trigger[index] = trigger; + return 0; +} + +static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd, + unsigned int index, unsigned int start, + unsigned int count, uint32_t flags, + void *data) +{ + int i, rc = 0; + + if (count > VIDXD_MAX_MSIX_ENTRIES - 1) + count = VIDXD_MAX_MSIX_ENTRIES - 1; + + /* + * The MSIX vector 0 is emulated by the mdev. Starting with vector 1 + * the interrupt is backed by IMS and needs to be set up, but we + * will be setting up entry 0 of the IMS vectors. So here we pass + * in i - 1 to the host setup and irq_entries. 
+ */ + if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) { + /* Disable all MSIX entries */ + for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) { + rc = msix_trigger_unregister(vidxd, i); + if (rc < 0) + return rc; + } + return 0; + } + + for (i = 0; i < count; i++) { + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + u32 fd = *(u32 *)(data + i * sizeof(u32)); + + rc = msix_trigger_register(vidxd, fd, i); + if (rc < 0) + return rc; + } else if (flags & VFIO_IRQ_SET_DATA_NONE) { + rc = msix_trigger_unregister(vidxd, i); + if (rc < 0) + return rc; + } + } + return rc; +} + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, + unsigned int index, unsigned int start, + unsigned int count, void *data) +{ + int (*func)(struct vdcm_idxd *vidxd, unsigned int index, + unsigned int start, unsigned int count, uint32_t flags, + void *data) = NULL; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + switch (index) { + case VFIO_PCI_INTX_IRQ_INDEX: + dev_warn(dev, "intx interrupts not supported.\n"); + break; + case VFIO_PCI_MSI_IRQ_INDEX: + dev_dbg(dev, "msi interrupt.\n"); + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; + break; + } + break; + case VFIO_PCI_MSIX_IRQ_INDEX: + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; + break; + } + break; + default: + return -ENOTTY; + } + + if (!func) + return -ENOTTY; + + return func(vidxd, index, start, count, flags, data); +} + +static void vidxd_vdcm_reset(struct vdcm_idxd *vidxd) +{ + vidxd_reset(vidxd); +} + +static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd, + unsigned long arg) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned long minsz; + int rc = -EINVAL; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "vidxd %p ioctl, cmd: %d\n", vidxd, cmd); + + mutex_lock(&vidxd->dev_lock); + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz) { + rc = -EINVAL; + goto out; + } + + info.flags = VFIO_DEVICE_FLAGS_PCI; + info.flags |= VFIO_DEVICE_FLAGS_RESET; + info.num_regions = VFIO_PCI_NUM_REGIONS; + info.num_irqs = VFIO_PCI_NUM_IRQS; + + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { + struct vfio_region_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + struct vfio_region_info_cap_sparse_mmap *sparse = NULL; + size_t size; + int nr_areas = 1; + int cap_type_id = 0; + + minsz = offsetofend(struct vfio_region_info, offset); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz) { + rc = -EINVAL; + goto out; + } + + switch (info.index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = VIDXD_MAX_CFG_SPACE_SZ; + info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR0_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = vidxd->bar_size[info.index]; + if (!info.size) { + 
info.flags = 0;
+				break;
+			}
+
+			info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+			break;
+		case VFIO_PCI_BAR1_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = 0;
+			info.flags = 0;
+			break;
+		case VFIO_PCI_BAR2_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.flags = VFIO_REGION_INFO_FLAG_CAPS | VFIO_REGION_INFO_FLAG_MMAP |
+				     VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE;
+			info.size = vidxd->bar_size[1];
+
+			/*
+			 * Every WQ has two areas for the unlimited and limited
+			 * MSI-X portals. IMS portals are not reported.
+			 */
+			nr_areas = 2;
+
+			size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
+			sparse = kzalloc(size, GFP_KERNEL);
+			if (!sparse) {
+				rc = -ENOMEM;
+				goto out;
+			}
+
+			sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+			sparse->header.version = 1;
+			sparse->nr_areas = nr_areas;
+			cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
+
+			sparse->areas[0].offset = 0;
+			sparse->areas[0].size = PAGE_SIZE;
+
+			sparse->areas[1].offset = PAGE_SIZE;
+			sparse->areas[1].size = PAGE_SIZE;
+			break;
+
+		case VFIO_PCI_BAR3_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
+			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+			info.size = 0;
+			info.flags = 0;
+			dev_dbg(dev, "get region info bar:%d\n", info.index);
+			break;
+
+		case VFIO_PCI_ROM_REGION_INDEX:
+		case VFIO_PCI_VGA_REGION_INDEX:
+			dev_dbg(dev, "get region info index:%d\n", info.index);
+			break;
+		default: {
+			if (info.index >= VFIO_PCI_NUM_REGIONS)
+				rc = -EINVAL;
+			else
+				rc = 0;
+			goto out;
+		} /* default */
+		} /* info.index switch */
+
+		if ((info.flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) {
+			if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) {
+				rc = vfio_info_add_capability(&caps, &sparse->header,
+							      sizeof(*sparse) + (sparse->nr_areas *
+							      sizeof(*sparse->areas)));
+				kfree(sparse);
+				if (rc)
+					goto out;
+			}
+		}
+
+		if (caps.size) {
+			if (info.argsz < sizeof(info) + caps.size) {
+				info.argsz = sizeof(info) + caps.size;
+				info.cap_offset = 0;
+			} else {
+				vfio_info_cap_shift(&caps, sizeof(info));
+				if (copy_to_user((void __user *)arg + sizeof(info),
+						 caps.buf, caps.size)) {
+					kfree(caps.buf);
+					rc = -EFAULT;
+					goto out;
+				}
+				info.cap_offset = sizeof(info);
+			}
+
+			kfree(caps.buf);
+		}
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			rc = -EFAULT;
+		else
+			rc = 0;
+		goto out;
+	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		switch (info.index) {
+		case VFIO_PCI_MSI_IRQ_INDEX:
+		case VFIO_PCI_MSIX_IRQ_INDEX:
+			break;
+		default:
+			rc = -EINVAL;
+			goto out;
+		} /* switch(info.index) */
+
+		info.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;
+		info.count = idxd_vdcm_get_irq_count(vidxd, info.index);
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			rc = -EFAULT;
+		else
+			rc = 0;
+		goto out;
+	} else if (cmd == VFIO_DEVICE_SET_IRQS) {
+		struct vfio_irq_set hdr;
+		u8 *data = NULL;
+		size_t data_size = 0;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz)) {
+			rc = -EFAULT;
+			goto out;
+		}
+
+		if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) {
+			int max = idxd_vdcm_get_irq_count(vidxd, hdr.index);
+
+			rc = vfio_set_irqs_validate_and_prepare(&hdr, max, VFIO_PCI_NUM_IRQS,
+								&data_size);
+			if (rc) {
dev_err(dev, "intel:vfio_set_irqs_validate_and_prepare failed\n"); + rc = -EINVAL; + goto out; + } + if (data_size) { + data = memdup_user((void __user *)(arg + minsz), data_size); + if (IS_ERR(data)) { + rc = PTR_ERR(data); + goto out; + } + } + } + + if (!data) { + rc = -EINVAL; + goto out; + } + + rc = idxd_vdcm_set_irqs(vidxd, hdr.flags, hdr.index, hdr.start, hdr.count, data); + kfree(data); + goto out; + } else if (cmd == VFIO_DEVICE_RESET) { + vidxd_vdcm_reset(vidxd); + } + + out: + mutex_unlock(&vidxd->dev_lock); + return rc; +} + +static const struct mdev_parent_ops idxd_vdcm_ops = { + .create = idxd_vdcm_create, + .remove = idxd_vdcm_remove, + .open = idxd_vdcm_open, + .release = idxd_vdcm_release, + .read = idxd_vdcm_read, + .write = idxd_vdcm_write, + .mmap = idxd_vdcm_mmap, + .ioctl = idxd_vdcm_ioctl, +}; + +int idxd_mdev_host_init(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + if (!test_bit(IDXD_FLAG_SIOV_SUPPORTED, &idxd->flags)) + return -EOPNOTSUPP; + + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { + rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX); + if (rc < 0) { + dev_warn(dev, "Failed to enable aux-domain: %d\n", rc); + return rc; + } + } else { + dev_warn(dev, "No aux-domain feature.\n"); + return -EOPNOTSUPP; + } + + return mdev_register_device(dev, &idxd_vdcm_ops); +} + +void idxd_mdev_host_release(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + mdev_unregister_device(dev); + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { + rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); + if (rc < 0) + dev_warn(dev, "Failed to disable aux-domain: %d\n", + rc); + } +} diff --git a/drivers/dma/idxd/mdev.h b/drivers/dma/idxd/mdev.h new file mode 100644 index 000000000000..b474f2303ba0 --- /dev/null +++ b/drivers/dma/idxd/mdev.h @@ -0,0 +1,115 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. 
*/ + +#ifndef _IDXD_MDEV_H_ +#define _IDXD_MDEV_H_ + +/* two 64-bit BARs implemented */ +#define VIDXD_MAX_BARS 2 +#define VIDXD_MAX_CFG_SPACE_SZ 4096 +#define VIDXD_MAX_MMIO_SPACE_SZ 8192 +#define VIDXD_MSIX_TBL_SZ_OFFSET 0x42 +#define VIDXD_CAP_CTRL_SZ 0x100 +#define VIDXD_GRP_CTRL_SZ 0x100 +#define VIDXD_WQ_CTRL_SZ 0x100 +#define VIDXD_WQ_OCPY_INT_SZ 0x20 +#define VIDXD_MSIX_TBL_SZ 0x90 +#define VIDXD_MSIX_PERM_TBL_SZ 0x48 + +#define VIDXD_MSIX_TABLE_OFFSET 0x600 +#define VIDXD_MSIX_PERM_OFFSET 0x300 +#define VIDXD_GRPCFG_OFFSET 0x400 +#define VIDXD_WQCFG_OFFSET 0x500 +#define VIDXD_IMS_OFFSET 0x1000 + +#define VIDXD_BAR0_SIZE 0x2000 +#define VIDXD_BAR2_SIZE 0x20000 +#define VIDXD_MAX_MSIX_ENTRIES (VIDXD_MSIX_TBL_SZ / 0x10) +#define VIDXD_MAX_WQS 1 +#define VIDXD_MAX_MSIX_VECS 2 + +#define VIDXD_ATS_OFFSET 0x100 +#define VIDXD_PRS_OFFSET 0x110 +#define VIDXD_PASID_OFFSET 0x120 +#define VIDXD_MSIX_PBA_OFFSET 0x700 + +struct ims_irq_entry { + struct vdcm_idxd *vidxd; + struct msi_desc *entry; + bool irq_set; + int id; +}; + +struct idxd_vdev { + struct mdev_device *mdev; + struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES]; +}; + +struct vdcm_idxd { + struct idxd_device *idxd; + struct idxd_wq *wq; + struct idxd_vdev vdev; + struct vdcm_idxd_type *type; + int num_wqs; + struct ims_irq_entry irq_entries[VIDXD_MAX_MSIX_ENTRIES]; + + /* For VM use case */ + u64 bar_val[VIDXD_MAX_BARS]; + u64 bar_size[VIDXD_MAX_BARS]; + u8 cfg[VIDXD_MAX_CFG_SPACE_SZ]; + u8 bar0[VIDXD_MAX_MMIO_SPACE_SZ]; + struct list_head list; + struct mutex dev_lock; /* lock for vidxd resources */ + + int refcount; +}; + +static inline struct vdcm_idxd *to_vidxd(struct idxd_vdev *vdev) +{ + return container_of(vdev, struct vdcm_idxd, vdev); +} + +#define IDXD_MDEV_NAME_LEN 16 +#define IDXD_MDEV_DESCRIPTION_LEN 64 + +enum idxd_mdev_type { + IDXD_MDEV_TYPE_1_DWQ = 0, +}; + +#define IDXD_MDEV_TYPES 1 + +struct vdcm_idxd_type { + char name[IDXD_MDEV_NAME_LEN]; + char description[IDXD_MDEV_DESCRIPTION_LEN]; + enum idxd_mdev_type type; + unsigned int avail_instance; +}; + +enum idxd_vdcm_rw { + IDXD_VDCM_READ = 0, + IDXD_VDCM_WRITE, +}; + +static inline u64 get_reg_val(void *buf, int size) +{ + u64 val = 0; + + switch (size) { + case 8: + val = *(u64 *)buf; + break; + case 4: + val = *(u32 *)buf; + break; + case 2: + val = *(u16 *)buf; + break; + case 1: + val = *(u8 *)buf; + break; + } + + return val; +} + +#endif diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c new file mode 100644 index 000000000000..6cc097edc6e9 --- /dev/null +++ b/drivers/dma/idxd/vdev.c @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. 
*/ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" +#include "../../vfio/pci/vfio_pci_private.h" +#include "mdev.h" +#include "vdev.h" + +int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx) +{ + /* PLACE HOLDER */ + return 0; +} + +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +void vidxd_mmio_init(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ +} + +void vidxd_reset(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ +} + +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ + return 0; +} + +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ +} diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h new file mode 100644 index 000000000000..baa30d98f9cb --- /dev/null +++ b/drivers/dma/idxd/vdev.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ + +#ifndef _IDXD_VDEV_H_ +#define _IDXD_VDEV_H_ + +#include "mdev.h" + +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size); +void vidxd_mmio_init(struct vdcm_idxd *vidxd); +void vidxd_reset(struct vdcm_idxd *vidxd); +int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx); +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd); +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd); + +#endif From patchwork Fri Oct 30 18:51:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391255 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4CNBLB1rDcz9sSP for ; Sat, 31 Oct 2020 05:54:10 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726317AbgJ3Swc (ORCPT ); Fri, 30 Oct 2020 14:52:32 -0400 Received: from mga12.intel.com ([192.55.52.136]:35148 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727026AbgJ3Swa (ORCPT ); Fri, 30 Oct 2020 14:52:30 -0400 IronPort-SDR: oWZIHvm1qK4Lqf3E/iM9Rf/PYUb3iuS5MQLn9NF2Onp5fVn67jGP9fKCnvvMCeyoDvKfgKxNn+ zpH8kNh0y52g== X-IronPort-AV: E=McAfee;i="6000,8403,9790"; a="147936919" X-IronPort-AV: E=Sophos;i="5.77,434,1596524400"; 
d="scan'208";a="147936919" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2020 11:52:01 -0700 IronPort-SDR: kdVf2E0N636sACP3F1/MNtapJ+9T5LOdfwuZxKKvuvc8CIIWolPDLmH4+xAc4owRWXjSV6ek7o m1VBd1FsyYlw== X-IronPort-AV: E=Sophos;i="5.77,434,1596524400"; d="scan'208";a="537163694" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2020 11:51:59 -0700 Subject: [PATCH v4 10/17] dmaengine: idxd: add emulation rw routines From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 30 Oct 2020 11:51:59 -0700 Message-ID: <160408391904.912050.5705076915206045627.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> References: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Add emulation routines for PCI config read/write, MMIO read/write, and interrupt handling routine for the emulated device. The rw routines are called when PCI config read/writes or BAR0 mmio read/writes and being issued by the guest kernel through KVM/qemu. Because we are supporting read-only configuration, most of the MMIO emulations are simple memory copy except for cases such as handling device commands and interrupts. 
Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/registers.h |   10 +
 drivers/dma/idxd/vdev.c      |  427 +++++++++++++++++++++++++++++++++++++++++-
 drivers/dma/idxd/vdev.h      |    8 +
 include/uapi/linux/idxd.h    |    2 
 4 files changed, 439 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index acc071df48eb..5a76fd0ab6ad 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -194,7 +194,8 @@ union cmdsts_reg {
 	};
 	u32 bits;
 } __packed;
-#define IDXD_CMDSTS_ACTIVE		0x80000000
+#define IDXD_CMDS_ACTIVE_BIT		31
+#define IDXD_CMDSTS_ACTIVE		BIT(IDXD_CMDS_ACTIVE_BIT)
 #define IDXD_CMDSTS_ERR_MASK		0xff
 #define IDXD_CMDSTS_RES_SHIFT		8
 
@@ -277,6 +278,11 @@ union msix_perm {
 	u32 bits;
 } __packed;
 
+#define IDXD_MSIX_PERM_MASK	0xfffff00c
+#define IDXD_MSIX_PERM_IGNORE	0x3
+#define MSIX_ENTRY_MASK_INT	0x1
+#define MSIX_ENTRY_CTRL_BYTE	12
+
 union group_flags {
 	struct {
 		u32 tc_a:3;
@@ -347,6 +353,8 @@ union wqcfg {
 
 #define WQCFG_PASID_IDX		2
 #define WQCFG_PRIV_IDX		2
+#define WQCFG_MODE_DEDICATED	1
+#define WQCFG_MODE_SHARED	0
 
 /*
  * This macro calculates the offset into the WQCFG register
diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c
index 6cc097edc6e9..b38bb676e604 100644
--- a/drivers/dma/idxd/vdev.c
+++ b/drivers/dma/idxd/vdev.c
@@ -25,35 +25,443 @@
 
 int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx)
 {
-	/* PLACE HOLDER */
+	int rc;
+	struct device *dev = &vidxd->idxd->pdev->dev;
+
+	dev_dbg(dev, "%s interrupt %d\n", __func__, msix_idx);
+
+	if (!vidxd->vdev.msix_trigger[msix_idx]) {
+		dev_warn(dev, "%s: intr evtfd not found %d\n", __func__, msix_idx);
+		return -EINVAL;
+	}
+
+	rc = eventfd_signal(vidxd->vdev.msix_trigger[msix_idx], 1);
+	if (rc != 1)
+		dev_err(dev, "eventfd signal failed (%d)\n", rc);
+	else
+		dev_dbg(dev, "vidxd interrupt triggered wq(%d) %d\n", vidxd->wq->id, msix_idx);
+
+	return rc;
+}
+
+static void vidxd_report_error(struct vdcm_idxd *vidxd, unsigned int error)
+{
+	u8 *bar0 = vidxd->bar0;
+	union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET);
+	union genctrl_reg *genctrl;
+	bool send = false;
+
+	if (!swerr->valid) {
+		memset(swerr, 0, sizeof(*swerr));
+		swerr->valid = 1;
+		swerr->error = error;
+		send = true;
+	} else if (swerr->valid && !swerr->overflow) {
+		swerr->overflow = 1;
+	}
+
+	genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET);
+	if (send && genctrl->softerr_int_en) {
+		u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+
+		*intcause |= IDXD_INTC_ERR;
+		vidxd_send_interrupt(vidxd, 0);
+	}
+}
+
+int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size)
+{
+	u32 offset = pos & (vidxd->bar_size[0] - 1);
+	u8 *bar0 = vidxd->bar0;
+	struct device *dev = mdev_dev(vidxd->vdev.mdev);
+
+	dev_dbg(dev, "vidxd mmio W %d %x %x: %llx\n", vidxd->wq->id, size,
+		offset, get_reg_val(buf, size));
+
+	if (((size & (size - 1)) != 0) || (offset & (size - 1)) != 0)
+		return -EINVAL;
+
+	/* If we don't limit this, we can potentially write out of bounds */
+	if (size > sizeof(u32))
+		return -EINVAL;
+
+	switch (offset) {
+	case IDXD_GENCFG_OFFSET ... IDXD_GENCFG_OFFSET + 3:
+		/* Write only when device is disabled.
*/ + if (vidxd_state(vidxd) == IDXD_DEVICE_STATE_DISABLED) + memcpy(bar0 + offset, buf, size); + break; + + case IDXD_GENCTRL_OFFSET: + memcpy(bar0 + offset, buf, size); + break; + + case IDXD_INTCAUSE_OFFSET: + bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(4, 0)); + break; + + case IDXD_CMD_OFFSET: { + u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET); + u32 val = get_reg_val(buf, size); + + if (size != sizeof(u32)) + return -EINVAL; + + /* Check and set command in progress */ + if (test_and_set_bit(IDXD_CMDS_ACTIVE_BIT, (unsigned long *)cmdsts) == 0) + vidxd_do_command(vidxd, val); + else + vidxd_report_error(vidxd, DSA_ERR_CMD_REG); + break; + } + + case IDXD_SWERR_OFFSET: + /* W1C */ + bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(1, 0)); + break; + + case VIDXD_WQCFG_OFFSET ... VIDXD_WQCFG_OFFSET + VIDXD_WQ_CTRL_SZ - 1: + case VIDXD_GRPCFG_OFFSET ... VIDXD_GRPCFG_OFFSET + VIDXD_GRP_CTRL_SZ - 1: + /* Nothing is written. Should be all RO */ + break; + + case VIDXD_MSIX_TABLE_OFFSET ... VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ - 1: { + int index = (offset - VIDXD_MSIX_TABLE_OFFSET) / 0x10; + u8 *msix_entry = &bar0[VIDXD_MSIX_TABLE_OFFSET + index * 0x10]; + u64 *pba = (u64 *)(bar0 + VIDXD_MSIX_PBA_OFFSET); + u8 ctrl; + + ctrl = msix_entry[MSIX_ENTRY_CTRL_BYTE]; + memcpy(bar0 + offset, buf, size); + /* Handle clearing of UNMASK bit */ + if (!(msix_entry[MSIX_ENTRY_CTRL_BYTE] & MSIX_ENTRY_MASK_INT) && + ctrl & MSIX_ENTRY_MASK_INT) + if (test_and_clear_bit(index, (unsigned long *)pba)) + vidxd_send_interrupt(vidxd, index); + break; + } + + case VIDXD_MSIX_PERM_OFFSET ... VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ - 1: + memcpy(bar0 + offset, buf, size); + break; + } /* offset */ + return 0; } int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) { - /* PLACEHOLDER */ + u32 offset = pos & (vidxd->bar_size[0] - 1); + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + memcpy(buf, vidxd->bar0 + offset, size); + + dev_dbg(dev, "vidxd mmio R %d %x %x: %llx\n", + vidxd->wq->id, size, offset, get_reg_val(buf, size)); return 0; } -int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) { - /* PLACEHOLDER */ + u32 offset = pos & 0xfff; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + memcpy(buf, &vidxd->cfg[offset], count); + + dev_dbg(dev, "vidxd pci R %d %x %x: %llx\n", + vidxd->wq->id, count, offset, get_reg_val(buf, count)); + return 0; } -int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) +/* + * Much of the emulation code has been borrowed from Intel i915 cfg space + * emulation code. + * drivers/gpu/drm/i915/gvt/cfg_space.c: + */ + +/* + * Bitmap for writable bits (RW or RW1C bits, but cannot co-exist in one + * byte) byte by byte in standard pci configuration space. (not the full + * 256 bytes.) + */ +static const u8 pci_cfg_space_rw_bmp[PCI_INTERRUPT_LINE + 4] = { + [PCI_COMMAND] = 0xff, 0x07, + [PCI_STATUS] = 0x00, 0xf9, /* the only one RW1C byte */ + [PCI_CACHE_LINE_SIZE] = 0xff, + [PCI_BASE_ADDRESS_0 ... 
PCI_CARDBUS_CIS - 1] = 0xff,
+	[PCI_ROM_ADDRESS]	= 0x01, 0xf8, 0xff, 0xff,
+	[PCI_INTERRUPT_LINE]	= 0xff,
+};
+
+static void _pci_cfg_mem_write(struct vdcm_idxd *vidxd, unsigned int off, u8 *src,
+			       unsigned int bytes)
+{
+	u8 *cfg_base = vidxd->cfg;
+	u8 mask, new, old;
+	int i = 0;
+
+	for (; i < bytes && (off + i < sizeof(pci_cfg_space_rw_bmp)); i++) {
+		mask = pci_cfg_space_rw_bmp[off + i];
+		old = cfg_base[off + i];
+		new = src[i] & mask;
+
+		/*
+		 * The high byte of PCI_STATUS contains RW1C bits; emulate
+		 * the clear-by-writing-1 behavior for those bits here.
+		 * Writing 0 to an RW1C bit has no effect.
+		 */
+		if (off + i == PCI_STATUS + 1)
+			new = (~new & old) & mask;
+
+		cfg_base[off + i] = (old & ~mask) | new;
+	}
+
+	/* Copy the rest of the configuration space directly as it is. */
+	if (i < bytes)
+		memcpy(cfg_base + off + i, src + i, bytes - i);
+}
+
+static inline void _write_pci_bar(struct vdcm_idxd *vidxd, u32 offset, u32 val, bool low)
+{
+	u32 *pval;
+
+	/* BAR offset should be 32-bit aligned */
+	offset = rounddown(offset, 4);
+	pval = (u32 *)(vidxd->cfg + offset);
+
+	if (low) {
+		/*
+		 * Only update bits 31:4 and leave bits 3:0 unchanged.
+		 */
+		*pval = (val & GENMASK(31, 4)) | (*pval & GENMASK(3, 0));
+	} else {
+		*pval = val;
+	}
+}
+
+static int _pci_cfg_bar_write(struct vdcm_idxd *vidxd, unsigned int offset, void *p_data,
+			      unsigned int bytes)
+{
+	u32 new = *(u32 *)(p_data);
+	bool lo = IS_ALIGNED(offset, 8);
+	u64 size;
+	unsigned int bar_id;
+
+	/*
+	 * Power-up software can determine how much address
+	 * space the device requires by writing a value of
+	 * all 1's to the register and then reading the value
+	 * back. The device will return 0's in all don't-care
+	 * address bits.
+	 */
+	if (new == 0xffffffff) {
+		switch (offset) {
+		case PCI_BASE_ADDRESS_0:
+		case PCI_BASE_ADDRESS_1:
+		case PCI_BASE_ADDRESS_2:
+		case PCI_BASE_ADDRESS_3:
+			bar_id = (offset - PCI_BASE_ADDRESS_0) / 8;
+			size = vidxd->bar_size[bar_id];
+			_write_pci_bar(vidxd, offset, size >> (lo ? 0 : 32), lo);
+			break;
+		default:
+			/* Unimplemented BARs */
+			_write_pci_bar(vidxd, offset, 0x0, false);
+		}
+	} else {
+		switch (offset) {
+		case PCI_BASE_ADDRESS_0:
+		case PCI_BASE_ADDRESS_1:
+		case PCI_BASE_ADDRESS_2:
+		case PCI_BASE_ADDRESS_3:
+			_write_pci_bar(vidxd, offset, new, lo);
+			break;
+		default:
+			break;
+		}
+	}
 	return 0;
 }
 
 int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size)
 {
-	/* PLACEHOLDER */
+	struct device *dev = &vidxd->idxd->pdev->dev;
+
+	if (size > 4)
+		return -EINVAL;
+
+	if (pos + size > VIDXD_MAX_CFG_SPACE_SZ)
+		return -EINVAL;
+
+	dev_dbg(dev, "vidxd pci W %d %x %x: %llx\n", vidxd->wq->id, size, pos,
+		get_reg_val(buf, size));
+
+	/* First check if it's PCI_COMMAND */
+	if (IS_ALIGNED(pos, 2) && pos == PCI_COMMAND) {
+		bool new_bme;
+		bool bme;
+
+		if (size > 2)
+			return -EINVAL;
+
+		new_bme = !!(get_reg_val(buf, 2) & PCI_COMMAND_MASTER);
+		bme = !!(vidxd->cfg[pos] & PCI_COMMAND_MASTER);
+		_pci_cfg_mem_write(vidxd, pos, buf, size);
+
+		/* Flag error if turning off BME while device is enabled */
+		if ((bme && !new_bme) && vidxd_state(vidxd) == IDXD_DEVICE_STATE_ENABLED)
+			vidxd_report_error(vidxd, DSA_ERR_PCI_CFG);
+		return 0;
+	}
+
+	switch (pos) {
+	case PCI_BASE_ADDRESS_0 ...
PCI_BASE_ADDRESS_5: + if (!IS_ALIGNED(pos, 4)) + return -EINVAL; + return _pci_cfg_bar_write(vidxd, pos, buf, size); + + default: + _pci_cfg_mem_write(vidxd, pos, buf, size); + } return 0; } +static void vidxd_mmio_init_grpcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union group_cap_reg *grp_cap = (union group_cap_reg *)(bar0 + IDXD_GRPCAP_OFFSET); + + /* single group for current implementation */ + grp_cap->token_en = 0; + grp_cap->token_limit = 0; + grp_cap->num_groups = 1; +} + +static void vidxd_mmio_init_grpcfg(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + struct grpcfg *grpcfg = (struct grpcfg *)(bar0 + VIDXD_GRPCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + int i; + + /* + * At this point, we are only exporting a single workqueue for + * each mdev. So we need to just fake it as first workqueue + * and also mark the available engines in this group. + */ + + /* Set single workqueue and the first one */ + grpcfg->wqs[0] = BIT(0); + grpcfg->engines = 0; + for (i = 0; i < group->num_engines; i++) + grpcfg->engines |= BIT(i); + grpcfg->flags.bits = group->grpcfg.flags.bits; +} + +static void vidxd_mmio_init_wqcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + struct idxd_wq *wq = vidxd->wq; + union wq_cap_reg *wq_cap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + + wq_cap->occupancy_int = 0; + wq_cap->occupancy = 0; + wq_cap->priority = 0; + wq_cap->total_wq_size = wq->size; + wq_cap->num_wqs = VIDXD_MAX_WQS; + if (wq_dedicated(wq)) + wq_cap->dedicated_mode = 1; +} + +static void vidxd_mmio_init_wqcfg(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct idxd_wq *wq = vidxd->wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + + wqcfg->wq_size = wq->size; + wqcfg->wq_thresh = wq->threshold; + + if (wq_dedicated(wq)) + wqcfg->mode = WQCFG_MODE_DEDICATED; + + if (idxd->hw.gen_cap.block_on_fault && + test_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags)) + wqcfg->bof = 1; + + wqcfg->priority = wq->priority; + wqcfg->max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift; + wqcfg->max_batch_shift = idxd->hw.gen_cap.max_batch_shift; + /* make mode change read-only */ + wqcfg->mode_support = 0; +} + +static void vidxd_mmio_init_engcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union engine_cap_reg *engcap = (union engine_cap_reg *)(bar0 + IDXD_ENGCAP_OFFSET); + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + + engcap->num_engines = group->num_engines; +} + +static void vidxd_mmio_init_gencap(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union gen_cap_reg *gencap = (union gen_cap_reg *)(bar0 + IDXD_GENCAP_OFFSET); + + gencap->bits = idxd->hw.gen_cap.bits; + gencap->config_en = 0; + gencap->max_ims_mult = 0; + gencap->cmd_cap = 1; +} + +static void vidxd_mmio_init_cmdcap(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + u32 *cmdcap = (u32 *)(bar0 + IDXD_CMDCAP_OFFSET); + + if (idxd->hw.cmd_cap) + *cmdcap = idxd->hw.cmd_cap; + else + *cmdcap = 0x1ffe; + + *cmdcap |= BIT(IDXD_CMD_REQUEST_INT_HANDLE) | BIT(IDXD_CMD_RELEASE_INT_HANDLE); +} + void vidxd_mmio_init(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union offsets_reg *offsets; + + /* Copy up to where table offset is */ + memcpy_fromio(vidxd->bar0, idxd->reg_base, IDXD_TABLE_OFFSET); + + vidxd_mmio_init_gencap(vidxd); + 
vidxd_mmio_init_cmdcap(vidxd); + vidxd_mmio_init_wqcap(vidxd); + vidxd_mmio_init_wqcfg(vidxd); + vidxd_mmio_init_grpcap(vidxd); + vidxd_mmio_init_grpcfg(vidxd); + vidxd_mmio_init_engcap(vidxd); + + offsets = (union offsets_reg *)(bar0 + IDXD_TABLE_OFFSET); + offsets->grpcfg = VIDXD_GRPCFG_OFFSET / 0x100; + offsets->wqcfg = VIDXD_WQCFG_OFFSET / 0x100; + offsets->msix_perm = VIDXD_MSIX_PERM_OFFSET / 0x100; + + memset(bar0 + VIDXD_MSIX_PERM_OFFSET, 0, VIDXD_MSIX_PERM_TBL_SZ); +} + +static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val) { /* PLACEHOLDER */ } @@ -63,6 +471,11 @@ void vidxd_reset(struct vdcm_idxd *vidxd) /* PLACEHOLDER */ } +void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) +{ + /* PLACEHOLDER */ +} + int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) { /* PLACEHOLDER */ diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h index baa30d98f9cb..d23e63eb7f43 100644 --- a/drivers/dma/idxd/vdev.h +++ b/drivers/dma/idxd/vdev.h @@ -6,6 +6,13 @@ #include "mdev.h" +static inline u8 vidxd_state(struct vdcm_idxd *vidxd) +{ + union gensts_reg *gensts = (union gensts_reg *)(vidxd->bar0 + IDXD_GENSTATS_OFFSET); + + return gensts->state; +} + int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); @@ -15,5 +22,6 @@ void vidxd_reset(struct vdcm_idxd *vidxd); int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx); int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd); void vidxd_free_ims_entries(struct vdcm_idxd *vidxd); +void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val); #endif diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h index fdcdfe414223..a0c0475a4626 100644 --- a/include/uapi/linux/idxd.h +++ b/include/uapi/linux/idxd.h @@ -78,6 +78,8 @@ enum dsa_completion_status { DSA_COMP_HW_ERR1, DSA_COMP_HW_ERR_DRB, DSA_COMP_TRANSLATION_FAIL, + DSA_ERR_PCI_CFG = 0x51, + DSA_ERR_CMD_REG, }; #define DSA_COMP_STATUS_MASK 0x7f From patchwork Fri Oct 30 18:52:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391244 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4CNBJL3q1Dz9sSG for ; Sat, 31 Oct 2020 05:52:34 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727372AbgJ3Swc (ORCPT ); Fri, 30 Oct 2020 14:52:32 -0400 Received: from mga01.intel.com ([192.55.52.88]:23396 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726913AbgJ3Swa (ORCPT ); Fri, 30 Oct 2020 14:52:30 -0400 IronPort-SDR: PfNxLOU0V84g5SakC6Qwcs6TZkxnK4SkC5RMA/stYPD5c1t4Pl06p8Chkmsq/hSwm0gmhaQUNR qegsAmUZGDlA== X-IronPort-AV: E=McAfee;i="6000,8403,9790"; a="186464563" X-IronPort-AV: E=Sophos;i="5.77,434,1596524400"; d="scan'208";a="186464563" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from 
orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2020 11:52:07 -0700 IronPort-SDR: +OhzfTUwoK3R6uaRRaSVUb38ye7Zbv/o732paXxR0zFqFO8ExjdCAmq3ZeEgEMKdhdN0cbPpMp iehYvDAtSHXw== X-IronPort-AV: E=Sophos;i="5.77,434,1596524400"; d="scan'208";a="469608347" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2020 11:52:06 -0700 Subject: [PATCH v4 11/17] dmaengine: idxd: prep for virtual device commands From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 30 Oct 2020 11:52:06 -0700 Message-ID: <160408392598.912050.13058651162025112198.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> References: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Update some of the device commands in order to support usage by the virtual device commands emulated by the vdcm. Expose some of the commands' raw status so the virtual commands can utilize them accordingly. 
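Before the diff, a self-contained sketch of the calling convention this patch introduces may help: the raw command status becomes an optional out-parameter, so existing callers keep passing NULL and checking only the errno-style return, while the emulation path passes a pointer to capture the hardware status word. This is an illustration only; the demo_* names and status values are invented, not taken from the driver.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_CMDSTS_SUCCESS	0x0
#define DEMO_CMDSTS_ERR_BUSY	0x12	/* made-up raw hardware status */

/* Stand-in for the command execution path; returns the raw status word. */
static uint32_t demo_cmd_exec(void)
{
	return DEMO_CMDSTS_ERR_BUSY;
}

/*
 * Like the reworked idxd_wq_enable(wq, status): the raw status is
 * reported through *status when the caller asks for it, while the
 * return value stays a plain errno for existing callers.
 */
static int demo_wq_enable(uint32_t *status)
{
	uint32_t stat = demo_cmd_exec();

	if (status)
		*status = stat;	/* virtual-command emulation wants this */

	return stat == DEMO_CMDSTS_SUCCESS ? 0 : -ENXIO;
}

int main(void)
{
	uint32_t stat;

	/* Host-side callers keep passing NULL and check only the errno. */
	if (demo_wq_enable(NULL) < 0)
		printf("enable failed\n");

	/* The vdcm passes a pointer so it can forward the raw status. */
	if (demo_wq_enable(&stat) < 0)
		printf("enable failed, raw cmdsts %#x\n", stat);
	return 0;
}

The design choice is that the raw status stays optional, which leaves every existing call site untouched while still letting the vdcm relay hardware status values back to the guest's virtual command status register.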
Signed-off-by: Dave Jiang --- drivers/dma/idxd/cdev.c | 2 + drivers/dma/idxd/device.c | 69 +++++++++++++++++++++++++++++---------------- drivers/dma/idxd/idxd.h | 8 +++-- drivers/dma/idxd/irq.c | 2 + drivers/dma/idxd/mdev.c | 2 + drivers/dma/idxd/sysfs.c | 8 +++-- 6 files changed, 56 insertions(+), 35 deletions(-) diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index b774bf336347..1f504d1f0c42 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -159,7 +159,7 @@ static int idxd_cdev_release(struct inode *node, struct file *filep) if (rc < 0) dev_err(dev, "wq disable pasid failed.\n"); } else { - idxd_wq_drain(wq); + idxd_wq_drain(wq, NULL); } } diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 8aff07b1acb4..52fc8e64c5fc 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -197,22 +197,25 @@ void idxd_wq_free_resources(struct idxd_wq *wq) sbitmap_queue_free(&wq->sbq); } -int idxd_wq_enable(struct idxd_wq *wq) +int idxd_wq_enable(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 status; + u32 stat; if (wq->state == IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d already enabled\n", wq->id); return -ENXIO; } - idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &status); + idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &stat); - if (status != IDXD_CMDSTS_SUCCESS && - status != IDXD_CMDSTS_ERR_WQ_ENABLED) { - dev_dbg(dev, "WQ enable failed: %#x\n", status); + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS && + stat != IDXD_CMDSTS_ERR_WQ_ENABLED) { + dev_dbg(dev, "WQ enable failed: %#x\n", stat); return -ENXIO; } @@ -221,11 +224,11 @@ int idxd_wq_enable(struct idxd_wq *wq) return 0; } -int idxd_wq_disable(struct idxd_wq *wq) +int idxd_wq_disable(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 status, operand; + u32 stat, operand; dev_dbg(dev, "Disabling WQ %d\n", wq->id); @@ -235,10 +238,13 @@ int idxd_wq_disable(struct idxd_wq *wq) } operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); - idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &status); + idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &stat); + + if (status) + *status = stat; - if (status != IDXD_CMDSTS_SUCCESS) { - dev_dbg(dev, "WQ disable failed: %#x\n", status); + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ disable failed: %#x\n", stat); return -ENXIO; } @@ -247,20 +253,31 @@ int idxd_wq_disable(struct idxd_wq *wq) return 0; } -void idxd_wq_drain(struct idxd_wq *wq) +int idxd_wq_drain(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 operand; + u32 operand, stat; if (wq->state != IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state); - return; + return 0; } dev_dbg(dev, "Draining WQ %d\n", wq->id); operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); - idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, NULL); + idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, &stat); + + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ drain failed: %#x\n", stat); + return -ENXIO; + } + + dev_dbg(dev, "WQ %d drained\n", wq->id); + return 0; } int idxd_wq_map_portal(struct idxd_wq *wq) @@ -287,11 +304,11 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq) devm_iounmap(dev, wq->portal); } -int idxd_wq_abort(struct idxd_wq *wq) +int idxd_wq_abort(struct idxd_wq *wq, u32 *status) { struct idxd_device 
*idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 operand, status; + u32 operand, stat; dev_dbg(dev, "Abort WQ %d\n", wq->id); if (wq->state != IDXD_WQ_ENABLED) { @@ -301,9 +318,13 @@ int idxd_wq_abort(struct idxd_wq *wq) operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand); - idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status); - if (status != IDXD_CMDSTS_SUCCESS) { - dev_dbg(dev, "WQ abort failed: %#x\n", status); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &stat); + + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ abort failed: %#x\n", stat); return -ENXIO; } @@ -319,7 +340,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) unsigned int offset; unsigned long flags; - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) return rc; @@ -331,7 +352,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset); spin_unlock_irqrestore(&idxd->dev_lock, flags); - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) return rc; @@ -346,7 +367,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) unsigned int offset; unsigned long flags; - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) return rc; @@ -358,7 +379,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset); spin_unlock_irqrestore(&idxd->dev_lock, flags); - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) return rc; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index ab28a1bffb7c..e616d18b53c0 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -350,15 +350,15 @@ int idxd_device_release_int_handle(struct idxd_device *idxd, int handle, /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); void idxd_wq_free_resources(struct idxd_wq *wq); -int idxd_wq_enable(struct idxd_wq *wq); -int idxd_wq_disable(struct idxd_wq *wq); -void idxd_wq_drain(struct idxd_wq *wq); +int idxd_wq_enable(struct idxd_wq *wq, u32 *status); +int idxd_wq_disable(struct idxd_wq *wq, u32 *status); +int idxd_wq_drain(struct idxd_wq *wq, u32 *status); int idxd_wq_map_portal(struct idxd_wq *wq); void idxd_wq_unmap_portal(struct idxd_wq *wq); void idxd_wq_disable_cleanup(struct idxd_wq *wq); int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); int idxd_wq_disable_pasid(struct idxd_wq *wq); -int idxd_wq_abort(struct idxd_wq *wq); +int idxd_wq_abort(struct idxd_wq *wq, u32 *status); void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid); void idxd_wq_setup_priv(struct idxd_wq *wq, int priv); diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index 593a2f6ed16c..a94fce00767b 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -48,7 +48,7 @@ static void idxd_device_reinit(struct work_struct *work) struct idxd_wq *wq = &idxd->wqs[i]; if (wq->state == IDXD_WQ_ENABLED) { - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) { dev_warn(dev, "Unable to re-enable wq %s\n", dev_name(&wq->conf_dev)); diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c index 3b6febe22a0e..91270121dfbc 100644 --- a/drivers/dma/idxd/mdev.c +++ b/drivers/dma/idxd/mdev.c @@ -85,7 +85,7 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd) vidxd_mmio_init(vidxd); if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) - idxd_wq_disable(wq); + idxd_wq_disable(wq, NULL); } 
static void idxd_vdcm_release(struct mdev_device *mdev) diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 17f13ebae028..fe5f95509c5c 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -218,7 +218,7 @@ static int idxd_config_bus_probe(struct device *dev) return rc; } - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) { mutex_unlock(&wq->wq_lock); dev_warn(dev, "WQ %d enabling failed: %d\n", @@ -229,7 +229,7 @@ static int idxd_config_bus_probe(struct device *dev) rc = idxd_wq_map_portal(wq); if (rc < 0) { dev_warn(dev, "wq portal mapping failed: %d\n", rc); - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) dev_warn(dev, "IDXD wq disable failed\n"); mutex_unlock(&wq->wq_lock); @@ -287,8 +287,8 @@ static void disable_wq(struct idxd_wq *wq) idxd_wq_unmap_portal(wq); - idxd_wq_drain(wq); - rc = idxd_wq_disable(wq); + idxd_wq_drain(wq, NULL); + rc = idxd_wq_disable(wq, NULL); idxd_wq_free_resources(wq); wq->client_count = 0; From patchwork Fri Oct 30 18:52:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391257 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4CNBLZ52snz9sSP for ; Sat, 31 Oct 2020 05:54:30 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727684AbgJ3SyU (ORCPT ); Fri, 30 Oct 2020 14:54:20 -0400 Received: from mga07.intel.com ([134.134.136.100]:22685 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727097AbgJ3Swb (ORCPT ); Fri, 30 Oct 2020 14:52:31 -0400 IronPort-SDR: FtPUhgjoZyhZ8uNx+YuLP1ppnQwmuxL/Ds4zlUzEJX6YP1e2eDOqdgQTx5m/9rSbZtaK0+1bj3 /8eg2gLCbb0A== X-IronPort-AV: E=McAfee;i="6000,8403,9790"; a="232830042" X-IronPort-AV: E=Sophos;i="5.77,434,1596524400"; d="scan'208";a="232830042" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2020 11:52:14 -0700 IronPort-SDR: QKAoVBxguYe+lyC5FbPS/L3VDc+dETEG6m/SONP+IC9a5Qrv3k9CIo+ATQuXq7J4gPbCMcKyo0 gP4CRZBD6kNw== X-IronPort-AV: E=Sophos;i="5.77,434,1596524400"; d="scan'208";a="335504562" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2020 11:52:13 -0700 Subject: [PATCH v4 12/17] dmaengine: idxd: virtual device commands emulation From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, 
samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 30 Oct 2020 11:52:12 -0700 Message-ID: <160408393273.912050.10185046057399795762.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> References: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Add all the helper functions that supports the emulation of the commands that are submitted to the device command register. Signed-off-by: Dave Jiang --- drivers/dma/idxd/registers.h | 16 +- drivers/dma/idxd/vdev.c | 427 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 438 insertions(+), 5 deletions(-) diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index 5a76fd0ab6ad..17f0d868e5a4 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -119,7 +119,8 @@ union gencfg_reg { union genctrl_reg { struct { u32 softerr_int_en:1; - u32 rsvd:31; + u32 halt_state_int_en:1; + u32 rsvd:30; }; u32 bits; } __packed; @@ -141,6 +142,8 @@ enum idxd_device_status_state { IDXD_DEVICE_STATE_HALT, }; +#define IDXD_GENSTATS_MASK 0x03 + enum idxd_device_reset_type { IDXD_DEVICE_RESET_SOFTWARE = 0, IDXD_DEVICE_RESET_FLR, @@ -153,6 +156,7 @@ enum idxd_device_reset_type { #define IDXD_INTC_CMD 0x02 #define IDXD_INTC_OCCUPY 0x04 #define IDXD_INTC_PERFMON_OVFL 0x08 +#define IDXD_INTC_HALT_STATE 0x10 #define IDXD_CMD_OFFSET 0xa0 union idxd_command_reg { @@ -164,6 +168,7 @@ union idxd_command_reg { }; u32 bits; } __packed; +#define IDXD_CMD_INT_MASK 0x80000000 enum idxd_cmd { IDXD_CMD_ENABLE_DEVICE = 1, @@ -227,10 +232,11 @@ enum idxd_cmdsts_err { /* disable device errors */ IDXD_CMDSTS_ERR_DIS_DEV_EN = 0x31, /* disable WQ, drain WQ, abort WQ, reset WQ */ - IDXD_CMDSTS_ERR_DEV_NOT_EN, + IDXD_CMDSTS_ERR_WQ_NOT_EN, /* request interrupt handle */ IDXD_CMDSTS_ERR_INVAL_INT_IDX = 0x41, IDXD_CMDSTS_ERR_NO_HANDLE, + IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE, }; #define IDXD_CMDCAP_OFFSET 0xb0 @@ -351,6 +357,12 @@ union wqcfg { u32 bits[8]; } __packed; +enum idxd_wq_hw_state { + IDXD_WQ_DEV_DISABLED = 0, + IDXD_WQ_DEV_ENABLED, + IDXD_WQ_DEV_BUSY, +}; + #define WQCFG_PASID_IDX 2 #define WQCFG_PRIV_IDX 2 #define WQCFG_MODE_DEDICATED 1 diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c index b38bb676e604..6e7f98d0e52f 100644 --- a/drivers/dma/idxd/vdev.c +++ b/drivers/dma/idxd/vdev.c @@ -463,17 +463,438 @@ void vidxd_mmio_init(struct vdcm_idxd *vidxd) static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val) { - /* PLACEHOLDER */ + u8 *bar0 = vidxd->bar0; + u32 *cmd = (u32 *)(bar0 + IDXD_CMD_OFFSET); + u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET); + u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + *cmdsts = val; + dev_dbg(dev, "%s: cmd: %#x status: %#x\n", __func__, *cmd, val); + + if (*cmd & IDXD_CMD_INT_MASK) { + *intcause |= IDXD_INTC_CMD; + vidxd_send_interrupt(vidxd, 0); + } +} + +static void vidxd_enable(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s\n", __func__); + if 
(gensts->state == IDXD_DEVICE_STATE_ENABLED) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_ENABLED); + + /* Check PCI configuration */ + if (!(vidxd->cfg[PCI_COMMAND] & PCI_COMMAND_MASTER)) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_BUSMASTER_EN); + + gensts->state = IDXD_DEVICE_STATE_ENABLED; + + return idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_disable(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq; + union wqcfg *wqcfg; + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (gensts->state == IDXD_DEVICE_STATE_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + + wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + wq = vidxd->wq; + + /* If it is a DWQ, need to disable the DWQ as well */ + if (wq_dedicated(wq)) { + idxd_wq_disable(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable (wq disable) failed: %#x\n", status); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + } else { + idxd_wq_drain(wq, &status); + if (status) + dev_warn(dev, "vidxd disable (wq drain) failed: %#x\n", status); + } + + wqcfg->wq_state = 0; + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_drain_all(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s\n", __func__); + + idxd_wq_drain(wq, NULL); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_drain(struct vdcm_idxd *vidxd, int val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_drain(wq, &status); + if (status) { + dev_dbg(dev, "wq drain failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_abort_all(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s\n", __func__); + idxd_wq_abort(wq, NULL); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_abort(struct vdcm_idxd *vidxd, int val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_abort(wq, &status); + if (status) { + dev_dbg(dev, "wq abort failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); } void vidxd_reset(struct vdcm_idxd *vidxd) { - /* PLACEHOLDER */ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + 
union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET);
+	struct idxd_wq *wq;
+
+	dev_dbg(dev, "%s\n", __func__);
+	gensts->state = IDXD_DEVICE_STATE_DRAIN;
+	wq = vidxd->wq;
+
+	if (wq->state == IDXD_WQ_ENABLED) {
+		idxd_wq_abort(wq, NULL);
+		idxd_wq_disable(wq, NULL);
+	}
+
+	vidxd_mmio_init(vidxd);
+	gensts->state = IDXD_DEVICE_STATE_DISABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_reset(struct vdcm_idxd *vidxd, int wq_id_mask)
+{
+	struct idxd_wq *wq;
+	u8 *bar0 = vidxd->bar0;
+	union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET);
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	u32 status;
+
+	wq = vidxd->wq;
+	dev_dbg(dev, "vidxd reset wq %u:%u\n", 0, wq->id);
+
+	if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN);
+		return;
+	}
+
+	idxd_wq_abort(wq, &status);
+	if (status) {
+		dev_dbg(dev, "vidxd reset wq failed to abort: %#x\n", status);
+		idxd_complete_command(vidxd, status);
+		return;
+	}
+
+	idxd_wq_disable(wq, &status);
+	if (status) {
+		dev_dbg(dev, "vidxd reset wq failed to disable: %#x\n", status);
+		idxd_complete_command(vidxd, status);
+		return;
+	}
+
+	wqcfg->wq_state = IDXD_WQ_DEV_DISABLED;
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_alloc_int_handle(struct vdcm_idxd *vidxd, int operand)
+{
+	bool ims = !!(operand & CMD_INT_HANDLE_IMS);
+	u32 cmdsts;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	int ims_idx, vidx;
+
+	vidx = operand & GENMASK(15, 0);
+
+	dev_dbg(dev, "allocating int handle for %d\n", vidx);
+
+	/* vidx cannot be 0 since that's emulated and does not require an IMS handle */
+	if (vidx <= 0 || vidx >= VIDXD_MAX_MSIX_ENTRIES) {
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX);
+		return;
+	}
+
+	if (ims) {
+		dev_warn(dev, "IMS allocation is not implemented yet\n");
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_NO_HANDLE);
+		return;
+	}
+
+	ims_idx = vidxd->irq_entries[vidx - 1].entry->device_msi.hwirq;
+	vidx--; /* MSIX idx 0 is a slow path interrupt */
+	cmdsts = ims_idx << IDXD_CMDSTS_RES_SHIFT;
+	dev_dbg(dev, "int handle %d:%d\n", vidx, ims_idx);
+	idxd_complete_command(vidxd, cmdsts);
+}
+
+static void vidxd_release_int_handle(struct vdcm_idxd *vidxd, int operand)
+{
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	bool ims = !!(operand & CMD_INT_HANDLE_IMS);
+	int handle, i;
+	bool found = false;
+
+	handle = operand & GENMASK(15, 0);
+	dev_dbg(dev, "releasing int handle %d\n", handle);
+
+	if (ims) {
+		dev_warn(dev, "IMS release is not implemented yet\n");
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE);
+		return;
+	}
+
+	for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES - 1; i++) {
+		if (vidxd->irq_entries[i].entry->device_msi.hwirq == handle) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		dev_warn(dev, "Freeing unallocated int handle.\n");
+		idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE);
+		return;
+	}
+
+	dev_dbg(dev, "int handle %d released.\n", handle);
+	idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS);
+}
+
+static void vidxd_wq_enable(struct vdcm_idxd *vidxd, int wq_id)
+{
+	struct idxd_wq *wq;
+	u8 *bar0 = vidxd->bar0;
+	union wq_cap_reg *wqcap;
+	struct mdev_device *mdev = vidxd->vdev.mdev;
+	struct device *dev = mdev_dev(mdev);
+	struct idxd_device *idxd;
+	union wqcfg *vwqcfg, *wqcfg;
+	unsigned long flags;
+	int
wq_pasid; + u32 status; + int priv; + + if (wq_id >= VIDXD_MAX_WQS) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX); + return; + } + + idxd = vidxd->idxd; + wq = vidxd->wq; + + dev_dbg(dev, "%s: wq %u:%u\n", __func__, wq_id, wq->id); + + vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET + wq_id * 32); + wqcap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + wqcfg = wq->wqcfg; + + if (vidxd_state(vidxd) != IDXD_DEVICE_STATE_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOTEN); + return; + } + + if (vwqcfg->wq_state != IDXD_WQ_DEV_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_ENABLED); + return; + } + + if (wq_dedicated(wq) && wqcap->dedicated_mode == 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_MODE); + return; + } + + wq_pasid = idxd_mdev_get_pasid(mdev); + priv = 1; + + if (wq_pasid >= 0) { + /* Clear pasid_en, pasid, and priv values */ + wqcfg->bits[WQCFG_PASID_IDX] &= ~GENMASK(29, 8); + wqcfg->priv = priv; + wqcfg->pasid_en = 1; + wqcfg->pasid = wq_pasid; + dev_dbg(dev, "program pasid %d in wq %d\n", wq_pasid, wq->id); + spin_lock_irqsave(&idxd->dev_lock, flags); + idxd_wq_setup_pasid(wq, wq_pasid); + idxd_wq_setup_priv(wq, priv); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + idxd_wq_enable(wq, &status); + if (status) { + dev_err(dev, "vidxd enable wq %d failed\n", wq->id); + idxd_complete_command(vidxd, status); + return; + } + } else { + dev_err(dev, "idxd pasid setup failed wq %d wq_pasid %d\n", wq->id, wq_pasid); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_PASID_EN); + return; + } + + vwqcfg->wq_state = IDXD_WQ_DEV_ENABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + union wqcfg *wqcfg; + u8 *bar0 = vidxd->bar0; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + wq = vidxd->wq; + + dev_dbg(dev, "vidxd disable wq %u:%u\n", 0, wq->id); + + wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + /* If it is a DWQ, need to disable the DWQ as well */ + if (wq_dedicated(wq)) { + idxd_wq_disable(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable wq failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + } else { + idxd_wq_drain(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable drain wq failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + } + + wqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); } void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) { - /* PLACEHOLDER */ + union idxd_command_reg *reg = (union idxd_command_reg *)(vidxd->bar0 + IDXD_CMD_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + reg->bits = val; + + dev_dbg(dev, "%s: cmd code: %u reg: %x\n", __func__, reg->cmd, reg->bits); + + switch (reg->cmd) { + case IDXD_CMD_ENABLE_DEVICE: + vidxd_enable(vidxd); + break; + case IDXD_CMD_DISABLE_DEVICE: + vidxd_disable(vidxd); + break; + case IDXD_CMD_DRAIN_ALL: + vidxd_drain_all(vidxd); + break; + case IDXD_CMD_ABORT_ALL: + vidxd_abort_all(vidxd); + break; + case IDXD_CMD_RESET_DEVICE: + vidxd_reset(vidxd); + break; + case IDXD_CMD_ENABLE_WQ: + vidxd_wq_enable(vidxd, reg->operand); + break; + case IDXD_CMD_DISABLE_WQ: + vidxd_wq_disable(vidxd, reg->operand); + break; + case 
IDXD_CMD_DRAIN_WQ: + vidxd_wq_drain(vidxd, reg->operand); + break; + case IDXD_CMD_ABORT_WQ: + vidxd_wq_abort(vidxd, reg->operand); + break; + case IDXD_CMD_RESET_WQ: + vidxd_wq_reset(vidxd, reg->operand); + break; + case IDXD_CMD_REQUEST_INT_HANDLE: + vidxd_alloc_int_handle(vidxd, reg->operand); + break; + case IDXD_CMD_RELEASE_INT_HANDLE: + vidxd_release_int_handle(vidxd, reg->operand); + break; + default: + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD); + break; + } } int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) From patchwork Fri Oct 30 18:52:19 2020 X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391256 Subject: [PATCH v4 13/17] dmaengine: idxd: ims setup for the vdcm From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: Megha Dey , dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 30 Oct 2020 11:52:19 -0700 Message-ID: <160408393950.912050.6095700006969517668.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> References: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com>
Add IMS setup for the mediated device. On the actual hardware, MSIX vector 0 is the misc interrupt that handles events such as administrative command completion, error reporting, and performance monitor overflow. MSIX vectors 1...N are used for descriptor completion interrupts. On the guest kernel, the MSIX interrupts are backed by the mediated device through either emulation or IMS vectors. Vector 0 is handled through emulation by the host vdcm. Vector 1 (more may be supported later) is backed by IMS. Once the relevant IRQ domain is set, IMS vectors can be set up with interrupt handlers via request_irq() just like MSIX interrupts, and the msi_domain_alloc_irqs()/msi_domain_free_irqs() APIs can then be used to allocate and free interrupts from that domain. Register with the irq bypass manager in order to allow the IMS interrupt to be injected into the guest and bypass the host. Signed-off-by: Megha Dey Signed-off-by: Dave Jiang --- drivers/dma/Kconfig | 2 ++ drivers/dma/idxd/idxd.h | 1 + drivers/dma/idxd/mdev.c | 26 +++++++++++++++++++++ drivers/dma/idxd/mdev.h | 1 + drivers/dma/idxd/vdev.c | 57 ++++++++++++++++++++++++++++++++++++++--------- kernel/irq/msi.c | 2 ++ 6 files changed, 78 insertions(+), 11 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index c5970e4a3a2c..b0335a4321f5 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -312,6 +312,8 @@ config INTEL_IDXD_MDEV depends on VFIO_MDEV depends on VFIO_MDEV_DEVICE select PCI_SIOV + select IRQ_BYPASS_MANAGER + select IMS_MSI_ARRAY config INTEL_IOATDMA tristate "Intel I/OAT DMA support" diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index e616d18b53c0..72c30826f1bb 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -213,6 +213,7 @@ struct idxd_device { struct workqueue_struct *wq; struct work_struct work; + struct irq_domain *ims_domain; int *int_handles; }; diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c index 91270121dfbc..ed79c85e692e 100644 --- a/drivers/dma/idxd/mdev.c +++ b/drivers/dma/idxd/mdev.c @@ -508,8 +508,12 @@ static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index) dev_dbg(dev, "disable MSIX trigger %d\n", index); if (index) { + struct irq_bypass_producer *producer; u32 auxval; + producer = &vidxd->vdev.producer[index - 1]; + irq_bypass_unregister_producer(producer); + irq_entry = &vidxd->irq_entries[index - 1]; if (irq_entry->irq_set) { free_irq(irq_entry->entry->irq, irq_entry); @@ -553,9 +557,11 @@ static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index) * in i - 1 to the host setup and irq_entries.
*/ if (index) { + struct irq_bypass_producer *producer; int pasid; u32 auxval; + producer = &vidxd->vdev.producer[index - 1]; irq_entry = &vidxd->irq_entries[index - 1]; pasid = idxd_mdev_get_pasid(mdev); if (pasid < 0) @@ -581,6 +587,14 @@ static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index) irq_set_auxdata(irq_entry->entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval); return rc; } + + producer->token = trigger; + producer->irq = irq_entry->entry->irq; + rc = irq_bypass_register_producer(producer); + if (unlikely(rc)) + dev_info(dev, "irq bypass producer (token %p) registration failed: %d\n", + producer->token, rc); + irq_entry->irq_set = true; } @@ -934,6 +948,7 @@ static const struct mdev_parent_ops idxd_vdcm_ops = { int idxd_mdev_host_init(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; + struct ims_array_info ims_info; int rc; if (!test_bit(IDXD_FLAG_SIOV_SUPPORTED, &idxd->flags)) @@ -950,6 +965,15 @@ int idxd_mdev_host_init(struct idxd_device *idxd) return -EOPNOTSUPP; } + + ims_info.max_slots = idxd->ims_size; + ims_info.slots = idxd->reg_base + idxd->ims_offset; + idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info); + if (!idxd->ims_domain) { + dev_warn(dev, "Failed to acquire IMS domain\n"); + iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); + return -ENODEV; + } + return mdev_register_device(dev, &idxd_vdcm_ops); } @@ -958,6 +982,8 @@ void idxd_mdev_host_release(struct idxd_device *idxd) struct device *dev = &idxd->pdev->dev; int rc; + irq_domain_remove(idxd->ims_domain); + mdev_unregister_device(dev); if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); diff --git a/drivers/dma/idxd/mdev.h b/drivers/dma/idxd/mdev.h index b474f2303ba0..266231987331 100644 --- a/drivers/dma/idxd/mdev.h +++ b/drivers/dma/idxd/mdev.h @@ -43,6 +43,7 @@ struct ims_irq_entry { struct idxd_vdev { struct mdev_device *mdev; struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES]; + struct irq_bypass_producer producer[VIDXD_MAX_MSIX_ENTRIES]; }; struct vdcm_idxd { diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c index 6e7f98d0e52f..d61bc17624b9 100644 --- a/drivers/dma/idxd/vdev.c +++ b/drivers/dma/idxd/vdev.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include "registers.h" #include "idxd.h" @@ -844,6 +845,51 @@ static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask) idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); } +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd) +{ + struct irq_domain *irq_domain; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + int i; + + for (i = 0; i < VIDXD_MAX_MSIX_VECS; i++) + vidxd->irq_entries[i].entry = NULL; + + irq_domain = dev_get_msi_domain(dev); + if (irq_domain) + msi_domain_free_irqs(irq_domain, dev); + else + dev_warn(dev, "No IMS irq domain.\n"); +} + +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) +{ + struct irq_domain *irq_domain; + struct idxd_device *idxd = vidxd->idxd; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + int vecs = VIDXD_MAX_MSIX_VECS - 1; + struct msi_desc *entry; + struct ims_irq_entry *irq_entry; + int rc, i = 0; + + irq_domain = idxd->ims_domain; + dev_set_msi_domain(dev, irq_domain); + rc = msi_domain_alloc_irqs(irq_domain, dev, vecs); + if (rc < 0) + return rc; + + for_each_msi_entry(entry, dev) { + irq_entry = &vidxd->irq_entries[i]; + irq_entry->vidxd = vidxd; + irq_entry->entry =
entry; + irq_entry->id = i; + i++; + } + + return 0; +} + void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) { union idxd_command_reg *reg = (union idxd_command_reg *)(vidxd->bar0 + IDXD_CMD_OFFSET); @@ -896,14 +942,3 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) break; } } - -int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) -{ - /* PLACEHOLDER */ - return 0; -} - -void vidxd_free_ims_entries(struct vdcm_idxd *vidxd) -{ - /* PLACEHOLDER */ -} diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index c7e47c26cd90..89cf60a30803 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -536,6 +536,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, return ops->domain_alloc_irqs(domain, dev, nvec); } +EXPORT_SYMBOL(msi_domain_alloc_irqs); void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) { @@ -572,6 +573,7 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) return ops->domain_free_irqs(domain, dev); } +EXPORT_SYMBOL(msi_domain_free_irqs); /** * msi_get_domain_info - Get the MSI interrupt domain info for @domain From patchwork Fri Oct 30 18:52:25 2020 X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391254 Subject: [PATCH v4 14/17] dmaengine: idxd: add mdev type as a new wq type From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org,
netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 30 Oct 2020 11:52:25 -0700 Message-ID: <160408394597.912050.15142293272388686868.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> References: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> Add an "mdev" wq type and supporting helpers. The mdev wq type marks a wq for use as a VFIO mediated device. Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 2 ++ drivers/dma/idxd/sysfs.c | 13 +++++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 72c30826f1bb..4e583fdd15d2 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -72,6 +72,7 @@ enum idxd_wq_type { IDXD_WQT_NONE = 0, IDXD_WQT_KERNEL, IDXD_WQT_USER, + IDXD_WQT_MDEV, }; struct idxd_cdev { @@ -321,6 +322,7 @@ void idxd_cleanup_sysfs(struct idxd_device *idxd); int idxd_register_driver(void); void idxd_unregister_driver(void); struct bus_type *idxd_get_bus_type(struct idxd_device *idxd); +bool is_idxd_wq_mdev(struct idxd_wq *wq); /* device interrupt control */ irqreturn_t idxd_irq_handler(int vec, void *data); diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index fe5f95509c5c..5b79d9019f2e 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -14,6 +14,7 @@ static char *idxd_wq_type_names[] = { [IDXD_WQT_NONE] = "none", [IDXD_WQT_KERNEL] = "kernel", [IDXD_WQT_USER] = "user", + [IDXD_WQT_MDEV] = "mdev", }; static void idxd_conf_device_release(struct device *dev) @@ -69,6 +70,11 @@ static inline bool is_idxd_wq_cdev(struct idxd_wq *wq) return wq->type == IDXD_WQT_USER; } +inline bool is_idxd_wq_mdev(struct idxd_wq *wq) +{ + return wq->type == IDXD_WQT_MDEV;
+} + static int idxd_config_bus_match(struct device *dev, struct device_driver *drv) { @@ -1094,8 +1100,9 @@ static ssize_t wq_type_show(struct device *dev, return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_KERNEL]); case IDXD_WQT_USER: - return sprintf(buf, "%s\n", - idxd_wq_type_names[IDXD_WQT_USER]); + return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_USER]); + case IDXD_WQT_MDEV: + return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_MDEV]); case IDXD_WQT_NONE: default: return sprintf(buf, "%s\n", @@ -1122,6 +1129,8 @@ static ssize_t wq_type_store(struct device *dev, wq->type = IDXD_WQT_KERNEL; else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_USER])) wq->type = IDXD_WQT_USER; + else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_MDEV])) + wq->type = IDXD_WQT_MDEV; else return -EINVAL; From patchwork Fri Oct 30 18:52:32 2020 X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391253 Subject: [PATCH v4 15/17] dmaengine: idxd: add dedicated wq mdev type From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 30 Oct 2020 11:52:32 -0700
Message-ID: <160408395282.912050.4836928882705571237.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> References: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> Add the support code for the "1dwq-v1" mdev type. This mdev type follows the standard VFIO mdev flow. The "1dwq-v1" type exports a single dedicated wq to the mdev; the dwq's configuration is set up by the host and is read-only to the guest. The mdev type does not support PASID or SVA and matches the stage 1 driver in functional support. For backward compatibility, the mdev will maintain the DSA spec definition of this mdev type once the commit goes upstream. Signed-off-by: Dave Jiang --- drivers/dma/idxd/mdev.c | 142 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 134 insertions(+), 8 deletions(-) diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c index ed79c85e692e..16b56f8f7fc1 100644 --- a/drivers/dma/idxd/mdev.c +++ b/drivers/dma/idxd/mdev.c @@ -111,20 +111,59 @@ static void idxd_vdcm_release(struct mdev_device *mdev) mutex_unlock(&vidxd->dev_lock); } +static struct idxd_wq *find_any_dwq(struct idxd_device *idxd) +{ + int i; + struct idxd_wq *wq; + unsigned long flags; + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < idxd->max_wqs; i++) { + wq = &idxd->wqs[i]; + + if (wq->state != IDXD_WQ_ENABLED) + continue; + + if (!wq_dedicated(wq)) + continue; + + if (idxd_wq_refcount(wq) != 0) + continue; + + spin_unlock_irqrestore(&idxd->dev_lock, flags); + mutex_lock(&wq->wq_lock); + if (idxd_wq_refcount(wq)) { + mutex_unlock(&wq->wq_lock); + spin_lock_irqsave(&idxd->dev_lock, flags); + continue; + } + + idxd_wq_get(wq); + mutex_unlock(&wq->wq_lock); + return wq; + } + + spin_unlock_irqrestore(&idxd->dev_lock, flags); + return NULL; +} + static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev, struct vdcm_idxd_type *type) { struct vdcm_idxd *vidxd; struct idxd_wq *wq = NULL; + int rc; - /* PLACEHOLDER, wq matching comes later */ - + if (type->type == IDXD_MDEV_TYPE_1_DWQ) + wq = find_any_dwq(idxd); if (!wq) return ERR_PTR(-ENODEV); vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); + if (!vidxd) { + rc = -ENOMEM; + goto err; + } mutex_init(&vidxd->dev_lock); vidxd->idxd = idxd; @@ -135,14 +174,23 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev vidxd->num_wqs = VIDXD_MAX_WQS; idxd_vdcm_init(vidxd); - mutex_lock(&wq->wq_lock); - idxd_wq_get(wq); - mutex_unlock(&wq->wq_lock); return vidxd; + + err: + mutex_lock(&wq->wq_lock); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + return ERR_PTR(rc); } -static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES]; +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = { + { + .name = "1dwq-v1", + .description = "IDXD MDEV with 1 dedicated workqueue", + .type = IDXD_MDEV_TYPE_1_DWQ, + }, +}; static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev, const char *name) @@ -934,7 +982,85 @@ static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd, return rc; } +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf) +{ + struct vdcm_idxd_type *type; + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + + if (type) + return sprintf(buf, "%s\n", type->description); + + return -EINVAL; +}
+static MDEV_TYPE_ATTR_RO(name); + +static int find_available_mdev_instances(struct idxd_device *idxd, struct vdcm_idxd_type *type) +{ + int count = 0, i; + unsigned long flags; + + if (type->type != IDXD_MDEV_TYPE_1_DWQ) + return 0; + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq; + + wq = &idxd->wqs[i]; + if (!is_idxd_wq_mdev(wq) || !wq_dedicated(wq) || idxd_wq_refcount(wq)) + continue; + + count++; + } + spin_unlock_irqrestore(&idxd->dev_lock, flags); + + return count; +} + +static ssize_t available_instances_show(struct kobject *kobj, + struct device *dev, char *buf) +{ + int count; + struct idxd_device *idxd = dev_get_drvdata(dev); + struct vdcm_idxd_type *type; + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + if (!type) + return -EINVAL; + + count = find_available_mdev_instances(idxd, type); + + return sprintf(buf, "%d\n", count); +} +static MDEV_TYPE_ATTR_RO(available_instances); + +static ssize_t device_api_show(struct kobject *kobj, struct device *dev, + char *buf) +{ + return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING); +} +static MDEV_TYPE_ATTR_RO(device_api); + +static struct attribute *idxd_mdev_types_attrs[] = { + &mdev_type_attr_name.attr, + &mdev_type_attr_device_api.attr, + &mdev_type_attr_available_instances.attr, + NULL, +}; + +static struct attribute_group idxd_mdev_type_group0 = { + .name = "1dwq-v1", + .attrs = idxd_mdev_types_attrs, +}; + +static struct attribute_group *idxd_mdev_type_groups[] = { + &idxd_mdev_type_group0, + NULL, +}; + static const struct mdev_parent_ops idxd_vdcm_ops = { + .supported_type_groups = idxd_mdev_type_groups, .create = idxd_vdcm_create, .remove = idxd_vdcm_remove, .open = idxd_vdcm_open, From patchwork Fri Oct 30 18:52:39 2020 X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391251
Subject: [PATCH v4 16/17] dmaengine: idxd: add new wq state for mdev From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 30 Oct 2020 11:52:39 -0700 Message-ID: <160408395903.912050.14092194237099902168.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> References: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> When a dedicated wq is assigned to an mdev, the wq must be disabled on the device in order to program the PASID into the wq. Introduce a software-only wq state, IDXD_WQ_LOCKED, to prevent the user from modifying the configuration while the mdev wq is in this state. A wq in this state is not in the DISABLED state, so configuration changes are rejected; it is also not in the ENABLED state, so actions permitted only while enabled are rejected as well.
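As an illustration (a minimal sketch, not part of the posted patch), a gate built on the new state could look like the following; idxd_wq_config_writable() is a hypothetical helper name, while IDXD_WQ_LOCKED comes from the diff below:

static bool idxd_wq_config_writable(struct idxd_wq *wq)
{
	/*
	 * IDXD_WQ_LOCKED is software state only: the wq was disabled on
	 * the device so the PASID could be programmed, but it is still
	 * owned by the mdev. A locked wq is neither DISABLED (so config
	 * writes are refused) nor ENABLED (so enabled-only actions are
	 * refused as well).
	 */
	return wq->state == IDXD_WQ_DISABLED;
}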
Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 1 + drivers/dma/idxd/mdev.c | 4 +++- drivers/dma/idxd/sysfs.c | 2 ++ 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 4e583fdd15d2..03275ad9e849 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -61,6 +61,7 @@ struct idxd_group { enum idxd_wq_state { IDXD_WQ_DISABLED = 0, IDXD_WQ_ENABLED, + IDXD_WQ_LOCKED, }; enum idxd_wq_flag { diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c index 16b56f8f7fc1..3db7717a10c0 100644 --- a/drivers/dma/idxd/mdev.c +++ b/drivers/dma/idxd/mdev.c @@ -84,8 +84,10 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd) vidxd_mmio_init(vidxd); - if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) + if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) { idxd_wq_disable(wq, NULL); + wq->state = IDXD_WQ_LOCKED; + } } static void idxd_vdcm_release(struct mdev_device *mdev) diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 5b79d9019f2e..3bbbd413980e 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -821,6 +821,8 @@ static ssize_t wq_state_show(struct device *dev, return sprintf(buf, "disabled\n"); case IDXD_WQ_ENABLED: return sprintf(buf, "enabled\n"); + case IDXD_WQ_LOCKED: + return sprintf(buf, "locked\n"); } return sprintf(buf, "unknown\n"); From patchwork Fri Oct 30 18:52:46 2020 X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1391252 Subject: [PATCH v4 17/17] dmaengine: idxd: add error notification from host driver to mediated device From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, tglx@linutronix.de, alex.williamson@redhat.com, jacob.jun.pan@intel.com,
ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 30 Oct 2020 11:52:46 -0700 Message-ID: <160408396666.912050.13272769874427386756.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> References: <160408357912.912050.17005584526266191420.stgit@djiang5-desk3.ch.intel.com> When a device error occurs, the mediated device needs to be notified so that the guest can be informed of the error. Add support to notify the specific mdev when an error is wq specific, and to broadcast errors to all mdevs when it is a generic device error. Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 12 ++++++++++++ drivers/dma/idxd/irq.c | 4 ++++ drivers/dma/idxd/vdev.c | 30 ++++++++++++++++++++++++++++++ drivers/dma/idxd/vdev.h | 1 + 4 files changed, 47 insertions(+) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 03275ad9e849..2f9e44bcd436 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -393,4 +393,16 @@ int idxd_mdev_host_init(struct idxd_device *idxd); void idxd_mdev_host_release(struct idxd_device *idxd); int idxd_mdev_get_pasid(struct mdev_device *mdev); +#ifdef CONFIG_INTEL_IDXD_MDEV +void idxd_vidxd_send_errors(struct idxd_device *idxd); +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq); +#else +static inline void idxd_vidxd_send_errors(struct idxd_device *idxd) +{ +} +static inline void idxd_wq_vidxd_send_errors(struct idxd_wq *wq) +{ +} +#endif /* CONFIG_INTEL_IDXD_MDEV */ + #endif diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index a94fce00767b..9219dcf0a34d 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -137,6 +137,8 @@ irqreturn_t idxd_misc_thread(int vec, void *data) if (wq->type == IDXD_WQT_USER) wake_up_interruptible(&wq->idxd_cdev.err_queue); + else if (wq->type == IDXD_WQT_MDEV) + idxd_wq_vidxd_send_errors(wq); } else { int i; @@ -145,6 +147,8 @@ irqreturn_t idxd_misc_thread(int vec, void *data) if (wq->type == IDXD_WQT_USER) wake_up_interruptible(&wq->idxd_cdev.err_queue); + else if (wq->type == IDXD_WQT_MDEV) + idxd_wq_vidxd_send_errors(wq); } } diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c index d61bc17624b9..fd42674490d6 100644 --- a/drivers/dma/idxd/vdev.c +++ b/drivers/dma/idxd/vdev.c @@ -942,3 +942,33 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) break; } } + +static void vidxd_send_errors(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET); + union genctrl_reg *genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET); + int i; + + if (swerr->valid) { + if (!swerr->overflow) + swerr->overflow = 1; + return; + } + + lockdep_assert_held(&idxd->dev_lock); + for (i = 0; i < 4; i++) + swerr->bits[i] = idxd->sw_err.bits[i]; + + if (genctrl->softerr_int_en) +
vidxd_send_interrupt(vidxd, 0); +} + +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq) +{ + struct vdcm_idxd *vidxd; + + list_for_each_entry(vidxd, &wq->vdcm_list, list) + vidxd_send_errors(vidxd); +} diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h index d23e63eb7f43..98810ae95782 100644 --- a/drivers/dma/idxd/vdev.h +++ b/drivers/dma/idxd/vdev.h @@ -23,5 +23,6 @@ int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx); int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd); void vidxd_free_ims_entries(struct vdcm_idxd *vidxd); void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val); +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq); #endif
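The header change above declares idxd_vidxd_send_errors() for the generic (non-wq-specific) error path, but its body is not part of this excerpt. As a minimal sketch of what that broadcast side could look like, assuming the idxd->wqs array used elsewhere in the driver and reusing the per-wq helper from the diff above (illustrative only, not the posted implementation):

void idxd_vidxd_send_errors(struct idxd_device *idxd)
{
	int i;

	/* Match the locking expectation of the per-wq path above. */
	lockdep_assert_held(&idxd->dev_lock);

	/* Generic device error: mirror it into every mdev-backed wq. */
	for (i = 0; i < idxd->max_wqs; i++) {
		struct idxd_wq *wq = &idxd->wqs[i];

		if (wq->type == IDXD_WQT_MDEV)
			idxd_wq_vidxd_send_errors(wq);
	}
}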