From patchwork Tue Apr 21 23:33:53 2020
Subject: [PATCH RFC 01/15] drivers/base: Introduce platform_msi_ops
From: Dave Jiang
To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com
Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org
Date: Tue, 21 Apr 2020 16:33:53 -0700
Message-ID: <158751203294.36773.11436842117908325764.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com>

From: Megha Dey

This is a preparatory patch to introduce Interrupt Message Store (IMS).

Until now, platform-msi.c has provided a generic way to handle non-PCI MSI interrupts. Platform-msi uses its parent chip's mask/unmask routines and only provides a way to write the MSI message to the generating device.

Newly emerging non-PCI-compliant MSI-like interrupts (Intel's IMS, for instance) may need to provide device-specific mask and unmask callbacks in addition to the write function. Hence, introduce a new structure, platform_msi_ops, which carries the device-specific write function as well as the other device-specific callbacks (mask/unmask).
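To illustrate the intended use of the new structure, here is a minimal, hypothetical driver fragment (the foo_* names are invented for illustration; the structure layout and the platform_msi_domain_alloc_irqs() signature are the ones introduced by this series):

#include <linux/msi.h>
#include <linux/platform_device.h>

/* Program the MSI address/data into device-specific registers. */
static void foo_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
{
	/* e.g. writel(msg->address_lo, ...); writel(msg->data, ...); */
}

/* Drivers now pass a platform_msi_ops instead of a bare write callback. */
static const struct platform_msi_ops foo_msi_ops = {
	.write_msg = foo_write_msi_msg,
	/* .irq_mask / .irq_unmask stay NULL unless the device needs them */
};

static int foo_probe(struct platform_device *pdev)
{
	/* previously: platform_msi_domain_alloc_irqs(&pdev->dev, 4, foo_write_msi_msg); */
	return platform_msi_domain_alloc_irqs(&pdev->dev, 4, &foo_msi_ops);
}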
Signed-off-by: Megha Dey --- drivers/base/platform-msi.c | 27 ++++++++++++++------------- drivers/dma/mv_xor_v2.c | 6 +++++- drivers/dma/qcom/hidma.c | 6 +++++- drivers/iommu/arm-smmu-v3.c | 6 +++++- drivers/irqchip/irq-mbigen.c | 8 ++++++-- drivers/irqchip/irq-mvebu-icu.c | 6 +++++- drivers/mailbox/bcm-flexrm-mailbox.c | 6 +++++- drivers/perf/arm_smmuv3_pmu.c | 6 +++++- include/linux/msi.h | 24 ++++++++++++++++++------ 9 files changed, 68 insertions(+), 27 deletions(-) diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index 8da314b81eab..1a3af5f33802 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -21,11 +21,11 @@ * and the callback to write the MSI message. */ struct platform_msi_priv_data { - struct device *dev; - void *host_data; - msi_alloc_info_t arg; - irq_write_msi_msg_t write_msg; - int devid; + struct device *dev; + void *host_data; + msi_alloc_info_t arg; + const struct platform_msi_ops *ops; + int devid; }; /* The devid allocator */ @@ -83,7 +83,7 @@ static void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg) priv_data = desc->platform.msi_priv_data; - priv_data->write_msg(desc, msg); + priv_data->ops->write_msg(desc, msg); } static void platform_msi_update_chip_ops(struct msi_domain_info *info) @@ -194,7 +194,7 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode, static struct platform_msi_priv_data * platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, - irq_write_msi_msg_t write_msi_msg) + const struct platform_msi_ops *platform_ops) { struct platform_msi_priv_data *datap; /* @@ -203,7 +203,8 @@ platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, * accordingly (which would impact the max number of MSI * capable devices). 
*/ - if (!dev->msi_domain || !write_msi_msg || !nvec || nvec > MAX_DEV_MSIS) + if (!dev->msi_domain || !platform_ops->write_msg || !nvec || + nvec > MAX_DEV_MSIS) return ERR_PTR(-EINVAL); if (dev->msi_domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) { @@ -227,7 +228,7 @@ platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, return ERR_PTR(err); } - datap->write_msg = write_msi_msg; + datap->ops = platform_ops; datap->dev = dev; return datap; @@ -249,12 +250,12 @@ static void platform_msi_free_priv_data(struct platform_msi_priv_data *data) * Zero for success, or an error code in case of failure */ int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, - irq_write_msi_msg_t write_msi_msg) + const struct platform_msi_ops *platform_ops) { struct platform_msi_priv_data *priv_data; int err; - priv_data = platform_msi_alloc_priv_data(dev, nvec, write_msi_msg); + priv_data = platform_msi_alloc_priv_data(dev, nvec, platform_ops); if (IS_ERR(priv_data)) return PTR_ERR(priv_data); @@ -324,7 +325,7 @@ struct irq_domain * __platform_msi_create_device_domain(struct device *dev, unsigned int nvec, bool is_tree, - irq_write_msi_msg_t write_msi_msg, + const struct platform_msi_ops *platform_ops, const struct irq_domain_ops *ops, void *host_data) { @@ -332,7 +333,7 @@ __platform_msi_create_device_domain(struct device *dev, struct irq_domain *domain; int err; - data = platform_msi_alloc_priv_data(dev, nvec, write_msi_msg); + data = platform_msi_alloc_priv_data(dev, nvec, platform_ops); if (IS_ERR(data)) return NULL; diff --git a/drivers/dma/mv_xor_v2.c b/drivers/dma/mv_xor_v2.c index 157c959311ea..426f520f3765 100644 --- a/drivers/dma/mv_xor_v2.c +++ b/drivers/dma/mv_xor_v2.c @@ -706,6 +706,10 @@ static int mv_xor_v2_resume(struct platform_device *dev) return 0; } +static const struct platform_msi_ops mv_xor_v2_msi_ops = { + .write_msg = mv_xor_v2_set_msi_msg, +}; + static int mv_xor_v2_probe(struct platform_device *pdev) { struct mv_xor_v2_device *xor_dev; @@ -761,7 +765,7 @@ static int mv_xor_v2_probe(struct platform_device *pdev) } ret = platform_msi_domain_alloc_irqs(&pdev->dev, 1, - mv_xor_v2_set_msi_msg); + &mv_xor_v2_msi_ops); if (ret) goto disable_clk; diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c index 411f91fde734..65371535ba26 100644 --- a/drivers/dma/qcom/hidma.c +++ b/drivers/dma/qcom/hidma.c @@ -678,6 +678,10 @@ static void hidma_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) writel(msg->data, dmadev->dev_evca + 0x120); } } + +static const struct platform_msi_ops hidma_msi_ops = { + .write_msg = hidma_write_msi_msg, +}; #endif static void hidma_free_msis(struct hidma_dev *dmadev) @@ -703,7 +707,7 @@ static int hidma_request_msi(struct hidma_dev *dmadev, struct msi_desc *failed_desc = NULL; rc = platform_msi_domain_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS, - hidma_write_msi_msg); + &hidma_msi_ops); if (rc) return rc; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 82508730feb7..764e284202f1 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -3425,6 +3425,10 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + cfg[2]); } +static const struct platform_msi_ops arm_smmu_msi_ops = { + .write_msg = arm_smmu_write_msi_msg, +}; + static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) { struct msi_desc *desc; @@ -3449,7 +3453,7 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) } /* 
Allocate MSIs for evtq, gerror and priq. Ignore cmdq */ - ret = platform_msi_domain_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg); + ret = platform_msi_domain_alloc_irqs(dev, nvec, &arm_smmu_msi_ops); if (ret) { dev_warn(dev, "failed to allocate MSIs - falling back to wired irqs\n"); return; diff --git a/drivers/irqchip/irq-mbigen.c b/drivers/irqchip/irq-mbigen.c index 6b566bba263b..ff5b75751974 100644 --- a/drivers/irqchip/irq-mbigen.c +++ b/drivers/irqchip/irq-mbigen.c @@ -226,6 +226,10 @@ static const struct irq_domain_ops mbigen_domain_ops = { .free = irq_domain_free_irqs_common, }; +static const struct platform_msi_ops mbigen_msi_ops = { + .write_msg = mbigen_write_msg, +}; + static int mbigen_of_create_domain(struct platform_device *pdev, struct mbigen_device *mgn_chip) { @@ -254,7 +258,7 @@ static int mbigen_of_create_domain(struct platform_device *pdev, } domain = platform_msi_create_device_domain(&child->dev, num_pins, - mbigen_write_msg, + &mbigen_msi_ops, &mbigen_domain_ops, mgn_chip); if (!domain) { @@ -302,7 +306,7 @@ static int mbigen_acpi_create_domain(struct platform_device *pdev, return -EINVAL; domain = platform_msi_create_device_domain(&pdev->dev, num_pins, - mbigen_write_msg, + &mbigen_msi_ops, &mbigen_domain_ops, mgn_chip); if (!domain) diff --git a/drivers/irqchip/irq-mvebu-icu.c b/drivers/irqchip/irq-mvebu-icu.c index 547045d89c4b..49b6390470bb 100644 --- a/drivers/irqchip/irq-mvebu-icu.c +++ b/drivers/irqchip/irq-mvebu-icu.c @@ -295,6 +295,10 @@ static const struct of_device_id mvebu_icu_subset_of_match[] = { {}, }; +static const struct platform_msi_ops mvebu_icu_msi_ops = { + .write_msg = mvebu_icu_write_msg, +}; + static int mvebu_icu_subset_probe(struct platform_device *pdev) { struct mvebu_icu_msi_data *msi_data; @@ -324,7 +328,7 @@ static int mvebu_icu_subset_probe(struct platform_device *pdev) return -ENODEV; irq_domain = platform_msi_create_device_tree_domain(dev, ICU_MAX_IRQS, - mvebu_icu_write_msg, + &mvebu_icu_msi_ops, &mvebu_icu_domain_ops, msi_data); if (!irq_domain) { diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c index bee33abb5308..0268337e08e3 100644 --- a/drivers/mailbox/bcm-flexrm-mailbox.c +++ b/drivers/mailbox/bcm-flexrm-mailbox.c @@ -1492,6 +1492,10 @@ static void flexrm_mbox_msi_write(struct msi_desc *desc, struct msi_msg *msg) writel_relaxed(msg->data, ring->regs + RING_MSI_DATA_VALUE); } +static const struct platform_msi_ops flexrm_mbox_msi_ops = { + .write_msg = flexrm_mbox_msi_write, +}; + static int flexrm_mbox_probe(struct platform_device *pdev) { int index, ret = 0; @@ -1604,7 +1608,7 @@ static int flexrm_mbox_probe(struct platform_device *pdev) /* Allocate platform MSIs for each ring */ ret = platform_msi_domain_alloc_irqs(dev, mbox->num_rings, - flexrm_mbox_msi_write); + &flexrm_mbox_msi_ops); if (ret) goto fail_destroy_cmpl_pool; diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index f01a57e5a5f3..bcbd7f5e3d0f 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -652,6 +652,10 @@ static void smmu_pmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) pmu->reg_base + SMMU_PMCG_IRQ_CFG2); } +static const struct platform_msi_ops smmu_pmu_msi_ops = { + .write_msg = smmu_pmu_write_msi_msg, +}; + static void smmu_pmu_setup_msi(struct smmu_pmu *pmu) { struct msi_desc *desc; @@ -665,7 +669,7 @@ static void smmu_pmu_setup_msi(struct smmu_pmu *pmu) if (!(readl(pmu->reg_base + SMMU_PMCG_CFGR) & SMMU_PMCG_CFGR_MSI)) return; - ret = 
platform_msi_domain_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg); + ret = platform_msi_domain_alloc_irqs(dev, 1, &smmu_pmu_msi_ops); if (ret) { dev_warn(dev, "failed to allocate MSIs\n"); return; diff --git a/include/linux/msi.h b/include/linux/msi.h index 8ad679e9d9c0..8e08907d70cb 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -321,6 +321,18 @@ enum { MSI_FLAG_LEVEL_CAPABLE = (1 << 6), }; +/* + * platform_msi_ops - Callbacks for platform MSI ops + * @irq_mask: mask an interrupt source + * @irq_unmask: unmask an interrupt source + * @irq_write_msi_msg: write message content + */ +struct platform_msi_ops { + unsigned int (*irq_mask)(struct msi_desc *desc); + unsigned int (*irq_unmask)(struct msi_desc *desc); + irq_write_msi_msg_t write_msg; +}; + int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force); @@ -336,7 +348,7 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode, struct msi_domain_info *info, struct irq_domain *parent); int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, - irq_write_msi_msg_t write_msi_msg); + const struct platform_msi_ops *platform_ops); void platform_msi_domain_free_irqs(struct device *dev); /* When an MSI domain is used as an intermediate domain */ @@ -348,14 +360,14 @@ struct irq_domain * __platform_msi_create_device_domain(struct device *dev, unsigned int nvec, bool is_tree, - irq_write_msi_msg_t write_msi_msg, + const struct platform_msi_ops *platform_ops, const struct irq_domain_ops *ops, void *host_data); -#define platform_msi_create_device_domain(dev, nvec, write, ops, data) \ - __platform_msi_create_device_domain(dev, nvec, false, write, ops, data) -#define platform_msi_create_device_tree_domain(dev, nvec, write, ops, data) \ - __platform_msi_create_device_domain(dev, nvec, true, write, ops, data) +#define platform_msi_create_device_domain(dev, nvec, p_ops, ops, data) \ + __platform_msi_create_device_domain(dev, nvec, false, p_ops, ops, data) +#define platform_msi_create_device_tree_domain(dev, nvec, p_ops, ops, data) \ + __platform_msi_create_device_domain(dev, nvec, true, p_ops, ops, data) int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs); From patchwork Tue Apr 21 23:33:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274553 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496Kds5tv7z9sSK for ; Wed, 22 Apr 2020 09:34:09 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725822AbgDUXeJ (ORCPT ); Tue, 21 Apr 2020 19:34:09 -0400 Received: from mga05.intel.com ([192.55.52.43]:10107 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726373AbgDUXeI (ORCPT ); Tue, 21 Apr 2020 19:34:08 -0400 IronPort-SDR: gj/HV3+RyhDo1NzmEmmHSGrWoIlFF2YW4aQcwiGJS0XDXn4dLsfKnSXz170QIS6z87Vhfts4xH be6zQnHEziZg== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from 
Subject: [PATCH RFC 02/15] drivers/base: Introduce a new platform-msi list
From: Dave Jiang
Date: Tue, 21 Apr 2020 16:33:59 -0700
Message-ID: <158751203902.36773.2662739280103265908.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com>

From: Megha Dey

This is a preparatory patch to introduce Interrupt Message Store (IMS).

The struct device has a linked list ('msi_list') of the MSI (MSI/MSI-X, platform-msi) descriptors of that device. This list holds only one type of descriptor at a time, since until now it has not been possible for a device to support more than one of these descriptor types concurrently. However, with the introduction of IMS, a device can support IMS as well as MSI-X at the same time.

Instead of sharing this list between IMS (a type of platform-msi) and MSI-X descriptors, introduce a new linked list, platform_msi_list, which holds all the platform-msi descriptors. Thus, msi_list points to the MSI/MSI-X descriptors of a device, while platform_msi_list points to its platform-msi descriptors.
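As a rough sketch of what the split means for descriptor walks (illustrative only; 'dev' stands for any struct device that carries both kinds of vectors, and the loop bodies are placeholders):

struct msi_desc *desc;

/* PCI MSI/MSI-X descriptors remain on dev->msi_list */
for_each_msi_entry(desc, dev)
	dev_dbg(dev, "MSI/MSI-X entry, irq %d\n", desc->irq);

/* platform-msi (e.g. IMS) descriptors now live on dev->platform_msi_list */
for_each_platform_msi_entry(desc, dev)
	dev_dbg(dev, "platform-MSI entry, irq %d\n", desc->irq);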
Signed-off-by: Megha Dey --- drivers/base/core.c | 1 + drivers/base/platform-msi.c | 19 +++++++++++-------- include/linux/device.h | 2 ++ include/linux/list.h | 36 ++++++++++++++++++++++++++++++++++++ include/linux/msi.h | 21 +++++++++++++++++++++ kernel/irq/msi.c | 16 ++++++++-------- 6 files changed, 79 insertions(+), 16 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 139cdf7e7327..5a0116d1a8d0 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -1984,6 +1984,7 @@ void device_initialize(struct device *dev) set_dev_node(dev, -1); #ifdef CONFIG_GENERIC_MSI_IRQ INIT_LIST_HEAD(&dev->msi_list); + INIT_LIST_HEAD(&dev->platform_msi_list); #endif INIT_LIST_HEAD(&dev->links.consumers); INIT_LIST_HEAD(&dev->links.suppliers); diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index 1a3af5f33802..b25c52f734dc 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -110,7 +110,8 @@ static void platform_msi_free_descs(struct device *dev, int base, int nvec) { struct msi_desc *desc, *tmp; - list_for_each_entry_safe(desc, tmp, dev_to_msi_list(dev), list) { + list_for_each_entry_safe(desc, tmp, dev_to_platform_msi_list(dev), + list) { if (desc->platform.msi_index >= base && desc->platform.msi_index < (base + nvec)) { list_del(&desc->list); @@ -127,8 +128,8 @@ static int platform_msi_alloc_descs_with_irq(struct device *dev, int virq, struct msi_desc *desc; int i, base = 0; - if (!list_empty(dev_to_msi_list(dev))) { - desc = list_last_entry(dev_to_msi_list(dev), + if (!list_empty(dev_to_platform_msi_list(dev))) { + desc = list_last_entry(dev_to_platform_msi_list(dev), struct msi_desc, list); base = desc->platform.msi_index + 1; } @@ -142,7 +143,7 @@ static int platform_msi_alloc_descs_with_irq(struct device *dev, int virq, desc->platform.msi_index = base + i; desc->irq = virq ? virq + i : 0; - list_add_tail(&desc->list, dev_to_msi_list(dev)); + list_add_tail(&desc->list, dev_to_platform_msi_list(dev)); } if (i != nvec) { @@ -213,7 +214,7 @@ platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, } /* Already had a helping of MSI? Greed... 
*/ - if (!list_empty(dev_to_msi_list(dev))) + if (!list_empty(dev_to_platform_msi_list(dev))) return ERR_PTR(-EBUSY); datap = kzalloc(sizeof(*datap), GFP_KERNEL); @@ -255,6 +256,8 @@ int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, struct platform_msi_priv_data *priv_data; int err; + dev->platform_msi_type = GEN_PLAT_MSI; + priv_data = platform_msi_alloc_priv_data(dev, nvec, platform_ops); if (IS_ERR(priv_data)) return PTR_ERR(priv_data); @@ -284,10 +287,10 @@ EXPORT_SYMBOL_GPL(platform_msi_domain_alloc_irqs); */ void platform_msi_domain_free_irqs(struct device *dev) { - if (!list_empty(dev_to_msi_list(dev))) { + if (!list_empty(dev_to_platform_msi_list(dev))) { struct msi_desc *desc; - desc = first_msi_entry(dev); + desc = first_platform_msi_entry(dev); platform_msi_free_priv_data(desc->platform.msi_priv_data); } @@ -370,7 +373,7 @@ void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq, { struct platform_msi_priv_data *data = domain->host_data; struct msi_desc *desc, *tmp; - for_each_msi_entry_safe(desc, tmp, data->dev) { + for_each_platform_msi_entry_safe(desc, tmp, data->dev) { if (WARN_ON(!desc->irq || desc->nvec_used != 1)) return; if (!(desc->irq >= virq && desc->irq < (virq + nvec))) diff --git a/include/linux/device.h b/include/linux/device.h index ac8e37cd716a..cbcecb14584e 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -567,6 +567,8 @@ struct device { #endif #ifdef CONFIG_GENERIC_MSI_IRQ struct list_head msi_list; + struct list_head platform_msi_list; + unsigned int platform_msi_type; #endif const struct dma_map_ops *dma_ops; diff --git a/include/linux/list.h b/include/linux/list.h index aff44d34f4e4..7a5ea40cb945 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -492,6 +492,18 @@ static inline void list_splice_tail_init(struct list_head *list, #define list_entry(ptr, type, member) \ container_of(ptr, type, member) +/** + * list_entry_select - get the correct struct for this entry based on condition + * @condition: the condition to choose a particular &struct list head pointer + * @ptr_a: the &struct list_head pointer if @condition is not met. + * @ptr_b: the &struct list_head pointer if @condition is met. + * @type: the type of the struct this is embedded in. + * @member: the name of the list_head within the struct. + */ +#define list_entry_select(condition, ptr_a, ptr_b, type, member)\ + (condition) ? list_entry(ptr_a, type, member) : \ + list_entry(ptr_b, type, member) + /** * list_first_entry - get the first element from a list * @ptr: the list head to take the element from. @@ -503,6 +515,17 @@ static inline void list_splice_tail_init(struct list_head *list, #define list_first_entry(ptr, type, member) \ list_entry((ptr)->next, type, member) +/** + * list_first_entry_select - get the first element from list based on condition + * @condition: the condition to choose a particular &struct list head pointer + * @ptr_a: the &struct list_head pointer if @condition is not met. + * @ptr_b: the &struct list_head pointer if @condition is met. + * @type: the type of the struct this is embedded in. + * @member: the name of the list_head within the struct. + */ +#define list_first_entry_select(condition, ptr_a, ptr_b, type, member) \ + list_entry_select((condition), (ptr_a)->next, (ptr_b)->next, type, member) + /** * list_last_entry - get the last element from a list * @ptr: the list head to take the element from. 
@@ -602,6 +625,19 @@ static inline void list_splice_tail_init(struct list_head *list, &pos->member != (head); \ pos = list_next_entry(pos, member)) +/** + * list_for_each_entry_select - iterate over list of given type based on condition + * @condition: the condition to choose a particular &struct list head pointer + * @pos: the type * to use as a loop cursor. + * @head_a: the head for your list if condition is met. + * @head_b: the head for your list if condition is not met. + * @member: the name of the list_head within the struct. + */ +#define list_for_each_entry_select(condition, pos, head_a, head_b, member)\ + for (pos = list_first_entry_select((condition), head_a, head_b, typeof(*pos), member);\ + (condition) ? &pos->member != (head_a) : &pos->member != (head_b);\ + pos = list_next_entry(pos, member)) + /** * list_for_each_entry_reverse - iterate backwards over list of given type. * @pos: the type * to use as a loop cursor. diff --git a/include/linux/msi.h b/include/linux/msi.h index 8e08907d70cb..9c15b7403694 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -130,6 +130,11 @@ struct msi_desc { }; }; +enum platform_msi_type { + NOT_PLAT_MSI = 0, + GEN_PLAT_MSI = 1, +}; + /* Helpers to hide struct msi_desc implementation details */ #define msi_desc_to_dev(desc) ((desc)->dev) #define dev_to_msi_list(dev) (&(dev)->msi_list) @@ -140,6 +145,22 @@ struct msi_desc { #define for_each_msi_entry_safe(desc, tmp, dev) \ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#define dev_to_platform_msi_list(dev) (&(dev)->platform_msi_list) +#define first_platform_msi_entry(dev) \ + list_first_entry(dev_to_platform_msi_list((dev)), struct msi_desc, list) +#define for_each_platform_msi_entry(desc, dev) \ + list_for_each_entry((desc), dev_to_platform_msi_list((dev)), list) +#define for_each_platform_msi_entry_safe(desc, tmp, dev) \ + list_for_each_entry_safe((desc), (tmp), dev_to_platform_msi_list((dev)), list) + +#define first_msi_entry_common(dev) \ + list_first_entry_select((dev)->platform_msi_type, dev_to_platform_msi_list((dev)), \ + dev_to_msi_list((dev)), struct msi_desc, list) + +#define for_each_msi_entry_common(desc, dev) \ + list_for_each_entry_select((dev)->platform_msi_type, desc, dev_to_platform_msi_list((dev)), \ + dev_to_msi_list((dev)), list) \ + #ifdef CONFIG_IRQ_MSI_IOMMU static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) { diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index eb95f6106a1e..bc5f9e32387f 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -320,7 +320,7 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev, struct msi_desc *desc; int ret = 0; - for_each_msi_entry(desc, dev) { + for_each_msi_entry_common(desc, dev) { /* Don't even try the multi-MSI brain damage. */ if (WARN_ON(!desc->irq || desc->nvec_used != 1)) { ret = -EINVAL; @@ -342,7 +342,7 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev, if (ret) { /* Mop up the damage */ - for_each_msi_entry(desc, dev) { + for_each_msi_entry_common(desc, dev) { if (!(desc->irq >= virq && desc->irq < (virq + nvec))) continue; @@ -383,7 +383,7 @@ static bool msi_check_reservation_mode(struct irq_domain *domain, * Checking the first MSI descriptor is sufficient. MSIX supports * masking and MSI does so when the maskbit is set. 
*/ - desc = first_msi_entry(dev); + desc = first_msi_entry_common(dev); return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit; } @@ -411,7 +411,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, if (ret) return ret; - for_each_msi_entry(desc, dev) { + for_each_msi_entry_common(desc, dev) { ops->set_desc(&arg, desc); virq = __irq_domain_alloc_irqs(domain, -1, desc->nvec_used, @@ -437,7 +437,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, can_reserve = msi_check_reservation_mode(domain, info, dev); - for_each_msi_entry(desc, dev) { + for_each_msi_entry_common(desc, dev) { virq = desc->irq; if (desc->nvec_used == 1) dev_dbg(dev, "irq %d for MSI\n", virq); @@ -468,7 +468,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, * so request_irq() will assign the final vector. */ if (can_reserve) { - for_each_msi_entry(desc, dev) { + for_each_msi_entry_common(desc, dev) { irq_data = irq_domain_get_irq_data(domain, desc->irq); irqd_clr_activated(irq_data); } @@ -476,7 +476,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, return 0; cleanup: - for_each_msi_entry(desc, dev) { + for_each_msi_entry_common(desc, dev) { struct irq_data *irqd; if (desc->irq == virq) @@ -500,7 +500,7 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) { struct msi_desc *desc; - for_each_msi_entry(desc, dev) { + for_each_msi_entry_common(desc, dev) { /* * We might have failed to allocate an MSI early * enough that there is no IRQ associated to this From patchwork Tue Apr 21 23:34:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274556 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496KfC0JkSz9sSh for ; Wed, 22 Apr 2020 09:34:27 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726436AbgDUXeM (ORCPT ); Tue, 21 Apr 2020 19:34:12 -0400 Received: from mga03.intel.com ([134.134.136.65]:32238 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726115AbgDUXeL (ORCPT ); Tue, 21 Apr 2020 19:34:11 -0400 IronPort-SDR: HdzxX1T31gR/D7a1x9O9GOH5VEyvlXJtC2l/S/ccA1Mxxd4ggq3YJ2gPOqMbr0pqifyOSstv9D vdF3ALZFct1Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:07 -0700 IronPort-SDR: xxjg9s1XUxfrlmxvbvMt8lEmGxmPguT8DMvlQE+tiLyPmeB+3RNRmwR9VkCnEy3csLXauG4JmA q/YoOCgjOl8Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="245816392" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga007.fm.intel.com with ESMTP; 21 Apr 2020 16:34:05 -0700 Subject: [PATCH RFC 03/15] drivers/base: Allocate/free platform-msi interrupts by group From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, 
tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com
Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org
Date: Tue, 21 Apr 2020 16:34:05 -0700
Message-ID: <158751204550.36773.459505651659406529.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com>

From: Megha Dey

This is a preparatory patch to introduce Interrupt Message Store (IMS).

Dynamic allocation of IMS vectors is a requirement for devices which support Scalable I/O Virtualization. A driver can allocate and free vectors not just once during probe (as is the case with MSI/MSI-X) but also in the post-probe phase, when the actual demand is known. Thus, introduce a new API, platform_msi_domain_alloc_irqs_group(), which drivers using IMS can call multiple times. The vectors allocated by each call to this API are associated with a group ID, starting from 1. To free the vectors associated with a particular group, the platform_msi_domain_free_irqs_group() API can be called.

Existing drivers using the platform-msi infrastructure continue to use the existing alloc (platform_msi_domain_alloc_irqs) and free (platform_msi_domain_free_irqs) APIs and are assigned the default group 0.
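The calling pattern the new API is meant to enable looks roughly like this (sketch only; 'dev' and 'foo_ims_ops' are placeholders for a real device and its platform_msi_ops):

unsigned int group_a, group_b;
int ret;

/* First batch of IMS vectors, requested after probe; group IDs start at 1. */
ret = platform_msi_domain_alloc_irqs_group(dev, 8, &foo_ims_ops, &group_a);
if (ret)
	return ret;

/* A later request gets its own, independently tracked group. */
ret = platform_msi_domain_alloc_irqs_group(dev, 4, &foo_ims_ops, &group_b);

/* Groups can be torn down individually rather than all at once. */
platform_msi_domain_free_irqs_group(dev, group_b);
platform_msi_domain_free_irqs_group(dev, group_a);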
Signed-off-by: Megha Dey --- drivers/base/platform-msi.c | 131 ++++++++++++++++++++++++++++++++----------- include/linux/device.h | 1 include/linux/msi.h | 47 +++++++++++---- kernel/irq/msi.c | 43 +++++++++++--- 4 files changed, 169 insertions(+), 53 deletions(-) diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index b25c52f734dc..2696aa75983b 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -106,16 +106,28 @@ static void platform_msi_update_chip_ops(struct msi_domain_info *info) info->flags &= ~MSI_FLAG_LEVEL_CAPABLE; } -static void platform_msi_free_descs(struct device *dev, int base, int nvec) +static void platform_msi_free_descs(struct device *dev, int base, int nvec, + unsigned int group) { struct msi_desc *desc, *tmp; - - list_for_each_entry_safe(desc, tmp, dev_to_platform_msi_list(dev), - list) { - if (desc->platform.msi_index >= base && - desc->platform.msi_index < (base + nvec)) { - list_del(&desc->list); - free_msi_entry(desc); + struct platform_msi_group_entry *platform_msi_group, + *tmp_platform_msi_group; + + list_for_each_entry_safe(platform_msi_group, tmp_platform_msi_group, + dev_to_platform_msi_group_list(dev), + group_list) { + if (platform_msi_group->group_id == group) { + list_for_each_entry_safe(desc, tmp, + &platform_msi_group->entry_list, + list) { + if (desc->platform.msi_index >= base && + desc->platform.msi_index < (base + nvec)) { + list_del(&desc->list); + free_msi_entry(desc); + } + } + list_del(&platform_msi_group->group_list); + kfree(platform_msi_group); } } } @@ -128,8 +140,8 @@ static int platform_msi_alloc_descs_with_irq(struct device *dev, int virq, struct msi_desc *desc; int i, base = 0; - if (!list_empty(dev_to_platform_msi_list(dev))) { - desc = list_last_entry(dev_to_platform_msi_list(dev), + if (!list_empty(platform_msi_current_group_entry_list(dev))) { + desc = list_last_entry(platform_msi_current_group_entry_list(dev), struct msi_desc, list); base = desc->platform.msi_index + 1; } @@ -143,12 +155,13 @@ static int platform_msi_alloc_descs_with_irq(struct device *dev, int virq, desc->platform.msi_index = base + i; desc->irq = virq ? virq + i : 0; - list_add_tail(&desc->list, dev_to_platform_msi_list(dev)); + list_add_tail(&desc->list, + platform_msi_current_group_entry_list(dev)); } if (i != nvec) { /* Clean up the mess */ - platform_msi_free_descs(dev, base, nvec); + platform_msi_free_descs(dev, base, nvec, dev->group_id); return -ENOMEM; } @@ -214,7 +227,7 @@ platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, } /* Already had a helping of MSI? Greed... 
*/ - if (!list_empty(dev_to_platform_msi_list(dev))) + if (!list_empty(platform_msi_current_group_entry_list(dev))) return ERR_PTR(-EBUSY); datap = kzalloc(sizeof(*datap), GFP_KERNEL); @@ -253,11 +266,36 @@ static void platform_msi_free_priv_data(struct platform_msi_priv_data *data) int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, const struct platform_msi_ops *platform_ops) { + return platform_msi_domain_alloc_irqs_group(dev, nvec, platform_ops, + NULL); +} +EXPORT_SYMBOL_GPL(platform_msi_domain_alloc_irqs); + +int platform_msi_domain_alloc_irqs_group(struct device *dev, unsigned int nvec, + const struct platform_msi_ops *platform_ops, + unsigned int *group_id) +{ + struct platform_msi_group_entry *platform_msi_group; struct platform_msi_priv_data *priv_data; int err; dev->platform_msi_type = GEN_PLAT_MSI; + if (group_id) + *group_id = ++dev->group_id; + + platform_msi_group = kzalloc(sizeof(*platform_msi_group), GFP_KERNEL); + if (!platform_msi_group) { + err = -ENOMEM; + goto out_platform_msi_group; + } + + INIT_LIST_HEAD(&platform_msi_group->group_list); + INIT_LIST_HEAD(&platform_msi_group->entry_list); + platform_msi_group->group_id = dev->group_id; + list_add_tail(&platform_msi_group->group_list, + dev_to_platform_msi_group_list(dev)); + priv_data = platform_msi_alloc_priv_data(dev, nvec, platform_ops); if (IS_ERR(priv_data)) return PTR_ERR(priv_data); @@ -273,13 +311,14 @@ int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, return 0; out_free_desc: - platform_msi_free_descs(dev, 0, nvec); + platform_msi_free_descs(dev, 0, nvec, dev->group_id); out_free_priv_data: platform_msi_free_priv_data(priv_data); - +out_platform_msi_group: + kfree(platform_msi_group); return err; } -EXPORT_SYMBOL_GPL(platform_msi_domain_alloc_irqs); +EXPORT_SYMBOL_GPL(platform_msi_domain_alloc_irqs_group); /** * platform_msi_domain_free_irqs - Free MSI interrupts for @dev @@ -287,17 +326,30 @@ EXPORT_SYMBOL_GPL(platform_msi_domain_alloc_irqs); */ void platform_msi_domain_free_irqs(struct device *dev) { - if (!list_empty(dev_to_platform_msi_list(dev))) { - struct msi_desc *desc; + platform_msi_domain_free_irqs_group(dev, 0); +} +EXPORT_SYMBOL_GPL(platform_msi_domain_free_irqs); - desc = first_platform_msi_entry(dev); - platform_msi_free_priv_data(desc->platform.msi_priv_data); +void platform_msi_domain_free_irqs_group(struct device *dev, unsigned int group) +{ + struct platform_msi_group_entry *platform_msi_group; + + list_for_each_entry(platform_msi_group, + dev_to_platform_msi_group_list((dev)), group_list) { + if (platform_msi_group->group_id == group) { + if (!list_empty(&platform_msi_group->entry_list)) { + struct msi_desc *desc; + + desc = list_first_entry(&(platform_msi_group)->entry_list, + struct msi_desc, list); + platform_msi_free_priv_data(desc->platform.msi_priv_data); + } + } } - - msi_domain_free_irqs(dev->msi_domain, dev); - platform_msi_free_descs(dev, 0, MAX_DEV_MSIS); + msi_domain_free_irqs_group(dev->msi_domain, dev, group); + platform_msi_free_descs(dev, 0, MAX_DEV_MSIS, group); } -EXPORT_SYMBOL_GPL(platform_msi_domain_free_irqs); +EXPORT_SYMBOL_GPL(platform_msi_domain_free_irqs_group); /** * platform_msi_get_host_data - Query the private data associated with @@ -373,15 +425,28 @@ void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq, { struct platform_msi_priv_data *data = domain->host_data; struct msi_desc *desc, *tmp; - for_each_platform_msi_entry_safe(desc, tmp, data->dev) { - if (WARN_ON(!desc->irq || desc->nvec_used 
!= 1)) - return; - if (!(desc->irq >= virq && desc->irq < (virq + nvec))) - continue; - - irq_domain_free_irqs_common(domain, desc->irq, 1); - list_del(&desc->list); - free_msi_entry(desc); + struct platform_msi_group_entry *platform_msi_group, + *tmp_platform_msi_group; + + list_for_each_entry_safe(platform_msi_group, tmp_platform_msi_group, + dev_to_platform_msi_group_list(data->dev), + group_list) { + if (platform_msi_group->group_id == data->dev->group_id) { + list_for_each_entry_safe(desc, tmp, + &platform_msi_group->entry_list, + list) { + if (WARN_ON(!desc->irq || desc->nvec_used != 1)) + return; + if (!(desc->irq >= virq && desc->irq < (virq + nvec))) + continue; + + irq_domain_free_irqs_common(domain, desc->irq, 1); + list_del(&desc->list); + free_msi_entry(desc); + } + list_del(&platform_msi_group->group_list); + kfree(platform_msi_group); + } } } diff --git a/include/linux/device.h b/include/linux/device.h index cbcecb14584e..f6700b85eb95 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -624,6 +624,7 @@ struct device { defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) bool dma_coherent:1; #endif + unsigned int group_id; }; static inline struct device *kobj_to_dev(struct kobject *kobj) diff --git a/include/linux/msi.h b/include/linux/msi.h index 9c15b7403694..3890b143b04d 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -135,6 +135,12 @@ enum platform_msi_type { GEN_PLAT_MSI = 1, }; +struct platform_msi_group_entry { + unsigned int group_id; + struct list_head group_list; + struct list_head entry_list; +}; + /* Helpers to hide struct msi_desc implementation details */ #define msi_desc_to_dev(desc) ((desc)->dev) #define dev_to_msi_list(dev) (&(dev)->msi_list) @@ -145,21 +151,31 @@ enum platform_msi_type { #define for_each_msi_entry_safe(desc, tmp, dev) \ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) -#define dev_to_platform_msi_list(dev) (&(dev)->platform_msi_list) -#define first_platform_msi_entry(dev) \ - list_first_entry(dev_to_platform_msi_list((dev)), struct msi_desc, list) -#define for_each_platform_msi_entry(desc, dev) \ - list_for_each_entry((desc), dev_to_platform_msi_list((dev)), list) -#define for_each_platform_msi_entry_safe(desc, tmp, dev) \ - list_for_each_entry_safe((desc), (tmp), dev_to_platform_msi_list((dev)), list) +#define dev_to_platform_msi_group_list(dev) (&(dev)->platform_msi_list) + +#define first_platform_msi_group_entry(dev) \ + list_first_entry(dev_to_platform_msi_group_list((dev)), \ + struct platform_msi_group_entry, group_list) -#define first_msi_entry_common(dev) \ - list_first_entry_select((dev)->platform_msi_type, dev_to_platform_msi_list((dev)), \ +#define platform_msi_current_group_entry_list(dev) \ + (&((list_last_entry(dev_to_platform_msi_group_list((dev)), \ + struct platform_msi_group_entry, \ + group_list))->entry_list)) + +#define first_msi_entry_current_group(dev) \ + list_first_entry_select((dev)->platform_msi_type, \ + platform_msi_current_group_entry_list((dev)), \ dev_to_msi_list((dev)), struct msi_desc, list) -#define for_each_msi_entry_common(desc, dev) \ - list_for_each_entry_select((dev)->platform_msi_type, desc, dev_to_platform_msi_list((dev)), \ - dev_to_msi_list((dev)), list) \ +#define for_each_msi_entry_current_group(desc, dev) \ + list_for_each_entry_select((dev)->platform_msi_type, desc, \ + platform_msi_current_group_entry_list((dev)),\ + dev_to_msi_list((dev)), list) + +#define for_each_platform_msi_entry_in_group(desc, platform_msi_group, group, dev) \ + 
list_for_each_entry((platform_msi_group), dev_to_platform_msi_group_list((dev)), group_list) \ + if (((platform_msi_group)->group_id) == (group)) \ + list_for_each_entry((desc), (&(platform_msi_group)->entry_list), list) #ifdef CONFIG_IRQ_MSI_IOMMU static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) @@ -363,6 +379,8 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode, int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec); void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev); +void msi_domain_free_irqs_group(struct irq_domain *domain, + struct device *dev, unsigned int group); struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain); struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode, @@ -371,6 +389,11 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode, int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, const struct platform_msi_ops *platform_ops); void platform_msi_domain_free_irqs(struct device *dev); +int platform_msi_domain_alloc_irqs_group(struct device *dev, unsigned int nvec, + const struct platform_msi_ops *platform_ops, + unsigned int *group_id); +void platform_msi_domain_free_irqs_group(struct device *dev, + unsigned int group_id); /* When an MSI domain is used as an intermediate domain */ int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev, diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index bc5f9e32387f..899ade394ec8 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -320,7 +320,7 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev, struct msi_desc *desc; int ret = 0; - for_each_msi_entry_common(desc, dev) { + for_each_msi_entry_current_group(desc, dev) { /* Don't even try the multi-MSI brain damage. */ if (WARN_ON(!desc->irq || desc->nvec_used != 1)) { ret = -EINVAL; @@ -342,7 +342,7 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev, if (ret) { /* Mop up the damage */ - for_each_msi_entry_common(desc, dev) { + for_each_msi_entry_current_group(desc, dev) { if (!(desc->irq >= virq && desc->irq < (virq + nvec))) continue; @@ -383,7 +383,7 @@ static bool msi_check_reservation_mode(struct irq_domain *domain, * Checking the first MSI descriptor is sufficient. MSIX supports * masking and MSI does so when the maskbit is set. */ - desc = first_msi_entry_common(dev); + desc = first_msi_entry_current_group(dev); return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit; } @@ -411,7 +411,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, if (ret) return ret; - for_each_msi_entry_common(desc, dev) { + for_each_msi_entry_current_group(desc, dev) { ops->set_desc(&arg, desc); virq = __irq_domain_alloc_irqs(domain, -1, desc->nvec_used, @@ -437,7 +437,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, can_reserve = msi_check_reservation_mode(domain, info, dev); - for_each_msi_entry_common(desc, dev) { + for_each_msi_entry_current_group(desc, dev) { virq = desc->irq; if (desc->nvec_used == 1) dev_dbg(dev, "irq %d for MSI\n", virq); @@ -468,7 +468,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, * so request_irq() will assign the final vector. 
*/ if (can_reserve) { - for_each_msi_entry_common(desc, dev) { + for_each_msi_entry_current_group(desc, dev) { irq_data = irq_domain_get_irq_data(domain, desc->irq); irqd_clr_activated(irq_data); } @@ -476,7 +476,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, return 0; cleanup: - for_each_msi_entry_common(desc, dev) { + for_each_msi_entry_current_group(desc, dev) { struct irq_data *irqd; if (desc->irq == virq) @@ -500,7 +500,34 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) { struct msi_desc *desc; - for_each_msi_entry_common(desc, dev) { + for_each_msi_entry_current_group(desc, dev) { + /* + * We might have failed to allocate an MSI early + * enough that there is no IRQ associated to this + * entry. If that's the case, don't do anything. + */ + if (desc->irq) { + irq_domain_free_irqs(desc->irq, desc->nvec_used); + desc->irq = 0; + } + } +} + +/** + * msi_domain_free_irqs_group - Free interrupts from a MSI interrupt @domain + * associated to @dev from a particular group + * @domain: The domain to managing the interrupts + * @dev: Pointer to device struct of the device for which the interrupts + * are free + * @group: group from which interrupts are to be freed + */ +void msi_domain_free_irqs_group(struct irq_domain *domain, + struct device *dev, unsigned int group) +{ + struct msi_desc *desc; + struct platform_msi_group_entry *platform_msi_group; + + for_each_platform_msi_entry_in_group(desc, platform_msi_group, group, dev) { /* * We might have failed to allocate an MSI early * enough that there is no IRQ associated to this From patchwork Tue Apr 21 23:34:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274554 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496Kf4335mz9sSM for ; Wed, 22 Apr 2020 09:34:20 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726457AbgDUXeO (ORCPT ); Tue, 21 Apr 2020 19:34:14 -0400 Received: from mga12.intel.com ([192.55.52.136]:35979 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726451AbgDUXeN (ORCPT ); Tue, 21 Apr 2020 19:34:13 -0400 IronPort-SDR: q2b4Th76ee9wY0QAa3NwC4KU9b4PA+vGVwqMqBJ2ZZUlPKgO/ZC/3fShqYN/6qCPRXyzpyIIew qPgtB2ZA5tPw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:13 -0700 IronPort-SDR: PQBV/gzbJQ/w8O+RLeKfbPQdNhHVGd1il12nwqmpVqW+q2dT7TaiYAYDJhaZEv5sGpig9gT3Xb mQh/QnTXQjsw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="245816413" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga007.fm.intel.com with ESMTP; 21 Apr 2020 16:34:11 -0700 Subject: [PATCH RFC 04/15] drivers/base: Add support for a new IMS irq domain From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, 
gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:34:11 -0700 Message-ID: <158751205175.36773.1874642824360728883.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Megha Dey Add support for the creation of a new IMS irq domain. It creates a new irq chip associated with the IMS domain and adds the necessary domain operations to it. Also, add a new config option MSI_IMS which must be enabled by any driver who would want to use the IMS infrastructure. Signed-off-by: Megha Dey --- arch/x86/include/asm/hw_irq.h | 7 +++ drivers/base/Kconfig | 9 +++ drivers/base/Makefile | 1 drivers/base/ims-msi.c | 100 ++++++++++++++++++++++++++++++++++++++ drivers/base/platform-msi.c | 6 +- drivers/vfio/mdev/mdev_core.c | 6 ++ drivers/vfio/mdev/mdev_private.h | 1 include/linux/mdev.h | 3 + include/linux/msi.h | 2 + 9 files changed, 131 insertions(+), 4 deletions(-) create mode 100644 drivers/base/ims-msi.c diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 4154bc5f6a4e..2e355aa6ba50 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -62,6 +62,7 @@ enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_MSIX, X86_IRQ_ALLOC_TYPE_DMAR, X86_IRQ_ALLOC_TYPE_UV, + X86_IRQ_ALLOC_TYPE_IMS, }; struct irq_alloc_info { @@ -83,6 +84,12 @@ struct irq_alloc_info { irq_hw_number_t msi_hwirq; }; #endif +#ifdef CONFIG_MSI_IMS + struct { + struct device *dev; + irq_hw_number_t ims_hwirq; + }; +#endif #ifdef CONFIG_X86_IO_APIC struct { int ioapic_id; diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 5f0bc74d2409..877e0fdee013 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -209,4 +209,13 @@ config GENERIC_ARCH_TOPOLOGY appropriate scaling, sysfs interface for reading capacity values at runtime. +config MSI_IMS + bool "Device Specific Interrupt Message Storage (IMS)" + depends on X86 + select GENERIC_MSI_IRQ_DOMAIN + select IRQ_REMAP + help + This allows device drivers to enable device specific + interrupt message storage (IMS) besides standard MSI-X interrupts. + endmenu diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..659b9b0c0b8a 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -22,6 +22,7 @@ obj-$(CONFIG_SOC_BUS) += soc.o obj-$(CONFIG_PINCTRL) += pinctrl.o obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o obj-$(CONFIG_GENERIC_MSI_IRQ_DOMAIN) += platform-msi.o +obj-$(CONFIG_MSI_IMS) += ims-msi.o obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o obj-y += test/ diff --git a/drivers/base/ims-msi.c b/drivers/base/ims-msi.c new file mode 100644 index 000000000000..738f6d153155 --- /dev/null +++ b/drivers/base/ims-msi.c @@ -0,0 +1,100 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Support for Device Specific IMS interrupts. 
+ * + * Copyright © 2019 Intel Corporation. + * + * Author: Megha Dey + */ + +#include +#include +#include +#include + +/* + * Determine if a dev is mdev or not. Return NULL if not mdev device. + * Return mdev's parent dev if success. + */ +static inline struct device *mdev_to_parent(struct device *dev) +{ + struct device *ret = NULL; + struct device *(*fn)(struct device *dev); + struct bus_type *bus = symbol_get(mdev_bus_type); + + if (bus && dev->bus == bus) { + fn = symbol_get(mdev_dev_to_parent_dev); + ret = fn(dev); + symbol_put(mdev_dev_to_parent_dev); + symbol_put(mdev_bus_type); + } + + return ret; +} + +static irq_hw_number_t dev_ims_get_hwirq(struct msi_domain_info *info, + msi_alloc_info_t *arg) +{ + return arg->ims_hwirq; +} + +static int dev_ims_prepare(struct irq_domain *domain, struct device *dev, + int nvec, msi_alloc_info_t *arg) +{ + if (dev_is_mdev(dev)) + dev = mdev_to_parent(dev); + + init_irq_alloc_info(arg, NULL); + arg->dev = dev; + arg->type = X86_IRQ_ALLOC_TYPE_IMS; + + return 0; +} + +static void dev_ims_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) +{ + arg->ims_hwirq = platform_msi_calc_hwirq(desc); +} + +static struct msi_domain_ops dev_ims_domain_ops = { + .get_hwirq = dev_ims_get_hwirq, + .msi_prepare = dev_ims_prepare, + .set_desc = dev_ims_set_desc, +}; + +static struct irq_chip dev_ims_ir_controller = { + .name = "IR-DEV-IMS", + .irq_ack = irq_chip_ack_parent, + .irq_retrigger = irq_chip_retrigger_hierarchy, + .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent, + .flags = IRQCHIP_SKIP_SET_WAKE, + .irq_write_msi_msg = platform_msi_write_msg, +}; + +static struct msi_domain_info ims_ir_domain_info = { + .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS, + .ops = &dev_ims_domain_ops, + .chip = &dev_ims_ir_controller, + .handler = handle_edge_irq, + .handler_name = "edge", +}; + +struct irq_domain *arch_create_ims_irq_domain(struct irq_domain *parent, + const char *name) +{ + struct fwnode_handle *fn; + struct irq_domain *domain; + + fn = irq_domain_alloc_named_fwnode(name); + if (!fn) + return NULL; + + domain = msi_create_irq_domain(fn, &ims_ir_domain_info, parent); + if (!domain) + return NULL; + + irq_domain_update_bus_token(domain, DOMAIN_BUS_PLATFORM_MSI); + irq_domain_free_fwnode(fn); + + return domain; +} diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index 2696aa75983b..59160e8cbfb1 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -31,12 +31,11 @@ struct platform_msi_priv_data { /* The devid allocator */ static DEFINE_IDA(platform_msi_devid_ida); -#ifdef GENERIC_MSI_DOMAIN_OPS /* * Convert an msi_desc to a globaly unique identifier (per-device * devid + msi_desc position in the msi_list). 
*/ -static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc) +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc) { u32 devid; @@ -45,6 +44,7 @@ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc) return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index; } +#ifdef GENERIC_MSI_DOMAIN_OPS static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) { arg->desc = desc; @@ -76,7 +76,7 @@ static void platform_msi_update_dom_ops(struct msi_domain_info *info) ops->set_desc = platform_msi_set_desc; } -static void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg) +void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg) { struct msi_desc *desc = irq_data_get_msi_desc(data); struct platform_msi_priv_data *priv_data; diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index b558d4cfd082..cecc6a6bdbef 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -33,6 +33,12 @@ struct device *mdev_parent_dev(struct mdev_device *mdev) } EXPORT_SYMBOL(mdev_parent_dev); +struct device *mdev_dev_to_parent_dev(struct device *dev) +{ + return to_mdev_device(dev)->parent->dev; +} +EXPORT_SYMBOL(mdev_dev_to_parent_dev); + void *mdev_get_drvdata(struct mdev_device *mdev) { return mdev->driver_data; diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h index 7d922950caaf..c21f1305a76b 100644 --- a/drivers/vfio/mdev/mdev_private.h +++ b/drivers/vfio/mdev/mdev_private.h @@ -36,7 +36,6 @@ struct mdev_device { }; #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) -#define dev_is_mdev(d) ((d)->bus == &mdev_bus_type) struct mdev_type { struct kobject kobj; diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 0ce30ca78db0..fa2344e239ef 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -144,5 +144,8 @@ void mdev_unregister_driver(struct mdev_driver *drv); struct device *mdev_parent_dev(struct mdev_device *mdev); struct device *mdev_dev(struct mdev_device *mdev); struct mdev_device *mdev_from_dev(struct device *dev); +struct device *mdev_dev_to_parent_dev(struct device *dev); + +#define dev_is_mdev(dev) ((dev)->bus == symbol_get(mdev_bus_type)) #endif /* MDEV_H */ diff --git a/include/linux/msi.h b/include/linux/msi.h index 3890b143b04d..80386468a7bc 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -418,6 +418,8 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq, void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq, unsigned int nvec); void *platform_msi_get_host_data(struct irq_domain *domain); +irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc); +void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg); #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */ #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN From patchwork Tue Apr 21 23:34:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274555 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) 
by ozlabs.org (Postfix) with ESMTP id 496KfB3hNxz9sSM for ; Wed, 22 Apr 2020 09:34:26 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726485AbgDUXeV (ORCPT ); Tue, 21 Apr 2020 19:34:21 -0400 Received: from mga09.intel.com ([134.134.136.24]:48536 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726482AbgDUXeU (ORCPT ); Tue, 21 Apr 2020 19:34:20 -0400 IronPort-SDR: 3/rtuIF31oM0OaVEbKUi3KRsrafnF7Z4rjs0j2KcO80VIWnO4OFucTqNf+DjuJUWYnJ0IW80l5 QZAxJ2Wq17RA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:19 -0700 IronPort-SDR: qNc57k773vsWijzuXENpuH2d5vSs9y4rvinSUip2tAmwqhi78IwDN9RiECbY2+K4Mqr/brXoBe oJHkeYg+ZLlA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="290640817" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga002.fm.intel.com with ESMTP; 21 Apr 2020 16:34:18 -0700 Subject: [PATCH RFC 05/15] ims-msi: Add mask/unmask routines From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:34:17 -0700 Message-ID: <158751205785.36773.16321096654677399376.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Megha Dey Introduce the mask/unmask functions which would be used as callbacks to the IRQ chip associated with the IMS domain. 
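To illustrate how a device driver is expected to wire these callbacks up (a hedged sketch, not code from this patch): the driver supplies them through the struct platform_msi_ops introduced earlier in this series. The ims_* helpers and the device mask register below are hypothetical; the callback shape assumes, as the __dev_ims_desc_mask_irq() helper in this patch expects, that irq_mask/irq_unmask take the msi_desc and return the updated mask state that gets recorded in desc->platform.masked.

    #include <linux/msi.h>

    /* Hedged sketch; the ims_* names and the mask-bit handling are hypothetical. */
    static u32 ims_irq_mask(struct msi_desc *desc)
    {
            /* Set the device-specific per-vector mask bit, return new state. */
            return 1;
    }

    static u32 ims_irq_unmask(struct msi_desc *desc)
    {
            /* Clear the device-specific per-vector mask bit, return new state. */
            return 0;
    }

    static void ims_write_msg(struct msi_desc *desc, struct msi_msg *msg)
    {
            /* Program msg into the device-specific IMS storage for this vector. */
    }

    static const struct platform_msi_ops ims_ops = {
            .irq_mask   = ims_irq_mask,
            .irq_unmask = ims_irq_unmask,
            .write_msg  = ims_write_msg,
    };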
Signed-off-by: Megha Dey --- drivers/base/ims-msi.c | 47 +++++++++++++++++++++++++++++++++++++++++++ drivers/base/platform-msi.c | 12 ----------- include/linux/msi.h | 14 +++++++++++++ 3 files changed, 61 insertions(+), 12 deletions(-) diff --git a/drivers/base/ims-msi.c b/drivers/base/ims-msi.c index 738f6d153155..896a5a1b2252 100644 --- a/drivers/base/ims-msi.c +++ b/drivers/base/ims-msi.c @@ -7,11 +7,56 @@ * Author: Megha Dey */ +#include #include +#include #include #include +#include #include +static u32 __dev_ims_desc_mask_irq(struct msi_desc *desc, u32 flag) +{ + u32 mask_bits = desc->platform.masked; + const struct platform_msi_ops *ops; + + ops = desc->platform.msi_priv_data->ops; + if (!ops) + return 0; + + if (flag) { + if (ops->irq_mask) + mask_bits = ops->irq_mask(desc); + } else { + if (ops->irq_unmask) + mask_bits = ops->irq_unmask(desc); + } + + return mask_bits; +} + +/** + * dev_ims_mask_irq - Generic irq chip callback to mask IMS interrupts + * @data: pointer to irqdata associated to that interrupt + */ +static void dev_ims_mask_irq(struct irq_data *data) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + + desc->platform.masked = __dev_ims_desc_mask_irq(desc, 1); +} + +/** + * dev_msi_unmask_irq - Generic irq chip callback to unmask IMS interrupts + * @data: pointer to irqdata associated to that interrupt + */ +void dev_ims_unmask_irq(struct irq_data *data) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + + desc->platform.masked = __dev_ims_desc_mask_irq(desc, 0); +} + /* * Determine if a dev is mdev or not. Return NULL if not mdev device. * Return mdev's parent dev if success. @@ -69,6 +114,8 @@ static struct irq_chip dev_ims_ir_controller = { .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent, .flags = IRQCHIP_SKIP_SET_WAKE, .irq_write_msi_msg = platform_msi_write_msg, + .irq_unmask = dev_ims_unmask_irq, + .irq_mask = dev_ims_mask_irq, }; static struct msi_domain_info ims_ir_domain_info = { diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index 59160e8cbfb1..6d8840db4a85 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -16,18 +16,6 @@ #define DEV_ID_SHIFT 21 #define MAX_DEV_MSIS (1 << (32 - DEV_ID_SHIFT)) -/* - * Internal data structure containing a (made up, but unique) devid - * and the callback to write the MSI message. - */ -struct platform_msi_priv_data { - struct device *dev; - void *host_data; - msi_alloc_info_t arg; - const struct platform_msi_ops *ops; - int devid; -}; - /* The devid allocator */ static DEFINE_IDA(platform_msi_devid_ida); diff --git a/include/linux/msi.h b/include/linux/msi.h index 80386468a7bc..8b5f24bf3c47 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -33,10 +33,12 @@ typedef void (*irq_write_msi_msg_t)(struct msi_desc *desc, * platform_msi_desc - Platform device specific msi descriptor data * @msi_priv_data: Pointer to platform private data * @msi_index: The index of the MSI descriptor for multi MSI + * @masked: mask bits */ struct platform_msi_desc { struct platform_msi_priv_data *msi_priv_data; u16 msi_index; + u32 masked; }; /** @@ -370,6 +372,18 @@ struct platform_msi_ops { irq_write_msi_msg_t write_msg; }; +/* + * Internal data structure containing a (made up, but unique) devid + * and the callback to write the MSI message. 
+ */ +struct platform_msi_priv_data { + struct device *dev; + void *host_data; + msi_alloc_info_t arg; + const struct platform_msi_ops *ops; + int devid; +}; + int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force); From patchwork Tue Apr 21 23:34:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274557 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496KfL2DYJz9sSM for ; Wed, 22 Apr 2020 09:34:34 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726507AbgDUXe0 (ORCPT ); Tue, 21 Apr 2020 19:34:26 -0400 Received: from mga05.intel.com ([192.55.52.43]:10128 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726501AbgDUXe0 (ORCPT ); Tue, 21 Apr 2020 19:34:26 -0400 IronPort-SDR: IaBF5dqT1TOrZzibrxbz5zeoww6iFNj8voSIdt+DCQ9qIcyVYK0mIAmSj56LIIQsnO0mfHIgDt dVeNmTJGmt8w== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:25 -0700 IronPort-SDR: AwaYRlSuxOFxY46ppb2H+gAM1iWEA3fYWpl3hjCwgCOSgdSf8b2fA6i7FeamUksBxwR62OuXfv M86Ol0mkLVng== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="365505910" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga001.fm.intel.com with ESMTP; 21 Apr 2020 16:34:24 -0700 Subject: [PATCH RFC 06/15] ims-msi: Enable IMS interrupts From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:34:24 -0700 Message-ID: <158751206394.36773.12409950149228811741.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Megha Dey To enable IMS interrupts, 1. create an IMS irqdomain (arch_create_ims_irq_domain()) associated with the interrupt remapping unit. 2. 
Add 'IMS' to the enum platform_msi_type to differentiate between specific actions required for different types of platform-msi, currently generic platform-msi and IMS Signed-off-by: Megha Dey --- arch/x86/include/asm/irq_remapping.h | 6 ++++ drivers/base/ims-msi.c | 15 ++++++++++ drivers/base/platform-msi.c | 51 +++++++++++++++++++++++++--------- drivers/iommu/intel-iommu.c | 2 + drivers/iommu/intel_irq_remapping.c | 31 +++++++++++++++++++-- include/linux/intel-iommu.h | 3 ++ include/linux/msi.h | 9 ++++++ 7 files changed, 100 insertions(+), 17 deletions(-) diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h index 4bc985f1e2e4..575e48c31b78 100644 --- a/arch/x86/include/asm/irq_remapping.h +++ b/arch/x86/include/asm/irq_remapping.h @@ -53,6 +53,12 @@ irq_remapping_get_irq_domain(struct irq_alloc_info *info); extern struct irq_domain * arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id); +/* Create IMS irqdomain, use @parent as the parent irqdomain. */ +#ifdef CONFIG_MSI_IMS +extern struct irq_domain *arch_create_ims_irq_domain(struct irq_domain *parent, + const char *name); +#endif + /* Get parent irqdomain for interrupt remapping irqdomain */ static inline struct irq_domain *arch_get_ir_parent_domain(void) { diff --git a/drivers/base/ims-msi.c b/drivers/base/ims-msi.c index 896a5a1b2252..ac21088bcb83 100644 --- a/drivers/base/ims-msi.c +++ b/drivers/base/ims-msi.c @@ -14,6 +14,7 @@ #include #include #include +#include static u32 __dev_ims_desc_mask_irq(struct msi_desc *desc, u32 flag) { @@ -101,6 +102,20 @@ static void dev_ims_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) arg->ims_hwirq = platform_msi_calc_hwirq(desc); } +struct irq_domain *dev_get_ims_domain(struct device *dev) +{ + struct irq_alloc_info info; + + if (dev_is_mdev(dev)) + dev = mdev_to_parent(dev); + + init_irq_alloc_info(&info, NULL); + info.type = X86_IRQ_ALLOC_TYPE_IMS; + info.dev = dev; + + return irq_remapping_get_irq_domain(&info); +} + static struct msi_domain_ops dev_ims_domain_ops = { .get_hwirq = dev_ims_get_hwirq, .msi_prepare = dev_ims_prepare, diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index 6d8840db4a85..204ce8041c17 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -118,6 +118,8 @@ static void platform_msi_free_descs(struct device *dev, int base, int nvec, kfree(platform_msi_group); } } + + dev->platform_msi_type = 0; } static int platform_msi_alloc_descs_with_irq(struct device *dev, int virq, @@ -205,18 +207,22 @@ platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, * accordingly (which would impact the max number of MSI * capable devices). */ - if (!dev->msi_domain || !platform_ops->write_msg || !nvec || - nvec > MAX_DEV_MSIS) + if (!platform_ops->write_msg || !nvec || nvec > MAX_DEV_MSIS) return ERR_PTR(-EINVAL); - if (dev->msi_domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) { - dev_err(dev, "Incompatible msi_domain, giving up\n"); - return ERR_PTR(-EINVAL); - } + if (dev->platform_msi_type == GEN_PLAT_MSI) { + if (!dev->msi_domain) + return ERR_PTR(-EINVAL); + + if (dev->msi_domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) { + dev_err(dev, "Incompatible msi_domain, giving up\n"); + return ERR_PTR(-EINVAL); + } - /* Already had a helping of MSI? Greed... */ - if (!list_empty(platform_msi_current_group_entry_list(dev))) - return ERR_PTR(-EBUSY); + /* Already had a helping of MSI? Greed... 
*/ + if (!list_empty(platform_msi_current_group_entry_list(dev))) + return ERR_PTR(-EBUSY); + } datap = kzalloc(sizeof(*datap), GFP_KERNEL); if (!datap) @@ -254,6 +260,7 @@ static void platform_msi_free_priv_data(struct platform_msi_priv_data *data) int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, const struct platform_msi_ops *platform_ops) { + dev->platform_msi_type = GEN_PLAT_MSI; return platform_msi_domain_alloc_irqs_group(dev, nvec, platform_ops, NULL); } @@ -265,12 +272,18 @@ int platform_msi_domain_alloc_irqs_group(struct device *dev, unsigned int nvec, { struct platform_msi_group_entry *platform_msi_group; struct platform_msi_priv_data *priv_data; + struct irq_domain *domain; int err; - dev->platform_msi_type = GEN_PLAT_MSI; - - if (group_id) + if (!dev->platform_msi_type) { *group_id = ++dev->group_id; + dev->platform_msi_type = IMS; + domain = dev_get_ims_domain(dev); + if (!domain) + return -ENOSYS; + } else { + domain = dev->msi_domain; + } platform_msi_group = kzalloc(sizeof(*platform_msi_group), GFP_KERNEL); if (!platform_msi_group) { @@ -292,10 +305,11 @@ int platform_msi_domain_alloc_irqs_group(struct device *dev, unsigned int nvec, if (err) goto out_free_priv_data; - err = msi_domain_alloc_irqs(dev->msi_domain, dev, nvec); + err = msi_domain_alloc_irqs(domain, dev, nvec); if (err) goto out_free_desc; + dev->platform_msi_type = 0; return 0; out_free_desc: @@ -314,6 +328,7 @@ EXPORT_SYMBOL_GPL(platform_msi_domain_alloc_irqs_group); */ void platform_msi_domain_free_irqs(struct device *dev) { + dev->platform_msi_type = GEN_PLAT_MSI; platform_msi_domain_free_irqs_group(dev, 0); } EXPORT_SYMBOL_GPL(platform_msi_domain_free_irqs); @@ -321,6 +336,14 @@ EXPORT_SYMBOL_GPL(platform_msi_domain_free_irqs); void platform_msi_domain_free_irqs_group(struct device *dev, unsigned int group) { struct platform_msi_group_entry *platform_msi_group; + struct irq_domain *domain; + + if (!dev->platform_msi_type) { + dev->platform_msi_type = IMS; + domain = dev_get_ims_domain(dev); + } else { + domain = dev->msi_domain; + } list_for_each_entry(platform_msi_group, dev_to_platform_msi_group_list((dev)), group_list) { @@ -334,7 +357,7 @@ void platform_msi_domain_free_irqs_group(struct device *dev, unsigned int group) } } } - msi_domain_free_irqs_group(dev->msi_domain, dev, group); + msi_domain_free_irqs_group(domain, dev, group); platform_msi_free_descs(dev, 0, MAX_DEV_MSIS, group); } EXPORT_SYMBOL_GPL(platform_msi_domain_free_irqs_group); diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index ef0a5246700e..99bb238caea6 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -794,7 +794,7 @@ is_downstream_to_pci_bridge(struct device *dev, struct device *bridge) return false; } -static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn) +struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn) { struct dmar_drhd_unit *drhd = NULL; struct intel_iommu *iommu; diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c index 81e43c1df7ec..1e470c9c3e7d 100644 --- a/drivers/iommu/intel_irq_remapping.c +++ b/drivers/iommu/intel_irq_remapping.c @@ -234,6 +234,18 @@ static struct intel_iommu *map_dev_to_ir(struct pci_dev *dev) return drhd->iommu; } +static struct intel_iommu *map_gen_dev_to_ir(struct device *dev) +{ + struct intel_iommu *iommu; + u8 bus, devfn; + + iommu = device_to_iommu(dev, &bus, &devfn); + if (!iommu) + return NULL; + + return iommu; +} + static int 
clear_entries(struct irq_2_iommu *irq_iommu) { struct irte *start, *entry, *end; @@ -572,6 +584,10 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu) arch_create_remap_msi_irq_domain(iommu->ir_domain, "INTEL-IR-MSI", iommu->seq_id); +#if IS_ENABLED(CONFIG_MSI_IMS) + iommu->ir_ims_domain = arch_create_ims_irq_domain(iommu->ir_domain, + "INTEL-IR-IMS"); +#endif ir_table->base = page_address(pages); ir_table->bitmap = bitmap; @@ -637,6 +653,10 @@ static void intel_teardown_irq_remapping(struct intel_iommu *iommu) irq_domain_remove(iommu->ir_domain); iommu->ir_domain = NULL; } + if (iommu->ir_ims_domain) { + irq_domain_remove(iommu->ir_ims_domain); + iommu->ir_ims_domain = NULL; + } free_pages((unsigned long)iommu->ir_table->base, INTR_REMAP_PAGE_ORDER); bitmap_free(iommu->ir_table->bitmap); @@ -1132,6 +1152,11 @@ static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info) if (iommu) return iommu->ir_msi_domain; break; + case X86_IRQ_ALLOC_TYPE_IMS: + iommu = map_gen_dev_to_ir(info->dev); + if (iommu) + return iommu->ir_ims_domain; + break; default: break; } @@ -1299,9 +1324,10 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, case X86_IRQ_ALLOC_TYPE_HPET: case X86_IRQ_ALLOC_TYPE_MSI: case X86_IRQ_ALLOC_TYPE_MSIX: + case X86_IRQ_ALLOC_TYPE_IMS: if (info->type == X86_IRQ_ALLOC_TYPE_HPET) set_hpet_sid(irte, info->hpet_id); - else + else if (info->type != X86_IRQ_ALLOC_TYPE_IMS) set_msi_sid(irte, info->msi_dev); msg->address_hi = MSI_ADDR_BASE_HI; @@ -1354,7 +1380,8 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain, if (!info || !iommu) return -EINVAL; if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI && - info->type != X86_IRQ_ALLOC_TYPE_MSIX) + info->type != X86_IRQ_ALLOC_TYPE_MSIX && + info->type != X86_IRQ_ALLOC_TYPE_IMS) return -EINVAL; /* diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 980234ae0312..cdaab83001da 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -557,6 +557,7 @@ struct intel_iommu { struct ir_table *ir_table; /* Interrupt remapping info */ struct irq_domain *ir_domain; struct irq_domain *ir_msi_domain; + struct irq_domain *ir_ims_domain; #endif struct iommu_device iommu; /* IOMMU core code handle */ int node; @@ -701,6 +702,8 @@ extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev); static inline void intel_svm_check(struct intel_iommu *iommu) {} #endif +extern struct intel_iommu *device_to_iommu(struct device *dev, + u8 *bus, u8 *devfn); #ifdef CONFIG_INTEL_IOMMU_DEBUGFS void intel_iommu_debugfs_init(void); #else diff --git a/include/linux/msi.h b/include/linux/msi.h index 8b5f24bf3c47..2f8fa1391333 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -135,6 +135,7 @@ struct msi_desc { enum platform_msi_type { NOT_PLAT_MSI = 0, GEN_PLAT_MSI = 1, + IMS = 2, }; struct platform_msi_group_entry { @@ -454,4 +455,12 @@ static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev) } #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */ +#ifdef CONFIG_MSI_IMS +struct irq_domain *dev_get_ims_domain(struct device *dev); +#else +static inline struct irq_domain *dev_get_ims_domain(struct device *dev) +{ + return NULL; +} +#endif #endif /* LINUX_MSI_H */ From patchwork Tue Apr 21 23:34:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274558 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: 
patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496KfN0Vhmz9sSq for ; Wed, 22 Apr 2020 09:34:36 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726501AbgDUXee (ORCPT ); Tue, 21 Apr 2020 19:34:34 -0400 Received: from mga03.intel.com ([134.134.136.65]:32263 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726523AbgDUXed (ORCPT ); Tue, 21 Apr 2020 19:34:33 -0400 IronPort-SDR: XKt8+CmwbvTeu8YVOgtiBR8DD53e4OzgN+fdN9Cs9ufwlCyu1p5qx+BVBN1nP1gvKIXkiRQ5iV GLxkJD4UtHMw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:31 -0700 IronPort-SDR: LzDL2DAv1j0ko+h7VF5KExRDU0seEEepGKoH05bc3fs903vHwy6jPKea6VkgpiuFEy+/iBa+Ar uypCG7CXjgoA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="279818733" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga004.fm.intel.com with ESMTP; 21 Apr 2020 16:34:30 -0700 Subject: [PATCH RFC 07/15] Documentation: Interrupt Message store From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:34:30 -0700 Message-ID: <158751207000.36773.18208950543781892.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Megha Dey Add documentation for interrupt message store. This documentation describes the basics of Interrupt Message Store (IMS), the need to introduce a new interrupt mechanism, implementation details in the kernel, driver changes required to support IMS and the general misconceptions and FAQs associated with IMS. Currently the only consumer of the newly introduced IMS APIs is Intel's Data Streaming Accelerator. Signed-off-by: Megha Dey --- Documentation/ims-howto.rst | 210 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 210 insertions(+) create mode 100644 Documentation/ims-howto.rst diff --git a/Documentation/ims-howto.rst b/Documentation/ims-howto.rst new file mode 100644 index 000000000000..a18de152b393 --- /dev/null +++ b/Documentation/ims-howto.rst @@ -0,0 +1,210 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. 
include:: + +========================== +The IMS Driver Guide HOWTO +========================== + +:Authors: Megha Dey + +:Copyright: 2020 Intel Corporation + +About this guide +================ + +This guide describes the basics of Interrupt Message Store (IMS), the +need to introduce a new interrupt mechanism, implementation details of +IMS in the kernel, driver changes required to support IMS and the general +misconceptions and FAQs associated with IMS. + +What is IMS? +============ + +Intel has introduced the Scalable I/O virtualization (SIOV)[1] which +provides a scalable and lightweight approach to hardware assisted I/O +virtualization by overcoming many of the shortcomings of SR-IOV. + +SIOV shares I/O devices at a much finer granularity, the minimal sharable +resource being the 'Assignable Device Interface' or ADI. Each ADI can +support multiple interrupt messages and thus, we need a matching scalable +interrupt mechanism to process these ADI interrupts. Interrupt Message +Store or IMS is a new interrupt mechanism to meet such a demand. + +Why use IMS? +============ + +Until now, the maximum number of interrupts a device could support was 2048 +(using MSI-X). With IMS, there is no such restriction. A device can report +support for SIOV(and hence IMS) to the kernel through the host device +driver. Alternatively, if the kernel needs a generic way to discover these +capabilities without host driver dependency, the PCIE Designated Vendor +specific Extended capability(DVSEC) can be used. ([1]Section 3.7) + +IMS is device-specific which means that the programming of the interrupt +messages (address/data pairs) is done in some device specific way, and not +by using a standard like PCI. Also, by using IMS, the device is free to +choose where it wants to store the interrupt messages. This makes IMS +highly flexible. Some devices may organise IMS as a table in device memory +(like MSI-X) which can be accessed through one or more memory mapped system +pages, some device implementations may organise it in a distributed/ +replicated fashion at each of the “engines” in the device (with future +multi-tile devices) while context based devices (GPUs for instance), +can have it stored/located in memory (as part of supervisory state of a +command/context), that the hosting function can fetch and cache on demand +at the device. Since the number of contexts cannot be determined at boot +time, there cannot be a standard enumeration of the IMS size during boot. +In any approach, devices may implement IMS as either one unified storage +structure or de-centralized per ADI storage structures. + +Even though the IMS storage organisation is device-specific, IMS entries +store and generate interrupts using the same interrupt message address and +data format as the PCI Express MSI-X table entries, a DWORD size data +payload and a 64-bit address. Interrupt messages are expected to be +programmed only by the host driver. All the IMS interrupt messages are +stored in the remappable format. Hence, if a driver enables IMS, interrupt +remapping is also enabled by default. + +A device can support both MSI-X and IMS entries simultaneously, each being +used for a different purpose. E.g., MSI-X can be used to report device level +errors while IMS for software constructed devices created for native or +guest use. 
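(As a purely illustrative aside, not part of the patch text: because IMS entries use the MSI-X address/data format described above, a device that happens to keep its IMS entries in a memory-mapped table might implement its message-write callback roughly as below. The ims_slot layout and the example_desc_to_slot() lookup are assumptions for the sketch, not something defined by this series.)

    #include <linux/io.h>
    #include <linux/msi.h>

    /* Hypothetical MMIO-table IMS storage; layout is illustrative only. */
    struct ims_slot {
            u32 address_lo;     /* low 32 bits of the message address */
            u32 address_hi;     /* high 32 bits of the message address */
            u32 data;           /* DWORD message data payload */
            u32 ctrl;           /* e.g. per-entry mask/enable bits */
    };

    static void example_ims_write_msg(struct msi_desc *desc, struct msi_msg *msg)
    {
            /* example_desc_to_slot() is a hypothetical per-device lookup. */
            struct ims_slot __iomem *slot = example_desc_to_slot(desc);

            /* Same message format as a PCI Express MSI-X table entry. */
            iowrite32(msg->address_lo, &slot->address_lo);
            iowrite32(msg->address_hi, &slot->address_hi);
            iowrite32(msg->data, &slot->data);
    }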
+ +Implementation of IMS in the kernel +=================================== + +The Linux kernel today already provides a generic mechanism to support +non-PCI compliant MSI interrupts for platform devices (platform-msi.c). +To support IMS interrupts, we create a new IMS IRQ domain and extend the +existing infrastructure. Dynamic allocation of IMS vectors is a requirement +for devices which support Scalable I/O Virtualization. A driver can allocate +and free vectors not just once during probe (as was the case with MSI/MSI-X) +but also in the post-probe phase where actual demand is available. Thus, a +new API, platform_msi_domain_alloc_irqs_group, is introduced, which drivers +using IMS can call multiple times. The vectors allocated each +time this API is called are associated with a group ID. To free the vectors +associated with a particular group, the platform_msi_domain_free_irqs_group +API can be called. The existing drivers using the platform-msi infrastructure +will continue to use the existing alloc (platform_msi_domain_alloc_irqs) +and free (platform_msi_domain_free_irqs) APIs and are assigned a default +group ID of 0. + +Thus, platform-msi.c provides the generic methods which can be used by any +non-PCI MSI interrupt type, while the newly created ims-msi.c provides IMS-specific +callbacks that can be used by drivers capable of generating IMS +interrupts. Intel has introduced the Data Streaming Accelerator (DSA)[2] device +which supports SIOV and thus supports IMS. Currently, only Intel's Data +Streaming Accelerator (idxd) driver is a consumer of this feature. + +FAQs and general misconceptions: +================================ + +** There were some concerns raised by Thomas Gleixner and Marc Zyngier +during the Linux Plumbers Conference 2019: + +1. Enumeration of IMS needs to be done by PCI core code and not by + individual device drivers: + + Currently, if the kernel needs a generic way to discover IMS capability + without host driver dependency, the PCIe Designated Vendor specific + Extended capability (DVSEC) can be used. + + However, we cannot have a standard way of enumerating the IMS size + because, for context-based devices, the interrupt message is part of + the context itself, which is managed entirely by the driver. Since + context creation is done on demand, there is no way to tell at boot + time the maximum number of contexts (and hence the number of interrupt + messages) that the device can support. + + Also, this seems redundant (given only the driver will use this + information). Hence, we thought it may suffice to enumerate it as part + of the driver callback interfaces. In the current Linux code, even with + MSI-X, the size reported by the MSI-X capability is used only to cross-check + whether the driver is asking for more than that (and if so, fail the call). + + That said, if you believe it would be useful, we can add the IMS size + enumeration to the SIOV DVSEC capability. + + Perhaps there is a misunderstanding on what IMS serves. IMS is not a + system-wide interrupt solution which serves all devices; it is a + self-serving device-level interrupt mechanism (other than using system + vector resources). Since both producer and consumer of IMS belong to + the same device driver, there wouldn't be any ordering problem. Whereas, + if the IMS service were provided by one driver which serves multiple drivers, + there would be ordering problems to solve. + +Some other commonly asked questions about IMS are as follows: + +1. Do all SIOV devices support MSI-X (even if they have IMS)?
+ + Yes, all SIOV hosting functions are expected to have MSI-X capability + (irrespective of whether they support IMS or not). This is done for + compatibility reasons, because a SIOV hosting function can be used + without enabling any SIOV capabilities as a standard PCIe PF. + +2. Why is Intel designing a new interrupt mechanism rather than extending + MSI-X to address its limitations? Isn't 2048 device interrupts enough? + + MSI-X has a rigid definition of one-table and on-device storage and does + not provide the full flexibility required for future multi-tile + accelerator designs. + IMS was envisioned to be used with a large number of ADIs in devices where + each will need unique interrupt resources. For example, a DSA shared + work queue can support a large number of clients, where each client can + have its own interrupt. In the future, with user interrupts, we expect the + demand for messages to increase further. + +3. Will there be devices which only support IMS in the future? + + No. All Scalable IOV devices will support MSI-X. But the number of MSI-X + table entries may be limited compared to the number of IMS entries. Device + designs can restrict the number of interrupt messages supported with + MSI-X (e.g., support only what is required for the base PF function + without SIOV), and offer the interrupt message scalability only through + IMS. For example, DSA supports only 9 messages with MSI-X and 2K messages + with IMS. + +Device Driver Changes: +====================== + +1. platform_msi_domain_alloc_irqs_group(struct device *dev, unsigned int + nvec, const struct platform_msi_ops *platform_ops, int *group_id) + to allocate IMS interrupts, where: + + dev: The device for which to allocate interrupts + nvec: The number of interrupts to allocate + platform_ops: Callbacks for platform MSI ops (to be provided by driver) + group_id: returned by the call, to be used to free IRQs of a certain type + + e.g.: static struct platform_msi_ops ims_ops = { + .irq_mask = ims_irq_mask, + .irq_unmask = ims_irq_unmask, + .write_msg = ims_write_msg, + }; + + int group; + platform_msi_domain_alloc_irqs_group(dev, nvec, platform_ops, &group); + + where struct platform_msi_ops provides: + irq_mask: mask an interrupt source + irq_unmask: unmask an interrupt source + write_msg: write message content + + This API can be called multiple times. Each time, a new group will be + associated with the allocated vectors. Group IDs start from 0. + +2. platform_msi_domain_free_irqs_group(struct device *dev, int group) to + free IMS interrupts from a particular group. + +3.
To traverse the msi_descs associated with a group: + struct device *device; + struct msi_desc *desc; + struct platform_msi_group_entry *platform_msi_group; + int group; + + for_each_platform_msi_entry_in_group(desc, platform_msi_group, group, dev) { + } + +References: +=========== + +[1]https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification +[2]https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification From patchwork Tue Apr 21 23:34:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274559 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496KfX6lFLz9sSq for ; Wed, 22 Apr 2020 09:34:44 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726551AbgDUXej (ORCPT ); Tue, 21 Apr 2020 19:34:39 -0400 Received: from mga01.intel.com ([192.55.52.88]:36898 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726545AbgDUXei (ORCPT ); Tue, 21 Apr 2020 19:34:38 -0400 IronPort-SDR: gUohqOewZhV6j32SsuJNR+vY/NUQxWRteTj4YJVdOFT8s+cz41NbnuullmMTP/2iPQjQxRmk38 rTgbWLIiIuzA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:37 -0700 IronPort-SDR: ZYAjYixMVxHs0BuS7CR9a768GC36QrbGrfIo3IG6FMmvuu65RRToBv/27KZP8lrFfysWeYugkb CvrCRBIVIeUQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="258876497" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga006.jf.intel.com with ESMTP; 21 Apr 2020 16:34:36 -0700 Subject: [PATCH RFC 08/15] vfio/mdev: Add a member for iommu domain in mdev_device From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:34:36 -0700 Message-ID: <158751207630.36773.14545210630713509626.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Lu Baolu This adds a member to save iommu domain in mdev_device structure. 
Whenever an iommu domain is attached to the mediated device, it must be saved here so that a VDCM (Virtual Device Control Module) can retrieve it. The following member is added to struct mdev_device: * iommu_domain - A place to save the iommu domain attached to this mdev. The following helpers are added to set and get the iommu domain in struct mdev_device. * mdev_set/get_iommu_domain(domain) - An iommu domain which has been attached to the iommu device in order to protect and isolate the mediated device will be kept in the mdev data structure and can be retrieved later. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Suggested-by: Kevin Tian Signed-off-by: Lu Baolu --- drivers/vfio/mdev/mdev_core.c | 16 ++++++++++++++++ drivers/vfio/mdev/mdev_private.h | 1 + include/linux/mdev.h | 10 ++++++++++ 3 files changed, 27 insertions(+) diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index cecc6a6bdbef..15863cf83f3f 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -410,6 +410,22 @@ struct device *mdev_get_iommu_device(struct device *dev) } EXPORT_SYMBOL(mdev_get_iommu_device); +void mdev_set_iommu_domain(struct device *dev, void *domain) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + mdev->iommu_domain = domain; +} +EXPORT_SYMBOL(mdev_set_iommu_domain); + +void *mdev_get_iommu_domain(struct device *dev) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + return mdev->iommu_domain; +} +EXPORT_SYMBOL(mdev_get_iommu_domain); + static int __init mdev_init(void) { return mdev_bus_register(); diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h index c21f1305a76b..c97478b22a02 100644 --- a/drivers/vfio/mdev/mdev_private.h +++ b/drivers/vfio/mdev/mdev_private.h @@ -32,6 +32,7 @@ struct mdev_device { struct list_head next; struct kobject *type_kobj; struct device *iommu_device; + void *iommu_domain; bool active; }; diff --git a/include/linux/mdev.h b/include/linux/mdev.h index fa2344e239ef..0d66daaecc67 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -26,6 +26,16 @@ int mdev_set_iommu_device(struct device *dev, struct device *iommu_device); struct device *mdev_get_iommu_device(struct device *dev); +/* + * Called by vfio iommu modules to save the iommu domain after a domain has + * been attached to the mediated device. The vDCM (virtual device control module) + * could call mdev_get_iommu_domain() to retrieve an auxiliary domain attached + * to an mdev. + */ +void mdev_set_iommu_domain(struct device *dev, void *domain); + +void *mdev_get_iommu_domain(struct device *dev); + /** * struct mdev_parent_ops - Structure to be registered for each parent device to * register the device to mdev module.
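To make the intended consumer of these helpers concrete (an editorial sketch under stated assumptions, not code from this series): once the vfio iommu module has saved the domain on attach, as done in the next patch, a VDCM could retrieve it when it needs to program the parent device on behalf of that mdev, for instance to look up the auxiliary PASID. The example_vdcm_setup_pasid() name is hypothetical, and the sketch assumes aux-domain support is enabled for the parent device.

    #include <linux/iommu.h>
    #include <linux/mdev.h>

    /* Hedged sketch of a VDCM-side consumer of the new helpers. */
    static int example_vdcm_setup_pasid(struct mdev_device *mdev)
    {
            struct device *dev = mdev_dev(mdev);
            struct device *iommu_dev = mdev_get_iommu_device(dev);
            struct iommu_domain *domain;
            int pasid;

            /* Saved by the vfio iommu module when the domain was attached. */
            domain = mdev_get_iommu_domain(dev);
            if (!domain || !iommu_dev)
                    return -ENODEV;

            /* With aux-domain support, the PASID identifies this mdev's context. */
            pasid = iommu_aux_get_pasid(domain, iommu_dev);
            if (pasid < 0)
                    return pasid;

            /* A real VDCM would now program the PASID into the parent device. */
            return 0;
    }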
From patchwork Tue Apr 21 23:34:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274560 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496Kfg3jwYz9sSb for ; Wed, 22 Apr 2020 09:34:51 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726575AbgDUXep (ORCPT ); Tue, 21 Apr 2020 19:34:45 -0400 Received: from mga07.intel.com ([134.134.136.100]:63698 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726055AbgDUXeo (ORCPT ); Tue, 21 Apr 2020 19:34:44 -0400 IronPort-SDR: dPSH45TC/zqH8EOaDMtHUzXFakbOzuKNWclZKYFyPCN1k69vTWv/oBWM2v8mBqxn6RZjTjL52y 7WpoazAnLLZQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:44 -0700 IronPort-SDR: 881OnYkA0p1Dly6ophckgg8d9a6a39jFKJWZJ35KssbxE3eAKzsiMXHmDmTZaEEQ/Pk1jBQKAA eBGAlWMuRqOA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="255449567" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga003.jf.intel.com with ESMTP; 21 Apr 2020 16:34:42 -0700 Subject: [PATCH RFC 09/15] vfio/type1: Save domain when attach domain to mdev From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:34:42 -0700 Message-ID: <158751208274.36773.9573092458996405211.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Lu Baolu This saves the iommu domain in mdev on attaching a domain to it and clear it on detaching. 
Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Signed-off-by: Lu Baolu --- drivers/vfio/vfio_iommu_type1.c | 52 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 48 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 85b32c325282..40b22c456b06 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1309,20 +1309,62 @@ static struct device *vfio_mdev_get_iommu_device(struct device *dev) return NULL; } +static int vfio_mdev_set_domain(struct device *dev, struct iommu_domain *domain) +{ + void (*fn)(struct device *dev, void *domain); + + fn = symbol_get(mdev_set_iommu_domain); + if (fn) { + fn(dev, domain); + symbol_put(mdev_set_iommu_domain); + + return 0; + } + + return -EINVAL; +} + +static struct iommu_domain *vfio_mdev_get_domain(struct device *dev) +{ + void *(*fn)(struct device *dev); + + fn = symbol_get(mdev_get_iommu_domain); + if (fn) { + struct iommu_domain *domain; + + domain = fn(dev); + symbol_put(mdev_get_iommu_domain); + + return domain; + } + + return NULL; +} + static int vfio_mdev_attach_domain(struct device *dev, void *data) { - struct iommu_domain *domain = data; + struct iommu_domain *domain; struct device *iommu_device; + int ret = -ENODEV; + + /* Only single domain is allowed to attach to an mdev. */ + domain = vfio_mdev_get_domain(dev); + if (domain) + return -EINVAL; + domain = data; iommu_device = vfio_mdev_get_iommu_device(dev); if (iommu_device) { if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX)) - return iommu_aux_attach_device(domain, iommu_device); + ret = iommu_aux_attach_device(domain, iommu_device); else - return iommu_attach_device(domain, iommu_device); + ret = iommu_attach_device(domain, iommu_device); } - return -EINVAL; + if (!ret) + vfio_mdev_set_domain(dev, domain); + + return ret; } static int vfio_mdev_detach_domain(struct device *dev, void *data) @@ -1338,6 +1380,8 @@ static int vfio_mdev_detach_domain(struct device *dev, void *data) iommu_detach_device(domain, iommu_device); } + vfio_mdev_set_domain(dev, NULL); + return 0; } From patchwork Tue Apr 21 23:34:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274561 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496Kfn1h27z9sSb for ; Wed, 22 Apr 2020 09:34:57 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726563AbgDUXev (ORCPT ); Tue, 21 Apr 2020 19:34:51 -0400 Received: from mga09.intel.com ([134.134.136.24]:48558 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726584AbgDUXev (ORCPT ); Tue, 21 Apr 2020 19:34:51 -0400 IronPort-SDR: WhHJM5adF9msLUpJxVzlNVQMMpaBWVSv6yZaMqv8uWIOb2GOCC8IO5/LUXfloWyh0dtS3avXQo sDzT3GBJ3LvA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:50 -0700 IronPort-SDR: 
XkIbvsMu7UxsdfxuM7r4XqFx0YfeWv0LreNwYwu+n4gq+Gy8zDHcg+pzn2MIflGRBhUQf+3uzh hkQTfj2/rePg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="334433607" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga001.jf.intel.com with ESMTP; 21 Apr 2020 16:34:49 -0700 Subject: [PATCH RFC 10/15] dmaengine: idxd: add config support for readonly devices From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:34:49 -0700 Message-ID: <158751208922.36773.10261419272041374043.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org The device can have a readonly bit set for configuration. This especially is true for mediated device in guest that are software emulated. Add support to load configuration if the device config is read only. Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 159 ++++++++++++++++++++++++++++++++++++++++++++- drivers/dma/idxd/idxd.h | 2 + drivers/dma/idxd/init.c | 6 ++ drivers/dma/idxd/sysfs.c | 45 +++++++++---- 4 files changed, 194 insertions(+), 18 deletions(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 684a0e167770..a46b6558984c 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -603,6 +603,36 @@ static int idxd_groups_config_write(struct idxd_device *idxd) return 0; } +static int idxd_wq_config_write_ro(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + int wq_offset; + + if (!wq->group) + return 0; + + if (idxd->pasid_enabled) { + wq->wqcfg.pasid_en = 1; + if (wq->type == IDXD_WQT_KERNEL && wq_dedicated(wq)) + wq->wqcfg.pasid = idxd->pasid; + } else { + wq->wqcfg.pasid_en = 0; + } + + if (wq->type == IDXD_WQT_KERNEL) + wq->wqcfg.priv = 1; + + if (idxd->type == IDXD_TYPE_DSA && + idxd->hw.gen_cap.block_on_fault && + test_bit(WQ_FLAG_BOF, &wq->flags)) + wq->wqcfg.bof = 1; + + wq_offset = idxd->wqcfg_offset + wq->id * 32 + 2 * sizeof(u32); + iowrite32(wq->wqcfg.bits[2], idxd->reg_base + wq_offset); + + return 0; +} + static int idxd_wq_config_write(struct idxd_wq *wq) { struct idxd_device *idxd = wq->idxd; @@ -633,7 +663,8 @@ static int idxd_wq_config_write(struct idxd_wq *wq) if (idxd->pasid_enabled) { wq->wqcfg.pasid_en = 1; - wq->wqcfg.pasid = idxd->pasid; + if (wq->type == IDXD_WQT_KERNEL && wq_dedicated(wq)) + wq->wqcfg.pasid = idxd->pasid; } wq->wqcfg.priority = wq->priority; @@ -658,14 +689,17 @@ static int idxd_wq_config_write(struct idxd_wq *wq) return 0; } -static int idxd_wqs_config_write(struct idxd_device *idxd) +static int idxd_wqs_config_write(struct idxd_device *idxd, bool rw) { int i, rc; for (i = 0; i < idxd->max_wqs; i++) { struct idxd_wq *wq = &idxd->wqs[i]; 
- rc = idxd_wq_config_write(wq); + if (rw) + rc = idxd_wq_config_write(wq); + else + rc = idxd_wq_config_write_ro(wq); if (rc < 0) return rc; } @@ -764,6 +798,12 @@ static int idxd_wqs_setup(struct idxd_device *idxd) return 0; } +int idxd_device_ro_config(struct idxd_device *idxd) +{ + lockdep_assert_held(&idxd->dev_lock); + return idxd_wqs_config_write(idxd, false); +} + int idxd_device_config(struct idxd_device *idxd) { int rc; @@ -779,7 +819,7 @@ int idxd_device_config(struct idxd_device *idxd) idxd_group_flags_setup(idxd); - rc = idxd_wqs_config_write(idxd); + rc = idxd_wqs_config_write(idxd, true); if (rc < 0) return rc; @@ -789,3 +829,114 @@ int idxd_device_config(struct idxd_device *idxd) return 0; } + +static void idxd_wq_load_config(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + struct device *dev = &idxd->pdev->dev; + int wqcfg_offset; + int i; + + wqcfg_offset = idxd->wqcfg_offset + wq->id * 32; + memcpy_fromio(&wq->wqcfg, idxd->reg_base + wqcfg_offset, + sizeof(union wqcfg)); + + wq->size = wq->wqcfg.wq_size; + wq->threshold = wq->wqcfg.wq_thresh; + if (wq->wqcfg.priv) + wq->type = IDXD_WQT_KERNEL; + + if (wq->wqcfg.mode) + set_bit(WQ_FLAG_DEDICATED, &wq->flags); + + wq->priority = wq->wqcfg.priority; + + if (wq->wqcfg.bof) + set_bit(WQ_FLAG_BOF, &wq->flags); + + if (idxd->pasid_enabled) { + wq->wqcfg.pasid_en = 1; + wqcfg_offset = idxd->wqcfg_offset + + wq->id * 32 + 2 * sizeof(u32); + iowrite32(wq->wqcfg.bits[2], idxd->reg_base + wqcfg_offset); + } + + for (i = 0; i < 8; i++) { + wqcfg_offset = idxd->wqcfg_offset + + wq->id * 32 + i * sizeof(u32); + dev_dbg(dev, "WQ[%d][%d][%#x]: %#x\n", + wq->id, i, wqcfg_offset, wq->wqcfg.bits[i]); + } +} + +static void idxd_group_load_config(struct idxd_group *group) +{ + struct idxd_device *idxd = group->idxd; + struct device *dev = &idxd->pdev->dev; + int i, j, grpcfg_offset; + + /* load wqs */ + for (i = 0; i < 4; i++) { + struct idxd_wq *wq; + + grpcfg_offset = idxd->grpcfg_offset + + group->id * 64 + i * sizeof(u64); + group->grpcfg.wqs[i] = + ioread64(idxd->reg_base + grpcfg_offset); + dev_dbg(dev, "GRPCFG wq[%d:%d: %#x]: %#llx\n", + group->id, i, grpcfg_offset, + group->grpcfg.wqs[i]); + + for (j = 0; j < sizeof(u64); j++) { + int wq_id = i * 64 + j; + + if (wq_id >= idxd->max_wqs) + break; + if (group->grpcfg.wqs[i] & BIT(j)) { + wq = &idxd->wqs[wq_id]; + wq->group = group; + } + } + } + + grpcfg_offset = idxd->grpcfg_offset + group->id * 64 + 32; + group->grpcfg.engines = ioread64(idxd->reg_base + grpcfg_offset); + dev_dbg(dev, "GRPCFG engs[%d: %#x]: %#llx\n", group->id, + grpcfg_offset, group->grpcfg.engines); + + for (i = 0; i < sizeof(u64); i++) { + if (i > idxd->max_engines) + break; + if (group->grpcfg.engines & BIT(i)) { + struct idxd_engine *engine = &idxd->engines[i]; + + engine->group = group; + } + } + + grpcfg_offset = grpcfg_offset + group->id * 64 + 40; + group->grpcfg.flags.bits = ioread32(idxd->reg_base + grpcfg_offset); + dev_dbg(dev, "GRPFLAGS flags[%d: %#x]: %#x\n", + group->id, grpcfg_offset, group->grpcfg.flags.bits); +} + +void idxd_device_load_config(struct idxd_device *idxd) +{ + union gencfg_reg reg; + int i; + + reg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET); + idxd->token_limit = reg.token_limit; + + for (i = 0; i < idxd->max_groups; i++) { + struct idxd_group *group = &idxd->groups[i]; + + idxd_group_load_config(group); + } + + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq = &idxd->wqs[i]; + + idxd_wq_load_config(wq); + } +} diff --git a/drivers/dma/idxd/idxd.h 
b/drivers/dma/idxd/idxd.h index 304b76169c0d..82a9b6035722 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -286,8 +286,10 @@ int idxd_device_reset(struct idxd_device *idxd); int __idxd_device_reset(struct idxd_device *idxd); void idxd_device_cleanup(struct idxd_device *idxd); int idxd_device_config(struct idxd_device *idxd); +int idxd_device_ro_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); int idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); +void idxd_device_load_config(struct idxd_device *idxd); /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index f794ee1c7c1b..c0fd796e9dce 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -367,6 +367,12 @@ static int idxd_probe(struct idxd_device *idxd) if (rc) goto err_setup; + /* If the configs are readonly, then load them from device */ + if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) { + dev_dbg(dev, "Loading RO device config\n"); + idxd_device_load_config(idxd); + } + rc = idxd_setup_interrupts(idxd); if (rc) goto err_setup; diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index dc38172be42e..1dd3ade2e438 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -120,13 +120,17 @@ static int idxd_config_bus_probe(struct device *dev) spin_lock_irqsave(&idxd->dev_lock, flags); - /* Perform IDXD configuration and enabling */ - rc = idxd_device_config(idxd); - if (rc < 0) { - spin_unlock_irqrestore(&idxd->dev_lock, flags); - module_put(THIS_MODULE); - dev_warn(dev, "Device config failed: %d\n", rc); - return rc; + if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) { + /* Perform DSA configuration and enabling */ + rc = idxd_device_config(idxd); + if (rc < 0) { + spin_unlock_irqrestore(&idxd->dev_lock, + flags); + module_put(THIS_MODULE); + dev_warn(dev, "Device config failed: %d\n", + rc); + return rc; + } } /* start device */ @@ -211,13 +215,26 @@ static int idxd_config_bus_probe(struct device *dev) } spin_lock_irqsave(&idxd->dev_lock, flags); - rc = idxd_device_config(idxd); - if (rc < 0) { - spin_unlock_irqrestore(&idxd->dev_lock, flags); - mutex_unlock(&wq->wq_lock); - dev_warn(dev, "Writing WQ %d config failed: %d\n", - wq->id, rc); - return rc; + if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) { + rc = idxd_device_config(idxd); + if (rc < 0) { + spin_unlock_irqrestore(&idxd->dev_lock, + flags); + mutex_unlock(&wq->wq_lock); + dev_warn(dev, "Writing WQ %d config failed: %d\n", + wq->id, rc); + return rc; + } + } else { + rc = idxd_device_ro_config(idxd); + if (rc < 0) { + spin_unlock_irqrestore(&idxd->dev_lock, + flags); + mutex_unlock(&wq->wq_lock); + dev_warn(dev, "Writing WQ %d config failed: %d\n", + wq->id, rc); + return rc; + } } rc = idxd_wq_enable(wq); From patchwork Tue Apr 21 23:34:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274562 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org 
(Postfix) with ESMTP id 496Kfq3C07z9sSb for ; Wed, 22 Apr 2020 09:34:59 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726591AbgDUXe6 (ORCPT ); Tue, 21 Apr 2020 19:34:58 -0400 Received: from mga12.intel.com ([192.55.52.136]:36000 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726608AbgDUXe6 (ORCPT ); Tue, 21 Apr 2020 19:34:58 -0400 IronPort-SDR: nQxz2siVnq6U5jfEJ6lYrNfbk3goRjfrwLQmiMaLDtKbw4+4E+DD7FbAZTyhy3NnMeElgXnYq3 90vAkjJX5Diw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:34:57 -0700 IronPort-SDR: ZJmjtUUbcxGue18s2Ks9zSrgfRaeub7hlTKbqDKmzcFFNV/DSQhrPt2L2rN1SpJVJ6r9/tWBYA mvfERpA1QQBw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="429701097" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga005.jf.intel.com with ESMTP; 21 Apr 2020 16:34:56 -0700 Subject: [PATCH RFC 11/15] dmaengine: idxd: add IMS support in base driver From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:34:55 -0700 Message-ID: <158751209583.36773.15917761221672315662.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org In preparation for support of VFIO mediated device for idxd driver, the enabling for Interrupt Message Store (IMS) interrupts is added for the idxd base driver. Until now, the maximum number of interrupts a device could support was 2048 (MSI-X). With IMS, the maximum number of interrupts can be significantly expanded for guest support. This commit only provides the support functions in the base driver and not the VFIO mdev code utilization. 
See Intel SIOV spec for more details: https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification Signed-off-by: Dave Jiang --- drivers/dma/Kconfig | 1 drivers/dma/idxd/Makefile | 2 - drivers/dma/idxd/cdev.c | 3 + drivers/dma/idxd/idxd.h | 21 ++++- drivers/dma/idxd/init.c | 46 +++++++++++- drivers/dma/idxd/mdev.c | 179 +++++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/mdev.h | 82 +++++++++++++++++++++ drivers/dma/idxd/submit.c | 3 + drivers/dma/idxd/sysfs.c | 11 +++ 9 files changed, 340 insertions(+), 8 deletions(-) create mode 100644 drivers/dma/idxd/mdev.c create mode 100644 drivers/dma/idxd/mdev.h diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 71ea9f24a8f9..9e7d9eafb1f5 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -290,6 +290,7 @@ config INTEL_IDXD select PCI_PRI select PCI_PASID select PCI_IOV + select MSI_IMS help Enable support for the Intel(R) data accelerators present in Intel Xeon CPU. diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile index 8978b898d777..308e12869f96 100644 --- a/drivers/dma/idxd/Makefile +++ b/drivers/dma/idxd/Makefile @@ -1,2 +1,2 @@ obj-$(CONFIG_INTEL_IDXD) += idxd.o -idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o +idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o mdev.o diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index 27be9250606d..ddd3ce16620d 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -186,7 +186,8 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma) vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_WIPEONFORK; pfn = (base + idxd_get_wq_portal_full_offset(wq->id, - IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT; + IDXD_PORTAL_LIMITED, + IDXD_IRQ_MSIX)) >> PAGE_SHIFT; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_private_data = ctx; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 82a9b6035722..3a942e9c5980 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -172,6 +172,7 @@ struct idxd_device { int num_groups; + u32 ims_offset; u32 msix_perm_offset; u32 wqcfg_offset; u32 grpcfg_offset; @@ -179,6 +180,7 @@ struct idxd_device { u64 max_xfer_bytes; u32 max_batch_size; + int ims_size; int max_groups; int max_engines; int max_tokens; @@ -194,6 +196,9 @@ struct idxd_device { struct idxd_irq_entry *irq_entries; struct dma_device dma_dev; + + atomic_t num_allocated_ims; + struct sbitmap ims_sbmap; }; /* IDXD software descriptor */ @@ -224,15 +229,23 @@ enum idxd_portal_prot { IDXD_PORTAL_LIMITED, }; -static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot) +enum idxd_interrupt_type { + IDXD_IRQ_MSIX = 0, + IDXD_IRQ_IMS, +}; + +static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return prot * 0x1000; + return prot * 0x1000 + irq_type * 0x2000; } static inline int idxd_get_wq_portal_full_offset(int wq_id, - enum idxd_portal_prot prot) + enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot); + return ((wq_id * 4) << PAGE_SHIFT) + + idxd_get_wq_portal_offset(prot, irq_type); } static inline void idxd_set_type(struct idxd_device *idxd) diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index c0fd796e9dce..15b3ef73cac3 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -231,10 +231,42 @@ static void idxd_read_table_offsets(struct idxd_device 
*idxd) idxd->msix_perm_offset = offsets.msix_perm * 0x100; dev_dbg(dev, "IDXD MSIX Permission Offset: %#x\n", idxd->msix_perm_offset); + idxd->ims_offset = offsets.ims * 0x100; + dev_dbg(dev, "IDXD IMS Offset: %#x\n", idxd->ims_offset); idxd->perfmon_offset = offsets.perfmon * 0x100; dev_dbg(dev, "IDXD Perfmon Offset: %#x\n", idxd->perfmon_offset); } +static int device_supports_ims(struct pci_dev *pdev) +{ + int dvsec; + u16 val16; + u32 val32; + + dvsec = pci_find_ext_capability(pdev, 0x23); + pci_read_config_word(pdev, dvsec + 0x4, &val16); + if (val16 != 0x8086) { + dev_dbg(&pdev->dev, "DVSEC vendor id is not Intel\n"); + return -EOPNOTSUPP; + } + + pci_read_config_word(pdev, dvsec + 0x8, &val16); + if (val16 != 0x5) { + dev_dbg(&pdev->dev, "DVSEC ID is not SIOV\n"); + return -EOPNOTSUPP; + } + + pci_read_config_dword(pdev, dvsec + 0x14, &val32); + if (val32 & 0x1) { + dev_dbg(&pdev->dev, "IMS supported for device\n"); + return 0; + } + + dev_dbg(&pdev->dev, "IMS unsupported for device\n"); + + return -EOPNOTSUPP; +} + static void idxd_read_caps(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; @@ -247,9 +279,11 @@ static void idxd_read_caps(struct idxd_device *idxd) dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes); idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift; dev_dbg(dev, "max batch size: %u\n", idxd->max_batch_size); + if (device_supports_ims(idxd->pdev) == 0) + idxd->ims_size = idxd->hw.gen_cap.max_ims_mult * 256ULL; + dev_dbg(dev, "IMS size: %u\n", idxd->ims_size); if (idxd->hw.gen_cap.config_en) set_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags); - /* reading group capabilities */ idxd->hw.group_cap.bits = ioread64(idxd->reg_base + IDXD_GRPCAP_OFFSET); @@ -294,6 +328,7 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev) idxd->pdev = pdev; spin_lock_init(&idxd->dev_lock); + atomic_set(&idxd->num_allocated_ims, 0); return idxd; } @@ -389,9 +424,18 @@ static int idxd_probe(struct idxd_device *idxd) idxd->major = idxd_cdev_get_major(idxd); + rc = sbitmap_init_node(&idxd->ims_sbmap, idxd->ims_size, -1, + GFP_KERNEL, dev_to_node(dev)); + if (rc < 0) + goto sbitmap_fail; + dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id); return 0; + sbitmap_fail: + mutex_lock(&idxd_idr_lock); + idr_remove(&idxd_idrs[idxd->type], idxd->id); + mutex_unlock(&idxd_idr_lock); err_idr_fail: idxd_mask_error_interrupts(idxd); idxd_mask_msix_vectors(idxd); diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c new file mode 100644 index 000000000000..2cf0cdf149b7 --- /dev/null +++ b/drivers/dma/idxd/mdev.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. 
*/ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "../../vfio/pci/vfio_pci_private.h" +#include +#include "registers.h" +#include "idxd.h" +#include "mdev.h" + +static void idxd_free_ims_index(struct idxd_device *idxd, + unsigned long ims_idx) +{ + sbitmap_clear_bit(&idxd->ims_sbmap, ims_idx); + atomic_dec(&idxd->num_allocated_ims); +} + +static int vidxd_free_ims_entries(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct ims_irq_entry *irq_entry; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct msi_desc *desc; + int i = 0; + struct platform_msi_group_entry *platform_msi_group; + + for_each_platform_msi_entry_in_group(desc, platform_msi_group, 0, dev) { + irq_entry = &vidxd->irq_entries[i]; + devm_free_irq(dev, desc->irq, irq_entry); + i++; + } + + platform_msi_domain_free_irqs(dev); + + for (i = 0; i < vidxd->num_wqs; i++) + idxd_free_ims_index(idxd, vidxd->ims_index[i]); + return 0; +} + +static int idxd_alloc_ims_index(struct idxd_device *idxd) +{ + int index; + + index = sbitmap_get(&idxd->ims_sbmap, 0, false); + if (index < 0) + return -ENOSPC; + return index; +} + +static unsigned int idxd_ims_irq_mask(struct msi_desc *desc) +{ + int ims_offset; + u32 mask_bits = desc->platform.masked; + struct device *dev = desc->dev; + struct mdev_device *mdev = mdev_from_dev(dev); + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + void __iomem *base; + int ims_id = desc->platform.msi_index; + + dev_dbg(dev, "idxd irq mask: %d\n", ims_id); + + mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT; + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_id] * 0x10; + base = idxd->reg_base + ims_offset; + iowrite32(mask_bits, base + PCI_MSIX_ENTRY_VECTOR_CTRL); + + return mask_bits; +} + +static unsigned int idxd_ims_irq_unmask(struct msi_desc *desc) +{ + int ims_offset; + u32 mask_bits = desc->platform.masked; + struct device *dev = desc->dev; + struct mdev_device *mdev = mdev_from_dev(dev); + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + void __iomem *base; + int ims_id = desc->platform.msi_index; + + dev_dbg(dev, "idxd irq unmask: %d\n", ims_id); + + mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT; + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_id] * 0x10; + base = idxd->reg_base + ims_offset; + iowrite32(mask_bits, base + PCI_MSIX_ENTRY_VECTOR_CTRL); + + return mask_bits; +} + +static void idxd_ims_write_msg(struct msi_desc *desc, struct msi_msg *msg) +{ + int ims_offset; + struct device *dev = desc->dev; + struct mdev_device *mdev = mdev_from_dev(dev); + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + void __iomem *base; + int ims_id = desc->platform.msi_index; + + dev_dbg(dev, "ims_write: %d %x\n", ims_id, msg->address_lo); + + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_id] * 0x10; + base = idxd->reg_base + ims_offset; + iowrite32(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR); + iowrite32(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR); + iowrite32(msg->data, base + PCI_MSIX_ENTRY_DATA); +} + +static struct platform_msi_ops idxd_ims_ops = { + .irq_mask = idxd_ims_irq_mask, + .irq_unmask = idxd_ims_irq_unmask, + .write_msg = idxd_ims_write_msg, +}; + +static irqreturn_t idxd_guest_wq_completion_interrupt(int irq, void *data) +{ + /* send virtual interrupt */ + return IRQ_HANDLED; +} + +static int vidxd_setup_ims_entries(struct 
vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct ims_irq_entry *irq_entry; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct msi_desc *desc; + int err, i = 0; + int group; + struct platform_msi_group_entry *platform_msi_group; + + if (!atomic_add_unless(&idxd->num_allocated_ims, vidxd->num_wqs, + idxd->ims_size)) + return -ENOSPC; + + vidxd->ims_index[0] = idxd_alloc_ims_index(idxd); + + err = platform_msi_domain_alloc_irqs_group(dev, vidxd->num_wqs, + &idxd_ims_ops, &group); + if (err < 0) { + dev_dbg(dev, "Enabling IMS entry! %d\n", err); + return err; + } + + i = 0; + for_each_platform_msi_entry_in_group(desc, platform_msi_group, group, dev) { + irq_entry = &vidxd->irq_entries[i]; + irq_entry->vidxd = vidxd; + irq_entry->int_src = i; + err = devm_request_irq(dev, desc->irq, + idxd_guest_wq_completion_interrupt, 0, + "idxd-ims", irq_entry); + if (err) + break; + i++; + } + + if (err) { + i = 0; + for_each_platform_msi_entry_in_group(desc, platform_msi_group, group, dev) { + irq_entry = &vidxd->irq_entries[i]; + devm_free_irq(dev, desc->irq, irq_entry); + i++; + } + platform_msi_domain_free_irqs_group(dev, group); + } + + return 0; +} diff --git a/drivers/dma/idxd/mdev.h b/drivers/dma/idxd/mdev.h new file mode 100644 index 000000000000..5b05b6cb2b7b --- /dev/null +++ b/drivers/dma/idxd/mdev.h @@ -0,0 +1,82 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2019 Intel Corporation. All rights rsvd. */ + +#ifndef _IDXD_MDEV_H_ +#define _IDXD_MDEV_H_ + +/* two 64-bit BARs implemented */ +#define VIDXD_MAX_BARS 2 +#define VIDXD_MAX_CFG_SPACE_SZ 4096 +#define VIDXD_MSIX_TBL_SZ_OFFSET 0x42 +#define VIDXD_CAP_CTRL_SZ 0x100 +#define VIDXD_GRP_CTRL_SZ 0x100 +#define VIDXD_WQ_CTRL_SZ 0x100 +#define VIDXD_WQ_OCPY_INT_SZ 0x20 +#define VIDXD_MSIX_TBL_SZ 0x90 +#define VIDXD_MSIX_PERM_TBL_SZ 0x48 + +#define VIDXD_MSIX_TABLE_OFFSET 0x600 +#define VIDXD_MSIX_PERM_OFFSET 0x300 +#define VIDXD_GRPCFG_OFFSET 0x400 +#define VIDXD_WQCFG_OFFSET 0x500 +#define VIDXD_IMS_OFFSET 0x1000 + +#define VIDXD_BAR0_SIZE 0x2000 +#define VIDXD_BAR2_SIZE 0x20000 +#define VIDXD_MAX_MSIX_ENTRIES (VIDXD_MSIX_TBL_SZ / 0x10) +#define VIDXD_MAX_WQS 1 + +#define VIDXD_ATS_OFFSET 0x100 +#define VIDXD_PRS_OFFSET 0x110 +#define VIDXD_PASID_OFFSET 0x120 +#define VIDXD_MSIX_PBA_OFFSET 0x700 + +struct vdcm_idxd_pci_bar0 { + u8 cap_ctrl_regs[VIDXD_CAP_CTRL_SZ]; + u8 grp_ctrl_regs[VIDXD_GRP_CTRL_SZ]; + u8 wq_ctrl_regs[VIDXD_WQ_CTRL_SZ]; + u8 wq_ocpy_int_regs[VIDXD_WQ_OCPY_INT_SZ]; + u8 msix_table[VIDXD_MSIX_TBL_SZ]; + u8 msix_perm_table[VIDXD_MSIX_PERM_TBL_SZ]; + unsigned long msix_pba; +}; + +struct ims_irq_entry { + struct vdcm_idxd *vidxd; + int int_src; +}; + +struct idxd_vdev { + struct mdev_device *mdev; + struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES]; + struct notifier_block group_notifier; + struct kvm *kvm; + struct work_struct release_work; + atomic_t released; +}; + +struct vdcm_idxd { + struct idxd_device *idxd; + struct idxd_wq *wq; + struct idxd_vdev vdev; + struct vdcm_idxd_type *type; + int num_wqs; + unsigned long handle; + u64 ims_index[VIDXD_MAX_WQS]; + struct msix_entry ims_entry; + struct ims_irq_entry irq_entries[VIDXD_MAX_WQS]; + + /* For VM use case */ + u64 bar_val[VIDXD_MAX_BARS]; + u64 bar_size[VIDXD_MAX_BARS]; + u8 cfg[VIDXD_MAX_CFG_SPACE_SZ]; + struct vdcm_idxd_pci_bar0 bar0; + struct list_head list; +}; + +static inline struct vdcm_idxd *to_vidxd(struct idxd_vdev *vdev) +{ + return container_of(vdev, struct vdcm_idxd, vdev); 
+} + +#endif diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c index 741bc3aa7267..bdcac933bb28 100644 --- a/drivers/dma/idxd/submit.c +++ b/drivers/dma/idxd/submit.c @@ -123,7 +123,8 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc, return -EIO; portal = wq->portal + - idxd_get_wq_portal_offset(IDXD_PORTAL_UNLIMITED); + idxd_get_wq_portal_offset(IDXD_PORTAL_UNLIMITED, + IDXD_IRQ_MSIX); if (wq_dedicated(wq)) { /* * The wmb() flushes writes to coherent DMA data before diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 1dd3ade2e438..07bad4f6c7fb 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -1282,6 +1282,16 @@ static ssize_t numa_node_show(struct device *dev, } static DEVICE_ATTR_RO(numa_node); +static ssize_t ims_size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct idxd_device *idxd = + container_of(dev, struct idxd_device, conf_dev); + + return sprintf(buf, "%u\n", idxd->ims_size); +} +static DEVICE_ATTR_RO(ims_size); + static ssize_t max_batch_size_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1467,6 +1477,7 @@ static struct attribute *idxd_device_attributes[] = { &dev_attr_max_work_queues_size.attr, &dev_attr_max_engines.attr, &dev_attr_numa_node.attr, + &dev_attr_ims_size.attr, &dev_attr_max_batch_size.attr, &dev_attr_max_transfer_size.attr, &dev_attr_op_cap.attr, From patchwork Tue Apr 21 23:35:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274563 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496Kfy3rk5z9sSb for ; Wed, 22 Apr 2020 09:35:06 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726632AbgDUXfF (ORCPT ); Tue, 21 Apr 2020 19:35:05 -0400 Received: from mga02.intel.com ([134.134.136.20]:34876 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726115AbgDUXfE (ORCPT ); Tue, 21 Apr 2020 19:35:04 -0400 IronPort-SDR: z4fLPTsfZeKPfTktkvj75Qo3QZ7L0MM4EqbE1UMmSG73qvLa/stNObnCVIcPwfAblHGQK6Uc1v n8fc+2QGpzGA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:35:03 -0700 IronPort-SDR: e8IXMnwcVy9R7z/Z76DjhLiCt+m694yLoHh2Wh79Jq5CoUJZXiQOn11QrwRKUR0T7U3sR9pjtg JhCdYIHWfNzw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="255449685" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga003.jf.intel.com with ESMTP; 21 Apr 2020 16:35:02 -0700 Subject: [PATCH RFC 12/15] dmaengine: idxd: add device support functions in prep for mdev From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, 
baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:35:02 -0700 Message-ID: <158751210234.36773.5978383376123318481.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Add some device support helper functions that will be used by VFIO mediated device in preparation of adding VFIO mdev support. Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 130 +++++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/idxd.h | 7 ++ drivers/dma/idxd/init.c | 19 +++++++ 3 files changed, 156 insertions(+) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index a46b6558984c..830aa5859646 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -319,6 +319,40 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq) devm_iounmap(dev, wq->portal); } +int idxd_wq_abort(struct idxd_wq *wq) +{ + int rc; + struct idxd_device *idxd = wq->idxd; + struct device *dev = &idxd->pdev->dev; + u32 operand, status; + + lockdep_assert_held(&idxd->dev_lock); + + dev_dbg(dev, "Abort WQ %d\n", wq->id); + if (wq->state != IDXD_WQ_ENABLED) { + dev_dbg(dev, "WQ %d not active\n", wq->id); + return -ENXIO; + } + + operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand); + rc = idxd_cmd_send(idxd, IDXD_CMD_ABORT_WQ, operand); + if (rc < 0) + return rc; + + rc = idxd_cmd_wait(idxd, &status, IDXD_DRAIN_TIMEOUT); + if (rc < 0) + return rc; + + if (status != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ abort failed: %#x\n", status); + return -ENXIO; + } + + dev_dbg(dev, "WQ %d aborted\n", wq->id); + return 0; +} + int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) { struct idxd_device *idxd = wq->idxd; @@ -372,6 +406,66 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) return 0; } +void idxd_wq_update_pasid(struct idxd_wq *wq, int pasid) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* PASID fields are 8 bytes into the WQCFG register */ + offset = idxd->wqcfg_offset + wq->id * 32 + 8; + wq->wqcfg.pasid = pasid; + iowrite32(wq->wqcfg.bits[2], idxd->reg_base + offset); +} + +void idxd_wq_update_priv(struct idxd_wq *wq, int priv) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* priv field is 8 bytes into the WQCFG register */ + offset = idxd->wqcfg_offset + wq->id * 32 + 8; + wq->wqcfg.priv = !!priv; + iowrite32(wq->wqcfg.bits[2], idxd->reg_base + offset); +} + +int idxd_wq_drain(struct idxd_wq *wq) +{ + int rc; + struct idxd_device *idxd = wq->idxd; + struct device *dev = &idxd->pdev->dev; + u32 operand, status; + + lockdep_assert_held(&idxd->dev_lock); + + dev_dbg(dev, "Drain WQ %d\n", wq->id); + if (wq->state != IDXD_WQ_ENABLED) { + dev_dbg(dev, "WQ %d not active\n", wq->id); + return -ENXIO; + } + + operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_DRAIN_WQ, operand); + rc = 
idxd_cmd_send(idxd, IDXD_CMD_DRAIN_WQ, operand); + if (rc < 0) + return rc; + + rc = idxd_cmd_wait(idxd, &status, IDXD_DRAIN_TIMEOUT); + if (rc < 0) + return rc; + + if (status != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ drain failed: %#x\n", status); + return -ENXIO; + } + + dev_dbg(dev, "WQ %d drained\n", wq->id); + return 0; +} + /* Device control bits */ static inline bool idxd_is_enabled(struct idxd_device *idxd) { @@ -542,6 +636,42 @@ int idxd_device_drain_pasid(struct idxd_device *idxd, int pasid) return 0; } +int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, + int *handle) +{ + int rc; + struct device *dev = &idxd->pdev->dev; + u32 operand, status; + + lockdep_assert_held(&idxd->dev_lock); + + if (!idxd->hw.gen_cap.int_handle_req) + return -EOPNOTSUPP; + + dev_dbg(dev, "get int handle, idx %d\n", idx); + + operand = idx & 0xffff; + dev_dbg(dev, "cmd: %u operand: %#x\n", + IDXD_CMD_REQUEST_INT_HANDLE, operand); + rc = idxd_cmd_send(idxd, IDXD_CMD_REQUEST_INT_HANDLE, operand); + if (rc < 0) + return rc; + + rc = idxd_cmd_wait(idxd, &status, IDXD_REG_TIMEOUT); + if (rc < 0) + return rc; + + if (status != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "request int handle failed: %#x\n", status); + return -ENXIO; + } + + *handle = (status >> 8) & 0xffff; + + dev_dbg(dev, "int handle acquired: %u\n", *handle); + return 0; +} + /* Device configuration bits */ static void idxd_group_config_write(struct idxd_group *group) { diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 3a942e9c5980..9b56a4c7f3fc 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -199,6 +199,7 @@ struct idxd_device { atomic_t num_allocated_ims; struct sbitmap ims_sbmap; + int *int_handles; }; /* IDXD software descriptor */ @@ -303,6 +304,8 @@ int idxd_device_ro_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); int idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); void idxd_device_load_config(struct idxd_device *idxd); +int idxd_device_request_int_handle(struct idxd_device *idxd, + int idx, int *handle); /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); @@ -313,6 +316,10 @@ int idxd_wq_map_portal(struct idxd_wq *wq); void idxd_wq_unmap_portal(struct idxd_wq *wq); int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); int idxd_wq_disable_pasid(struct idxd_wq *wq); +int idxd_wq_abort(struct idxd_wq *wq); +void idxd_wq_update_pasid(struct idxd_wq *wq, int pasid); +void idxd_wq_update_priv(struct idxd_wq *wq, int priv); +int idxd_wq_drain(struct idxd_wq *wq); /* submission */ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc, diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 15b3ef73cac3..babe6e614087 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -56,6 +56,7 @@ static int idxd_setup_interrupts(struct idxd_device *idxd) int i, msixcnt; int rc = 0; union msix_perm mperm; + unsigned long flags; msixcnt = pci_msix_vec_count(pdev); if (msixcnt < 0) { @@ -130,6 +131,17 @@ static int idxd_setup_interrupts(struct idxd_device *idxd) } dev_dbg(dev, "Allocated idxd-msix %d for vector %d\n", i, msix->vector); + + if (idxd->hw.gen_cap.int_handle_req) { + spin_lock_irqsave(&idxd->dev_lock, flags); + rc = idxd_device_request_int_handle(idxd, i, + &idxd->int_handles[i]); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + if (rc < 0) + goto err_no_irq; + dev_dbg(dev, "int handle requested: %u\n", + idxd->int_handles[i]); + } } 
idxd_unmask_error_interrupts(idxd); @@ -168,6 +180,13 @@ static int idxd_setup_internals(struct idxd_device *idxd) struct device *dev = &idxd->pdev->dev; int i; + if (idxd->hw.gen_cap.int_handle_req) { + idxd->int_handles = devm_kcalloc(dev, idxd->max_wqs, + sizeof(int), GFP_KERNEL); + if (!idxd->int_handles) + return -ENOMEM; + } + idxd->groups = devm_kcalloc(dev, idxd->max_groups, sizeof(struct idxd_group), GFP_KERNEL); if (!idxd->groups) From patchwork Tue Apr 21 23:35:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274567 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496Kkx1sQSz9sSb for ; Wed, 22 Apr 2020 09:38:33 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726359AbgDUXi1 (ORCPT ); Tue, 21 Apr 2020 19:38:27 -0400 Received: from mga03.intel.com ([134.134.136.65]:32297 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726660AbgDUXi0 (ORCPT ); Tue, 21 Apr 2020 19:38:26 -0400 IronPort-SDR: hoeCV73boYUffI+wRzfZbhuDl42WVSLMivtYHlmemNyP2oqvQEYv+D51q5V6U0siW7fo7asOfy fE+HPzqxXSUw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:35:10 -0700 IronPort-SDR: hodVW9VvG+CS5iFg6Z1WxtbE9gR3mKlOEom+PKawBJ8eKb+n6kd8Y6dkWaFuifSOw+P/i9cCgb nib0bWfsehEQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="456286451" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga005.fm.intel.com with ESMTP; 21 Apr 2020 16:35:08 -0700 Subject: [PATCH RFC 13/15] dmaengine: idxd: add support for VFIO mediated device From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:35:08 -0700 Message-ID: <158751210879.36773.4070933531991265762.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Add enabling code that provide VFIO mediated device. A mediated device allows hardware to export resources to guests with significantly less dedicated hardware versus the SR-IOV implementation. 
For DSA devices, mdev enabling lets us emulate a virtual DSA device in the guest by exporting one or more workqueues to the guest and exposing them as DSA device(s). The software emulates PCI config and MMIO accesses, while the I/O submission path goes directly to the hardware: a submission portal is mmap'd into the guest to allow direct submission of descriptors. The creation of a mediated device generates a UUID, which can be retrieved from one of the VFIO sysfs attributes. This UUID must be provided to the idxd driver via sysfs in order to tie the specific mdev to the relevant workqueue. Given the various ways a wq can be configured and grouped on a device, this lets the system administrator associate a specifically configured wq with the guest it should be exported to. The intent of this design choice is to provide maximum configurability and flexibility. Signed-off-by: Dave Jiang --- drivers/dma/Kconfig | 3 drivers/dma/idxd/Makefile | 2 drivers/dma/idxd/device.c | 36 + drivers/dma/idxd/dma.c | 9 drivers/dma/idxd/idxd.h | 23 + drivers/dma/idxd/init.c | 10 drivers/dma/idxd/irq.c | 2 drivers/dma/idxd/mdev.c | 1558 ++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/mdev.h | 23 + drivers/dma/idxd/registers.h | 10 drivers/dma/idxd/submit.c | 28 + drivers/dma/idxd/sysfs.c | 143 ++++ drivers/dma/idxd/vdev.c | 570 +++++++++++++++ drivers/dma/idxd/vdev.h | 42 + 14 files changed, 2418 insertions(+), 41 deletions(-) create mode 100644 drivers/dma/idxd/vdev.c create mode 100644 drivers/dma/idxd/vdev.h diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 9e7d9eafb1f5..e39e04309587 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -291,6 +291,9 @@ config INTEL_IDXD select PCI_PASID select PCI_IOV select MSI_IMS + select VFIO_PCI + select VFIO_MDEV + select VFIO_MDEV_DEVICE help Enable support for the Intel(R) data accelerators present in Intel Xeon CPU.
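To make the overall shape of the mdev integration easier to follow, here is an illustrative sketch (not taken from this patch; the type-groups symbol and the function name are assumed) of how the idxd_vdcm_* callbacks defined in mdev.c below would typically be wired into the mdev framework of this kernel generation; the real registration is done by idxd_mdev_host_init(), whose body falls outside this excerpt:

#include <linux/mdev.h>
#include <linux/module.h>

/* Hypothetical wiring of the VDCM callbacks into mdev_parent_ops. */
static const struct mdev_parent_ops idxd_vdcm_parent_ops = {
	.owner			= THIS_MODULE,
	.supported_type_groups	= idxd_mdev_type_groups,	/* assumed name */
	.create			= idxd_vdcm_create,
	.remove			= idxd_vdcm_remove,
	.open			= idxd_vdcm_open,
	.release		= idxd_vdcm_release,
	.read			= idxd_vdcm_read,
	.write			= idxd_vdcm_write,
	.mmap			= idxd_vdcm_mmap,
};

/* Hypothetical stand-in for idxd_mdev_host_init(); registers the PF as an
 * mdev parent so the "idxd-wq" type shows up under mdev_supported_types
 * in sysfs.
 */
static int example_mdev_host_init(struct idxd_device *idxd)
{
	return mdev_register_device(&idxd->pdev->dev, &idxd_vdcm_parent_ops);
}

Once the parent is registered, an mdev instance is typically created by writing a UUID to the idxd-wq type's create attribute under mdev_supported_types, and the same UUID is written to the chosen wq through the idxd sysfs interface so that find_wq_by_uuid() can resolve the backing workqueue when vdcm_vidxd_create() runs.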
diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile index 308e12869f96..bb1fb771f6b5 100644 --- a/drivers/dma/idxd/Makefile +++ b/drivers/dma/idxd/Makefile @@ -1,2 +1,2 @@ obj-$(CONFIG_INTEL_IDXD) += idxd.o -idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o mdev.o +idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o mdev.o vdev.o diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 830aa5859646..b92cb1ca20d3 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -223,11 +223,11 @@ void idxd_wq_free_resources(struct idxd_wq *wq) sbitmap_free(&wq->sbmap); } -int idxd_wq_enable(struct idxd_wq *wq) +int idxd_wq_enable(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 status; + u32 stat; int rc; lockdep_assert_held(&idxd->dev_lock); @@ -240,13 +240,16 @@ int idxd_wq_enable(struct idxd_wq *wq) rc = idxd_cmd_send(idxd, IDXD_CMD_ENABLE_WQ, wq->id); if (rc < 0) return rc; - rc = idxd_cmd_wait(idxd, &status, IDXD_REG_TIMEOUT); + rc = idxd_cmd_wait(idxd, &stat, IDXD_REG_TIMEOUT); if (rc < 0) return rc; - if (status != IDXD_CMDSTS_SUCCESS && - status != IDXD_CMDSTS_ERR_WQ_ENABLED) { - dev_dbg(dev, "WQ enable failed: %#x\n", status); + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS && + stat != IDXD_CMDSTS_ERR_WQ_ENABLED) { + dev_dbg(dev, "WQ enable failed: %#x\n", stat); return -ENXIO; } @@ -255,11 +258,11 @@ int idxd_wq_enable(struct idxd_wq *wq) return 0; } -int idxd_wq_disable(struct idxd_wq *wq) +int idxd_wq_disable(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 status, operand; + u32 stat, operand; int rc; lockdep_assert_held(&idxd->dev_lock); @@ -274,12 +277,15 @@ int idxd_wq_disable(struct idxd_wq *wq) rc = idxd_cmd_send(idxd, IDXD_CMD_DISABLE_WQ, operand); if (rc < 0) return rc; - rc = idxd_cmd_wait(idxd, &status, IDXD_REG_TIMEOUT); + rc = idxd_cmd_wait(idxd, &stat, IDXD_REG_TIMEOUT); if (rc < 0) return rc; - if (status != IDXD_CMDSTS_SUCCESS) { - dev_dbg(dev, "WQ disable failed: %#x\n", status); + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ disable failed: %#x\n", stat); return -ENXIO; } @@ -362,7 +368,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) lockdep_assert_held(&idxd->dev_lock); - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) return rc; @@ -373,7 +379,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) wqcfg.pasid = pasid; iowrite32(wqcfg.bits[2], idxd->reg_base + offset); - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) return rc; @@ -389,7 +395,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) lockdep_assert_held(&idxd->dev_lock); - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) return rc; @@ -399,7 +405,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) wqcfg.pasid = 0; iowrite32(wqcfg.bits[2], idxd->reg_base + offset); - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) return rc; diff --git a/drivers/dma/idxd/dma.c b/drivers/dma/idxd/dma.c index 9a4f78519e57..a49d4f303d7d 100644 --- a/drivers/dma/idxd/dma.c +++ b/drivers/dma/idxd/dma.c @@ -61,8 +61,6 @@ static inline void idxd_prep_desc_common(struct idxd_wq *wq, u64 addr_f1, u64 addr_f2, u64 len, u64 compl, u32 flags) { - struct idxd_device *idxd = wq->idxd; - hw->flags = flags; hw->opcode = opcode; hw->src_addr = addr_f1; @@ -70,13 +68,6 @@ static 
inline void idxd_prep_desc_common(struct idxd_wq *wq, hw->xfer_size = len; hw->priv = !!(wq->type == IDXD_WQT_KERNEL); hw->completion_addr = compl; - - /* - * Descriptor completion vectors are 1-8 for MSIX. We will round - * robin through the 8 vectors. - */ - wq->vec_ptr = (wq->vec_ptr % idxd->num_wq_irqs) + 1; - hw->int_handle = wq->vec_ptr; } static struct dma_async_tx_descriptor * diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 9b56a4c7f3fc..92a9718daa15 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -8,6 +8,10 @@ #include #include #include +#include +#include +#include +#include #include "registers.h" #define IDXD_DRIVER_VERSION "1.00" @@ -66,6 +70,7 @@ enum idxd_wq_type { IDXD_WQT_NONE = 0, IDXD_WQT_KERNEL, IDXD_WQT_USER, + IDXD_WQT_MDEV, }; struct idxd_cdev { @@ -75,6 +80,11 @@ struct idxd_cdev { struct wait_queue_head err_queue; }; +struct idxd_wq_uuid { + guid_t uuid; + struct list_head list; +}; + #define IDXD_ALLOCATED_BATCH_SIZE 128U #define WQ_NAME_SIZE 1024 #define WQ_TYPE_SIZE 10 @@ -119,6 +129,9 @@ struct idxd_wq { struct percpu_rw_semaphore submit_lock; wait_queue_head_t submit_waitq; char name[WQ_NAME_SIZE + 1]; + struct list_head uuid_list; + int uuids; + struct list_head vdcm_list; }; struct idxd_engine { @@ -200,6 +213,7 @@ struct idxd_device { atomic_t num_allocated_ims; struct sbitmap ims_sbmap; int *int_handles; + struct mutex mdev_lock; /* mdev creation lock */ }; /* IDXD software descriptor */ @@ -282,6 +296,7 @@ void idxd_cleanup_sysfs(struct idxd_device *idxd); int idxd_register_driver(void); void idxd_unregister_driver(void); struct bus_type *idxd_get_bus_type(struct idxd_device *idxd); +bool is_idxd_wq_mdev(struct idxd_wq *wq); /* device interrupt control */ irqreturn_t idxd_irq_handler(int vec, void *data); @@ -310,8 +325,8 @@ int idxd_device_request_int_handle(struct idxd_device *idxd, /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); void idxd_wq_free_resources(struct idxd_wq *wq); -int idxd_wq_enable(struct idxd_wq *wq); -int idxd_wq_disable(struct idxd_wq *wq); +int idxd_wq_enable(struct idxd_wq *wq, u32 *status); +int idxd_wq_disable(struct idxd_wq *wq, u32 *status); int idxd_wq_map_portal(struct idxd_wq *wq); void idxd_wq_unmap_portal(struct idxd_wq *wq); int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); @@ -344,4 +359,8 @@ int idxd_cdev_get_major(struct idxd_device *idxd); int idxd_wq_add_cdev(struct idxd_wq *wq); void idxd_wq_del_cdev(struct idxd_wq *wq); +/* mdev */ +int idxd_mdev_host_init(struct idxd_device *idxd); +void idxd_mdev_host_release(struct idxd_device *idxd); + #endif diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index babe6e614087..b0f99a794e91 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -218,6 +218,8 @@ static int idxd_setup_internals(struct idxd_device *idxd) mutex_init(&wq->wq_lock); atomic_set(&wq->dq_count, 0); init_waitqueue_head(&wq->submit_waitq); + INIT_LIST_HEAD(&wq->uuid_list); + INIT_LIST_HEAD(&wq->vdcm_list); wq->idxd_cdev.minor = -1; rc = percpu_init_rwsem(&wq->submit_lock); if (rc < 0) { @@ -347,6 +349,7 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev) idxd->pdev = pdev; spin_lock_init(&idxd->dev_lock); + mutex_init(&idxd->mdev_lock); atomic_set(&idxd->num_allocated_ims, 0); return idxd; @@ -509,6 +512,12 @@ static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) return -ENODEV; } + rc = idxd_mdev_host_init(idxd); + if (rc < 0) { + dev_err(dev, "VFIO mdev init failed\n"); 
+ return rc; + } + rc = idxd_setup_sysfs(idxd); if (rc) { dev_err(dev, "IDXD sysfs setup failed\n"); @@ -584,6 +593,7 @@ static void idxd_remove(struct pci_dev *pdev) dev_dbg(&pdev->dev, "%s called\n", __func__); idxd_cleanup_sysfs(idxd); idxd_shutdown(pdev); + idxd_mdev_host_release(idxd); idxd_wqs_free_lock(idxd); idxd_disable_system_pasid(idxd); mutex_lock(&idxd_idr_lock); diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index 37ad927d6944..bc634dc4e485 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -77,7 +77,7 @@ static int idxd_restart(struct idxd_device *idxd) struct idxd_wq *wq = &idxd->wqs[i]; if (wq->state == IDXD_WQ_ENABLED) { - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) { dev_warn(&idxd->pdev->dev, "Unable to re-enable wq %s\n", diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c index 2cf0cdf149b7..b222ce00a9db 100644 --- a/drivers/dma/idxd/mdev.c +++ b/drivers/dma/idxd/mdev.c @@ -1,19 +1,76 @@ // SPDX-License-Identifier: GPL-2.0 -/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ +/* Copyright(c) 2019 Intel Corporation. All rights rsvd. */ #include #include #include #include #include +#include #include -#include -#include +#include +#include #include -#include "../../vfio/pci/vfio_pci_private.h" +#include +#include +#include +#include +#include +#include +#include #include #include "registers.h" #include "idxd.h" +#include "../../vfio/pci/vfio_pci_private.h" #include "mdev.h" +#include "vdev.h" + +static u64 idxd_pci_config[] = { + 0x001000000b258086ULL, + 0x0080000008800000ULL, + 0x000000000000000cULL, + 0x000000000000000cULL, + 0x0000000000000000ULL, + 0x2010808600000000ULL, + 0x0000004000000000ULL, + 0x000000ff00000000ULL, + 0x0000060000005011ULL, /* MSI-X capability */ + 0x0000070000000000ULL, + 0x0000000000920010ULL, /* PCIe capability */ + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0070001000000000ULL, + 0x0000000000000000ULL, +}; + +static u64 idxd_pci_ext_cap[] = { + 0x000000611101000fULL, /* ATS capability */ + 0x0000000000000000ULL, + 0x8100000012010013ULL, /* Page Request capability */ + 0x0000000000000001ULL, + 0x000014040001001bULL, /* PASID capability */ + 0x0000000000000000ULL, + 0x0181808600010023ULL, /* Scalable IOV capability */ + 0x0000000100000005ULL, + 0x0000000000000001ULL, + 0x0000000000000000ULL, +}; + +static u64 idxd_cap_ctrl_reg[] = { + 0x0000000000000100ULL, + 0x0000000000000000ULL, + 0x00000001013f038fULL, /* gencap */ + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000004004ULL, /* grpcap */ + 0x0000000000000004ULL, /* engcap */ + 0x00000001003f03ffULL, /* opcap */ + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, /* offsets */ +}; static void idxd_free_ims_index(struct idxd_device *idxd, unsigned long ims_idx) @@ -124,7 +181,11 @@ static struct platform_msi_ops idxd_ims_ops = { static irqreturn_t idxd_guest_wq_completion_interrupt(int irq, void *data) { - /* send virtual interrupt */ + struct ims_irq_entry *irq_entry = data; + struct vdcm_idxd *vidxd = irq_entry->vidxd; + int msix_idx = irq_entry->int_src; + + vidxd_send_interrupt(vidxd, msix_idx + 1); return IRQ_HANDLED; } @@ -177,3 +238,1490 @@ static int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) return 0; } + +static inline bool handle_valid(unsigned long handle) +{ + return !!(handle & ~0xff); +} + +static void idxd_vdcm_reinit(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq; + 
struct idxd_device *idxd; + unsigned long flags; + + memset(vidxd->cfg, 0, VIDXD_MAX_CFG_SPACE_SZ); + memset(&vidxd->bar0, 0, sizeof(struct vdcm_idxd_pci_bar0)); + + memcpy(vidxd->cfg, idxd_pci_config, sizeof(idxd_pci_config)); + memcpy(vidxd->cfg + 0x100, idxd_pci_ext_cap, + sizeof(idxd_pci_ext_cap)); + + memcpy(vidxd->bar0.cap_ctrl_regs, idxd_cap_ctrl_reg, + sizeof(idxd_cap_ctrl_reg)); + + /* Set the MSI-X table size */ + vidxd->cfg[VIDXD_MSIX_TBL_SZ_OFFSET] = 1; + idxd = vidxd->idxd; + wq = vidxd->wq; + + if (wq_dedicated(wq)) { + spin_lock_irqsave(&idxd->dev_lock, flags); + idxd_wq_disable(wq, NULL); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + } + + vidxd_mmio_init(vidxd); +} + +struct vfio_region { + u32 type; + u32 subtype; + size_t size; + u32 flags; +}; + +struct kvmidxd_guest_info { + struct kvm *kvm; + struct vdcm_idxd *vidxd; +}; + +static int kvmidxd_guest_init(struct mdev_device *mdev) +{ + struct kvmidxd_guest_info *info; + struct vdcm_idxd *vidxd; + struct kvm *kvm; + struct device *dev = mdev_dev(mdev); + + vidxd = mdev_get_drvdata(mdev); + if (handle_valid(vidxd->handle)) + return -EEXIST; + + kvm = vidxd->vdev.kvm; + if (!kvm || kvm->mm != current->mm) { + dev_err(dev, "KVM is required to use Intel vIDXD\n"); + return -ESRCH; + } + + info = vzalloc(sizeof(*info)); + if (!info) + return -ENOMEM; + + vidxd->handle = (unsigned long)info; + info->vidxd = vidxd; + info->kvm = kvm; + + return 0; +} + +static bool kvmidxd_guest_exit(unsigned long handle) +{ + if (handle == 0) + return false; + + vfree((void *)handle); + + return true; +} + +static void __idxd_vdcm_release(struct vdcm_idxd *vidxd) +{ + int rc; + struct device *dev = &vidxd->idxd->pdev->dev; + + if (atomic_cmpxchg(&vidxd->vdev.released, 0, 1)) + return; + + if (!handle_valid(vidxd->handle)) + return; + + /* Re-initialize the VIDXD to a pristine state for re-use */ + rc = vfio_unregister_notifier(mdev_dev(vidxd->vdev.mdev), + VFIO_GROUP_NOTIFY, + &vidxd->vdev.group_notifier); + if (rc < 0) + dev_warn(dev, "vfio_unregister_notifier group failed: %d\n", + rc); + + kvmidxd_guest_exit(vidxd->handle); + vidxd_free_ims_entries(vidxd); + + vidxd->vdev.kvm = NULL; + vidxd->handle = 0; + idxd_vdcm_reinit(vidxd); +} + +static void idxd_vdcm_release(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "vdcm_idxd_release %d\n", vidxd->type->type); + __idxd_vdcm_release(vidxd); +} + +static void idxd_vdcm_release_work(struct work_struct *work) +{ + struct vdcm_idxd *vidxd = container_of(work, struct vdcm_idxd, + vdev.release_work); + + __idxd_vdcm_release(vidxd); +} + +static bool idxd_wq_match_uuid(struct idxd_wq *wq, const guid_t *uuid) +{ + struct idxd_wq_uuid *entry; + bool found = false; + + list_for_each_entry(entry, &wq->uuid_list, list) { + if (guid_equal(&entry->uuid, uuid)) { + found = true; + break; + } + } + + return found; +} + +static struct idxd_wq *find_wq_by_uuid(struct idxd_device *idxd, + const guid_t *uuid) +{ + int i; + struct idxd_wq *wq; + bool found = false; + + for (i = 0; i < idxd->max_wqs; i++) { + wq = &idxd->wqs[i]; + found = idxd_wq_match_uuid(wq, uuid); + if (found) + return wq; + } + + return NULL; +} + +static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, + struct mdev_device *mdev, + struct vdcm_idxd_type *type) +{ + struct vdcm_idxd *vidxd; + unsigned long flags; + struct idxd_wq *wq = NULL; + struct device *dev = mdev_dev(mdev); + + wq = find_wq_by_uuid(idxd, mdev_uuid(mdev)); + if 
(!wq) { + dev_dbg(dev, "No WQ found\n"); + return NULL; + } + + if (wq->state != IDXD_WQ_ENABLED) + return NULL; + + vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); + if (!vidxd) + return NULL; + + vidxd->idxd = idxd; + vidxd->vdev.mdev = mdev; + vidxd->wq = wq; + mdev_set_drvdata(mdev, vidxd); + vidxd->type = type; + vidxd->num_wqs = 1; + + mutex_lock(&wq->wq_lock); + if (wq_dedicated(wq)) { + /* disable wq. will be enabled by the VM */ + spin_lock_irqsave(&vidxd->idxd->dev_lock, flags); + idxd_wq_disable(vidxd->wq, NULL); + spin_unlock_irqrestore(&vidxd->idxd->dev_lock, flags); + } + + /* Initialize virtual PCI resources if it is an MDEV type for a VM */ + memcpy(vidxd->cfg, idxd_pci_config, sizeof(idxd_pci_config)); + memcpy(vidxd->cfg + 0x100, idxd_pci_ext_cap, + sizeof(idxd_pci_ext_cap)); + memcpy(vidxd->bar0.cap_ctrl_regs, idxd_cap_ctrl_reg, + sizeof(idxd_cap_ctrl_reg)); + + /* Set the MSI-X table size */ + vidxd->cfg[VIDXD_MSIX_TBL_SZ_OFFSET] = 1; + vidxd->bar_size[0] = VIDXD_BAR0_SIZE; + vidxd->bar_size[1] = VIDXD_BAR2_SIZE; + + vidxd_mmio_init(vidxd); + + INIT_WORK(&vidxd->vdev.release_work, idxd_vdcm_release_work); + + idxd_wq_get(wq); + mutex_unlock(&wq->wq_lock); + + return vidxd; +} + +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = { + { + .name = "wq", + .description = "IDXD MDEV workqueue", + .type = IDXD_MDEV_TYPE_WQ, + }, +}; + +static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev, + const char *name) +{ + int i; + char dev_name[IDXD_MDEV_NAME_LEN]; + + for (i = 0; i < IDXD_MDEV_TYPES; i++) { + snprintf(dev_name, IDXD_MDEV_NAME_LEN, "idxd-%s", + idxd_mdev_types[i].name); + + if (!strncmp(name, dev_name, IDXD_MDEV_NAME_LEN)) + return &idxd_mdev_types[i]; + } + + return NULL; +} + +static int idxd_vdcm_create(struct kobject *kobj, struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd; + struct vdcm_idxd_type *type; + struct device *dev, *parent; + struct idxd_device *idxd; + int rc = 0; + + parent = mdev_parent_dev(mdev); + idxd = dev_get_drvdata(parent); + dev = mdev_dev(mdev); + + mdev_set_iommu_device(dev, parent); + mutex_lock(&idxd->mdev_lock); + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + if (!type) { + dev_err(dev, "failed to find type %s to create\n", + kobject_name(kobj)); + rc = -EINVAL; + goto out; + } + + vidxd = vdcm_vidxd_create(idxd, mdev, type); + if (IS_ERR_OR_NULL(vidxd)) { + rc = !vidxd ? 
-ENOMEM : PTR_ERR(vidxd); + dev_err(dev, "failed to create vidxd: %d\n", rc); + goto out; + } + + list_add(&vidxd->list, &vidxd->wq->vdcm_list); + dev_dbg(dev, "mdev creation success: %s\n", dev_name(mdev_dev(mdev))); + + out: + mutex_unlock(&idxd->mdev_lock); + return rc; +} + +static void vdcm_vidxd_remove(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct device *dev = &idxd->pdev->dev; + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s: removing for wq %d\n", __func__, vidxd->wq->id); + + mutex_lock(&wq->wq_lock); + list_del(&vidxd->list); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + kfree(vidxd); +} + +static int idxd_vdcm_remove(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + + if (handle_valid(vidxd->handle)) + return -EBUSY; + + vdcm_vidxd_remove(vidxd); + return 0; +} + +static int idxd_vdcm_group_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct vdcm_idxd *vidxd = container_of(nb, struct vdcm_idxd, + vdev.group_notifier); + + /* The only action we care about */ + if (action == VFIO_GROUP_NOTIFY_SET_KVM) { + vidxd->vdev.kvm = data; + + if (!data) + schedule_work(&vidxd->vdev.release_work); + } + + return NOTIFY_OK; +} + +static int idxd_vdcm_open(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned long events; + int rc; + struct vdcm_idxd_type *type = vidxd->type; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s: type: %d\n", __func__, type->type); + + vidxd->vdev.group_notifier.notifier_call = idxd_vdcm_group_notifier; + events = VFIO_GROUP_NOTIFY_SET_KVM; + rc = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, + &events, &vidxd->vdev.group_notifier); + if (rc < 0) { + dev_err(dev, "vfio_register_notifier for group failed: %d\n", + rc); + return rc; + } + + /* allocate and setup IMS entries */ + rc = vidxd_setup_ims_entries(vidxd); + if (rc < 0) + goto undo_group; + + rc = kvmidxd_guest_init(mdev); + if (rc) + goto undo_ims; + + atomic_set(&vidxd->vdev.released, 0); + + return rc; + + undo_ims: + vidxd_free_ims_entries(vidxd); + undo_group: + vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, + &vidxd->vdev.group_notifier); + return rc; +} + +static int vdcm_vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, + unsigned int size) +{ + u32 offset = pos & (vidxd->bar_size[0] - 1); + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + dev_WARN_ONCE(dev, (size & (size - 1)) != 0, "%s\n", __func__); + dev_WARN_ONCE(dev, size > 8, "%s\n", __func__); + dev_WARN_ONCE(dev, (offset & (size - 1)) != 0, "%s\n", __func__); + + dev_dbg(dev, "vidxd mmio W %d %x %x: %llx\n", vidxd->wq->id, size, + offset, get_reg_val(buf, size)); + + /* If we don't limit this, we potentially can write out of bound */ + if (size > 8) + size = 8; + + switch (offset) { + case IDXD_GENCFG_OFFSET ... IDXD_GENCFG_OFFSET + 7: + /* Write only when device is disabled. 
*/ + if (vidxd_state(vidxd) == IDXD_DEVICE_STATE_DISABLED) + memcpy(&bar0->cap_ctrl_regs[offset], buf, size); + break; + + case IDXD_GENCTRL_OFFSET: + memcpy(&bar0->cap_ctrl_regs[offset], buf, size); + break; + + case IDXD_INTCAUSE_OFFSET: + bar0->cap_ctrl_regs[offset] &= ~(get_reg_val(buf, 1) & 0x0f); + break; + + case IDXD_CMD_OFFSET: + if (size == 4) { + u8 *cap_ctrl = &bar0->cap_ctrl_regs[0]; + unsigned long *cmdsts = + (unsigned long *)&cap_ctrl[IDXD_CMDSTS_OFFSET]; + u32 val = get_reg_val(buf, size); + + /* Check and set device active */ + if (test_and_set_bit(31, cmdsts) == 0) { + *(u32 *)cmdsts = 1 << 31; + vidxd_do_command(vidxd, val); + } + } + break; + + case IDXD_SWERR_OFFSET: + /* W1C */ + bar0->cap_ctrl_regs[offset] &= ~(get_reg_val(buf, 1) & 3); + break; + + case VIDXD_WQCFG_OFFSET ... VIDXD_WQCFG_OFFSET + VIDXD_WQ_CTRL_SZ - 1: { + union wqcfg *wqcfg; + int wq_id = (offset - VIDXD_WQCFG_OFFSET) / 0x20; + struct idxd_wq *wq; + int subreg = offset & 0x1c; + u32 new_val; + + if (wq_id >= 1) + break; + wq = vidxd->wq; + wqcfg = (union wqcfg *)&bar0->wq_ctrl_regs[wq_id * 0x20]; + if (size >= 4) { + new_val = get_reg_val(buf, 4); + } else { + u32 tmp1, tmp2, shift, mask; + + switch (subreg) { + case 4: + tmp1 = wqcfg->bits[1]; break; + case 8: + tmp1 = wqcfg->bits[2]; break; + case 12: + tmp1 = wqcfg->bits[3]; break; + case 16: + tmp1 = wqcfg->bits[4]; break; + case 20: + tmp1 = wqcfg->bits[5]; break; + default: + tmp1 = 0; + } + + tmp2 = get_reg_val(buf, size); + shift = (offset & 0x03U) * 8; + mask = ((1U << size * 8) - 1u) << shift; + new_val = (tmp1 & ~mask) | (tmp2 << shift); + } + + if (subreg == 8) { + if (wqcfg->wq_state == 0) { + wqcfg->bits[2] &= 0xfe; + wqcfg->bits[2] |= new_val & 0xffffff01; + } + } + + break; + } + + case VIDXD_MSIX_TABLE_OFFSET ... + VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ - 1: { + int index = (offset - VIDXD_MSIX_TABLE_OFFSET) / 0x10; + u8 *msix_entry = &bar0->msix_table[index * 0x10]; + u8 *msix_perm = &bar0->msix_perm_table[index * 8]; + int end; + + /* Upper bound checking to stop overflow */ + end = VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ; + if (offset + size > end) + size = end - offset; + + memcpy(msix_entry + (offset & 0xf), buf, size); + /* check mask and pba */ + if ((msix_entry[12] & 1) == 0) { + *(u32 *)msix_perm &= ~3U; + if (test_and_clear_bit(index, &bar0->msix_pba)) + vidxd_send_interrupt(vidxd, index); + } else { + *(u32 *)msix_perm |= 1; + } + break; + } + + case VIDXD_MSIX_PERM_OFFSET ... 
+ VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ - 1: + if ((offset & 7) == 0 && size == 4) { + int index = (offset - VIDXD_MSIX_PERM_OFFSET) / 8; + u32 *msix_perm = + (u32 *)&bar0->msix_perm_table[index * 8]; + u8 *msix_entry = &bar0->msix_table[index * 0x10]; + u32 val = get_reg_val(buf, size) & 0xfffff00d; + + if (index > 0) + vidxd_setup_ims_entry(vidxd, index - 1, val); + + if (val & 1) { + msix_entry[12] |= 1; + if (bar0->msix_pba & (1ULL << index)) + val |= 2; + } else { + msix_entry[12] &= ~1u; + if (test_and_clear_bit(index, + &bar0->msix_pba)) + vidxd_send_interrupt(vidxd, index); + } + *msix_perm = val; + } + break; + } + + return 0; +} + +static int vdcm_vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, + unsigned int size) +{ + u32 offset = pos & (vidxd->bar_size[0] - 1); + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + u8 *reg_addr, *msix_table, *msix_perm_table; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + u32 end; + + dev_WARN_ONCE(dev, (size & (size - 1)) != 0, "%s\n", __func__); + dev_WARN_ONCE(dev, size > 8, "%s\n", __func__); + dev_WARN_ONCE(dev, (offset & (size - 1)) != 0, "%s\n", __func__); + + /* If we don't limit this, we potentially can write out of bound */ + if (size > 8) + size = 8; + + switch (offset) { + case 0 ... VIDXD_CAP_CTRL_SZ - 1: + end = VIDXD_CAP_CTRL_SZ; + if (offset + 8 > end) + size = end - offset; + reg_addr = &bar0->cap_ctrl_regs[offset]; + break; + + case VIDXD_GRPCFG_OFFSET ... + VIDXD_GRPCFG_OFFSET + VIDXD_GRP_CTRL_SZ - 1: + end = VIDXD_GRPCFG_OFFSET + VIDXD_GRP_CTRL_SZ; + if (offset + 8 > end) + size = end - offset; + reg_addr = &bar0->grp_ctrl_regs[offset - VIDXD_GRPCFG_OFFSET]; + break; + + case VIDXD_WQCFG_OFFSET ... VIDXD_WQCFG_OFFSET + VIDXD_WQ_CTRL_SZ - 1: + end = VIDXD_WQCFG_OFFSET + VIDXD_WQ_CTRL_SZ; + if (offset + 8 > end) + size = end - offset; + reg_addr = &bar0->wq_ctrl_regs[offset - VIDXD_WQCFG_OFFSET]; + break; + + case VIDXD_MSIX_TABLE_OFFSET ... + VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ - 1: + end = VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ; + if (offset + 8 > end) + size = end - offset; + msix_table = &bar0->msix_table[0]; + reg_addr = &msix_table[offset - VIDXD_MSIX_TABLE_OFFSET]; + break; + + case VIDXD_MSIX_PBA_OFFSET ... VIDXD_MSIX_PBA_OFFSET + 7: + end = VIDXD_MSIX_PBA_OFFSET + 8; + if (offset + 8 > end) + size = end - offset; + reg_addr = (u8 *)&bar0->msix_pba; + break; + + case VIDXD_MSIX_PERM_OFFSET ... 
+ VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ - 1: + end = VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ; + if (offset + 8 > end) + size = end - offset; + msix_perm_table = &bar0->msix_perm_table[0]; + reg_addr = &msix_perm_table[offset - VIDXD_MSIX_PERM_OFFSET]; + break; + + default: + reg_addr = NULL; + break; + } + + if (reg_addr) + memcpy(buf, reg_addr, size); + else + memset(buf, 0, size); + + dev_dbg(dev, "vidxd mmio R %d %x %x: %llx\n", + vidxd->wq->id, size, offset, get_reg_val(buf, size)); + return 0; +} + +static int vdcm_vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, + void *buf, unsigned int count) +{ + u32 offset = pos & 0xfff; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + memcpy(buf, &vidxd->cfg[offset], count); + + dev_dbg(dev, "vidxd pci R %d %x %x: %llx\n", + vidxd->wq->id, count, offset, get_reg_val(buf, count)); + + return 0; +} + +static int vdcm_vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, + void *buf, unsigned int size) +{ + u32 offset = pos & 0xfff; + u64 val; + u8 *cfg = vidxd->cfg; + u8 *bar0 = vidxd->bar0.cap_ctrl_regs; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + dev_dbg(dev, "vidxd pci W %d %x %x: %llx\n", vidxd->wq->id, size, + offset, get_reg_val(buf, size)); + + switch (offset) { + case PCI_COMMAND: { /* device control */ + bool bme; + + memcpy(&cfg[offset], buf, size); + bme = cfg[offset] & PCI_COMMAND_MASTER; + if (!bme && + ((*(u32 *)&bar0[IDXD_GENSTATS_OFFSET]) & 0x3) != 0) { + *(u32 *)(&bar0[IDXD_SWERR_OFFSET]) = 0x51u << 8; + *(u32 *)(&bar0[IDXD_GENSTATS_OFFSET]) = 0; + } + + if (size < 4) + break; + offset += 2; + buf = buf + 2; + size -= 2; + } + /* fall through */ + + case PCI_STATUS: { /* device status */ + u16 nval = get_reg_val(buf, size) << (offset & 1) * 8; + + nval &= 0xf900; + *(u16 *)&cfg[offset] = *((u16 *)&cfg[offset]) & ~nval; + break; + } + + case PCI_CACHE_LINE_SIZE: + case PCI_INTERRUPT_LINE: + memcpy(&cfg[offset], buf, size); + break; + + case PCI_BASE_ADDRESS_0: /* BAR0 */ + case PCI_BASE_ADDRESS_1: /* BAR1 */ + case PCI_BASE_ADDRESS_2: /* BAR2 */ + case PCI_BASE_ADDRESS_3: /* BAR3 */ { + unsigned int bar_id, bar_offset; + u64 bar, bar_size; + + bar_id = (offset - PCI_BASE_ADDRESS_0) / 8; + bar_size = vidxd->bar_size[bar_id]; + bar_offset = PCI_BASE_ADDRESS_0 + bar_id * 8; + + val = get_reg_val(buf, size); + bar = *(u64 *)&cfg[bar_offset]; + memcpy((u8 *)&bar + (offset & 0x7), buf, size); + bar &= ~(bar_size - 1); + + *(u64 *)&cfg[bar_offset] = bar | + PCI_BASE_ADDRESS_MEM_TYPE_64 | + PCI_BASE_ADDRESS_MEM_PREFETCH; + + if (val == -1U || val == -1ULL) + break; + if (bar == 0 || bar == -1ULL - -1U) + break; + if (bar == (-1U & ~(bar_size - 1))) + break; + if (bar == (-1ULL & ~(bar_size - 1))) + break; + if (bar == vidxd->bar_val[bar_id]) + break; + + vidxd->bar_val[bar_id] = bar; + break; + } + + case VIDXD_ATS_OFFSET + 4: + if (size < 4) + break; + offset += 2; + buf = buf + 2; + size -= 2; + /* fall through */ + + case VIDXD_ATS_OFFSET + 6: + memcpy(&cfg[offset], buf, size); + break; + + case VIDXD_PRS_OFFSET + 4: { + u8 old_val, new_val; + + val = get_reg_val(buf, 1); + old_val = cfg[VIDXD_PRS_OFFSET + 4]; + new_val = val & 1; + + cfg[offset] = new_val; + if (old_val == 0 && new_val == 1) { + /* + * Clear Stopped, Response Failure, + * and Unexpected Response. 
+ */ + *(u16 *)&cfg[VIDXD_PRS_OFFSET + 6] &= ~(u16)(0x0103); + } + + if (size < 4) + break; + + offset += 2; + buf = (u8 *)buf + 2; + size -= 2; + } + /* fall through */ + + case VIDXD_PRS_OFFSET + 6: + cfg[offset] &= ~(get_reg_val(buf, 1) & 3); + break; + case VIDXD_PRS_OFFSET + 12 ... VIDXD_PRS_OFFSET + 15: + memcpy(&cfg[offset], buf, size); + break; + + case VIDXD_PASID_OFFSET + 4: + if (size < 4) + break; + offset += 2; + buf = buf + 2; + size -= 2; + /* fall through */ + case VIDXD_PASID_OFFSET + 6: + cfg[offset] = get_reg_val(buf, 1) & 5; + break; + } + + return 0; +} + +static ssize_t idxd_vdcm_rw(struct mdev_device *mdev, char *buf, + size_t count, loff_t *ppos, enum idxd_vdcm_rw mode) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); + u64 pos = *ppos & VFIO_PCI_OFFSET_MASK; + struct device *dev = mdev_dev(mdev); + int rc = -EINVAL; + + if (index >= VFIO_PCI_NUM_REGIONS) { + dev_err(dev, "invalid index: %u\n", index); + return -EINVAL; + } + + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vdcm_vidxd_cfg_write(vidxd, pos, buf, count); + else + rc = vdcm_vidxd_cfg_read(vidxd, pos, buf, count); + break; + case VFIO_PCI_BAR0_REGION_INDEX: + case VFIO_PCI_BAR1_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vdcm_vidxd_mmio_write(vidxd, + vidxd->bar_val[0] + pos, buf, + count); + else + rc = vdcm_vidxd_mmio_read(vidxd, + vidxd->bar_val[0] + pos, buf, + count); + break; + case VFIO_PCI_BAR2_REGION_INDEX: + case VFIO_PCI_BAR3_REGION_INDEX: + case VFIO_PCI_BAR4_REGION_INDEX: + case VFIO_PCI_BAR5_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + case VFIO_PCI_ROM_REGION_INDEX: + default: + dev_err(dev, "unsupported region: %u\n", index); + } + + return rc == 0 ? 
count : rc; +} + +static ssize_t idxd_vdcm_read(struct mdev_device *mdev, char __user *buf, + size_t count, loff_t *ppos) +{ + unsigned int done = 0; + int rc; + + while (count) { + size_t filled; + + if (count >= 8 && !(*ppos % 8)) { + u64 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 8; + } else if (count >= 4 && !(*ppos % 4)) { + u32 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 2; + } else { + u8 val; + + rc = idxd_vdcm_rw(mdev, &val, sizeof(val), ppos, + IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + return done; + + read_err: + return -EFAULT; +} + +static ssize_t idxd_vdcm_write(struct mdev_device *mdev, + const char __user *buf, size_t count, + loff_t *ppos) +{ + unsigned int done = 0; + int rc; + + while (count) { + size_t filled; + + if (count >= 8 && !(*ppos % 8)) { + u64 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 8; + } else if (count >= 4 && !(*ppos % 4)) { + u32 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, + sizeof(val), ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 2; + } else { + u8 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, &val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + return done; +write_err: + return -EFAULT; +} + +static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma, + const char *func) +{ + if (vma->vm_end < vma->vm_start) + return -EINVAL; + if (!(vma->vm_flags & VM_SHARED)) + return -EINVAL; + + return 0; +} + +static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma) +{ + unsigned int wq_idx, rc; + unsigned long req_size, pgoff = 0, offset; + pgprot_t pg_prot; + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_wq *wq = vidxd->wq; + struct idxd_device *idxd = vidxd->idxd; + enum idxd_portal_prot virt_limited, phys_limited; + phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR); + struct device *dev = mdev_dev(mdev); + + rc = check_vma(wq, vma, __func__); + if (rc) + return rc; + + pg_prot = vma->vm_page_prot; + req_size = vma->vm_end - vma->vm_start; + vma->vm_flags |= VM_DONTCOPY; + + offset = (vma->vm_pgoff << PAGE_SHIFT) & + ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1); + + wq_idx = offset >> (PAGE_SHIFT + 2); + if (wq_idx >= 1) { + 
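+		/*
+		 * Only a single virtual WQ (index 0) is exposed per mdev,
+		 * so any other WQ index decoded from the mmap offset is an
+		 * invalid mapping.
+		 */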
dev_err(dev, "mapping invalid wq %d off %lx\n", + wq_idx, offset); + return -EINVAL; + } + + virt_limited = ((offset >> PAGE_SHIFT) & 0x3) == 1; + phys_limited = IDXD_PORTAL_LIMITED; + + if (virt_limited == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq)) + phys_limited = IDXD_PORTAL_UNLIMITED; + + /* We always map IMS portals to the guest */ + pgoff = (base + + idxd_get_wq_portal_full_offset(wq->id, phys_limited, + IDXD_IRQ_IMS)) >> PAGE_SHIFT; + + dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size, + pgprot_val(pg_prot)); + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_private_data = mdev; + vma->vm_pgoff = pgoff; + vma->vm_private_data = mdev; + + return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot); +} + +static int idxd_vdcm_get_irq_count(struct vdcm_idxd *vidxd, int type) +{ + if (type == VFIO_PCI_MSI_IRQ_INDEX || + type == VFIO_PCI_MSIX_IRQ_INDEX) + return vidxd->num_wqs + 1; + + return 0; +} + +static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd, + unsigned int index, unsigned int start, + unsigned int count, uint32_t flags, + void *data) +{ + struct eventfd_ctx *trigger; + int i, rc = 0; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + if (count > VIDXD_MAX_MSIX_ENTRIES - 1) + count = VIDXD_MAX_MSIX_ENTRIES - 1; + + if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) { + /* Disable all MSIX entries */ + for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) { + if (vidxd->vdev.msix_trigger[i]) { + dev_dbg(dev, "disable MSIX entry %d\n", i); + eventfd_ctx_put(vidxd->vdev.msix_trigger[i]); + vidxd->vdev.msix_trigger[i] = 0; + + if (i) { + rc = vidxd_free_ims_entry(vidxd, i - 1); + if (rc) + return rc; + } + } + } + return 0; + } + + for (i = 0; i < count; i++) { + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + u32 fd = *(u32 *)(data + i * sizeof(u32)); + + dev_dbg(dev, "enable MSIX entry %d\n", i); + trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(trigger)) { + pr_err("eventfd_ctx_fdget failed %d\n", i); + return PTR_ERR(trigger); + } + vidxd->vdev.msix_trigger[i] = trigger; + /* + * Allocate a vector from the OS and set in the IMS + * entry + */ + if (i) { + rc = vidxd_setup_ims_entry(vidxd, i - 1, 0); + if (rc) + return rc; + } + fd++; + } else if (flags & VFIO_IRQ_SET_DATA_NONE) { + dev_dbg(dev, "disable MSIX entry %d\n", i); + eventfd_ctx_put(vidxd->vdev.msix_trigger[i]); + vidxd->vdev.msix_trigger[i] = 0; + + if (i) { + rc = vidxd_free_ims_entry(vidxd, i - 1); + if (rc) + return rc; + } + } + } + return rc; +} + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, + unsigned int index, unsigned int start, + unsigned int count, void *data) +{ + int (*func)(struct vdcm_idxd *vidxd, unsigned int index, + unsigned int start, unsigned int count, uint32_t flags, + void *data) = NULL; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + int msixcnt = pci_msix_vec_count(vidxd->idxd->pdev); + + if (msixcnt < 0) + return -ENXIO; + + switch (index) { + case VFIO_PCI_INTX_IRQ_INDEX: + dev_warn(dev, "intx interrupts not supported.\n"); + break; + case VFIO_PCI_MSI_IRQ_INDEX: + dev_dbg(dev, "msi interrupt.\n"); + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; + break; + } + break; + case VFIO_PCI_MSIX_IRQ_INDEX: + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case 
VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; + break; + } + break; + default: + return -ENOTTY; + } + + if (!func) + return -ENOTTY; + + return func(vidxd, index, start, count, flags, data); +} + +static void vidxd_vdcm_reset(struct vdcm_idxd *vidxd) +{ + vidxd_reset(vidxd); +} + +static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd, + unsigned long arg) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned long minsz; + int rc = -EINVAL; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "vidxd %lx ioctl, cmd: %d\n", vidxd->handle, cmd); + + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + info.flags = VFIO_DEVICE_FLAGS_PCI; + info.flags |= VFIO_DEVICE_FLAGS_RESET; + info.num_regions = VFIO_PCI_NUM_REGIONS; + info.num_irqs = VFIO_PCI_NUM_IRQS; + + return copy_to_user((void __user *)arg, &info, minsz) ? + -EFAULT : 0; + + } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { + struct vfio_region_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + int i; + struct vfio_region_info_cap_sparse_mmap *sparse = NULL; + size_t size; + int nr_areas = 1; + int cap_type_id = 0; + + minsz = offsetofend(struct vfio_region_info, offset); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + switch (info.index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = VIDXD_MAX_CFG_SPACE_SZ; + info.flags = VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR0_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = vidxd->bar_size[info.index]; + if (!info.size) { + info.flags = 0; + break; + } + + info.flags = VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR1_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0; + info.flags = 0; + break; + case VFIO_PCI_BAR2_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.flags = VFIO_REGION_INFO_FLAG_CAPS | + VFIO_REGION_INFO_FLAG_MMAP | + VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + info.size = vidxd->bar_size[1]; + + /* + * Every WQ has two areas for unlimited and limited + * MSI-X portals. IMS portals are not reported + */ + nr_areas = 2; + + size = sizeof(*sparse) + + (nr_areas * sizeof(*sparse->areas)); + sparse = kzalloc(size, GFP_KERNEL); + if (!sparse) + return -ENOMEM; + + sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; + sparse->header.version = 1; + sparse->nr_areas = nr_areas; + cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; + + sparse->areas[0].offset = 0; + sparse->areas[0].size = PAGE_SIZE; + + sparse->areas[1].offset = PAGE_SIZE; + sparse->areas[1].size = PAGE_SIZE; + break; + + case VFIO_PCI_BAR3_REGION_INDEX ... 
VFIO_PCI_BAR5_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0; + info.flags = 0; + dev_dbg(dev, "get region info bar:%d\n", info.index); + break; + + case VFIO_PCI_ROM_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + dev_dbg(dev, "get region info index:%d\n", + info.index); + break; + default: { + struct vfio_region_info_cap_type cap_type = { + .header.id = VFIO_REGION_INFO_CAP_TYPE, + .header.version = 1 + }; + + if (info.index >= VFIO_PCI_NUM_REGIONS + + vidxd->vdev.num_regions) + return -EINVAL; + + i = info.index - VFIO_PCI_NUM_REGIONS; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = vidxd->vdev.region[i].size; + info.flags = vidxd->vdev.region[i].flags; + + cap_type.type = vidxd->vdev.region[i].type; + cap_type.subtype = vidxd->vdev.region[i].subtype; + + rc = vfio_info_add_capability(&caps, &cap_type.header, + sizeof(cap_type)); + if (rc) + return rc; + } /* default */ + } /* info.index switch */ + + if ((info.flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) { + if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) { + rc = vfio_info_add_capability(&caps, + &sparse->header, + sizeof(*sparse) + + (sparse->nr_areas * + sizeof(*sparse->areas))); + kfree(sparse); + if (rc) + return rc; + } + } + + if (caps.size) { + if (info.argsz < sizeof(info) + caps.size) { + info.argsz = sizeof(info) + caps.size; + info.cap_offset = 0; + } else { + vfio_info_cap_shift(&caps, sizeof(info)); + if (copy_to_user((void __user *)arg + + sizeof(info), caps.buf, + caps.size)) { + kfree(caps.buf); + return -EFAULT; + } + info.cap_offset = sizeof(info); + } + + kfree(caps.buf); + } + + return copy_to_user((void __user *)arg, &info, minsz) ? + -EFAULT : 0; + } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { + struct vfio_irq_info info; + + minsz = offsetofend(struct vfio_irq_info, count); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) + return -EINVAL; + + switch (info.index) { + case VFIO_PCI_MSI_IRQ_INDEX: + case VFIO_PCI_MSIX_IRQ_INDEX: + default: + return -EINVAL; + } /* switch(info.index) */ + + info.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE; + info.count = idxd_vdcm_get_irq_count(vidxd, info.index); + + return copy_to_user((void __user *)arg, &info, minsz) ? 
+ -EFAULT : 0; + } else if (cmd == VFIO_DEVICE_SET_IRQS) { + struct vfio_irq_set hdr; + u8 *data = NULL; + size_t data_size = 0; + + minsz = offsetofend(struct vfio_irq_set, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) + return -EFAULT; + + if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) { + int max = idxd_vdcm_get_irq_count(vidxd, hdr.index); + + rc = vfio_set_irqs_validate_and_prepare(&hdr, max, + VFIO_PCI_NUM_IRQS, + &data_size); + if (rc) { + dev_err(dev, "intel:vfio_set_irqs_validate_and_prepare failed\n"); + return -EINVAL; + } + if (data_size) { + data = memdup_user((void __user *)(arg + minsz), + data_size); + if (IS_ERR(data)) + return PTR_ERR(data); + } + } + + if (!data) + return -EINVAL; + + rc = idxd_vdcm_set_irqs(vidxd, hdr.flags, hdr.index, + hdr.start, hdr.count, data); + kfree(data); + return rc; + } else if (cmd == VFIO_DEVICE_RESET) { + vidxd_vdcm_reset(vidxd); + return 0; + } + + return rc; +} + +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf) +{ + struct vdcm_idxd_type *type; + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + + if (type) + return sprintf(buf, "%s\n", type->description); + + return -EINVAL; +} +static MDEV_TYPE_ATTR_RO(name); + +static int find_available_mdev_instances(struct idxd_device *idxd) +{ + int count = 0, i; + + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq; + + wq = &idxd->wqs[i]; + if (!is_idxd_wq_mdev(wq)) + continue; + + if ((idxd_wq_refcount(wq) <= 1 && wq_dedicated(wq)) || + !wq_dedicated(wq)) + count++; + } + + return count; +} + +static ssize_t available_instances_show(struct kobject *kobj, + struct device *dev, char *buf) +{ + int count; + struct idxd_device *idxd = dev_get_drvdata(dev); + struct vdcm_idxd_type *type; + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + if (!type) + return -EINVAL; + + count = find_available_mdev_instances(idxd); + + return sprintf(buf, "%d\n", count); +} +static MDEV_TYPE_ATTR_RO(available_instances); + +static ssize_t device_api_show(struct kobject *kobj, struct device *dev, + char *buf) +{ + return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING); +} +static MDEV_TYPE_ATTR_RO(device_api); + +static struct attribute *idxd_mdev_types_attrs[] = { + &mdev_type_attr_name.attr, + &mdev_type_attr_device_api.attr, + &mdev_type_attr_available_instances.attr, + NULL, +}; + +static struct attribute_group idxd_mdev_type_group0 = { + .name = "wq", + .attrs = idxd_mdev_types_attrs, +}; + +static struct attribute_group *idxd_mdev_type_groups[] = { + &idxd_mdev_type_group0, + NULL, +}; + +static const struct mdev_parent_ops idxd_vdcm_ops = { + .supported_type_groups = idxd_mdev_type_groups, + .create = idxd_vdcm_create, + .remove = idxd_vdcm_remove, + .open = idxd_vdcm_open, + .release = idxd_vdcm_release, + .read = idxd_vdcm_read, + .write = idxd_vdcm_write, + .mmap = idxd_vdcm_mmap, + .ioctl = idxd_vdcm_ioctl, +}; + +int idxd_mdev_host_init(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { + rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX); + if (rc < 0) + dev_warn(dev, "Failed to enable aux-domain: %d\n", + rc); + } else { + dev_dbg(dev, "No aux-domain feature.\n"); + } + + return mdev_register_device(dev, &idxd_vdcm_ops); +} + +void idxd_mdev_host_release(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { + rc = iommu_dev_disable_feature(dev, 
IOMMU_DEV_FEAT_AUX); + if (rc < 0) + dev_warn(dev, "Failed to disable aux-domain: %d\n", + rc); + } + + mdev_unregister_device(dev); +} diff --git a/drivers/dma/idxd/mdev.h b/drivers/dma/idxd/mdev.h index 5b05b6cb2b7b..0b3a4c9822d4 100644 --- a/drivers/dma/idxd/mdev.h +++ b/drivers/dma/idxd/mdev.h @@ -48,6 +48,8 @@ struct ims_irq_entry { struct idxd_vdev { struct mdev_device *mdev; + struct vfio_region *region; + int num_regions; struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES]; struct notifier_block group_notifier; struct kvm *kvm; @@ -79,4 +81,25 @@ static inline struct vdcm_idxd *to_vidxd(struct idxd_vdev *vdev) return container_of(vdev, struct vdcm_idxd, vdev); } +#define IDXD_MDEV_NAME_LEN 16 +#define IDXD_MDEV_DESCRIPTION_LEN 64 + +enum idxd_mdev_type { + IDXD_MDEV_TYPE_WQ = 0, +}; + +#define IDXD_MDEV_TYPES 1 + +struct vdcm_idxd_type { + char name[IDXD_MDEV_NAME_LEN]; + char description[IDXD_MDEV_DESCRIPTION_LEN]; + enum idxd_mdev_type type; + unsigned int avail_instance; +}; + +enum idxd_vdcm_rw { + IDXD_VDCM_READ = 0, + IDXD_VDCM_WRITE, +}; + #endif diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index a39e7ae6b3d9..043cf825a71f 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -137,6 +137,8 @@ enum idxd_device_status_state { IDXD_DEVICE_STATE_HALT, }; +#define IDXD_GENSTATS_MASK 0x03 + enum idxd_device_reset_type { IDXD_DEVICE_RESET_SOFTWARE = 0, IDXD_DEVICE_RESET_FLR, @@ -160,6 +162,7 @@ union idxd_command_reg { }; u32 bits; } __packed; +#define IDXD_CMD_INT_MASK 0x80000000 enum idxd_cmd { IDXD_CMD_ENABLE_DEVICE = 1, @@ -333,4 +336,11 @@ union wqcfg { }; u32 bits[8]; } __packed; + +enum idxd_wq_hw_state { + IDXD_WQ_DEV_DISABLED = 0, + IDXD_WQ_DEV_ENABLED, + IDXD_WQ_DEV_BUSY, +}; + #endif diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c index bdcac933bb28..ee976b51b88d 100644 --- a/drivers/dma/idxd/submit.c +++ b/drivers/dma/idxd/submit.c @@ -57,6 +57,21 @@ struct idxd_desc *idxd_alloc_desc(struct idxd_wq *wq, desc = wq->descs[idx]; memset(desc->hw, 0, sizeof(struct dsa_hw_desc)); memset(desc->completion, 0, sizeof(struct dsa_completion_record)); + + if (idxd->pasid_enabled) + desc->hw->pasid = idxd->pasid; + + /* + * Descriptor completion vectors are 1-8 for MSIX. We will round + * robin through the 8 vectors. + */ + if (!idxd->int_handles) { + wq->vec_ptr = (wq->vec_ptr % idxd->num_wq_irqs) + 1; + desc->hw->int_handle = wq->vec_ptr; + } else { + desc->hw->int_handle = idxd->int_handles[wq->id]; + } + return desc; } @@ -115,7 +130,6 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc, enum idxd_op_type optype) { struct idxd_device *idxd = wq->idxd; - int vec = desc->hw->int_handle; int rc; void __iomem *portal; @@ -143,9 +157,19 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc, * Pending the descriptor to the lockless list for the irq_entry * that we designated the descriptor to. */ - if (desc->hw->flags & IDXD_OP_FLAG_RCI) + if (desc->hw->flags & IDXD_OP_FLAG_RCI) { + int vec; + + /* + * If the driver is on host kernel, it would be the value + * assigned to interrupt handle, which is index for MSIX + * vector. If it's guest then we'll set it to 1 for now + * since only 1 workqueue is exported. + */ + vec = !idxd->int_handles ? 
desc->hw->int_handle : 1; llist_add(&desc->llnode, &idxd->irq_entries[vec].pending_llist); + } return 0; } diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 07bad4f6c7fb..a175c2381e0e 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -14,6 +15,7 @@ static char *idxd_wq_type_names[] = { [IDXD_WQT_NONE] = "none", [IDXD_WQT_KERNEL] = "kernel", [IDXD_WQT_USER] = "user", + [IDXD_WQT_MDEV] = "mdev", }; static void idxd_conf_device_release(struct device *dev) @@ -69,6 +71,11 @@ static inline bool is_idxd_wq_cdev(struct idxd_wq *wq) return wq->type == IDXD_WQT_USER; } +inline bool is_idxd_wq_mdev(struct idxd_wq *wq) +{ + return wq->type == IDXD_WQT_MDEV ? true : false; +} + static int idxd_config_bus_match(struct device *dev, struct device_driver *drv) { @@ -205,6 +212,13 @@ static int idxd_config_bus_probe(struct device *dev) mutex_unlock(&wq->wq_lock); return -EINVAL; } + + /* This check is added until we have SVM support for mdev */ + if (wq->type == IDXD_WQT_MDEV) { + dev_warn(dev, "Shared MDEV unsupported."); + mutex_unlock(&wq->wq_lock); + return -EINVAL; + } } rc = idxd_wq_alloc_resources(wq); @@ -237,7 +251,7 @@ static int idxd_config_bus_probe(struct device *dev) } } - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) { spin_unlock_irqrestore(&idxd->dev_lock, flags); mutex_unlock(&wq->wq_lock); @@ -250,7 +264,7 @@ static int idxd_config_bus_probe(struct device *dev) rc = idxd_wq_map_portal(wq); if (rc < 0) { dev_warn(dev, "wq portal mapping failed: %d\n", rc); - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) dev_warn(dev, "IDXD wq disable failed\n"); spin_unlock_irqrestore(&idxd->dev_lock, flags); @@ -311,7 +325,7 @@ static void disable_wq(struct idxd_wq *wq) idxd_wq_unmap_portal(wq); spin_lock_irqsave(&idxd->dev_lock, flags); - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); spin_unlock_irqrestore(&idxd->dev_lock, flags); idxd_wq_free_resources(wq); @@ -1106,6 +1120,100 @@ static ssize_t wq_threshold_store(struct device *dev, static struct device_attribute dev_attr_wq_threshold = __ATTR(threshold, 0644, wq_threshold_show, wq_threshold_store); +static ssize_t wq_uuid_store(struct device *dev, + struct device_attribute *attr, const char *buf, + size_t count) +{ + char *str; + int rc; + struct idxd_wq_uuid *entry, *n; + struct idxd_wq_uuid *wq_uuid; + struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev); + struct device *ddev = &wq->idxd->pdev->dev; + + if (wq->type != IDXD_WQT_MDEV) + return -EPERM; + + if (count < UUID_STRING_LEN || (count > UUID_STRING_LEN + 1)) + return -EINVAL; + + str = kstrndup(buf, count, GFP_KERNEL); + if (!str) + return -ENOMEM; + + wq_uuid = devm_kzalloc(ddev, sizeof(struct idxd_wq_uuid), GFP_KERNEL); + if (!wq_uuid) { + kfree(str); + return -ENOMEM; + } + + rc = guid_parse(str, &wq_uuid->uuid); + kfree(str); + if (rc) + return rc; + + mutex_lock(&wq->wq_lock); + /* If user writes 0, erase entire list. */ + if (guid_is_null(&wq_uuid->uuid)) { + list_for_each_entry_safe(entry, n, &wq->uuid_list, list) { + list_del(&entry->list); + devm_kfree(ddev, entry); + wq->uuids--; + } + + mutex_unlock(&wq->wq_lock); + return count; + } + + /* If uuid already exists, remove the old uuid. 
*/ + list_for_each_entry_safe(entry, n, &wq->uuid_list, list) { + if (guid_equal(&wq_uuid->uuid, &entry->uuid)) { + list_del(&entry->list); + devm_kfree(ddev, entry); + wq->uuids--; + mutex_unlock(&wq->wq_lock); + return count; + } + } + + /* + * At this point, we are only adding, and the wq must be on in order + * to do so. A disabled wq type is ambiguous. + */ + if (wq->state != IDXD_WQ_ENABLED) + return -EPERM; + /* + * If wq is shared or wq is dedicated and list empty, + * put uuid into list. + */ + if (!wq_dedicated(wq) || list_empty(&wq->uuid_list)) { + wq->uuids++; + list_add(&wq_uuid->list, &wq->uuid_list); + } else { + mutex_unlock(&wq->wq_lock); + return -EPERM; + } + + mutex_unlock(&wq->wq_lock); + return count; +} + +static ssize_t wq_uuid_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev); + struct idxd_wq_uuid *entry; + int out = 0; + + list_for_each_entry(entry, &wq->uuid_list, list) + out += sprintf(buf + out, "%pUl\n", &entry->uuid); + + return out; +} + +static struct device_attribute dev_attr_wq_uuid = + __ATTR(uuid, 0644, wq_uuid_show, wq_uuid_store); + static ssize_t wq_type_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1116,8 +1224,9 @@ static ssize_t wq_type_show(struct device *dev, return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_KERNEL]); case IDXD_WQT_USER: - return sprintf(buf, "%s\n", - idxd_wq_type_names[IDXD_WQT_USER]); + return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_USER]); + case IDXD_WQT_MDEV: + return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_MDEV]); case IDXD_WQT_NONE: default: return sprintf(buf, "%s\n", @@ -1127,6 +1236,20 @@ static ssize_t wq_type_show(struct device *dev, return -EINVAL; } +static void wq_clear_uuids(struct idxd_wq *wq) +{ + struct idxd_wq_uuid *entry, *n; + struct device *dev = &wq->idxd->pdev->dev; + + mutex_lock(&wq->wq_lock); + list_for_each_entry_safe(entry, n, &wq->uuid_list, list) { + list_del(&entry->list); + devm_kfree(dev, entry); + wq->uuids--; + } + mutex_unlock(&wq->wq_lock); +} + static ssize_t wq_type_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) @@ -1144,13 +1267,20 @@ static ssize_t wq_type_store(struct device *dev, wq->type = IDXD_WQT_KERNEL; else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_USER])) wq->type = IDXD_WQT_USER; + else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_MDEV])) + wq->type = IDXD_WQT_MDEV; else return -EINVAL; /* If we are changing queue type, clear the name */ - if (wq->type != old_type) + if (wq->type != old_type) { memset(wq->name, 0, WQ_NAME_SIZE + 1); + /* If changed out of MDEV type, clear uuids */ + if (wq->type != IDXD_WQT_MDEV) + wq_clear_uuids(wq); + } + return count; } @@ -1218,6 +1348,7 @@ static struct attribute *idxd_wq_attributes[] = { &dev_attr_wq_type.attr, &dev_attr_wq_name.attr, &dev_attr_wq_cdev_minor.attr, + &dev_attr_wq_uuid.attr, NULL, }; diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c new file mode 100644 index 000000000000..d2a15f1dae6a --- /dev/null +++ b/drivers/dma/idxd/vdev.c @@ -0,0 +1,570 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019 Intel Corporation. All rights rsvd. 
*/ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" +#include "../../vfio/pci/vfio_pci_private.h" +#include "mdev.h" +#include "vdev.h" + +static int idxd_get_mdev_pasid(struct mdev_device *mdev) +{ + struct iommu_domain *domain; + struct device *dev = mdev_dev(mdev); + + domain = mdev_get_iommu_domain(dev); + if (!domain) + return -EINVAL; + + return iommu_aux_get_pasid(domain, dev->parent); +} + +int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx) +{ + int rc = -1; + struct device *dev = &vidxd->idxd->pdev->dev; + + /* + * We need to check MSIX mask bit only for entry 0 because that is + * the only virtual interrupt. Other interrupts are physical + * interrupts, and they are setup such that we receive them only + * when guest wants to receive them. + */ + if (msix_idx == 0) { + u8 *msix_perm = &vidxd->bar0.msix_perm_table[0]; + + if (msix_perm[0] & 1) { + set_bit(0, (unsigned long *)&vidxd->bar0.msix_pba); + set_bit(1, (unsigned long *)msix_perm); + } + return 1; + } + + if (!vidxd->vdev.msix_trigger[msix_idx]) { + dev_warn(dev, "%s: intr evtfd not found %d\n", + __func__, msix_idx); + return -EINVAL; + } + + rc = eventfd_signal(vidxd->vdev.msix_trigger[msix_idx], 1); + if (rc != 1) + dev_err(dev, "eventfd signal failed (%d)\n", rc); + else + dev_dbg(dev, "vidxd interrupt triggered wq(%d) %d\n", + vidxd->wq->id, msix_idx); + + return rc; +} + +static void vidxd_mmio_init_grpcfg(struct vdcm_idxd *vidxd, + struct grpcfg *grpcfg) +{ + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + int i; + + /* + * At this point, we are only exporting a single workqueue for + * each mdev. So we need to just fake it as first workqueue + * and also mark the available engines in this group. 
+ */ + + /* Set single workqueue and the first one */ + grpcfg->wqs[0] = 0x1; + grpcfg->engines = 0; + for (i = 0; i < group->num_engines; i++) + grpcfg->engines |= BIT(i); + grpcfg->flags.bits = group->grpcfg.flags.bits; +} + +void vidxd_mmio_init(struct vdcm_idxd *vidxd) +{ + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + struct idxd_device *idxd = vidxd->idxd; + struct idxd_wq *wq = vidxd->wq; + union wqcfg *wqcfg; + struct grpcfg *grpcfg; + union wq_cap_reg *wq_cap; + union offsets_reg *offsets; + + /* setup wqcfg */ + wqcfg = (union wqcfg *)&bar0->wq_ctrl_regs[0]; + grpcfg = (struct grpcfg *)&bar0->grp_ctrl_regs[0]; + + wqcfg->wq_size = wq->size; + wqcfg->wq_thresh = wq->threshold; + + if (wq_dedicated(wq)) + wqcfg->mode = 1; + + if (idxd->hw.gen_cap.block_on_fault && + test_bit(WQ_FLAG_BOF, &wq->flags)) + wqcfg->bof = 1; + + wqcfg->priority = wq->priority; + wqcfg->max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift; + wqcfg->max_batch_shift = idxd->hw.gen_cap.max_batch_shift; + /* make mode change read-only */ + wqcfg->mode_support = 0; + + /* setup grpcfg */ + vidxd_mmio_init_grpcfg(vidxd, grpcfg); + + /* setup wqcap */ + wq_cap = (union wq_cap_reg *)&bar0->cap_ctrl_regs[IDXD_WQCAP_OFFSET]; + memset(wq_cap, 0, sizeof(union wq_cap_reg)); + wq_cap->total_wq_size = wq->size; + wq_cap->num_wqs = 1; + if (wq_dedicated(wq)) + wq_cap->dedicated_mode = 1; + else + wq_cap->shared_mode = 1; + + offsets = (union offsets_reg *)&bar0->cap_ctrl_regs[IDXD_TABLE_OFFSET]; + offsets->grpcfg = VIDXD_GRPCFG_OFFSET / 0x100; + offsets->wqcfg = VIDXD_WQCFG_OFFSET / 0x100; + offsets->msix_perm = VIDXD_MSIX_PERM_OFFSET / 0x100; + + /* Clear MSI-X permissions table */ + memset(bar0->msix_perm_table, 0, 2 * 8); +} + +static void idxd_complete_command(struct vdcm_idxd *vidxd, + enum idxd_cmdsts_err val) +{ + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + u32 *cmd = (u32 *)&bar0->cap_ctrl_regs[IDXD_CMD_OFFSET]; + u32 *cmdsts = (u32 *)&bar0->cap_ctrl_regs[IDXD_CMDSTS_OFFSET]; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + *cmdsts = val; + dev_dbg(dev, "%s: cmd: %#x status: %#x\n", __func__, *cmd, val); + + if (*cmd & IDXD_CMD_INT_MASK) { + bar0->cap_ctrl_regs[IDXD_INTCAUSE_OFFSET] |= IDXD_INTC_CMD; + vidxd_send_interrupt(vidxd, 0); + } +} + +static void vidxd_enable(struct vdcm_idxd *vidxd) +{ + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + bool ats = (*(u16 *)&vidxd->cfg[VIDXD_ATS_OFFSET + 6]) & (1U << 15); + bool prs = (*(u16 *)&vidxd->cfg[VIDXD_PRS_OFFSET + 4]) & 1U; + bool pasid = (*(u16 *)&vidxd->cfg[VIDXD_PASID_OFFSET + 6]) & 1U; + u32 vdev_state = *(u32 *)&bar0->cap_ctrl_regs[IDXD_GENSTATS_OFFSET] & + IDXD_GENSTATS_MASK; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s\n", __func__); + + if (vdev_state == IDXD_DEVICE_STATE_ENABLED) + return idxd_complete_command(vidxd, + IDXD_CMDSTS_ERR_DEV_ENABLED); + + /* Check PCI configuration */ + if (!(vidxd->cfg[PCI_COMMAND] & PCI_COMMAND_MASTER)) + return idxd_complete_command(vidxd, + IDXD_CMDSTS_ERR_BUSMASTER_EN); + + if (pasid != prs || (pasid && !ats)) + return idxd_complete_command(vidxd, + IDXD_CMDSTS_ERR_BUSMASTER_EN); + + bar0->cap_ctrl_regs[IDXD_GENSTATS_OFFSET] = IDXD_DEVICE_STATE_ENABLED; + + return idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_disable(struct vdcm_idxd *vidxd) +{ + int rc; + struct idxd_wq *wq; + union wqcfg *wqcfg; + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + struct mdev_device *mdev = 
vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 vdev_state = *(u32 *)&bar0->cap_ctrl_regs[IDXD_GENSTATS_OFFSET] & + IDXD_GENSTATS_MASK; + + dev_dbg(dev, "%s\n", __func__); + + if (vdev_state == IDXD_DEVICE_STATE_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + + wqcfg = (union wqcfg *)&bar0->wq_ctrl_regs[0]; + wq = vidxd->wq; + + /* If it is a DWQ, need to disable the DWQ as well */ + rc = idxd_wq_drain(wq); + if (rc < 0) + dev_warn(dev, "vidxd drain wq %d failed: %d\n", + wq->id, rc); + + if (wq_dedicated(wq)) { + rc = idxd_wq_disable(wq, NULL); + if (rc < 0) + dev_warn(dev, "vidxd disable wq %d failed: %d\n", + wq->id, rc); + } + + wqcfg->wq_state = 0; + bar0->cap_ctrl_regs[IDXD_GENSTATS_OFFSET] = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_drain(struct vdcm_idxd *vidxd) +{ + int rc; + struct idxd_wq *wq; + union wqcfg *wqcfg; + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + u32 vdev_state = *(u32 *)&bar0->cap_ctrl_regs[IDXD_GENSTATS_OFFSET] & + IDXD_GENSTATS_MASK; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s\n", __func__); + + if (vdev_state == IDXD_DEVICE_STATE_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOT_EN); + return; + } + + wqcfg = (union wqcfg *)&bar0->wq_ctrl_regs[0]; + wq = vidxd->wq; + + rc = idxd_wq_drain(wq); + if (rc < 0) + dev_warn(dev, "wq %d drain failed: %d\n", wq->id, rc); + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_abort(struct vdcm_idxd *vidxd) +{ + int rc; + struct idxd_wq *wq; + union wqcfg *wqcfg; + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + u32 vdev_state = *(u32 *)&bar0->cap_ctrl_regs[IDXD_GENSTATS_OFFSET] & + IDXD_GENSTATS_MASK; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s\n", __func__); + + if (vdev_state == IDXD_DEVICE_STATE_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOT_EN); + return; + } + + wqcfg = (union wqcfg *)&bar0->wq_ctrl_regs[0]; + wq = vidxd->wq; + + rc = idxd_wq_abort(wq); + if (rc < 0) + dev_warn(dev, "wq %d drain failed: %d\n", wq->id, rc); + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_drain(struct vdcm_idxd *vidxd, int val) +{ + vidxd_drain(vidxd); +} + +static void vidxd_wq_abort(struct vdcm_idxd *vidxd, int val) +{ + vidxd_abort(vidxd); +} + +void vidxd_reset(struct vdcm_idxd *vidxd) +{ + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + int rc; + struct idxd_wq *wq; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + *(u32 *)&bar0->cap_ctrl_regs[IDXD_GENSTATS_OFFSET] = + IDXD_DEVICE_STATE_DRAIN; + + wq = vidxd->wq; + + rc = idxd_wq_drain(wq); + if (rc < 0) + dev_warn(dev, "wq %d drain failed: %d\n", wq->id, rc); + + /* If it is a DWQ, need to disable the DWQ as well */ + if (wq_dedicated(wq)) { + rc = idxd_wq_disable(wq, NULL); + if (rc < 0) + dev_warn(dev, "vidxd disable wq %d failed: %d\n", + wq->id, rc); + } + + vidxd_mmio_init(vidxd); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_alloc_int_handle(struct vdcm_idxd *vidxd, int vidx) +{ + bool ims = (vidx >> 16) & 1; + u32 cmdsts; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + vidx = vidx & 0xffff; + + dev_dbg(dev, "allocating int handle for %x\n", vidx); + + if (vidx != 1) { + idxd_complete_command(vidxd, 
IDXD_CMDSTS_ERR_INVAL_INT_IDX); + return; + } + + if (ims) { + dev_warn(dev, "IMS allocation is not implemented yet\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_NO_HANDLE); + } else { + vidx--; /* MSIX idx 0 is a slow path interrupt */ + cmdsts = vidxd->ims_index[vidx] << 8; + dev_dbg(dev, "int handle %d:%lld\n", vidx, + vidxd->ims_index[vidx]); + idxd_complete_command(vidxd, cmdsts); + } +} + +static void vidxd_wq_enable(struct vdcm_idxd *vidxd, int wq_id) +{ + struct idxd_wq *wq; + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + union wq_cap_reg *wqcap; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_device *idxd; + union wqcfg *vwqcfg, *wqcfg; + unsigned long flags; + int rc; + + dev_dbg(dev, "%s\n", __func__); + + if (wq_id >= 1) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX); + return; + } + + idxd = vidxd->idxd; + wq = vidxd->wq; + + dev_dbg(dev, "%s: wq %u:%u\n", __func__, wq_id, wq->id); + + vwqcfg = (union wqcfg *)&bar0->wq_ctrl_regs[wq_id]; + wqcap = (union wq_cap_reg *)&bar0->cap_ctrl_regs[IDXD_WQCAP_OFFSET]; + wqcfg = &wq->wqcfg; + + if (vidxd_state(vidxd) != IDXD_DEVICE_STATE_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOTEN); + return; + } + + if (vwqcfg->wq_state != IDXD_WQ_DEV_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_ENABLED); + return; + } + + if (vwqcfg->wq_size == 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_SIZE); + return; + } + + if ((!wq_dedicated(wq) && wqcap->shared_mode == 0) || + (wq_dedicated(wq) && wqcap->dedicated_mode == 0)) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_MODE); + return; + } + + if (wq_dedicated(wq)) { + int wq_pasid; + u32 status; + int priv; + + wq_pasid = idxd_get_mdev_pasid(mdev); + priv = 1; + + if (wq_pasid >= 0) { + wqcfg->bits[2] &= ~0x3fffff00; + wqcfg->priv = priv; + wqcfg->pasid_en = 1; + wqcfg->pasid = wq_pasid; + dev_dbg(dev, "program pasid %d in wq %d\n", + wq_pasid, wq->id); + spin_lock_irqsave(&idxd->dev_lock, flags); + idxd_wq_update_pasid(wq, wq_pasid); + idxd_wq_update_priv(wq, priv); + rc = idxd_wq_enable(wq, &status); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + if (rc < 0) { + dev_err(dev, "vidxd enable wq %d failed\n", wq->id); + idxd_complete_command(vidxd, status); + return; + } + } else { + dev_err(dev, + "idxd pasid setup failed wq %d wq_pasid %d\n", + wq->id, wq_pasid); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_PASID_EN); + return; + } + } + + vwqcfg->wq_state = IDXD_WQ_DEV_ENABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + union wqcfg *wqcfg; + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + int rc; + + wq = vidxd->wq; + + if (!(wq_id_mask & BIT(0))) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX); + return; + } + + dev_dbg(dev, "vidxd disable wq %u:%u\n", 0, wq->id); + + wqcfg = (union wqcfg *)&bar0->wq_ctrl_regs[0]; + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOT_EN); + return; + } + + if (wq_dedicated(wq)) { + u32 status; + + rc = idxd_wq_disable(wq, &status); + if (rc < 0) { + dev_err(dev, "vidxd disable wq %d failed\n", wq->id); + idxd_complete_command(vidxd, status); + return; + } + } + + wqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +void 
vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) +{ + union idxd_command_reg *reg = + (union idxd_command_reg *)&vidxd->bar0.cap_ctrl_regs[IDXD_CMD_OFFSET]; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + reg->bits = val; + + dev_dbg(dev, "%s: cmd code: %u reg: %x\n", __func__, reg->cmd, + reg->bits); + + switch (reg->cmd) { + case IDXD_CMD_ENABLE_DEVICE: + vidxd_enable(vidxd); + break; + case IDXD_CMD_DISABLE_DEVICE: + vidxd_disable(vidxd); + break; + case IDXD_CMD_DRAIN_ALL: + vidxd_drain(vidxd); + break; + case IDXD_CMD_ABORT_ALL: + vidxd_abort(vidxd); + break; + case IDXD_CMD_RESET_DEVICE: + vidxd_reset(vidxd); + break; + case IDXD_CMD_ENABLE_WQ: + vidxd_wq_enable(vidxd, reg->operand); + break; + case IDXD_CMD_DISABLE_WQ: + vidxd_wq_disable(vidxd, reg->operand); + break; + case IDXD_CMD_DRAIN_WQ: + vidxd_wq_drain(vidxd, reg->operand); + break; + case IDXD_CMD_ABORT_WQ: + vidxd_wq_abort(vidxd, reg->operand); + break; + case IDXD_CMD_REQUEST_INT_HANDLE: + vidxd_alloc_int_handle(vidxd, reg->operand); + break; + default: + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD); + break; + } +} + +int vidxd_setup_ims_entry(struct vdcm_idxd *vidxd, int ims_idx, u32 val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + int pasid; + unsigned int ims_offset; + + /* + * Current implementation limits to 1 WQ for the vdev and therefore + * also only 1 IMS interrupt for that vdev. + */ + if (ims_idx >= VIDXD_MAX_WQS) { + dev_warn(dev, "ims_idx greater than vidxd allowed: %d\n", + ims_idx); + return -EINVAL; + } + + /* Setup the PASID filtering */ + pasid = idxd_get_mdev_pasid(mdev); + + if (pasid >= 0) { + val = (1 << 3) | (pasid << 12) | (val & 7); + ims_offset = vidxd->idxd->ims_offset + + vidxd->ims_index[ims_idx] * 0x10; + iowrite32(val, vidxd->idxd->reg_base + ims_offset + 12); + } else { + dev_warn(dev, "pasid setup failed for ims entry %lld\n", + vidxd->ims_index[ims_idx]); + } + + return 0; +} + +int vidxd_free_ims_entry(struct vdcm_idxd *vidxd, int msix_idx) +{ + return 0; +} diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h new file mode 100644 index 000000000000..3dfff6d0f641 --- /dev/null +++ b/drivers/dma/idxd/vdev.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2019 Intel Corporation. All rights rsvd. 
*/ + +#ifndef _IDXD_VDEV_H_ +#define _IDXD_VDEV_H_ + +static inline u64 get_reg_val(void *buf, int size) +{ + u64 val = 0; + + switch (size) { + case 8: + val = *(uint64_t *)buf; + break; + case 4: + val = *(uint32_t *)buf; + break; + case 2: + val = *(uint16_t *)buf; + break; + case 1: + val = *(uint8_t *)buf; + break; + } + + return val; +} + +static inline u8 vidxd_state(struct vdcm_idxd *vidxd) +{ + return vidxd->bar0.cap_ctrl_regs[IDXD_GENSTATS_OFFSET] + & IDXD_GENSTATS_MASK; +} + +void vidxd_mmio_init(struct vdcm_idxd *vidxd); +int vidxd_free_ims_entry(struct vdcm_idxd *vidxd, int msix_idx); +int vidxd_setup_ims_entry(struct vdcm_idxd *vidxd, int ims_idx, u32 val); +int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx); +void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val); +void vidxd_reset(struct vdcm_idxd *vidxd); + +#endif From patchwork Tue Apr 21 23:35:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274564 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496KgG63d5z9sSb for ; Wed, 22 Apr 2020 09:35:22 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726407AbgDUXfS (ORCPT ); Tue, 21 Apr 2020 19:35:18 -0400 Received: from mga04.intel.com ([192.55.52.120]:39205 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726487AbgDUXfR (ORCPT ); Tue, 21 Apr 2020 19:35:17 -0400 IronPort-SDR: oWe5K7SBdwNh0+W1YYGoSMcsxrfr/ZBVR3DM96VBH47FlffjuDyP2EF5S1wTFPlD4bVJadP6ZW +0jgvWAIsZYg== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:35:17 -0700 IronPort-SDR: L7TUKFuOs0nA+TAwy+KcP+IKm9U25fgE1ivpraZI9cf7sTYIE5bXfqOUI9KapnOd2y84/lpQB4 tirC60FDpeFw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="258876602" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga006.jf.intel.com with ESMTP; 21 Apr 2020 16:35:15 -0700 Subject: [PATCH RFC 14/15] dmaengine: idxd: add error notification from host driver to mediated device From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:35:15 -0700 Message-ID: <158751211522.36773.1692028393873153808.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: 
<158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org When a device error occurs, the mediated device need to be notified in order to notify the guest of device error. Add support to notify the specific mdev when an error is wq specific and broadcast errors to all mdev when it's a generic device error. Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 2 ++ drivers/dma/idxd/irq.c | 4 ++++ drivers/dma/idxd/vdev.c | 33 +++++++++++++++++++++++++++++++++ drivers/dma/idxd/vdev.h | 1 + 4 files changed, 40 insertions(+) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 92a9718daa15..651196514ad5 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -362,5 +362,7 @@ void idxd_wq_del_cdev(struct idxd_wq *wq); /* mdev */ int idxd_mdev_host_init(struct idxd_device *idxd); void idxd_mdev_host_release(struct idxd_device *idxd); +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq); +void idxd_vidxd_send_errors(struct idxd_device *idxd); #endif diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index bc634dc4e485..256ef7d8a5c9 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -167,6 +167,8 @@ irqreturn_t idxd_misc_thread(int vec, void *data) if (wq->type == IDXD_WQT_USER) wake_up_interruptible(&wq->idxd_cdev.err_queue); + else if (wq->type == IDXD_WQT_MDEV) + idxd_wq_vidxd_send_errors(wq); } else { int i; @@ -175,6 +177,8 @@ irqreturn_t idxd_misc_thread(int vec, void *data) if (wq->type == IDXD_WQT_USER) wake_up_interruptible(&wq->idxd_cdev.err_queue); + else if (wq->type == IDXD_WQT_MDEV) + idxd_vidxd_send_errors(idxd); } } diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c index d2a15f1dae6a..83985f0a336e 100644 --- a/drivers/dma/idxd/vdev.c +++ b/drivers/dma/idxd/vdev.c @@ -568,3 +568,36 @@ int vidxd_free_ims_entry(struct vdcm_idxd *vidxd, int msix_idx) { return 0; } + +static void vidxd_send_errors(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct vdcm_idxd_pci_bar0 *bar0 = &vidxd->bar0; + u64 *swerr = (u64 *)&bar0->cap_ctrl_regs[IDXD_SWERR_OFFSET]; + int i; + + for (i = 0; i < 4; i++) { + *swerr = idxd->sw_err.bits[i]; + swerr++; + } + vidxd_send_interrupt(vidxd, 0); +} + +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq) +{ + struct vdcm_idxd *vidxd; + + list_for_each_entry(vidxd, &wq->vdcm_list, list) + vidxd_send_errors(vidxd); +} + +void idxd_vidxd_send_errors(struct idxd_device *idxd) +{ + int i; + + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq = &idxd->wqs[i]; + + idxd_wq_vidxd_send_errors(wq); + } +} diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h index 3dfff6d0f641..14c6631e670c 100644 --- a/drivers/dma/idxd/vdev.h +++ b/drivers/dma/idxd/vdev.h @@ -38,5 +38,6 @@ int vidxd_setup_ims_entry(struct vdcm_idxd *vidxd, int ims_idx, u32 val); int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx); void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val); void vidxd_reset(struct vdcm_idxd *vidxd); +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq); #endif From patchwork Tue Apr 21 23:35:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 1274565 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: 
ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 496KgQ03fWz9sT3 for ; Wed, 22 Apr 2020 09:35:29 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726545AbgDUXfY (ORCPT ); Tue, 21 Apr 2020 19:35:24 -0400 Received: from mga01.intel.com ([192.55.52.88]:36954 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726296AbgDUXfX (ORCPT ); Tue, 21 Apr 2020 19:35:23 -0400 IronPort-SDR: CBv6vkLWbU8jfJ6IfHCADHQIASqxLNemeyhOlbla+m9YIM1MHlCL2GHHucxV3WvJo1a/oBjjeY MCkvsDvGrhhg== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Apr 2020 16:35:23 -0700 IronPort-SDR: TsB3fO4zi2456JXzd/fClc/FbQU2+V4BCcehrJuXYFj1PwTrfCLGA17vHSN4OBM/Wg3XKgLR2W YgqF215PAvbw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,411,1580803200"; d="scan'208";a="255449736" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga003.jf.intel.com with ESMTP; 21 Apr 2020 16:35:22 -0700 Subject: [PATCH RFC 15/15] dmaengine: idxd: add ABI documentation for mediated device support From: Dave Jiang To: vkoul@kernel.org, megha.dey@linux.intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Apr 2020 16:35:21 -0700 Message-ID: <158751212189.36773.8197911986164174637.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> References: <158751095889.36773.6009825070990637468.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Jing Lin Add the sysfs attribute bits in ABI/stable for mediated deivce and guest support. Signed-off-by: Jing Lin Signed-off-by: Dave Jiang --- Documentation/ABI/stable/sysfs-driver-dma-idxd | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd index c1adddde23c2..b04cbc5a1827 100644 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -76,6 +76,12 @@ Date: Jan 30, 2020 KernelVersion: 5.7.0 Contact: dmaengine@vger.kernel.org Description: To indicate if PASID (process address space identifier) is + +What: sys/bus/dsa/devices/dsa/ims_size +Date: Apr 13, 2020 +KernelVersion: 5.8.0 +Contact: dmaengine@vger.kernel.org +Description: Number of entries in the interrupt message storage table. enabled or not for this device. 
What: sys/bus/dsa/devices/dsa/state @@ -141,8 +147,16 @@ Date: Oct 25, 2019 KernelVersion: 5.6.0 Contact: dmaengine@vger.kernel.org Description: The type of this work queue, it can be "kernel" type for work - queue usages in the kernel space or "user" type for work queue - usages by applications in user space. + queue usages in the kernel space, "user" type for work queue + usages by applications in user space, or "mdev" type for + VFIO mediated devices. + +What: sys/bus/dsa/devices/wq./uuid +Date: Apr 13, 2020 +KernelVersion: 5.8.0 +Contact: dmaengine@vger.kernel.org +Description: The uuid attached to this work queue when the mediated device is + created. What: sys/bus/dsa/devices/wq./cdev_minor Date: Oct 25, 2019