From patchwork Mon Apr 23 23:30:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903238 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN9Q5VdZz9rxp for ; Tue, 24 Apr 2018 09:35:06 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932747AbeDWXcR (ORCPT ); Mon, 23 Apr 2018 19:32:17 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41932 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932659AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J0-Oy; Mon, 23 Apr 2018 17:31:00 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv2-0005bL-SY; Mon, 23 Apr 2018 17:30:48 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:33 -0600 Message-Id: <20180423233046.21476-2-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.7 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 01/14] PCI/P2PDMA: Support peer-to-peer memory X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Some PCI devices may have memory mapped in a BAR space that's intended for use in peer-to-peer transactions. In order to enable such transactions the memory must be registered with ZONE_DEVICE pages so it can be used by DMA interfaces in existing drivers. Add an interface for other subsystems to find and allocate chunks of P2P memory as necessary to facilitate transfers between two PCI peers: int pci_p2pdma_add_client(); struct pci_dev *pci_p2pmem_find(); void *pci_alloc_p2pmem(); The new interface requires a driver to collect a list of client devices involved in the transaction with the pci_p2pmem_add_client*() functions then call pci_p2pmem_find() to obtain any suitable P2P memory. Once this is done the list is bound to the memory and the calling driver is free to add and remove clients as necessary (adding incompatible clients will fail). With a suitable p2pmem device, memory can then be allocated with pci_alloc_p2pmem() for use in DMA transactions. Depending on hardware, using peer-to-peer memory may reduce the bandwidth of the transfer but can significantly reduce pressure on system memory. This may be desirable in many cases: for example a system could be designed with a small CPU connected to a PCI switch by a small number of lanes which would maximize the number of lanes available to connect to NVMe devices. The code is designed to only utilize the p2pmem device if all the devices involved in a transfer are behind the same root port (typically through a network of PCIe switches). This is because we have no way of knowing whether peer-to-peer routing between PCIe Root Ports is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P transfers that go through the RC is limited to only reducing DRAM usage and, in some cases, coding convenience. The PCI-SIG may be exploring adding a new capability bit to advertise whether this is possible for future hardware. This commit includes significant rework and feedback from Christoph Hellwig. Signed-off-by: Christoph Hellwig Signed-off-by: Logan Gunthorpe --- drivers/pci/Kconfig | 17 ++ drivers/pci/Makefile | 1 + drivers/pci/p2pdma.c | 694 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/memremap.h | 18 ++ include/linux/pci-p2pdma.h | 100 +++++++ include/linux/pci.h | 4 + 6 files changed, 834 insertions(+) create mode 100644 drivers/pci/p2pdma.c create mode 100644 include/linux/pci-p2pdma.h diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index 34b56a8f8480..b2396c22b53e 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -124,6 +124,23 @@ config PCI_PASID If unsure, say N. +config PCI_P2PDMA + bool "PCI peer-to-peer transfer support" + depends on PCI && ZONE_DEVICE && EXPERT + select GENERIC_ALLOCATOR + help + Enableѕ drivers to do PCI peer-to-peer transactions to and from + BARs that are exposed in other devices that are the part of + the hierarchy where peer-to-peer DMA is guaranteed by the PCI + specification to work (ie. anything below a single PCI bridge). + + Many PCIe root complexes do not support P2P transactions and + it's hard to tell which support it at all, so at this time, DMA + transations must be between devices behind the same root port. + (Typically behind a network of PCIe switches). + + If unsure, say N. + config PCI_LABEL def_bool y if (DMI || ACPI) depends on PCI diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile index 952addc7bacf..050c1e19a1de 100644 --- a/drivers/pci/Makefile +++ b/drivers/pci/Makefile @@ -25,6 +25,7 @@ obj-$(CONFIG_X86_INTEL_MID) += pci-mid.o obj-$(CONFIG_PCI_SYSCALL) += syscall.o obj-$(CONFIG_PCI_STUB) += pci-stub.o obj-$(CONFIG_PCI_ECAM) += ecam.o +obj-$(CONFIG_PCI_P2PDMA) += p2pdma.o obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o obj-y += host/ diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c new file mode 100644 index 000000000000..e524a12eca1f --- /dev/null +++ b/drivers/pci/p2pdma.c @@ -0,0 +1,694 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * PCI Peer 2 Peer DMA support. + * + * Copyright (c) 2016-2018, Logan Gunthorpe + * Copyright (c) 2016-2017, Microsemi Corporation + * Copyright (c) 2017, Christoph Hellwig + * Copyright (c) 2018, Eideticom Inc. + * + */ + +#include +#include +#include +#include +#include +#include +#include + +struct pci_p2pdma { + struct percpu_ref devmap_ref; + struct completion devmap_ref_done; + struct gen_pool *pool; + bool p2pmem_published; +}; + +static void pci_p2pdma_percpu_release(struct percpu_ref *ref) +{ + struct pci_p2pdma *p2p = + container_of(ref, struct pci_p2pdma, devmap_ref); + + complete_all(&p2p->devmap_ref_done); +} + +static void pci_p2pdma_percpu_kill(void *data) +{ + struct percpu_ref *ref = data; + + if (percpu_ref_is_dying(ref)) + return; + + percpu_ref_kill(ref); +} + +static void pci_p2pdma_release(void *data) +{ + struct pci_dev *pdev = data; + + if (!pdev->p2pdma) + return; + + wait_for_completion(&pdev->p2pdma->devmap_ref_done); + percpu_ref_exit(&pdev->p2pdma->devmap_ref); + + gen_pool_destroy(pdev->p2pdma->pool); + pdev->p2pdma = NULL; +} + +static int pci_p2pdma_setup(struct pci_dev *pdev) +{ + int error = -ENOMEM; + struct pci_p2pdma *p2p; + + p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL); + if (!p2p) + return -ENOMEM; + + p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev)); + if (!p2p->pool) + goto out; + + init_completion(&p2p->devmap_ref_done); + error = percpu_ref_init(&p2p->devmap_ref, + pci_p2pdma_percpu_release, 0, GFP_KERNEL); + if (error) + goto out_pool_destroy; + + percpu_ref_switch_to_atomic_sync(&p2p->devmap_ref); + + error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev); + if (error) + goto out_pool_destroy; + + pdev->p2pdma = p2p; + + return 0; + +out_pool_destroy: + gen_pool_destroy(p2p->pool); +out: + devm_kfree(&pdev->dev, p2p); + return error; +} + +/** + * pci_p2pdma_add_resource - add memory for use as p2p memory + * @pdev: the device to add the memory to + * @bar: PCI BAR to add + * @size: size of the memory to add, may be zero to use the whole BAR + * @offset: offset into the PCI BAR + * + * The memory will be given ZONE_DEVICE struct pages so that it may + * be used with any DMA request. + */ +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, + u64 offset) +{ + struct dev_pagemap *pgmap; + void *addr; + int error; + + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) + return -EINVAL; + + if (offset >= pci_resource_len(pdev, bar)) + return -EINVAL; + + if (!size) + size = pci_resource_len(pdev, bar) - offset; + + if (size + offset > pci_resource_len(pdev, bar)) + return -EINVAL; + + if (!pdev->p2pdma) { + error = pci_p2pdma_setup(pdev); + if (error) + return error; + } + + pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL); + if (!pgmap) + return -ENOMEM; + + pgmap->res.start = pci_resource_start(pdev, bar) + offset; + pgmap->res.end = pgmap->res.start + size - 1; + pgmap->res.flags = pci_resource_flags(pdev, bar); + pgmap->ref = &pdev->p2pdma->devmap_ref; + pgmap->type = MEMORY_DEVICE_PCI_P2PDMA; + + addr = devm_memremap_pages(&pdev->dev, pgmap); + if (IS_ERR(addr)) { + error = PTR_ERR(addr); + goto pgmap_free; + } + + error = gen_pool_add_virt(pdev->p2pdma->pool, (unsigned long)addr, + pci_bus_address(pdev, bar) + offset, + resource_size(&pgmap->res), dev_to_node(&pdev->dev)); + if (error) + goto pgmap_free; + + error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill, + &pdev->p2pdma->devmap_ref); + if (error) + goto pgmap_free; + + pci_info(pdev, "added peer-to-peer DMA memory %pR\n", + &pgmap->res); + + return 0; + +pgmap_free: + devres_free(pgmap); + return error; +} +EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource); + +static struct pci_dev *find_parent_pci_dev(struct device *dev) +{ + struct device *parent; + + dev = get_device(dev); + + while (dev) { + if (dev_is_pci(dev)) + return to_pci_dev(dev); + + parent = get_device(dev->parent); + put_device(dev); + dev = parent; + } + + return NULL; +} + +/* + * If a device is behind a switch, we try to find the upstream bridge + * port of the switch. This requires two calls to pci_upstream_bridge(): + * one for the upstream port on the switch, one on the upstream port + * for the next level in the hierarchy. Because of this, devices connected + * to the root port will be rejected. + */ +static struct pci_dev *get_upstream_bridge_port(struct pci_dev *pdev) +{ + struct pci_dev *up1, *up2; + + if (!pdev) + return NULL; + + up1 = pci_dev_get(pci_upstream_bridge(pdev)); + if (!up1) + return NULL; + + up2 = pci_dev_get(pci_upstream_bridge(up1)); + pci_dev_put(up1); + + return up2; +} + +/* + * Find the distance through the nearest common upstream bridge between + * two PCI devices. + * + * If the two devices are the same device then 0 will be returned. + * + * If there are two virtual functions of the same device behind the same + * bridge port then 2 will be returned (one step down to the bridge then + * one step back to the same device). + * + * In the case where two devices are connected to the same PCIe switch, the + * value 4 will be returned. This corresponds to the following PCI tree: + * + * -+ Root Port + * \+ Switch Upstream Port + * +-+ Switch Downstream Port + * + \- Device A + * \-+ Switch Downstream Port + * \- Device B + * + * The distance is 4 because we traverse from Device A through the downstream + * port of the switch, to the common upstream port, back up to the second + * downstream port and then to Device B. + * + * Any two devices that don't have a common upstream bridge will return -1. + * In this way devices on seperate root ports will be rejected, which + * is what we want for peer-to-peer seeing there's no way to determine + * if the root complex supports forwarding between root ports. + * + * In the case where two devices are connected to different PCIe switches + * this function will still return a positive distance as long as both + * switches evenutally have a common upstream bridge. Note this covers + * the case of using multiple PCIe switches to achieve a desired level of + * fan-out from a root port. The exact distance will be a function of the + * number of switches between Device A and Device B. + * + */ +static int upstream_bridge_distance(struct pci_dev *a, + struct pci_dev *b) +{ + int dist_a = 0; + int dist_b = 0; + struct pci_dev *aa, *bb = NULL, *tmp; + + aa = pci_dev_get(a); + + while (aa) { + dist_b = 0; + + pci_dev_put(bb); + bb = pci_dev_get(b); + + while (bb) { + if (aa == bb) + goto put_and_return; + + tmp = pci_dev_get(pci_upstream_bridge(bb)); + pci_dev_put(bb); + bb = tmp; + + dist_b++; + } + + tmp = pci_dev_get(pci_upstream_bridge(aa)); + pci_dev_put(aa); + aa = tmp; + + dist_a++; + } + + dist_a = -1; + dist_b = 0; + +put_and_return: + pci_dev_put(bb); + pci_dev_put(aa); + + return dist_a + dist_b; +} + +struct pci_p2pdma_client { + struct list_head list; + struct pci_dev *client; + struct pci_dev *provider; +}; + +/** + * pci_p2pdma_add_client - allocate a new element in a client device list + * @head: list head of p2pdma clients + * @dev: device to add to the list + * + * This adds @dev to a list of clients used by a p2pdma device. + * This list should be passed to pci_p2pmem_find(). Once pci_p2pmem_find() has + * been called successfully, the list will be bound to a specific p2pdma + * device and new clients can only be added to the list if they are + * supported by that p2pdma device. + * + * The caller is expected to have a lock which protects @head as necessary + * so that none of the pci_p2p functions can be called concurrently + * on that list. + * + * Returns 0 if the client was successfully added. + */ +int pci_p2pdma_add_client(struct list_head *head, struct device *dev) +{ + struct pci_p2pdma_client *item, *new_item; + struct pci_dev *provider = NULL; + struct pci_dev *client; + int ret; + + if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) { + dev_warn(dev, "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n"); + return -ENODEV; + } + + + client = find_parent_pci_dev(dev); + if (!client) { + dev_warn(dev, "cannot be used for peer-to-peer DMA as it is not a PCI device\n"); + return -ENODEV; + } + + item = list_first_entry_or_null(head, struct pci_p2pdma_client, list); + if (item && item->provider) { + provider = item->provider; + + if (upstream_bridge_distance(provider, client) < 0) { + dev_warn(dev, "cannot be used for peer-to-peer DMA as the client and provider do not share an upstream bridge\n"); + + ret = -EXDEV; + goto put_client; + } + } + + new_item = kzalloc(sizeof(*new_item), GFP_KERNEL); + if (!new_item) { + ret = -ENOMEM; + goto put_client; + } + + new_item->client = client; + new_item->provider = pci_dev_get(provider); + + list_add_tail(&new_item->list, head); + + return 0; + +put_client: + pci_dev_put(client); + return ret; +} +EXPORT_SYMBOL_GPL(pci_p2pdma_add_client); + +static void pci_p2pdma_client_free(struct pci_p2pdma_client *item) +{ + list_del(&item->list); + pci_dev_put(item->client); + pci_dev_put(item->provider); + kfree(item); +} + +/** + * pci_p2pdma_remove_client - remove and free a p2pdma client + * @head: list head of p2pdma clients + * @dev: device to remove from the list + * + * This removes @dev from a list of clients used by a p2pdma device. + * The caller is expected to have a lock which protects @head as necessary + * so that none of the pci_p2p functions can be called concurrently + * on that list. + */ +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev) +{ + struct pci_p2pdma_client *pos, *tmp; + struct pci_dev *pdev; + + pdev = find_parent_pci_dev(dev); + if (!pdev) + return; + + list_for_each_entry_safe(pos, tmp, head, list) { + if (pos->client != pdev) + continue; + + pci_p2pdma_client_free(pos); + } + + pci_dev_put(pdev); +} +EXPORT_SYMBOL_GPL(pci_p2pdma_remove_client); + +/** + * pci_p2pdma_client_list_free - free an entire list of p2pdma clients + * @head: list head of p2pdma clients + * + * This removes all devices in a list of clients used by a p2pdma device. + * The caller is expected to have a lock which protects @head as necessary + * so that none of the pci_p2pdma functions can be called concurrently + * on that list. + */ +void pci_p2pdma_client_list_free(struct list_head *head) +{ + struct pci_p2pdma_client *pos, *tmp; + + list_for_each_entry_safe(pos, tmp, head, list) + pci_p2pdma_client_free(pos); +} +EXPORT_SYMBOL_GPL(pci_p2pdma_client_list_free); + +/** + * pci_p2pdma_distance - Determive the cumulative distance between + * a p2pdma provider and the clients in use. + * @provider: p2pdma provider to check against the client list + * @clients: list of devices to check (NULL-terminated) + * + * Returns -1 if any of the clients are not compatible (behind the same + * root port as the provider), otherwise returns a positive number where + * the lower number is the preferrable choice. (If there's one client + * that's the same as the provider it will return 0, which is best choice). + * + * For now, "compatible" means the provider and the clients are all behind + * the same PCI root port. This cuts out cases that may work but is safest + * for the user. Future work can expand this to white-list root complexes that + * can safely forward between each ports. + */ +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients) +{ + struct pci_p2pdma_client *pos; + int ret; + int distance = 0; + + if (list_empty(clients)) + return -1; + + list_for_each_entry(pos, clients, list) { + ret = upstream_bridge_distance(provider, pos->client); + if (ret < 0) + goto no_match; + + distance += ret; + } + + ret = distance; + +no_match: + return ret; +} +EXPORT_SYMBOL_GPL(pci_p2pdma_distance); + +/** + * pci_p2pdma_assign_provider - Check compatibily (as per pci_p2pdma_distance) + * and assign a provider to a list of clients + * @provider: p2pdma provider to assign to the client list + * @clients: list of devices to check (NULL-terminated) + * + * Returns false if any of the clients are not compatible, true if the + * provider was successfully assigned to the clients. + */ +bool pci_p2pdma_assign_provider(struct pci_dev *provider, + struct list_head *clients) +{ + struct pci_p2pdma_client *pos; + + if (pci_p2pdma_distance(provider, clients) < 0) + return false; + + list_for_each_entry(pos, clients, list) + pos->provider = provider; + + return true; +} +EXPORT_SYMBOL_GPL(pci_p2pdma_assign_provider); + +/** + * pci_has_p2pmem - check if a given PCI device has published any p2pmem + * @pdev: PCI device to check + */ +bool pci_has_p2pmem(struct pci_dev *pdev) +{ + return pdev->p2pdma && pdev->p2pdma->p2pmem_published; +} +EXPORT_SYMBOL_GPL(pci_has_p2pmem); + +/** + * pci_p2pmem_find - find a peer-to-peer DMA memory device compatible with + * the specified list of clients and shortest distance (as determined + * by pci_p2pmem_dma()) + * @clients: list of devices to check (NULL-terminated) + * + * If multiple devices are behind the same switch, the one "closest" to the + * client devices in use will be chosen first. (So if one of the providers are + * the same as one of the clients, that provider will be used ahead of any + * other providers that are unrelated). If multiple providers are an equal + * distance away, one will be chosen at random. + * + * Returns a pointer to the PCI device with a reference taken (use pci_dev_put + * to return the reference) or NULL if no compatible device is found. The + * found provider will also be assigned to the client list. + */ +struct pci_dev *pci_p2pmem_find(struct list_head *clients) +{ + struct pci_dev *pdev = NULL; + struct pci_p2pdma_client *pos; + int distance; + int closest_distance = INT_MAX; + struct pci_dev **closest_pdevs; + int dev_cnt = 0; + const int max_devs = PAGE_SIZE / sizeof(*closest_pdevs); + int i; + + closest_pdevs = kmalloc(PAGE_SIZE, GFP_KERNEL); + + while ((pdev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, pdev))) { + if (!pci_has_p2pmem(pdev)) + continue; + + distance = pci_p2pdma_distance(pdev, clients); + if (distance < 0 || distance > closest_distance) + continue; + + if (distance == closest_distance && dev_cnt >= max_devs) + continue; + + if (distance < closest_distance) { + for (i = 0; i < dev_cnt; i++) + pci_dev_put(closest_pdevs[i]); + + dev_cnt = 0; + closest_distance = distance; + } + + closest_pdevs[dev_cnt++] = pci_dev_get(pdev); + } + + if (dev_cnt) + pdev = pci_dev_get(closest_pdevs[prandom_u32_max(dev_cnt)]); + + for (i = 0; i < dev_cnt; i++) + pci_dev_put(closest_pdevs[i]); + + if (pdev) + list_for_each_entry(pos, clients, list) + pos->provider = pdev; + + kfree(closest_pdevs); + return pdev; +} +EXPORT_SYMBOL_GPL(pci_p2pmem_find); + +/** + * pci_alloc_p2p_mem - allocate peer-to-peer DMA memory + * @pdev: the device to allocate memory from + * @size: number of bytes to allocate + * + * Returns the allocated memory or NULL on error. + */ +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size) +{ + void *ret; + + if (unlikely(!pdev->p2pdma)) + return NULL; + + if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref))) + return NULL; + + ret = (void *)gen_pool_alloc(pdev->p2pdma->pool, size); + + if (unlikely(!ret)) + percpu_ref_put(&pdev->p2pdma->devmap_ref); + + return ret; +} +EXPORT_SYMBOL_GPL(pci_alloc_p2pmem); + +/** + * pci_free_p2pmem - allocate peer-to-peer DMA memory + * @pdev: the device the memory was allocated from + * @addr: address of the memory that was allocated + * @size: number of bytes that was allocated + */ +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size) +{ + gen_pool_free(pdev->p2pdma->pool, (uintptr_t)addr, size); + percpu_ref_put(&pdev->p2pdma->devmap_ref); +} +EXPORT_SYMBOL_GPL(pci_free_p2pmem); + +/** + * pci_virt_to_bus - return the PCI bus address for a given virtual + * address obtained with pci_alloc_p2pmem() + * @pdev: the device the memory was allocated from + * @addr: address of the memory that was allocated + */ +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr) +{ + if (!addr) + return 0; + if (!pdev->p2pdma) + return 0; + + /* + * Note: when we added the memory to the pool we used the PCI + * bus address as the physical address. So gen_pool_virt_to_phys() + * actually returns the bus address despite the misleading name. + */ + return gen_pool_virt_to_phys(pdev->p2pdma->pool, (unsigned long)addr); +} +EXPORT_SYMBOL_GPL(pci_p2pmem_virt_to_bus); + +/** + * pci_p2pmem_alloc_sgl - allocate peer-to-peer DMA memory in a scatterlist + * @pdev: the device to allocate memory from + * @sgl: the allocated scatterlist + * @nents: the number of SG entries in the list + * @length: number of bytes to allocate + * + * Returns 0 on success + */ +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev, + unsigned int *nents, u32 length) +{ + struct scatterlist *sg; + void *addr; + + sg = kzalloc(sizeof(*sg), GFP_KERNEL); + if (!sg) + return NULL; + + sg_init_table(sg, 1); + + addr = pci_alloc_p2pmem(pdev, length); + if (!addr) + goto out_free_sg; + + sg_set_buf(sg, addr, length); + *nents = 1; + return sg; + +out_free_sg: + kfree(sg); + return NULL; +} +EXPORT_SYMBOL_GPL(pci_p2pmem_alloc_sgl); + +/** + * pci_p2pmem_free_sgl - free a scatterlist allocated by pci_p2pmem_alloc_sgl() + * @pdev: the device to allocate memory from + * @sgl: the allocated scatterlist + * @nents: the number of SG entries in the list + */ +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl) +{ + struct scatterlist *sg; + int count; + + for_each_sg(sgl, sg, INT_MAX, count) { + if (!sg) + break; + + pci_free_p2pmem(pdev, sg_virt(sg), sg->length); + } + kfree(sgl); +} +EXPORT_SYMBOL_GPL(pci_p2pmem_free_sgl); + +/** + * pci_p2pmem_publish - publish the peer-to-peer DMA memory for use by + * other devices with pci_p2pmem_find() + * @pdev: the device with peer-to-peer DMA memory to publish + * @publish: set to true to publish the memory, false to unpublish it + * + * Published memory can be used by other PCI device drivers for + * peer-2-peer DMA operations. Non-published memory is reserved for + * exlusive use of the device driver that registers the peer-to-peer + * memory. + */ +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish) +{ + if (publish && !pdev->p2pdma) + return; + + pdev->p2pdma->p2pmem_published = publish; +} +EXPORT_SYMBOL_GPL(pci_p2pmem_publish); diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 7b4899c06f49..9e907c338a44 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -53,11 +53,16 @@ struct vmem_altmap { * driver can hotplug the device memory using ZONE_DEVICE and with that memory * type. Any page of a process can be migrated to such memory. However no one * should be allow to pin such memory so that it can always be evicted. + * + * MEMORY_DEVICE_PCI_P2PDMA: + * Device memory residing in a PCI BAR intended for use with Peer-to-Peer + * transactions. */ enum memory_type { MEMORY_DEVICE_HOST = 0, MEMORY_DEVICE_PRIVATE, MEMORY_DEVICE_PUBLIC, + MEMORY_DEVICE_PCI_P2PDMA, }; /* @@ -161,6 +166,19 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap, } #endif /* CONFIG_ZONE_DEVICE */ +#ifdef CONFIG_PCI_P2PDMA +static inline bool is_pci_p2pdma_page(const struct page *page) +{ + return is_zone_device_page(page) && + page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; +} +#else /* CONFIG_PCI_P2PDMA */ +static inline bool is_pci_p2pdma_page(const struct page *page) +{ + return false; +} +#endif /* CONFIG_PCI_P2PDMA */ + #if defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_DEVICE_PUBLIC) static inline bool is_device_private_page(const struct page *page) { diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h new file mode 100644 index 000000000000..80e931cb1235 --- /dev/null +++ b/include/linux/pci-p2pdma.h @@ -0,0 +1,100 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * PCI Peer 2 Peer DMA support. + * + * Copyright (c) 2016-2018, Logan Gunthorpe + * Copyright (c) 2016-2017, Microsemi Corporation + * Copyright (c) 2017, Christoph Hellwig + * Copyright (c) 2018, Eideticom Inc. + * + */ + +#ifndef _LINUX_PCI_P2PDMA_H +#define _LINUX_PCI_P2PDMA_H + +#include + +struct block_device; +struct scatterlist; + +#ifdef CONFIG_PCI_P2PDMA +int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, + u64 offset); +int pci_p2pdma_add_client(struct list_head *head, struct device *dev); +void pci_p2pdma_remove_client(struct list_head *head, struct device *dev); +void pci_p2pdma_client_list_free(struct list_head *head); +int pci_p2pdma_distance(struct pci_dev *provider, struct list_head *clients); +bool pci_p2pdma_assign_provider(struct pci_dev *provider, + struct list_head *clients); +bool pci_has_p2pmem(struct pci_dev *pdev); +struct pci_dev *pci_p2pmem_find(struct list_head *clients); +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size); +void pci_free_p2pmem(struct pci_dev *pdev, void *addr, size_t size); +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr); +struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev, + unsigned int *nents, u32 length); +void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl); +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish); +#else /* CONFIG_PCI_P2PDMA */ +static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, + size_t size, u64 offset) +{ + return 0; +} +static inline int pci_p2pdma_add_client(struct list_head *head, + struct device *dev) +{ + return 0; +} +static inline void pci_p2pdma_remove_client(struct list_head *head, + struct device *dev) +{ +} +static inline void pci_p2pdma_client_list_free(struct list_head *head) +{ +} +static inline int pci_p2pdma_distance(struct pci_dev *provider, + struct list_head *clients) +{ + return -1; +} +static inline bool pci_p2pdma_assign_provider(struct pci_dev *provider, + struct list_head *clients) +{ + return false; +} +static inline bool pci_has_p2pmem(struct pci_dev *pdev) +{ + return false; +} +static inline struct pci_dev *pci_p2pmem_find(struct list_head *clients) +{ + return NULL; +} +static inline void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size) +{ + return NULL; +} +static inline void pci_free_p2pmem(struct pci_dev *pdev, void *addr, + size_t size) +{ +} +static inline pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, + void *addr) +{ + return 0; +} +static inline struct scatterlist * pci_p2pmem_alloc_sgl(struct pci_dev *pdev, + unsigned int *nents, u32 length) +{ + return NULL; +} +static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev, + struct scatterlist *sgl) +{ +} +static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish) +{ +} +#endif /* CONFIG_PCI_P2PDMA */ +#endif /* _LINUX_PCI_P2P_H */ diff --git a/include/linux/pci.h b/include/linux/pci.h index 73178a2fcee0..005feaea8dca 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -277,6 +277,7 @@ struct pcie_link_state; struct pci_vpd; struct pci_sriov; struct pci_ats; +struct pci_p2pdma; /* The pci_dev structure describes PCI devices */ struct pci_dev { @@ -430,6 +431,9 @@ struct pci_dev { #ifdef CONFIG_PCI_PASID u16 pasid_features; #endif +#ifdef CONFIG_PCI_P2PDMA + struct pci_p2pdma *p2pdma; +#endif phys_addr_t rom; /* Physical address if not from BAR */ size_t romlen; /* Length if not from BAR */ char *driver_override; /* Driver name to force a match */ From patchwork Mon Apr 23 23:30:34 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903231 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN8B46RCz9rxp for ; Tue, 24 Apr 2018 09:34:02 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932827AbeDWXcj (ORCPT ); Mon, 23 Apr 2018 19:32:39 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41928 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932642AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J1-On; Mon, 23 Apr 2018 17:31:01 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bO-22; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:34 -0600 Message-Id: <20180423233046.21476-3-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.5 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 02/14] PCI/P2PDMA: Add sysfs group to display p2pmem stats X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Add a sysfs group to display statistics about P2P memory that is registered in each PCI device. Attributes in the group display the total amount of P2P memory, the amount available and whether it is published or not. Signed-off-by: Logan Gunthorpe --- Documentation/ABI/testing/sysfs-bus-pci | 25 +++++++++++++++ drivers/pci/p2pdma.c | 54 +++++++++++++++++++++++++++++++++ 2 files changed, 79 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 44d4b2be92fd..044812c816d0 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -323,3 +323,28 @@ Description: This is similar to /sys/bus/pci/drivers_autoprobe, but affects only the VFs associated with a specific PF. + +What: /sys/bus/pci/devices/.../p2pmem/available +Date: November 2017 +Contact: Logan Gunthorpe +Description: + If the device has any Peer-to-Peer memory registered, this + file contains the amount of memory that has not been + allocated (in decimal). + +What: /sys/bus/pci/devices/.../p2pmem/size +Date: November 2017 +Contact: Logan Gunthorpe +Description: + If the device has any Peer-to-Peer memory registered, this + file contains the total amount of memory that the device + provides (in decimal). + +What: /sys/bus/pci/devices/.../p2pmem/published +Date: November 2017 +Contact: Logan Gunthorpe +Description: + If the device has any Peer-to-Peer memory registered, this + file contains a '1' if the memory has been published for + use inside the kernel or a '0' if it is only intended + for use within the driver that published it. diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index e524a12eca1f..4daad6374869 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -24,6 +24,54 @@ struct pci_p2pdma { bool p2pmem_published; }; +static ssize_t size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + size_t size = 0; + + if (pdev->p2pdma->pool) + size = gen_pool_size(pdev->p2pdma->pool); + + return snprintf(buf, PAGE_SIZE, "%zd\n", size); +} +static DEVICE_ATTR_RO(size); + +static ssize_t available_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + size_t avail = 0; + + if (pdev->p2pdma->pool) + avail = gen_pool_avail(pdev->p2pdma->pool); + + return snprintf(buf, PAGE_SIZE, "%zd\n", avail); +} +static DEVICE_ATTR_RO(available); + +static ssize_t published_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + + return snprintf(buf, PAGE_SIZE, "%d\n", + pdev->p2pdma->p2pmem_published); +} +static DEVICE_ATTR_RO(published); + +static struct attribute *p2pmem_attrs[] = { + &dev_attr_size.attr, + &dev_attr_available.attr, + &dev_attr_published.attr, + NULL, +}; + +static const struct attribute_group p2pmem_group = { + .attrs = p2pmem_attrs, + .name = "p2pmem", +}; + static void pci_p2pdma_percpu_release(struct percpu_ref *ref) { struct pci_p2pdma *p2p = @@ -53,6 +101,7 @@ static void pci_p2pdma_release(void *data) percpu_ref_exit(&pdev->p2pdma->devmap_ref); gen_pool_destroy(pdev->p2pdma->pool); + sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group); pdev->p2pdma = NULL; } @@ -83,9 +132,14 @@ static int pci_p2pdma_setup(struct pci_dev *pdev) pdev->p2pdma = p2p; + error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group); + if (error) + goto out_pool_destroy; + return 0; out_pool_destroy: + pdev->p2pdma = NULL; gen_pool_destroy(p2p->pool); out: devm_kfree(&pdev->dev, p2p); From patchwork Mon Apr 23 23:30:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903237 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN9J1X5Vz9ryr for ; Tue, 24 Apr 2018 09:35:00 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932756AbeDWXcT (ORCPT ); Mon, 23 Apr 2018 19:32:19 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41926 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932654AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J2-Oi; Mon, 23 Apr 2018 17:31:01 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bR-5H; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:35 -0600 Message-Id: <20180423233046.21476-4-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.5 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 03/14] PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org The DMA address used when mapping PCI P2P memory must be the PCI bus address. Thus, introduce pci_p2pmem_[un]map_sg() to map the correct addresses when using P2P memory. For this, we assume that an SGL passed to these functions contain all P2P memory or no P2P memory. Signed-off-by: Logan Gunthorpe --- drivers/pci/p2pdma.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++ include/linux/memremap.h | 1 + include/linux/pci-p2pdma.h | 13 ++++++++++++ 3 files changed, 65 insertions(+) diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 4daad6374869..ed9dce8552a2 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -190,6 +190,8 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, pgmap->res.flags = pci_resource_flags(pdev, bar); pgmap->ref = &pdev->p2pdma->devmap_ref; pgmap->type = MEMORY_DEVICE_PCI_P2PDMA; + pgmap->pci_p2pdma_bus_offset = pci_bus_address(pdev, bar) - + pci_resource_start(pdev, bar); addr = devm_memremap_pages(&pdev->dev, pgmap); if (IS_ERR(addr)) { @@ -746,3 +748,52 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish) pdev->p2pdma->p2pmem_published = publish; } EXPORT_SYMBOL_GPL(pci_p2pmem_publish); + +/** + * pci_p2pdma_map_sg - map a PCI peer-to-peer sg for DMA + * @dev: device doing the DMA request + * @sg: scatter list to map + * @nents: elements in the scatterlist + * @dir: DMA direction + * + * Returns the number of SG entries mapped + */ +int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents, + enum dma_data_direction dir) +{ + struct dev_pagemap *pgmap; + struct scatterlist *s; + phys_addr_t paddr; + int i; + + /* + * p2pdma mappings are not compatible with devices that use + * dma_virt_ops. + */ + if (IS_ENABLED(CONFIG_DMA_VIRT_OPS) && dev->dma_ops == &dma_virt_ops) + return 0; + + for_each_sg(sg, s, nents, i) { + pgmap = sg_page(s)->pgmap; + paddr = sg_phys(s); + + s->dma_address = paddr - pgmap->pci_p2pdma_bus_offset; + sg_dma_len(s) = s->length; + } + + return nents; +} +EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg); + +/** + * pci_p2pdma_unmap_sg - unmap a PCI peer-to-peer sg for DMA + * @dev: device doing the DMA request + * @sg: scatter list to map + * @nents: elements in the scatterlist + * @dir: DMA direction + */ +void pci_p2pdma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, + enum dma_data_direction dir) +{ +} +EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg); diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 9e907c338a44..1660f64ce96f 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -125,6 +125,7 @@ struct dev_pagemap { struct device *dev; void *data; enum memory_type type; + u64 pci_p2pdma_bus_offset; }; #ifdef CONFIG_ZONE_DEVICE diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index 80e931cb1235..0cde88341eeb 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -35,6 +35,10 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev, unsigned int *nents, u32 length); void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl); void pci_p2pmem_publish(struct pci_dev *pdev, bool publish); +int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents, + enum dma_data_direction dir); +void pci_p2pdma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, + enum dma_data_direction dir); #else /* CONFIG_PCI_P2PDMA */ static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, u64 offset) @@ -96,5 +100,14 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev, static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish) { } +static inline int pci_p2pdma_map_sg(struct device *dev, + struct scatterlist *sg, int nents, enum dma_data_direction dir) +{ + return 0; +} +static inline void pci_p2pdma_unmap_sg(struct device *dev, + struct scatterlist *sg, int nents, enum dma_data_direction dir) +{ +} #endif /* CONFIG_PCI_P2PDMA */ #endif /* _LINUX_PCI_P2P_H */ From patchwork Mon Apr 23 23:30:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903229 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN6h0pC8z9ryr for ; Tue, 24 Apr 2018 09:32:44 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932709AbeDWXcl (ORCPT ); Mon, 23 Apr 2018 19:32:41 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41934 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932644AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J3-Os; Mon, 23 Apr 2018 17:31:01 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bU-93; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:36 -0600 Message-Id: <20180423233046.21476-5-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.7 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org For peer-to-peer transactions to work the downstream ports in each switch must not have the ACS flags set. At this time there is no way to dynamically change the flags and update the corresponding IOMMU groups so this is done at enumeration time before the groups are assigned. This effectively means that if CONFIG_PCI_P2PDMA is selected then all devices behind any PCIe switch heirarchy will be in the same IOMMU group. Which implies that individual devices behind any switch heirarchy will not be able to be assigned to separate VMs because there is no isolation between them. Additionally, any malicious PCIe devices will be able to DMA to memory exposed by other EPs in the same domain as TLPs will not be checked by the IOMMU. Given that the intended use case of P2P Memory is for users with custom hardware designed for purpose, we do not expect distributors to ever need to enable this option. Users that want to use P2P must have compiled a custom kernel with this configuration option and understand the implications regarding ACS. They will either not require ACS or will have design the system in such a way that devices that require isolation will be separate from those using P2P transactions. Signed-off-by: Logan Gunthorpe --- drivers/pci/Kconfig | 9 +++++++++ drivers/pci/p2pdma.c | 45 ++++++++++++++++++++++++++++++--------------- drivers/pci/pci.c | 6 ++++++ include/linux/pci-p2pdma.h | 5 +++++ 4 files changed, 50 insertions(+), 15 deletions(-) diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index b2396c22b53e..b6db41d4b708 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -139,6 +139,15 @@ config PCI_P2PDMA transations must be between devices behind the same root port. (Typically behind a network of PCIe switches). + Enabling this option will also disable ACS on all ports behind + any PCIe switch. This effectively puts all devices behind any + switch heirarchy into the same IOMMU group. Which implies that + individual devices behind any switch will not be able to be + assigned to separate VMs because there is no isolation between + them. Additionally, any malicious PCIe devices will be able to + DMA to memory exposed by other EPs in the same domain as TLPs + will not be checked by the IOMMU. + If unsure, say N. config PCI_LABEL diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index ed9dce8552a2..e9f43b43acac 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -240,27 +240,42 @@ static struct pci_dev *find_parent_pci_dev(struct device *dev) } /* - * If a device is behind a switch, we try to find the upstream bridge - * port of the switch. This requires two calls to pci_upstream_bridge(): - * one for the upstream port on the switch, one on the upstream port - * for the next level in the hierarchy. Because of this, devices connected - * to the root port will be rejected. + * pci_p2pdma_disable_acs - disable ACS flags for all PCI bridges + * @pdev: device to disable ACS flags for + * + * The ACS flags for P2P Request Redirect and P2P Completion Redirect need + * to be disabled on any PCI bridge in order for the TLPS to not be forwarded + * up to the RC which is not what we want for P2P. + * + * This function is called when the devices are first enumerated and + * will result in all devices behind any bridge to be in the same IOMMU + * group. At this time, there is no way to "hotplug" IOMMU groups so we rely + * on this largish hammer. If you need the devices to be in separate groups + * don't enable CONFIG_PCI_P2PDMA. + * + * Returns 1 if the ACS bits for this device was cleared, otherwise 0. */ -static struct pci_dev *get_upstream_bridge_port(struct pci_dev *pdev) +int pci_p2pdma_disable_acs(struct pci_dev *pdev) { - struct pci_dev *up1, *up2; + int pos; + u16 ctrl; - if (!pdev) - return NULL; + if (!pci_is_bridge(pdev)) + return 0; - up1 = pci_dev_get(pci_upstream_bridge(pdev)); - if (!up1) - return NULL; + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS); + if (!pos) + return 0; + + pci_info(pdev, "disabling ACS flags for peer-to-peer DMA\n"); + + pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl); + + ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR); - up2 = pci_dev_get(pci_upstream_bridge(up1)); - pci_dev_put(up1); + pci_write_config_word(pdev, pos + PCI_ACS_CTRL, ctrl); - return up2; + return 1; } /* diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e597655a5643..7e2f5724ba22 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -2835,6 +2836,11 @@ static void pci_std_enable_acs(struct pci_dev *dev) */ void pci_enable_acs(struct pci_dev *dev) { +#ifdef CONFIG_PCI_P2PDMA + if (pci_p2pdma_disable_acs(dev)) + return; +#endif + if (!pci_acs_enable) return; diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index 0cde88341eeb..fcb3437a2f3c 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -18,6 +18,7 @@ struct block_device; struct scatterlist; #ifdef CONFIG_PCI_P2PDMA +int pci_p2pdma_disable_acs(struct pci_dev *pdev); int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, u64 offset); int pci_p2pdma_add_client(struct list_head *head, struct device *dev); @@ -40,6 +41,10 @@ int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, int nents, void pci_p2pdma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir); #else /* CONFIG_PCI_P2PDMA */ +static inline int pci_p2pdma_disable_acs(struct pci_dev *pdev) +{ + return 0; +} static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, u64 offset) { From patchwork Mon Apr 23 23:30:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903225 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN4w5kByz9s02 for ; Tue, 24 Apr 2018 09:31:12 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932662AbeDWXbH (ORCPT ); Mon, 23 Apr 2018 19:31:07 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41954 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932629AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J4-S3; Mon, 23 Apr 2018 17:31:02 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bX-CU; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe , Jonathan Corbet , Mauro Carvalho Chehab , Greg Kroah-Hartman , Vinod Koul , Linus Walleij , Thierry Reding , Sanyog Kale , Sagar Dharia Date: Mon, 23 Apr 2018 17:30:37 -0600 Message-Id: <20180423233046.21476-6-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com, corbet@lwn.net, mchehab@kernel.org, gregkh@linuxfoundation.org, linus.walleij@linaro.org, treding@nvidia.com, keith.busch@intel.com, dan.j.williams@intel.com, vinod.koul@intel.com, sanyog.r.kale@intel.com, sdharia@codeaurora.org X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.7 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 05/14] docs-rst: Add a new directory for PCI documentation X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Add a new directory in the driver API guide for PCI specific documentation. This is in preparation for adding a new PCI P2P DMA driver writers guide which will go in this directory. Signed-off-by: Logan Gunthorpe Cc: Jonathan Corbet Cc: Mauro Carvalho Chehab Cc: Greg Kroah-Hartman Cc: Vinod Koul Cc: Linus Walleij Cc: Logan Gunthorpe Cc: Thierry Reding Cc: Sanyog Kale Cc: Sagar Dharia --- Documentation/driver-api/index.rst | 2 +- Documentation/driver-api/pci/index.rst | 19 +++++++++++++++++++ Documentation/driver-api/{ => pci}/pci.rst | 0 3 files changed, 20 insertions(+), 1 deletion(-) create mode 100644 Documentation/driver-api/pci/index.rst rename Documentation/driver-api/{ => pci}/pci.rst (100%) diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 6d8352c0f354..9e4cd4e91a49 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -27,7 +27,7 @@ available subsections can be seen below. iio/index input usb/index - pci + pci/index spi i2c hsi diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst new file mode 100644 index 000000000000..03b57cbf8cc2 --- /dev/null +++ b/Documentation/driver-api/pci/index.rst @@ -0,0 +1,19 @@ +============================================ +The Linux PCI driver implementer's API guide +============================================ + +.. class:: toc-title + + Table of contents + +.. toctree:: + :maxdepth: 2 + + pci + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/driver-api/pci.rst b/Documentation/driver-api/pci/pci.rst similarity index 100% rename from Documentation/driver-api/pci.rst rename to Documentation/driver-api/pci/pci.rst From patchwork Mon Apr 23 23:30:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903230 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN7k0dxRz9rxx for ; Tue, 24 Apr 2018 09:33:38 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932635AbeDWXco (ORCPT ); Mon, 23 Apr 2018 19:32:44 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41924 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932647AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J5-Os; Mon, 23 Apr 2018 17:31:01 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005ba-FJ; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe , Jonathan Corbet Date: Mon, 23 Apr 2018 17:30:38 -0600 Message-Id: <20180423233046.21476-7-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com, corbet@lwn.net X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.5 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 06/14] PCI/P2PDMA: Add P2P DMA driver writer's documentation X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Add a restructured text file describing how to write drivers with support for P2P DMA transactions. The document describes how to use the APIs that were added in the previous few commits. Also adds an index for the PCI documentation tree even though this is the only PCI document that has been converted to restructured text at this time. Signed-off-by: Logan Gunthorpe Cc: Jonathan Corbet --- Documentation/PCI/index.rst | 14 +++ Documentation/driver-api/pci/index.rst | 1 + Documentation/driver-api/pci/p2pdma.rst | 166 ++++++++++++++++++++++++++++++++ Documentation/index.rst | 3 +- 4 files changed, 183 insertions(+), 1 deletion(-) create mode 100644 Documentation/PCI/index.rst create mode 100644 Documentation/driver-api/pci/p2pdma.rst diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst new file mode 100644 index 000000000000..2fdc4b3c291d --- /dev/null +++ b/Documentation/PCI/index.rst @@ -0,0 +1,14 @@ +================================== +Linux PCI Driver Developer's Guide +================================== + +.. toctree:: + + p2pdma + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/driver-api/pci/index.rst b/Documentation/driver-api/pci/index.rst index 03b57cbf8cc2..d12eeafbfc90 100644 --- a/Documentation/driver-api/pci/index.rst +++ b/Documentation/driver-api/pci/index.rst @@ -10,6 +10,7 @@ The Linux PCI driver implementer's API guide :maxdepth: 2 pci + p2pdma .. only:: subproject and html diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst new file mode 100644 index 000000000000..49a512c405b2 --- /dev/null +++ b/Documentation/driver-api/pci/p2pdma.rst @@ -0,0 +1,166 @@ +============================ +PCI Peer-to-Peer DMA Support +============================ + +The PCI bus has pretty decent support for performing DMA transfers +between two endpoints on the bus. This type of transaction is +henceforth called Peer-to-Peer (or P2P). However, there are a number of +issues that make P2P transactions tricky to do in a perfectly safe way. + +One of the biggest issues is that PCI Root Complexes are not required +to support forwarding packets between Root Ports. To make things worse, +there is no simple way to determine if a given Root Complex supports +this or not. (See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, +the kernel only supports doing P2P when the endpoints involved are all +behind the same PCIe root port as the spec guarantees that all +packets will always be routable but does not require routing between +root ports. + +The second issue is that to make use of existing interfaces in Linux, +memory that is used for P2P transactions needs to be backed by struct +pages. However, PCI BARs are not typically cache coherent so there are +a few corner case gotchas with these pages so developers need to +be careful about what they do with them. + + +Driver Writer's Guide +===================== + +In a given P2P implementation there may be three or more different +types of kernel drivers in play: + +* Providers - A driver which provides or publishes P2P resources like + memory or doorbell registers to other drivers. +* Clients - A driver which makes use of a resource by setting up a + DMA transaction to or from it. +* Orchestrators - A driver which orchestrates the flow of data between + clients and providers + +In many cases there could be overlap between these three types (ie. +it may be typical for a driver to be both a provider and a client). + +For example, in the NVMe Target Copy Offload implementation: + +* The NVMe PCI driver is both a client, provider and orchestrator + in that it exposes any CMB (Controller Memory Buffer) as a P2P memory + resource (provider), it accepts P2P memory pages as buffers in requests + to be used directly (client) and it can also make use the CMB as + submission queue entries. +* The RDMA driver is a client in this arrangement so that an RNIC + can DMA directly to the memory exposed by the NVMe device. +* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC + to the P2P memory (CMB) and then to the NVMe device (and vice versa). + +This is currently the only arrangement supported by the kernel but +one could imagine slight tweaks to this that would allow for the same +functionality. For example, if a specific RNIC added a BAR with some +memory behind it, its driver could add support as a P2P provider and +then the NVMe Target could use the RNIC's memory instead of the CMB +in cases where the NVMe cards in use do not have CMB support. + + +Provider Drivers +---------------- + +A provider simply needs to register a BAR (or a portion of a BAR) +as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`. +This will register struct pages for all the specified memory. + +After that it may optionally publish all of its resources as +P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow +any orchestrator drivers to find and use the memory. When marked in +this way, the resource must be regular memory with no side effects. + +For the time being this is fairly rudimentary in that all resources +are typically going to be P2P memory. Future work will likely expand +this to include other types of resources like doorbells. + + +Client Drivers +-------------- + +A client driver typically only has to conditionally change its DMA map +routine to use the mapping functions :c:func:`pci_p2pdma_map_sg()` and +:c:func:`pci_p2pdma_unmap_sg()` instead of the usual :c:func:`dma_map_sg()` +functions. + +The client may also, optionally, make use of +:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping +functions and when to use the regular mapping functions. In some +situations, it may be more appropriate to use a flag to indicate a +given request is P2P memory and map appropriately (for example the +block layer uses a flag to keep P2P memory out of queues that do not +have P2P client support). It is important to ensure that struct pages that +back P2P memory stay out of code that does not have support for them. + + +Orchestrator Drivers +-------------------- + +The first task an orchestrator driver must do is compile a list of +all client drivers that will be involved in a given transaction. For +example, the NVMe Target driver creates a list including all NVMe drives +and the RNIC in use. The list is stored as an anonymous struct +list_head which must be initialized with the usual INIT_LIST_HEAD. +The following functions may then be used to add to, remove from and free +the list of clients with the functions :c:func:`pci_p2pdma_add_client()`, +:c:func:`pci_p2pdma_remove_client()` and +:c:func:`pci_p2pdma_client_list_free()`. + +With the client list in hand, the orchestrator may then call +:c:func:`pci_p2pmem_find()` to obtain a published P2P memory provider +that is supported (behind the same root port) as all the clients. If more +than one provider is supported, the one nearest to all the clients will +be chosen first. If there are more than one provider is an equal distance +away, the one returned will be chosen at random. This function returns the PCI +device to use for the provider with a reference taken and therefore +when it's no longer needed it should be returned with pci_dev_put(). + +Alternatively, if the orchestrator knows (via some other means) +which provider it wants to use it may use :c:func:`pci_has_p2pmem()` +to determine if it has P2P memory and :c:func:`pci_p2pdma_distance()` +to determine the cumulative distance between it and a potential +list of clients. + +With a supported provider in hand, the driver can then call +:c:func:`pci_p2pdma_assign_provider()` to assign the provider +to the client list. This function returns false if any of the +clients are unsupported by the provider. + +Once a provider is assigned to a client list via either +:c:func:`pci_p2pmem_find()` or :c:func:`pci_p2pdma_assign_provider()`, +the list is permanently bound to the provider such that any new clients +added to the list must be supported by the already selected provider. +If they are not supported, :c:func:`pci_p2pdma_add_client()` will return +an error. In this way, orchestrators are free to add and remove devices +without having to recheck support or tear down existing transfers to +change P2P providers. + +Once a provider is selected, the orchestrator can then use +:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to +allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()` +and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for +allocating scatter-gather lists with P2P memory. + +Struct Page Caveats +------------------- + +Driver writers should be very careful about not passing these special +struct pages to code that isn't prepared for it. At this time, the kernel +interfaces do not have any checks for ensuring this. This obviously +precludes passing these pages to userspace. + +P2P memory is also technically IO memory but should never have any side +effects behind it. Thus, the order of loads and stores should not be important +and ioreadX(), iowriteX() and friends should not be necessary. +However, as the memory is not cache coherent, if access ever needs to +be protected by a spinlock then :c:func:`mmiowb()` must be used before +unlocking the lock. (See ACQUIRES VS I/O ACCESSES in +Documentation/memory-barriers.txt) + + +P2P DMA Support Library +===================== + +.. kernel-doc:: drivers/pci/p2pdma.c + :export: diff --git a/Documentation/index.rst b/Documentation/index.rst index 3b99ab931d41..e7938b507df3 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -45,7 +45,7 @@ the kernel interface as seen by application developers. .. toctree:: :maxdepth: 2 - userspace-api/index + userspace-api/index Introduction to kernel development @@ -89,6 +89,7 @@ needed). sound/index crypto/index filesystems/index + PCI/index Architecture-specific documentation ----------------------------------- From patchwork Mon Apr 23 23:30:39 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903232 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN8L1Rzpz9rxp for ; Tue, 24 Apr 2018 09:34:10 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932819AbeDWXch (ORCPT ); Mon, 23 Apr 2018 19:32:37 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41916 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932648AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J6-Oh; Mon, 23 Apr 2018 17:31:00 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bd-IE; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:39 -0600 Message-Id: <20180423233046.21476-8-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.5 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 07/14] block: Introduce PCI P2P flags for request and request queue X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org QUEUE_FLAG_PCI_P2P is introduced meaning a driver's request queue supports targeting P2P memory. REQ_PCI_P2P is introduced to indicate a particular bio request is directed to/from PCI P2P memory. A request with this flag is not accepted unless the corresponding queues have the QUEUE_FLAG_PCI_P2P flag set. Signed-off-by: Logan Gunthorpe Reviewed-by: Sagi Grimberg Reviewed-by: Christoph Hellwig --- block/blk-core.c | 3 +++ include/linux/blk_types.h | 18 +++++++++++++++++- include/linux/blkdev.h | 3 +++ 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 806ce2442819..35680cbebaf4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2270,6 +2270,9 @@ generic_make_request_checks(struct bio *bio) if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q)) goto not_supported; + if ((bio->bi_opf & REQ_PCI_P2PDMA) && !blk_queue_pci_p2pdma(q)) + goto not_supported; + if (should_fail_bio(bio)) goto end_io; diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 17b18b91ebac..41194d54c45a 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -279,6 +279,10 @@ enum req_flag_bits { __REQ_BACKGROUND, /* background IO */ __REQ_NOWAIT, /* Don't wait if request will block */ +#ifdef CONFIG_PCI_P2PDMA + __REQ_PCI_P2PDMA, /* request is to/from P2P memory */ +#endif + /* command specific flags for REQ_OP_WRITE_ZEROES: */ __REQ_NOUNMAP, /* do not free blocks when zeroing */ @@ -303,6 +307,18 @@ enum req_flag_bits { #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) #define REQ_NOWAIT (1ULL << __REQ_NOWAIT) +#ifdef CONFIG_PCI_P2PDMA +/* + * Currently SGLs do not support mixed P2P and regular memory so + * requests with P2P memory must not be merged. + */ +#define REQ_PCI_P2PDMA (1ULL << __REQ_PCI_P2PDMA) +#define REQ_IS_PCI_P2PDMA(req) ((req)->cmd_flags & REQ_PCI_P2PDMA) +#else +#define REQ_PCI_P2PDMA 0 +#define REQ_IS_PCI_P2PDMA(req) 0 +#endif /* CONFIG_PCI_P2PDMA */ + #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) #define REQ_DRV (1ULL << __REQ_DRV) @@ -311,7 +327,7 @@ enum req_flag_bits { (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER) #define REQ_NOMERGE_FLAGS \ - (REQ_NOMERGE | REQ_PREFLUSH | REQ_FUA) + (REQ_NOMERGE | REQ_PREFLUSH | REQ_FUA | REQ_PCI_P2PDMA) #define bio_op(bio) \ ((bio)->bi_opf & REQ_OP_MASK) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 9af3e0f430bc..116367babb39 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -698,6 +698,7 @@ struct request_queue { #define QUEUE_FLAG_SCSI_PASSTHROUGH 27 /* queue supports SCSI commands */ #define QUEUE_FLAG_QUIESCED 28 /* queue has been quiesced */ #define QUEUE_FLAG_PREEMPT_ONLY 29 /* only process REQ_PREEMPT requests */ +#define QUEUE_FLAG_PCI_P2PDMA 30 /* device supports pci p2p requests */ #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_SAME_COMP) | \ @@ -730,6 +731,8 @@ bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q); #define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags) #define blk_queue_scsi_passthrough(q) \ test_bit(QUEUE_FLAG_SCSI_PASSTHROUGH, &(q)->queue_flags) +#define blk_queue_pci_p2pdma(q) \ + test_bit(QUEUE_FLAG_PCI_P2PDMA, &(q)->queue_flags) #define blk_noretry_request(rq) \ ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \ From patchwork Mon Apr 23 23:30:40 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903228 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN6P4dxYz9rxx for ; Tue, 24 Apr 2018 09:32:29 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932574AbeDWXc0 (ORCPT ); Mon, 23 Apr 2018 19:32:26 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41930 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932649AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J7-Om; Mon, 23 Apr 2018 17:31:00 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bg-LC; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:40 -0600 Message-Id: <20180423233046.21476-9-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.7 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 08/14] IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org In order to use PCI P2P memory pci_p2pmem_[un]map_sg() functions must be called to map the correct PCI bus address. To do this, check the first page in the scatter list to see if it is P2P memory or not. At the moment, scatter lists that contain P2P memory must be homogeneous so if the first page is P2P the entire SGL should be P2P. Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig --- drivers/infiniband/core/rw.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c index c8963e91f92a..f495e8a7f8ac 100644 --- a/drivers/infiniband/core/rw.c +++ b/drivers/infiniband/core/rw.c @@ -12,6 +12,7 @@ */ #include #include +#include #include #include @@ -280,7 +281,11 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num, struct ib_device *dev = qp->pd->device; int ret; - ret = ib_dma_map_sg(dev, sg, sg_cnt, dir); + if (is_pci_p2pdma_page(sg_page(sg))) + ret = pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir); + else + ret = ib_dma_map_sg(dev, sg, sg_cnt, dir); + if (!ret) return -ENOMEM; sg_cnt = ret; @@ -602,7 +607,11 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num, break; } - ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir); + if (is_pci_p2pdma_page(sg_page(sg))) + pci_p2pdma_unmap_sg(qp->pd->device->dma_device, sg, + sg_cnt, dir); + else + ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir); } EXPORT_SYMBOL(rdma_rw_ctx_destroy); From patchwork Mon Apr 23 23:30:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903227 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN5t1T4qz9rxx for ; Tue, 24 Apr 2018 09:32:02 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932692AbeDWXbM (ORCPT ); Mon, 23 Apr 2018 19:31:12 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41922 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932617AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J8-Ow; Mon, 23 Apr 2018 17:31:01 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bj-OD; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:41 -0600 Message-Id: <20180423233046.21476-10-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.5 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 09/14] nvme-pci: Use PCI p2pmem subsystem to manage the CMB X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Register the CMB buffer as p2pmem and use the appropriate allocation functions to create and destroy the IO submission queues. If the CMB supports WDS and RDS, publish it for use as P2P memory by other devices. We can now drop the __iomem safety on the buffer seeing that, by convention, devm_memremap_pages() allocates regular memory without side effects that's accessible without the iomem accessors. Signed-off-by: Logan Gunthorpe --- drivers/nvme/host/pci.c | 75 +++++++++++++++++++++++++++---------------------- 1 file changed, 41 insertions(+), 34 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index fbc71fac6f1e..514da4de3c85 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -29,6 +29,7 @@ #include #include #include +#include #include "nvme.h" @@ -92,9 +93,8 @@ struct nvme_dev { struct work_struct remove_work; struct mutex shutdown_lock; bool subsystem; - void __iomem *cmb; - pci_bus_addr_t cmb_bus_addr; u64 cmb_size; + bool cmb_use_sqes; u32 cmbsz; u32 cmbloc; struct nvme_ctrl ctrl; @@ -149,7 +149,7 @@ struct nvme_queue { struct nvme_dev *dev; spinlock_t q_lock; struct nvme_command *sq_cmds; - struct nvme_command __iomem *sq_cmds_io; + bool sq_cmds_is_io; volatile struct nvme_completion *cqes; struct blk_mq_tags **tags; dma_addr_t sq_dma_addr; @@ -431,10 +431,7 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq, { u16 tail = nvmeq->sq_tail; - if (nvmeq->sq_cmds_io) - memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd)); - else - memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd)); + memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd)); if (++tail == nvmeq->q_depth) tail = 0; @@ -1289,9 +1286,18 @@ static void nvme_free_queue(struct nvme_queue *nvmeq) { dma_free_coherent(nvmeq->q_dmadev, CQ_SIZE(nvmeq->q_depth), (void *)nvmeq->cqes, nvmeq->cq_dma_addr); - if (nvmeq->sq_cmds) - dma_free_coherent(nvmeq->q_dmadev, SQ_SIZE(nvmeq->q_depth), - nvmeq->sq_cmds, nvmeq->sq_dma_addr); + + if (nvmeq->sq_cmds) { + if (nvmeq->sq_cmds_is_io) + pci_free_p2pmem(to_pci_dev(nvmeq->q_dmadev), + nvmeq->sq_cmds, + SQ_SIZE(nvmeq->q_depth)); + else + dma_free_coherent(nvmeq->q_dmadev, + SQ_SIZE(nvmeq->q_depth), + nvmeq->sq_cmds, + nvmeq->sq_dma_addr); + } } static void nvme_free_queues(struct nvme_dev *dev, int lowest) @@ -1371,12 +1377,21 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues, static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq, int qid, int depth) { - /* CMB SQEs will be mapped before creation */ - if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) - return 0; + struct pci_dev *pdev = to_pci_dev(dev->dev); + + if (qid && dev->cmb_use_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) { + nvmeq->sq_cmds = pci_alloc_p2pmem(pdev, SQ_SIZE(depth)); + nvmeq->sq_dma_addr = pci_p2pmem_virt_to_bus(pdev, + nvmeq->sq_cmds); + nvmeq->sq_cmds_is_io = true; + } + + if (!nvmeq->sq_cmds) { + nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth), + &nvmeq->sq_dma_addr, GFP_KERNEL); + nvmeq->sq_cmds_is_io = false; + } - nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth), - &nvmeq->sq_dma_addr, GFP_KERNEL); if (!nvmeq->sq_cmds) return -ENOMEM; return 0; @@ -1451,13 +1466,6 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid) struct nvme_dev *dev = nvmeq->dev; int result; - if (dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) { - unsigned offset = (qid - 1) * roundup(SQ_SIZE(nvmeq->q_depth), - dev->ctrl.page_size); - nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset; - nvmeq->sq_cmds_io = dev->cmb + offset; - } - /* * A queue's vector matches the queue identifier unless the controller * has only one vector available. @@ -1691,9 +1699,6 @@ static void nvme_map_cmb(struct nvme_dev *dev) return; dev->cmbloc = readl(dev->bar + NVME_REG_CMBLOC); - if (!use_cmb_sqes) - return; - size = nvme_cmb_size_unit(dev) * nvme_cmb_size(dev); offset = nvme_cmb_size_unit(dev) * NVME_CMB_OFST(dev->cmbloc); bar = NVME_CMB_BIR(dev->cmbloc); @@ -1710,11 +1715,15 @@ static void nvme_map_cmb(struct nvme_dev *dev) if (size > bar_size - offset) size = bar_size - offset; - dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size); - if (!dev->cmb) + if (pci_p2pdma_add_resource(pdev, bar, size, offset)) return; - dev->cmb_bus_addr = pci_bus_address(pdev, bar) + offset; + dev->cmb_size = size; + dev->cmb_use_sqes = use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS); + + if ((dev->cmbsz & (NVME_CMBSZ_WDS | NVME_CMBSZ_RDS)) == + (NVME_CMBSZ_WDS | NVME_CMBSZ_RDS)) + pci_p2pmem_publish(pdev, true); if (sysfs_add_file_to_group(&dev->ctrl.device->kobj, &dev_attr_cmb.attr, NULL)) @@ -1724,12 +1733,10 @@ static void nvme_map_cmb(struct nvme_dev *dev) static inline void nvme_release_cmb(struct nvme_dev *dev) { - if (dev->cmb) { - iounmap(dev->cmb); - dev->cmb = NULL; + if (dev->cmb_size) { sysfs_remove_file_from_group(&dev->ctrl.device->kobj, &dev_attr_cmb.attr, NULL); - dev->cmbsz = 0; + dev->cmb_size = 0; } } @@ -1928,13 +1935,13 @@ static int nvme_setup_io_queues(struct nvme_dev *dev) if (nr_io_queues == 0) return 0; - if (dev->cmb && (dev->cmbsz & NVME_CMBSZ_SQS)) { + if (dev->cmb_use_sqes) { result = nvme_cmb_qdepth(dev, nr_io_queues, sizeof(struct nvme_command)); if (result > 0) dev->q_depth = result; else - nvme_release_cmb(dev); + dev->cmb_use_sqes = false; } do { From patchwork Mon Apr 23 23:30:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903233 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN8W4TQHz9rxp for ; Tue, 24 Apr 2018 09:34:19 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932808AbeDWXcf (ORCPT ); Mon, 23 Apr 2018 19:32:35 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41918 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932635AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006J9-Op; Mon, 23 Apr 2018 17:31:01 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bm-RS; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:42 -0600 Message-Id: <20180423233046.21476-11-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.5 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 10/14] nvme-pci: Add support for P2P memory in requests X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org For P2P requests, we must use the pci_p2pmem_[un]map_sg() functions instead of the dma_map_sg functions. With that, we can then indicate PCI_P2P support in the request queue. For this, we create an NVME_F_PCI_P2P flag which tells the core to set QUEUE_FLAG_PCI_P2P in the request queue. Signed-off-by: Logan Gunthorpe Reviewed-by: Sagi Grimberg Reviewed-by: Christoph Hellwig --- drivers/nvme/host/core.c | 4 ++++ drivers/nvme/host/nvme.h | 1 + drivers/nvme/host/pci.c | 19 +++++++++++++++---- 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 9df4f71e58ca..2ca9debbcf2b 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2977,7 +2977,11 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) ns->queue = blk_mq_init_queue(ctrl->tagset); if (IS_ERR(ns->queue)) goto out_free_ns; + blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue); + if (ctrl->ops->flags & NVME_F_PCI_P2PDMA) + blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue); + ns->queue->queuedata = ns; ns->ctrl = ctrl; diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 061fecfd44f5..9a689c13998f 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -306,6 +306,7 @@ struct nvme_ctrl_ops { unsigned int flags; #define NVME_F_FABRICS (1 << 0) #define NVME_F_METADATA_SUPPORTED (1 << 1) +#define NVME_F_PCI_P2PDMA (1 << 2) int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val); int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val); int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val); diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 514da4de3c85..09b6aba6ed28 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -798,8 +798,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req, goto out; ret = BLK_STS_RESOURCE; - nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir, - DMA_ATTR_NO_WARN); + + if (REQ_IS_PCI_P2PDMA(req)) + nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents, + dma_dir); + else + nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, + dma_dir, DMA_ATTR_NO_WARN); if (!nr_mapped) goto out; @@ -844,7 +849,12 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) DMA_TO_DEVICE : DMA_FROM_DEVICE; if (iod->nents) { - dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir); + if (REQ_IS_PCI_P2PDMA(req)) + pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents, + dma_dir); + else + dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir); + if (blk_integrity_rq(req)) { if (req_op(req) == REQ_OP_READ) nvme_dif_remap(req, nvme_dif_complete); @@ -2439,7 +2449,8 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, char *buf, int size) static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = { .name = "pcie", .module = THIS_MODULE, - .flags = NVME_F_METADATA_SUPPORTED, + .flags = NVME_F_METADATA_SUPPORTED | + NVME_F_PCI_P2PDMA, .reg_read32 = nvme_pci_reg_read32, .reg_write32 = nvme_pci_reg_write32, .reg_read64 = nvme_pci_reg_read64, From patchwork Mon Apr 23 23:30:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903224 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN4v6Dtdz9rxx for ; Tue, 24 Apr 2018 09:31:11 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932629AbeDWXbJ (ORCPT ); Mon, 23 Apr 2018 19:31:09 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41920 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932652AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006JA-Ox; Mon, 23 Apr 2018 17:30:59 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv3-0005bp-UP; Mon, 23 Apr 2018 17:30:49 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:43 -0600 Message-Id: <20180423233046.21476-12-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.7 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 11/14] nvme-pci: Add a quirk for a pseudo CMB X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Introduce a quirk to use CMB-like memory on older devices that have an exposed BAR but do not advertise support for using CMBLOC and CMBSIZE. We'd like to use some of these older cards to test P2P memory. Signed-off-by: Logan Gunthorpe Reviewed-by: Sagi Grimberg --- drivers/nvme/host/nvme.h | 7 +++++++ drivers/nvme/host/pci.c | 24 ++++++++++++++++++++---- 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 9a689c13998f..885e9ec9b889 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -84,6 +84,13 @@ enum nvme_quirks { * Supports the LighNVM command set if indicated in vs[1]. */ NVME_QUIRK_LIGHTNVM = (1 << 6), + + /* + * Pseudo CMB Support on BAR 4. For adapters like the Microsemi + * NVRAM that have CMB-like memory on a BAR but does not set + * CMBLOC or CMBSZ. + */ + NVME_QUIRK_PSEUDO_CMB_BAR4 = (1 << 7), }; /* diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 09b6aba6ed28..e526e969680a 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1685,6 +1685,13 @@ static ssize_t nvme_cmb_show(struct device *dev, } static DEVICE_ATTR(cmb, S_IRUGO, nvme_cmb_show, NULL); +static u32 nvme_pseudo_cmbsz(struct pci_dev *pdev, int bar) +{ + return NVME_CMBSZ_WDS | NVME_CMBSZ_RDS | + (((ilog2(SZ_16M) - 12) / 4) << NVME_CMBSZ_SZU_SHIFT) | + ((pci_resource_len(pdev, bar) / SZ_16M) << NVME_CMBSZ_SZ_SHIFT); +} + static u64 nvme_cmb_size_unit(struct nvme_dev *dev) { u8 szu = (dev->cmbsz >> NVME_CMBSZ_SZU_SHIFT) & NVME_CMBSZ_SZU_MASK; @@ -1704,10 +1711,15 @@ static void nvme_map_cmb(struct nvme_dev *dev) struct pci_dev *pdev = to_pci_dev(dev->dev); int bar; - dev->cmbsz = readl(dev->bar + NVME_REG_CMBSZ); - if (!dev->cmbsz) - return; - dev->cmbloc = readl(dev->bar + NVME_REG_CMBLOC); + if (dev->ctrl.quirks & NVME_QUIRK_PSEUDO_CMB_BAR4) { + dev->cmbsz = nvme_pseudo_cmbsz(pdev, 4); + dev->cmbloc = 4; + } else { + dev->cmbsz = readl(dev->bar + NVME_REG_CMBSZ); + if (!dev->cmbsz) + return; + dev->cmbloc = readl(dev->bar + NVME_REG_CMBLOC); + } size = nvme_cmb_size_unit(dev) * nvme_cmb_size(dev); offset = nvme_cmb_size_unit(dev) * NVME_CMB_OFST(dev->cmbloc); @@ -2736,6 +2748,10 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_LIGHTNVM, }, { PCI_DEVICE(0x1d1d, 0x2807), /* CNEX WL */ .driver_data = NVME_QUIRK_LIGHTNVM, }, + { PCI_DEVICE(0x11f8, 0xf117), /* Microsemi NVRAM adaptor */ + .driver_data = NVME_QUIRK_PSEUDO_CMB_BAR4, }, + { PCI_DEVICE(0x1db1, 0x0002), /* Everspin nvNitro adaptor */ + .driver_data = NVME_QUIRK_PSEUDO_CMB_BAR4, }, { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) }, { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001) }, { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) }, From patchwork Mon Apr 23 23:30:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903234 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN8Y1f1pz9rxp for ; Tue, 24 Apr 2018 09:34:21 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932803AbeDWXcc (ORCPT ); Mon, 23 Apr 2018 19:32:32 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41938 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932643AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006JB-Oq; Mon, 23 Apr 2018 17:31:00 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv4-0005bs-1A; Mon, 23 Apr 2018 17:30:50 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:44 -0600 Message-Id: <20180423233046.21476-13-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.5 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 12/14] nvmet: Introduce helper functions to allocate and free request SGLs X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Add helpers to allocate and free the SGL in a struct nvmet_req: int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq) void nvmet_req_free_sgl(struct nvmet_req *req) This will be expanded in a future patch to implement peer-to-peer memory DMAs and should be common with all target drivers. The presently unused 'sq' argument in the alloc function will be necessary to decide whether to use peer-to-peer memory and obtain the correct provider to allocate the memory. Signed-off-by: Logan Gunthorpe Cc: Christoph Hellwig Cc: Sagi Grimberg --- drivers/nvme/target/core.c | 18 ++++++++++++++++++ drivers/nvme/target/nvmet.h | 2 ++ 2 files changed, 20 insertions(+) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index e95424f172fd..75d44bc3e8d3 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -575,6 +575,24 @@ void nvmet_req_execute(struct nvmet_req *req) } EXPORT_SYMBOL_GPL(nvmet_req_execute); +int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq) +{ + req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt); + if (!req->sg) + return -ENOMEM; + + return 0; +} +EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl); + +void nvmet_req_free_sgl(struct nvmet_req *req) +{ + sgl_free(req->sg); + req->sg = NULL; + req->sg_cnt = 0; +} +EXPORT_SYMBOL_GPL(nvmet_req_free_sgl); + static inline bool nvmet_cc_en(u32 cc) { return (cc >> NVME_CC_EN_SHIFT) & 0x1; diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 15fd84ab21f8..10b162615a5e 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -273,6 +273,8 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq, void nvmet_req_uninit(struct nvmet_req *req); void nvmet_req_execute(struct nvmet_req *req); void nvmet_req_complete(struct nvmet_req *req, u16 status); +int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq); +void nvmet_req_free_sgl(struct nvmet_req *req); void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid, u16 size); From patchwork Mon Apr 23 23:30:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903235 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN8m52y4z9ryr for ; Tue, 24 Apr 2018 09:34:32 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932690AbeDWXca (ORCPT ); Mon, 23 Apr 2018 19:32:30 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41936 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932567AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvB-0006JC-Or; Mon, 23 Apr 2018 17:31:00 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv4-0005bv-47; Mon, 23 Apr 2018 17:30:50 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe Date: Mon, 23 Apr 2018 17:30:45 -0600 Message-Id: <20180423233046.21476-14-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.5 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE, MYRULES_NO_TEXT autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 13/14] nvmet-rdma: Use new SGL alloc/free helper for requests X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org Use the new helpers introduced in the previous patch to allocate the SGLs for the request. Seeing we use req.transfer_len as the length of the SGL it is set earlier and cleared on any error. It also seems to be unnecessary to accumulate the length as the map_sgl functions should only ever be called once. Signed-off-by: Logan Gunthorpe Cc: Christoph Hellwig Cc: Sagi Grimberg --- drivers/nvme/target/rdma.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 52e0c5d579a7..f7a3459d618f 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -430,7 +430,7 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp) } if (rsp->req.sg != &rsp->cmd->inline_sg) - sgl_free(rsp->req.sg); + nvmet_req_free_sgl(&rsp->req); if (unlikely(!list_empty_careful(&queue->rsp_wr_wait_list))) nvmet_rdma_process_wr_wait_list(queue); @@ -564,24 +564,24 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp, { struct rdma_cm_id *cm_id = rsp->queue->cm_id; u64 addr = le64_to_cpu(sgl->addr); - u32 len = get_unaligned_le24(sgl->length); u32 key = get_unaligned_le32(sgl->key); int ret; + rsp->req.transfer_len = get_unaligned_le24(sgl->length); + /* no data command? */ - if (!len) + if (!rsp->req.transfer_len) return 0; - rsp->req.sg = sgl_alloc(len, GFP_KERNEL, &rsp->req.sg_cnt); - if (!rsp->req.sg) - return NVME_SC_INTERNAL; + ret = nvmet_req_alloc_sgl(&rsp->req, &rsp->queue->nvme_sq); + if (ret < 0) + goto error_out; ret = rdma_rw_ctx_init(&rsp->rw, cm_id->qp, cm_id->port_num, rsp->req.sg, rsp->req.sg_cnt, 0, addr, key, nvmet_data_dir(&rsp->req)); if (ret < 0) - return NVME_SC_INTERNAL; - rsp->req.transfer_len += len; + goto error_out; rsp->n_rdma += ret; if (invalidate) { @@ -590,6 +590,10 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp, } return 0; + +error_out: + rsp->req.transfer_len = 0; + return NVME_SC_INTERNAL; } static u16 nvmet_rdma_map_sgl(struct nvmet_rdma_rsp *rsp) From patchwork Mon Apr 23 23:30:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Logan Gunthorpe X-Patchwork-Id: 903236 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=deltatee.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40VN9H13NBz9rxp for ; Tue, 24 Apr 2018 09:34:59 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932678AbeDWXcU (ORCPT ); Mon, 23 Apr 2018 19:32:20 -0400 Received: from ale.deltatee.com ([207.54.116.67]:41948 "EHLO ale.deltatee.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932639AbeDWXbF (ORCPT ); Mon, 23 Apr 2018 19:31:05 -0400 Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fAkvC-0006JD-2b; Mon, 23 Apr 2018 17:31:02 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.89) (envelope-from ) id 1fAkv4-0005by-78; Mon, 23 Apr 2018 17:30:50 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, linux-nvdimm@lists.01.org, linux-block@vger.kernel.org Cc: Stephen Bates , Christoph Hellwig , Jens Axboe , Keith Busch , Sagi Grimberg , Bjorn Helgaas , Jason Gunthorpe , Max Gurtovoy , Dan Williams , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , Benjamin Herrenschmidt , Alex Williamson , =?utf-8?q?Christian_K=C3=B6nig?= , Logan Gunthorpe , Steve Wise Date: Mon, 23 Apr 2018 17:30:46 -0600 Message-Id: <20180423233046.21476-15-logang@deltatee.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180423233046.21476-1-logang@deltatee.com> References: <20180423233046.21476-1-logang@deltatee.com> MIME-Version: 1.0 X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, linux-block@vger.kernel.org, sbates@raithlin.com, hch@lst.de, axboe@kernel.dk, sagi@grimberg.me, bhelgaas@google.com, jgg@mellanox.com, maxg@mellanox.com, keith.busch@intel.com, dan.j.williams@intel.com, benh@kernel.crashing.org, jglisse@redhat.com, alex.williamson@redhat.com, christian.koenig@amd.com, logang@deltatee.com, swise@opengridcomputing.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on ale.deltatee.com X-Spam-Level: X-Spam-Status: No, score=-6.7 required=5.0 tests=ALL_TRUSTED,BAYES_00, MYRULES_FREE autolearn=no autolearn_force=no version=3.4.1 Subject: [PATCH v4 14/14] nvmet: Optionally use PCI P2P memory X-SA-Exim-Version: 4.2.1 (built Tue, 02 Aug 2016 21:08:31 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org We create a configfs attribute in each nvme-fabrics target port to enable p2p memory use. When enabled, the port will only then use the p2p memory if a p2p memory device can be found which is behind the same switch heirarchy as the RDMA port and all the block devices in use. If the user enabled it and no devices are found, then the system will silently fall back on using regular memory. If appropriate, that port will allocate memory for the RDMA buffers for queues from the p2pmem device falling back to system memory should anything fail. Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would save an extra PCI transfer as the NVME card could just take the data out of it's own memory. However, at this time, only a limited number of cards with CMB buffers seem to be available. Signed-off-by: Stephen Bates Signed-off-by: Steve Wise [hch: partial rewrite of the initial code] Signed-off-by: Christoph Hellwig Signed-off-by: Logan Gunthorpe --- drivers/nvme/target/configfs.c | 67 ++++++++++++++++++++++ drivers/nvme/target/core.c | 127 ++++++++++++++++++++++++++++++++++++++++- drivers/nvme/target/io-cmd.c | 3 + drivers/nvme/target/nvmet.h | 13 +++++ drivers/nvme/target/rdma.c | 2 + 5 files changed, 210 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c index ad9ff27234b5..5efe0dae0ee7 100644 --- a/drivers/nvme/target/configfs.c +++ b/drivers/nvme/target/configfs.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include "nvmet.h" @@ -864,12 +866,77 @@ static void nvmet_port_release(struct config_item *item) kfree(port); } +#ifdef CONFIG_PCI_P2PDMA +static ssize_t nvmet_p2pmem_show(struct config_item *item, char *page) +{ + struct nvmet_port *port = to_nvmet_port(item); + + if (!port->use_p2pmem) + return sprintf(page, "none\n"); + + if (!port->p2p_dev) + return sprintf(page, "auto\n"); + + return sprintf(page, "%s\n", pci_name(port->p2p_dev)); +} + +static ssize_t nvmet_p2pmem_store(struct config_item *item, + const char *page, size_t count) +{ + struct nvmet_port *port = to_nvmet_port(item); + struct device *dev; + struct pci_dev *p2p_dev = NULL; + bool use_p2pmem; + + dev = bus_find_device_by_name(&pci_bus_type, NULL, page); + if (dev) { + use_p2pmem = true; + p2p_dev = to_pci_dev(dev); + + if (!pci_has_p2pmem(p2p_dev)) { + pr_err("PCI device has no peer-to-peer memory: %s\n", + page); + pci_dev_put(p2p_dev); + return -ENODEV; + } + } else if (sysfs_streq(page, "auto")) { + use_p2pmem = 1; + } else if ((page[0] == '0' || page[0] == '1') && !iscntrl(page[1])) { + /* + * If the user enters a PCI device that doesn't exist + * like "0000:01:00.1", we don't want strtobool to think + * it's a '0' when it's clearly not what the user wanted. + * So we require 0's and 1's to be exactly one character. + */ + goto no_such_pci_device; + } else if (strtobool(page, &use_p2pmem)) { + goto no_such_pci_device; + } + + down_write(&nvmet_config_sem); + port->use_p2pmem = use_p2pmem; + pci_dev_put(port->p2p_dev); + port->p2p_dev = p2p_dev; + up_write(&nvmet_config_sem); + + return count; + +no_such_pci_device: + pr_err("No such PCI device: %s\n", page); + return -ENODEV; +} +CONFIGFS_ATTR(nvmet_, p2pmem); +#endif /* CONFIG_PCI_P2PDMA */ + static struct configfs_attribute *nvmet_port_attrs[] = { &nvmet_attr_addr_adrfam, &nvmet_attr_addr_treq, &nvmet_attr_addr_traddr, &nvmet_attr_addr_trsvcid, &nvmet_attr_addr_trtype, +#ifdef CONFIG_PCI_P2PDMA + &nvmet_attr_p2pmem, +#endif NULL, }; diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 75d44bc3e8d3..b2b62cd36f6c 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "nvmet.h" @@ -271,6 +272,25 @@ void nvmet_put_namespace(struct nvmet_ns *ns) percpu_ref_put(&ns->ref); } +static int nvmet_p2pdma_add_client(struct nvmet_ctrl *ctrl, + struct nvmet_ns *ns) +{ + int ret; + + if (!blk_queue_pci_p2pdma(ns->bdev->bd_queue)) { + pr_err("peer-to-peer DMA is not supported by %s\n", + ns->device_path); + return -EINVAL; + } + + ret = pci_p2pdma_add_client(&ctrl->p2p_clients, nvmet_ns_dev(ns)); + if (ret) + pr_err("failed to add peer-to-peer DMA client %s: %d\n", + ns->device_path, ret); + + return ret; +} + int nvmet_ns_enable(struct nvmet_ns *ns) { struct nvmet_subsys *subsys = ns->subsys; @@ -299,6 +319,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns) if (ret) goto out_blkdev_put; + list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) { + if (ctrl->p2p_dev) { + ret = nvmet_p2pdma_add_client(ctrl, ns); + if (ret) + goto out_remove_clients; + } + } + if (ns->nsid > subsys->max_nsid) subsys->max_nsid = ns->nsid; @@ -328,6 +356,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns) out_unlock: mutex_unlock(&subsys->lock); return ret; +out_remove_clients: + list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) + pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns)); out_blkdev_put: blkdev_put(ns->bdev, FMODE_WRITE|FMODE_READ); ns->bdev = NULL; @@ -363,8 +394,10 @@ void nvmet_ns_disable(struct nvmet_ns *ns) percpu_ref_exit(&ns->ref); mutex_lock(&subsys->lock); - list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) + list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) { + pci_p2pdma_remove_client(&ctrl->p2p_clients, nvmet_ns_dev(ns)); nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0); + } if (ns->bdev) blkdev_put(ns->bdev, FMODE_WRITE|FMODE_READ); @@ -577,6 +610,21 @@ EXPORT_SYMBOL_GPL(nvmet_req_execute); int nvmet_req_alloc_sgl(struct nvmet_req *req, struct nvmet_sq *sq) { + struct pci_dev *p2p_dev = NULL; + + if (sq->ctrl) + p2p_dev = sq->ctrl->p2p_dev; + + req->p2p_dev = NULL; + if (sq->qid && p2p_dev) { + req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->sg_cnt, + req->transfer_len); + if (req->sg) { + req->p2p_dev = p2p_dev; + return 0; + } + } + req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt); if (!req->sg) return -ENOMEM; @@ -587,7 +635,11 @@ EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl); void nvmet_req_free_sgl(struct nvmet_req *req) { - sgl_free(req->sg); + if (req->p2p_dev) + pci_p2pmem_free_sgl(req->p2p_dev, req->sg); + else + sgl_free(req->sg); + req->sg = NULL; req->sg_cnt = 0; } @@ -782,6 +834,74 @@ bool nvmet_host_allowed(struct nvmet_req *req, struct nvmet_subsys *subsys, return __nvmet_host_allowed(subsys, hostnqn); } +/* + * If allow_p2pmem is set, we will try to use P2P memory for the SGL lists for + * Ι/O commands. This requires the PCI p2p device to be compatible with the + * backing device for every namespace on this controller. + */ +static void nvmet_setup_p2pmem(struct nvmet_ctrl *ctrl, struct nvmet_req *req) +{ + struct nvmet_ns *ns; + int ret; + + if (!req->port->use_p2pmem || !req->p2p_client) + return; + + mutex_lock(&ctrl->subsys->lock); + + ret = pci_p2pdma_add_client(&ctrl->p2p_clients, req->p2p_client); + if (ret) { + pr_err("failed adding peer-to-peer DMA client %s: %d\n", + dev_name(req->p2p_client), ret); + goto free_devices; + } + + list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) { + ret = nvmet_p2pdma_add_client(ctrl, ns); + if (ret) + goto free_devices; + } + + if (req->port->p2p_dev) { + if (!pci_p2pdma_assign_provider(req->port->p2p_dev, + &ctrl->p2p_clients)) { + pr_info("peer-to-peer memory on %s is not supported\n", + pci_name(req->port->p2p_dev)); + goto free_devices; + } + ctrl->p2p_dev = pci_dev_get(req->port->p2p_dev); + } else { + ctrl->p2p_dev = pci_p2pmem_find(&ctrl->p2p_clients); + if (!ctrl->p2p_dev) { + pr_info("no supported peer-to-peer memory devices found\n"); + goto free_devices; + } + } + + mutex_unlock(&ctrl->subsys->lock); + + pr_info("using peer-to-peer memory on %s\n", pci_name(ctrl->p2p_dev)); + return; + +free_devices: + pci_p2pdma_client_list_free(&ctrl->p2p_clients); + mutex_unlock(&ctrl->subsys->lock); +} + +static void nvmet_release_p2pmem(struct nvmet_ctrl *ctrl) +{ + if (!ctrl->p2p_dev) + return; + + mutex_lock(&ctrl->subsys->lock); + + pci_p2pdma_client_list_free(&ctrl->p2p_clients); + pci_dev_put(ctrl->p2p_dev); + ctrl->p2p_dev = NULL; + + mutex_unlock(&ctrl->subsys->lock); +} + u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp) { @@ -821,6 +941,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, INIT_WORK(&ctrl->async_event_work, nvmet_async_event_work); INIT_LIST_HEAD(&ctrl->async_events); + INIT_LIST_HEAD(&ctrl->p2p_clients); memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE); memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE); @@ -876,6 +997,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, ctrl->kato = DIV_ROUND_UP(kato, 1000); } nvmet_start_keep_alive_timer(ctrl); + nvmet_setup_p2pmem(ctrl, req); mutex_lock(&subsys->lock); list_add_tail(&ctrl->subsys_entry, &subsys->ctrls); @@ -912,6 +1034,7 @@ static void nvmet_ctrl_free(struct kref *ref) flush_work(&ctrl->async_event_work); cancel_work_sync(&ctrl->fatal_err_work); + nvmet_release_p2pmem(ctrl); ida_simple_remove(&cntlid_ida, ctrl->cntlid); kfree(ctrl->sqs); diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c index cd2344179673..39bd37f1f312 100644 --- a/drivers/nvme/target/io-cmd.c +++ b/drivers/nvme/target/io-cmd.c @@ -56,6 +56,9 @@ static void nvmet_execute_rw(struct nvmet_req *req) op = REQ_OP_READ; } + if (is_pci_p2pdma_page(sg_page(req->sg))) + op_flags |= REQ_PCI_P2PDMA; + sector = le64_to_cpu(req->cmd->rw.slba); sector <<= (req->ns->blksize_shift - 9); diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 10b162615a5e..f192fefe61d9 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -64,6 +64,11 @@ static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item) return container_of(to_config_group(item), struct nvmet_ns, group); } +static inline struct device *nvmet_ns_dev(struct nvmet_ns *ns) +{ + return disk_to_dev(ns->bdev->bd_disk); +} + struct nvmet_cq { u16 qid; u16 size; @@ -98,6 +103,8 @@ struct nvmet_port { struct list_head referrals; void *priv; bool enabled; + bool use_p2pmem; + struct pci_dev *p2p_dev; }; static inline struct nvmet_port *to_nvmet_port(struct config_item *item) @@ -132,6 +139,9 @@ struct nvmet_ctrl { const struct nvmet_fabrics_ops *ops; + struct pci_dev *p2p_dev; + struct list_head p2p_clients; + char subsysnqn[NVMF_NQN_FIELD_LEN]; char hostnqn[NVMF_NQN_FIELD_LEN]; }; @@ -234,6 +244,9 @@ struct nvmet_req { void (*execute)(struct nvmet_req *req); const struct nvmet_fabrics_ops *ops; + + struct pci_dev *p2p_dev; + struct device *p2p_client; }; static inline void nvmet_set_status(struct nvmet_req *req, u16 status) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index f7a3459d618f..27a6d8ea1b56 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -661,6 +661,8 @@ static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue, cmd->send_sge.addr, cmd->send_sge.length, DMA_TO_DEVICE); + cmd->req.p2p_client = &queue->dev->device->dev; + if (!nvmet_req_init(&cmd->req, &queue->nvme_cq, &queue->nvme_sq, &nvmet_rdma_ops)) return;