From patchwork Tue Jan 29 17:47:24 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 1032929
X-Patchwork-Delegate: bhelgaas@google.com
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Logan Gunthorpe,
 Greg Kroah-Hartman, "Rafael J. Wysocki", Bjorn Helgaas, Christian Koenig,
 Felix Kuehling, Jason Gunthorpe, linux-pci@vger.kernel.org,
 dri-devel@lists.freedesktop.org, Christoph Hellwig, Marek Szyprowski,
 Robin Murphy, Joerg Roedel, iommu@lists.linux-foundation.org
Subject: [RFC PATCH 1/5] pci/p2p: add a function to test peer to peer capability
Date: Tue, 29 Jan 2019 12:47:24 -0500
Message-Id: <20190129174728.6430-2-jglisse@redhat.com>
In-Reply-To: <20190129174728.6430-1-jglisse@redhat.com>
References: <20190129174728.6430-1-jglisse@redhat.com>

From: Jérôme Glisse

pci_test_p2p() returns true if two devices can do peer to peer to each
other. We add a generic function because different interconnects can
support peer to peer and we want to test for it generically, no matter
what the interconnect might be. However, this version only supports PCIe
for now.

Signed-off-by: Jérôme Glisse
Cc: Logan Gunthorpe
Cc: Greg Kroah-Hartman
Cc: Rafael J. Wysocki
Cc: Bjorn Helgaas
Cc: Christian Koenig
Cc: Felix Kuehling
Cc: Jason Gunthorpe
Cc: linux-kernel@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: Joerg Roedel
Cc: iommu@lists.linux-foundation.org
---
 drivers/pci/p2pdma.c       | 27 +++++++++++++++++++++++++++
 include/linux/pci-p2pdma.h |  6 ++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index c52298d76e64..620ac60babb5 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -797,3 +797,30 @@ ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
 	return sprintf(page, "%s\n", pci_name(p2p_dev));
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_enable_show);
+
+bool pci_test_p2p(struct device *devA, struct device *devB)
+{
+	struct pci_dev *pciA, *pciB;
+	bool ret;
+	int tmp;
+
+	/*
+	 * For now we only support PCIe peer to peer, but other interconnects
+	 * can be added.
+	 */
+	pciA = find_parent_pci_dev(devA);
+	pciB = find_parent_pci_dev(devB);
+	if (pciA == NULL || pciB == NULL) {
+		ret = false;
+		goto out;
+	}
+
+	tmp = upstream_bridge_distance(pciA, pciB, NULL);
+	ret = tmp < 0 ? false : true;
+
+out:
+	pci_dev_put(pciB);
+	pci_dev_put(pciA);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pci_test_p2p);
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index bca9bc3e5be7..7671cc499a08 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -36,6 +36,7 @@ int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
 			    bool *use_p2pdma);
 ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
 			       bool use_p2pdma);
+bool pci_test_p2p(struct device *devA, struct device *devB);
 #else /* CONFIG_PCI_P2PDMA */
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
 		size_t size, u64 offset)
@@ -97,6 +98,11 @@ static inline ssize_t pci_p2pdma_enable_show(char *page,
 {
 	return sprintf(page, "none\n");
 }
+
+static inline bool pci_test_p2p(struct device *devA, struct device *devB)
+{
+	return false;
+}
 #endif /* CONFIG_PCI_P2PDMA */
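[Illustrative sketch, not part of the original patch: a minimal example of how a driver
might consult pci_test_p2p() before attempting a peer to peer mapping. The helper name
and fallback policy are hypothetical; only pci_test_p2p() comes from this patch.]

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/pci-p2pdma.h>

/* Hypothetical helper: returns 0 when a P2P path is worth attempting. */
static int my_check_p2p_path(struct device *importer, struct device *exporter)
{
	if (!pci_test_p2p(importer, exporter)) {
		/*
		 * No usable path through the PCIe fabric (or a non-PCIe
		 * interconnect): the caller should bounce the data through
		 * system memory instead.
		 */
		return -ENXIO;
	}

	/*
	 * Both devices sit behind a usable upstream bridge; a peer to peer
	 * mapping of the exporter's BAR memory can be attempted here.
	 */
	return 0;
}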
From patchwork Tue Jan 29 17:47:25 2019
X-Patchwork-Submitter: Jerome Glisse
X-Patchwork-Id: 1032928
X-Patchwork-Delegate: bhelgaas@google.com
From: jglisse@redhat.com
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Jérôme Glisse, Logan Gunthorpe,
 Greg Kroah-Hartman, "Rafael J. Wysocki", Bjorn Helgaas, Christian Koenig,
 Felix Kuehling, Jason Gunthorpe, linux-pci@vger.kernel.org,
 dri-devel@lists.freedesktop.org, Christoph Hellwig, Marek Szyprowski,
 Robin Murphy, Joerg Roedel, iommu@lists.linux-foundation.org
Subject: [RFC PATCH 2/5] drivers/base: add a function to test peer to peer capability
Date: Tue, 29 Jan 2019 12:47:25 -0500
Message-Id: <20190129174728.6430-3-jglisse@redhat.com>
In-Reply-To: <20190129174728.6430-1-jglisse@redhat.com>
References: <20190129174728.6430-1-jglisse@redhat.com>

From: Jérôme Glisse

device_test_p2p() returns true if two devices can do peer to peer to each
other. We add a generic function because different interconnects can
support peer to peer and we want to test for it generically, no matter
what the interconnect might be. However, this version only supports PCIe
for now.

Signed-off-by: Jérôme Glisse
Cc: Logan Gunthorpe
Cc: Greg Kroah-Hartman
Cc: Rafael J. Wysocki
Cc: Bjorn Helgaas
Cc: Christian Koenig
Cc: Felix Kuehling
Cc: Jason Gunthorpe
Cc: linux-kernel@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: Joerg Roedel
Cc: iommu@lists.linux-foundation.org
---
 drivers/base/core.c    | 20 ++++++++++++++++++++
 include/linux/device.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 0073b09bb99f..56023b00e108 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include <linux/pci-p2pdma.h>
 #include "base.h"
 #include "power/power.h"
@@ -3167,3 +3168,22 @@ void device_set_of_node_from_dev(struct device *dev, const struct device *dev2)
 	dev->of_node_reused = true;
 }
 EXPORT_SYMBOL_GPL(device_set_of_node_from_dev);
+
+/**
+ * device_test_p2p - test if two devices can peer to peer to each other
+ * @devA: device A
+ * @devB: device B
+ * Returns: true if the devices can peer to peer to each other, false otherwise
+ */
+bool device_test_p2p(struct device *devA, struct device *devB)
+{
+	/*
+	 * For now we only support PCIe peer to peer, but other interconnects
+	 * can be added.
+ */ + if (pci_test_p2p(devA, devB)) + return true; + + return false; +} +EXPORT_SYMBOL_GPL(device_test_p2p); diff --git a/include/linux/device.h b/include/linux/device.h index 6cb4640b6160..0d532d7f0779 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1250,6 +1250,7 @@ extern int device_online(struct device *dev); extern void set_primary_fwnode(struct device *dev, struct fwnode_handle *fwnode); extern void set_secondary_fwnode(struct device *dev, struct fwnode_handle *fwnode); void device_set_of_node_from_dev(struct device *dev, const struct device *dev2); +bool device_test_p2p(struct device *devA, struct device *devB); static inline int dev_num_vf(struct device *dev) { From patchwork Tue Jan 29 17:47:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 1032925 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43pv9520tyz9sBZ for ; Wed, 30 Jan 2019 04:47:53 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729119AbfA2Rrr (ORCPT ); Tue, 29 Jan 2019 12:47:47 -0500 Received: from mx1.redhat.com ([209.132.183.28]:23840 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728914AbfA2Rrq (ORCPT ); Tue, 29 Jan 2019 12:47:46 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 92F9FC073D6F; Tue, 29 Jan 2019 17:47:45 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-122-2.rdu2.redhat.com [10.10.122.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 93F8318A75; Tue, 29 Jan 2019 17:47:42 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Logan Gunthorpe , Greg Kroah-Hartman , "Rafael J . Wysocki" , Bjorn Helgaas , Christian Koenig , Felix Kuehling , Jason Gunthorpe , linux-pci@vger.kernel.org, dri-devel@lists.freedesktop.org, Christoph Hellwig , Marek Szyprowski , Robin Murphy , Joerg Roedel , iommu@lists.linux-foundation.org Subject: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma Date: Tue, 29 Jan 2019 12:47:26 -0500 Message-Id: <20190129174728.6430-4-jglisse@redhat.com> In-Reply-To: <20190129174728.6430-1-jglisse@redhat.com> References: <20190129174728.6430-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 29 Jan 2019 17:47:46 +0000 (UTC) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Jérôme Glisse Allow mmap of device file to export device memory to peer to peer devices. This will allow for instance a network device to access a GPU memory or to access a storage device queue directly. 
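[Illustrative sketch, not part of the original patch: a minimal example, from the
importing driver's side, of how the p2p_map()/p2p_unmap() callbacks introduced further
down in this patch are intended to be used. The helper name and error handling are
hypothetical; the callback signatures are the ones added to vm_operations_struct below.]

#include <linux/mm.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>

/*
 * Hypothetical importer helper: ask the exporting driver that owns @vma to
 * map [start, end) for DMA by @importer, filling @pas with bus addresses.
 */
static long my_p2p_map_range(struct vm_area_struct *vma, struct device *importer,
			     unsigned long start, unsigned long end,
			     dma_addr_t *pas, bool write)
{
	if (!vma->vm_ops || !vma->vm_ops->p2p_map || !vma->vm_ops->p2p_unmap)
		return -EINVAL;

	/*
	 * The exporter decides whether the mapping is possible. Per the rules
	 * documented on the callbacks, success here means later mappings of
	 * the same range are expected to succeed while the vma stays valid,
	 * and the importer must call p2p_unmap() whenever an mmu_notifier
	 * invalidation covers the range.
	 */
	return vma->vm_ops->p2p_map(vma, importer, start, end, pas, write);
}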
The common case will be a vma created by a userspace device driver that is
then shared with another userspace device driver, which calls into its
kernel device driver to map that vma. The vma does not need to have any
valid CPU mapping, so that only peer to peer devices may access its
content. It can also have a valid CPU mapping, in which case it should
point to the same memory for consistency. Note that peer to peer mapping
is highly platform and device dependent and might not work in all cases.
However we do expect support for this to grow on more hardware platforms.

This patch only adds the new callbacks to vm_operations_struct; the bulk
of the code lives within the common bus driver (like pci) and the device
drivers (both the exporting and the importing device).

The current design mandates that the importer obey mmu_notifier and
invalidate any peer to peer mapping anytime an invalidation notification
happens for a range that has been peer to peer mapped. This allows the
exporting device to easily invalidate mappings for any importing device.

Signed-off-by: Jérôme Glisse
Cc: Logan Gunthorpe
Cc: Greg Kroah-Hartman
Cc: Rafael J. Wysocki
Cc: Bjorn Helgaas
Cc: Christian Koenig
Cc: Felix Kuehling
Cc: Jason Gunthorpe
Cc: linux-kernel@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: Joerg Roedel
Cc: iommu@lists.linux-foundation.org
---
 include/linux/mm.h | 38 ++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb6408fe73..1bd60a90e575 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -429,6 +429,44 @@ struct vm_operations_struct {
 					pgoff_t start_pgoff, pgoff_t end_pgoff);
 	unsigned long (*pagesize)(struct vm_area_struct * area);
+	/*
+	 * Optional for device drivers that want to allow peer to peer (p2p)
+	 * mapping of their vma (which can be backed by some device memory) to
+	 * another device.
+	 *
+	 * Note that the exporting device driver might not have mapped anything
+	 * inside the vma for the CPU, but might still want to allow a peer
+	 * device to access the range of memory corresponding to a range in
+	 * that vma.
+	 *
+	 * FOR PREDICTABILITY, IF A DRIVER SUCCESSFULLY MAPS A RANGE ONCE FOR A
+	 * DEVICE, THEN FURTHER MAPPINGS OF THE SAME RANGE (WHILE THE VMA IS
+	 * STILL VALID) SHOULD ALSO BE SUCCESSFUL. Following this rule allows
+	 * the importing device to map once during setup and report any failure
+	 * at that time to userspace. Further mappings of the same range might
+	 * happen after mmu notifier invalidation over the range. The exporting
+	 * device can use this to move things around (defrag BAR space for
+	 * instance) or do other similar tasks.
+	 *
+	 * THE IMPORTER MUST OBEY mmu_notifier NOTIFICATIONS AND CALL
+	 * p2p_unmap() WHEN A NOTIFIER IS CALLED FOR THE RANGE! THIS CAN HAPPEN
+	 * AT ANY POINT IN TIME WITH NO LOCK HELD.
+	 *
+	 * In the functions below, the device argument is the importing device;
+	 * the exporting device is the device to which the vma belongs.
+ */ + long (*p2p_map)(struct vm_area_struct *vma, + struct device *device, + unsigned long start, + unsigned long end, + dma_addr_t *pa, + bool write); + long (*p2p_unmap)(struct vm_area_struct *vma, + struct device *device, + unsigned long start, + unsigned long end, + dma_addr_t *pa); + /* notification that a previously read-only page is about to become * writable, if an error is returned it will cause a SIGBUS */ vm_fault_t (*page_mkwrite)(struct vm_fault *vmf); From patchwork Tue Jan 29 17:47:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 1032927 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43pv9J5v8Xz9s1l for ; Wed, 30 Jan 2019 04:48:04 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729199AbfA2Rr5 (ORCPT ); Tue, 29 Jan 2019 12:47:57 -0500 Received: from mx1.redhat.com ([209.132.183.28]:44506 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728914AbfA2Rry (ORCPT ); Tue, 29 Jan 2019 12:47:54 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C7A34C079C49; Tue, 29 Jan 2019 17:47:47 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-122-2.rdu2.redhat.com [10.10.122.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id B23F35D97A; Tue, 29 Jan 2019 17:47:45 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Logan Gunthorpe , Greg Kroah-Hartman , "Rafael J . Wysocki" , Bjorn Helgaas , Christian Koenig , Felix Kuehling , Jason Gunthorpe , linux-pci@vger.kernel.org, dri-devel@lists.freedesktop.org, Christoph Hellwig , Marek Szyprowski , Robin Murphy , Joerg Roedel , iommu@lists.linux-foundation.org Subject: [RFC PATCH 4/5] mm/hmm: add support for peer to peer to HMM device memory Date: Tue, 29 Jan 2019 12:47:27 -0500 Message-Id: <20190129174728.6430-5-jglisse@redhat.com> In-Reply-To: <20190129174728.6430-1-jglisse@redhat.com> References: <20190129174728.6430-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 29 Jan 2019 17:47:53 +0000 (UTC) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Jérôme Glisse Signed-off-by: Jérôme Glisse Cc: Logan Gunthorpe Cc: Greg Kroah-Hartman Cc: Rafael J. 
Wysocki Cc: Bjorn Helgaas Cc: Christian Koenig Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: linux-pci@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Robin Murphy Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org --- include/linux/hmm.h | 47 +++++++++++++++++++++++++++++++++ mm/hmm.c | 63 +++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 105 insertions(+), 5 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 4a1454e3efba..7a3ac182cc48 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -710,6 +710,53 @@ struct hmm_devmem_ops { const struct page *page, unsigned int flags, pmd_t *pmdp); + + /* + * p2p_map() - map page for peer to peer between device + * @devmem: device memory structure (see struct hmm_devmem) + * @range: range of virtual address that is being mapped + * @device: device the range is being map to + * @addr: first virtual address in the range to consider + * @pa: device address (where actual mapping is store) + * Returns: number of page successfuly mapped, 0 otherwise + * + * Map page belonging to devmem to another device for peer to peer + * access. Device can decide not to map in which case memory will + * be migrated to main memory. + * + * Also there is no garantee that all the pages in the range does + * belongs to the devmem so it is up to the function to check that + * every single page does belong to devmem. + * + * Note for now we do not care about error exect error, so on failure + * function should just return 0. + */ + long (*p2p_map)(struct hmm_devmem *devmem, + struct hmm_range *range, + struct device *device, + unsigned long addr, + dma_addr_t *pas); + + /* + * p2p_unmap() - unmap page from peer to peer between device + * @devmem: device memory structure (see struct hmm_devmem) + * @range: range of virtual address that is being mapped + * @device: device the range is being map to + * @addr: first virtual address in the range to consider + * @pa: device address (where actual mapping is store) + * Returns: number of page successfuly unmapped, 0 otherwise + * + * Unmap page belonging to devmem previously map with p2p_map(). + * + * Note there is no garantee that all the pages in the range does + * belongs to the devmem so it is up to the function to check that + * every single page does belong to devmem. + */ + unsigned long (*p2p_unmap)(struct hmm_devmem *devmem, + struct hmm_range *range, + struct device *device, + unsigned long addr, + dma_addr_t *pas); }; /* diff --git a/mm/hmm.c b/mm/hmm.c index 1a444885404e..fd49b1e116d0 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -1193,16 +1193,19 @@ long hmm_range_dma_map(struct hmm_range *range, dma_addr_t *daddrs, bool block) { - unsigned long i, npages, mapped, page_size; + unsigned long i, npages, mapped, page_size, addr; long ret; +again: ret = hmm_range_fault(range, block); if (ret <= 0) return ret ? ret : -EBUSY; + mapped = 0; + addr = range->start; page_size = hmm_range_page_size(range); npages = (range->end - range->start) >> range->page_shift; - for (i = 0, mapped = 0; i < npages; ++i) { + for (i = 0; i < npages; ++i, addr += page_size) { enum dma_data_direction dir = DMA_FROM_DEVICE; struct page *page; @@ -1226,6 +1229,29 @@ long hmm_range_dma_map(struct hmm_range *range, goto unmap; } + if (is_device_private_page(page)) { + struct hmm_devmem *devmem = page->pgmap->data; + + if (!devmem->ops->p2p_map || !devmem->ops->p2p_unmap) { + /* Fall-back to main memory. 
*/ + range->default_flags |= + range->flags[HMM_PFN_DEVICE_PRIVATE]; + goto again; + } + + ret = devmem->ops->p2p_map(devmem, range, device, + addr, daddrs); + if (ret <= 0) { + /* Fall-back to main memory. */ + range->default_flags |= + range->flags[HMM_PFN_DEVICE_PRIVATE]; + goto again; + } + mapped += ret; + i += ret; + continue; + } + /* If it is read and write than map bi-directional. */ if (range->pfns[i] & range->values[HMM_PFN_WRITE]) dir = DMA_BIDIRECTIONAL; @@ -1242,7 +1268,9 @@ long hmm_range_dma_map(struct hmm_range *range, return mapped; unmap: - for (npages = i, i = 0; (i < npages) && mapped; ++i) { + npages = i; + addr = range->start; + for (i = 0; (i < npages) && mapped; ++i, addr += page_size) { enum dma_data_direction dir = DMA_FROM_DEVICE; struct page *page; @@ -1253,6 +1281,18 @@ long hmm_range_dma_map(struct hmm_range *range, if (dma_mapping_error(device, daddrs[i])) continue; + if (is_device_private_page(page)) { + struct hmm_devmem *devmem = page->pgmap->data; + unsigned long inc; + + inc = devmem->ops->p2p_unmap(devmem, range, device, + addr, &daddrs[i]); + BUG_ON(inc > npages); + mapped += inc; + i += inc; + continue; + } + /* If it is read and write than map bi-directional. */ if (range->pfns[i] & range->values[HMM_PFN_WRITE]) dir = DMA_BIDIRECTIONAL; @@ -1285,7 +1325,7 @@ long hmm_range_dma_unmap(struct hmm_range *range, dma_addr_t *daddrs, bool dirty) { - unsigned long i, npages, page_size; + unsigned long i, npages, page_size, addr; long cpages = 0; /* Sanity check. */ @@ -1298,7 +1338,7 @@ long hmm_range_dma_unmap(struct hmm_range *range, page_size = hmm_range_page_size(range); npages = (range->end - range->start) >> range->page_shift; - for (i = 0; i < npages; ++i) { + for (i = 0, addr = range->start; i < npages; ++i, addr += page_size) { enum dma_data_direction dir = DMA_FROM_DEVICE; struct page *page; @@ -1318,6 +1358,19 @@ long hmm_range_dma_unmap(struct hmm_range *range, set_page_dirty(page); } + if (is_device_private_page(page)) { + struct hmm_devmem *devmem = page->pgmap->data; + unsigned long ret; + + BUG_ON(!devmem->ops->p2p_unmap); + + ret = devmem->ops->p2p_unmap(devmem, range, device, + addr, &daddrs[i]); + BUG_ON(ret > npages); + i += ret; + continue; + } + /* Unmap and clear pfns/dma address */ dma_unmap_page(device, daddrs[i], page_size, dir); range->pfns[i] = range->values[HMM_PFN_NONE]; From patchwork Tue Jan 29 17:47:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerome Glisse X-Patchwork-Id: 1032926 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-pci-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43pv996qsXz9s1l for ; Wed, 30 Jan 2019 04:47:57 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729163AbfA2Rrw (ORCPT ); Tue, 29 Jan 2019 12:47:52 -0500 Received: from mx1.redhat.com ([209.132.183.28]:60272 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728914AbfA2Rrv (ORCPT ); Tue, 29 Jan 2019 12:47:51 -0500 Received: from smtp.corp.redhat.com 
(int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5443A7AE81; Tue, 29 Jan 2019 17:47:50 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-122-2.rdu2.redhat.com [10.10.122.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id D994D5D97E; Tue, 29 Jan 2019 17:47:47 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= , Logan Gunthorpe , Greg Kroah-Hartman , "Rafael J . Wysocki" , Bjorn Helgaas , Christian Koenig , Felix Kuehling , Jason Gunthorpe , linux-pci@vger.kernel.org, dri-devel@lists.freedesktop.org, Christoph Hellwig , Marek Szyprowski , Robin Murphy , Joerg Roedel , iommu@lists.linux-foundation.org Subject: [RFC PATCH 5/5] mm/hmm: add support for peer to peer to special device vma Date: Tue, 29 Jan 2019 12:47:28 -0500 Message-Id: <20190129174728.6430-6-jglisse@redhat.com> In-Reply-To: <20190129174728.6430-1-jglisse@redhat.com> References: <20190129174728.6430-1-jglisse@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Tue, 29 Jan 2019 17:47:51 +0000 (UTC) Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org From: Jérôme Glisse Special device vma (mmap of a device file) can correspond to device driver object that some device driver might want to share with other device (giving access to). This add support for HMM to map those special device vma if the owning device (exporter) allows it. Signed-off-by: Jérôme Glisse Cc: Logan Gunthorpe Cc: Greg Kroah-Hartman Cc: Rafael J. Wysocki Cc: Bjorn Helgaas Cc: Christian Koenig Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: linux-pci@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Robin Murphy Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org --- include/linux/hmm.h | 6 ++ mm/hmm.c | 156 ++++++++++++++++++++++++++++++++++---------- 2 files changed, 128 insertions(+), 34 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 7a3ac182cc48..98ebe9f52432 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -137,6 +137,7 @@ enum hmm_pfn_flag_e { * result of vmf_insert_pfn() or vm_insert_page(). Therefore, it should not * be mirrored by a device, because the entry will never have HMM_PFN_VALID * set and the pfn value is undefined. + * HMM_PFN_P2P: this entry have been map as P2P ie the dma address is valid * * Driver provide entry value for none entry, error entry and special entry, * driver can alias (ie use same value for error and special for instance). 
It @@ -151,6 +152,7 @@ enum hmm_pfn_value_e { HMM_PFN_ERROR, HMM_PFN_NONE, HMM_PFN_SPECIAL, + HMM_PFN_P2P, HMM_PFN_VALUE_MAX }; @@ -250,6 +252,8 @@ static inline bool hmm_range_valid(struct hmm_range *range) static inline struct page *hmm_pfn_to_page(const struct hmm_range *range, uint64_t pfn) { + if (pfn == range->values[HMM_PFN_P2P]) + return NULL; if (pfn == range->values[HMM_PFN_NONE]) return NULL; if (pfn == range->values[HMM_PFN_ERROR]) @@ -270,6 +274,8 @@ static inline struct page *hmm_pfn_to_page(const struct hmm_range *range, static inline unsigned long hmm_pfn_to_pfn(const struct hmm_range *range, uint64_t pfn) { + if (pfn == range->values[HMM_PFN_P2P]) + return -1UL; if (pfn == range->values[HMM_PFN_NONE]) return -1UL; if (pfn == range->values[HMM_PFN_ERROR]) diff --git a/mm/hmm.c b/mm/hmm.c index fd49b1e116d0..621a4f831483 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -1058,37 +1058,36 @@ long hmm_range_snapshot(struct hmm_range *range) } EXPORT_SYMBOL(hmm_range_snapshot); -/* - * hmm_range_fault() - try to fault some address in a virtual address range - * @range: range being faulted - * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem) - * Returns: 0 on success ortherwise: - * -EINVAL: - * Invalid argument - * -ENOMEM: - * Out of memory. - * -EPERM: - * Invalid permission (for instance asking for write and range - * is read only). - * -EAGAIN: - * If you need to retry and mmap_sem was drop. This can only - * happens if block argument is false. - * -EBUSY: - * If the the range is being invalidated and you should wait for - * invalidation to finish. - * -EFAULT: - * Invalid (ie either no valid vma or it is illegal to access that - * range), number of valid pages in range->pfns[] (from range start - * address). - * - * This is similar to a regular CPU page fault except that it will not trigger - * any memory migration if the memory being faulted is not accessible by CPUs - * and caller does not ask for migration. - * - * On error, for one virtual address in the range, the function will mark the - * corresponding HMM pfn entry with an error flag. - */ -long hmm_range_fault(struct hmm_range *range, bool block) +static int hmm_vma_p2p_map(struct hmm_range *range, struct vm_area_struct *vma, + unsigned long start, unsigned long end, + struct device *device, dma_addr_t *pas) +{ + struct hmm_vma_walk hmm_vma_walk; + unsigned long npages, i; + bool fault, write; + uint64_t *pfns; + int ret; + + i = (start - range->start) >> PAGE_SHIFT; + npages = (end - start) >> PAGE_SHIFT; + pfns = &range->pfns[i]; + pas = &pas[i]; + + hmm_vma_walk.range = range; + hmm_vma_walk.fault = true; + hmm_range_need_fault(&hmm_vma_walk, pfns, npages, + 0, &fault, &write); + + ret = vma->vm_ops->p2p_map(vma, device, start, end, pas, write); + for (i = 0; i < npages; ++i) { + pfns[i] = ret ? 
range->values[HMM_PFN_ERROR] : + range->values[HMM_PFN_P2P]; + } + return ret; +} + +static long _hmm_range_fault(struct hmm_range *range, bool block, + struct device *device, dma_addr_t *pas) { const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP; unsigned long start = range->start, end; @@ -1110,9 +1109,22 @@ long hmm_range_fault(struct hmm_range *range, bool block) } vma = find_vma(hmm->mm, start); - if (vma == NULL || (vma->vm_flags & device_vma)) + if (vma == NULL) return -EFAULT; + end = min(range->end, vma->vm_end); + if (vma->vm_flags & device_vma) { + if (!device || !pas || !vma->vm_ops->p2p_map) + return -EFAULT; + + ret = hmm_vma_p2p_map(range, vma, start, + end, device, pas); + if (ret) + return ret; + start = end; + continue; + } + if (is_vm_hugetlb_page(vma)) { struct hstate *h = hstate_vma(vma); @@ -1142,7 +1154,6 @@ long hmm_range_fault(struct hmm_range *range, bool block) hmm_vma_walk.block = block; hmm_vma_walk.range = range; mm_walk.private = &hmm_vma_walk; - end = min(range->end, vma->vm_end); mm_walk.vma = vma; mm_walk.mm = vma->vm_mm; @@ -1175,6 +1186,41 @@ long hmm_range_fault(struct hmm_range *range, bool block) return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT; } + +/* + * hmm_range_fault() - try to fault some address in a virtual address range + * @range: range being faulted + * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem) + * Returns: 0 on success ortherwise: + * -EINVAL: + * Invalid argument + * -ENOMEM: + * Out of memory. + * -EPERM: + * Invalid permission (for instance asking for write and range + * is read only). + * -EAGAIN: + * If you need to retry and mmap_sem was drop. This can only + * happens if block argument is false. + * -EBUSY: + * If the the range is being invalidated and you should wait for + * invalidation to finish. + * -EFAULT: + * Invalid (ie either no valid vma or it is illegal to access that + * range), number of valid pages in range->pfns[] (from range start + * address). + * + * This is similar to a regular CPU page fault except that it will not trigger + * any memory migration if the memory being faulted is not accessible by CPUs + * and caller does not ask for migration. + * + * On error, for one virtual address in the range, the function will mark the + * corresponding HMM pfn entry with an error flag. + */ +long hmm_range_fault(struct hmm_range *range, bool block) +{ + return _hmm_range_fault(range, block, NULL, NULL); +} EXPORT_SYMBOL(hmm_range_fault); /* @@ -1197,7 +1243,7 @@ long hmm_range_dma_map(struct hmm_range *range, long ret; again: - ret = hmm_range_fault(range, block); + ret = _hmm_range_fault(range, block, device, daddrs); if (ret <= 0) return ret ? ret : -EBUSY; @@ -1209,6 +1255,11 @@ long hmm_range_dma_map(struct hmm_range *range, enum dma_data_direction dir = DMA_FROM_DEVICE; struct page *page; + if (range->pfns[i] == range->values[HMM_PFN_P2P]) { + mapped++; + continue; + } + /* * FIXME need to update DMA API to provide invalid DMA address * value instead of a function to test dma address value. 
This @@ -1274,6 +1325,11 @@ long hmm_range_dma_map(struct hmm_range *range, enum dma_data_direction dir = DMA_FROM_DEVICE; struct page *page; + if (range->pfns[i] == range->values[HMM_PFN_P2P]) { + mapped--; + continue; + } + page = hmm_pfn_to_page(range, range->pfns[i]); if (page == NULL) continue; @@ -1305,6 +1361,30 @@ long hmm_range_dma_map(struct hmm_range *range, } EXPORT_SYMBOL(hmm_range_dma_map); +static unsigned long hmm_vma_p2p_unmap(struct hmm_range *range, + struct vm_area_struct *vma, + unsigned long start, + struct device *device, + dma_addr_t *pas) +{ + unsigned long end; + + if (!vma) { + BUG(); + return 1; + } + + start &= PAGE_MASK; + if (start < vma->vm_start || start >= vma->vm_end) { + BUG(); + return 1; + } + + end = min(range->end, vma->vm_end); + vma->vm_ops->p2p_unmap(vma, device, start, end, pas); + return (end - start) >> PAGE_SHIFT; +} + /* * hmm_range_dma_unmap() - unmap range of that was map with hmm_range_dma_map() * @range: range being unmapped @@ -1342,6 +1422,14 @@ long hmm_range_dma_unmap(struct hmm_range *range, enum dma_data_direction dir = DMA_FROM_DEVICE; struct page *page; + if (range->pfns[i] == range->values[HMM_PFN_P2P]) { + BUG_ON(!vma); + cpages += hmm_vma_p2p_unmap(range, vma, addr, + device, &daddrs[i]); + i += cpages - 1; + continue; + } + page = hmm_pfn_to_page(range, range->pfns[i]); if (page == NULL) continue;
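[Illustrative sketch, not part of the original series: a minimal end-to-end example of
how an importing device driver might drive the interfaces extended by patches 4 and 5.
The mirror helper, my_dev and the daddrs array are hypothetical, and the call assumes
the hmm_range_dma_map(range, device, daddrs, block) signature used in the hunks above.]

#include <linux/hmm.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>

/*
 * Hypothetical importer path: fault and DMA map a mirrored range, letting
 * HMM route device private pages and special device vma ranges through the
 * exporter's p2p_map()/p2p_unmap() callbacks when peer to peer is possible.
 */
static long my_mirror_and_map(struct hmm_range *range, struct device *my_dev,
			      dma_addr_t *daddrs)
{
	long mapped;

	/*
	 * Entries that were peer to peer mapped come back flagged HMM_PFN_P2P
	 * and already carry a device-usable address in daddrs[]; everything
	 * else is mapped with the regular DMA API, falling back to main
	 * memory when peer to peer is not possible.
	 */
	mapped = hmm_range_dma_map(range, my_dev, daddrs, true /* block */);
	if (mapped <= 0)
		return mapped ? mapped : -EBUSY;

	/* ... program my_dev with daddrs[0 .. mapped - 1] ... */

	/*
	 * When done, or when an mmu_notifier invalidation covers the range,
	 * the mapping must be released with hmm_range_dma_unmap().
	 */
	return mapped;
}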