From patchwork Fri Sep  6 19:49:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Tu <witu@nvidia.com>
X-Patchwork-Id: 1982072
From: William Tu <witu@nvidia.com>
To: kernel-team@lists.ubuntu.com
Subject: [SRU][J:linux-bluefield][PATCH v3 4/6] UBUNTU: SAUCE: vfio/pci:
 Allow MMIO regions to be exported through dma-buf
Date: Fri, 6 Sep 2024 19:49:06 +0000
Message-ID: <20240906194908.1127733-5-witu@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240906194908.1127733-1-witu@nvidia.com>
References: <20240906194908.1127733-1-witu@nvidia.com>
Cc: vlad@nvidia.com, bodong@nvidia.com, sergeygo@nvidia.com,
 jgg@nvidia.com, zwaksman@nvidia.com

From: Jason Gunthorpe <jgg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2077887

dma-buf has become a way to safely acquire a handle to non-struct page
memory that can still have its lifetime controlled by the exporter.
Notably RDMA can now import dma-buf FDs and build them into MRs, which
allows for PCI P2P operations. Extend this to allow vfio-pci to export
MMIO memory from PCI device BARs.

The patch design loosely follows the pattern in commit db1a8dd916aa
("habanalabs: add support for dma-buf exporter") except this does not
support pinning.

Instead, this implements what, in the past, we've called a revocable
attachment using move. In normal situations the attachment is pinned, as
a BAR does not change its physical address. However, when the VFIO device
is closed, or a PCI reset is issued, access to the MMIO memory is
revoked.

Revoked means that move occurs, but an attempt to immediately re-map the
memory will fail. In the reset case a future move will be triggered when
MMIO access returns. As both close and reset are under userspace control,
it is expected that userspace will suspend use of the dma-buf before
doing these operations; the revoke is purely kernel self-defense against
a hostile userspace.
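For illustration, a minimal userspace sketch of the new uAPI (not part of
this patch: the helper below is hypothetical, error handling is elided,
and device_fd is assumed to be a VFIO device fd obtained through the
usual VFIO_GROUP_GET_DEVICE_FD flow):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/vfio.h>

	/* Export all of BAR 'bar' of an open VFIO device as a dmabuf fd. */
	static int export_bar_dmabuf(int device_fd, unsigned int bar)
	{
		struct vfio_device_p2p_dma_buf arg = {
			.region_index = bar,
			.open_flags = O_RDWR | O_CLOEXEC,
			/* .offset == 0 && .length == 0 selects the whole region */
		};

		return ioctl(device_fd, VFIO_DEVICE_P2P_DMA_BUF, &arg);
	}

The returned fd behaves like any other dmabuf fd and can be handed to an
importer (e.g. RDMA MR registration); it stays valid across a revoke, but
attempts to map it fail until MMIO access is restored.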
Co-authored-by: William Tu <witu@nvidia.com>
[witu: Add new ioctl uAPI for P2P dma buf]
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: William Tu <witu@nvidia.com>
---
 drivers/vfio/pci/Makefile          |   3 +-
 drivers/vfio/pci/dma_buf.c         | 262 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_config.c |  10 +-
 drivers/vfio/pci/vfio_pci_core.c   |  23 ++-
 drivers/vfio/pci/vfio_pci_priv.h   |  21 +++
 include/linux/vfio_pci_core.h      |   1 +
 include/uapi/linux/vfio.h          |  20 ++
 7 files changed, 335 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vfio/pci/dma_buf.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 74ee55f9c261..3ac2ed7d7a0a 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
-vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
+vfio-pci-core-$(CONFIG_S390) += vfio_pci_zdev.o
+vfio-pci-core-$(CONFIG_DMA_SHARED_BUFFER) += dma_buf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
 
 vfio-pci-y := vfio_pci.o
diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c
new file mode 100644
index 000000000000..14d32a580190
--- /dev/null
+++ b/drivers/vfio/pci/dma_buf.c
@@ -0,0 +1,262 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ */
+#include <linux/dma-buf.h>
+#include <linux/pci-p2pdma.h>
+#include <linux/dma-resv.h>
+
+#include "vfio_pci_priv.h"
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+struct vfio_pci_dma_buf {
+	struct dma_buf *dmabuf;
+	struct vfio_pci_core_device *vdev;
+	struct list_head dmabufs_elm;
+	unsigned int index;
+	unsigned int orig_nents;
+	size_t offset;
+	bool revoked;
+};
+
+static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
+				   struct dma_buf_attachment *attachment)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+	int rc;
+
+	rc = pci_p2pdma_distance_many(priv->vdev->pdev, &attachment->dev, 1,
+				      true);
+	if (rc < 0)
+		attachment->peer2peer = false;
+	return 0;
+}
+
+static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
+{
+}
+
+static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
+{
+	/*
+	 * Uses the dynamic interface but must always allow for
+	 * dma_buf_move_notify() to do revoke
+	 */
+	return -EINVAL;
+}
+
+static struct sg_table *
+vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
+		     enum dma_data_direction dir)
+{
+	size_t sgl_size = dma_get_max_seg_size(attachment->dev);
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+	struct scatterlist *sgl;
+	struct sg_table *sgt;
+	dma_addr_t dma_addr;
+	unsigned int nents;
+	size_t offset;
+	int ret;
+
+	dma_resv_assert_held(priv->dmabuf->resv);
+
+	if (!attachment->peer2peer)
+		return ERR_PTR(-EPERM);
+
+	if (priv->revoked)
+		return ERR_PTR(-ENODEV);
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret)
+		goto err_kfree_sgt;
+
+	/*
+	 * Since the memory being mapped is device memory it can never be in
+	 * the CPU caches.
+	 */
+	dma_addr = dma_map_resource(
+		attachment->dev,
+		pci_resource_start(priv->vdev->pdev, priv->index) +
+			priv->offset,
+		priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
+	ret = dma_mapping_error(attachment->dev, dma_addr);
+	if (ret)
+		goto err_free_sgt;
+
+	/*
+	 * Break the BAR's physical range up into max-sized SGLs according to
+	 * the device's requirement.
+	 */
+	sgl = sgt->sgl;
+	for (offset = 0; offset != priv->dmabuf->size;) {
+		size_t chunk_size = min(priv->dmabuf->size - offset, sgl_size);
+
+		sg_set_page(sgl, NULL, chunk_size, 0);
+		sg_dma_address(sgl) = dma_addr + offset;
+		sg_dma_len(sgl) = chunk_size;
+		sgl = sg_next(sgl);
+		offset += chunk_size;
+	}
+
+	/*
+	 * Because we are not going to include a CPU list we want other users
+	 * to have some chance of detecting this: set orig_nents to 0 and use
+	 * only nents (the length of the DMA list) when going over the sgl.
+	 */
+	priv->orig_nents = sgt->orig_nents;
+	sgt->orig_nents = 0;
+	return sgt;
+
+err_free_sgt:
+	sg_free_table(sgt);
+err_kfree_sgt:
+	kfree(sgt);
+	return ERR_PTR(ret);
+}
+
+static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
+				   struct sg_table *sgt,
+				   enum dma_data_direction dir)
+{
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	sgt->orig_nents = priv->orig_nents;
+	dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),
+			   priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
+	sg_free_table(sgt);
+	kfree(sgt);
+}
+
+static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+	/*
+	 * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
+	 * The refcount prevents both.
+	 */
+	if (priv->vdev) {
+		down_write(&priv->vdev->memory_lock);
+		list_del_init(&priv->dmabufs_elm);
+		up_write(&priv->vdev->memory_lock);
+		vfio_device_put(&priv->vdev->vdev);
+	}
+	kfree(priv);
+}
+
+static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
+	.attach = vfio_pci_dma_buf_attach,
+	.map_dma_buf = vfio_pci_dma_buf_map,
+	.pin = vfio_pci_dma_buf_pin,
+	.unpin = vfio_pci_dma_buf_unpin,
+	.release = vfio_pci_dma_buf_release,
+	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
+};
+
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev,
+				  struct vfio_device_p2p_dma_buf *p2p_dma_buf)
+{
+	struct vfio_device_p2p_dma_buf get_dma_buf;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct vfio_pci_dma_buf *priv;
+	int ret;
+
+	memcpy(&get_dma_buf, p2p_dma_buf, sizeof(get_dma_buf));
+
+	/* For PCI the region_index is the BAR number like everything else */
+	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
+		return -EINVAL;
+
+	exp_info.ops = &vfio_pci_dmabuf_ops;
+	exp_info.size = pci_resource_len(vdev->pdev, get_dma_buf.region_index);
+	if (!exp_info.size)
+		return -EINVAL;
+	if (get_dma_buf.offset || get_dma_buf.length) {
+		if (get_dma_buf.length > exp_info.size ||
+		    get_dma_buf.offset >= exp_info.size ||
+		    get_dma_buf.length > exp_info.size - get_dma_buf.offset ||
+		    get_dma_buf.offset % PAGE_SIZE ||
+		    get_dma_buf.length % PAGE_SIZE)
+			return -EINVAL;
+		exp_info.size = get_dma_buf.length;
+	}
+	exp_info.flags = get_dma_buf.open_flags;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&priv->dmabufs_elm);
+	priv->offset = get_dma_buf.offset;
+	priv->index = get_dma_buf.region_index;
+
+	exp_info.priv = priv;
+	priv->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(priv->dmabuf)) {
+		ret = PTR_ERR(priv->dmabuf);
+		kfree(priv);
+		return ret;
+	}
+
+	/* dma_buf_put() now frees priv */
+
+	down_write(&vdev->memory_lock);
+	dma_resv_lock(priv->dmabuf->resv, NULL);
+	priv->revoked = !__vfio_pci_memory_enabled(vdev);
+	priv->vdev = vdev;
+	vfio_device_get(&vdev->vdev);
+	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
+	dma_resv_unlock(priv->dmabuf->resv);
+	up_write(&vdev->memory_lock);
+
+	/*
+	 * dma_buf_fd() consumes the reference; when the file closes, the
+	 * dmabuf will be released.
+	 */
+	return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
+}
+
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	lockdep_assert_held_write(&vdev->memory_lock);
+
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		if (priv->revoked != revoked) {
+			dma_resv_lock(priv->dmabuf->resv, NULL);
+			priv->revoked = revoked;
+			dma_buf_move_notify(priv->dmabuf);
+			dma_resv_unlock(priv->dmabuf->resv);
+		}
+		dma_buf_put(priv->dmabuf);
+	}
+}
+
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	down_write(&vdev->memory_lock);
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		dma_resv_lock(priv->dmabuf->resv, NULL);
+		list_del_init(&priv->dmabufs_elm);
+		priv->vdev = NULL;
+		priv->revoked = true;
+		dma_buf_move_notify(priv->dmabuf);
+		dma_resv_unlock(priv->dmabuf->resv);
+		vfio_device_put(&vdev->vdev);
+		dma_buf_put(priv->dmabuf);
+	}
+	up_write(&vdev->memory_lock);
+}
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 78c626b8907d..f14a8b81fe35 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -28,6 +28,8 @@
 #include
 
+#include "vfio_pci_priv.h"
+
 /* Fake capability ID for standard config space */
 #define PCI_CAP_ID_BASIC	0
@@ -581,10 +583,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 	virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
 	new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
 
-	if (!new_mem)
+	if (!new_mem) {
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
-	else
+		vfio_pci_dma_buf_move(vdev, true);
+	} else {
 		down_write(&vdev->memory_lock);
+	}
 
 	/*
 	 * If the user is writing mem/io enable (new_mem/io) and we
@@ -619,6 +623,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 	*virt_cmd &= cpu_to_le16(~mask);
 	*virt_cmd |= cpu_to_le16(new_cmd & mask);
 
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 }
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 5904801f6f77..096f8303a207 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -465,6 +465,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 	vfio_spapr_pci_eeh_release(vdev->pdev);
 	vfio_pci_core_disable(vdev);
 
+	vfio_pci_dma_buf_cleanup(vdev);
+
 	mutex_lock(&vdev->igate);
 	if (vdev->err_trigger) {
 		eventfd_ctx_put(vdev->err_trigger);
@@ -658,7 +660,10 @@ int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev)
 		return -EINVAL;
 
 	vfio_pci_zap_and_down_write_memory_lock(vdev);
+	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 
 	return ret;
@@ -1193,6 +1198,14 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 		default:
 			return -ENOTTY;
 		}
+	} else if (cmd == VFIO_DEVICE_P2P_DMA_BUF) {
+		struct vfio_device_p2p_dma_buf p2p_dma_buf;
+
+		if (copy_from_user(&p2p_dma_buf, (void __user *)arg,
+				   sizeof(p2p_dma_buf)))
+			return -EFAULT;
+
+		return vfio_pci_core_feature_dma_buf(vdev, &p2p_dma_buf);
 	}
 
 	return -ENOTTY;
@@ -1826,6 +1839,7 @@ void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
 	INIT_LIST_HEAD(&vdev->vma_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
+	INIT_LIST_HEAD(&vdev->dmabufs);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_init_device);
@@ -2138,11 +2152,16 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	 * cause the PCI config space reset without restoring the original
 	 * state (saved locally in 'vdev->pm_save').
 	 */
-	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
-		vfio_pci_set_power_state(cur, PCI_D0);
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
+		vfio_pci_dma_buf_move(cur, true);
+	}
 
 	ret = pci_reset_bus(pdev);
 
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+		if (__vfio_pci_memory_enabled(cur))
+			vfio_pci_dma_buf_move(cur, false);
+
 err_undo:
 	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
 		if (cur == cur_mem)
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 4971cd95b431..d60e9cb99140 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -4,4 +4,25 @@
 
 int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev);
 
+#ifdef CONFIG_DMA_SHARED_BUFFER
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev,
+				  struct vfio_device_p2p_dma_buf *arg);
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+#else
+static inline int
+vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev,
+			      struct vfio_device_p2p_dma_buf *arg)
+{
+	return -ENOTTY;
+}
+static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+}
+static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
+					 bool revoked)
+{
+}
+#endif
+
 #endif
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f22d5a382c15..1136ed4a08f9 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -139,6 +139,7 @@ struct vfio_pci_core_device {
 	struct mutex		vma_lock;
 	struct list_head	vma_list;
 	struct rw_semaphore	memory_lock;
+	struct list_head	dmabufs;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ef33ea002b0b..18f369426c13 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1002,6 +1002,26 @@ struct vfio_device_feature {
  */
 #define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN	(0)
 
+/**
+ * VFIO_DEVICE_P2P_DMA_BUF - _IO(VFIO_TYPE, VFIO_BASE + 22)
+ *
+ * Create a dma_buf fd for the selected region, for use in P2P DMA.
+ *
+ * region_index selects the region to export; for PCI this is the BAR
+ * number. open_flags are the typical flags passed to open(2), eg O_RDWR,
+ * O_CLOEXEC, etc. offset/length specify a slice of the region to create
+ * the dmabuf from. If both are 0 then the whole region is used.
+ *
+ * Return: The new fd number on success, -1 and errno is set on failure.
+ */
+struct vfio_device_p2p_dma_buf {
+	__u32 region_index;
+	__u32 open_flags;
+	__u32 offset;
+	__u64 length;
+};
+#define VFIO_DEVICE_P2P_DMA_BUF	_IO(VFIO_TYPE, VFIO_BASE + 22)
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
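
For context, a rough sketch of the importer side this exporter pairs
with (illustrative only: the my_* names are hypothetical, while
dma_buf_dynamic_attach(), dma_buf_map_attachment() and struct
dma_buf_attach_ops are the standard dynamic dma-buf interfaces from
<linux/dma-buf.h>):

	#include <linux/dma-buf.h>
	#include <linux/dma-resv.h>

	static void my_move_notify(struct dma_buf_attachment *attach)
	{
		/*
		 * Called under dma_resv_lock() whenever the exporter revokes
		 * or restores access: quiesce DMA, drop the stale mapping,
		 * and re-map on next use.
		 */
		my_stop_dma_and_unmap(attach->importer_priv);
	}

	static const struct dma_buf_attach_ops my_importer_ops = {
		.allow_peer2peer = true,  /* BAR MMIO is reachable only via P2P */
		.move_notify = my_move_notify,
	};

	static struct sg_table *my_attach_and_map(struct dma_buf *dmabuf,
						  struct device *dev, void *priv)
	{
		struct dma_buf_attachment *attach;
		struct sg_table *sgt;

		attach = dma_buf_dynamic_attach(dmabuf, dev, &my_importer_ops, priv);
		if (IS_ERR(attach))
			return ERR_CAST(attach);

		/* dynamic importers must map under the reservation lock */
		dma_resv_lock(dmabuf->resv, NULL);
		sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
		dma_resv_unlock(dmabuf->resv);
		return sgt;
	}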