From patchwork Tue Apr 25 19:45:08 2023
X-Patchwork-Submitter: Tarun Gupta
X-Patchwork-Id: 1773694
From: Tarun Gupta
Subject: [SRU][Lunar][PATCH 1/1] UBUNTU: SAUCE: Add mdev_set_iommu_device() kABI.
Date: Wed, 26 Apr 2023 01:15:08 +0530
Message-ID: <20230425194508.88398-2-targupta@nvidia.com>
In-Reply-To: <20230425194508.88398-1-targupta@nvidia.com>
References: <20230425194508.88398-1-targupta@nvidia.com>
X-BeenThere: kernel-team@lists.ubuntu.com
List-Id: Kernel team discussions

Upstream commit fda49d97f2c4, present in kernels from v5.16 onwards,
removed the mdev_set_iommu_device() kABI because no in-tree vendor
driver used it:

  fda49d97f2c4 ("vfio: remove the unused mdev iommu hook")

This patch partially reverts that commit so that the
mdev_set_iommu_device() kABI remains supported in the HWE kernels for
Ubuntu 22.04.

This partial revert does not restore the code for the "aux" variants
(IOMMU_DEV_FEAT_AUX) that was present in
vfio_mdev_[attach|detach]_domain(), because that support was never used
by any in-tree driver or any known out-of-tree driver. Nvidia vGPU does
not use the IOMMU_DEV_FEAT_AUX feature.

The patch also restores the vfio_bus_is_mdev() function, which was
removed by the commit below because it had no users. It is added back
to detect whether a device is an mdev device.

  c3c0fa9d94f7 ("vfio: clean up the check for mediated device in vfio_iommu_type1")

In addition, the v6.2 kernel unexported the "mdev_bus_type" struct in
the commit below, because it was only used within mdev.ko. Since vGPU
relies on vfio_bus_is_mdev() in vfio_iommu_type1.ko, as described
above, "mdev_bus_type" has to be exported again.

  2815fe149ffa ("vfio/mdev: unexport mdev_bus_type")

The revert is not clean in vfio_iommu_type1_attach_group(), because
that function has changed in the v6.2 upstream kernel compared with the
5.16 kernel in which the mdev_set_iommu_device() kABI was removed: the
VFIO_EMULATED_IOMMU handling in vfio_iommu_type1_attach_group() was
introduced in 5.19 and did not exist when the kABI was removed. The
logic, however, remains the same. If the devices in the group are
vfio-mdev devices, vfio_mdev_iommu_device() checks whether the mdev
device already has a backing IOMMU device (provided by the vendor
driver through the mdev_set_iommu_device() kABI). If it does, the IOMMU
domain is set up through that backing device's IOMMU group, and the
backing device is attached to it.

This kABI is used by SR-IOV based Nvidia vGPU to pin all guest system
memory on the VF during vGPU VM boot. The Nvidia vGPU driver calls
mdev_set_iommu_device() for the mdev device, with the VF as the backing
IOMMU device. With this patch, SR-IOV based Nvidia vGPU continues to
work with upstream kernels.
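
The group classification described above can be illustrated with a
small userspace model. This is a sketch under stated assumptions, not
kernel code: struct model_bus, struct model_device,
model_group_for_each_dev() and model_classify_group() are hypothetical
stand-ins invented for illustration, and the symbol_get()-based
vfio_bus_is_mdev() check is reduced to a plain pointer comparison. The
two callbacks follow the same logic as vfio_bus_type() and
vfio_mdev_iommu_device() from this patch, and the iterator assumes, as
iommu_group_for_each_dev() does, that the walk stops at the first
non-zero return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bus_type and struct device. */
struct model_bus {
	const char *name;
};

struct model_device {
	const struct model_bus *bus;
	struct model_device *iommu_device; /* what mdev_set_iommu_device() stores */
	int iommu_group_id;                /* meaningful for the backing device */
};

typedef int (*model_iter_fn)(struct model_device *dev, void *data);

/* Assumed iommu_group_for_each_dev() semantics: stop at the first non-zero return. */
static int model_group_for_each_dev(struct model_device *devs, size_t n,
				    void *data, model_iter_fn fn)
{
	int ret = 0;

	assert(devs != NULL || n == 0); /* model-only precondition */
	for (size_t i = 0; i < n; i++) {
		ret = fn(&devs[i], data);
		if (ret)
			break;
	}
	return ret;
}

/* Same logic as vfio_bus_type(): all devices must share one bus. */
static int model_bus_type(struct model_device *dev, void *data)
{
	const struct model_bus **bus = data;

	if (*bus && *bus != dev->bus)
		return -EINVAL;
	*bus = dev->bus;
	return 0;
}

/* Same logic as vfio_mdev_iommu_device(): one common, non-NULL backing device. */
static int model_mdev_iommu_device(struct model_device *dev, void *data)
{
	struct model_device **old = data, *new = dev->iommu_device;

	if (!new || (*old && *old != new))
		return -EINVAL;
	*old = new;
	return 0;
}

static const struct model_bus model_mdev_bus = { "mdev" };

/*
 * Returns the IOMMU group id the domain would be set up through, or -1 if
 * the group stays on the emulated (vendor-defined isolation) path.
 */
static int model_classify_group(struct model_device *devs, size_t n)
{
	const struct model_bus *bus = NULL;
	struct model_device *iommu_device = NULL;

	if (model_group_for_each_dev(devs, n, &bus, model_bus_type) ||
	    bus != &model_mdev_bus)
		return -1;

	if (model_group_for_each_dev(devs, n, &iommu_device,
				     model_mdev_iommu_device) || !iommu_device)
		return -1;

	return iommu_device->iommu_group_id;
}
```

When every mdev in the group points at the same backing VF,
model_classify_group() returns that VF's group id; a missing or
mismatched backing device, or a group that is not on the mdev bus,
yields -1. As in the patch, a failing walk is not treated as an error,
it only selects the emulated path.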

BugLink: https://bugs.launchpad.net/bugs/1988806
Signed-off-by: Tarun Gupta
Acked-by: Andrei Gherzan
---
 drivers/vfio/mdev/mdev_driver.c  |   1 +
 drivers/vfio/mdev/mdev_private.h |   1 -
 drivers/vfio/vfio_iommu_type1.c  | 126 ++++++++++++++++++++++++++++---
 include/linux/mdev.h             |  22 ++++++
 4 files changed, 140 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
index 7825d83a55f8..a4799e7d79fc 100644
--- a/drivers/vfio/mdev/mdev_driver.c
+++ b/drivers/vfio/mdev/mdev_driver.c
@@ -46,6 +46,7 @@ struct bus_type mdev_bus_type = {
 	.remove		= mdev_remove,
 	.match		= mdev_match,
 };
+EXPORT_SYMBOL_GPL(mdev_bus_type);
 
 /**
  * mdev_register_driver - register a new MDEV driver
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index af457b27f607..ba1b2dbddc0b 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -13,7 +13,6 @@
 int mdev_bus_register(void);
 void mdev_bus_unregister(void);
 
-extern struct bus_type mdev_bus_type;
 extern const struct attribute_group *mdev_device_groups[];
 
 #define to_mdev_type_attr(_attr)	\
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 7fa68dc4e938..fef221a87aa7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include "vfio.h"
@@ -115,6 +116,7 @@ struct vfio_batch {
 struct vfio_iommu_group {
 	struct iommu_group	*iommu_group;
 	struct list_head	next;
+	bool			mdev_group;
 	bool			pinned_page_dirty_scope;
 };
 
@@ -1744,6 +1746,18 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	return ret;
 }
 
+static int vfio_bus_type(struct device *dev, void *data)
+{
+	struct bus_type **bus = data;
+
+	if (*bus && *bus != dev->bus)
+		return -EINVAL;
+
+	*bus = dev->bus;
+
+	return 0;
+}
+
 static int vfio_iommu_replay(struct vfio_iommu *iommu,
 			     struct vfio_domain *domain)
 {
@@ -1992,6 +2006,81 @@ static bool vfio_iommu_has_sw_msi(struct list_head *group_resv_regions,
 	return ret;
 }
 
+static int vfio_mdev_attach_domain(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = to_mdev_device(dev);
+	struct iommu_domain *domain = data;
+	struct device *iommu_device;
+
+	iommu_device = mdev_get_iommu_device(mdev);
+	if (iommu_device)
+		return iommu_attach_device(domain, iommu_device);
+
+	return -EINVAL;
+}
+
+static int vfio_mdev_detach_domain(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = to_mdev_device(dev);
+	struct iommu_domain *domain = data;
+	struct device *iommu_device;
+
+	iommu_device = mdev_get_iommu_device(mdev);
+	if (iommu_device)
+		iommu_detach_device(domain, iommu_device);
+
+	return 0;
+}
+
+static int vfio_iommu_attach_group(struct vfio_domain *domain,
+				   struct vfio_iommu_group *group)
+{
+	if (group->mdev_group)
+		return iommu_group_for_each_dev(group->iommu_group,
+						domain->domain,
+						vfio_mdev_attach_domain);
+	else
+		return iommu_attach_group(domain->domain, group->iommu_group);
+}
+
+static void vfio_iommu_detach_group(struct vfio_domain *domain,
+				    struct vfio_iommu_group *group)
+{
+	if (group->mdev_group)
+		iommu_group_for_each_dev(group->iommu_group, domain->domain,
+					 vfio_mdev_detach_domain);
+	else
+		iommu_detach_group(domain->domain, group->iommu_group);
+}
+
+static bool vfio_bus_is_mdev(struct bus_type *bus)
+{
+	struct bus_type *mdev_bus;
+	bool ret = false;
+
+	mdev_bus = symbol_get(mdev_bus_type);
+	if (mdev_bus) {
+		ret = (bus == mdev_bus);
+		symbol_put(mdev_bus_type);
+	}
+
+	return ret;
+}
+
+static int vfio_mdev_iommu_device(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = to_mdev_device(dev);
+	struct device **old = data, *new;
+
+	new = mdev_get_iommu_device(mdev);
+	if (!new || (*old && *old != new))
+		return -EINVAL;
+
+	*old = new;
+
+	return 0;
+}
+
 /*
  * This is a helper function to insert an address range to iova list.
  * The list is initially created with a single entry corresponding to
@@ -2260,6 +2349,25 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	group->iommu_group = iommu_group;
 
 	if (type == VFIO_EMULATED_IOMMU) {
+		struct bus_type *bus = NULL;
+
+		ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type);
+
+		if (!ret && vfio_bus_is_mdev(bus)) {
+			struct device *iommu_device = NULL;
+
+			group->mdev_group = true;
+
+			/* Determine the isolation type */
+			ret = iommu_group_for_each_dev(iommu_group,
+						       &iommu_device,
+						       vfio_mdev_iommu_device);
+			if (!ret && iommu_device) {
+				iommu_group = iommu_device->iommu_group;
+				goto mdev_iommu_device;
+			}
+		}
+
 		list_add(&group->next, &iommu->emulated_iommu_groups);
 		/*
 		 * An emulated IOMMU group cannot dirty memory directly, it can
@@ -2272,6 +2380,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		goto out_unlock;
 	}
 
+mdev_iommu_device:
+
 	ret = -ENOMEM;
 	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
 	if (!domain)
@@ -2294,7 +2404,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		goto out_domain;
 	}
 
-	ret = iommu_attach_group(domain->domain, group->iommu_group);
+	ret = vfio_iommu_attach_group(domain, group);
 	if (ret)
 		goto out_domain;
 
@@ -2370,17 +2480,15 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		if (d->domain->ops == domain->domain->ops &&
 		    d->enforce_cache_coherency ==
 			    domain->enforce_cache_coherency) {
-			iommu_detach_group(domain->domain, group->iommu_group);
-			if (!iommu_attach_group(d->domain,
-						group->iommu_group)) {
+			vfio_iommu_detach_group(domain, group);
+			if (!vfio_iommu_attach_group(d, group)) {
 				list_add(&group->next, &d->group_list);
 				iommu_domain_free(domain->domain);
 				kfree(domain);
 				goto done;
 			}
 
-			ret = iommu_attach_group(domain->domain,
-						 group->iommu_group);
+			ret = vfio_iommu_attach_group(domain, group);
 			if (ret)
 				goto out_domain;
 		}
@@ -2417,7 +2525,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	return 0;
 
 out_detach:
-	iommu_detach_group(domain->domain, group->iommu_group);
+	vfio_iommu_detach_group(domain, group);
 out_domain:
 	iommu_domain_free(domain->domain);
 	vfio_iommu_iova_free(&iova_copy);
@@ -2578,7 +2686,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 		if (!group)
 			continue;
 
-		iommu_detach_group(domain->domain, group->iommu_group);
+		vfio_iommu_detach_group(domain, group);
 		update_dirty_scope = !group->pinned_page_dirty_scope;
 		list_del(&group->next);
 		kfree(group);
@@ -2669,7 +2777,7 @@ static void vfio_release_domain(struct vfio_domain *domain)
 
 	list_for_each_entry_safe(group, group_tmp,
 				 &domain->group_list, next) {
-		iommu_detach_group(domain->domain, group->iommu_group);
+		vfio_iommu_detach_group(domain, group);
 		list_del(&group->next);
 		kfree(group);
 	}
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 139d05b26f82..b08163d67e63 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -20,6 +20,7 @@ struct mdev_device {
 	guid_t uuid;
 	struct list_head next;
 	struct mdev_type *type;
+	struct device *iommu_device;
 	bool active;
 };
 
@@ -53,6 +54,25 @@ static inline struct mdev_device *to_mdev_device(struct device *dev)
 	return container_of(dev, struct mdev_device, dev);
 }
 
+/*
+ * Called by the parent device driver to set the device which represents
+ * this mdev in iommu protection scope. By default, the iommu device is
+ * NULL, that indicates using vendor defined isolation.
+ *
+ * @dev: the mediated device that iommu will isolate.
+ * @iommu_device: a pci device which represents the iommu for @dev.
+ */
+static inline void mdev_set_iommu_device(struct mdev_device *mdev,
+					 struct device *iommu_device)
+{
+	mdev->iommu_device = iommu_device;
+}
+
+static inline struct device *mdev_get_iommu_device(struct mdev_device *mdev)
+{
+	return mdev->iommu_device;
+}
+
 /**
  * struct mdev_driver - Mediated device driver
  * @device_api: string to return for the device_api sysfs
@@ -73,6 +93,8 @@ struct mdev_driver {
 	struct device_driver driver;
 };
 
+extern struct bus_type mdev_bus_type;
+
 int mdev_register_parent(struct mdev_parent *parent, struct device *dev,
 		struct mdev_driver *mdev_driver, struct mdev_type **types,
 		unsigned int nr_types);
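
The accessor pair added to include/linux/mdev.h, and the way
vfio_mdev_attach_domain() consumes it, can be modelled the same way.
This is again a hypothetical userspace sketch, not kernel code: struct
model_device, struct model_mdev, struct model_domain and
model_attach_device() stand in for struct device, struct mdev_device,
struct iommu_domain and iommu_attach_device(). Only the control flow is
taken from the patch: the backing device defaults to NULL (vendor
defined isolation), and the attach path returns -EINVAL when no
backing device was set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct model_device {
	const char *name;
};

/* Hypothetical stand-in for struct mdev_device; iommu_device mirrors the patch. */
struct model_mdev {
	struct model_device dev;
	struct model_device *iommu_device; /* NULL: vendor defined isolation */
};

/* Hypothetical stand-in for struct iommu_domain: records attachments. */
struct model_domain {
	int nr_attached;
	struct model_device *last_attached;
};

/* Same body as mdev_set_iommu_device(), plus a model-only sanity check. */
static inline void model_set_iommu_device(struct model_mdev *mdev,
					  struct model_device *iommu_device)
{
	assert(mdev != NULL);
	mdev->iommu_device = iommu_device;
}

/* Same body as mdev_get_iommu_device(). */
static inline struct model_device *model_get_iommu_device(struct model_mdev *mdev)
{
	return mdev->iommu_device;
}

/* Stand-in for iommu_attach_device(): always succeeds and records the device. */
static int model_attach_device(struct model_domain *domain,
			       struct model_device *dev)
{
	domain->nr_attached++;
	domain->last_attached = dev;
	return 0;
}

/* Same control flow as vfio_mdev_attach_domain() in the patch. */
static int model_mdev_attach_domain(struct model_mdev *mdev,
				    struct model_domain *domain)
{
	struct model_device *iommu_device = model_get_iommu_device(mdev);

	/* The backing device (e.g. the VF), not the mdev itself, joins the domain. */
	if (iommu_device)
		return model_attach_device(domain, iommu_device);

	return -EINVAL;
}
```

In this model, a vendor driver in the position the commit message
describes would call the setter with the VF before the group is
attached; leaving it unset keeps the NULL default, which in the patched
vfio_iommu_type1_attach_group() means vfio_mdev_iommu_device() rejects
the group and it stays on the emulated IOMMU path.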