{"id":2231633,"url":"http://patchwork.ozlabs.org/api/patches/2231633/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-pci/patch/20260501004157.3108202-10-mrathor@linux.microsoft.com/","project":{"id":28,"url":"http://patchwork.ozlabs.org/api/projects/28/?format=json","name":"Linux PCI development","link_name":"linux-pci","list_id":"linux-pci.vger.kernel.org","list_email":"linux-pci@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20260501004157.3108202-10-mrathor@linux.microsoft.com>","list_archive_url":null,"date":"2026-05-01T00:41:55","name":"[V2,09/11] x86/hyperv: Implement hyperv virtual IOMMU","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"e886e740c257a57bde65d3ca746bbb437d694688","submitter":{"id":91512,"url":"http://patchwork.ozlabs.org/api/people/91512/?format=json","name":"Mukesh R","email":"mrathor@linux.microsoft.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/linux-pci/patch/20260501004157.3108202-10-mrathor@linux.microsoft.com/mbox/","series":[{"id":502410,"url":"http://patchwork.ozlabs.org/api/series/502410/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-pci/list/?series=502410","date":"2026-05-01T00:41:46","name":"PCI passthru on Hyper-V (Part I)","version":2,"mbox":"http://patchwork.ozlabs.org/series/502410/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/2231633/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2231633/checks/","tags":{},"related":[],"headers":{"From":"Mukesh R
<mrathor@linux.microsoft.com>","To":"hpa@zytor.com,\n\trobin.murphy@arm.com,\n\trobh@kernel.org,\n\twei.liu@kernel.org,\n\tmrathor@linux.microsoft.com,\n\tmhklinux@outlook.com,\n\tmuislam@microsoft.com,\n\tnamjain@linux.microsoft.com,\n\tmagnuskulke@linux.microsoft.com,\n\tanbelski@linux.microsoft.com,\n\tlinux-kernel@vger.kernel.org,\n\tlinux-hyperv@vger.kernel.org,\n\tiommu@lists.linux.dev,\n\tlinux-pci@vger.kernel.org,\n\tlinux-arch@vger.kernel.org","Cc":"kys@microsoft.com,\n\thaiyangz@microsoft.com,\n\tdecui@microsoft.com,\n\tlongli@microsoft.com,\n\ttglx@kernel.org,\n\tmingo@redhat.com,\n\tbp@alien8.de,\n\tdave.hansen@linux.intel.com,\n\tx86@kernel.org,\n\tjoro@8bytes.org,\n\twill@kernel.org,\n\tlpieralisi@kernel.org,\n\tkwilczynski@kernel.org,\n\tbhelgaas@google.com,\n\tarnd@arndb.de","Subject":"[PATCH V2 09/11] x86/hyperv: Implement hyperv virtual IOMMU","Date":"Thu, 30 Apr 2026 17:41:55 -0700","Message-ID":"<20260501004157.3108202-10-mrathor@linux.microsoft.com>","X-Mailer":"git-send-email 2.51.2.vfs.0.1","In-Reply-To":"<20260501004157.3108202-1-mrathor@linux.microsoft.com>","References":"<20260501004157.3108202-1-mrathor@linux.microsoft.com>","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Transfer-Encoding":"8bit"},"content":"Add a new file to implement management of device domains, mapping and\nunmapping of IOMMU memory, and other iommu_ops to fit within the VFIO\nframework for PCI passthru on Hyper-V running Linux as baremetal root\nor L1VH root. This also implements direct attach mechanism (see below),\na special feature of Hyper-V for PCI passthru, and it is also made to\nwork within the VFIO framework.\n\nAt a high level, during boot the hypervisor creates a default identity\ndomain and attaches all devices to it. 
This nicely maps to Linux IOMMU\nsubsystem IOMMU_DOMAIN_IDENTITY domain. As a result, Linux does not\nneed to explicitly ask Hyper-V to attach devices and do maps/unmaps\nduring boot. As mentioned previously, Hyper-V supports two ways to do\nPCI passthru:\n\n  1. Device Domain (aka Domain Attach): root must create a device domain\n     in the hypervisor, and do map/unmap hypercalls for mapping and\n     unmapping guest RAM for DMA. All hypervisor communications use\n     device ID of type PCI for identifying and referencing the device.\n\n  2. Direct Attach: the hypervisor will simply use the guest's HW\n     page table for mappings, thus the root need not map/unmap guest\n     memory for DMA. As such, direct attach passthru setup during guest\n     boot is extremely fast. A direct attached device must always be\n     referenced via logical device ID and not via the PCI device ID.\n\nAt present, L1VH root only supports direct attaches. Also direct attach is\ndefault in non-L1VH cases because there are some significant performance\nissues with domain attach implementations currently for guests with higher\nRAM (say more than 8GB), and that unfortunately cannot be addressed in\nthe short term.\n\nCo-developed-by: Wei Liu <wei.liu@kernel.org>\nSigned-off-by: Wei Liu <wei.liu@kernel.org>\nSigned-off-by: Mukesh R <mrathor@linux.microsoft.com>\n---\n MAINTAINERS                       |   1 +\n arch/x86/kernel/pci-dma.c         |   2 +\n drivers/iommu/Kconfig             |   5 +-\n drivers/iommu/Makefile            |   1 +\n drivers/iommu/hyperv-iommu-root.c | 908 ++++++++++++++++++++++++++++++\n include/asm-generic/mshyperv.h    |  17 +\n include/linux/hyperv.h            |   6 +\n 7 files changed, 937 insertions(+), 3 deletions(-)\n create mode 100644 drivers/iommu/hyperv-iommu-root.c","diff":"diff --git a/MAINTAINERS b/MAINTAINERS\nindex f803a6a38fee..8ae040b89a56 100644\n--- a/MAINTAINERS\n+++ b/MAINTAINERS\n@@ -11914,6 +11914,7 @@ 
F:\tdrivers/clocksource/hyperv_timer.c\n F:\tdrivers/hid/hid-hyperv.c\n F:\tdrivers/hv/\n F:\tdrivers/input/serio/hyperv-keyboard.c\n+F:\tdrivers/iommu/hyperv-iommu-root.c\n F:\tdrivers/iommu/hyperv-irq.c\n F:\tdrivers/net/ethernet/microsoft/\n F:\tdrivers/net/hyperv/\ndiff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c\nindex 6267363e0189..cfeee6505e17 100644\n--- a/arch/x86/kernel/pci-dma.c\n+++ b/arch/x86/kernel/pci-dma.c\n@@ -8,6 +8,7 @@\n #include <linux/gfp.h>\n #include <linux/pci.h>\n #include <linux/amd-iommu.h>\n+#include <linux/hyperv.h>\n \n #include <asm/proto.h>\n #include <asm/dma.h>\n@@ -105,6 +106,7 @@ void __init pci_iommu_alloc(void)\n \tgart_iommu_hole_init();\n \tamd_iommu_detect();\n \tdetect_intel_iommu();\n+\thv_iommu_detect();\n \tswiotlb_init(x86_swiotlb_enable, x86_swiotlb_flags);\n }\n \ndiff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig\nindex f86262b11416..7909cf4373a6 100644\n--- a/drivers/iommu/Kconfig\n+++ b/drivers/iommu/Kconfig\n@@ -352,13 +352,12 @@ config MTK_IOMMU_V1\n \t  if unsure, say N here.\n \n config HYPERV_IOMMU\n-\tbool \"Hyper-V IRQ Handling\"\n+\tbool \"Hyper-V IOMMU Unit\"\n \tdepends on HYPERV && X86\n \tselect IOMMU_API\n \tdefault HYPERV\n \thelp\n-\t  Stub IOMMU driver to handle IRQs to support Hyper-V Linux\n-\t  guest and root partitions.\n+\t  Hyper-V pseudo IOMMU unit.\n \n config VIRTIO_IOMMU\n \ttristate \"Virtio IOMMU driver\"\ndiff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile\nindex 335ea77cced6..296fbc6ca829 100644\n--- a/drivers/iommu/Makefile\n+++ b/drivers/iommu/Makefile\n@@ -31,6 +31,7 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o\n obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o\n obj-$(CONFIG_S390_IOMMU) += s390-iommu.o\n obj-$(CONFIG_HYPERV) += hyperv-irq.o\n+obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu-root.o\n obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o\n obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o\n obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o\ndiff --git 
a/drivers/iommu/hyperv-iommu-root.c b/drivers/iommu/hyperv-iommu-root.c\nnew file mode 100644\nindex 000000000000..739bbf39dea2\n--- /dev/null\n+++ b/drivers/iommu/hyperv-iommu-root.c\n@@ -0,0 +1,908 @@\n+// SPDX-License-Identifier: GPL-2.0\n+/*\n+ * Hyper-V root vIOMMU driver.\n+ * Copyright (C) 2026, Microsoft, Inc.\n+ */\n+\n+#include <linux/pci.h>\n+#include <linux/dma-map-ops.h>\n+#include <linux/interval_tree.h>\n+#include <linux/hyperv.h>\n+#include \"dma-iommu.h\"\n+#include <asm/iommu.h>\n+#include <asm/mshyperv.h>\n+\n+/* We will not claim these PCI devices, eg hypervisor needs it for debugger */\n+static char *pci_devs_to_skip;\n+static int __init hv_iommu_setup_skip(char *str)\n+{\n+\tpci_devs_to_skip = str;\n+\n+\treturn 0;\n+}\n+/* hv_iommu_skip=(SSSS:BB:DD.F)(SSSS:BB:DD.F) */\n+__setup(\"hv_iommu_skip=\", hv_iommu_setup_skip);\n+\n+bool hv_no_attdev;\t /* disable direct device attach for passthru */\n+EXPORT_SYMBOL_GPL(hv_no_attdev);\n+static int __init setup_hv_no_attdev(char *str)\n+{\n+\thv_no_attdev = true;\n+\treturn 0;\n+}\n+__setup(\"hv_no_attdev\", setup_hv_no_attdev);\n+\n+/* Iommu device that we export to the world. HyperV supports max of one */\n+static struct iommu_device hv_virt_iommu;\n+\n+struct hv_domain {\n+\tstruct iommu_domain iommu_dom;\n+\tu32 domid_num;\t\t\t      /* as opposed to domain_id.type */\n+\tbool attached_dom;\t\t      /* is this direct attached dom? */\n+\tu64 partid;\t\t\t      /* partition id */\n+\tspinlock_t mappings_lock;\t      /* protects mappings_tree */\n+\tstruct rb_root_cached mappings_tree;  /* iova to pa lookup tree */\n+};\n+\n+#define to_hv_domain(d) container_of(d, struct hv_domain, iommu_dom)\n+\n+struct hv_iommu_mapping {\n+\tphys_addr_t paddr;\n+\tstruct interval_tree_node iova;\n+\tu32 flags;\n+};\n+\n+/*\n+ * By default, during boot the hypervisor creates one Stage 2 (S2) default\n+ * domain. 
Stage 2 means that the page table is controlled by the hypervisor.\n+ *   S2 default: access to entire root partition memory. This for us easily\n+ *\t\t maps to IOMMU_DOMAIN_IDENTITY in the iommu subsystem, and\n+ *\t\t is called HV_DEVICE_DOMAIN_ID_S2_DEFAULT in the hypervisor.\n+ *\n+ * Device Management:\n+ *   There are two ways to manage device attaches to domains:\n+ *     1. Domain Attach: A device domain is created in the hypervisor, the\n+ *\t\t\t device is attached to this domain, and then memory\n+ *\t\t\t ranges are mapped in the map callbacks.\n+ *     2. Direct Attach: No need to create a domain in the hypervisor for direct\n+ *\t\t\t attached devices. A hypercall is made to tell the\n+ *\t\t\t hypervisor to attach the device to a guest. There is\n+ *\t\t\t no need for explicit memory mappings because the\n+ *\t\t\t hypervisor will just use the guest HW page table.\n+ *\n+ * Since a direct attach is much faster, it is the default. This can be\n+ * changed via hv_no_attdev.\n+ *\n+ * L1VH: hypervisor only supports direct attach.\n+ */\n+\n+/*\n+ * Create dummy domains to correspond to hypervisor prebuilt default identity\n+ * and null domains (dummy because we do not make hypercalls to create them).\n+ */\n+static struct hv_domain hv_def_identity_dom;\n+static struct hv_domain hv_null_dom;\n+\n+static bool hv_special_domain(struct hv_domain *hvdom)\n+{\n+\treturn hvdom == &hv_def_identity_dom || hvdom == &hv_null_dom;\n+}\n+\n+struct iommu_domain_geometry default_geometry = (struct iommu_domain_geometry) {\n+\t.aperture_start = 0,\n+\t.aperture_end = -1UL,\n+\t.force_aperture = true,\n+};\n+\n+/*\n+ * Since the relevant hypercalls can only fit less than 512 PFNs in the pfn\n+ * array, report 1M max.\n+ */\n+#define HV_IOMMU_PGSIZES (SZ_4K | SZ_1M)\n+\n+static u32 unique_id;\t      /* unique numeric id of a new domain */\n+\n+static void hv_iommu_detach_dev(struct iommu_domain *immdom,\n+\t\t\t\tstruct device *dev);\n+static size_t 
hv_iommu_unmap_pages(struct iommu_domain *immdom, ulong iova,\n+\t\t\t\t   size_t pgsize, size_t pgcount,\n+\t\t\t\t   struct iommu_iotlb_gather *gather);\n+\n+/*\n+ * If the current thread is a VMM thread, return the partition id of the VM it\n+ * is managing, else return HV_PARTITION_ID_INVALID.\n+ */\n+u64 hv_get_current_partid(void)\n+{\n+\tu64 (*fn)(void);\n+\tu64 ptid;\n+\n+\tfn = symbol_get(mshv_current_partid);\n+\tif (!fn)\n+\t\treturn HV_PARTITION_ID_INVALID;\n+\n+\tptid = fn();\n+\tsymbol_put(mshv_current_partid);\n+\n+\treturn ptid;\n+}\n+EXPORT_SYMBOL_GPL(hv_get_current_partid);\n+\n+/* If this is a VMM thread, then this domain is for a guest vm */\n+static bool hv_curr_thread_is_vmm(void)\n+{\n+\treturn hv_get_current_partid() != HV_PARTITION_ID_INVALID;\n+}\n+\n+/* As opposed to some host app like SPDK etc... */\n+static bool hv_dom_owner_is_vmm(struct hv_domain *hvdom)\n+{\n+\treturn hvdom && hvdom->partid != HV_PARTITION_ID_INVALID;\n+}\n+\n+static bool hv_iommu_capable(struct device *dev, enum iommu_cap cap)\n+{\n+\tswitch (cap) {\n+\tcase IOMMU_CAP_CACHE_COHERENCY:\n+\t\treturn true;\n+\tdefault:\n+\t\treturn false;\n+\t}\n+}\n+\n+/*\n+ * Check if given pci device is a direct attached device. 
Caller must have\n+ * verified pdev is a valid pci device.\n+ */\n+bool hv_pcidev_is_attached_dev(struct pci_dev *pdev)\n+{\n+\tstruct iommu_domain *iommu_domain;\n+\tstruct hv_domain *hvdom;\n+\tstruct device *dev = &pdev->dev;\n+\n+\tiommu_domain = iommu_get_domain_for_dev(dev);\n+\tif (iommu_domain) {\n+\t\thvdom = to_hv_domain(iommu_domain);\n+\t\treturn hvdom->attached_dom;\n+\t}\n+\n+\treturn false;\n+}\n+EXPORT_SYMBOL_GPL(hv_pcidev_is_attached_dev);\n+\n+bool hv_pcidev_is_pthru_dev(struct pci_dev *pdev)\n+{\n+\tstruct device *dev = &pdev->dev;\n+\tstruct hv_domain *hvdom = dev_iommu_priv_get(dev);\n+\n+\tif (hvdom && !hv_special_domain(hvdom))\n+\t\treturn true;\n+\n+\treturn false;\n+}\n+EXPORT_SYMBOL_GPL(hv_pcidev_is_pthru_dev);\n+\n+/* Build device id for direct attached devices */\n+static u64 hv_build_devid_type_logical(struct pci_dev *pdev)\n+{\n+\thv_pci_segment segment;\n+\tunion hv_device_id hv_devid;\n+\tunion hv_pci_bdf bdf = {.as_uint16 = 0};\n+\tu32 rid = PCI_DEVID(pdev->bus->number, pdev->devfn);\n+\n+\tsegment = pci_domain_nr(pdev->bus);\n+\tbdf.bus = PCI_BUS_NUM(rid);\n+\tbdf.device = PCI_SLOT(rid);\n+\tbdf.function = PCI_FUNC(rid);\n+\n+\thv_devid.as_uint64 = 0;\n+\thv_devid.device_type = HV_DEVICE_TYPE_LOGICAL;\n+\thv_devid.logical.id = (u64)segment << 16 | bdf.as_uint16;\n+\n+\treturn hv_devid.as_uint64;\n+}\n+\n+u64 hv_build_devid_oftype(struct pci_dev *pdev, enum hv_device_type type)\n+{\n+\tif (type == HV_DEVICE_TYPE_LOGICAL) {\n+\t\tif (hv_l1vh_partition())\n+\t\t\treturn hv_pci_vmbus_device_id(pdev);\n+\t\telse\n+\t\t\treturn hv_build_devid_type_logical(pdev);\n+\t} else if (type == HV_DEVICE_TYPE_PCI)\n+#ifdef CONFIG_X86\n+\t\treturn hv_build_devid_type_pci(pdev);\n+#else\n+\t\treturn 0;\n+#endif\n+\treturn 0;\n+}\n+EXPORT_SYMBOL_GPL(hv_build_devid_oftype);\n+\n+/* Create a new device domain in the hypervisor */\n+static int hv_iommu_create_hyp_devdom(struct hv_domain *hvdom)\n+{\n+\tu64 status;\n+\tstruct hv_input_device_domain 
*ddp;\n+\tstruct hv_input_create_device_domain *input;\n+\tunsigned long flags;\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tddp = &input->device_domain;\n+\tddp->partition_id = HV_PARTITION_ID_SELF;\n+\tddp->domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\tddp->domain_id.id = hvdom->domid_num;\n+\n+\tinput->create_device_domain_flags.forward_progress_required = 1;\n+\tinput->create_device_domain_flags.inherit_owning_vtl = 0;\n+\n+\tstatus = hv_do_hypercall(HVCALL_CREATE_DEVICE_DOMAIN, input, NULL);\n+\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+\n+\treturn hv_result_to_errno(status);\n+}\n+\n+static struct iommu_domain *hv_iommu_domain_alloc_paging(struct device *dev)\n+{\n+\tstruct hv_domain *hvdom;\n+\tint rc;\n+\n+\tif (hv_l1vh_partition() && !hv_curr_thread_is_vmm()) {\n+\t\tpr_err(\"Hyper-V: l1vh iommu does not support host devices\\n\");\n+\t\treturn NULL;\n+\t}\n+\n+\thvdom = kzalloc(sizeof(struct hv_domain), GFP_KERNEL);\n+\tif (hvdom == NULL)\n+\t\treturn NULL;\n+\n+\tspin_lock_init(&hvdom->mappings_lock);\n+\thvdom->mappings_tree = RB_ROOT_CACHED;\n+\n+\t/* Called under iommu group mutex, so single threaded */\n+\tif (++unique_id == HV_DEVICE_DOMAIN_ID_S2_DEFAULT)   /* ie, 0 */\n+\t\tgoto out_err;\n+\n+\thvdom->domid_num = unique_id;\n+\thvdom->partid = hv_get_current_partid();\n+\thvdom->iommu_dom.geometry = default_geometry;\n+\thvdom->iommu_dom.pgsize_bitmap = HV_IOMMU_PGSIZES;\n+\n+\t/* For guests, by default we do direct attaches, so no domain in hyp */\n+\tif (hv_dom_owner_is_vmm(hvdom) && !hv_no_attdev)\n+\t\thvdom->attached_dom = true;\n+\telse {\n+\t\trc = hv_iommu_create_hyp_devdom(hvdom);\n+\t\tif (rc)\n+\t\t\tgoto out_err;\n+\t}\n+\n+\treturn &hvdom->iommu_dom;\n+\n+out_err:\n+\tunique_id--;\n+\tkfree(hvdom);\n+\treturn NULL;\n+}\n+\n+static void hv_iommu_domain_free(struct iommu_domain 
*immdom)\n+{\n+\tstruct hv_domain *hvdom = to_hv_domain(immdom);\n+\tunsigned long flags;\n+\tu64 status;\n+\tstruct hv_input_delete_device_domain *input;\n+\n+\tif (hv_special_domain(hvdom))\n+\t\treturn;\n+\n+\tif (!hv_dom_owner_is_vmm(hvdom) || hv_no_attdev) {\n+\t\tstruct hv_input_device_domain *ddp;\n+\n+\t\tlocal_irq_save(flags);\n+\t\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\t\tddp = &input->device_domain;\n+\t\tmemset(input, 0, sizeof(*input));\n+\n+\t\tddp->partition_id = HV_PARTITION_ID_SELF;\n+\t\tddp->domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\t\tddp->domain_id.id = hvdom->domid_num;\n+\n+\t\tstatus = hv_do_hypercall(HVCALL_DELETE_DEVICE_DOMAIN, input,\n+\t\t\t\t\t NULL);\n+\t\tlocal_irq_restore(flags);\n+\n+\t\tif (!hv_result_success(status))\n+\t\t\thv_status_err(status, \"\\n\");\n+\t}\n+\n+\tkfree(hvdom);\n+}\n+\n+/* Attach a device to a domain previously created in the hypervisor */\n+static int hv_iommu_att_dev2dom(struct hv_domain *hvdom, struct pci_dev *pdev)\n+{\n+\tunsigned long flags;\n+\tu64 status;\n+\tenum hv_device_type dev_type;\n+\tstruct hv_input_attach_device_domain *input;\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->device_domain.partition_id = HV_PARTITION_ID_SELF;\n+\tinput->device_domain.domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\tinput->device_domain.domain_id.id = hvdom->domid_num;\n+\n+\t/* NB: Upon guest shutdown, device is re-attached to the default domain\n+\t *     without explicit detach.\n+\t */\n+\tif (hv_l1vh_partition())\n+\t\tdev_type = HV_DEVICE_TYPE_LOGICAL;\n+\telse\n+\t\tdev_type = HV_DEVICE_TYPE_PCI;\n+\n+\tinput->device_id.as_uint64 = hv_build_devid_oftype(pdev, dev_type);\n+\n+\tstatus = hv_do_hypercall(HVCALL_ATTACH_DEVICE_DOMAIN, input, NULL);\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+\n+\treturn hv_result_to_errno(status);\n+}\n+\n+/* Caller 
must have validated that dev is a valid pci dev */\n+static int hv_iommu_direct_attach_device(struct pci_dev *pdev, u64 ptid)\n+{\n+\tstruct hv_input_attach_device *input;\n+\tu64 status;\n+\tint rc;\n+\tunsigned long flags;\n+\tunion hv_device_id host_devid;\n+\tenum hv_device_type dev_type;\n+\n+\tif (ptid == HV_PARTITION_ID_INVALID) {\n+\t\tpr_err(\"Hyper-V: Invalid partition id in direct attach\\n\");\n+\t\treturn -EINVAL;\n+\t}\n+\n+\tif (hv_l1vh_partition())\n+\t\tdev_type = HV_DEVICE_TYPE_LOGICAL;\n+\telse\n+\t\tdev_type = HV_DEVICE_TYPE_PCI;\n+\n+\thost_devid.as_uint64 = hv_build_devid_oftype(pdev, dev_type);\n+\n+\tdo {\n+\t\tlocal_irq_save(flags);\n+\t\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\t\tmemset(input, 0, sizeof(*input));\n+\t\tinput->partition_id = ptid;\n+\t\tinput->device_id = host_devid;\n+\n+\t\t/* Hypervisor associates logical_id with this device, and in\n+\t\t * some hypercalls like retarget interrupts, logical_id must be\n+\t\t * used instead of the BDF. 
It is a required parameter.\n+\t\t */\n+\t\tinput->attdev_flags.logical_id = 1;\n+\t\tinput->logical_devid =\n+\t\t\t   hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_LOGICAL);\n+\n+\t\tstatus = hv_do_hypercall(HVCALL_ATTACH_DEVICE, input, NULL);\n+\t\tlocal_irq_restore(flags);\n+\n+\t\tif (hv_result(status) == HV_STATUS_INSUFFICIENT_MEMORY) {\n+\t\t\trc = hv_call_deposit_pages(NUMA_NO_NODE, ptid, 1);\n+\t\t\tif (rc)\n+\t\t\t\tbreak;\n+\t\t}\n+\t} while (hv_result(status) == HV_STATUS_INSUFFICIENT_MEMORY);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+\n+\treturn hv_result_to_errno(status);\n+}\n+\n+/* Attach a device for passthru to guest VMs, host apps like SPDK, etc */\n+static int hv_iommu_attach_dev(struct iommu_domain *immdom, struct device *dev,\n+\t\t\t       struct iommu_domain *old)\n+{\n+\tstruct pci_dev *pdev;\n+\tint rc;\n+\tstruct hv_domain *hvdom_new = to_hv_domain(immdom);\n+\tstruct hv_domain *hvdom_prev = dev_iommu_priv_get(dev);\n+\n+\t/* Only allow PCI devices for now */\n+\tif (!dev_is_pci(dev))\n+\t\treturn -EINVAL;\n+\n+\tpdev = to_pci_dev(dev);\n+\n+\tif (hv_l1vh_partition() && !hv_special_domain(hvdom_new) &&\n+\t    !hvdom_new->attached_dom)\n+\t\treturn -EINVAL;\n+\n+\t/* VFIO does not do explicit detach calls, hence check first if we need\n+\t * to detach first. Also, in case of guest shutdown, it's the VMM\n+\t * thread that attaches it back to the hv_def_identity_dom, and\n+\t * hvdom_prev will not be null then. 
It is null during boot.\n+\t */\n+\tif (hvdom_prev)\n+\t\tif (!hv_l1vh_partition() || !hv_special_domain(hvdom_prev))\n+\t\t\thv_iommu_detach_dev(&hvdom_prev->iommu_dom, dev);\n+\n+\tif (hv_l1vh_partition() && hv_special_domain(hvdom_new)) {\n+\t\tdev_iommu_priv_set(dev, hvdom_new);  /* sets \"private\" field */\n+\t\treturn 0;\n+\t}\n+\n+\tif (hvdom_new->attached_dom)\n+\t\trc = hv_iommu_direct_attach_device(pdev, hvdom_new->partid);\n+\telse\n+\t\trc = hv_iommu_att_dev2dom(hvdom_new, pdev);\n+\n+\tif (rc == 0)\n+\t\tdev_iommu_priv_set(dev, hvdom_new);  /* sets \"private\" field */\n+\n+\treturn rc;\n+}\n+\n+static void hv_iommu_det_dev_from_guest(struct pci_dev *pdev, u64 ptid)\n+{\n+\tstruct hv_input_detach_device *input;\n+\tu64 status, log_devid;\n+\tunsigned long flags;\n+\n+\tlog_devid = hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_LOGICAL);\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->partition_id = ptid;\n+\tinput->logical_devid = log_devid;\n+\tstatus = hv_do_hypercall(HVCALL_DETACH_DEVICE, input, NULL);\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+}\n+\n+static void hv_iommu_det_dev_from_dom(struct pci_dev *pdev)\n+{\n+\tu64 status, devid;\n+\tunsigned long flags;\n+\tstruct hv_input_detach_device_domain *input;\n+\n+\tdevid = hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_PCI);\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->partition_id = HV_PARTITION_ID_SELF;\n+\tinput->device_id.as_uint64 = devid;\n+\tstatus = hv_do_hypercall(HVCALL_DETACH_DEVICE_DOMAIN, input, NULL);\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+}\n+\n+static void hv_iommu_detach_dev(struct iommu_domain *immdom, struct device *dev)\n+{\n+\tstruct pci_dev *pdev;\n+\tstruct hv_domain *hvdom = 
to_hv_domain(immdom);\n+\n+\t/* See the attach function, only PCI devices for now */\n+\tif (!dev_is_pci(dev))\n+\t\treturn;\n+\n+\tpdev = to_pci_dev(dev);\n+\n+\tif (hvdom->attached_dom)\n+\t\thv_iommu_det_dev_from_guest(pdev, hvdom->partid);\n+\n+\t\t/* Do not reset attached_dom, hv_iommu_unmap_pages happens\n+\t\t * next.\n+\t\t */\n+\telse\n+\t\thv_iommu_det_dev_from_dom(pdev);\n+}\n+\n+static int hv_iommu_add_tree_mapping(struct hv_domain *hvdom,\n+\t\t\t\t     unsigned long iova, phys_addr_t paddr,\n+\t\t\t\t     size_t size, u32 flags)\n+{\n+\tunsigned long irqflags;\n+\tstruct hv_iommu_mapping *mapping;\n+\n+\tmapping = kzalloc(sizeof(*mapping), GFP_ATOMIC);\n+\tif (!mapping)\n+\t\treturn -ENOMEM;\n+\n+\tmapping->paddr = paddr;\n+\tmapping->iova.start = iova;\n+\tmapping->iova.last = iova + size - 1;\n+\tmapping->flags = flags;\n+\n+\tspin_lock_irqsave(&hvdom->mappings_lock, irqflags);\n+\tinterval_tree_insert(&mapping->iova, &hvdom->mappings_tree);\n+\tspin_unlock_irqrestore(&hvdom->mappings_lock, irqflags);\n+\n+\treturn 0;\n+}\n+\n+static size_t hv_iommu_del_tree_mappings(struct hv_domain *hvdom,\n+\t\t\t\t\tunsigned long iova, size_t size)\n+{\n+\tunsigned long flags;\n+\tsize_t unmapped = 0;\n+\tunsigned long last = iova + size - 1;\n+\tstruct hv_iommu_mapping *mapping = NULL;\n+\tstruct interval_tree_node *node, *next;\n+\n+\tspin_lock_irqsave(&hvdom->mappings_lock, flags);\n+\tnext = interval_tree_iter_first(&hvdom->mappings_tree, iova, last);\n+\twhile (next) {\n+\t\tnode = next;\n+\t\tmapping = container_of(node, struct hv_iommu_mapping, iova);\n+\t\tnext = interval_tree_iter_next(node, iova, last);\n+\n+\t\t/* Trying to split a mapping? Not supported for now. 
*/\n+\t\tif (mapping->iova.start < iova)\n+\t\t\tbreak;\n+\n+\t\tunmapped += mapping->iova.last - mapping->iova.start + 1;\n+\n+\t\tinterval_tree_remove(node, &hvdom->mappings_tree);\n+\t\tkfree(mapping);\n+\t}\n+\tspin_unlock_irqrestore(&hvdom->mappings_lock, flags);\n+\n+\treturn unmapped;\n+}\n+\n+/* Return: must return exact status from the hypercall without changes */\n+static u64 hv_iommu_map_pgs(struct hv_domain *hvdom,\n+\t\t\t    unsigned long iova, phys_addr_t paddr,\n+\t\t\t    unsigned long npages, u32 map_flags)\n+{\n+\tu64 status;\n+\tint i;\n+\tstruct hv_input_map_device_gpa_pages *input;\n+\tunsigned long flags, pfn;\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->device_domain.partition_id = HV_PARTITION_ID_SELF;\n+\tinput->device_domain.domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\tinput->device_domain.domain_id.id = hvdom->domid_num;\n+\tinput->map_flags = map_flags;\n+\tinput->target_device_va_base = iova;\n+\n+\tpfn = paddr >> HV_HYP_PAGE_SHIFT;\n+\tfor (i = 0; i < npages; i++, pfn++)\n+\t\tinput->gpa_page_list[i] = pfn;\n+\n+\tstatus = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_GPA_PAGES, npages, 0,\n+\t\t\t\t     input, NULL);\n+\n+\tlocal_irq_restore(flags);\n+\treturn status;\n+}\n+\n+/*\n+ * The core VFIO code loops over memory ranges calling this function with\n+ * the largest size from HV_IOMMU_PGSIZES. cond_resched() is in vfio_iommu_map.\n+ */\n+static int hv_iommu_map_pages(struct iommu_domain *immdom, ulong iova,\n+\t\t\t      phys_addr_t paddr, size_t pgsize, size_t pgcount,\n+\t\t\t      int prot, gfp_t gfp, size_t *mapped)\n+{\n+\tu32 map_flags;\n+\tint ret;\n+\tu64 status;\n+\tunsigned long npages, done = 0;\n+\tstruct hv_domain *hvdom = to_hv_domain(immdom);\n+\tsize_t size = pgsize * pgcount;\n+\n+\tmap_flags = HV_MAP_GPA_READABLE;\t/* required */\n+\tmap_flags |= prot & IOMMU_WRITE ? 
HV_MAP_GPA_WRITABLE : 0;\n+\n+\tret = hv_iommu_add_tree_mapping(hvdom, iova, paddr, size, map_flags);\n+\tif (ret)\n+\t\treturn ret;\n+\n+\tif (hvdom->attached_dom) {\n+\t\t*mapped = size;\n+\t\treturn 0;\n+\t}\n+\n+\tnpages = size >> HV_HYP_PAGE_SHIFT;\n+\twhile (done < npages) {\n+\t\tulong completed, remain = npages - done;\n+\n+\t\tstatus = hv_iommu_map_pgs(hvdom, iova, paddr, remain,\n+\t\t\t\t\t  map_flags);\n+\n+\t\tcompleted = hv_repcomp(status);\n+\t\tdone = done + completed;\n+\t\tiova = iova + (completed << HV_HYP_PAGE_SHIFT);\n+\t\tpaddr = paddr + (completed << HV_HYP_PAGE_SHIFT);\n+\n+\t\tif (hv_result(status) == HV_STATUS_INSUFFICIENT_MEMORY) {\n+\t\t\tret = hv_call_deposit_pages(NUMA_NO_NODE,\n+\t\t\t\t\t\t    hv_current_partition_id,\n+\t\t\t\t\t\t    256);\n+\t\t\tif (ret)\n+\t\t\t\tbreak;\n+\t\t\tcontinue;\n+\t\t}\n+\t\tif (!hv_result_success(status))\n+\t\t\tbreak;\n+\t}\n+\n+\tif (!hv_result_success(status)) {\n+\t\tsize_t done_size = done << HV_HYP_PAGE_SHIFT;\n+\n+\t\thv_status_err(status, \"pgs:%lx/%lx iova:%lx\\n\",\n+\t\t\t      done, npages, iova);\n+\t\t/*\n+\t\t * lookup tree has all mappings [0 - size-1]. 
Below unmap will\n+\t\t * only remove from [0 - done], we need to remove second chunk\n+\t\t * [done+1 - size-1].\n+\t\t */\n+\t\thv_iommu_del_tree_mappings(hvdom, iova, size - done_size);\n+\t\thv_iommu_unmap_pages(immdom, iova - done_size, HV_HYP_PAGE_SIZE,\n+\t\t\t\t     done, NULL);\n+\t\tif (mapped)\n+\t\t\t*mapped = 0;\n+\t} else\n+\t\tif (mapped)\n+\t\t\t*mapped = size;\n+\n+\treturn hv_result_to_errno(status);\n+}\n+\n+static size_t hv_iommu_unmap_pages(struct iommu_domain *immdom, ulong iova,\n+\t\t\t\t   size_t pgsize, size_t pgcount,\n+\t\t\t\t   struct iommu_iotlb_gather *gather)\n+{\n+\tunsigned long flags, npages;\n+\tstruct hv_input_unmap_device_gpa_pages *input;\n+\tu64 status;\n+\tstruct hv_domain *hvdom = to_hv_domain(immdom);\n+\tsize_t unmapped, size = pgsize * pgcount;\n+\n+\tunmapped = hv_iommu_del_tree_mappings(hvdom, iova, size);\n+\tif (unmapped < size)\n+\t\tpr_err(\"%s: could not delete all mappings (%lx:%lx/%lx)\\n\",\n+\t\t       __func__, iova, unmapped, size);\n+\n+\tif (hvdom->attached_dom)\n+\t\treturn size;\n+\n+\tnpages = size >> HV_HYP_PAGE_SHIFT;\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->device_domain.partition_id = HV_PARTITION_ID_SELF;\n+\tinput->device_domain.domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\tinput->device_domain.domain_id.id = hvdom->domid_num;\n+\tinput->target_device_va_base = iova;\n+\n+\tstatus = hv_do_rep_hypercall(HVCALL_UNMAP_DEVICE_GPA_PAGES, npages,\n+\t\t\t\t     0, input, NULL);\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+\n+\treturn unmapped;\n+}\n+\n+static phys_addr_t hv_iommu_iova_to_phys(struct iommu_domain *immdom,\n+\t\t\t\t\t dma_addr_t iova)\n+{\n+\tunsigned long flags;\n+\tstruct hv_iommu_mapping *mapping;\n+\tstruct interval_tree_node *node;\n+\tu64 paddr = 0;\n+\tstruct hv_domain *hvdom = 
to_hv_domain(immdom);\n+\n+\tspin_lock_irqsave(&hvdom->mappings_lock, flags);\n+\tnode = interval_tree_iter_first(&hvdom->mappings_tree, iova, iova);\n+\tif (node) {\n+\t\tmapping = container_of(node, struct hv_iommu_mapping, iova);\n+\t\tpaddr = mapping->paddr + (iova - mapping->iova.start);\n+\t}\n+\tspin_unlock_irqrestore(&hvdom->mappings_lock, flags);\n+\n+\treturn paddr;\n+}\n+\n+/*\n+ * Currently, the hypervisor does not dynamically provide the list of devices\n+ * it is using. Use this to allow users to manually specify devices that\n+ * should be skipped (e.g. a hypervisor debugger using a network device).\n+ */\n+static struct iommu_device *hv_iommu_probe_device(struct device *dev)\n+{\n+\tif (!dev_is_pci(dev))\n+\t\treturn ERR_PTR(-ENODEV);\n+\n+\tif (pci_devs_to_skip && *pci_devs_to_skip) {\n+\t\tint rc, pos = 0;\n+\t\tint parsed;\n+\t\tint segment, bus, slot, func;\n+\t\tstruct pci_dev *pdev = to_pci_dev(dev);\n+\n+\t\tdo {\n+\t\t\tparsed = 0;\n+\n+\t\t\trc = sscanf(pci_devs_to_skip + pos, \" (%x:%x:%x.%x) %n\",\n+\t\t\t\t    &segment, &bus, &slot, &func, &parsed);\n+\t\t\tif (rc != 4)\n+\t\t\t\tbreak;\n+\t\t\tif (parsed <= 0)\n+\t\t\t\tbreak;\n+\n+\t\t\tif (pci_domain_nr(pdev->bus) == segment &&\n+\t\t\t    pdev->bus->number == bus &&\n+\t\t\t    PCI_SLOT(pdev->devfn) == slot &&\n+\t\t\t    PCI_FUNC(pdev->devfn) == func) {\n+\n+\t\t\t\tdev_info(dev, \"skipped by Hyper-V IOMMU\\n\");\n+\t\t\t\treturn ERR_PTR(-ENODEV);\n+\t\t\t}\n+\t\t\tpos += parsed;\n+\n+\t\t} while (pci_devs_to_skip[pos]);\n+\t}\n+\n+\t/* The device will be explicitly attached to the default domain, so no\n+\t * need to do dev_iommu_priv_set() here.\n+\t */\n+\n+\treturn &hv_virt_iommu;\n+}\n+\n+static void hv_iommu_probe_finalize(struct device *dev)\n+{\n+\tstruct iommu_domain *immdom = iommu_get_domain_for_dev(dev);\n+\n+\tif (immdom && immdom->type == IOMMU_DOMAIN_DMA)\n+\t\tiommu_setup_dma_ops(dev, immdom);\n+\telse\n+\t\tset_dma_ops(dev, NULL);\n+}\n+\n+static void 
hv_iommu_release_device(struct device *dev)\n+{\n+\tstruct hv_domain *hvdom = dev_iommu_priv_get(dev);\n+\n+\t/* Need to detach device from device domain if necessary. */\n+\tif (hvdom)\n+\t\thv_iommu_detach_dev(&hvdom->iommu_dom, dev);\n+\n+\tdev_iommu_priv_set(dev, NULL);\n+\tset_dma_ops(dev, NULL);\n+}\n+\n+static struct iommu_group *hv_iommu_device_group(struct device *dev)\n+{\n+\tif (dev_is_pci(dev))\n+\t\treturn pci_device_group(dev);\n+\telse\n+\t\treturn generic_device_group(dev);\n+}\n+\n+static int hv_iommu_def_domain_type(struct device *dev)\n+{\n+\t/* The hypervisor always creates this by default during boot */\n+\treturn IOMMU_DOMAIN_IDENTITY;\n+}\n+\n+static struct iommu_ops hv_iommu_ops = {\n+\t.capable\t    = hv_iommu_capable,\n+\t.domain_alloc_paging\t= hv_iommu_domain_alloc_paging,\n+\t.probe_device\t    = hv_iommu_probe_device,\n+\t.probe_finalize     = hv_iommu_probe_finalize,\n+\t.release_device     = hv_iommu_release_device,\n+\t.def_domain_type    = hv_iommu_def_domain_type,\n+\t.device_group\t    = hv_iommu_device_group,\n+\t.default_domain_ops = &(const struct iommu_domain_ops) {\n+\t\t.attach_dev   = hv_iommu_attach_dev,\n+\t\t.map_pages    = hv_iommu_map_pages,\n+\t\t.unmap_pages  = hv_iommu_unmap_pages,\n+\t\t.iova_to_phys = hv_iommu_iova_to_phys,\n+\t\t.free\t      = hv_iommu_domain_free,\n+\t},\n+\t.owner\t\t    = THIS_MODULE,\n+\t.identity_domain = &hv_def_identity_dom.iommu_dom,\n+\t.blocked_domain  = &hv_null_dom.iommu_dom,\n+};\n+\n+static const struct iommu_domain_ops hv_special_domain_ops = {\n+\t.attach_dev = hv_iommu_attach_dev,\n+};\n+\n+static void __init hv_initialize_special_domains(void)\n+{\n+\thv_def_identity_dom.iommu_dom.type = IOMMU_DOMAIN_IDENTITY;\n+\thv_def_identity_dom.iommu_dom.ops = &hv_special_domain_ops;\n+\thv_def_identity_dom.iommu_dom.owner = &hv_iommu_ops;\n+\thv_def_identity_dom.iommu_dom.geometry = default_geometry;\n+\thv_def_identity_dom.domid_num = HV_DEVICE_DOMAIN_ID_S2_DEFAULT; /* 0 
*/\n+\n+\thv_null_dom.iommu_dom.type = IOMMU_DOMAIN_BLOCKED;\n+\thv_null_dom.iommu_dom.ops = &hv_special_domain_ops;\n+\thv_null_dom.iommu_dom.owner = &hv_iommu_ops;\n+\thv_null_dom.iommu_dom.geometry = default_geometry;\n+\thv_null_dom.domid_num = HV_DEVICE_DOMAIN_ID_S2_NULL;  /* INTMAX */\n+}\n+\n+static int __init hv_iommu_init(void)\n+{\n+\tint ret;\n+\tstruct iommu_device *iommup = &hv_virt_iommu;\n+\n+\tif (!hv_is_hyperv_initialized())\n+\t\treturn -ENODEV;\n+\n+\tret = iommu_device_sysfs_add(iommup, NULL, NULL, \"%s\", \"hyperv-iommu\");\n+\tif (ret) {\n+\t\tpr_err(\"Hyper-V: iommu_device_sysfs_add failed: %d\\n\", ret);\n+\t\treturn ret;\n+\t}\n+\n+\t/* This must come before iommu_device_register because the latter calls\n+\t * into the hooks.\n+\t */\n+\thv_initialize_special_domains();\n+\n+\tret = iommu_device_register(iommup, &hv_iommu_ops, NULL);\n+\tif (ret) {\n+\t\tpr_err(\"Hyper-V: iommu_device_register failed: %d\\n\", ret);\n+\t\tgoto err_sysfs_remove;\n+\t}\n+\n+\tpr_info(\"Hyper-V IOMMU initialized\\n\");\n+\n+\treturn 0;\n+\n+err_sysfs_remove:\n+\tiommu_device_sysfs_remove(iommup);\n+\treturn ret;\n+}\n+\n+void __init hv_iommu_detect(void)\n+{\n+\tif (no_iommu || iommu_detected)\n+\t\treturn;\n+\n+\t/* For l1vh, always expose an IOMMU unit */\n+\tif (!hv_l1vh_partition() &&\n+\t    !(ms_hyperv.misc_features & HV_DEVICE_DOMAIN_AVAILABLE))\n+\t\treturn;\n+\n+\tiommu_detected = 1;\n+\tx86_init.iommu.iommu_init = hv_iommu_init;\n+\n+\tpci_request_acs();\n+}\ndiff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h\nindex a6878ab685e7..fca5ed68b5c2 100644\n--- a/include/asm-generic/mshyperv.h\n+++ b/include/asm-generic/mshyperv.h\n@@ -337,6 +337,23 @@ static inline u64 hv_pci_vmbus_device_id(struct pci_dev *pdev)\n { return 0; }\n #endif /* IS_ENABLED(CONFIG_PCI_HYPERV) */\n \n+#if IS_ENABLED(CONFIG_HYPERV_IOMMU)\n+u64 hv_get_current_partid(void);\n+bool hv_pcidev_is_attached_dev(struct pci_dev *pdev);\n+bool 
hv_pcidev_is_pthru_dev(struct pci_dev *pdev);\n+u64 hv_build_devid_oftype(struct pci_dev *pdev, enum hv_device_type type);\n+#else\n+static inline bool hv_pcidev_is_attached_dev(struct pci_dev *pdev)\n+{ return false; }\n+static inline bool hv_pcidev_is_pthru_dev(struct pci_dev *pdev)\n+{ return false; }\n+static inline u64 hv_build_devid_oftype(struct pci_dev *pdev,\n+\t\t\t\t\tenum hv_device_type type)\n+{ return 0; }\n+static inline u64 hv_get_current_partid(void)\n+{ return HV_PARTITION_ID_INVALID; }\n+#endif /* IS_ENABLED(CONFIG_HYPERV_IOMMU) */\n+\n #if IS_ENABLED(CONFIG_MSHV_ROOT)\n static inline bool hv_root_partition(void)\n {\ndiff --git a/include/linux/hyperv.h b/include/linux/hyperv.h\nindex 5459e776ec17..6eee1cbf6f23 100644\n--- a/include/linux/hyperv.h\n+++ b/include/linux/hyperv.h\n@@ -1769,4 +1769,10 @@ static inline unsigned long virt_to_hvpfn(void *addr)\n #define HVPFN_DOWN(x)\t((x) >> HV_HYP_PAGE_SHIFT)\n #define page_to_hvpfn(page)\t(page_to_pfn(page) * NR_HV_HYP_PAGES_IN_PAGE)\n \n+#ifdef CONFIG_HYPERV_IOMMU\n+void __init hv_iommu_detect(void);\n+#else\n+static inline void hv_iommu_detect(void) { }\n+#endif /* CONFIG_HYPERV_IOMMU */\n+\n #endif /* _HYPERV_H */\n","prefixes":["V2","09/11"]}