{"id":2226034,"url":"http://patchwork.ozlabs.org/api/1.2/patches/2226034/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-pci/patch/20260422023239.1171963-12-mrathor@linux.microsoft.com/","project":{"id":28,"url":"http://patchwork.ozlabs.org/api/1.2/projects/28/?format=json","name":"Linux PCI development","link_name":"linux-pci","list_id":"linux-pci.vger.kernel.org","list_email":"linux-pci@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20260422023239.1171963-12-mrathor@linux.microsoft.com>","list_archive_url":null,"date":"2026-04-22T02:32:37","name":"[V1,11/13] x86/hyperv: Implement hyperv virtual iommu","commit_ref":null,"pull_url":null,"state":"new","archived":false,"hash":"3f847802b6d219e7f76e72362c6347469470aae6","submitter":{"id":91512,"url":"http://patchwork.ozlabs.org/api/1.2/people/91512/?format=json","name":"Mukesh R","email":"mrathor@linux.microsoft.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/linux-pci/patch/20260422023239.1171963-12-mrathor@linux.microsoft.com/mbox/","series":[{"id":500915,"url":"http://patchwork.ozlabs.org/api/1.2/series/500915/?format=json","web_url":"http://patchwork.ozlabs.org/project/linux-pci/list/?series=500915","date":"2026-04-22T02:32:26","name":"PCI passthru on Hyper-V (Part I)","version":1,"mbox":"http://patchwork.ozlabs.org/series/500915/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/2226034/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2226034/checks/","tags":{},"related":[],"headers":{"Return-Path":"\n <linux-pci+bounces-52908-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","linux-pci@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (1024-bit key;\n unprotected) header.d=linux.microsoft.com 
header.i=@linux.microsoft.com\n header.a=rsa-sha256 header.s=default header.b=TcXjEJ/P;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c04:e001:36c::12fc:5321; helo=tor.lore.kernel.org;\n envelope-from=linux-pci+bounces-52908-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (1024-bit key) header.d=linux.microsoft.com\n header.i=@linux.microsoft.com header.b=\"TcXjEJ/P\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=13.77.154.182","smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=linux.microsoft.com","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=linux.microsoft.com"],"Received":["from tor.lore.kernel.org (tor.lore.kernel.org\n [IPv6:2600:3c04:e001:36c::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g0k4v0Kg1z1yGs\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 22 Apr 2026 12:41:35 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby tor.lore.kernel.org (Postfix) with ESMTP id C87363078ADD\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 22 Apr 2026 02:36:02 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 07079392C2D;\n\tWed, 22 Apr 2026 02:34:19 +0000 (UTC)","from linux.microsoft.com (linux.microsoft.com [13.77.154.182])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 7639338AC7B;\n\tWed, 22 Apr 2026 02:33:59 +0000 (UTC)","from mrdev.corp.microsoft.com\n (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33])\n\tby linux.microsoft.com (Postfix) with ESMTPSA id CBADC20B6F24;\n\tTue, 21 Apr 2026 19:33:51 -0700 (PDT)"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1776825255; cv=none;\n 
b=ezjP8soaspAyuTtMgHKGCM9zet589qdOl9VDSYj+nVfRYojYR/eKl+xTkTTEuMiBiIRkvkMKalvXyDembig8QMG3UfzMl44NGmdMnXAouWHpEVegLbkScLAY2Ai4Xz6s+dN9ZZHsdUQ4bH8Z/tPVGsA4fK08hLOxYPVej/bW0gw=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1776825255; c=relaxed/simple;\n\tbh=h+Ny31frSUM48Jpi/cd7zYpVI/BIBt1tEJTL0lAuKbY=;\n\th=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:\n\t MIME-Version;\n b=sPykYZI4fQJ4uHfFZIof8tnY/1k6TG9Dyg4lq7awN85C8yLNL/TgvwNdJl3ds/aD3Bh4rKnHTbVYHJY9sDi0Q5TWlreiAdc1QF0Ma4byRdle4gGTi0c5pYKa7bMXhWxrZnqPkLMb7D8f/Fo+QTON57kp1lsTER2kKgh4PolpYe4=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=none dis=none) header.from=linux.microsoft.com;\n spf=pass smtp.mailfrom=linux.microsoft.com;\n dkim=pass (1024-bit key) header.d=linux.microsoft.com\n header.i=@linux.microsoft.com header.b=TcXjEJ/P;\n arc=none smtp.client-ip=13.77.154.182","DKIM-Filter":"OpenDKIM Filter v2.11.0 linux.microsoft.com CBADC20B6F24","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com;\n\ts=default; t=1776825232;\n\tbh=IyVWoBAu8eoWBZGeXBFmCaTyCy4N0jOKyOef3d/z/ho=;\n\th=From:To:Cc:Subject:Date:In-Reply-To:References:From;\n\tb=TcXjEJ/P+AcV6722ar/CoYdYshkqWa8TsFFOqI0KlYMvaw0B1XIJ+5k7tc/2g7sDG\n\t yOdVaHW4JE2EKlasdErus6zgt52jFLeAbaf6TlrWafu12jUEyzv2p78qujjsUh/fvp\n\t ii0XSueUcDF2zdxFeghnRJmOwSTRR3OzC+PpdNN0=","From":"Mukesh R 
<mrathor@linux.microsoft.com>","To":"hpa@zytor.com,\n\trobin.murphy@arm.com,\n\trobh@kernel.org,\n\twei.liu@kernel.org,\n\tmrathor@linux.microsoft.com,\n\tmhklinux@outlook.com,\n\tmuislam@microsoft.com,\n\tnamjain@linux.microsoft.com,\n\tmagnuskulke@linux.microsoft.com,\n\tanbelski@linux.microsoft.com,\n\tlinux-kernel@vger.kernel.org,\n\tlinux-hyperv@vger.kernel.org,\n\tiommu@lists.linux.dev,\n\tlinux-pci@vger.kernel.org,\n\tlinux-arch@vger.kernel.org","Cc":"kys@microsoft.com,\n\thaiyangz@microsoft.com,\n\tdecui@microsoft.com,\n\tlongli@microsoft.com,\n\ttglx@kernel.org,\n\tmingo@redhat.com,\n\tbp@alien8.de,\n\tdave.hansen@linux.intel.com,\n\tx86@kernel.org,\n\tjoro@8bytes.org,\n\twill@kernel.org,\n\tlpieralisi@kernel.org,\n\tkwilczynski@kernel.org,\n\tbhelgaas@google.com,\n\tarnd@arndb.de","Subject":"[PATCH V1 11/13] x86/hyperv: Implement hyperv virtual iommu","Date":"Tue, 21 Apr 2026 19:32:37 -0700","Message-ID":"<20260422023239.1171963-12-mrathor@linux.microsoft.com>","X-Mailer":"git-send-email 2.51.2.vfs.0.1","In-Reply-To":"<20260422023239.1171963-1-mrathor@linux.microsoft.com>","References":"<20260422023239.1171963-1-mrathor@linux.microsoft.com>","Precedence":"bulk","X-Mailing-List":"linux-pci@vger.kernel.org","List-Id":"<linux-pci.vger.kernel.org>","List-Subscribe":"<mailto:linux-pci+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:linux-pci+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Transfer-Encoding":"8bit"},"content":"Add a new file to implement management of device domains, mapping and\nunmapping of iommu memory, and other iommu_ops to fit within the VFIO\nframework for PCI passthru on Hyper-V running Linux as root or L1VH\nparent. This also implements direct attach mechanism for PCI passthru,\nand it is also made to work within the VFIO framework.\n\nAt a high level, during boot the hypervisor creates a default identity\ndomain and attaches all devices to it. This nicely maps to Linux iommu\nsubsystem IOMMU_DOMAIN_IDENTITY domain. 
As a result, Linux does not\nneed to explicitly ask Hyper-V to attach devices and do maps/unmaps\nduring boot. As mentioned previously, Hyper-V supports two ways to do\nPCI passthru:\n\n  1. Device Domain: root must create a device domain in the hypervisor,\n     and do map/unmap hypercalls for mapping and unmapping guest RAM.\n     All hypervisor communications use device id of type PCI for\n     identifying and referencing the device.\n\n  2. Direct Attach: the hypervisor will simply use the guest's HW\n     page table for mappings, thus the host need not do map/unmap\n     device memory hypercalls. As such, direct attach passthru setup\n     during guest boot is extremely fast. A direct attached device\n     must be referenced via logical device id and not via the PCI\n     device id.\n\nAt present, L1VH root/parent only supports direct attaches. Also direct\nattach is default in non-L1VH cases because there are some significant\nperformance issues with device domain implementation currently for guests\nwith higher RAM (say more than 8GB), and that unfortunately cannot be\naddressed in the short term.\n\nCo-developed-by: Wei Liu <wei.liu@kernel.org>\nSigned-off-by: Wei Liu <wei.liu@kernel.org>\nSigned-off-by: Mukesh R <mrathor@linux.microsoft.com>\n---\n MAINTAINERS                       |   1 +\n arch/x86/kernel/pci-dma.c         |   2 +\n drivers/iommu/Kconfig             |   5 +-\n drivers/iommu/Makefile            |   1 +\n drivers/iommu/hyperv-iommu-root.c | 899 ++++++++++++++++++++++++++++++\n include/asm-generic/mshyperv.h    |  24 +-\n include/linux/hyperv.h            |   6 +\n 7 files changed, 934 insertions(+), 4 deletions(-)\n create mode 100644 drivers/iommu/hyperv-iommu-root.c","diff":"diff --git a/MAINTAINERS b/MAINTAINERS\nindex f803a6a38fee..8ae040b89a56 100644\n--- a/MAINTAINERS\n+++ b/MAINTAINERS\n@@ -11914,6 +11914,7 @@ F:\tdrivers/clocksource/hyperv_timer.c\n F:\tdrivers/hid/hid-hyperv.c\n F:\tdrivers/hv/\n 
F:\tdrivers/input/serio/hyperv-keyboard.c\n+F:\tdrivers/iommu/hyperv-iommu-root.c\n F:\tdrivers/iommu/hyperv-irq.c\n F:\tdrivers/net/ethernet/microsoft/\n F:\tdrivers/net/hyperv/\ndiff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c\nindex 6267363e0189..cfeee6505e17 100644\n--- a/arch/x86/kernel/pci-dma.c\n+++ b/arch/x86/kernel/pci-dma.c\n@@ -8,6 +8,7 @@\n #include <linux/gfp.h>\n #include <linux/pci.h>\n #include <linux/amd-iommu.h>\n+#include <linux/hyperv.h>\n \n #include <asm/proto.h>\n #include <asm/dma.h>\n@@ -105,6 +106,7 @@ void __init pci_iommu_alloc(void)\n \tgart_iommu_hole_init();\n \tamd_iommu_detect();\n \tdetect_intel_iommu();\n+\thv_iommu_detect();\n \tswiotlb_init(x86_swiotlb_enable, x86_swiotlb_flags);\n }\n \ndiff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig\nindex f86262b11416..7909cf4373a6 100644\n--- a/drivers/iommu/Kconfig\n+++ b/drivers/iommu/Kconfig\n@@ -352,13 +352,12 @@ config MTK_IOMMU_V1\n \t  if unsure, say N here.\n \n config HYPERV_IOMMU\n-\tbool \"Hyper-V IRQ Handling\"\n+\tbool \"Hyper-V IOMMU Unit\"\n \tdepends on HYPERV && X86\n \tselect IOMMU_API\n \tdefault HYPERV\n \thelp\n-\t  Stub IOMMU driver to handle IRQs to support Hyper-V Linux\n-\t  guest and root partitions.\n+\t  Hyper-V pseudo IOMMU unit.\n \n config VIRTIO_IOMMU\n \ttristate \"Virtio IOMMU driver\"\ndiff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile\nindex 335ea77cced6..296fbc6ca829 100644\n--- a/drivers/iommu/Makefile\n+++ b/drivers/iommu/Makefile\n@@ -31,6 +31,7 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o\n obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o\n obj-$(CONFIG_S390_IOMMU) += s390-iommu.o\n obj-$(CONFIG_HYPERV) += hyperv-irq.o\n+obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu-root.o\n obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o\n obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o\n obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o\ndiff --git a/drivers/iommu/hyperv-iommu-root.c b/drivers/iommu/hyperv-iommu-root.c\nnew file mode 
100644\nindex 000000000000..492de5a1cf23\n--- /dev/null\n+++ b/drivers/iommu/hyperv-iommu-root.c\n@@ -0,0 +1,899 @@\n+// SPDX-License-Identifier: GPL-2.0\n+/*\n+ * Hyper-V root vIOMMU driver.\n+ * Copyright (C) 2026, Microsoft, Inc.\n+ */\n+\n+#include <linux/pci.h>\n+#include <linux/dma-map-ops.h>\n+#include <linux/interval_tree.h>\n+#include <linux/hyperv.h>\n+#include \"dma-iommu.h\"\n+#include <asm/iommu.h>\n+#include <asm/mshyperv.h>\n+\n+/* We will not claim these PCI devices, eg hypervisor needs it for debugger */\n+static char *pci_devs_to_skip;\n+static int __init hv_iommu_setup_skip(char *str)\n+{\n+\tpci_devs_to_skip = str;\n+\n+\treturn 0;\n+}\n+/* hv_iommu_skip=(SSSS:BB:DD.F)(SSSS:BB:DD.F) */\n+__setup(\"hv_iommu_skip=\", hv_iommu_setup_skip);\n+\n+bool hv_no_attdev;\t /* disable direct device attach for passthru */\n+EXPORT_SYMBOL_GPL(hv_no_attdev);\n+static int __init setup_hv_no_attdev(char *str)\n+{\n+\thv_no_attdev = true;\n+\treturn 0;\n+}\n+__setup(\"hv_no_attdev\", setup_hv_no_attdev);\n+\n+/* Iommu device that we export to the world. HyperV supports max of one */\n+static struct iommu_device hv_virt_iommu;\n+\n+struct hv_domain {\n+\tstruct iommu_domain iommu_dom;\n+\tu32 domid_num;\t\t\t      /* as opposed to domain_id.type */\n+\tbool attached_dom;\t\t      /* is this direct attached dom? */\n+\tu64 partid;\t\t\t      /* partition id */\n+\tspinlock_t mappings_lock;\t      /* protects mappings_tree */\n+\tstruct rb_root_cached mappings_tree;  /* iova to pa lookup tree */\n+};\n+\n+#define to_hv_domain(d) container_of(d, struct hv_domain, iommu_dom)\n+\n+struct hv_iommu_mapping {\n+\tphys_addr_t paddr;\n+\tstruct interval_tree_node iova;\n+\tu32 flags;\n+};\n+\n+/*\n+ * By default, during boot the hypervisor creates one Stage 2 (S2) default\n+ * domain. Stage 2 means that the page table is controlled by the hypervisor.\n+ *   S2 default: access to entire root partition memory. 
This for us easily\n+ *\t\t maps to IOMMU_DOMAIN_IDENTITY in the iommu subsystem, and\n+ *\t\t is called HV_DEVICE_DOMAIN_ID_S2_DEFAULT in the hypervisor.\n+ *\n+ * Device Management:\n+ *   There are two ways to manage device attaches to domains:\n+ *     1. Domain Attach: A device domain is created in the hypervisor, the\n+ *\t\t\t device is attached to this domain, and then memory\n+ *\t\t\t ranges are mapped in the map callbacks.\n+ *     2. Direct Attach: No need to create a domain in the hypervisor for direct\n+ *\t\t\t attached devices. A hypercall is made to tell the\n+ *\t\t\t hypervisor to attach the device to a guest. There is\n+ *\t\t\t no need for explicit memory mappings because the\n+ *\t\t\t hypervisor will just use the guest HW page table.\n+ *\n+ * Since a direct attach is much faster, it is the default. This can be\n+ * changed via hv_no_attdev.\n+ *\n+ * L1VH: hypervisor only supports direct attach.\n+ */\n+\n+/*\n+ * Create dummy domain to correspond to hypervisor prebuilt default identity\n+ * domain (dummy because we do not make hypercall to create them).\n+ */\n+static struct hv_domain hv_def_identity_dom;\n+\n+static bool hv_special_domain(struct hv_domain *hvdom)\n+{\n+\treturn hvdom == &hv_def_identity_dom;\n+}\n+\n+struct iommu_domain_geometry default_geometry = (struct iommu_domain_geometry) {\n+\t.aperture_start = 0,\n+\t.aperture_end = -1UL,\n+\t.force_aperture = true,\n+};\n+\n+/*\n+ * Since the relevant hypercalls can only fit less than 512 PFNs in the pfn\n+ * array, report 1M max.\n+ */\n+#define HV_IOMMU_PGSIZES (SZ_4K | SZ_1M)\n+\n+static u32 unique_id;\t      /* unique numeric id of a new domain */\n+\n+static void hv_iommu_detach_dev(struct iommu_domain *immdom,\n+\t\t\t\tstruct device *dev);\n+static size_t hv_iommu_unmap_pages(struct iommu_domain *immdom, ulong iova,\n+\t\t\t\t   size_t pgsize, size_t pgcount,\n+\t\t\t\t   struct iommu_iotlb_gather *gather);\n+\n+/*\n+ * If the current thread is a VMM thread, return the 
partition id of the VM it\n+ * is managing, else return HV_PARTITION_ID_INVALID.\n+ */\n+u64 hv_get_current_partid(void)\n+{\n+\tu64 (*fn)(void);\n+\tu64 ptid;\n+\n+\tfn = symbol_get(mshv_current_partid);\n+\tif (!fn)\n+\t\treturn HV_PARTITION_ID_INVALID;\n+\n+\tptid = fn();\n+\tsymbol_put(mshv_current_partid);\n+\n+\treturn ptid;\n+}\n+EXPORT_SYMBOL_GPL(hv_get_current_partid);\n+\n+/* If this is a VMM thread, then this domain is for a guest vm */\n+static bool hv_curr_thread_is_vmm(void)\n+{\n+\treturn hv_get_current_partid() != HV_PARTITION_ID_INVALID;\n+}\n+\n+/* As opposed to some host app like SPDK etc... */\n+static bool hv_dom_owner_is_vmm(struct hv_domain *hvdom)\n+{\n+\treturn hvdom && hvdom->partid != HV_PARTITION_ID_INVALID;\n+}\n+\n+static bool hv_iommu_capable(struct device *dev, enum iommu_cap cap)\n+{\n+\tswitch (cap) {\n+\tcase IOMMU_CAP_CACHE_COHERENCY:\n+\t\treturn true;\n+\tdefault:\n+\t\treturn false;\n+\t}\n+}\n+\n+/*\n+ * Check if given pci device is a direct attached device. 
Caller must have\n+ * verified pdev is a valid pci device.\n+ */\n+bool hv_pcidev_is_attached_dev(struct pci_dev *pdev)\n+{\n+\tstruct iommu_domain *iommu_domain;\n+\tstruct hv_domain *hvdom;\n+\tstruct device *dev = &pdev->dev;\n+\n+\tiommu_domain = iommu_get_domain_for_dev(dev);\n+\tif (iommu_domain) {\n+\t\thvdom = to_hv_domain(iommu_domain);\n+\t\treturn hvdom->attached_dom;\n+\t}\n+\n+\treturn false;\n+}\n+EXPORT_SYMBOL_GPL(hv_pcidev_is_attached_dev);\n+\n+bool hv_pcidev_is_pthru_dev(struct pci_dev *pdev)\n+{\n+\tstruct device *dev = &pdev->dev;\n+\tstruct hv_domain *hvdom = dev_iommu_priv_get(dev);\n+\n+\tif (hvdom && !hv_special_domain(hvdom))\n+\t\treturn true;\n+\n+\treturn false;\n+}\n+EXPORT_SYMBOL_GPL(hv_pcidev_is_pthru_dev);\n+\n+/* Build device id for direct attached devices */\n+static u64 hv_build_devid_type_logical(struct pci_dev *pdev)\n+{\n+\thv_pci_segment segment;\n+\tunion hv_device_id hv_devid;\n+\tunion hv_pci_bdf bdf = {.as_uint16 = 0};\n+\tu32 rid = PCI_DEVID(pdev->bus->number, pdev->devfn);\n+\n+\tsegment = pci_domain_nr(pdev->bus);\n+\tbdf.bus = PCI_BUS_NUM(rid);\n+\tbdf.device = PCI_SLOT(rid);\n+\tbdf.function = PCI_FUNC(rid);\n+\n+\thv_devid.as_uint64 = 0;\n+\thv_devid.device_type = HV_DEVICE_TYPE_LOGICAL;\n+\thv_devid.logical.id = (u64)segment << 16 | bdf.as_uint16;\n+\n+\treturn hv_devid.as_uint64;\n+}\n+\n+u64 hv_build_devid_oftype(struct pci_dev *pdev, enum hv_device_type type)\n+{\n+\tif (type == HV_DEVICE_TYPE_LOGICAL) {\n+\t\tif (hv_l1vh_partition())\n+\t\t\treturn hv_pci_vmbus_device_id(pdev);\n+\t\telse\n+\t\t\treturn hv_build_devid_type_logical(pdev);\n+\t} else if (type == HV_DEVICE_TYPE_PCI)\n+#ifdef CONFIG_X86\n+\t\treturn hv_build_devid_type_pci(pdev);\n+#else\n+\t\treturn 0;\n+#endif\n+\treturn 0;\n+}\n+EXPORT_SYMBOL_GPL(hv_build_devid_oftype);\n+\n+/* Create a new device domain in the hypervisor */\n+static int hv_iommu_create_hyp_devdom(struct hv_domain *hvdom)\n+{\n+\tu64 status;\n+\tstruct hv_input_device_domain 
*ddp;\n+\tstruct hv_input_create_device_domain *input;\n+\tunsigned long flags;\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tddp = &input->device_domain;\n+\tddp->partition_id = HV_PARTITION_ID_SELF;\n+\tddp->domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\tddp->domain_id.id = hvdom->domid_num;\n+\n+\tinput->create_device_domain_flags.forward_progress_required = 1;\n+\tinput->create_device_domain_flags.inherit_owning_vtl = 0;\n+\n+\tstatus = hv_do_hypercall(HVCALL_CREATE_DEVICE_DOMAIN, input, NULL);\n+\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+\n+\treturn hv_result_to_errno(status);\n+}\n+\n+/* During boot, all devices are attached to this */\n+static struct iommu_domain *hv_iommu_domain_alloc_identity(struct device *dev)\n+{\n+\treturn &hv_def_identity_dom.iommu_dom;\n+}\n+\n+static struct iommu_domain *hv_iommu_domain_alloc_paging(struct device *dev)\n+{\n+\tstruct hv_domain *hvdom;\n+\tint rc;\n+\n+\tif (hv_l1vh_partition() && !hv_curr_thread_is_vmm()) {\n+\t\tpr_err(\"Hyper-V: l1vh iommu does not support host devices\\n\");\n+\t\treturn NULL;\n+\t}\n+\n+\thvdom = kzalloc(sizeof(struct hv_domain), GFP_KERNEL);\n+\tif (hvdom == NULL)\n+\t\treturn NULL;\n+\n+\tspin_lock_init(&hvdom->mappings_lock);\n+\thvdom->mappings_tree = RB_ROOT_CACHED;\n+\n+\t/* Called under iommu group mutex, so single threaded */\n+\tif (++unique_id == HV_DEVICE_DOMAIN_ID_S2_DEFAULT)   /* ie, 0 */\n+\t\tgoto out_err;\n+\n+\thvdom->domid_num = unique_id;\n+\thvdom->partid = hv_get_current_partid();\n+\thvdom->iommu_dom.geometry = default_geometry;\n+\thvdom->iommu_dom.pgsize_bitmap = HV_IOMMU_PGSIZES;\n+\n+\t/* For guests, by default we do direct attaches, so no domain in hyp */\n+\tif (hv_dom_owner_is_vmm(hvdom) && !hv_no_attdev)\n+\t\thvdom->attached_dom = true;\n+\telse {\n+\t\trc = hv_iommu_create_hyp_devdom(hvdom);\n+\t\tif (rc)\n+\t\t\tgoto 
out_err;\n+\t}\n+\n+\treturn &hvdom->iommu_dom;\n+\n+out_err:\n+\tunique_id--;\n+\tkfree(hvdom);\n+\treturn NULL;\n+}\n+\n+static void hv_iommu_domain_free(struct iommu_domain *immdom)\n+{\n+\tstruct hv_domain *hvdom = to_hv_domain(immdom);\n+\tunsigned long flags;\n+\tu64 status;\n+\tstruct hv_input_delete_device_domain *input;\n+\n+\tif (hv_special_domain(hvdom))\n+\t\treturn;\n+\n+\tif (!hv_dom_owner_is_vmm(hvdom) || hv_no_attdev) {\n+\t\tstruct hv_input_device_domain *ddp;\n+\n+\t\tlocal_irq_save(flags);\n+\t\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\t\tddp = &input->device_domain;\n+\t\tmemset(input, 0, sizeof(*input));\n+\n+\t\tddp->partition_id = HV_PARTITION_ID_SELF;\n+\t\tddp->domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\t\tddp->domain_id.id = hvdom->domid_num;\n+\n+\t\tstatus = hv_do_hypercall(HVCALL_DELETE_DEVICE_DOMAIN, input,\n+\t\t\t\t\t NULL);\n+\t\tlocal_irq_restore(flags);\n+\n+\t\tif (!hv_result_success(status))\n+\t\t\thv_status_err(status, \"\\n\");\n+\t}\n+\n+\tkfree(hvdom);\n+}\n+\n+/* Attach a device to a domain previously created in the hypervisor */\n+static int hv_iommu_att_dev2dom(struct hv_domain *hvdom, struct pci_dev *pdev)\n+{\n+\tunsigned long flags;\n+\tu64 status;\n+\tenum hv_device_type dev_type;\n+\tstruct hv_input_attach_device_domain *input;\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->device_domain.partition_id = HV_PARTITION_ID_SELF;\n+\tinput->device_domain.domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\tinput->device_domain.domain_id.id = hvdom->domid_num;\n+\n+\t/* NB: Upon guest shutdown, device is re-attached to the default domain\n+\t *     without explicit detach.\n+\t */\n+\tif (hv_l1vh_partition())\n+\t\tdev_type = HV_DEVICE_TYPE_LOGICAL;\n+\telse\n+\t\tdev_type = HV_DEVICE_TYPE_PCI;\n+\n+\tinput->device_id.as_uint64 = hv_build_devid_oftype(pdev, dev_type);\n+\n+\tstatus = hv_do_hypercall(HVCALL_ATTACH_DEVICE_DOMAIN, input, 
NULL);\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+\n+\treturn hv_result_to_errno(status);\n+}\n+\n+/* Caller must have validated that dev is a valid pci dev */\n+static int hv_iommu_direct_attach_device(struct pci_dev *pdev, u64 ptid)\n+{\n+\tstruct hv_input_attach_device *input;\n+\tu64 status;\n+\tint rc;\n+\tunsigned long flags;\n+\tunion hv_device_id host_devid;\n+\tenum hv_device_type dev_type;\n+\n+\tif (ptid == HV_PARTITION_ID_INVALID) {\n+\t\tpr_err(\"Hyper-V: Invalid partition id in direct attach\\n\");\n+\t\treturn -EINVAL;\n+\t}\n+\n+\tif (hv_l1vh_partition())\n+\t\tdev_type = HV_DEVICE_TYPE_LOGICAL;\n+\telse\n+\t\tdev_type = HV_DEVICE_TYPE_PCI;\n+\n+\thost_devid.as_uint64 = hv_build_devid_oftype(pdev, dev_type);\n+\n+\tdo {\n+\t\tlocal_irq_save(flags);\n+\t\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\t\tmemset(input, 0, sizeof(*input));\n+\t\tinput->partition_id = ptid;\n+\t\tinput->device_id = host_devid;\n+\n+\t\t/* Hypervisor associates logical_id with this device, and in\n+\t\t * some hypercalls like retarget interrupts, logical_id must be\n+\t\t * used instead of the BDF. 
It is a required parameter.\n+\t\t */\n+\t\tinput->attdev_flags.logical_id = 1;\n+\t\tinput->logical_devid =\n+\t\t\t   hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_LOGICAL);\n+\n+\t\tstatus = hv_do_hypercall(HVCALL_ATTACH_DEVICE, input, NULL);\n+\t\tlocal_irq_restore(flags);\n+\n+\t\tif (hv_result(status) == HV_STATUS_INSUFFICIENT_MEMORY) {\n+\t\t\trc = hv_call_deposit_pages(NUMA_NO_NODE, ptid, 1);\n+\t\t\tif (rc)\n+\t\t\t\tbreak;\n+\t\t}\n+\t} while (hv_result(status) == HV_STATUS_INSUFFICIENT_MEMORY);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+\n+\treturn hv_result_to_errno(status);\n+}\n+\n+/* Attach a device for passthru to guest VMs, host apps like SPDK, etc */\n+static int hv_iommu_attach_dev(struct iommu_domain *immdom, struct device *dev,\n+\t\t\t       struct iommu_domain *old)\n+{\n+\tstruct pci_dev *pdev;\n+\tint rc;\n+\tstruct hv_domain *hvdom_new = to_hv_domain(immdom);\n+\tstruct hv_domain *hvdom_prev = dev_iommu_priv_get(dev);\n+\n+\t/* Only allow PCI devices for now */\n+\tif (!dev_is_pci(dev))\n+\t\treturn -EINVAL;\n+\n+\tpdev = to_pci_dev(dev);\n+\n+\tif (hv_l1vh_partition() && !hv_special_domain(hvdom_new) &&\n+\t    !hvdom_new->attached_dom)\n+\t\treturn -EINVAL;\n+\n+\t/* VFIO does not do explicit detach calls, hence check first if we need\n+\t * to detach first. Also, in case of guest shutdown, it's the VMM\n+\t * thread that attaches it back to the hv_def_identity_dom, and\n+\t * hvdom_prev will not be null then. 
It is null during boot.\n+\t */\n+\tif (hvdom_prev)\n+\t\tif (!hv_l1vh_partition() || !hv_special_domain(hvdom_prev))\n+\t\t\thv_iommu_detach_dev(&hvdom_prev->iommu_dom, dev);\n+\n+\tif (hv_l1vh_partition() && hv_special_domain(hvdom_new)) {\n+\t\tdev_iommu_priv_set(dev, hvdom_new);  /* sets \"private\" field */\n+\t\treturn 0;\n+\t}\n+\n+\tif (hvdom_new->attached_dom)\n+\t\trc = hv_iommu_direct_attach_device(pdev, hvdom_new->partid);\n+\telse\n+\t\trc = hv_iommu_att_dev2dom(hvdom_new, pdev);\n+\n+\tif (rc == 0)\n+\t\tdev_iommu_priv_set(dev, hvdom_new);  /* sets \"private\" field */\n+\n+\treturn rc;\n+}\n+\n+static void hv_iommu_det_dev_from_guest(struct pci_dev *pdev, u64 ptid)\n+{\n+\tstruct hv_input_detach_device *input;\n+\tu64 status, log_devid;\n+\tunsigned long flags;\n+\n+\tlog_devid = hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_LOGICAL);\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->partition_id = ptid;\n+\tinput->logical_devid = log_devid;\n+\tstatus = hv_do_hypercall(HVCALL_DETACH_DEVICE, input, NULL);\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+}\n+\n+static void hv_iommu_det_dev_from_dom(struct pci_dev *pdev)\n+{\n+\tu64 status, devid;\n+\tunsigned long flags;\n+\tstruct hv_input_detach_device_domain *input;\n+\n+\tdevid = hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_PCI);\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->partition_id = HV_PARTITION_ID_SELF;\n+\tinput->device_id.as_uint64 = devid;\n+\tstatus = hv_do_hypercall(HVCALL_DETACH_DEVICE_DOMAIN, input, NULL);\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+}\n+\n+static void hv_iommu_detach_dev(struct iommu_domain *immdom, struct device *dev)\n+{\n+\tstruct pci_dev *pdev;\n+\tstruct hv_domain *hvdom = 
to_hv_domain(immdom);\n+\n+\t/* See the attach function, only PCI devices for now */\n+\tif (!dev_is_pci(dev))\n+\t\treturn;\n+\n+\tpdev = to_pci_dev(dev);\n+\n+\tif (hvdom->attached_dom)\n+\t\thv_iommu_det_dev_from_guest(pdev, hvdom->partid);\n+\n+\t\t/* Do not reset attached_dom, hv_iommu_unmap_pages happens\n+\t\t * next.\n+\t\t */\n+\telse\n+\t\thv_iommu_det_dev_from_dom(pdev);\n+}\n+\n+static int hv_iommu_add_tree_mapping(struct hv_domain *hvdom,\n+\t\t\t\t     unsigned long iova, phys_addr_t paddr,\n+\t\t\t\t     size_t size, u32 flags)\n+{\n+\tunsigned long irqflags;\n+\tstruct hv_iommu_mapping *mapping;\n+\n+\tmapping = kzalloc(sizeof(*mapping), GFP_ATOMIC);\n+\tif (!mapping)\n+\t\treturn -ENOMEM;\n+\n+\tmapping->paddr = paddr;\n+\tmapping->iova.start = iova;\n+\tmapping->iova.last = iova + size - 1;\n+\tmapping->flags = flags;\n+\n+\tspin_lock_irqsave(&hvdom->mappings_lock, irqflags);\n+\tinterval_tree_insert(&mapping->iova, &hvdom->mappings_tree);\n+\tspin_unlock_irqrestore(&hvdom->mappings_lock, irqflags);\n+\n+\treturn 0;\n+}\n+\n+static size_t hv_iommu_del_tree_mappings(struct hv_domain *hvdom,\n+\t\t\t\t\tunsigned long iova, size_t size)\n+{\n+\tunsigned long flags;\n+\tsize_t unmapped = 0;\n+\tunsigned long last = iova + size - 1;\n+\tstruct hv_iommu_mapping *mapping = NULL;\n+\tstruct interval_tree_node *node, *next;\n+\n+\tspin_lock_irqsave(&hvdom->mappings_lock, flags);\n+\tnext = interval_tree_iter_first(&hvdom->mappings_tree, iova, last);\n+\twhile (next) {\n+\t\tnode = next;\n+\t\tmapping = container_of(node, struct hv_iommu_mapping, iova);\n+\t\tnext = interval_tree_iter_next(node, iova, last);\n+\n+\t\t/* Trying to split a mapping? Not supported for now. 
*/\n+\t\tif (mapping->iova.start < iova)\n+\t\t\tbreak;\n+\n+\t\tunmapped += mapping->iova.last - mapping->iova.start + 1;\n+\n+\t\tinterval_tree_remove(node, &hvdom->mappings_tree);\n+\t\tkfree(mapping);\n+\t}\n+\tspin_unlock_irqrestore(&hvdom->mappings_lock, flags);\n+\n+\treturn unmapped;\n+}\n+\n+/* Return: must return exact status from the hypercall without changes */\n+static u64 hv_iommu_map_pgs(struct hv_domain *hvdom,\n+\t\t\t    unsigned long iova, phys_addr_t paddr,\n+\t\t\t    unsigned long npages, u32 map_flags)\n+{\n+\tu64 status;\n+\tint i;\n+\tstruct hv_input_map_device_gpa_pages *input;\n+\tunsigned long flags, pfn;\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->device_domain.partition_id = HV_PARTITION_ID_SELF;\n+\tinput->device_domain.domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\tinput->device_domain.domain_id.id = hvdom->domid_num;\n+\tinput->map_flags = map_flags;\n+\tinput->target_device_va_base = iova;\n+\n+\tpfn = paddr >> HV_HYP_PAGE_SHIFT;\n+\tfor (i = 0; i < npages; i++, pfn++)\n+\t\tinput->gpa_page_list[i] = pfn;\n+\n+\tstatus = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_GPA_PAGES, npages, 0,\n+\t\t\t\t     input, NULL);\n+\n+\tlocal_irq_restore(flags);\n+\treturn status;\n+}\n+\n+/*\n+ * The core VFIO code loops over memory ranges calling this function with\n+ * the largest size from HV_IOMMU_PGSIZES. cond_resched() is in vfio_iommu_map.\n+ */\n+static int hv_iommu_map_pages(struct iommu_domain *immdom, ulong iova,\n+\t\t\t      phys_addr_t paddr, size_t pgsize, size_t pgcount,\n+\t\t\t      int prot, gfp_t gfp, size_t *mapped)\n+{\n+\tu32 map_flags;\n+\tint ret;\n+\tu64 status;\n+\tunsigned long npages, done = 0;\n+\tstruct hv_domain *hvdom = to_hv_domain(immdom);\n+\tsize_t size = pgsize * pgcount;\n+\n+\tmap_flags = HV_MAP_GPA_READABLE;\t/* required */\n+\tmap_flags |= prot & IOMMU_WRITE ? 
HV_MAP_GPA_WRITABLE : 0;\n+\n+\tret = hv_iommu_add_tree_mapping(hvdom, iova, paddr, size, map_flags);\n+\tif (ret)\n+\t\treturn ret;\n+\n+\tif (hvdom->attached_dom) {\n+\t\t*mapped = size;\n+\t\treturn 0;\n+\t}\n+\n+\tnpages = size >> HV_HYP_PAGE_SHIFT;\n+\twhile (done < npages) {\n+\t\tulong completed, remain = npages - done;\n+\n+\t\tstatus = hv_iommu_map_pgs(hvdom, iova, paddr, remain,\n+\t\t\t\t\t  map_flags);\n+\n+\t\tcompleted = hv_repcomp(status);\n+\t\tdone = done + completed;\n+\t\tiova = iova + (completed << HV_HYP_PAGE_SHIFT);\n+\t\tpaddr = paddr + (completed << HV_HYP_PAGE_SHIFT);\n+\n+\t\tif (hv_result(status) == HV_STATUS_INSUFFICIENT_MEMORY) {\n+\t\t\tret = hv_call_deposit_pages(NUMA_NO_NODE,\n+\t\t\t\t\t\t    hv_current_partition_id,\n+\t\t\t\t\t\t    256);\n+\t\t\tif (ret)\n+\t\t\t\tbreak;\n+\t\t\tcontinue;\n+\t\t}\n+\t\tif (!hv_result_success(status))\n+\t\t\tbreak;\n+\t}\n+\n+\tif (!hv_result_success(status)) {\n+\t\tsize_t done_size = done << HV_HYP_PAGE_SHIFT;\n+\n+\t\thv_status_err(status, \"pgs:%lx/%lx iova:%lx\\n\",\n+\t\t\t      done, npages, iova);\n+\t\t/*\n+\t\t * lookup tree has all mappings [0 - size-1]. 
Below unmap will\n+\t\t * only remove from [0 - done], we need to remove second chunk\n+\t\t * [done+1 - size-1].\n+\t\t */\n+\t\thv_iommu_del_tree_mappings(hvdom, iova, size - done_size);\n+\t\thv_iommu_unmap_pages(immdom, iova - done_size, HV_HYP_PAGE_SIZE,\n+\t\t\t\t     done, NULL);\n+\t\tif (mapped)\n+\t\t\t*mapped = 0;\n+\t} else\n+\t\tif (mapped)\n+\t\t\t*mapped = size;\n+\n+\treturn hv_result_to_errno(status);\n+}\n+\n+static size_t hv_iommu_unmap_pages(struct iommu_domain *immdom, ulong iova,\n+\t\t\t\t   size_t pgsize, size_t pgcount,\n+\t\t\t\t   struct iommu_iotlb_gather *gather)\n+{\n+\tunsigned long flags, npages;\n+\tstruct hv_input_unmap_device_gpa_pages *input;\n+\tu64 status;\n+\tstruct hv_domain *hvdom = to_hv_domain(immdom);\n+\tsize_t unmapped, size = pgsize * pgcount;\n+\n+\tunmapped = hv_iommu_del_tree_mappings(hvdom, iova, size);\n+\tif (unmapped < size)\n+\t\tpr_err(\"%s: could not delete all mappings (%lx:%lx/%lx)\\n\",\n+\t\t       __func__, iova, unmapped, size);\n+\n+\tif (hvdom->attached_dom)\n+\t\treturn size;\n+\n+\tnpages = size >> HV_HYP_PAGE_SHIFT;\n+\n+\tlocal_irq_save(flags);\n+\tinput = *this_cpu_ptr(hyperv_pcpu_input_arg);\n+\tmemset(input, 0, sizeof(*input));\n+\n+\tinput->device_domain.partition_id = HV_PARTITION_ID_SELF;\n+\tinput->device_domain.domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;\n+\tinput->device_domain.domain_id.id = hvdom->domid_num;\n+\tinput->target_device_va_base = iova;\n+\n+\tstatus = hv_do_rep_hypercall(HVCALL_UNMAP_DEVICE_GPA_PAGES, npages,\n+\t\t\t\t     0, input, NULL);\n+\tlocal_irq_restore(flags);\n+\n+\tif (!hv_result_success(status))\n+\t\thv_status_err(status, \"\\n\");\n+\n+\treturn unmapped;\n+}\n+\n+static phys_addr_t hv_iommu_iova_to_phys(struct iommu_domain *immdom,\n+\t\t\t\t\t dma_addr_t iova)\n+{\n+\tunsigned long flags;\n+\tstruct hv_iommu_mapping *mapping;\n+\tstruct interval_tree_node *node;\n+\tu64 paddr = 0;\n+\tstruct hv_domain *hvdom = 
to_hv_domain(immdom);\n+\n+\tspin_lock_irqsave(&hvdom->mappings_lock, flags);\n+\tnode = interval_tree_iter_first(&hvdom->mappings_tree, iova, iova);\n+\tif (node) {\n+\t\tmapping = container_of(node, struct hv_iommu_mapping, iova);\n+\t\tpaddr = mapping->paddr + (iova - mapping->iova.start);\n+\t}\n+\tspin_unlock_irqrestore(&hvdom->mappings_lock, flags);\n+\n+\treturn paddr;\n+}\n+\n+/*\n+ * Currently, the hypervisor does not dynamically provide the list of devices\n+ * it is using. So use this to allow users to manually specify devices that\n+ * should be skipped (e.g. a hypervisor debugger using some network device).\n+ */\n+static struct iommu_device *hv_iommu_probe_device(struct device *dev)\n+{\n+\tif (!dev_is_pci(dev))\n+\t\treturn ERR_PTR(-ENODEV);\n+\n+\tif (pci_devs_to_skip && *pci_devs_to_skip) {\n+\t\tint rc, pos = 0;\n+\t\tint parsed;\n+\t\tint segment, bus, slot, func;\n+\t\tstruct pci_dev *pdev = to_pci_dev(dev);\n+\n+\t\tdo {\n+\t\t\tparsed = 0;\n+\n+\t\t\trc = sscanf(pci_devs_to_skip + pos, \" (%x:%x:%x.%x) %n\",\n+\t\t\t\t    &segment, &bus, &slot, &func, &parsed);\n+\t\t\tif (rc != 4)\n+\t\t\t\tbreak;\n+\t\t\tif (parsed <= 0)\n+\t\t\t\tbreak;\n+\n+\t\t\tif (pci_domain_nr(pdev->bus) == segment &&\n+\t\t\t    pdev->bus->number == bus &&\n+\t\t\t    PCI_SLOT(pdev->devfn) == slot &&\n+\t\t\t    PCI_FUNC(pdev->devfn) == func) {\n+\n+\t\t\t\tdev_info(dev, \"skipped by Hyper-V IOMMU\\n\");\n+\t\t\t\treturn ERR_PTR(-ENODEV);\n+\t\t\t}\n+\t\t\tpos += parsed;\n+\n+\t\t} while (pci_devs_to_skip[pos]);\n+\t}\n+\n+\t/* Device will be explicitly attached to the default domain, so no need\n+\t * to do dev_iommu_priv_set() here.\n+\t */\n+\n+\treturn &hv_virt_iommu;\n+}\n+\n+static void hv_iommu_probe_finalize(struct device *dev)\n+{\n+\tstruct iommu_domain *immdom = iommu_get_domain_for_dev(dev);\n+\n+\tif (immdom && immdom->type == IOMMU_DOMAIN_DMA)\n+\t\tiommu_setup_dma_ops(dev, immdom);\n+\telse\n+\t\tset_dma_ops(dev, NULL);\n+}\n+\n+static void 
hv_iommu_release_device(struct device *dev)\n+{\n+\tstruct hv_domain *hvdom = dev_iommu_priv_get(dev);\n+\n+\t/* Need to detach device from device domain if necessary. */\n+\tif (hvdom)\n+\t\thv_iommu_detach_dev(&hvdom->iommu_dom, dev);\n+\n+\tdev_iommu_priv_set(dev, NULL);\n+\tset_dma_ops(dev, NULL);\n+}\n+\n+static struct iommu_group *hv_iommu_device_group(struct device *dev)\n+{\n+\tif (dev_is_pci(dev))\n+\t\treturn pci_device_group(dev);\n+\telse\n+\t\treturn generic_device_group(dev);\n+}\n+\n+static int hv_iommu_def_domain_type(struct device *dev)\n+{\n+\t/* The hypervisor always creates this by default during boot */\n+\treturn IOMMU_DOMAIN_IDENTITY;\n+}\n+\n+static struct iommu_ops hv_iommu_ops = {\n+\t.capable\t    = hv_iommu_capable,\n+\t.domain_alloc_identity\t= hv_iommu_domain_alloc_identity,\n+\t.domain_alloc_paging\t= hv_iommu_domain_alloc_paging,\n+\t.probe_device\t    = hv_iommu_probe_device,\n+\t.probe_finalize     = hv_iommu_probe_finalize,\n+\t.release_device     = hv_iommu_release_device,\n+\t.def_domain_type    = hv_iommu_def_domain_type,\n+\t.device_group\t    = hv_iommu_device_group,\n+\t.default_domain_ops = &(const struct iommu_domain_ops) {\n+\t\t.attach_dev   = hv_iommu_attach_dev,\n+\t\t.map_pages    = hv_iommu_map_pages,\n+\t\t.unmap_pages  = hv_iommu_unmap_pages,\n+\t\t.iova_to_phys = hv_iommu_iova_to_phys,\n+\t\t.free\t      = hv_iommu_domain_free,\n+\t},\n+\t.owner\t\t    = THIS_MODULE,\n+};\n+\n+static void __init hv_initialize_special_domains(void)\n+{\n+\thv_def_identity_dom.iommu_dom.geometry = default_geometry;\n+\thv_def_identity_dom.domid_num = HV_DEVICE_DOMAIN_ID_S2_DEFAULT; /* 0 */\n+}\n+\n+static int __init hv_iommu_init(void)\n+{\n+\tint ret;\n+\tstruct iommu_device *iommup = &hv_virt_iommu;\n+\n+\tif (!hv_is_hyperv_initialized())\n+\t\treturn -ENODEV;\n+\n+\tret = iommu_device_sysfs_add(iommup, NULL, NULL, \"%s\", \"hyperv-iommu\");\n+\tif (ret) {\n+\t\tpr_err(\"Hyper-V: iommu_device_sysfs_add failed: %d\\n\", 
ret);\n+\t\treturn ret;\n+\t}\n+\n+\t/* This must come before iommu_device_register because the latter calls\n+\t * into the hooks.\n+\t */\n+\thv_initialize_special_domains();\n+\n+\tret = iommu_device_register(iommup, &hv_iommu_ops, NULL);\n+\tif (ret) {\n+\t\tpr_err(\"Hyper-V: iommu_device_register failed: %d\\n\", ret);\n+\t\tgoto err_sysfs_remove;\n+\t}\n+\n+\tpr_info(\"Hyper-V IOMMU initialized\\n\");\n+\n+\treturn 0;\n+\n+err_sysfs_remove:\n+\tiommu_device_sysfs_remove(iommup);\n+\treturn ret;\n+}\n+\n+void __init hv_iommu_detect(void)\n+{\n+\tif (no_iommu || iommu_detected)\n+\t\treturn;\n+\n+\t/* For l1vh, always expose an iommu unit */\n+\tif (!hv_l1vh_partition())\n+\t\tif (!(ms_hyperv.misc_features & HV_DEVICE_DOMAIN_AVAILABLE))\n+\t\t\treturn;\n+\n+\tiommu_detected = 1;\n+\tx86_init.iommu.iommu_init = hv_iommu_init;\n+\n+\tpci_request_acs();\n+}\ndiff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h\nindex fe5ddd1c43ff..edbcfc2a9b60 100644\n--- a/include/asm-generic/mshyperv.h\n+++ b/include/asm-generic/mshyperv.h\n@@ -331,11 +331,33 @@ static inline enum hv_isolation_type hv_get_isolation_type(void)\n \n #if IS_ENABLED(CONFIG_PCI_HYPERV)\n u64 hv_pci_vmbus_device_id(struct pci_dev *pdev);\n-#else   /* IS_ENABLED(CONFIG_PCI_HYPERV) */\n+#else\t/* IS_ENABLED(CONFIG_PCI_HYPERV) */\n static inline u64 hv_pci_vmbus_device_id(struct pci_dev *pdev)\n { return 0; }\n #endif /* IS_ENABLED(CONFIG_PCI_HYPERV) */\n \n+#if IS_ENABLED(CONFIG_HYPERV_IOMMU)\n+u64 hv_get_current_partid(void);\n+bool hv_pcidev_is_attached_dev(struct pci_dev *pdev);\n+bool hv_pcidev_is_pthru_dev(struct pci_dev *pdev);\n+u64 hv_build_devid_oftype(struct pci_dev *pdev, enum hv_device_type type);\n+\n+#else /* Remove following after arm64 implementation is done */\n+\n+static inline bool hv_pcidev_is_attached_dev(struct pci_dev *pdev)\n+{ return false; }\n+\n+static inline bool hv_pcidev_is_pthru_dev(struct pci_dev *pdev)\n+{ return false; }\n+\n+static inline u64 
hv_build_devid_oftype(struct pci_dev *pdev,\n+\t\t\t\t\tenum hv_device_type type)\n+{ return 0; }\n+\n+static inline u64 hv_get_current_partid(void)\n+{ return HV_PARTITION_ID_INVALID; }\n+#endif /* IS_ENABLED(CONFIG_HYPERV_IOMMU) */\n+\n #if IS_ENABLED(CONFIG_MSHV_ROOT)\n static inline bool hv_root_partition(void)\n {\ndiff --git a/include/linux/hyperv.h b/include/linux/hyperv.h\nindex 5459e776ec17..6eee1cbf6f23 100644\n--- a/include/linux/hyperv.h\n+++ b/include/linux/hyperv.h\n@@ -1769,4 +1769,10 @@ static inline unsigned long virt_to_hvpfn(void *addr)\n #define HVPFN_DOWN(x)\t((x) >> HV_HYP_PAGE_SHIFT)\n #define page_to_hvpfn(page)\t(page_to_pfn(page) * NR_HV_HYP_PAGES_IN_PAGE)\n \n+#ifdef CONFIG_HYPERV_IOMMU\n+void __init hv_iommu_detect(void);\n+#else\n+static inline void hv_iommu_detect(void) { }\n+#endif /* CONFIG_HYPERV_IOMMU */\n+\n #endif /* _HYPERV_H */\n","prefixes":["V1","11/13"]}