From patchwork Tue Oct 27 10:18:40 2020
X-Patchwork-Submitter: Alexey Kardashevskiy <aik@ozlabs.ru>
X-Patchwork-Id: 1388411
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy, iommu@lists.linux-foundation.org,
 Christoph Hellwig, linux-kernel@vger.kernel.org
Subject: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation
Date: Tue, 27 Oct 2020 21:18:40 +1100
Message-Id: <20201027101841.96056-2-aik@ozlabs.ru>
In-Reply-To: <20201027101841.96056-1-aik@ozlabs.ru>
References: <20201027101841.96056-1-aik@ozlabs.ru>
X-Mailer: git-send-email 2.17.1

At the moment we allow bypassing DMA ops only when it can be done for the
entire RAM. However, there are configurations with mixed memory types where
we could still bypass the IOMMU in most cases; POWERPC with persistent
memory is one example.

This adds another check, against the device bus limit, to determine whether
bypass can still work: if the DMA handle fits under dev->bus_dma_limit, the
direct DMA API is invoked; when the handle is outside that limit, we fall
back to DMA ops.

This adds a CONFIG_DMA_OPS_BYPASS_BUS_LIMIT config option which is off by
default and will be enabled for PPC_PSERIES in the following patch.
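[Editor's note: to make the bypass rule above concrete, here is a minimal
standalone sketch in plain userspace C. It is not the patch code: the device
structure is reduced to the one field that matters, phys_to_dma() is assumed
to be an identity mapping, and the window limit and buffer addresses are
invented sample values.]

/* Standalone model of the partial-bypass check (illustration only). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Reduced device: only the bus DMA limit matters for this check. */
struct device_model {
	uint64_t bus_dma_limit;	/* 0 means no partial-bypass window */
};

/*
 * Mirrors the patch's can_map_direct() under the assumption that
 * phys_to_dma() is an identity mapping: the end of the buffer must
 * sit at or below the bus limit.
 */
static bool can_map_direct(const struct device_model *dev, uint64_t phys_end)
{
	return dev->bus_dma_limit >= phys_end;
}

int main(void)
{
	/* Sample: a 1GiB 1:1 window, so the limit is 0x40000000. */
	struct device_model dev = { .bus_dma_limit = 0x40000000ULL };

	/* A buffer ending below the limit takes the direct path. */
	printf("RAM buffer direct?  %d\n",
	       dev.bus_dma_limit != 0 && can_map_direct(&dev, 0x2000ULL));

	/* A buffer above the limit (e.g. pmem) falls back to dma_ops. */
	printf("pmem buffer direct? %d\n",
	       dev.bus_dma_limit != 0 && can_map_direct(&dev, 0x80001000ULL));
	return 0;
}

[With a real device this same comparison is what dma_map_page_attrs() performs
below before choosing between dma_direct_map_page() and ops->map_page().]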
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 kernel/dma/mapping.c | 61 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/dma/Kconfig   |  4 +++
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 51bb8fa8eb89..0f4f998e6c72 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -137,6 +137,18 @@ static inline bool dma_map_direct(struct device *dev,
 	return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
+#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
+static inline bool can_map_direct(struct device *dev, phys_addr_t addr)
+{
+	return dev->bus_dma_limit >= phys_to_dma(dev, addr);
+}
+
+static inline bool dma_handle_direct(struct device *dev, dma_addr_t dma_handle)
+{
+	return dma_handle >= dev->archdata.dma_offset;
+}
+#endif
+
 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		size_t offset, size_t size, enum dma_data_direction dir,
 		unsigned long attrs)
@@ -151,6 +163,11 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 
 	if (dma_map_direct(dev, ops))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
+	else if (dev->bus_dma_limit &&
+		 can_map_direct(dev, (phys_addr_t) page_to_phys(page) + offset + size))
+		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+#endif
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	debug_dma_map_page(dev, page, offset, size, dir, addr);
@@ -167,6 +184,10 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	BUG_ON(!valid_dma_direction(dir));
 	if (dma_map_direct(dev, ops))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
+#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
+	else if (dev->bus_dma_limit && dma_handle_direct(dev, addr + size))
+		dma_direct_unmap_page(dev, addr, size, dir, attrs);
+#endif
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	debug_dma_unmap_page(dev, addr, size, dir);
@@ -190,6 +211,23 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
 
 	if (dma_map_direct(dev, ops))
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
+#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
+	else if (dev->bus_dma_limit) {
+		struct scatterlist *s;
+		bool direct = true;
+		int i;
+
+		for_each_sg(sg, s, nents, i) {
+			direct = can_map_direct(dev, sg_phys(s) + s->offset + s->length);
+			if (!direct)
+				break;
+		}
+		if (direct)
+			ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
+		else
+			ents = ops->map_sg(dev, sg, nents, dir, attrs);
+	}
+#endif
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
 	BUG_ON(ents < 0);
@@ -207,9 +245,28 @@ void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops)) {
 		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
-	else if (ops->unmap_sg)
+		return;
+	}
+#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
+	if (dev->bus_dma_limit) {
+		struct scatterlist *s;
+		bool direct = true;
+		int i;
+
+		for_each_sg(sg, s, nents, i) {
+			direct = dma_handle_direct(dev, s->dma_address + s->length);
+			if (!direct)
+				break;
+		}
+		if (direct) {
+			dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
+			return;
+		}
+	}
+#endif
+	if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
 }
 EXPORT_SYMBOL(dma_unmap_sg_attrs);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..02fa174fbdec 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -20,6 +20,10 @@ config DMA_OPS
 
 config DMA_OPS_BYPASS
 	bool
 
+# IOMMU driver limited by a DMA window size may switch between bypass and window
+config DMA_OPS_BYPASS_BUS_LIMIT
+	bool
+
 config NEED_SG_DMA_LENGTH
 	bool

From patchwork Tue Oct 27 10:18:41 2020
X-Patchwork-Submitter: Alexey Kardashevskiy <aik@ozlabs.ru>
X-Patchwork-Id: 1388404
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy, iommu@lists.linux-foundation.org,
 Christoph Hellwig, linux-kernel@vger.kernel.org
Subject: [PATCH kernel v2 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present
Date: Tue, 27 Oct 2020 21:18:41 +1100
Message-Id: <20201027101841.96056-3-aik@ozlabs.ru>
In-Reply-To: <20201027101841.96056-1-aik@ozlabs.ru>
References: <20201027101841.96056-1-aik@ozlabs.ru>
X-Mailer: git-send-email 2.17.1

So far we have been using huge DMA windows to map all the available RAM.
RAM is normally mapped to the VM address space contiguously, and there is
always a reasonable upper limit for possible future hot-plugged RAM, which
makes it easy to map all RAM via the IOMMU.

Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike
normal RAM) can be mapped anywhere in the VM space beyond the maximum RAM
size. Since it can be used for DMA, it requires extending the huge window
up to MAX_PHYSMEM_BITS, which in turn requires hypervisor support for:
1. huge TCE tables;
2. multilevel TCE tables;
3. huge IOMMU pages.

Certain hypervisors cannot do any of these, so the only option left is
restricting the huge DMA window to include only RAM and falling back to
the default DMA window for persistent memory.

This checks whether the system has persistent memory. If it does not,
the DMA bypass mode is selected, i.e.
* dev->bus_dma_limit = 0
* dev->dma_ops_bypass = true <- this avoids calling dma_ops for mapping.
If there is such memory, this creates an identity mapping only for RAM and
sets dev->bus_dma_limit to let the generic code decide whether to call into
the direct DMA or the indirect DMA ops.

This should not change the existing behaviour when there is no persistent
memory, as dev->dma_ops_bypass is expected to be set.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/dma-iommu.c        | 12 +++++--
 arch/powerpc/platforms/pseries/iommu.c | 44 ++++++++++++++++++++------
 arch/powerpc/Kconfig                   |  1 +
 3 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index a1c744194018..d123b7205f76 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -90,8 +90,16 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 	struct iommu_table *tbl = get_iommu_table_base(dev);
 
 	if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
-		dev->dma_ops_bypass = true;
-		dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
+		/*
+		 * dma_iommu_bypass_supported() sets dma_max when there is
+		 * 1:1 mapping but it is somehow limited.
+		 * ibm,pmemory is one example.
+		 */
+		dev->dma_ops_bypass = dev->bus_dma_limit == 0;
+		if (!dev->dma_ops_bypass)
+			dev_warn(dev, "iommu: 64-bit OK but using default ops\n");
+		else
+			dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
 		return 1;
 	}
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e4198700ed1a..91112e748491 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -839,7 +839,7 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
 			np, ret);
 }
 
-static u64 find_existing_ddw(struct device_node *pdn)
+static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
 {
 	struct direct_window *window;
 	const struct dynamic_dma_window_prop *direct64;
@@ -851,6 +851,7 @@ static u64 find_existing_ddw(struct device_node *pdn)
 		if (window->device == pdn) {
 			direct64 = window->prop;
 			dma_addr = be64_to_cpu(direct64->dma_base);
+			*window_shift = be32_to_cpu(direct64->window_shift);
 			break;
 		}
 	}
@@ -1111,11 +1112,13 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
  */
 static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 {
-	int len, ret;
+	int len = 0, ret;
+	bool pmem_present = of_find_node_by_type(NULL, "ibm,pmemory") != NULL;
+	int max_ram_len = order_base_2(ddw_memory_hotplug_max());
 	struct ddw_query_response query;
 	struct ddw_create_response create;
 	int page_shift;
-	u64 dma_addr, max_addr;
+	u64 dma_addr;
 	struct device_node *dn;
 	u32 ddw_avail[DDW_APPLICABLE_SIZE];
 	struct direct_window *window;
@@ -1126,7 +1129,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 
 	mutex_lock(&direct_window_init_mutex);
 
-	dma_addr = find_existing_ddw(pdn);
+	dma_addr = find_existing_ddw(pdn, &len);
 	if (dma_addr != 0)
 		goto out_unlock;
 
@@ -1212,14 +1215,26 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	}
 	/* verify the window * number of ptes will map the partition */
 	/* check largest block * page size > max memory hotplug addr */
-	max_addr = ddw_memory_hotplug_max();
-	if (query.largest_available_block < (max_addr >> page_shift)) {
-		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
-			  "%llu-sized pages\n", max_addr, query.largest_available_block,
-			  1ULL << page_shift);
+	/*
+	 * The "ibm,pmemory" can appear anywhere in the address space.
+	 * Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
+	 * for the upper limit and fallback to max RAM otherwise but this
+	 * disables device::dma_ops_bypass.
+	 */
+	len = max_ram_len;
+	if (pmem_present) {
+		if (query.largest_available_block >=
+		    (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
+			len = MAX_PHYSMEM_BITS - page_shift;
+		else
+			dev_info(&dev->dev, "Skipping ibm,pmemory");
+	}
+
+	if (query.largest_available_block < (1ULL << (len - page_shift))) {
+		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu %llu-sized pages\n",
+			1ULL << len, query.largest_available_block, 1ULL << page_shift);
 		goto out_failed;
 	}
-	len = order_base_2(max_addr);
 	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
 	if (!win64) {
 		dev_info(&dev->dev,
@@ -1299,6 +1314,15 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 
 out_unlock:
 	mutex_unlock(&direct_window_init_mutex);
+
+	/*
+	 * If we have persistent memory and the window size is only as big
+	 * as RAM, then we failed to create a window to cover persistent
+	 * memory and need to set the DMA limit.
+	 */
+	if (pmem_present && dma_addr && (len == max_ram_len))
+		dev->dev.bus_dma_limit = dma_addr + (1ULL << len);
+
 	return dma_addr;
 }
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9f13fe08492..5a8881bf140e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -159,6 +159,7 @@ config PPC
 	select DCACHE_WORD_ACCESS		if PPC64 && CPU_LITTLE_ENDIAN
 	select DMA_OPS				if PPC64
 	select DMA_OPS_BYPASS			if PPC64
+	select DMA_OPS_BYPASS_BUS_LIMIT		if PPC64 && PPC_PSERIES
 	select DYNAMIC_FTRACE			if FUNCTION_TRACER
 	select EDAC_ATOMIC_SCRUB
 	select EDAC_SUPPORT
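
[Editor's note: as a closing illustration, here is a rough userspace model of
the sizing decision in enable_ddw() and the resulting bus_dma_limit. It
deliberately simplifies the patch's bookkeeping (len below is simply log2 of
the window size in bytes), and every constant (MAX_PHYSMEM_BITS, page shift,
TCE count, RAM size, window base) is an invented sample value, not something
taken from real firmware.]

/* Userspace model of the RAM-only fallback (illustration only). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PHYSMEM_BITS 51	/* sample value; arch/hypervisor dependent */

int main(void)
{
	uint64_t largest_available_block = 1ULL << 31;	/* TCEs the hypervisor offers */
	int page_shift = 16;		/* 64K IOMMU pages */
	bool pmem_present = true;	/* "ibm,pmemory" found in the FDT */
	int max_ram_len = 36;		/* log2 of 64GiB of maximum RAM */
	uint64_t dma_addr = 1ULL << 59;	/* sample DDW window base */
	int len = max_ram_len;

	/*
	 * Covering all of MAX_PHYSMEM_BITS needs one TCE per IOMMU page;
	 * only then can the huge window include persistent memory.
	 */
	if (pmem_present) {
		if (largest_available_block >=
		    (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
			len = MAX_PHYSMEM_BITS;
		else
			printf("Skipping ibm,pmemory\n");
	}

	if (pmem_present && len == max_ram_len) {
		/* RAM-only window: record the limit so the generic code
		 * falls back to dma_ops for anything above it. */
		uint64_t bus_dma_limit = dma_addr + (1ULL << len);

		printf("bus_dma_limit = 0x%llx\n",
		       (unsigned long long)bus_dma_limit);
	} else {
		printf("full bypass, dma_ops_bypass stays enabled\n");
	}
	return 0;
}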