From patchwork Wed Jul 25 09:50:30 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexey Kardashevskiy
X-Patchwork-Id: 949097
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])
 (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id 41b9WW1hw2z9rxx
 for ; Wed, 25 Jul 2018 19:52:39 +1000 (AEST)
Authentication-Results: ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])
 by lists.ozlabs.org (Postfix) with ESMTP id 41b9WW0465zDrTH
 for ; Wed, 25 Jul 2018 19:52:39 +1000 (AEST)
Authentication-Results: lists.ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
X-Original-To: linuxppc-dev@lists.ozlabs.org
Delivered-To: linuxppc-dev@lists.ozlabs.org
Authentication-Results: lists.ozlabs.org;
 spf=pass (mailfrom) smtp.mailfrom=ozlabs.ru
 (client-ip=107.173.13.209; helo=ozlabs.ru; envelope-from=aik@ozlabs.ru; receiver=)
Authentication-Results: lists.ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
Received: from ozlabs.ru (unknown [107.173.13.209])
 by lists.ozlabs.org (Postfix) with ESMTP id 41b9TP1NmnzDrHN
 for ; Wed, 25 Jul 2018 19:50:47 +1000 (AEST)
Received: from vpl1.ozlabs.ibm.com (localhost [IPv6:::1])
 by ozlabs.ru (Postfix) with ESMTP id 054A8AE80037;
 Wed, 25 Jul 2018 05:49:23 -0400 (EDT)
From: Alexey Kardashevskiy
To: linuxppc-dev@lists.ozlabs.org
Subject: [PATCH kernel RFC 1/3] powerpc/pseries/iommu: Allow dynamic window to start from zero
Date: Wed, 25 Jul 2018 19:50:30 +1000
Message-Id: <20180725095032.2196-2-aik@ozlabs.ru>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20180725095032.2196-1-aik@ozlabs.ru>
References: <20180725095032.2196-1-aik@ozlabs.ru>
X-BeenThere: linuxppc-dev@lists.ozlabs.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alexey Kardashevskiy, kvm-ppc@vger.kernel.org, David Gibson
Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org
Sender: "Linuxppc-dev"

At the moment the kernel does not expect dynamic windows to ever start
at zero on a PCI bus, as PAPR requires the hypervisor to create a 32bit
default window which starts from zero, and the pseries kernel only
creates additional windows.

However, PAPR permits removing the default window and creating another
one instead, starting from zero as well. In fact, the kernel used to
remove the default window after sha1 25ebc45b934, but this was reverted
later.

Since there are devices capable of more than 32 bits for DMA but less
than 50, and currently available hardware allows the second window only
at 1<<59, we will need to be able to create bigger windows starting from
zero.

This does the initial preparation and should not cause any behavioral
changes.
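The preparation boils down to one idea: once zero is a legitimate window
offset, "no window found" needs a value that can never be a real offset.
A minimal userspace sketch of that idea follows. It reuses the
DDW_INVALID_OFFSET definition from the patch; struct window and
find_window() are hypothetical stand-ins for illustration and are not
the kernel's direct_window list or find_existing_ddw().

```c
#include <stddef.h>
#include <stdint.h>

/* Same definition as in the patch: all-ones marks "no window". */
#define DDW_INVALID_OFFSET ((uint64_t)-1)

/* Hypothetical stand-in for the kernel's list of created windows. */
struct window {
	int pe;              /* hypothetical PE identifier */
	uint64_t dma_offset; /* bus address where the window starts */
};

/*
 * Return the window offset recorded for a PE, or DDW_INVALID_OFFSET if
 * there is none. An offset of 0 is a valid result, so callers have to
 * compare against the sentinel rather than test for non-zero.
 */
static uint64_t find_window(const struct window *tbl, size_t n, int pe)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].pe == pe)
			return tbl[i].dma_offset;
	return DDW_INVALID_OFFSET;
}
```

With a "!= 0" test, a PE whose window starts at zero would be treated as
having no window at all; comparing against the sentinel avoids that.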
Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
---
 arch/powerpc/platforms/pseries/iommu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 06f0296..9ece42f 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,6 +53,8 @@
 
 #include "pseries.h"
 
+#define DDW_INVALID_OFFSET ((uint64_t)-1)
+
 static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 {
 	struct iommu_table_group *table_group;
@@ -844,7 +846,7 @@ static u64 find_existing_ddw(struct device_node *pdn)
 {
 	struct direct_window *window;
 	const struct dynamic_dma_window_prop *direct64;
-	u64 dma_addr = 0;
+	u64 dma_addr = DDW_INVALID_OFFSET;
 
 	spin_lock(&direct_window_list_lock);
 	/* check if we already created a window and dupe that config if so */
@@ -992,7 +994,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	mutex_lock(&direct_window_init_mutex);
 
 	dma_addr = find_existing_ddw(pdn);
-	if (dma_addr != 0)
+	if (dma_addr != DDW_INVALID_OFFSET)
 		goto out_unlock;
 
 	/*
@@ -1228,7 +1230,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
 		}
 		if (pdn && PCI_DN(pdn)) {
 			dma_offset = enable_ddw(pdev, pdn);
-			if (dma_offset != 0) {
+			if (dma_offset != DDW_INVALID_OFFSET) {
 				dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
 				set_dma_offset(dev, dma_offset);
 				set_dma_ops(dev, &dma_nommu_ops);

From patchwork Wed Jul 25 09:50:31 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexey Kardashevskiy
X-Patchwork-Id: 949098
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])
 (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id 41b9YV6jmMz9ryn
 for ; Wed, 25 Jul 2018 19:54:22 +1000 (AEST)
Authentication-Results: ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])
 by lists.ozlabs.org (Postfix) with ESMTP id 41b9YV4clszDrS7
 for ; Wed, 25 Jul 2018 19:54:22 +1000 (AEST)
Authentication-Results: lists.ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
X-Original-To: linuxppc-dev@lists.ozlabs.org
Delivered-To: linuxppc-dev@lists.ozlabs.org
Authentication-Results: lists.ozlabs.org;
 spf=pass (mailfrom) smtp.mailfrom=ozlabs.ru
 (client-ip=107.173.13.209; helo=ozlabs.ru; envelope-from=aik@ozlabs.ru; receiver=)
Authentication-Results: lists.ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
Received: from ozlabs.ru (unknown [107.173.13.209])
 by lists.ozlabs.org (Postfix) with ESMTP id 41b9TQ2k9RzDrHr
 for ; Wed, 25 Jul 2018 19:50:50 +1000 (AEST)
Received: from vpl1.ozlabs.ibm.com (localhost [IPv6:::1])
 by ozlabs.ru (Postfix) with ESMTP id 41B47AE801DA;
 Wed, 25 Jul 2018 05:49:26 -0400 (EDT)
From: Alexey Kardashevskiy
To: linuxppc-dev@lists.ozlabs.org
Subject: [PATCH kernel RFC 2/3] powerpc/pseries/iommu: Force default DMA window removal
Date: Wed, 25 Jul 2018 19:50:31 +1000
Message-Id: <20180725095032.2196-3-aik@ozlabs.ru>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20180725095032.2196-1-aik@ozlabs.ru>
References: <20180725095032.2196-1-aik@ozlabs.ru>
X-BeenThere: linuxppc-dev@lists.ozlabs.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alexey Kardashevskiy, kvm-ppc@vger.kernel.org, David Gibson
Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org
Sender: "Linuxppc-dev"

It is quite common for a device to support more than 32bit but less than
64bit for DMA, for example, GPUs often support 42..50bits.
However, the pseries platform only allows a huge DMA window (one which
allows the use of more than 2GB of DMA space) for 64bit-capable devices,
mostly because:

1. we may have 32bit and >32bit devices in the same IOMMU domain and we
cannot place the new big window where the 32bit one is located;

2. the existing hardware only supports the second window at a very high
offset of 1<<59 == 0x0800.0000.0000.0000.

So in order to allow 33..59bit DMA, we have to remove the default DMA
window and place a huge one there instead.

The PAPR spec says that the platform may decide not to use the default
window and remove it using DDW RTAS calls. There are a few possible ways
for the platform to decide:

1. look at the device IDs and decide in advance that such and such
devices are capable of more than 32bit DMA (powernv's sketchy bypass
does something like this - it drops the default window if all devices on
the PE are from the same vendor) - this is not great as it involves
guessing and, unlike sketchy bypass, the GPU case involves 2 vendor ids
and does not scale;

2. advertise 1 available DMA window in the hypervisor via
ibm,query-pe-dma-window so the pseries platform could take it as a clue
that if more bits for DMA are needed, it has to remove the default
window - this is not great as it is an implicit clue rather than a
direct instruction;

3. removing the default DMA window altogether is not really an option as
PAPR mandates its presence at guest boot time;

4. make the hypervisor explicitly tell the guest that the default window
should be removed, so the guest does not have to think hard and can
simply do what is requested; this is what this patch does.

This makes use of the last approach and relies on a new
"qemu,dma-force-remove-default" flag in a vPHB.
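The arithmetic behind "33..59bit DMA needs a window at zero" can be
checked with a small userspace sketch. It only assumes what the text
above states (a second window at 1<<59, a window starting at zero as the
alternative); dma_bit_mask() mirrors the kernel's DMA_BIT_MASK() macro,
while window_reachable() and the window sizes used with it are
hypothetical and chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's DMA_BIT_MASK(n), for 1 <= n <= 64. */
static uint64_t dma_bit_mask(unsigned int n)
{
	return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/*
 * True if every bus address in [offset, offset + size) can be generated
 * by a device limited to the given DMA mask. size must be non-zero.
 */
static bool window_reachable(uint64_t mask, uint64_t offset, uint64_t size)
{
	uint64_t last = offset + size - 1;

	/* "last >= offset" rejects ranges that wrap around 64 bits */
	return last >= offset && last <= mask;
}
```

For example, a device with a 48-bit mask cannot reach any window placed
at 1<<59, but it can reach a 1TB window starting at zero, which is the
case this patch is after.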
Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/platforms/pseries/iommu.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9ece42f..840afe5 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -54,6 +54,7 @@
 #include "pseries.h"
 
 #define DDW_INVALID_OFFSET ((uint64_t)-1)
+#define DDW_INVALID_LIOBN ((uint32_t)-1)
 
 static struct iommu_table_group *iommu_pseries_alloc_group(int node)
 {
@@ -977,7 +978,8 @@ static LIST_HEAD(failed_ddw_pdn_list);
  *
  * returns the dma offset for use by dma_set_mask
  */
-static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
+static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn,
+		u32 default_liobn)
 {
 	int len, ret;
 	struct ddw_query_response query;
@@ -1022,6 +1024,16 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	if (ret)
 		goto out_failed;
 
+	/*
+	 * The device tree has a request to force remove the default window,
+	 * do this.
+	 */
+	if (default_liobn != DDW_INVALID_LIOBN && (!ddw_avail[2] ||
+			rtas_call(ddw_avail[2], 1, 1, NULL, default_liobn))) {
+		dev_dbg(&dev->dev, "Could not remove window");
+		goto out_failed;
+	}
+
 	/*
 	 * Query if there is a second window of size to map the
 	 * whole partition.  Query returns number of windows, largest
@@ -1212,7 +1224,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
 	pdev = to_pci_dev(dev);
 
 	/* only attempt to use a new window if 64-bit DMA is requested */
-	if (!disable_ddw && dma_mask == DMA_BIT_MASK(64)) {
+	if (!disable_ddw && dma_mask > DMA_BIT_MASK(32)) {
 		dn = pci_device_to_OF_node(pdev);
 		dev_dbg(dev, "node is %pOF\n", dn);
 
@@ -1229,7 +1241,15 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
 				break;
 		}
 		if (pdn && PCI_DN(pdn)) {
-			dma_offset = enable_ddw(pdev, pdn);
+			u32 flag = 0, liobn = DDW_INVALID_LIOBN;
+			int ret = of_property_read_u32(pdn,
+					"qemu,dma-force-remove-default", &flag);
+
+			if (!ret && flag && dma_window &&
+					dma_mask != DMA_BIT_MASK(64))
+				liobn = be32_to_cpu(dma_window[0]);
+
+			dma_offset = enable_ddw(pdev, pdn, liobn);
 			if (dma_offset != DDW_INVALID_OFFSET) {
 				dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
 				set_dma_offset(dev, dma_offset);

From patchwork Wed Jul 25 09:50:32 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexey Kardashevskiy
X-Patchwork-Id: 949102
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])
 (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id 41b9dY3y96z9rxx
 for ; Wed, 25 Jul 2018 19:57:53 +1000 (AEST)
Authentication-Results: ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])
 by lists.ozlabs.org (Postfix) with ESMTP id 41b9dY2rbVzDrvG
 for ; Wed, 25 Jul 2018 19:57:53 +1000 (AEST)
Authentication-Results: lists.ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
X-Original-To: linuxppc-dev@lists.ozlabs.org
Delivered-To:
 linuxppc-dev@lists.ozlabs.org
Authentication-Results: lists.ozlabs.org;
 spf=pass (mailfrom) smtp.mailfrom=ozlabs.ru
 (client-ip=107.173.13.209; helo=ozlabs.ru; envelope-from=aik@ozlabs.ru; receiver=)
Authentication-Results: lists.ozlabs.org;
 dmarc=none (p=none dis=none) header.from=ozlabs.ru
Received: from ozlabs.ru (unknown [107.173.13.209])
 by lists.ozlabs.org (Postfix) with ESMTP id 41b9TT1zX5zDrHN
 for ; Wed, 25 Jul 2018 19:50:53 +1000 (AEST)
Received: from vpl1.ozlabs.ibm.com (localhost [IPv6:::1])
 by ozlabs.ru (Postfix) with ESMTP id 81C4EAE801DC;
 Wed, 25 Jul 2018 05:49:28 -0400 (EDT)
From: Alexey Kardashevskiy
To: linuxppc-dev@lists.ozlabs.org
Subject: [PATCH kernel RFC 3/3] powerpc/pseries/iommu: Use memory@ nodes in max RAM address calculation
Date: Wed, 25 Jul 2018 19:50:32 +1000
Message-Id: <20180725095032.2196-4-aik@ozlabs.ru>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20180725095032.2196-1-aik@ozlabs.ru>
References: <20180725095032.2196-1-aik@ozlabs.ru>
X-BeenThere: linuxppc-dev@lists.ozlabs.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alexey Kardashevskiy, kvm-ppc@vger.kernel.org, David Gibson
Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org
Sender: "Linuxppc-dev"

We might have memory@ nodes with "linux,usable-memory" set to zero (for
example, to replicate powernv's behaviour for GPU coherent memory),
which means that the memory needs extra initialization. Since it can be
used afterwards, the pseries platform will try mapping it for DMA, so
the DMA window needs to cover those memory regions too.

This walks through the memory nodes to find the highest RAM address so
that a huge DMA window covers it too in case this memory gets onlined
later.

The existing memory_hotplug_max() does not do the job as it calls:

1. memblock_end_of_DRAM(), which looks at memory blocks, and GPU RAM is
not there because of size==0 in the linux,usable-memory property of the
memory node;

2. hot_add_drconf_memory_max(), which does not support sparse memory: if
we want to map this memory in the guest where it is mapped on the host
(and it looks like we have to), the drconf chunk easily gets bigger than
a megabyte.

Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/platforms/pseries/iommu.c | 43 +++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 840afe5..74404f8 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -967,6 +967,47 @@ struct failed_ddw_pdn {
 
 static LIST_HEAD(failed_ddw_pdn_list);
 
+static unsigned long read_n_cells(int n, const __be32 **buf)
+{
+	unsigned long result = 0;
+
+	while (n--) {
+		result = (result << 32) | of_read_number(*buf, 1);
+		(*buf)++;
+	}
+	return result;
+}
+
+static phys_addr_t ddw_memory_hotplug_max(void)
+{
+	phys_addr_t max_addr = memory_hotplug_max();
+	struct device_node *memory;
+
+	for_each_node_by_type(memory, "memory") {
+		unsigned long start, size;
+		int ranges, n_mem_addr_cells, n_mem_size_cells, len;
+		const __be32 *memcell_buf;
+
+		memcell_buf = of_get_property(memory, "reg", &len);
+		if (!memcell_buf || len <= 0)
+			continue;
+
+		n_mem_addr_cells = of_n_addr_cells(memory);
+		n_mem_size_cells = of_n_size_cells(memory);
+
+		/* ranges in cell */
+		ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
+
+		/* these are order-sensitive, and modify the buffer pointer */
+		start = read_n_cells(n_mem_addr_cells, &memcell_buf);
+		size = read_n_cells(n_mem_size_cells, &memcell_buf);
+
+		max_addr = max_t(phys_addr_t, max_addr, start + size);
+	}
+
+	return max_addr;
+}
+
 /*
  * If the PE supports dynamic dma windows, and there is space for a table
  * that can map all pages in a linear offset, then setup such a table,
@@ -1067,7 +1108,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn,
 	}
 	/* verify the window * number of ptes will map the partition */
 	/* check largest block * page size > max memory hotplug addr */
-	max_addr = memory_hotplug_max();
+	max_addr = ddw_memory_hotplug_max();
 	if (query.largest_available_block < (max_addr >> page_shift)) {
 		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
 			  "%llu-sized pages\n", max_addr, query.largest_available_block,
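
The first hunk decodes (start, size) pairs from a memory node's "reg"
property. As posted, it computes `ranges` but appears to decode only the
first pair of each node. The following userspace sketch shows the same
decoding extended to every pair, under stated assumptions: the property
is passed in as a raw big-endian byte blob instead of a __be32 pointer,
and max_reg_end() is a hypothetical name, not a kernel function. It
illustrates the cell arithmetic and is not a proposed replacement for
the hunk.

```c
#include <stddef.h>
#include <stdint.h>

/* Read n big-endian 32-bit cells from *buf and advance the pointer. */
static uint64_t read_n_cells(int n, const uint8_t **buf)
{
	uint64_t result = 0;

	while (n--) {
		const uint8_t *p = *buf;
		uint32_t cell = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
				((uint32_t)p[2] << 8) | (uint32_t)p[3];

		result = (result << 32) | cell;
		*buf += 4;
	}
	return result;
}

/*
 * Hypothetical helper: return the larger of max_addr and the highest
 * end address over all (start, size) pairs in a "reg"-style blob of
 * len bytes.
 */
static uint64_t max_reg_end(const uint8_t *reg, size_t len,
			    int addr_cells, int size_cells, uint64_t max_addr)
{
	size_t ranges = (len / 4) / (size_t)(addr_cells + size_cells);

	while (ranges--) {
		/* order-sensitive: both calls advance the buffer pointer */
		uint64_t start = read_n_cells(addr_cells, &reg);
		uint64_t size = read_n_cells(size_cells, &reg);

		if (start + size > max_addr)
			max_addr = start + size;
	}
	return max_addr;
}
```

With two address cells and two size cells, each pair occupies 16 bytes,
so a 32-byte property describes two ranges and both contribute to the
maximum.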