From patchwork Wed May  3 20:33:08 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Manoj Iyer
X-Patchwork-Id: 758195
From: Manoj Iyer
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 5/6] iommu/dma: Plumb in the per-CPU IOVA caches
Date: Wed, 3 May 2017 15:33:08 -0500
Message-Id: <20170503203308.32491-1-manoj.iyer@canonical.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170503203150.32261-1-manoj.iyer@canonical.com>
References: <20170503203150.32261-1-manoj.iyer@canonical.com>
List-Id: Kernel team discussions
Sender: kernel-team-bounces@lists.ubuntu.com

From: Robin Murphy

With IOVA allocation suitably tidied up, we are finally free to opt in
to the per-CPU caching mechanism. The caching alone can provide a modest
improvement over walking the rbtree for weedier systems (iperf3 shows
~10% more ethernet throughput on an ARM Juno r1 constrained to a single
650MHz Cortex-A53), but the real gain will be in sidestepping the rbtree
lock contention which larger ARM-based systems with lots of parallel I/O
are starting to feel the pain of.
BugLink: http://bugs.launchpad.net/bugs/1680549

Reviewed-by: Nate Watterson
Tested-by: Nate Watterson
Signed-off-by: Robin Murphy
Signed-off-by: Joerg Roedel
(cherry picked from commit bb65a64c7285e7105c1a6c8a33b37770343a4e96)
Signed-off-by: Manoj Iyer
---
 drivers/iommu/dma-iommu.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a35df77e7d5a..dce6b17f2c9e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -276,8 +276,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 {
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
-	unsigned long shift, iova_len;
-	struct iova *iova = NULL;
+	unsigned long shift, iova_len, iova = 0;
 
 	if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
 		cookie->msi_iova += size;
@@ -286,41 +285,39 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 
 	shift = iova_shift(iovad);
 	iova_len = size >> shift;
+	/*
+	 * Freeing non-power-of-two-sized allocations back into the IOVA caches
+	 * will come back to bite us badly, so we have to waste a bit of space
+	 * rounding up anything cacheable to make sure that can't happen. The
+	 * order of the unadjusted size will still match upon freeing.
+	 */
+	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
+		iova_len = roundup_pow_of_two(iova_len);
 
 	if (domain->geometry.force_aperture)
 		dma_limit = min(dma_limit, domain->geometry.aperture_end);
 
 	/* Try to get PCI devices a SAC address */
 	if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
-		iova = alloc_iova(iovad, iova_len, DMA_BIT_MASK(32) >> shift,
-				  true);
-	/*
-	 * Enforce size-alignment to be safe - there could perhaps be an
-	 * attribute to control this per-device, or at least per-domain...
-	 */
+		iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);
+
 	if (!iova)
-		iova = alloc_iova(iovad, iova_len, dma_limit >> shift, true);
+		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
 
-	return (dma_addr_t)iova->pfn_lo << shift;
+	return (dma_addr_t)iova << shift;
 }
 
 static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
 		dma_addr_t iova, size_t size)
 {
 	struct iova_domain *iovad = &cookie->iovad;
-	struct iova *iova_rbnode;
+	unsigned long shift = iova_shift(iovad);
 
 	/* The MSI case is only ever cleaning up its most recent allocation */
-	if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
+	if (cookie->type == IOMMU_DMA_MSI_COOKIE)
 		cookie->msi_iova -= size;
-		return;
-	}
-
-	iova_rbnode = find_iova(iovad, iova_pfn(iovad, iova));
-	if (WARN_ON(!iova_rbnode))
-		return;
-
-	__free_iova(iovad, iova_rbnode);
+	else
+		free_iova_fast(iovad, iova >> shift, size >> shift);
 }
 
 static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
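
A note on the rounding the second hunk introduces: any request smaller than
1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1) IOVA pages is padded up to the next power
of two, so that the eventual free resolves to the same size order and therefore
the same per-CPU cache bucket as the allocation. The snippet below is a minimal
user-space sketch of that rule, not kernel code: it assumes
IOVA_RANGE_CACHE_MAX_SIZE is 6 (its value in include/linux/iova.h at the time),
and cacheable_iova_len() plus the local roundup_pow_of_two() are illustrative
stand-ins rather than kernel APIs.

/*
 * User-space sketch (not kernel code) of the rounding rule added to
 * iommu_dma_alloc_iova(). Assumes IOVA_RANGE_CACHE_MAX_SIZE == 6.
 */
#include <stdio.h>

#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */

/* Simple stand-in for the kernel's roundup_pow_of_two() */
static unsigned long roundup_pow_of_two(unsigned long n)
{
	unsigned long v = 1;

	while (v < n)
		v <<= 1;
	return v;
}

/* Mirrors the two lines added to iommu_dma_alloc_iova() above */
static unsigned long cacheable_iova_len(unsigned long iova_len)
{
	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
		iova_len = roundup_pow_of_two(iova_len);
	return iova_len;
}

int main(void)
{
	unsigned long lens[] = { 1, 3, 5, 16, 31, 33, 100 };
	unsigned int i;

	for (i = 0; i < sizeof(lens) / sizeof(lens[0]); i++)
		printf("requested %3lu pages -> allocated %3lu pages\n",
		       lens[i], cacheable_iova_len(lens[i]));
	return 0;
}

A 5-page request, for instance, is allocated as 8 pages; because the caches
match on size order rather than exact length, the later free of the unadjusted
5-page size still lands in the same bucket, which is what the "order of the
unadjusted size will still match upon freeing" remark in the new comment refers
to. Requests of 32 pages or more are left unrounded.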