From patchwork Wed May  3 20:33:08 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Manoj Iyer
X-Patchwork-Id: 758195
From: Manoj Iyer
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 5/6] iommu/dma: Plumb in the per-CPU IOVA caches
Date: Wed, 3 May 2017 15:33:08 -0500
Message-Id: <20170503203308.32491-1-manoj.iyer@canonical.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170503203150.32261-1-manoj.iyer@canonical.com>
References: <20170503203150.32261-1-manoj.iyer@canonical.com>
List-Id: Kernel team discussions
Sender: kernel-team-bounces@lists.ubuntu.com

From: Robin Murphy

With IOVA allocation suitably tidied up, we are finally free to opt in
to the per-CPU caching mechanism. The caching alone can provide a modest
improvement over walking the rbtree for weedier systems (iperf3 shows
~10% more ethernet throughput on an ARM Juno r1 constrained to a single
650MHz Cortex-A53), but the real gain will be in sidestepping the rbtree
lock contention which larger ARM-based systems with lots of parallel I/O
are starting to feel the pain of.
BugLink: http://bugs.launchpad.net/bugs/1680549

Reviewed-by: Nate Watterson
Tested-by: Nate Watterson
Signed-off-by: Robin Murphy
Signed-off-by: Joerg Roedel
(cherry picked from commit bb65a64c7285e7105c1a6c8a33b37770343a4e96)
Signed-off-by: Manoj Iyer
---
 drivers/iommu/dma-iommu.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a35df77e7d5a..dce6b17f2c9e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -276,8 +276,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 {
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
-	unsigned long shift, iova_len;
-	struct iova *iova = NULL;
+	unsigned long shift, iova_len, iova = 0;
 
 	if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
 		cookie->msi_iova += size;
@@ -286,41 +285,39 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 
 	shift = iova_shift(iovad);
 	iova_len = size >> shift;
+	/*
+	 * Freeing non-power-of-two-sized allocations back into the IOVA caches
+	 * will come back to bite us badly, so we have to waste a bit of space
+	 * rounding up anything cacheable to make sure that can't happen. The
+	 * order of the unadjusted size will still match upon freeing.
+	 */
+	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
+		iova_len = roundup_pow_of_two(iova_len);
 
 	if (domain->geometry.force_aperture)
 		dma_limit = min(dma_limit, domain->geometry.aperture_end);
 
 	/* Try to get PCI devices a SAC address */
 	if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
-		iova = alloc_iova(iovad, iova_len, DMA_BIT_MASK(32) >> shift,
-				  true);
-	/*
-	 * Enforce size-alignment to be safe - there could perhaps be an
-	 * attribute to control this per-device, or at least per-domain...
-	 */
+		iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);
+
 	if (!iova)
-		iova = alloc_iova(iovad, iova_len, dma_limit >> shift, true);
+		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
 
-	return (dma_addr_t)iova->pfn_lo << shift;
+	return (dma_addr_t)iova << shift;
 }
 
 static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
 		dma_addr_t iova, size_t size)
 {
 	struct iova_domain *iovad = &cookie->iovad;
-	struct iova *iova_rbnode;
+	unsigned long shift = iova_shift(iovad);
 
 	/* The MSI case is only ever cleaning up its most recent allocation */
-	if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
+	if (cookie->type == IOMMU_DMA_MSI_COOKIE)
 		cookie->msi_iova -= size;
-		return;
-	}
-
-	iova_rbnode = find_iova(iovad, iova_pfn(iovad, iova));
-	if (WARN_ON(!iova_rbnode))
-		return;
-
-	__free_iova(iovad, iova_rbnode);
+	else
+		free_iova_fast(iovad, iova >> shift, size >> shift);
 }
 
 static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr,
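
A note on the rounding the second hunk introduces: any request smaller than
1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1) IOVA pages is padded up to the next power
of two, so that the eventual free resolves to the same size order and therefore
the same per-CPU cache bucket as the allocation. The snippet below is a minimal
user-space sketch of that rule, not kernel code: it assumes
IOVA_RANGE_CACHE_MAX_SIZE is 6 (its value in include/linux/iova.h at the time),
and cacheable_iova_len() plus the local roundup_pow_of_two() are illustrative
stand-ins rather than kernel APIs.

/*
 * User-space sketch (not kernel code) of the rounding rule added to
 * iommu_dma_alloc_iova(). Assumes IOVA_RANGE_CACHE_MAX_SIZE == 6.
 */
#include <stdio.h>

#define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */

/* Simple stand-in for the kernel's roundup_pow_of_two() */
static unsigned long roundup_pow_of_two(unsigned long n)
{
	unsigned long v = 1;

	while (v < n)
		v <<= 1;
	return v;
}

/* Mirrors the two lines added to iommu_dma_alloc_iova() above */
static unsigned long cacheable_iova_len(unsigned long iova_len)
{
	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
		iova_len = roundup_pow_of_two(iova_len);
	return iova_len;
}

int main(void)
{
	unsigned long lens[] = { 1, 3, 5, 16, 31, 33, 100 };
	unsigned int i;

	for (i = 0; i < sizeof(lens) / sizeof(lens[0]); i++)
		printf("requested %3lu pages -> allocated %3lu pages\n",
		       lens[i], cacheable_iova_len(lens[i]));
	return 0;
}

A 5-page request, for instance, is allocated as 8 pages; because the caches
match on size order rather than exact length, the later free of the unadjusted
5-page size still lands in the same bucket, which is what the "order of the
unadjusted size will still match upon freeing" remark in the new comment refers
to. Requests of 32 pages or more are left unrounded.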