From patchwork Tue Mar 12 18:14:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shivaprasad G Bhat X-Patchwork-Id: 1911308 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=tq59OYoF; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=patchwork.ozlabs.org) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TvMJS2g2gz1yWy for ; Wed, 13 Mar 2024 05:15:36 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=tq59OYoF; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4TvMJR3vDlz3vXK for ; Wed, 13 Mar 2024 05:15:35 +1100 (AEDT) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=tq59OYoF; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=linux.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=sbhat@linux.ibm.com; receiver=lists.ozlabs.org) Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4TvMHX3jwjz3dTx for ; Wed, 13 Mar 2024 05:14:48 +1100 (AEDT) Received: from pps.filterd (m0353728.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 42CI0VUR032729; Tue, 12 Mar 2024 18:14:31 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=pp1; bh=uszZoHUScvfpeeagNwfFGLg8tvqjaAxDDkFYoQOnBbw=; b=tq59OYoFryHBMRfr8jnsahJNUT+fpLCEv1kLdtL7ZBBTiTY0tRQlXwav41eH7nuWrWYp tx+PZKvKYAMrDUiY4TJ0oHmIW7/2Ry84XWSoSTU7MZupFGZW0uBlGHRipxZ9hUxGoMr6 9AilEQY0fvpHT6gAot/2qNdoZZj0Dj1Xnx/6ewfpp0fYn2zEvVrO0ErIFT/KugyG6K0y lXd9W1hluC0e86YAratmluxhs89OAKyHdhJcEIHR2JvD0jzrSrMmuuUWBWFuDMOOlpN8 OkND/bDX8DWSr061/Ozzqutrc7XDI/usciDdnYg4PVnkyS/oZB9oXm8XD14DYwjMLyjZ mA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3wturh8chr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:31 +0000 Received: from m0353728.ppops.net (m0353728.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 42CI6VSX019017; Tue, 12 Mar 2024 18:14:30 GMT Received: from ppma23.wdc07v.mail.ibm.com (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3wturh8chd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:30 +0000 Received: from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1]) by ppma23.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 42CFSEoY020423; Tue, 12 Mar 2024 18:14:29 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3ws3km0m4y-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:29 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 42CIEN3G37945740 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Mar 2024 18:14:25 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 681D920049; Tue, 12 Mar 2024 18:14:23 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B169020040; Tue, 12 Mar 2024 18:14:20 +0000 (GMT) Received: from ltcd48-lp2.aus.stglabs.ibm.com (unknown [9.3.101.175]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 12 Mar 2024 18:14:20 +0000 (GMT) Subject: [RFC PATCH 1/3] powerpc/pseries/iommu: Bring back userspace view for single level TCE tables From: Shivaprasad G Bhat To: tpearson@raptorengineering.com, alex.williamson@redhat.com, linuxppc-dev@lists.ozlabs.org Date: Tue, 12 Mar 2024 13:14:20 -0500 Message-ID: <171026725393.8367.17497620074051138306.stgit@linux.ibm.com> In-Reply-To: <171026724548.8367.8321359354119254395.stgit@linux.ibm.com> References: <171026724548.8367.8321359354119254395.stgit@linux.ibm.com> User-Agent: StGit/1.5 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 5-fhysGSE_e6smelR3AT1kYjih6oe9_C X-Proofpoint-GUID: 7cKbR36YwbMJnTzz3TDT8ElrmbaqRYBj X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-12_11,2024-03-12_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 spamscore=0 clxscore=1015 lowpriorityscore=0 suspectscore=0 priorityscore=1501 phishscore=0 impostorscore=0 adultscore=0 bulkscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2403120138 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: robh@kernel.org, jroedel@suse.de, sbhat@linux.ibm.com, gbatra@linux.vnet.ibm.com, jgg@ziepe.ca, aik@ozlabs.ru, linux-kernel@vger.kernel.org, svaidy@linux.ibm.com, aneesh.kumar@kernel.org, brking@linux.vnet.ibm.com, npiggin@gmail.com, kvm@vger.kernel.org, naveen.n.rao@linux.ibm.com, vaibhav@linux.ibm.com, msuchanek@suse.de, aik@amd.com Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to it_userspace") which implemented the tce indirect levels support for PowerNV ended up removing the single level support which existed by default(generic tce_iommu_userspace_view_alloc/free() calls). On pSeries the TCEs are single level, and the allocation of userspace view is lost with the removal of generic code. The patch attempts to bring it back for pseries on the refactored code base. On pSeries, the windows/tables are "borrowed", so the it_ops->free() is not called during the container detach or the tce release call paths as the table is not really freed. So, decoupling the userspace view array free and alloc from table's it_ops just the way it was before. Signed-off-by: Shivaprasad G Bhat --- arch/powerpc/platforms/pseries/iommu.c | 19 ++++++++++-- drivers/vfio/vfio_iommu_spapr_tce.c | 51 ++++++++++++++++++++++++++++++++ 2 files changed, 67 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index e8c4129697b1..40de8d55faef 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -143,7 +143,7 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index, } -static void tce_free_pSeries(struct iommu_table *tbl, long index, long npages) +static void tce_clear_pSeries(struct iommu_table *tbl, long index, long npages) { __be64 *tcep; @@ -162,6 +162,11 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index) return be64_to_cpu(*tcep); } +static void tce_free_pSeries(struct iommu_table *tbl) +{ + /* Do nothing. */ +} + static void tce_free_pSeriesLP(unsigned long liobn, long, long, long); static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long); @@ -576,7 +581,7 @@ struct iommu_table_ops iommu_table_lpar_multi_ops; struct iommu_table_ops iommu_table_pseries_ops = { .set = tce_build_pSeries, - .clear = tce_free_pSeries, + .clear = tce_clear_pSeries, .get = tce_get_pseries }; @@ -685,15 +690,23 @@ static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned return rc; } + +static __be64 *tce_useraddr_pSeriesLP(struct iommu_table *tbl, long index, + bool __always_unused alloc) +{ + return tbl->it_userspace ? &tbl->it_userspace[index - tbl->it_offset] : NULL; +} #endif struct iommu_table_ops iommu_table_lpar_multi_ops = { .set = tce_buildmulti_pSeriesLP, #ifdef CONFIG_IOMMU_API .xchg_no_kill = tce_exchange_pseries, + .useraddrptr = tce_useraddr_pSeriesLP, #endif .clear = tce_freemulti_pSeriesLP, - .get = tce_get_pSeriesLP + .get = tce_get_pSeriesLP, + .free = tce_free_pSeries }; /* diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index a94ec6225d31..1cf36d687559 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -177,6 +177,50 @@ static long tce_iommu_register_pages(struct tce_container *container, return ret; } +static long tce_iommu_userspace_view_alloc(struct iommu_table *tbl, + struct mm_struct *mm) +{ + unsigned long cb = ALIGN(sizeof(tbl->it_userspace[0]) * + tbl->it_size, PAGE_SIZE); + unsigned long *uas; + long ret; + + if (tbl->it_indirect_levels) + return 0; + + WARN_ON(tbl->it_userspace); + + ret = account_locked_vm(mm, cb >> PAGE_SHIFT, true); + if (ret) + return ret; + + uas = vzalloc(cb); + if (!uas) { + account_locked_vm(mm, cb >> PAGE_SHIFT, false); + return -ENOMEM; + } + tbl->it_userspace = (__be64 *) uas; + + return 0; +} + +static void tce_iommu_userspace_view_free(struct iommu_table *tbl, + struct mm_struct *mm) +{ + unsigned long cb = ALIGN(sizeof(tbl->it_userspace[0]) * + tbl->it_size, PAGE_SIZE); + + if (!tbl->it_userspace) + return; + + if (tbl->it_indirect_levels) + return; + + vfree(tbl->it_userspace); + tbl->it_userspace = NULL; + account_locked_vm(mm, cb >> PAGE_SHIFT, false); +} + static bool tce_page_is_contained(struct mm_struct *mm, unsigned long hpa, unsigned int it_page_shift) { @@ -554,6 +598,12 @@ static long tce_iommu_build_v2(struct tce_container *container, unsigned long hpa; enum dma_data_direction dirtmp; + if (!tbl->it_userspace) { + ret = tce_iommu_userspace_view_alloc(tbl, container->mm); + if (ret) + return ret; + } + for (i = 0; i < pages; ++i) { struct mm_iommu_table_group_mem_t *mem = NULL; __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry + i); @@ -637,6 +687,7 @@ static void tce_iommu_free_table(struct tce_container *container, { unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT; + tce_iommu_userspace_view_free(tbl, container->mm); iommu_tce_table_put(tbl); account_locked_vm(container->mm, pages, false); } From patchwork Tue Mar 12 18:14:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shivaprasad G Bhat X-Patchwork-Id: 1911309 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=daBm+Yut; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=2404:9400:2:0:216:3eff:fee1:b9f1; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=patchwork.ozlabs.org) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2404:9400:2:0:216:3eff:fee1:b9f1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TvMKD3rY4z1yWy for ; Wed, 13 Mar 2024 05:16:16 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=daBm+Yut; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4TvMKD1cXJz3d2m for ; Wed, 13 Mar 2024 05:16:16 +1100 (AEDT) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=daBm+Yut; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=linux.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=sbhat@linux.ibm.com; receiver=lists.ozlabs.org) Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4TvMHg35knz3dVb for ; Wed, 13 Mar 2024 05:14:55 +1100 (AEDT) Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 42CI2EHB012168; Tue, 12 Mar 2024 18:14:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=pp1; bh=trXzCuOVdv09a9VzXDn9FwK37IqxmCmGzyWTin/u/q0=; b=daBm+Yut79BtK2Fi6tIXYKy23LM4/kMpGhBDDoVSqseqHCuKiFzMYJndfbVtVv5oKqgh cakwTcV73PraOWbpXMQFznDk41NhcN/humZlDeQsiEOe5dBrHTjTAoB1rGa6iiyBy8VU qStk/9n4xK3GKrJNILnbXQGie4mg1NYUGGuovI61BJDWUR5tRR77m8W3iQesZu0y3VyK xR07uOS+bTUROdwDZ/3a8WlBcrUmuWf+H8z04lNrjjMGSnLbBeSOBVOhuD2VBnEnSdKe PtmWSAQLcNw3ClAs6hcXHeTm1m4F0nlzJHYNV6bgKMHLDqvjEgSHdLwYEz9NdIh8skkU sg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3wtuyj06h2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:42 +0000 Received: from m0356517.ppops.net (m0356517.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 42CI3JB2014895; Tue, 12 Mar 2024 18:14:42 GMT Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3wtuyj06gq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:41 +0000 Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1]) by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 42CGwNGW018552; Tue, 12 Mar 2024 18:14:40 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3ws4t208wx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:40 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 42CIEZPE35389710 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Mar 2024 18:14:37 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8A4CB20040; Tue, 12 Mar 2024 18:14:35 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 86F4520049; Tue, 12 Mar 2024 18:14:32 +0000 (GMT) Received: from ltcd48-lp2.aus.stglabs.ibm.com (unknown [9.3.101.175]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 12 Mar 2024 18:14:32 +0000 (GMT) Subject: [RFC PATCH 2/3] powerpc/iommu: Move pSeries specific functions to pseries/iommu.c From: Shivaprasad G Bhat To: tpearson@raptorengineering.com, alex.williamson@redhat.com, linuxppc-dev@lists.ozlabs.org Date: Tue, 12 Mar 2024 13:14:31 -0500 Message-ID: <171026726856.8367.17227042474134236958.stgit@linux.ibm.com> In-Reply-To: <171026724548.8367.8321359354119254395.stgit@linux.ibm.com> References: <171026724548.8367.8321359354119254395.stgit@linux.ibm.com> User-Agent: StGit/1.5 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: urm6KnyZBseyjTaqwoPSkS8Lh48vfFgS X-Proofpoint-ORIG-GUID: KkuFWV4ed1u9IsVbUSQjPU2tdccDWXib X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-12_11,2024-03-12_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 clxscore=1015 spamscore=0 lowpriorityscore=0 mlxscore=0 adultscore=0 impostorscore=0 malwarescore=0 suspectscore=0 phishscore=0 mlxlogscore=999 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2403120138 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: robh@kernel.org, jroedel@suse.de, sbhat@linux.ibm.com, gbatra@linux.vnet.ibm.com, jgg@ziepe.ca, aik@ozlabs.ru, linux-kernel@vger.kernel.org, svaidy@linux.ibm.com, aneesh.kumar@kernel.org, brking@linux.vnet.ibm.com, npiggin@gmail.com, kvm@vger.kernel.org, naveen.n.rao@linux.ibm.com, vaibhav@linux.ibm.com, msuchanek@suse.de, aik@amd.com Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" The PowerNV specific table_group_ops are defined in powernv/pci-ioda.c. The pSeries specific table_group_ops are sitting in the generic powerpc file. Move it to where it actually belong(pseries/iommu.c). Only code movement, no functional changes intended. Signed-off-by: Shivaprasad G Bhat --- arch/powerpc/include/asm/iommu.h | 4 + arch/powerpc/kernel/iommu.c | 149 -------------------------------- arch/powerpc/platforms/pseries/iommu.c | 145 +++++++++++++++++++++++++++++++ 3 files changed, 150 insertions(+), 148 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 026695943550..744cc5fc22d3 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -156,6 +156,9 @@ extern int iommu_tce_table_put(struct iommu_table *tbl); extern struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid, unsigned long res_start, unsigned long res_end); bool iommu_table_in_use(struct iommu_table *tbl); +extern void iommu_table_reserve_pages(struct iommu_table *tbl, + unsigned long res_start, unsigned long res_end); +extern void iommu_table_clear(struct iommu_table *tbl); #define IOMMU_TABLE_GROUP_MAX_TABLES 2 @@ -218,7 +221,6 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm, extern void iommu_tce_kill(struct iommu_table *tbl, unsigned long entry, unsigned long pages); -extern struct iommu_table_group_ops spapr_tce_table_group_ops; #else static inline void iommu_register_group(struct iommu_table_group *table_group, int pci_domain_number, diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 1185efebf032..aa11b2acf24f 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -642,7 +642,7 @@ void ppc_iommu_unmap_sg(struct iommu_table *tbl, struct scatterlist *sglist, tbl->it_ops->flush(tbl); } -static void iommu_table_clear(struct iommu_table *tbl) +void iommu_table_clear(struct iommu_table *tbl) { /* * In case of firmware assisted dump system goes through clean @@ -683,7 +683,7 @@ static void iommu_table_clear(struct iommu_table *tbl) #endif } -static void iommu_table_reserve_pages(struct iommu_table *tbl, +void iommu_table_reserve_pages(struct iommu_table *tbl, unsigned long res_start, unsigned long res_end) { int i; @@ -1101,59 +1101,6 @@ void iommu_tce_kill(struct iommu_table *tbl, } EXPORT_SYMBOL_GPL(iommu_tce_kill); -#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) -static int iommu_take_ownership(struct iommu_table *tbl) -{ - unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; - int ret = 0; - - /* - * VFIO does not control TCE entries allocation and the guest - * can write new TCEs on top of existing ones so iommu_tce_build() - * must be able to release old pages. This functionality - * requires exchange() callback defined so if it is not - * implemented, we disallow taking ownership over the table. - */ - if (!tbl->it_ops->xchg_no_kill) - return -EINVAL; - - spin_lock_irqsave(&tbl->large_pool.lock, flags); - for (i = 0; i < tbl->nr_pools; i++) - spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock); - - if (iommu_table_in_use(tbl)) { - pr_err("iommu_tce: it_map is not empty"); - ret = -EBUSY; - } else { - memset(tbl->it_map, 0xff, sz); - } - - for (i = 0; i < tbl->nr_pools; i++) - spin_unlock(&tbl->pools[i].lock); - spin_unlock_irqrestore(&tbl->large_pool.lock, flags); - - return ret; -} - -static void iommu_release_ownership(struct iommu_table *tbl) -{ - unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; - - spin_lock_irqsave(&tbl->large_pool.lock, flags); - for (i = 0; i < tbl->nr_pools; i++) - spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock); - - memset(tbl->it_map, 0, sz); - - iommu_table_reserve_pages(tbl, tbl->it_reserved_start, - tbl->it_reserved_end); - - for (i = 0; i < tbl->nr_pools; i++) - spin_unlock(&tbl->pools[i].lock); - spin_unlock_irqrestore(&tbl->large_pool.lock, flags); -} -#endif - int iommu_add_device(struct iommu_table_group *table_group, struct device *dev) { /* @@ -1185,98 +1132,6 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev) EXPORT_SYMBOL_GPL(iommu_add_device); #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) -/* - * A simple iommu_table_group_ops which only allows reusing the existing - * iommu_table. This handles VFIO for POWER7 or the nested KVM. - * The ops does not allow creating windows and only allows reusing the existing - * one if it matches table_group->tce32_start/tce32_size/page_shift. - */ -static unsigned long spapr_tce_get_table_size(__u32 page_shift, - __u64 window_size, __u32 levels) -{ - unsigned long size; - - if (levels > 1) - return ~0U; - size = window_size >> (page_shift - 3); - return size; -} - -static long spapr_tce_create_table(struct iommu_table_group *table_group, int num, - __u32 page_shift, __u64 window_size, __u32 levels, - struct iommu_table **ptbl) -{ - struct iommu_table *tbl = table_group->tables[0]; - - if (num > 0) - return -EPERM; - - if (tbl->it_page_shift != page_shift || - tbl->it_size != (window_size >> page_shift) || - tbl->it_indirect_levels != levels - 1) - return -EINVAL; - - *ptbl = iommu_tce_table_get(tbl); - return 0; -} - -static long spapr_tce_set_window(struct iommu_table_group *table_group, - int num, struct iommu_table *tbl) -{ - return tbl == table_group->tables[num] ? 0 : -EPERM; -} - -static long spapr_tce_unset_window(struct iommu_table_group *table_group, int num) -{ - return 0; -} - -static long spapr_tce_take_ownership(struct iommu_table_group *table_group) -{ - int i, j, rc = 0; - - for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { - struct iommu_table *tbl = table_group->tables[i]; - - if (!tbl || !tbl->it_map) - continue; - - rc = iommu_take_ownership(tbl); - if (!rc) - continue; - - for (j = 0; j < i; ++j) - iommu_release_ownership(table_group->tables[j]); - return rc; - } - return 0; -} - -static void spapr_tce_release_ownership(struct iommu_table_group *table_group) -{ - int i; - - for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { - struct iommu_table *tbl = table_group->tables[i]; - - if (!tbl) - continue; - - iommu_table_clear(tbl); - if (tbl->it_map) - iommu_release_ownership(tbl); - } -} - -struct iommu_table_group_ops spapr_tce_table_group_ops = { - .get_table_size = spapr_tce_get_table_size, - .create_table = spapr_tce_create_table, - .set_window = spapr_tce_set_window, - .unset_window = spapr_tce_unset_window, - .take_ownership = spapr_tce_take_ownership, - .release_ownership = spapr_tce_release_ownership, -}; - /* * A simple iommu_ops to allow less cruft in generic VFIO code. */ diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 40de8d55faef..3d9865dadf73 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -54,6 +54,57 @@ enum { DDW_EXT_QUERY_OUT_SIZE = 2 }; +static int iommu_take_ownership(struct iommu_table *tbl) +{ + unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; + int ret = 0; + + /* + * VFIO does not control TCE entries allocation and the guest + * can write new TCEs on top of existing ones so iommu_tce_build() + * must be able to release old pages. This functionality + * requires exchange() callback defined so if it is not + * implemented, we disallow taking ownership over the table. + */ + if (!tbl->it_ops->xchg_no_kill) + return -EINVAL; + + spin_lock_irqsave(&tbl->large_pool.lock, flags); + for (i = 0; i < tbl->nr_pools; i++) + spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock); + + if (iommu_table_in_use(tbl)) { + pr_err("iommu_tce: it_map is not empty"); + ret = -EBUSY; + } else { + memset(tbl->it_map, 0xff, sz); + } + + for (i = 0; i < tbl->nr_pools; i++) + spin_unlock(&tbl->pools[i].lock); + spin_unlock_irqrestore(&tbl->large_pool.lock, flags); + + return ret; +} + +static void iommu_release_ownership(struct iommu_table *tbl) +{ + unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; + + spin_lock_irqsave(&tbl->large_pool.lock, flags); + for (i = 0; i < tbl->nr_pools; i++) + spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock); + + memset(tbl->it_map, 0, sz); + + iommu_table_reserve_pages(tbl, tbl->it_reserved_start, + tbl->it_reserved_end); + + for (i = 0; i < tbl->nr_pools; i++) + spin_unlock(&tbl->pools[i].lock); + spin_unlock_irqrestore(&tbl->large_pool.lock, flags); +} + static struct iommu_table *iommu_pseries_alloc_table(int node) { struct iommu_table *tbl; @@ -67,6 +118,8 @@ static struct iommu_table *iommu_pseries_alloc_table(int node) return tbl; } +struct iommu_table_group_ops spapr_tce_table_group_ops; + static struct iommu_table_group *iommu_pseries_alloc_group(int node) { struct iommu_table_group *table_group; @@ -1656,6 +1709,98 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask) return false; } +/* + * A simple iommu_table_group_ops which only allows reusing the existing + * iommu_table. This handles VFIO for POWER7 or the nested KVM. + * The ops does not allow creating windows and only allows reusing the existing + * one if it matches table_group->tce32_start/tce32_size/page_shift. + */ +static unsigned long spapr_tce_get_table_size(__u32 page_shift, + __u64 window_size, __u32 levels) +{ + unsigned long size; + + if (levels > 1) + return ~0U; + size = window_size >> (page_shift - 3); + return size; +} + +static long spapr_tce_create_table(struct iommu_table_group *table_group, int num, + __u32 page_shift, __u64 window_size, __u32 levels, + struct iommu_table **ptbl) +{ + struct iommu_table *tbl = table_group->tables[0]; + + if (num > 0) + return -EPERM; + + if (tbl->it_page_shift != page_shift || + tbl->it_size != (window_size >> page_shift) || + tbl->it_indirect_levels != levels - 1) + return -EINVAL; + + *ptbl = iommu_tce_table_get(tbl); + return 0; +} + +static long spapr_tce_set_window(struct iommu_table_group *table_group, + int num, struct iommu_table *tbl) +{ + return tbl == table_group->tables[num] ? 0 : -EPERM; +} + +static long spapr_tce_unset_window(struct iommu_table_group *table_group, int num) +{ + return 0; +} + +static long spapr_tce_take_ownership(struct iommu_table_group *table_group) +{ + int i, j, rc = 0; + + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { + struct iommu_table *tbl = table_group->tables[i]; + + if (!tbl || !tbl->it_map) + continue; + + rc = iommu_take_ownership(tbl); + if (!rc) + continue; + + for (j = 0; j < i; ++j) + iommu_release_ownership(table_group->tables[j]); + return rc; + } + return 0; +} + +static void spapr_tce_release_ownership(struct iommu_table_group *table_group) +{ + int i; + + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { + struct iommu_table *tbl = table_group->tables[i]; + + if (!tbl) + continue; + + iommu_table_clear(tbl); + if (tbl->it_map) + iommu_release_ownership(tbl); + } +} + +struct iommu_table_group_ops spapr_tce_table_group_ops = { + .get_table_size = spapr_tce_get_table_size, + .create_table = spapr_tce_create_table, + .set_window = spapr_tce_set_window, + .unset_window = spapr_tce_unset_window, + .take_ownership = spapr_tce_take_ownership, + .release_ownership = spapr_tce_release_ownership, +}; + static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action, void *data) { From patchwork Tue Mar 12 18:14:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shivaprasad G Bhat X-Patchwork-Id: 1911310 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=YuXtDlQj; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ozlabs.org (client-ip=112.213.38.117; helo=lists.ozlabs.org; envelope-from=linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org; receiver=patchwork.ozlabs.org) Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TvML06Zbmz1yWy for ; Wed, 13 Mar 2024 05:16:56 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=YuXtDlQj; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4TvML05Tykz3vft for ; Wed, 13 Mar 2024 05:16:56 +1100 (AEDT) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=YuXtDlQj; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=linux.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=sbhat@linux.ibm.com; receiver=lists.ozlabs.org) Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4TvMHs64hzz3vX8 for ; Wed, 13 Mar 2024 05:15:05 +1100 (AEDT) Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 42CI2Fq9012214; Tue, 12 Mar 2024 18:14:55 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=pp1; bh=Qpe8OHrLllath+SBvB0Y7CkJtrx77IztT4zCArdCnRI=; b=YuXtDlQjycRKcw63w61kY0tB4xI7DDbswodzhXrWJVD964HLpuoHF1V0EIEAatBoqVT+ mP/od+nKrcnWpT6gZgjl/O68Sfumu2+grjWYT5kb5OcJPWP1aHs/zQOxi+Ri/xzJ+68Q VtbDcnE8Z8auevoKoiabcmijSl2Wdoq3goV0pgQO/VpWPx9i4PKOHXRBs3Qq8R5IRDJn ZMPca4oMFB0EUEUAPW1vOIGRiFRRtB/wk8QaQa5WWqGfwWr1KV2Fkwt8EDh8afwT6XG+ O4l6xHw30ruK0q5GUFcEv1hh5QTnugpEWT8r+p9RtFhk12zB6MUUlOZUeY2Ae1dhSaBK 2Q== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3wtuyj06kw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:55 +0000 Received: from m0356517.ppops.net (m0356517.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 42CI2FVr012225; Tue, 12 Mar 2024 18:14:54 GMT Received: from ppma11.dal12v.mail.ibm.com (db.9e.1632.ip4.static.sl-reverse.com [50.22.158.219]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3wtuyj06km-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:54 +0000 Received: from pps.filterd (ppma11.dal12v.mail.ibm.com [127.0.0.1]) by ppma11.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 42CH0W8C018543; Tue, 12 Mar 2024 18:14:53 GMT Received: from smtprelay02.fra02v.mail.ibm.com ([9.218.2.226]) by ppma11.dal12v.mail.ibm.com (PPS) with ESMTPS id 3ws4t208y1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Mar 2024 18:14:53 +0000 Received: from smtpav07.fra02v.mail.ibm.com (smtpav07.fra02v.mail.ibm.com [10.20.54.106]) by smtprelay02.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 42CIEmCp40304968 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Mar 2024 18:14:50 GMT Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0E2032004B; Tue, 12 Mar 2024 18:14:48 +0000 (GMT) Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BD3AD20040; Tue, 12 Mar 2024 18:14:44 +0000 (GMT) Received: from ltcd48-lp2.aus.stglabs.ibm.com (unknown [9.3.101.175]) by smtpav07.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 12 Mar 2024 18:14:44 +0000 (GMT) Subject: [RFC PATCH 3/3] pseries/iommu: Enable DDW for VFIO TCE create From: Shivaprasad G Bhat To: tpearson@raptorengineering.com, alex.williamson@redhat.com, linuxppc-dev@lists.ozlabs.org Date: Tue, 12 Mar 2024 13:14:44 -0500 Message-ID: <171026728072.8367.13581504605624115205.stgit@linux.ibm.com> In-Reply-To: <171026724548.8367.8321359354119254395.stgit@linux.ibm.com> References: <171026724548.8367.8321359354119254395.stgit@linux.ibm.com> User-Agent: StGit/1.5 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: kRujcfDp-OtMsLWIsKbVrWo8iIV8o-b- X-Proofpoint-ORIG-GUID: eOk7-MbDjtchGKZni5cMFFbyB5H_9r6P X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-12_11,2024-03-12_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 clxscore=1015 spamscore=0 lowpriorityscore=0 mlxscore=0 adultscore=0 impostorscore=0 malwarescore=0 suspectscore=0 phishscore=0 mlxlogscore=999 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2403120138 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: robh@kernel.org, jroedel@suse.de, sbhat@linux.ibm.com, gbatra@linux.vnet.ibm.com, jgg@ziepe.ca, aik@ozlabs.ru, linux-kernel@vger.kernel.org, svaidy@linux.ibm.com, aneesh.kumar@kernel.org, brking@linux.vnet.ibm.com, npiggin@gmail.com, kvm@vger.kernel.org, naveen.n.rao@linux.ibm.com, vaibhav@linux.ibm.com, msuchanek@suse.de, aik@amd.com Errors-To: linuxppc-dev-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" The commit 9d67c9433509 ("powerpc/iommu: Add \"borrowing\" iommu_table_group_ops") implemented the "borrow" mechanism for the pSeries SPAPR TCE. It did implement this support partially that it left out creating the DDW if not present already. The patch here attempts to fix the missing gaps. - Expose the DDW info to user by collecting it during probe. - Create the window and the iommu table if not present during VFIO_SPAPR_TCE_CREATE. - Remove and recreate the window if the pageshift and window sizes do not match. - Restore the original window in enable_ddw() if the user had created/modified the DDW. As there is preference for DIRECT mapping on the host driver side, the user created window is removed. The changes work only for the non-SRIOV-VF scenarios for PEs having 2 DMA windows. Signed-off-by: Shivaprasad G Bhat --- arch/powerpc/include/asm/iommu.h | 3 arch/powerpc/kernel/iommu.c | 7 - arch/powerpc/platforms/pseries/iommu.c | 362 +++++++++++++++++++++++++++++++- 3 files changed, 360 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 744cc5fc22d3..fde174122844 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -110,6 +110,7 @@ struct iommu_table { unsigned long it_page_shift;/* table iommu page size */ struct list_head it_group_list;/* List of iommu_table_group_link */ __be64 *it_userspace; /* userspace view of the table */ + bool reset_ddw; struct iommu_table_ops *it_ops; struct kref it_kref; int it_nid; @@ -169,6 +170,8 @@ struct iommu_table_group_ops { __u32 page_shift, __u64 window_size, __u32 levels); + void (*init_group)(struct iommu_table_group *table_group, + struct device *dev); long (*create_table)(struct iommu_table_group *table_group, int num, __u32 page_shift, diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index aa11b2acf24f..1cce2b8b8f2c 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -740,6 +740,7 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid, return NULL; } + tbl->it_nid = nid; iommu_table_reserve_pages(tbl, res_start, res_end); /* We only split the IOMMU table if we have 1GB or more of space */ @@ -1141,7 +1142,10 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain, { struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_group *grp = iommu_group_get(dev); - struct iommu_table_group *table_group; + struct iommu_table_group *table_group = iommu_group_get_iommudata(grp); + + /* This should have been in spapr_tce_iommu_probe_device() ?*/ + table_group->ops->init_group(table_group, dev); /* At first attach the ownership is already set */ if (!domain) { @@ -1149,7 +1153,6 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain, return 0; } - table_group = iommu_group_get_iommudata(grp); /* * The domain being set to PLATFORM from earlier * BLOCKED. The table_group ownership has to be released. diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 3d9865dadf73..7224107a0f60 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -630,6 +630,62 @@ static void iommu_table_setparms(struct pci_controller *phb, phb->dma_window_base_cur += phb->dma_window_size; } +static int iommu_table_reset(struct iommu_table *tbl, unsigned long busno, + unsigned long liobn, unsigned long win_addr, + unsigned long window_size, unsigned long page_shift, + void *base, struct iommu_table_ops *table_ops) +{ + unsigned long sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long); + unsigned int i, oldsize = tbl->it_size; + struct iommu_pool *p; + + WARN_ON(!tbl->it_ops); + + if (oldsize != (window_size >> page_shift)) { + vfree(tbl->it_map); + + tbl->it_map = vzalloc_node(sz, tbl->it_nid); + if (!tbl->it_map) + return -ENOMEM; + + tbl->it_size = window_size >> page_shift; + if (oldsize < (window_size >> page_shift)) + iommu_table_clear(tbl); + } + tbl->it_busno = busno; + tbl->it_index = liobn; + tbl->it_offset = win_addr >> page_shift; + tbl->it_blocksize = 16; + tbl->it_type = TCE_PCI; + tbl->it_ops = table_ops; + tbl->it_page_shift = page_shift; + tbl->it_base = (unsigned long)base; + + if ((tbl->it_size << tbl->it_page_shift) >= (1UL * 1024 * 1024 * 1024)) + tbl->nr_pools = IOMMU_NR_POOLS; + else + tbl->nr_pools = 1; + + tbl->poolsize = (tbl->it_size * 3 / 4) / tbl->nr_pools; + + for (i = 0; i < tbl->nr_pools; i++) { + p = &tbl->pools[i]; + spin_lock_init(&(p->lock)); + p->start = tbl->poolsize * i; + p->hint = p->start; + p->end = p->start + tbl->poolsize; + } + + p = &tbl->large_pool; + spin_lock_init(&(p->lock)); + p->start = tbl->poolsize * i; + p->hint = p->start; + p->end = tbl->it_size; + return 0; +} + + + struct iommu_table_ops iommu_table_lpar_multi_ops; struct iommu_table_ops iommu_table_pseries_ops = { @@ -1016,8 +1072,8 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_ return 0; } -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift, - bool *direct_mapping) +static bool find_existing_ddw(struct device_node *pdn, u32 *liobn, u64 *dma_addr, + int *window_shift, bool *direct_mapping) { struct dma_win *window; const struct dynamic_dma_window_prop *dma64; @@ -1031,6 +1087,7 @@ static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *windo *dma_addr = be64_to_cpu(dma64->dma_base); *window_shift = be32_to_cpu(dma64->window_shift); *direct_mapping = window->direct; + *liobn = be32_to_cpu(dma64->liobn); found = true; break; } @@ -1315,6 +1372,23 @@ static int iommu_get_page_shift(u32 query_page_size) return ret; } +static __u64 query_page_size_to_mask(u32 query_page_size) +{ + const long shift[] = { + (SZ_4K), (SZ_64K), (SZ_16M), + (SZ_32M), (SZ_64M), (SZ_128M), + (SZ_256M), (SZ_16G), (SZ_2M) + }; + int i, ret = 0; + + for (i = 0; i < ARRAY_SIZE(shift); i++) { + if (query_page_size & (1 << i)) + ret |= shift[i]; + } + + return ret; +} + static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr, u32 page_shift, u32 window_shift) { @@ -1344,6 +1418,9 @@ static struct property *ddw_property_create(const char *propname, u32 liobn, u64 return win64; } +static long remove_dynamic_dma_windows_locked(struct iommu_table_group *table_group, + struct pci_dev *pdev); + /* * If the PE supports dynamic dma windows, and there is space for a table * that can map all pages in a linear offset, then setup such a table, @@ -1373,6 +1450,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn) bool pmem_present; struct pci_dn *pci = PCI_DN(pdn); struct property *default_win = NULL; + u32 liobn; dn = of_find_node_by_type(NULL, "ibm,pmemory"); pmem_present = dn != NULL; @@ -1380,8 +1458,19 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn) mutex_lock(&dma_win_init_mutex); - if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len, &direct_mapping)) - goto out_unlock; + if (find_existing_ddw(pdn, &liobn, &dev->dev.archdata.dma_offset, &len, &direct_mapping)) { + struct iommu_table *tbl = pci->table_group->tables[1]; + + if (direct_mapping || (tbl && !tbl->reset_ddw)) + goto out_unlock; + /* VFIO user created window has custom size/pageshift */ + if (remove_dynamic_dma_windows_locked(pci->table_group, dev)) + goto out_failed; + + iommu_tce_table_put(tbl); + pci->table_group->tables[1] = NULL; + set_iommu_table_base(&dev->dev, pci->table_group->tables[0]); + } /* * If we already went through this for a previous function of @@ -1726,20 +1815,272 @@ static unsigned long spapr_tce_get_table_size(__u32 page_shift, return size; } +static void spapr_tce_init_group(struct iommu_table_group *table_group, struct device *dev) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct device_node *dn, *pdn; + u32 ddw_avail[DDW_APPLICABLE_SIZE]; + struct ddw_query_response query; + int ret; + + if (table_group->max_dynamic_windows_supported > 0) + return; + + /* No need to insitialize for kdump kernel. */ + if (is_kdump_kernel()) + return; + + dn = pci_device_to_OF_node(pdev); + pdn = pci_dma_find(dn, NULL); + if (!pdn || !PCI_DN(pdn)) { + table_group->max_dynamic_windows_supported = -1; + return; + } + + /* TODO: Phyp sets VF default window base at 512PiB offset. Need + * tce32_base set to the global offset and use the start as 0? + */ + ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable", + &ddw_avail[0], DDW_APPLICABLE_SIZE); + if (ret) { + table_group->max_dynamic_windows_supported = -1; + return; + } + + ret = query_ddw(pdev, ddw_avail, &query, pdn); + if (ret) { + dev_err(&pdev->dev, "%s: query_ddw failed\n", __func__); + table_group->max_dynamic_windows_supported = -1; + return; + } + + /* The SRIOV VFs have only 1 window, the default is removed + * before creating the 64-bit window + */ + if (query.windows_available == 0) + table_group->max_dynamic_windows_supported = 1; + else + table_group->max_dynamic_windows_supported = 2; + + table_group->max_levels = 1; + table_group->pgsizes |= query_page_size_to_mask(query.page_size); +} + + +static long remove_dynamic_dma_windows_locked(struct iommu_table_group *table_group, + struct pci_dev *pdev) +{ + struct device_node *dn = pci_device_to_OF_node(pdev), *pdn; + bool direct_mapping; + struct dma_win *window; + u32 liobn; + int len; + + pdn = pci_dma_find(dn, NULL); + if (!pdn || !PCI_DN(pdn)) { // Niether of 32s|64-bit exist! + return -ENODEV; + } + + if (find_existing_ddw(pdn, &liobn, &pdev->dev.archdata.dma_offset, &len, &direct_mapping)) { + remove_ddw(pdn, true, direct_mapping ? DIRECT64_PROPNAME : DMA64_PROPNAME); + spin_lock(&dma_win_list_lock); + list_for_each_entry(window, &dma_win_list, list) { + if (window->device == pdn) { + list_del(&window->list); + kfree(window); + break; + } + } + spin_unlock(&dma_win_list_lock); + } + + return 0; +} + +static int dev_has_iommu_table(struct device *dev, void *data) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct pci_dev **ppdev = data; + + if (!dev) + return 0; + + if (device_iommu_mapped(dev)) { + *ppdev = pdev; + return 1; + } + + return 0; +} + + +static struct pci_dev *iommu_group_get_first_pci_dev(struct iommu_group *group) +{ + struct pci_dev *pdev = NULL; + int ret; + + /* No IOMMU group ? */ + if (!group) + return NULL; + + ret = iommu_group_for_each_dev(group, &pdev, dev_has_iommu_table); + if (!ret || !pdev) + return NULL; + return pdev; +} + static long spapr_tce_create_table(struct iommu_table_group *table_group, int num, __u32 page_shift, __u64 window_size, __u32 levels, struct iommu_table **ptbl) { - struct iommu_table *tbl = table_group->tables[0]; + struct pci_dev *pdev = iommu_group_get_first_pci_dev(table_group->group); + struct device_node *dn = pci_device_to_OF_node(pdev), *pdn; + struct iommu_table *tbl = table_group->tables[num]; + u32 window_shift = order_base_2(window_size); + u32 ddw_avail[DDW_APPLICABLE_SIZE]; + struct ddw_create_response create; + struct ddw_query_response query; + unsigned long start = 0, end = 0; + struct failed_ddw_pdn *fpdn; + struct dma_win *window; + struct property *win64; + struct pci_dn *pci; + int len, ret = 0; + u64 win_addr; - if (num > 0) + if (num > 1) return -EPERM; - if (tbl->it_page_shift != page_shift || - tbl->it_size != (window_size >> page_shift) || - tbl->it_indirect_levels != levels - 1) - return -EINVAL; + if (tbl && (tbl->it_page_shift == page_shift) && + (tbl->it_size == (window_size >> page_shift)) && + (tbl->it_indirect_levels == levels - 1)) + goto exit; + + if (num == 0) + return -EINVAL; /* Can't modify the default window. */ + + /* TODO: The SRIO-VFs have only 1 window. */ + if (table_group->max_dynamic_windows_supported == 1) + return -EPERM; + + mutex_lock(&dma_win_init_mutex); + + ret = -ENODEV; + /* If the enable DDW failed for the pdn, dont retry! */ + list_for_each_entry(fpdn, &failed_ddw_pdn_list, list) { + if (fpdn->pdn == pdn) { + pr_err("%s: %pOF in failed DDW device list\n", __func__, pdn); + goto out_unlock; + } + } + + pdn = pci_dma_find(dn, NULL); + if (!pdn || !PCI_DN(pdn)) { /* Niether of 32s|64-bit exist! */ + pr_err("%s: No dma-windows exist for the node %pOF\n", __func__, pdn); + goto out_failed; + } + + /* The existing ddw didn't match the size/shift */ + if (remove_dynamic_dma_windows_locked(table_group, pdev)) { + pr_err("%s: The existing DDW remova failed for node %pOF\n", __func__, pdn); + goto out_failed; /* Could not remove it either! */ + } + + pci = PCI_DN(pdn); + ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable", + &ddw_avail[0], DDW_APPLICABLE_SIZE); + if (ret) { + pr_err("%s: ibm,ddw-applicable not found\n", __func__); + goto out_failed; + } + + ret = query_ddw(pdev, ddw_avail, &query, pdn); + if (ret) + goto out_failed; + ret = -ENODEV; + len = window_shift; + if (query.largest_available_block < (1ULL << (len - page_shift))) { + dev_dbg(&pdev->dev, "can't map window 0x%llx with %llu %llu-sized pages\n", + 1ULL << len, query.largest_available_block, + 1ULL << page_shift); + ret = -EINVAL; /* Retry with smaller window size */ + goto out_unlock; + } + + if (create_ddw(pdev, ddw_avail, &create, page_shift, len)) + goto out_failed; + + win_addr = ((u64)create.addr_hi << 32) | create.addr_lo; + win64 = ddw_property_create(DMA64_PROPNAME, create.liobn, win_addr, page_shift, len); + if (!win64) + goto remove_window; + + ret = of_add_property(pdn, win64); + if (ret) { + dev_err(&pdev->dev, "unable to add DMA window property for %pOF: %d", + pdn, ret); + goto free_property; + } + ret = -ENODEV; + + window = ddw_list_new_entry(pdn, win64->value); + if (!window) + goto remove_property; + + window->direct = false; + + if (tbl) { + iommu_table_reset(tbl, pci->phb->bus->number, create.liobn, win_addr, + 1UL << len, page_shift, NULL, &iommu_table_lpar_multi_ops); + } else { + tbl = iommu_pseries_alloc_table(pci->phb->node); + if (!tbl) { + dev_err(&pdev->dev, "couldn't create new IOMMU table\n"); + goto free_window; + } + iommu_table_setparms_common(tbl, pci->phb->bus->number, create.liobn, win_addr, + 1UL << len, page_shift, NULL, + &iommu_table_lpar_multi_ops); + iommu_init_table(tbl, pci->phb->node, start, end); + } + + tbl->reset_ddw = true; + pci->table_group->tables[1] = tbl; + set_iommu_table_base(&pdev->dev, tbl); + pdev->dev.archdata.dma_offset = win_addr; + + spin_lock(&dma_win_list_lock); + list_add(&window->list, &dma_win_list); + spin_unlock(&dma_win_list_lock); + + mutex_unlock(&dma_win_init_mutex); + + goto exit; + +free_window: + kfree(window); +remove_property: + of_remove_property(pdn, win64); +free_property: + kfree(win64->name); + kfree(win64->value); + kfree(win64); +remove_window: + __remove_dma_window(pdn, ddw_avail, create.liobn); + +out_failed: + fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL); + if (!fpdn) + goto out_unlock; + fpdn->pdn = pdn; + list_add(&fpdn->list, &failed_ddw_pdn_list); + +out_unlock: + mutex_unlock(&dma_win_init_mutex); + + return ret; +exit: *ptbl = iommu_tce_table_get(tbl); return 0; } @@ -1795,6 +2136,7 @@ static void spapr_tce_release_ownership(struct iommu_table_group *table_group) struct iommu_table_group_ops spapr_tce_table_group_ops = { .get_table_size = spapr_tce_get_table_size, .create_table = spapr_tce_create_table, + .init_group = spapr_tce_init_group, .set_window = spapr_tce_set_window, .unset_window = spapr_tce_unset_window, .take_ownership = spapr_tce_take_ownership,