From patchwork Fri Feb 7 19:42:28 2020
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 1235121
From: Kirti Wankhede
Subject: [PATCH v12 Kernel 1/7] vfio: KABI for migration interface for device state
Date: Sat, 8 Feb 2020 01:12:28 +0530
Message-ID: <1581104554-10704-2-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>
References: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>
- Defined MIGRATION region type and sub-type.
- Defined vfio_device_migration_info structure, which is placed at the
  0th offset of the migration region to get/set VFIO device related
  information. Defined the members of the structure and their usage on
  read/write access.
- Defined device states and state transition details.
- Defined the sequence to be followed while saving and resuming a VFIO
  device.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 208 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 208 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..572242620ce9 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
 #define VFIO_REGION_TYPE_GFX                    (1)
 #define VFIO_REGION_TYPE_CCW			(2)
+#define VFIO_REGION_TYPE_MIGRATION              (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,213 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD	(1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION           (1)
+
+/*
+ * Structure vfio_device_migration_info is placed at the 0th offset of the
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related
+ * migration information. Field accesses from this structure are only
+ * supported at their native width and alignment, otherwise the result is
+ * undefined and vendor drivers should return an error.
+ *
+ * device_state: (read/write)
+ *      - The user application writes this field to inform the vendor driver
+ *        about the device state to be transitioned to.
+ *      - The vendor driver should take the necessary actions to change the
+ *        device state. On successful transition to the given state, the
+ *        vendor driver should return success on the
+ *        write(device_state, state) system call. If the device state
+ *        transition fails, the vendor driver should return an error,
+ *        -EFAULT.
+ *      - On the user application side, if the device state transition
+ *        fails, i.e. if write(device_state, state) returns an error, read
+ *        device_state again to determine the current state of the device
+ *        from the vendor driver.
+ *      - The vendor driver should return the previous state of the device
+ *        unless it has encountered an internal error, in which case it may
+ *        report the device_state VFIO_DEVICE_STATE_ERROR.
+ *      - The user application must use the device reset ioctl in order to
+ *        recover the device from the VFIO_DEVICE_STATE_ERROR state. If the
+ *        device is indicated to be in a valid device state via reading
+ *        device_state, the user application may attempt to transition the
+ *        device to any valid state reachable from the current state, or
+ *        terminate itself.
+ *
+ * device_state consists of 3 bits:
+ *      - If bit 0 is set, it indicates the _RUNNING state. When it is
+ *        clear, it indicates the _STOP state. When the device is changed
+ *        to _STOP, the driver should stop the device before write()
+ *        returns.
+ *      - If bit 1 is set, it indicates the _SAVING state, which means that
+ *        the driver should start gathering device state information that
+ *        will be provided to the VFIO user application to save the
+ *        device's state.
+ *      - If bit 2 is set, it indicates the _RESUMING state, which means
+ *        that the driver should prepare to resume the device; data
+ *        provided through the migration region should be used to resume
+ *        the device.
+ * Bits 3 - 31 are reserved for future use. In order to preserve them, the
+ * user application should perform a read-modify-write operation on this
+ * field when modifying the specified bits.
+ *
+ *  +------- _RESUMING
+ *  |+------ _SAVING
+ *  ||+----- _RUNNING
+ *  |||
+ *  000b => Device Stopped, not saving or resuming
+ *  001b => Device running state, default state
+ *  010b => Stop Device & save device state, stop-and-copy state
+ *  011b => Device running and save device state, pre-copy state
+ *  100b => Device stopped and device state is resuming
+ *  101b => Invalid state
+ *  110b => Error state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *            _RESUMING  _RUNNING    Pre-copy    Stop-and-copy   _STOP
+ *              (100b)     (001b)     (011b)        (010b)       (000b)
+ * 0. Running or Default state
+ *                          |
+ *
+ * 1. Normal Shutdown (optional)
+ *                          |------------------------------------->|
+ *
+ * 2. Save state or Suspend
+ *                          |------------------------->|---------->|
+ *
+ * 3. Save state during live migration
+ *                          |----------->|------------>|---------->|
+ *
+ * 4. Resuming
+ *               |<---------|
+ *
+ * 5. Resumed
+ *               |--------->|
+ *
+ * 0. The default state of a VFIO device is _RUNNING when the user
+ *    application starts.
+ * 1. During normal user application shutdown, the VFIO device state
+ *    changes from _RUNNING to _STOP. This is optional; the user
+ *    application may or may not perform this state transition, and the
+ *    vendor driver may not need it.
+ * 2. When the user application saves state or suspends the application,
+ *    the device state transitions from _RUNNING to the stop-and-copy
+ *    state and then to _STOP. On the state transition from _RUNNING to
+ *    stop-and-copy, the driver must stop the device, save the device
+ *    state and send it to the application through the migration region.
+ *    The sequence to be followed for such a transition is given below.
+ * 3. In user application live migration, the state transitions from
+ *    _RUNNING to pre-copy, to stop-and-copy, to _STOP.
+ *    On the state transition from _RUNNING to pre-copy, the driver should
+ *    start gathering device state while the application is still running
+ *    and send the device state data to the application through the
+ *    migration region.
+ *    On the state transition from pre-copy to stop-and-copy, the driver
+ *    must stop the device, save the device state and send it to the user
+ *    application through the migration region.
+ *    The sequence to be followed for the above two transitions is given
+ *    below.
+ * 4. To start the resuming phase, the device state should be transitioned
+ *    from _RUNNING to the _RESUMING state. In the _RESUMING state, the
+ *    driver should use the device state data received through the
+ *    migration region to resume the device.
+ * 5. On providing the saved device data to the driver, the application
+ *    should change the state from _RESUMING to _RUNNING.
+ *
+ * pending_bytes: (read only)
+ *      Number of pending bytes yet to be migrated from the vendor driver.
+ *
+ * data_offset: (read only)
+ *      The user application should read data_offset in the migration
+ *      region to learn from where to read device data during the _SAVING
+ *      state, and where to write device data during the _RESUMING state.
+ *      See below for details of the sequence to be followed.
+ *
+ * data_size: (read/write)
+ *      The user application should read data_size to get the size of the
+ *      data (in bytes) copied into the migration region during the
+ *      _SAVING state, and write the size of the data (in bytes) copied
+ *      into the migration region during the _RESUMING state.
+ *
+ * The migration region looks like:
+ *  ------------------------------------------------------------------
+ *  |vfio_device_migration_info|    data section                      |
+ *  |                          |     ///////////////////////////////  |
+ *  ------------------------------------------------------------------
+ *   ^                               ^
+ *  offset 0-trapped part           data_offset
+ *
+ * Structure vfio_device_migration_info is always followed by the data
+ * section in the region, so data_offset will always be non-0. The offset
+ * from which data is copied is decided by the kernel driver; the data
+ * section can be trapped, mapped or partitioned, depending on how the
+ * kernel driver defines the data section. A data section partition can
+ * be defined as mapped by the sparse mmap capability. If mmapped, then
+ * data_offset should be page aligned, whereas the initial section which
+ * contains the vfio_device_migration_info structure might not end at a
+ * page-aligned offset. The user is not required to access via mmap
+ * regardless of the region's mmap capabilities.
+ * The vendor driver should decide whether to partition the data section
+ * and how to partition it, and should return data_offset accordingly.
+ *
+ * Sequence to be followed for the _SAVING|_RUNNING device state or
+ * pre-copy phase, and for the _SAVING device state or stop-and-copy
+ * phase:
+ * a. Read pending_bytes; this indicates the start of a new iteration to
+ *    get device data. Repeated reads on pending_bytes at this stage
+ *    should have no side effects.
+ *    If pending_bytes == 0, the user application should not iterate to
+ *    get data for that device.
+ *    If pending_bytes > 0, go through the steps below.
+ * b. Read data_offset; this directs the vendor driver to make data
+ *    available through the data section. The vendor driver should
+ *    complete this read operation only after data is available from
+ *    (region + data_offset) to (region + data_offset + data_size).
+ * c. Read data_size, the amount of data in bytes available through the
+ *    migration region. Reads of data_offset and data_size should return
+ *    the offset and size of the current buffer if the user application
+ *    reads them more than once here.
+ * d. Read data of data_size bytes from (region + data_offset) of the
+ *    migration region.
+ * e. Process the data.
+ * f. Read pending_bytes; this read operation indicates that the data from
+ *    the previous iteration has been read. If pending_bytes > 0, go to
+ *    step b.
+ *
+ * If there is any error during the above sequence, the vendor driver can
+ * return an error code for the next read()/write() operation, which will
+ * terminate the loop, and the user should take the next necessary action,
+ * for example, fail the migration or terminate the user application.
+ *
+ * The user application can transition from _SAVING|_RUNNING (pre-copy
+ * state) to _SAVING (stop-and-copy) state regardless of pending bytes.
+ * The user application should iterate in _SAVING (stop-and-copy) until
+ * pending_bytes is 0.
+ *
+ * Sequence to be followed while the device state is _RESUMING:
+ * While data for this device is available, repeat the steps below:
+ * a. Read data_offset, from where the user application should write data.
+ * b. Write data of data_size to the migration region from data_offset.
+ *    The data size should be the data packet size at the source during
+ *    _SAVING.
+ * c. Write data_size, which indicates to the vendor driver that data has
+ *    been written to the migration region. The vendor driver should read
+ *    this data from the migration region and resume the device's state.
+ *
+ * For the user application, the data is opaque. The user application
+ * should write data in the same order as it was received, and in the
+ * same transaction sizes as at the source.
+ */
+
+struct vfio_device_migration_info {
+	__u32 device_state;         /* VFIO device state */
+#define VFIO_DEVICE_STATE_STOP      (0)
+#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
+#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
+				     VFIO_DEVICE_STATE_SAVING |  \
+				     VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_VALID(state) \
+	(state & VFIO_DEVICE_STATE_RESUMING ? \
+	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)
+
+#define VFIO_DEVICE_STATE_ERROR			\
+		(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING)
+
+	__u32 reserved;
+	__u64 pending_bytes;
+	__u64 data_offset;
+	__u64 data_size;
+} __attribute__((packed));
+
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
  * which allows direct access to non-MSIX registers which happened to be within
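[For illustration only, here is a minimal userspace sketch of the _SAVING
read loop described in the comment above. It is not code from this series:
device_fd (an open VFIO device fd), region_off (the file offset of the
migration region) and the process() callback are assumptions, and the
struct is available only with headers carrying this proposed UAPI.]

    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <linux/vfio.h>

    static int save_device_data(int device_fd, off_t region_off,
                                void *buf, size_t buf_size,
                                int (*process)(void *data, uint64_t size))
    {
            uint64_t pending, data_offset, data_size;

            for (;;) {
                    /* a. read pending_bytes: starts a new iteration */
                    if (pread(device_fd, &pending, sizeof(pending),
                              region_off +
                              offsetof(struct vfio_device_migration_info,
                                       pending_bytes)) != sizeof(pending))
                            return -1;
                    if (!pending)
                            break;  /* nothing left in this phase */

                    /* b. read data_offset: driver makes data available */
                    if (pread(device_fd, &data_offset, sizeof(data_offset),
                              region_off +
                              offsetof(struct vfio_device_migration_info,
                                       data_offset)) != sizeof(data_offset))
                            return -1;

                    /* c. read data_size: bytes available in data section */
                    if (pread(device_fd, &data_size, sizeof(data_size),
                              region_off +
                              offsetof(struct vfio_device_migration_info,
                                       data_size)) != sizeof(data_size))
                            return -1;
                    if (data_size > buf_size)
                            return -1;

                    /* d. read the device data itself (trapped access) */
                    if (pread(device_fd, buf, data_size,
                              region_off + data_offset) != (ssize_t)data_size)
                            return -1;

                    /* e. process, e.g. forward to the migration stream */
                    if (process(buf, data_size))
                            return -1;
                    /* f. the next pending_bytes read marks consumption */
            }
            return 0;
    }

The same skeleton works for both the pre-copy and the stop-and-copy
phases; only the device_state written beforehand differs.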
From patchwork Fri Feb 7 19:42:29 2020
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 1235116
From: Kirti Wankhede
Subject: [PATCH v12 Kernel 2/7] vfio iommu: Remove atomicity of ref_count of pinned pages
Date: Sat, 8 Feb 2020 01:12:29 +0530
Message-ID: <1581104554-10704-3-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>
References: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>

vfio_pfn.ref_count is always updated while holding iommu->lock, so using
an atomic variable is overkill.
Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio_iommu_type1.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a177bf2c6683..d386461e5d11 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -111,7 +111,7 @@ struct vfio_pfn {
 	struct rb_node		node;
 	dma_addr_t		iova;		/* Device address */
 	unsigned long		pfn;		/* Host pfn */
-	atomic_t		ref_count;
+	unsigned int		ref_count;
 };
 
 struct vfio_regions {
@@ -232,7 +232,7 @@ static int vfio_add_to_pfn_list(struct vfio_dma *dma, dma_addr_t iova,
 	vpfn->iova = iova;
 	vpfn->pfn = pfn;
-	atomic_set(&vpfn->ref_count, 1);
+	vpfn->ref_count = 1;
 	vfio_link_pfn(dma, vpfn);
 	return 0;
 }
@@ -250,7 +250,7 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct vfio_dma *dma,
 	struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova);
 
 	if (vpfn)
-		atomic_inc(&vpfn->ref_count);
+		vpfn->ref_count++;
 	return vpfn;
 }
 
@@ -258,7 +258,8 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
 {
 	int ret = 0;
 
-	if (atomic_dec_and_test(&vpfn->ref_count)) {
+	vpfn->ref_count--;
+	if (!vpfn->ref_count) {
 		ret = put_pfn(vpfn->pfn, dma->prot);
 		vfio_remove_from_pfn_list(dma, vpfn);
 	}
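[A minimal sketch of the pattern this patch relies on, with illustrative
names that are not from the series: a counter that is only ever touched
while a single lock is held needs no atomic_t, because the mutex already
serializes all updates.]

    #include <linux/mutex.h>

    struct owner {
            struct mutex lock;
    };

    struct tracked_page {
            unsigned int ref_count;     /* protected by owner->lock */
    };

    static void tracked_page_get(struct owner *o, struct tracked_page *p)
    {
            mutex_lock(&o->lock);
            p->ref_count++;             /* plain increment is safe here */
            mutex_unlock(&o->lock);
    }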
From patchwork Fri Feb 7 19:42:30 2020
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 1235117
From: Kirti Wankhede
Subject: [PATCH v12 Kernel 3/7] vfio iommu: Add ioctl definition for dirty pages tracking
Date: Sat, 8 Feb 2020 01:12:30 +0530
Message-ID: <1581104554-10704-4-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>
References: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>

The IOMMU container maintains a list of all pages pinned by the
vfio_pin_pages API. All pages pinned by a vendor driver through this API
should be considered dirty during migration. When the container consists
of an IOMMU-capable device and all pages are pinned and mapped, then all
pages are marked dirty.
Added support to start/stop tracking of unpinned pages and to get the
bitmap of all dirtied pages for a requested IO virtual address range.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 572242620ce9..b1b03c720749 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1002,6 +1002,48 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ *                                struct vfio_iommu_type1_dirty_bitmap)
+ * This ioctl is used for dirty pages tracking. The caller sets argsz,
+ * which is the size of struct vfio_iommu_type1_dirty_bitmap, and sets
+ * a flag depending on which operation to perform, as detailed below:
+ *
+ * When the ioctl is called with the VFIO_IOMMU_DIRTY_PAGES_FLAG_START
+ * flag set, it indicates that migration is active and the IOMMU module
+ * should track pages which are being unpinned. Unpinned pages are tracked
+ * until tracking is stopped by the user application setting the
+ * VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
+ *
+ * When the ioctl is called with the VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP
+ * flag set, it indicates that the IOMMU should stop tracking unpinned
+ * pages and also free the previously tracked unpinned pages data.
+ *
+ * When the ioctl is called with the VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP
+ * flag set, it returns the dirty pages bitmap for the IOMMU container
+ * during migration for the given IOVA range. The user must allocate
+ * memory for the bitmap, zero the bitmap memory, and set the size of the
+ * allocated memory in the bitmap_size field. One bit is used to represent
+ * one page, consecutively starting from the iova offset. The user should
+ * provide the page size in 'pgsize'. A bit set in the bitmap indicates
+ * that the page at that offset from iova is dirty.
+ *
+ * Only one flag should be set at a time.
+ *
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+	__u32        argsz;
+	__u32        flags;
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
+	__u64        iova;			/* IO virtual address */
+	__u64        size;			/* Size of iova range */
+	__u64        pgsize;			/* page size for bitmap */
+	__u64        bitmap_size;		/* in bytes */
+	void __user *bitmap;			/* one bit per page */
+};
+
+#define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*
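[To make the GET_BITMAP operation concrete, here is a hedged userspace
sketch of how an application might drive this proposed ioctl. It is not
code from the series: container_fd, iova, size and pgsize are assumptions
for the example, and the bitmap is sized in whole 64-bit words to satisfy
the series' verify_bitmap_size() check in patch 4.]

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    static int get_dirty_bitmap(int container_fd, uint64_t iova,
                                uint64_t size, uint64_t pgsize)
    {
            struct vfio_iommu_type1_dirty_bitmap db;
            uint64_t npages = size / pgsize;
            /* one bit per page, rounded up to whole unsigned longs */
            uint64_t bitmap_size = ((npages + 63) / 64) * 8;
            void *bitmap = calloc(1, bitmap_size);  /* must be zeroed */

            if (!bitmap)
                    return -1;

            memset(&db, 0, sizeof(db));
            db.argsz = sizeof(db);
            db.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP; /* one flag */
            db.iova = iova;
            db.size = size;
            db.pgsize = pgsize;
            db.bitmap_size = bitmap_size;
            db.bitmap = bitmap;

            if (ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &db)) {
                    free(bitmap);
                    return -1;
            }
            /* ... walk the bitmap, copy dirty pages to the destination ... */
            free(bitmap);
            return 0;
    }

START and STOP use the same structure with only argsz and the respective
flag filled in.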
From patchwork Fri Feb 7 19:42:31 2020
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 1235122
From: Kirti Wankhede
Subject: [PATCH v12 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking
Date: Sat, 8 Feb 2020 01:12:31 +0530
Message-ID: <1581104554-10704-5-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>
References: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>

The VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start pinned and unpinned pages tracking while migration is active.
- Stop pinned and unpinned dirty pages tracking. This is also used to
  stop dirty pages tracking if migration failed or was cancelled.
- Get the dirty pages bitmap. This ioctl returns the bitmap of dirty
  pages; it is the user space application's responsibility to copy the
  content of dirty pages from source to destination during migration.

To prevent a DoS attack, memory for the bitmap is allocated per vfio_dma
structure. The bitmap size is calculated considering the smallest
supported page size. The bitmap is allocated when dirty logging is
enabled for those vfio_dmas whose vpfn list is not empty, or whose whole
range is mapped, as in the case of a pass-through device.
There could be multiple options as to when the bitmap should be
populated:
* Populate the bitmap for already pinned pages when the bitmap is
  allocated for a vfio_dma, with the smallest supported page size, and
  update the bitmap from the page pinning and unpinning functions. When
  the user application queries the bitmap, check if the requested page
  size is the same as the page size used to populate the bitmap. If it is
  equal, copy the bitmap; if not, re-populate the bitmap according to the
  requested page size and then copy it to the user.
  Pros: The bitmap gets populated on the fly after dirty tracking has
  started.
  Cons: If the requested page size differs from the smallest supported
  page size, the bitmap has to be re-populated, with the additional
  overhead of allocating bitmap memory again for the re-population.
* Populate the bitmap when the bitmap is queried by the user application.
  Pros: The bitmap is populated with the requested page size. This
  eliminates the need to re-populate the bitmap if the requested page
  size differs from the smallest supported page size.
  Cons: There is a one-time processing cost when the bitmap is queried.

I prefer the latter option for its simple logic and because it eliminates
the overhead of bitmap re-population in the case of different page sizes.
The latter option is implemented in this patch.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio_iommu_type1.c | 299 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 287 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d386461e5d11..df358dc1c85b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -70,6 +70,7 @@ struct vfio_iommu {
 	unsigned int		dma_avail;
 	bool			v2;
 	bool			nesting;
+	bool			dirty_page_tracking;
 };
 
 struct vfio_domain {
@@ -90,6 +91,7 @@ struct vfio_dma {
 	bool			lock_cap;	/* capable(CAP_IPC_LOCK) */
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
+	unsigned long		*bitmap;
 };
 
 struct vfio_group {
@@ -125,6 +127,7 @@ struct vfio_regions {
 					(!list_empty(&iommu->domain_list))
 
 static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
 /*
  * This code handles mapping and unmapping of user data buffers
@@ -174,6 +177,57 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
 	rb_erase(&old->node, &iommu->dma_list);
 }
 
+static inline unsigned long dirty_bitmap_bytes(unsigned int npages)
+{
+	if (!npages)
+		return 0;
+
+	return ALIGN(npages, BITS_PER_LONG) / sizeof(unsigned long);
+}
+
+static int vfio_dma_bitmap_alloc(struct vfio_iommu *iommu,
+				 struct vfio_dma *dma, unsigned long pgsizes)
+{
+	unsigned long pgshift = __ffs(pgsizes);
+
+	if (!RB_EMPTY_ROOT(&dma->pfn_list) || dma->iommu_mapped) {
+		unsigned long npages = dma->size >> pgshift;
+		unsigned long bsize = dirty_bitmap_bytes(npages);
+
+		dma->bitmap = kvzalloc(bsize, GFP_KERNEL);
+		if (!dma->bitmap)
+			return -ENOMEM;
+	}
+	return 0;
+}
+
+static int vfio_dma_all_bitmap_alloc(struct vfio_iommu *iommu,
+				     unsigned long pgsizes)
+{
+	struct rb_node *n = rb_first(&iommu->dma_list);
+	int ret;
+
+	for (; n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+		ret = vfio_dma_bitmap_alloc(iommu, dma, pgsizes);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static void vfio_dma_all_bitmap_free(struct vfio_iommu *iommu)
+{
+	struct rb_node *n = rb_first(&iommu->dma_list);
+
+	for (; n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+		kfree(dma->bitmap);
+	}
+}
+
 /*
  * Helper Functions
  * for host iova-pfn list
  */
@@ -244,6 +298,29 @@ static void vfio_remove_from_pfn_list(struct vfio_dma *dma,
 	kfree(vpfn);
 }
 
+static void vfio_remove_unpinned_from_pfn_list(struct vfio_dma *dma)
+{
+	struct rb_node *n = rb_first(&dma->pfn_list);
+
+	for (; n; n = rb_next(n)) {
+		struct vfio_pfn *vpfn = rb_entry(n, struct vfio_pfn, node);
+
+		if (!vpfn->ref_count)
+			vfio_remove_from_pfn_list(dma, vpfn);
+	}
+}
+
+static void vfio_remove_unpinned_from_dma_list(struct vfio_iommu *iommu)
+{
+	struct rb_node *n = rb_first(&iommu->dma_list);
+
+	for (; n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+		vfio_remove_unpinned_from_pfn_list(dma);
+	}
+}
+
 static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct vfio_dma *dma,
 					       unsigned long iova)
 {
@@ -261,7 +338,8 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
 	vpfn->ref_count--;
 	if (!vpfn->ref_count) {
 		ret = put_pfn(vpfn->pfn, dma->prot);
-		vfio_remove_from_pfn_list(dma, vpfn);
+		if (!dma->bitmap)
+			vfio_remove_from_pfn_list(dma, vpfn);
 	}
 	return ret;
 }
@@ -483,13 +561,14 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	return ret;
 }
 
-static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
+static int vfio_unpin_page_external(struct vfio_iommu *iommu,
+				    struct vfio_dma *dma, dma_addr_t iova,
 				    bool do_accounting)
 {
 	int unlocked;
 	struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova);
 
-	if (!vpfn)
+	if (!vpfn || !vpfn->ref_count)
 		return 0;
 
 	unlocked = vfio_iova_put_vfio_pfn(dma, vpfn);
@@ -510,6 +589,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	unsigned long remote_vaddr;
 	struct vfio_dma *dma;
 	bool do_accounting;
+	unsigned long iommu_pgsizes = vfio_pgsize_bitmap(iommu);
 
 	if (!iommu || !user_pfn || !phys_pfn)
 		return -EINVAL;
@@ -551,8 +631,10 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
 		vpfn = vfio_iova_get_vfio_pfn(dma, iova);
 		if (vpfn) {
-			phys_pfn[i] = vpfn->pfn;
-			continue;
+			if (vpfn->ref_count > 1) {
+				phys_pfn[i] = vpfn->pfn;
+				continue;
+			}
 		}
 
 		remote_vaddr = dma->vaddr + iova - dma->iova;
@@ -560,11 +642,23 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 					     do_accounting);
 		if (ret)
 			goto pin_unwind;
-
-		ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
-		if (ret) {
-			vfio_unpin_page_external(dma, iova, do_accounting);
-			goto pin_unwind;
+		if (!vpfn) {
+			ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
+			if (ret) {
+				vfio_unpin_page_external(iommu, dma, iova,
+							 do_accounting);
+				goto pin_unwind;
+			}
+		} else
+			vpfn->pfn = phys_pfn[i];
+
+		if (iommu->dirty_page_tracking && !dma->bitmap) {
+			ret = vfio_dma_bitmap_alloc(iommu, dma, iommu_pgsizes);
+			if (ret) {
+				vfio_unpin_page_external(iommu, dma, iova,
+							 do_accounting);
+				goto pin_unwind;
+			}
 		}
 	}
 
@@ -578,7 +672,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 		iova = user_pfn[j] << PAGE_SHIFT;
 		dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
-		vfio_unpin_page_external(dma, iova, do_accounting);
+		vfio_unpin_page_external(iommu, dma, iova, do_accounting);
 		phys_pfn[j] = 0;
 	}
 pin_done:
@@ -612,7 +706,7 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
 		dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
 		if (!dma)
 			goto unpin_exit;
-		vfio_unpin_page_external(dma, iova, do_accounting);
+		vfio_unpin_page_external(iommu, dma, iova, do_accounting);
 	}
 
 unpin_exit:
@@ -830,6 +924,113 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
 	return bitmap;
 }
 
+static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
+				  size_t size, uint64_t pgsize,
+				  unsigned char __user *bitmap)
+{
+	struct vfio_dma *dma;
+	dma_addr_t i = iova, iova_limit;
+	unsigned int bsize, nbits = 0, l = 0;
+	unsigned long pgshift = __ffs(pgsize);
+
+	while ((dma = vfio_find_dma(iommu, i, pgsize))) {
+		int ret, j;
+		unsigned int npages = 0, shift = 0;
+		unsigned char temp = 0;
+
+		/* mark all pages dirty if all pages are pinned and mapped. */
+		if (dma->iommu_mapped) {
+			iova_limit = min(dma->iova + dma->size, iova + size);
+			npages = iova_limit/pgsize;
+			bitmap_set(dma->bitmap, 0, npages);
+		} else if (dma->bitmap) {
+			struct rb_node *n = rb_first(&dma->pfn_list);
+			bool found = false;
+
+			for (; n; n = rb_next(n)) {
+				struct vfio_pfn *vpfn = rb_entry(n,
+						struct vfio_pfn, node);
+				if (vpfn->iova >= i) {
+					found = true;
+					break;
+				}
+			}
+
+			if (!found) {
+				i += dma->size;
+				continue;
+			}
+
+			for (; n; n = rb_next(n)) {
+				unsigned int s;
+				struct vfio_pfn *vpfn = rb_entry(n,
						struct vfio_pfn, node);
+
+				if (vpfn->iova >= iova + size)
+					break;
+
+				s = (vpfn->iova - dma->iova) >> pgshift;
+				bitmap_set(dma->bitmap, s, 1);
+
+				iova_limit = vpfn->iova + pgsize;
+			}
+			npages = iova_limit/pgsize;
+		}
+
+		bsize = dirty_bitmap_bytes(npages);
+		shift = nbits % BITS_PER_BYTE;
+
+		if (npages && shift) {
+			l--;
+			if (!access_ok((void __user *)bitmap + l,
+					sizeof(unsigned char)))
+				return -EINVAL;
+
+			ret = __get_user(temp, bitmap + l);
+			if (ret)
+				return ret;
+		}
+
+		for (j = 0; j < bsize; j++, l++) {
+			temp = temp |
+			       (*((unsigned char *)dma->bitmap + j) << shift);
+			if (!access_ok((void __user *)bitmap + l,
+					sizeof(unsigned char)))
+				return -EINVAL;
+
+			ret = __put_user(temp, bitmap + l);
+			if (ret)
+				return ret;
+			if (shift) {
+				temp = *((unsigned char *)dma->bitmap + j) >>
+					(BITS_PER_BYTE - shift);
+			}
+		}
+
+		nbits += npages;
+
+		i = min(dma->iova + dma->size, iova + size);
+		if (i >= iova + size)
+			break;
+	}
+	return 0;
+}
+
+static long verify_bitmap_size(unsigned long npages, unsigned long bitmap_size)
+{
+	long bsize;
+
+	if (!bitmap_size || bitmap_size > SIZE_MAX)
+		return -EINVAL;
+
+	bsize = dirty_bitmap_bytes(npages);
+
+	if (bitmap_size < bsize)
+		return -EINVAL;
+
+	return bsize;
+}
+
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			     struct vfio_iommu_type1_dma_unmap *unmap)
 {
@@ -2277,6 +2478,80 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 		return copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+	} else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
+		struct vfio_iommu_type1_dirty_bitmap range;
+		uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
+				VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
+				VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+		int ret;
+
+		if (!iommu->v2)
+			return -EACCES;
+
+		minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
+				    bitmap);
+
+		if (copy_from_user(&range, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (range.argsz < minsz || range.flags & ~mask)
+			return -EINVAL;
+
+		/* only one flag should be set at a time */
+		if (__ffs(range.flags) != __fls(range.flags))
+			return -EINVAL;
+
+		if (range.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) {
+			unsigned long iommu_pgsizes = vfio_pgsize_bitmap(iommu);
+
+			mutex_lock(&iommu->lock);
+			iommu->dirty_page_tracking = true;
+			ret = vfio_dma_all_bitmap_alloc(iommu, iommu_pgsizes);
+			mutex_unlock(&iommu->lock);
+			return ret;
+		} else if (range.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) {
+			mutex_lock(&iommu->lock);
+			iommu->dirty_page_tracking = false;
+			vfio_dma_all_bitmap_free(iommu);
+			vfio_remove_unpinned_from_dma_list(iommu);
+			mutex_unlock(&iommu->lock);
+			return 0;
+		} else if (range.flags &
+				 VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) {
+			long bsize;
+			unsigned long pgshift = __ffs(range.pgsize);
+			uint64_t iommu_pgsizes = vfio_pgsize_bitmap(iommu);
+			uint64_t iommu_pgmask =
+				((uint64_t)1 << __ffs(iommu_pgsizes)) - 1;
+
+			if ((range.pgsize & iommu_pgsizes) != range.pgsize)
+				return -EINVAL;
+			if (range.iova & iommu_pgmask)
+				return -EINVAL;
+			if (!range.size || range.size & iommu_pgmask)
+				return -EINVAL;
+			if (range.iova + range.size < range.iova)
+				return -EINVAL;
+			if (!access_ok((void __user *)range.bitmap,
+				       range.bitmap_size))
+				return -EINVAL;
+
+			bsize = verify_bitmap_size(range.size >> pgshift,
+						   range.bitmap_size);
+			if (bsize < 0)
+				return bsize;
+
+			mutex_lock(&iommu->lock);
+			if (iommu->dirty_page_tracking)
+				ret = vfio_iova_dirty_bitmap(iommu, range.iova,
+					range.size, range.pgsize,
+					(unsigned char __user *)range.bitmap);
+			else
+				ret = -EINVAL;
+			mutex_unlock(&iommu->lock);
+
+			return ret;
+		}
 	}
 
 	return -ENOTTY;
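[As a worked example of the sizing rule implied by dirty_bitmap_bytes()
and verify_bitmap_size() above: the bitmap is sized in whole unsigned
longs, so the page count is rounded up to a multiple of BITS_PER_LONG
before converting bits to bytes. This is a sketch mirroring that
arithmetic, not code from the patch; the kernel expression is the
authoritative one.]

    #include <stdint.h>
    #include <stdio.h>

    /* Round npages up to a multiple of 64 bits, then convert to bytes. */
    static uint64_t dirty_bitmap_bytes(uint64_t npages)
    {
            if (!npages)
                    return 0;
            return ((npages + 63) / 64) * 64 / 8;
    }

    int main(void)
    {
            /* 1 GiB at 4 KiB pages: 262144 pages -> 32768 bytes (32 KiB) */
            printf("%llu\n", (unsigned long long)
                   dirty_bitmap_bytes((1ULL << 30) >> 12));
            /* 8 pages still need a whole 64-bit word: 8 bytes, not 1 */
            printf("%llu\n", (unsigned long long)dirty_bitmap_bytes(8));
            return 0;
    }

This is why a user-supplied bitmap_size of exactly ceil(npages/8) bytes
can fail the verify_bitmap_size() check for small ranges.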
From patchwork Fri Feb 7 19:42:32 2020
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 1235123
From: Kirti Wankhede
Subject: [PATCH v12 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap
Date: Sat, 8 Feb 2020 01:12:32 +0530
Message-ID: <1581104554-10704-6-git-send-email-kwankhede@nvidia.com>
In-Reply-To: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>
References: <1581104554-10704-1-git-send-email-kwankhede@nvidia.com>

Pages pinned by the external interface for a requested IO virtual
address range might get unpinned and unmapped while migration is active
and the device is still running, that is, in the pre-copy phase while
the guest driver could still access those pages. The host device may
have written to these pages while they were mapped.
Such pages should be marked dirty so that after migration the guest
driver is still able to complete the operation.

To get the bitmap during unmap, the user should set the flag
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP; the bitmap memory should be
allocated and zeroed by the user space application, and the bitmap size
and page size should be set by the user application.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++----
 include/uapi/linux/vfio.h       | 12 +++++++++
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index df358dc1c85b..4e6ad0513932 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1032,7 +1032,8 @@ static long verify_bitmap_size(unsigned long npages, unsigned long bitmap_size)
 }
 
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
-			     struct vfio_iommu_type1_dma_unmap *unmap)
+			     struct vfio_iommu_type1_dma_unmap *unmap,
+			     unsigned long *bitmap)
 {
 	uint64_t mask;
 	struct vfio_dma *dma, *dma_last = NULL;
@@ -1107,6 +1108,15 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 		if (dma->task->mm != current->mm)
 			break;
 
+		if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+		    (dma_last != dma))
+			vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
+					       unmap->bitmap_pgsize,
+					       (unsigned char __user *) bitmap);
+		else
+			vfio_remove_unpinned_from_pfn_list(dma);
+
+
 		if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
 			struct vfio_iommu_type1_dma_unmap nb_unmap;
 
@@ -1132,6 +1142,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 						    &nb_unmap);
 			goto again;
 		}
+
 		unmapped += dma->size;
 		vfio_remove_dma(iommu, dma);
 	}
@@ -2462,22 +2473,57 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
 	} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
 		struct vfio_iommu_type1_dma_unmap unmap;
-		long ret;
+		unsigned long *bitmap = NULL;
+		long ret, bsize;
 
 		minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
 
 		if (copy_from_user(&unmap, (void __user *)arg, minsz))
 			return -EFAULT;
 
-		if (unmap.argsz < minsz || unmap.flags)
+		if (unmap.argsz < minsz ||
+		    unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
 			return -EINVAL;
 
-		ret = vfio_dma_do_unmap(iommu, &unmap);
+		if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
+			unsigned long pgshift;
+			uint64_t iommu_pgsizes = vfio_pgsize_bitmap(iommu);
+			uint64_t iommu_pgmask =
+				((uint64_t)1 << __ffs(iommu_pgsizes)) - 1;
+
+			if (copy_from_user(&unmap, (void __user *)arg,
+					   sizeof(unmap)))
+				return -EFAULT;
+
+			pgshift = __ffs(unmap.bitmap_pgsize);
+
+			if (((unmap.bitmap_pgsize - 1) & iommu_pgmask) !=
+			    (unmap.bitmap_pgsize - 1))
+				return -EINVAL;
+
+			if ((unmap.bitmap_pgsize & iommu_pgsizes) !=
+			    unmap.bitmap_pgsize)
+				return -EINVAL;
+			if (unmap.iova + unmap.size < unmap.iova)
+				return -EINVAL;
+			if (!access_ok((void __user *)unmap.bitmap,
+				       unmap.bitmap_size))
+				return -EINVAL;
+
+			bsize = verify_bitmap_size(unmap.size >> pgshift,
+						   unmap.bitmap_size);
+			if (bsize < 0)
+				return bsize;
+			bitmap = unmap.bitmap;
+		}
+
+		ret = vfio_dma_do_unmap(iommu, &unmap, bitmap);
 		if (ret)
 			return ret;
 
-		return copy_to_user((void __user *)arg, &unmap, minsz) ?
+		ret = copy_to_user((void __user *)arg, &unmap, minsz) ?
 			-EFAULT : 0;
+		return ret;
 	} else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
 		struct vfio_iommu_type1_dirty_bitmap range;
 		uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index b1b03c720749..a852e729b5a2 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -985,12 +985,24 @@ struct vfio_iommu_type1_dma_map {
  * field.  No guarantee is made to the user that arbitrary unmaps of iova
  * or size different from those used in the original mapping call will
  * succeed.
+ * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get the dirty
+ * bitmap before unmapping IO virtual addresses. When this flag is set,
+ * the user should allocate memory for the bitmap, zero the bitmap
+ * memory, and set the size of the allocated memory in the bitmap_size
+ * field. One bit in the bitmap represents one page, of the user-provided
+ * page size in 'bitmap_pgsize', consecutively starting from the iova
+ * offset. A set bit indicates that the page at that offset from iova is
+ * dirty. The bitmap of pages in the range of the unmapped size is
+ * returned in bitmap.
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
+	__u64        bitmap_pgsize;		/* page size for bitmap */
+	__u64        bitmap_size;		/* in bytes */
+	void __user *bitmap;			/* one bit per page */
 };
 
 #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
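[A minimal sketch of how userspace might combine unmap with dirty-bitmap
retrieval under this proposal; not code from the series. container_fd,
iova, size and pgsize are assumptions for the example, and the bitmap is
again sized in whole 64-bit words to satisfy verify_bitmap_size().]

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    static int unmap_and_get_dirty(int container_fd, uint64_t iova,
                                   uint64_t size, uint64_t pgsize)
    {
            struct vfio_iommu_type1_dma_unmap unmap;
            uint64_t npages = size / pgsize;
            uint64_t bitmap_size = ((npages + 63) / 64) * 8;
            void *bitmap = calloc(1, bitmap_size);  /* zeroed, as required */

            if (!bitmap)
                    return -1;

            memset(&unmap, 0, sizeof(unmap));
            unmap.argsz = sizeof(unmap);
            unmap.flags = VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
            unmap.iova = iova;
            unmap.size = size;
            unmap.bitmap_pgsize = pgsize;
            unmap.bitmap_size = bitmap_size;
            unmap.bitmap = bitmap;

            if (ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
                    free(bitmap);
                    return -1;
            }
            /* ... transfer the pages whose bits are set, then clean up ... */
            free(bitmap);
            return 0;
    }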
From patchwork Fri Feb 7 19:42:33 2020
From: Kirti Wankhede
Subject: [PATCH v12 Kernel 6/7] vfio iommu: Adds flag to indicate dirty pages
 tracking capability support
Date: Sat, 8 Feb 2020 01:12:33 +0530
Message-ID: <1581104554-10704-7-git-send-email-kwankhede@nvidia.com>

Flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that the
driver supports dirty page tracking.
Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio_iommu_type1.c | 3 ++-
 include/uapi/linux/vfio.h       | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4e6ad0513932..f748a3dbe9f9 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2426,7 +2426,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 			info.cap_offset = 0; /* output, no-recopy necessary */
 		}
 
-		info.flags = VFIO_IOMMU_INFO_PGSIZES;
+		info.flags = VFIO_IOMMU_INFO_PGSIZES |
+			     VFIO_IOMMU_INFO_DIRTY_PGS;
 
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index a852e729b5a2..8528e835541d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -928,8 +928,9 @@ struct vfio_device_ioeventfd {
 struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
-#define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
-#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
+#define VFIO_IOMMU_INFO_PGSIZES   (1 << 0) /* supported page sizes info */
+#define VFIO_IOMMU_INFO_CAPS      (1 << 1) /* Info supports caps */
+#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page tracking */
 	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
 	__u32	cap_offset;	/* Offset within info struct of first cap */
 };
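As a usage note, here is a small hedged sketch of how userspace might probe
this bit before attempting dirty page tracking; it assumes a kernel with
this series applied, and "container_fd" (a VFIO container that already has
an IOMMU driver set) is an assumption for the sketch.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Returns non-zero if the IOMMU driver advertises dirty page tracking. */
static int supports_dirty_tracking(int container_fd)
{
	struct vfio_iommu_type1_info info;

	memset(&info, 0, sizeof(info));
	info.argsz = sizeof(info);

	if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info))
		return 0;

	return !!(info.flags & VFIO_IOMMU_INFO_DIRTY_PGS);
}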
From patchwork Fri Feb 7 19:42:34 2020
From: Kirti Wankhede
Subject: [PATCH v12 Kernel 7/7] vfio: Selective dirty page tracking if IOMMU
 backed device pins pages
Date: Sat, 8 Feb 2020 01:12:34 +0530
Message-ID: <1581104554-10704-8-git-send-email-kwankhede@nvidia.com>

Added a check so that only singleton IOMMU groups can pin pages. From the
point at which the vendor driver pins any pages, the IOMMU group's dirty
page scope is considered limited to those pinned pages.

To avoid walking the lists too often, added a flag,
pinned_page_dirty_scope, which indicates whether the dirty page scope of
all vfio_groups of each vfio_domain in the domain_list is limited to
pinned pages. The flag is updated on the first pin-pages request for an
IOMMU group and when a group is attached or detached.
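To illustrate the vendor-driver side (this fragment is not from the patch),
once an mdev vendor driver pins pages through vfio_pin_pages(), this series
narrows the group's dirty page scope to exactly those pinned pages;
"mdev_dev", "gfn", and the single-page helper are assumptions for the
sketch.

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/vfio.h>

/*
 * Pin one guest pfn for device DMA. After the first successful pin, the
 * group is treated as reporting its own dirty pages, so the type1 driver
 * stops marking every mapped page dirty on its behalf.
 */
static int vendor_pin_guest_page(struct device *mdev_dev, unsigned long gfn,
				 unsigned long *phys_pfn)
{
	unsigned long user_pfn = gfn;
	int ret;

	ret = vfio_pin_pages(mdev_dev, &user_pfn, 1,
			     IOMMU_READ | IOMMU_WRITE, phys_pfn);
	return ret == 1 ? 0 : (ret < 0 ? ret : -EFAULT);
}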
Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio.c             | 13 +++++++-
 drivers/vfio/vfio_iommu_type1.c | 72 +++++++++++++++++++++++++++++++++++++++--
 include/linux/vfio.h            |  4 ++-
 3 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c8482624ca34..a941c860b440 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -87,6 +87,7 @@ struct vfio_group {
 	bool				noiommu;
 	struct kvm			*kvm;
 	struct blocking_notifier_head	notifier;
+	bool				is_singleton;
 };
 
 struct vfio_device {
@@ -838,6 +839,12 @@ int vfio_add_group_dev(struct device *dev,
 		return PTR_ERR(device);
 	}
 
+	mutex_lock(&group->device_lock);
+	group->is_singleton = false;
+	if (list_is_singular(&group->device_list))
+		group->is_singleton = true;
+	mutex_unlock(&group->device_lock);
+
 	/*
 	 * Drop all but the vfio_device reference.  The vfio_device holds
 	 * a reference to the vfio_group, which holds a reference to the
@@ -1895,6 +1902,9 @@ int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, int npage,
 	if (!group)
 		return -ENODEV;
 
+	if (!group->is_singleton)
+		return -EINVAL;
+
 	ret = vfio_group_add_container_user(group);
 	if (ret)
 		goto err_pin_pages;
@@ -1902,7 +1912,8 @@ int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, int npage,
 	container = group->container;
 	driver = container->iommu_driver;
 	if (likely(driver && driver->ops->pin_pages))
-		ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+		ret = driver->ops->pin_pages(container->iommu_data,
+					     group->iommu_group, user_pfn,
 					     npage, prot, phys_pfn);
 	else
 		ret = -ENOTTY;
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f748a3dbe9f9..a787a2bcd757 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
 	bool			v2;
 	bool			nesting;
 	bool			dirty_page_tracking;
+	bool			pinned_page_dirty_scope;
 };
 
 struct vfio_domain {
@@ -98,6 +99,7 @@ struct vfio_group {
 	struct iommu_group	*iommu_group;
 	struct list_head	next;
 	bool			mdev_group;	/* An mdev group */
+	bool			has_pinned_pages;
 };
 
 struct vfio_iova {
@@ -129,6 +131,10 @@ struct vfio_regions {
 static int put_pfn(unsigned long pfn, int prot);
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
+static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
+					struct iommu_group *iommu_group);
+
+static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu);
 /*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
@@ -580,11 +586,13 @@ static int vfio_unpin_page_external(struct vfio_iommu *iommu,
 }
 
 static int vfio_iommu_type1_pin_pages(void *iommu_data,
+				      struct iommu_group *iommu_group,
 				      unsigned long *user_pfn,
 				      int npage, int prot,
 				      unsigned long *phys_pfn)
 {
 	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_group *group;
 	int i, j, ret;
 	unsigned long remote_vaddr;
 	struct vfio_dma *dma;
@@ -661,8 +669,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 			}
 		}
 	}
-	ret = i;
+	ret = i;
+
+	group = vfio_iommu_find_iommu_group(iommu, iommu_group);
+	if (!group->has_pinned_pages) {
+		group->has_pinned_pages = true;
+		update_pinned_page_dirty_scope(iommu);
+	}
 
 	goto pin_done;
 
pin_unwind:
@@ -938,8 +952,11 @@ static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
 	unsigned int npages = 0, shift = 0;
 	unsigned char temp = 0;
 
-	/* mark all pages dirty if all pages are pinned and mapped. */
-	if (dma->iommu_mapped) {
+	/*
+	 * mark all pages dirty if any IOMMU capable device is not able
+	 * to report dirty pages and all pages are pinned and mapped.
+	 */
+	if (!iommu->pinned_page_dirty_scope && dma->iommu_mapped) {
 		iova_limit = min(dma->iova + dma->size, iova + size);
 		npages = iova_limit/pgsize;
 		bitmap_set(dma->bitmap, 0, npages);
@@ -1479,6 +1496,51 @@ static struct vfio_group *find_iommu_group(struct vfio_domain *domain,
 	return NULL;
 }
 
+static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
+					struct iommu_group *iommu_group)
+{
+	struct vfio_domain *domain;
+	struct vfio_group *group = NULL;
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		group = find_iommu_group(domain, iommu_group);
+		if (group)
+			return group;
+	}
+
+	if (iommu->external_domain)
+		group = find_iommu_group(iommu->external_domain, iommu_group);
+
+	return group;
+}
+
+static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *domain;
+	struct vfio_group *group;
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		list_for_each_entry(group, &domain->group_list, next) {
+			if (!group->has_pinned_pages) {
+				iommu->pinned_page_dirty_scope = false;
+				return;
+			}
+		}
+	}
+
+	if (iommu->external_domain) {
+		domain = iommu->external_domain;
+		list_for_each_entry(group, &domain->group_list, next) {
+			if (!group->has_pinned_pages) {
+				iommu->pinned_page_dirty_scope = false;
+				return;
+			}
+		}
+	}
+
+	iommu->pinned_page_dirty_scope = true;
+}
+
 static bool vfio_iommu_has_sw_msi(struct list_head *group_resv_regions,
 				  phys_addr_t *base)
 {
@@ -1885,6 +1947,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 		list_add(&group->next,
 			 &iommu->external_domain->group_list);
+		update_pinned_page_dirty_scope(iommu);
 		mutex_unlock(&iommu->lock);
 
 		return 0;
@@ -2007,6 +2070,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
done:
 	/* Delete the old one and insert new iova list */
 	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
+	update_pinned_page_dirty_scope(iommu);
 	mutex_unlock(&iommu->lock);
 	vfio_iommu_resv_free(&group_resv_regions);
 
@@ -2021,6 +2085,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
out_free:
 	kfree(domain);
 	kfree(group);
+	update_pinned_page_dirty_scope(iommu);
 	mutex_unlock(&iommu->lock);
 	return ret;
 }
@@ -2225,6 +2290,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 	vfio_iommu_iova_free(&iova_copy);
 
detach_group_done:
+	update_pinned_page_dirty_scope(iommu);
 	mutex_unlock(&iommu->lock);
 }
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e42a711a2800..da29802d6276 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -72,7 +72,9 @@ struct vfio_iommu_driver_ops {
 					struct iommu_group *group);
 	void		(*detach_group)(void *iommu_data,
 					struct iommu_group *group);
-	int		(*pin_pages)(void *iommu_data, unsigned long *user_pfn,
+	int		(*pin_pages)(void *iommu_data,
+				     struct iommu_group *group,
+				     unsigned long *user_pfn,
 				     int npage, int prot,
 				     unsigned long *phys_pfn);
 	int		(*unpin_pages)(void *iommu_data,