From patchwork Thu May 17 08:59:15 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 915219
From: Peter Xu
To: qemu-devel@nongnu.org
Date: Thu, 17 May 2018 16:59:15 +0800
Message-Id: <20180517085927.24925-1-peterx@redhat.com>
Subject: [Qemu-devel] [PATCH v3 00/12] intel-iommu: nested vIOMMU, cleanups,
 bug fixes
Cc: Tian Kevin, "Michael S . Tsirkin", Jason Wang, peterx@redhat.com,
 Alex Williamson, Jintack Lim

(Hello, Jintack. Feel free to test this branch again against your scp
error case when you have free time.)

I rewrote some of the patches in v3. Major changes:

- Dropped the mergeable interval tree and introduced an IOVA tree
  instead, which is even simpler.

- Fixed the scp error issue that Jintack reported. Please see the
  patches for detailed information. That's the major reason for
  rewriting a few of the patches. Using replay for domain flushes, as
  we did in the past, was possibly incorrect. The thing is that IOMMU
  replay has a "definition" that "we should only send MAP when a new
  page is detected", while for shadow page syncing we actually need
  something other than that.
  So in this version I started to use a new
  vtd_sync_shadow_page_table() helper to do the page sync.

- Some other refinements after the refactoring.

I'll add a unit test for the IOVA tree after this series is merged, to
make sure we won't switch to yet another new tree implementation...

The element size in the new IOVA tree should be around
sizeof(GTreeNode) + sizeof(IOMMUTLBEntry) ~= (5*8 + 4*8) = 72 bytes. So
the worst case usage ratio would be 72/4K ~= 2%, which still seems
acceptable (it means an 8G L2 guest will use about 8G*2% = 160MB of
metadata to maintain the mapping in QEMU).

I did an explicit test with scp this time, copying a 1G file more than
10 times in each of the following cases:

- L1 guest, with vIOMMU and with assigned device
- L2 guest, without vIOMMU and with assigned device
- L2 guest, with vIOMMU (so 3-layer nested IOMMU) and with assigned
  device

Please review. Thanks,

(Below is the old content from the previous cover letter)

==========================

v2:
- fix patchew code style warnings
- interval tree: postpone malloc when inserting; simplify node removal
  a bit where proper [Jason]
- fix up comment and commit message for iommu lock patch [Kevin]
- protect context cache too using the iommu lock [Kevin, Jason]
- add vast comment in patch 8 to explain the modify-PTE problem
  [Jason, Kevin]

Online repo:

  https://github.com/xzpeter/qemu/tree/fix-vtd-dma

This series fixes several major problems that the current code has:

- Issue 1: when getting very big PSI UNMAP invalidations, the current
  code is buggy in that we might skip the notification, while we
  should actually always send it.

- Issue 2: the IOTLB is not thread safe, while block dataplane can be
  accessing and updating it in parallel.

- Issue 3: for devices that registered UNMAP-only notifiers, we don't
  really need to do a page walk for PSIs; we can deliver the
  notification down directly. For example, vhost.

- Issue 4: unsafe window for MAP-notified devices like vfio-pci (and
  in the future, vDPA as well).
  The problem is that, for domain invalidations, we currently do this
  to make sure the shadow page tables are correctly synced:

  1. unmap the whole address space
  2. replay the whole address space, map existing pages

  However, between steps 1 and 2 there will be a very tiny window (it
  can be as big as 3ms) in which the shadow page table is either
  invalid or incomplete (since we're rebuilding it). That's a fatal
  error, since devices never know that this is happening and can still
  DMA to memory.

Patch 1 fixes issue 1. I put it first since it's picked from an old
post.

Patch 2 is a cleanup to remove the useless IntelIOMMUNotifierNode
struct.

Patch 3 fixes issue 2.

Patch 4 fixes issue 3.

Patches 5-9 fix issue 4. Here a very simple interval tree is
implemented based on GTree. It differs from a general interval tree in
that it does not allow the user to pass in private data (e.g.,
translated addresses). The benefit is that we can then merge adjacent
interval leaves, so that hopefully we won't consume much memory even if
there are a lot of mappings (that happens for nested virt: when mapping
the whole L2 guest RAM range, it can be at least in GBs).

Patch 10 is another big cleanup that can only work after patch 9.

Tests:

- device assignment to L1, even L2 guests. With this series applied
  (and the kernel IOMMU patches: https://lkml.org/lkml/2018/4/18/5), we
  can even nest vIOMMU now, e.g., we can specify a vIOMMU in the L2
  guest with assigned devices and things will work. We couldn't
  before.

- vhost smoke test for regression.

Please review.
Thanks,

Peter Xu (12):
  intel-iommu: send PSI always even if across PDEs
  intel-iommu: remove IntelIOMMUNotifierNode
  intel-iommu: add iommu lock
  intel-iommu: only do page walk for MAP notifiers
  intel-iommu: introduce vtd_page_walk_info
  intel-iommu: pass in address space when page walk
  intel-iommu: trace domain id during page walk
  util: implement simple iova tree
  intel-iommu: maintain per-device iova ranges
  intel-iommu: simplify page walk logic
  intel-iommu: new vtd_sync_shadow_page_table_range
  intel-iommu: new sync_shadow_page_table

 include/hw/i386/intel_iommu.h |  19 +-
 include/qemu/iova-tree.h      | 134 ++++++++++++
 hw/i386/intel_iommu.c         | 381 +++++++++++++++++++++++++---------
 util/iova-tree.c              | 114 ++++++++++
 MAINTAINERS                   |   6 +
 hw/i386/trace-events          |   5 +-
 util/Makefile.objs            |   1 +
 7 files changed, 556 insertions(+), 104 deletions(-)
 create mode 100644 include/qemu/iova-tree.h
 create mode 100644 util/iova-tree.c

Tested-by: Jintack Lim
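
For readers who want a concrete picture of how an in-place shadow page
table sync differs from "unmap everything, then replay", here is a
minimal sketch in C. It is not the code from this series: the flat
arrays, the NOT_MAPPED marker, the SyncStats counters and sync_shadow()
are all hypothetical names made up for illustration, and the real
vtd_sync_shadow_page_table() helper works on VT-d page tables and the
IOVA tree rather than on arrays. The sketch only assumes that a sync
compares what the device currently has mapped against what the guest
page table says, and notifies for the pages that differ, so that
unchanged mappings stay valid the whole time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker: this IOVA page has no translation. */
#define NOT_MAPPED UINT64_MAX

/* Hypothetical counters standing in for MAP/UNMAP notifications. */
typedef struct {
    unsigned maps;
    unsigned unmaps;
} SyncStats;

/*
 * Bring shadow[] (what the device has mapped) in line with guest[]
 * (what the guest page table says), one page at a time.
 *
 * Pages whose translation did not change are skipped entirely, so
 * there is no point in time at which a still-valid mapping is missing
 * from the shadow table.
 */
SyncStats sync_shadow(uint64_t *shadow, const uint64_t *guest,
                      size_t npages)
{
    SyncStats st = { 0, 0 };

    assert(shadow != NULL && guest != NULL);

    for (size_t i = 0; i < npages; i++) {
        if (shadow[i] == guest[i]) {
            continue;       /* unchanged, mapped or not: no notify */
        }
        if (shadow[i] != NOT_MAPPED) {
            st.unmaps++;    /* stale or changed mapping: UNMAP */
        }
        if (guest[i] != NOT_MAPPED) {
            st.maps++;      /* new or changed mapping: MAP */
        }
        shadow[i] = guest[i];
    }
    return st;
}
```

With a shadow table of {0x1000, 0x2000, NOT_MAPPED, 0x4000} and a guest
table of {0x1000, NOT_MAPPED, 0x3000, 0x5000}, the sketch counts two
UNMAPs and two MAPs and never touches page 0. A second call with the
same guest table sends nothing at all, whereas unmap-all plus replay
would tear down and rebuild every mapping each time.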