[{"id":1915089,"web_url":"http://patchwork.ozlabs.org/comment/1915089/","msgid":"<010f281a-ba20-3ad9-5842-ae93bfd4b987@redhat.com>","list_archive_url":null,"date":"2018-05-17T14:42:54","subject":"Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even\n\tif across PDEs","submitter":{"id":69187,"url":"http://patchwork.ozlabs.org/api/people/69187/","name":"Eric Auger","email":"eric.auger@redhat.com"},"content":"Hi Peter,\n\nOn 05/04/2018 05:08 AM, Peter Xu wrote:\n> During IOVA page table walking, there is a special case when the PSI\n> covers one whole PDE (Page Directory Entry, which contains 512 Page\n> Table Entries) or more.  In the past, we skip that entry and we don't\n> notify the IOMMU notifiers.  This is not correct.  We should send UNMAP\n> notification to registered UNMAP notifiers in this case.\n> \n> For UNMAP only notifiers, this might cause IOTLBs cached in the devices\n> even if they were already invalid.  For MAP/UNMAP notifiers like\n> vfio-pci, this will cause stale page mappings.\n> \n> This special case doesn't trigger often, but it is very easy to be\n> triggered by nested device assignments, since in that case we'll\n> possibly map the whole L2 guest RAM region into the device's IOVA\n> address space (several GBs at least), which is far bigger than normal\n> kernel driver usages of the device (tens of MBs normally).\n> \n> Without this patch applied to L1 QEMU, nested device assignment to L2\n> guests will dump some errors like:\n> \n> qemu-system-x86_64: VFIO_MAP_DMA: -17\n> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,\n>                     0x7f89a920d000) = -17 (File exists)\n> \n> Acked-by: Jason Wang <jasowang@redhat.com>\n> [peterx: rewrite the commit message]\n> Signed-off-by: Peter Xu <peterx@redhat.com>\n> ---\n>  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------\n>  1 file changed, 30 insertions(+), 12 deletions(-)\n> \n> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c\n> index fb31de9416..b359efd6f9 100644\n> --- a/hw/i386/intel_iommu.c\n> +++ b/hw/i386/intel_iommu.c\n> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,\n>  \n>  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);\n>  \n> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,\n> +                             vtd_page_walk_hook hook_fn, void *private)\nI find the function  name a bit weird as it does not does a ptw but\nrather call a callback on an entry. vtd_callback_wrapper?\n> +{\n> +    assert(hook_fn);\n> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,\n> +                            entry->addr_mask, entry->perm);\n> +    return hook_fn(entry, private);\n> +}\n> +\n>  /**\n>   * vtd_page_walk_level - walk over specific level for IOVA range\n>   *\n> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,\n>           */\n>          entry_valid = read_cur | write_cur;\n>  \n> +        entry.target_as = &address_space_memory;\n> +        entry.iova = iova & subpage_mask;\n> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);\n> +        entry.addr_mask = ~subpage_mask;\n> +\n>          if (vtd_is_last_slpte(slpte, level)) {\n> -            entry.target_as = &address_space_memory;\n> -            entry.iova = iova & subpage_mask;\n>              /* NOTE: this is only meaningful if entry_valid == true */\n>              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);\n> -            entry.addr_mask = ~subpage_mask;\n> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);\n>              if (!entry_valid && !notify_unmap) {\n>                  trace_vtd_page_walk_skip_perm(iova, iova_next);\n>                  goto next;\n>              }\n> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,\n> -                                    entry.addr_mask, entry.perm);\n> -            if (hook_fn) {\n> -                ret = hook_fn(&entry, private);\n> -                if (ret < 0) {\n> -                    return ret;\n> -                }\n> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);\n> +            if (ret < 0) {\n> +                return ret;\n>              }\n>          } else {\n>              if (!entry_valid) {\n> -                trace_vtd_page_walk_skip_perm(iova, iova_next);\n> +                if (notify_unmap) {\n> +                    /*\n> +                     * The whole entry is invalid; unmap it all.\n> +                     * Translated address is meaningless, zero it.\n> +                     */\n> +                    entry.translated_addr = 0x0;\ndo you really need to zero the translated_addr and the related comment.\nAs soon as perm is NONE this should not be used?\n> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);\n> +                    if (ret < 0) {\n> +                        return ret;\n> +                    }\n> +                } else {\n> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);\n> +                }\n>                  goto next;\n>              }\n>              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,\n> \n\nBesides\nReviewed-by: Eric Auger <eric.auger@redhat.com>\n\nThanks\n\nEric","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 40mvDw6D5Bz9s3M\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 18 May 2018 00:43:28 +1000 (AEST)","from localhost ([::1]:58980 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1fJK7q-0006Fp-GH\n\tfor incoming@patchwork.ozlabs.org; Thu, 17 May 2018 10:43:26 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:34940)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <eric.auger@redhat.com>) id 1fJK7T-0006F1-0t\n\tfor qemu-devel@nongnu.org; Thu, 17 May 2018 10:43:08 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <eric.auger@redhat.com>) id 1fJK7N-0005SR-Qj\n\tfor qemu-devel@nongnu.org; Thu, 17 May 2018 10:43:02 -0400","from mx3-rdu2.redhat.com ([66.187.233.73]:39020\n\thelo=mx1.redhat.com)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <eric.auger@redhat.com>)\n\tid 1fJK7N-0005SH-Kn\n\tfor qemu-devel@nongnu.org; Thu, 17 May 2018 10:42:57 -0400","from smtp.corp.redhat.com\n\t(int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id EEFDF8151D40;\n\tThu, 17 May 2018 14:42:56 +0000 (UTC)","from localhost.localdomain (ovpn-117-111.ams2.redhat.com\n\t[10.36.117.111])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 890F22023582;\n\tThu, 17 May 2018 14:42:55 +0000 (UTC)"],"To":"Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org","References":"<20180504030811.28111-1-peterx@redhat.com>\n\t<20180504030811.28111-2-peterx@redhat.com>","From":"Auger Eric <eric.auger@redhat.com>","Message-ID":"<010f281a-ba20-3ad9-5842-ae93bfd4b987@redhat.com>","Date":"Thu, 17 May 2018 16:42:54 +0200","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101\n\tThunderbird/45.4.0","MIME-Version":"1.0","In-Reply-To":"<20180504030811.28111-2-peterx@redhat.com>","Content-Type":"text/plain; charset=windows-1252","Content-Transfer-Encoding":"7bit","X-Scanned-By":"MIMEDefang 2.78 on 10.11.54.4","X-Greylist":["Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.11.55.8]);\n\tThu, 17 May 2018 14:42:56 +0000 (UTC)","inspected by milter-greylist-4.5.16 (mx1.redhat.com\n\t[10.11.55.8]); \n\tThu, 17 May 2018 14:42:56 +0000 (UTC) for IP:'10.11.54.4'\n\tDOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com'\n\tHELO:'smtp.corp.redhat.com' FROM:'eric.auger@redhat.com' RCPT:''"],"X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"66.187.233.73","Subject":"Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even\n\tif across PDEs","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Jintack Lim <jintack@cs.columbia.edu>, Tian Kevin <kevin.tian@intel.com>,\n\tAlex Williamson <alex.williamson@redhat.com>,\n\tJason Wang <jasowang@redhat.com>, \"Michael S . Tsirkin\" <mst@redhat.com>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1915600,"web_url":"http://patchwork.ozlabs.org/comment/1915600/","msgid":"<20180518034153.GB2569@xz-mi>","list_archive_url":null,"date":"2018-05-18T03:41:53","subject":"Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even\n\tif across PDEs","submitter":{"id":67717,"url":"http://patchwork.ozlabs.org/api/people/67717/","name":"Peter Xu","email":"peterx@redhat.com"},"content":"On Thu, May 17, 2018 at 04:42:54PM +0200, Auger Eric wrote:\n> Hi Peter,\n> \n> On 05/04/2018 05:08 AM, Peter Xu wrote:\n> > During IOVA page table walking, there is a special case when the PSI\n> > covers one whole PDE (Page Directory Entry, which contains 512 Page\n> > Table Entries) or more.  In the past, we skip that entry and we don't\n> > notify the IOMMU notifiers.  This is not correct.  We should send UNMAP\n> > notification to registered UNMAP notifiers in this case.\n> > \n> > For UNMAP only notifiers, this might cause IOTLBs cached in the devices\n> > even if they were already invalid.  For MAP/UNMAP notifiers like\n> > vfio-pci, this will cause stale page mappings.\n> > \n> > This special case doesn't trigger often, but it is very easy to be\n> > triggered by nested device assignments, since in that case we'll\n> > possibly map the whole L2 guest RAM region into the device's IOVA\n> > address space (several GBs at least), which is far bigger than normal\n> > kernel driver usages of the device (tens of MBs normally).\n> > \n> > Without this patch applied to L1 QEMU, nested device assignment to L2\n> > guests will dump some errors like:\n> > \n> > qemu-system-x86_64: VFIO_MAP_DMA: -17\n> > qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,\n> >                     0x7f89a920d000) = -17 (File exists)\n> > \n> > Acked-by: Jason Wang <jasowang@redhat.com>\n> > [peterx: rewrite the commit message]\n> > Signed-off-by: Peter Xu <peterx@redhat.com>\n> > ---\n> >  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------\n> >  1 file changed, 30 insertions(+), 12 deletions(-)\n> > \n> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c\n> > index fb31de9416..b359efd6f9 100644\n> > --- a/hw/i386/intel_iommu.c\n> > +++ b/hw/i386/intel_iommu.c\n> > @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,\n> >  \n> >  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);\n> >  \n> > +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,\n> > +                             vtd_page_walk_hook hook_fn, void *private)\n> I find the function  name a bit weird as it does not does a ptw but\n> rather call a callback on an entry. vtd_callback_wrapper?\n\nIt's a hook for the page walk process, and IMHO vtd_callback_wrapper\ndoes not really provide any hint for the page walking.  So even if you\nprefer the \"callback_wrapper\" naming I would still more prefer:\n\n  vtd_page_walk_callback[_wrapper]\n\nthough if so I'd say I don't see much benefit comparing to use the old\nvtd_page_walk_hook, which seems fine to me too...\n\n> > +{\n> > +    assert(hook_fn);\n> > +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,\n> > +                            entry->addr_mask, entry->perm);\n> > +    return hook_fn(entry, private);\n> > +}\n> > +\n> >  /**\n> >   * vtd_page_walk_level - walk over specific level for IOVA range\n> >   *\n> > @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,\n> >           */\n> >          entry_valid = read_cur | write_cur;\n> >  \n> > +        entry.target_as = &address_space_memory;\n> > +        entry.iova = iova & subpage_mask;\n> > +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);\n> > +        entry.addr_mask = ~subpage_mask;\n> > +\n> >          if (vtd_is_last_slpte(slpte, level)) {\n> > -            entry.target_as = &address_space_memory;\n> > -            entry.iova = iova & subpage_mask;\n> >              /* NOTE: this is only meaningful if entry_valid == true */\n> >              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);\n> > -            entry.addr_mask = ~subpage_mask;\n> > -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);\n> >              if (!entry_valid && !notify_unmap) {\n> >                  trace_vtd_page_walk_skip_perm(iova, iova_next);\n> >                  goto next;\n> >              }\n> > -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,\n> > -                                    entry.addr_mask, entry.perm);\n> > -            if (hook_fn) {\n> > -                ret = hook_fn(&entry, private);\n> > -                if (ret < 0) {\n> > -                    return ret;\n> > -                }\n> > +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);\n> > +            if (ret < 0) {\n> > +                return ret;\n> >              }\n> >          } else {\n> >              if (!entry_valid) {\n> > -                trace_vtd_page_walk_skip_perm(iova, iova_next);\n> > +                if (notify_unmap) {\n> > +                    /*\n> > +                     * The whole entry is invalid; unmap it all.\n> > +                     * Translated address is meaningless, zero it.\n> > +                     */\n> > +                    entry.translated_addr = 0x0;\n> do you really need to zero the translated_addr and the related comment.\n> As soon as perm is NONE this should not be used?\n\nYes here we can avoid setting it.  However that'll make sure we don't\nleak strange numbers to the below notifiers, so I would still slightly\nprefer to zero it here.\n\n> > +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);\n> > +                    if (ret < 0) {\n> > +                        return ret;\n> > +                    }\n> > +                } else {\n> > +                    trace_vtd_page_walk_skip_perm(iova, iova_next);\n> > +                }\n> >                  goto next;\n> >              }\n> >              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,\n> > \n> \n> Besides\n> Reviewed-by: Eric Auger <eric.auger@redhat.com>\n\nThanks for reviewing the series.\n\nNote that this v2 is obsolete, please feel free to read version 3 of\nthis series, which contains quite a lot of functional changes.\n\nRegards,","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 40nDWl2QTfz9s0y\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 18 May 2018 13:42:25 +1000 (AEST)","from localhost ([::1]:36488 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1fJWHe-0002Nx-So\n\tfor incoming@patchwork.ozlabs.org; Thu, 17 May 2018 23:42:22 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:44427)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1fJWHM-0002Nn-3E\n\tfor qemu-devel@nongnu.org; Thu, 17 May 2018 23:42:05 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <peterx@redhat.com>) id 1fJWHJ-0005zd-0I\n\tfor qemu-devel@nongnu.org; Thu, 17 May 2018 23:42:04 -0400","from mx3-rdu2.redhat.com ([66.187.233.73]:56554\n\thelo=mx1.redhat.com)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <peterx@redhat.com>) id 1fJWHI-0005yx-PX\n\tfor qemu-devel@nongnu.org; Thu, 17 May 2018 23:42:00 -0400","from smtp.corp.redhat.com\n\t(int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id D9D0CC12A7;\n\tFri, 18 May 2018 03:41:58 +0000 (UTC)","from xz-mi (dhcp-14-151.nay.redhat.com [10.66.14.151])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id C437A2024CBC;\n\tFri, 18 May 2018 03:41:55 +0000 (UTC)"],"Date":"Fri, 18 May 2018 11:41:53 +0800","From":"Peter Xu <peterx@redhat.com>","To":"Auger Eric <eric.auger@redhat.com>","Message-ID":"<20180518034153.GB2569@xz-mi>","References":"<20180504030811.28111-1-peterx@redhat.com>\n\t<20180504030811.28111-2-peterx@redhat.com>\n\t<010f281a-ba20-3ad9-5842-ae93bfd4b987@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=utf-8","Content-Disposition":"inline","In-Reply-To":"<010f281a-ba20-3ad9-5842-ae93bfd4b987@redhat.com>","User-Agent":"Mutt/1.9.5 (2018-04-13)","X-Scanned-By":"MIMEDefang 2.78 on 10.11.54.4","X-Greylist":["Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.11.55.1]);\n\tFri, 18 May 2018 03:41:58 +0000 (UTC)","inspected by milter-greylist-4.5.16 (mx1.redhat.com\n\t[10.11.55.1]); \n\tFri, 18 May 2018 03:41:58 +0000 (UTC) for IP:'10.11.54.4'\n\tDOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com'\n\tHELO:'smtp.corp.redhat.com' FROM:'peterx@redhat.com' RCPT:''"],"X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"66.187.233.73","Subject":"Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even\n\tif across PDEs","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Tian Kevin <kevin.tian@intel.com>, \"Michael S . Tsirkin\" <mst@redhat.com>,\n\tJason Wang <jasowang@redhat.com>, qemu-devel@nongnu.org,\n\tAlex Williamson <alex.williamson@redhat.com>,\n\tJintack Lim <jintack@cs.columbia.edu>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}},{"id":1915687,"web_url":"http://patchwork.ozlabs.org/comment/1915687/","msgid":"<941044eb-2f85-3d1d-a9d9-2c257648680e@redhat.com>","list_archive_url":null,"date":"2018-05-18T07:39:42","subject":"Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even\n\tif across PDEs","submitter":{"id":69187,"url":"http://patchwork.ozlabs.org/api/people/69187/","name":"Eric Auger","email":"eric.auger@redhat.com"},"content":"Hi,\n\nOn 05/18/2018 05:41 AM, Peter Xu wrote:\n> On Thu, May 17, 2018 at 04:42:54PM +0200, Auger Eric wrote:\n>> Hi Peter,\n>>\n>> On 05/04/2018 05:08 AM, Peter Xu wrote:\n>>> During IOVA page table walking, there is a special case when the PSI\n>>> covers one whole PDE (Page Directory Entry, which contains 512 Page\n>>> Table Entries) or more.  In the past, we skip that entry and we don't\n>>> notify the IOMMU notifiers.  This is not correct.  We should send UNMAP\n>>> notification to registered UNMAP notifiers in this case.\n>>>\n>>> For UNMAP only notifiers, this might cause IOTLBs cached in the devices\n>>> even if they were already invalid.  For MAP/UNMAP notifiers like\n>>> vfio-pci, this will cause stale page mappings.\n>>>\n>>> This special case doesn't trigger often, but it is very easy to be\n>>> triggered by nested device assignments, since in that case we'll\n>>> possibly map the whole L2 guest RAM region into the device's IOVA\n>>> address space (several GBs at least), which is far bigger than normal\n>>> kernel driver usages of the device (tens of MBs normally).\n>>>\n>>> Without this patch applied to L1 QEMU, nested device assignment to L2\n>>> guests will dump some errors like:\n>>>\n>>> qemu-system-x86_64: VFIO_MAP_DMA: -17\n>>> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,\n>>>                     0x7f89a920d000) = -17 (File exists)\n>>>\n>>> Acked-by: Jason Wang <jasowang@redhat.com>\n>>> [peterx: rewrite the commit message]\n>>> Signed-off-by: Peter Xu <peterx@redhat.com>\n>>> ---\n>>>  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------\n>>>  1 file changed, 30 insertions(+), 12 deletions(-)\n>>>\n>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c\n>>> index fb31de9416..b359efd6f9 100644\n>>> --- a/hw/i386/intel_iommu.c\n>>> +++ b/hw/i386/intel_iommu.c\n>>> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,\n>>>  \n>>>  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);\n>>>  \n>>> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,\n>>> +                             vtd_page_walk_hook hook_fn, void *private)\n>> I find the function  name a bit weird as it does not does a ptw but\n>> rather call a callback on an entry. vtd_callback_wrapper?\n> \n> It's a hook for the page walk process, and IMHO vtd_callback_wrapper\n> does not really provide any hint for the page walking.  So even if you\n> prefer the \"callback_wrapper\" naming I would still more prefer:\n> \n>   vtd_page_walk_callback[_wrapper]\n> \n> though if so I'd say I don't see much benefit comparing to use the old\n> vtd_page_walk_hook, which seems fine to me too...\nI preferred vtd_page_walk_hook too.\n\nThanks\n\nEric\n> \n>>> +{\n>>> +    assert(hook_fn);\n>>> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,\n>>> +                            entry->addr_mask, entry->perm);\n>>> +    return hook_fn(entry, private);\n>>> +}\n>>> +\n>>>  /**\n>>>   * vtd_page_walk_level - walk over specific level for IOVA range\n>>>   *\n>>> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,\n>>>           */\n>>>          entry_valid = read_cur | write_cur;\n>>>  \n>>> +        entry.target_as = &address_space_memory;\n>>> +        entry.iova = iova & subpage_mask;\n>>> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);\n>>> +        entry.addr_mask = ~subpage_mask;\n>>> +\n>>>          if (vtd_is_last_slpte(slpte, level)) {\n>>> -            entry.target_as = &address_space_memory;\n>>> -            entry.iova = iova & subpage_mask;\n>>>              /* NOTE: this is only meaningful if entry_valid == true */\n>>>              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);\n>>> -            entry.addr_mask = ~subpage_mask;\n>>> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);\n>>>              if (!entry_valid && !notify_unmap) {\n>>>                  trace_vtd_page_walk_skip_perm(iova, iova_next);\n>>>                  goto next;\n>>>              }\n>>> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,\n>>> -                                    entry.addr_mask, entry.perm);\n>>> -            if (hook_fn) {\n>>> -                ret = hook_fn(&entry, private);\n>>> -                if (ret < 0) {\n>>> -                    return ret;\n>>> -                }\n>>> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);\n>>> +            if (ret < 0) {\n>>> +                return ret;\n>>>              }\n>>>          } else {\n>>>              if (!entry_valid) {\n>>> -                trace_vtd_page_walk_skip_perm(iova, iova_next);\n>>> +                if (notify_unmap) {\n>>> +                    /*\n>>> +                     * The whole entry is invalid; unmap it all.\n>>> +                     * Translated address is meaningless, zero it.\n>>> +                     */\n>>> +                    entry.translated_addr = 0x0;\n>> do you really need to zero the translated_addr and the related comment.\n>> As soon as perm is NONE this should not be used?\n> \n> Yes here we can avoid setting it.  However that'll make sure we don't\n> leak strange numbers to the below notifiers, so I would still slightly\n> prefer to zero it here.\n> \n>>> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);\n>>> +                    if (ret < 0) {\n>>> +                        return ret;\n>>> +                    }\n>>> +                } else {\n>>> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);\n>>> +                }\n>>>                  goto next;\n>>>              }\n>>>              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,\n>>>\n>>\n>> Besides\n>> Reviewed-by: Eric Auger <eric.auger@redhat.com>\n> \n> Thanks for reviewing the series.\n> \n> Note that this v2 is obsolete, please feel free to read version 3 of\n> this series, which contains quite a lot of functional changes.\n> \n> Regards,\n>","headers":{"Return-Path":"<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=nongnu.org\n\t(client-ip=2001:4830:134:3::11; helo=lists.gnu.org;\n\tenvelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org;\n\tdmarc=fail (p=none dis=none) header.from=redhat.com"],"Received":["from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])\n\t(using TLSv1 with cipher AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 40nKpF55WFz9s3B\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 18 May 2018 17:40:21 +1000 (AEST)","from localhost ([::1]:37133 helo=lists.gnu.org)\n\tby lists.gnu.org with esmtp (Exim 4.71) (envelope-from\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>)\n\tid 1fJZzv-0001EL-1W\n\tfor incoming@patchwork.ozlabs.org; Fri, 18 May 2018 03:40:19 -0400","from eggs.gnu.org ([2001:4830:134:3::10]:47075)\n\tby lists.gnu.org with esmtp (Exim 4.71)\n\t(envelope-from <eric.auger@redhat.com>) id 1fJZzQ-0001E0-Js\n\tfor qemu-devel@nongnu.org; Fri, 18 May 2018 03:39:49 -0400","from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)\n\t(envelope-from <eric.auger@redhat.com>) id 1fJZzN-0004Z5-Fy\n\tfor qemu-devel@nongnu.org; Fri, 18 May 2018 03:39:48 -0400","from mx3-rdu2.redhat.com ([66.187.233.73]:52676\n\thelo=mx1.redhat.com)\n\tby eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)\n\t(Exim 4.71) (envelope-from <eric.auger@redhat.com>)\n\tid 1fJZzN-0004Yo-9E\n\tfor qemu-devel@nongnu.org; Fri, 18 May 2018 03:39:45 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id CCE887D84C;\n\tFri, 18 May 2018 07:39:44 +0000 (UTC)","from localhost.localdomain (ovpn-117-111.ams2.redhat.com\n\t[10.36.117.111])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 73E232166BAD;\n\tFri, 18 May 2018 07:39:43 +0000 (UTC)"],"To":"Peter Xu <peterx@redhat.com>","References":"<20180504030811.28111-1-peterx@redhat.com>\n\t<20180504030811.28111-2-peterx@redhat.com>\n\t<010f281a-ba20-3ad9-5842-ae93bfd4b987@redhat.com>\n\t<20180518034153.GB2569@xz-mi>","From":"Auger Eric <eric.auger@redhat.com>","Message-ID":"<941044eb-2f85-3d1d-a9d9-2c257648680e@redhat.com>","Date":"Fri, 18 May 2018 09:39:42 +0200","User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101\n\tThunderbird/45.4.0","MIME-Version":"1.0","In-Reply-To":"<20180518034153.GB2569@xz-mi>","Content-Type":"text/plain; charset=utf-8","Content-Transfer-Encoding":"7bit","X-Scanned-By":"MIMEDefang 2.78 on 10.11.54.6","X-Greylist":["Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.11.55.2]);\n\tFri, 18 May 2018 07:39:44 +0000 (UTC)","inspected by milter-greylist-4.5.16 (mx1.redhat.com\n\t[10.11.55.2]); \n\tFri, 18 May 2018 07:39:44 +0000 (UTC) for IP:'10.11.54.6'\n\tDOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com'\n\tHELO:'smtp.corp.redhat.com' FROM:'eric.auger@redhat.com' RCPT:''"],"X-detected-operating-system":"by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]\n\t[fuzzy]","X-Received-From":"66.187.233.73","Subject":"Re: [Qemu-devel] [PATCH v2 01/10] intel-iommu: send PSI always even\n\tif across PDEs","X-BeenThere":"qemu-devel@nongnu.org","X-Mailman-Version":"2.1.21","Precedence":"list","List-Id":"<qemu-devel.nongnu.org>","List-Unsubscribe":"<https://lists.nongnu.org/mailman/options/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>","List-Archive":"<http://lists.nongnu.org/archive/html/qemu-devel/>","List-Post":"<mailto:qemu-devel@nongnu.org>","List-Help":"<mailto:qemu-devel-request@nongnu.org?subject=help>","List-Subscribe":"<https://lists.nongnu.org/mailman/listinfo/qemu-devel>,\n\t<mailto:qemu-devel-request@nongnu.org?subject=subscribe>","Cc":"Tian Kevin <kevin.tian@intel.com>, \"Michael S . Tsirkin\" <mst@redhat.com>,\n\tJason Wang <jasowang@redhat.com>, qemu-devel@nongnu.org,\n\tAlex Williamson <alex.williamson@redhat.com>,\n\tJintack Lim <jintack@cs.columbia.edu>","Errors-To":"qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org","Sender":"\"Qemu-devel\"\n\t<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>"}}]