From patchwork Thu Nov 7 18:54:13 2013
Date: Thu, 7 Nov 2013 20:54:13 +0200
From: "Michael S. Tsirkin"
To: Paolo Bonzini
Cc: marcel.a@redhat.com, qemu-devel@nongnu.org, lcapitulino@redhat.com
Subject: Re: [Qemu-devel] [PATCH 0/2] exec: alternative fix for master abort woes
Message-ID: <20131107185413.GA4974@redhat.com>
In-Reply-To: <527BCE04.9020107@redhat.com>
References: <1383840877-2861-1-git-send-email-pbonzini@redhat.com>
 <20131107162131.GA4370@redhat.com> <527BBFDB.2010404@redhat.com>
 <20131107164705.GA4572@redhat.com> <527BCE04.9020107@redhat.com>

On Thu, Nov 07, 2013 at 06:29:40PM +0100, Paolo Bonzini wrote:
> On 07/11/2013 17:47, Michael S. Tsirkin wrote:
> > That's on kvm with 52 bit address.
> > But where I would be concerned is systems with e.g. 36 bit address
> > space where we are doubling the cost of the lookup.
> > E.g. try i386 and not x86_64.
> 
> Tried now...
> 
> P_L2_LEVELS          pre-patch    post-patch
> i386                     3             6
> x86_64                   4             6
> 
> I timed the inl_from_qemu test of vmexit.flat with both KVM and TCG.  With
> TCG there's indeed a visible penalty of 20 cycles for i386 and 10 for x86_64
> (you can extrapolate to 30 cycles for TARGET_PHYS_ADDR_SPACE_BITS=32 targets).
> These can be more or less entirely ascribed to phys_page_find:
> 
>                                    TCG                |        KVM
>                            pre-patch  post-patch      | pre-patch  post-patch
> phys_page_find(i386)           13%        25%         |   0.6%        1%
> inl_from_qemu cycles(i386)     153        173         |  ~12000     ~12000

I'm a bit confused by the numbers above.  The % of phys_page_find has grown
from 13% to 25% (almost double, which is kind of expected given we have twice
the # of levels), but the overhead in # of cycles only went from 153 to 173?

Maybe the test is a bit wrong for TCG - how about unrolling the loop in the
kvm unit test (see the diff below)?  Then you have to divide the reported
result by 10.

> phys_page_find(x86_64)         18%        25%         |   0.8%        1%
> inl_from_qemu cycles(x86_64)   163        173         |  ~12000     ~12000
> 
> Thus this patch costs 0.4% in the worst case for KVM, 12% in the worst case
> for TCG.  The cycle breakdown is:
> 
>    60  phys_page_find
>    28  access_with_adjusted_size
>    24  address_space_translate_internal
>    20  address_space_rw
>    13  io_mem_read
>    11  address_space_translate
>     9  memory_region_read_accessor
>     6  memory_region_access_valid
>     4  helper_inl
>     4  memory_access_size
>     3  cpu_inl
> 
> (This run reported 177 cycles per access; the total is 182 due to rounding.)
> It is probably possible to shave at least 10 cycles from the functions below,
> or to make the depth of the tree dynamic so that you would save even more
> compared to 1.6.0.
> 
> Also, compiling with "-fstack-protector" instead of "-fstack-protector-all",
> as suggested a while ago by rth, is already giving a savings of 20 cycles.

Is it true that with TCG this affects more than just MMIO, as phys_page_find
will also sometimes run on CPU accesses to memory?

> And of course, if this were a realistic test, KVM's 60x penalty would
> be a severe problem---but it isn't, because this is not a realistic setting.
> 
> Paolo

Well, for this argument to carry the day we'd need to design a realistic
test, which isn't easy :)

diff --git a/x86/vmexit.c b/x86/vmexit.c
index 957d0cc..405d545 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -40,6 +40,15 @@ static unsigned int inl(unsigned short port)
 {
 	unsigned int val;
 	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
 	return val;
 }
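
To make the "twice the # of levels, twice the lookup cost" argument above
concrete, here is a minimal sketch of a fixed-depth radix walk of the kind
phys_page_find performs.  The names (struct node, lookup, L2_BITS) are made up
for illustration and this is not QEMU's actual PhysPageEntry code; the point
is only that each extra level adds one dependent load, so going from 3-4
levels to 6 roughly doubles the walk:

#include <stddef.h>

/* Simplified sketch, not QEMU's phys_page_find: a radix tree with a fixed
 * number of levels and L2_BITS index bits per level.  Each level costs one
 * dependent load, so the walk scales linearly with the level count. */
#define L2_BITS  10
#define L2_SIZE  (1 << L2_BITS)

struct node {
	void *ptr[L2_SIZE];	/* next-level node, or leaf data at the last level */
};

static void *lookup(struct node *root, unsigned long long page_index, int levels)
{
	struct node *n = root;
	int i;

	/* walk the intermediate levels, most significant index bits first */
	for (i = levels - 1; i > 0; i--) {
		n = n->ptr[(page_index >> (i * L2_BITS)) & (L2_SIZE - 1)];
		if (!n)
			return NULL;	/* unassigned region */
	}
	/* the last level holds the leaf describing the memory region */
	return n->ptr[page_index & (L2_SIZE - 1)];
}

With levels = 6 instead of 3 the loop does twice as many dependent loads,
which is consistent with phys_page_find roughly doubling its share of the
i386 TCG profile above.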