From patchwork Thu Nov 7 18:54:13 2013
Date: Thu, 7 Nov 2013 20:54:13 +0200
From: "Michael S. Tsirkin"
To: Paolo Bonzini
Cc: marcel.a@redhat.com, qemu-devel@nongnu.org, lcapitulino@redhat.com
Subject: Re: [Qemu-devel] [PATCH 0/2] exec: alternative fix for master abort woes
Message-ID: <20131107185413.GA4974@redhat.com>
In-Reply-To: <527BCE04.9020107@redhat.com>
References: <1383840877-2861-1-git-send-email-pbonzini@redhat.com>
 <20131107162131.GA4370@redhat.com> <527BBFDB.2010404@redhat.com>
 <20131107164705.GA4572@redhat.com> <527BCE04.9020107@redhat.com>

On Thu, Nov 07, 2013 at 06:29:40PM +0100, Paolo Bonzini wrote:
> On 07/11/2013 17:47, Michael S. Tsirkin wrote:
> > That's on kvm with 52 bit address.
> > But where I would be concerned is systems with e.g. 36 bit address
> > space where we are doubling the cost of the lookup.
> > E.g. try i386 and not x86_64.
> 
> Tried now...
> 
> P_L2_LEVELS          pre-patch    post-patch
> i386                     3             6
> x86_64                   4             6
> 
> I timed the inl_from_qemu test of vmexit.flat with both KVM and TCG.  With
> TCG there's indeed a visible penalty of 20 cycles for i386 and 10 for x86_64
> (you can extrapolate to 30 cycles for TARGET_PHYS_ADDR_SPACE_BITS=32 targets).
> These can be more or less entirely ascribed to phys_page_find:
> 
>                                    TCG                |        KVM
>                            pre-patch  post-patch      | pre-patch  post-patch
> phys_page_find(i386)           13%        25%         |   0.6%        1%
> inl_from_qemu cycles(i386)     153        173         |  ~12000     ~12000

I'm a bit confused by the numbers above.  The % of phys_page_find has grown
from 13% to 25% (almost double, which is kind of expected given we have twice
the # of levels), but the overhead in # of cycles only went from 153 to 173?

Maybe the test is a bit wrong for TCG - how about unrolling the loop in the
kvm unit test (see the diff below)?  Then you have to divide the reported
result by 10.

> phys_page_find(x86_64)         18%        25%         |   0.8%        1%
> inl_from_qemu cycles(x86_64)   163        173         |  ~12000     ~12000
> 
> Thus this patch costs 0.4% in the worst case for KVM, 12% in the worst case
> for TCG.  The cycle breakdown is:
> 
>    60  phys_page_find
>    28  access_with_adjusted_size
>    24  address_space_translate_internal
>    20  address_space_rw
>    13  io_mem_read
>    11  address_space_translate
>     9  memory_region_read_accessor
>     6  memory_region_access_valid
>     4  helper_inl
>     4  memory_access_size
>     3  cpu_inl
> 
> (This run reported 177 cycles per access; the total is 182 due to rounding.)
> It is probably possible to shave at least 10 cycles from the functions below,
> or to make the depth of the tree dynamic so that you would save even more
> compared to 1.6.0.
> 
> Also, compiling with "-fstack-protector" instead of "-fstack-protector-all",
> as suggested a while ago by rth, is already giving a savings of 20 cycles.

Is it true that with TCG this affects more than just MMIO, as phys_page_find
will also sometimes run on CPU accesses to memory?

> And of course, if this were a realistic test, KVM's 60x penalty would
> be a severe problem---but it isn't, because this is not a realistic setting.
> 
> Paolo

Well, for this argument to carry the day we'd need to design a realistic
test, which isn't easy :)

diff --git a/x86/vmexit.c b/x86/vmexit.c
index 957d0cc..405d545 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -40,6 +40,15 @@ static unsigned int inl(unsigned short port)
 {
 	unsigned int val;
 	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
+	asm volatile("inl %w1, %0" : "=a"(val) : "Nd"(port));
 	return val;
 }
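
To make the "twice the # of levels, twice the lookup cost" argument above
concrete, here is a minimal sketch of a fixed-depth radix walk of the kind
phys_page_find performs.  The names (struct node, lookup, L2_BITS) are made up
for illustration and this is not QEMU's actual PhysPageEntry code; the point
is only that each extra level adds one dependent load, so going from 3-4
levels to 6 roughly doubles the walk:

#include <stddef.h>

/* Simplified sketch, not QEMU's phys_page_find: a radix tree with a fixed
 * number of levels and L2_BITS index bits per level.  Each level costs one
 * dependent load, so the walk scales linearly with the level count. */
#define L2_BITS  10
#define L2_SIZE  (1 << L2_BITS)

struct node {
	void *ptr[L2_SIZE];	/* next-level node, or leaf data at the last level */
};

static void *lookup(struct node *root, unsigned long long page_index, int levels)
{
	struct node *n = root;
	int i;

	/* walk the intermediate levels, most significant index bits first */
	for (i = levels - 1; i > 0; i--) {
		n = n->ptr[(page_index >> (i * L2_BITS)) & (L2_SIZE - 1)];
		if (!n)
			return NULL;	/* unassigned region */
	}
	/* the last level holds the leaf describing the memory region */
	return n->ptr[page_index & (L2_SIZE - 1)];
}

With levels = 6 instead of 3 the loop does twice as many dependent loads,
which is consistent with phys_page_find roughly doubling its share of the
i386 TCG profile above.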