Patchwork ppcr: Avoid decrementer related kvm exits

login
register
mail settings
Submitter David Gibson
Date Oct. 14, 2011, 5:19 a.m.
Message ID <1318569548-8537-1-git-send-email-david@gibson.dropbear.id.au>
Download mbox | patch
Permalink /patch/119701/
State New
Headers show

Comments

David Gibson - Oct. 14, 2011, 5:19 a.m.
In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
decrementer interrupts.  This is necessary to implement the decrementer
properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
and Book3S-HV KVM variants), because the kernel handles generating and
delivering decrementer exceptions.

Under kvm, in fact, the timer causes expensive and unnecessary exits from
kvm to qemu.  This patch, therefore, disables setting the timer when kvm
is in use.

Signed-off-by: Anton Blanchard <anton@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc.c |   25 ++++++++++++++-----------
 1 files changed, 14 insertions(+), 11 deletions(-)
Alexander Graf - Oct. 14, 2011, 5:30 a.m.
On 14.10.2011, at 07:19, David Gibson wrote:

> In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
> decrementer interrupts.  This is necessary to implement the decrementer
> properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
> and Book3S-HV KVM variants), because the kernel handles generating and
> delivering decrementer exceptions.
> 
> Under kvm, in fact, the timer causes expensive and unnecessary exits from
> kvm to qemu.  This patch, therefore, disables setting the timer when kvm
> is in use.
> 
> Signed-off-by: Anton Blanchard <anton@au1.ibm.com>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> hw/ppc.c |   25 ++++++++++++++-----------
> 1 files changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/ppc.c b/hw/ppc.c
> index 25b59dd..87aa4e5 100644
> --- a/hw/ppc.c
> +++ b/hw/ppc.c
> @@ -658,21 +658,24 @@ static void __cpu_ppc_store_decr (CPUState *env, uint64_t *nextp,

Do we ever call store_decr in the kvm case? Isn't that only called from emulated mtdec?

Alex

>                                   int is_excp)
> {
>     ppc_tb_t *tb_env = env->tb_env;
> -    uint64_t now, next;
> 
>     LOG_TB("%s: %08" PRIx32 " => %08" PRIx32 "\n", __func__,
>                 decr, value);
> -    now = qemu_get_clock_ns(vm_clock);
> -    next = now + muldiv64(value, get_ticks_per_sec(), tb_env->decr_freq);
> -    if (is_excp) {
> -        next += *nextp - now;
> -    }
> -    if (next == now) {
> -        next++;
> +    if (!kvm_enabled()) {
> +        uint64_t now, next;
> +
> +        now = qemu_get_clock_ns(vm_clock);
> +        next = now + muldiv64(value, get_ticks_per_sec(), tb_env->decr_freq);
> +        if (is_excp) {
> +            next += *nextp - now;
> +        }
> +        if (next == now) {
> +            next++;
> +        }
> +        *nextp = next;
> +        /* Adjust timer */
> +        qemu_mod_timer(timer, next);
>     }
> -    *nextp = next;
> -    /* Adjust timer */
> -    qemu_mod_timer(timer, next);
> 
>     /* If we set a negative value and the decrementer was positive, raise an
>      * exception.
> -- 
> 1.7.6.3
>
David Gibson - Oct. 14, 2011, 6:36 a.m.
On Fri, Oct 14, 2011 at 07:30:09AM +0200, Alexander Graf wrote:
> 
> On 14.10.2011, at 07:19, David Gibson wrote:
> 
> > In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
> > decrementer interrupts.  This is necessary to implement the decrementer
> > properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
> > and Book3S-HV KVM variants), because the kernel handles generating and
> > delivering decrementer exceptions.
> > 
> > Under kvm, in fact, the timer causes expensive and unnecessary exits from
> > kvm to qemu.  This patch, therefore, disables setting the timer when kvm
> > is in use.
> > 
> > Signed-off-by: Anton Blanchard <anton@au1.ibm.com>
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > hw/ppc.c |   25 ++++++++++++++-----------
> > 1 files changed, 14 insertions(+), 11 deletions(-)
> > 
> > diff --git a/hw/ppc.c b/hw/ppc.c
> > index 25b59dd..87aa4e5 100644
> > --- a/hw/ppc.c
> > +++ b/hw/ppc.c
> > @@ -658,21 +658,24 @@ static void __cpu_ppc_store_decr (CPUState *env, uint64_t *nextp,
> 
> Do we ever call store_decr in the kvm case? Isn't that only called
> from emulated mtdec?

Yes, from cpu_ppc_set_tb_clk().  Anton observed the kvm exits in the
wild, they're not theoretical.

Agh, which reminds me, I forgot to fixup the git author again.  The
patch should show authorship by Anton Blanchard <anton@au1.ibm.com>,
as in the s-o-b.
Alexander Graf - Oct. 14, 2011, 6:44 a.m.
On 14.10.2011, at 08:36, David Gibson wrote:

> On Fri, Oct 14, 2011 at 07:30:09AM +0200, Alexander Graf wrote:
>> 
>> On 14.10.2011, at 07:19, David Gibson wrote:
>> 
>>> In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
>>> decrementer interrupts.  This is necessary to implement the decrementer
>>> properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
>>> and Book3S-HV KVM variants), because the kernel handles generating and
>>> delivering decrementer exceptions.
>>> 
>>> Under kvm, in fact, the timer causes expensive and unnecessary exits from
>>> kvm to qemu.  This patch, therefore, disables setting the timer when kvm
>>> is in use.
>>> 
>>> Signed-off-by: Anton Blanchard <anton@au1.ibm.com>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
>>> hw/ppc.c |   25 ++++++++++++++-----------
>>> 1 files changed, 14 insertions(+), 11 deletions(-)
>>> 
>>> diff --git a/hw/ppc.c b/hw/ppc.c
>>> index 25b59dd..87aa4e5 100644
>>> --- a/hw/ppc.c
>>> +++ b/hw/ppc.c
>>> @@ -658,21 +658,24 @@ static void __cpu_ppc_store_decr (CPUState *env, uint64_t *nextp,
>> 
>> Do we ever call store_decr in the kvm case? Isn't that only called
>> from emulated mtdec?
> 
> Yes, from cpu_ppc_set_tb_clk().  Anton observed the kvm exits in the
> wild, they're not theoretical.
> 
> Agh, which reminds me, I forgot to fixup the git author again.  The
> patch should show authorship by Anton Blanchard <anton@au1.ibm.com>,
> as in the s-o-b.

Wouldn't a simple

if (kvm_enabled()) {
    return;
}

in the beginning of the function make more sense? There's no code connecting the in-qemu and the in-kvm decrementors atm, so any logic applying to the in-qemu one is moot for kvm.


Alex
David Gibson - Oct. 14, 2011, 6:46 a.m.
On Fri, Oct 14, 2011 at 08:44:06AM +0200, Alexander Graf wrote:
> 
> On 14.10.2011, at 08:36, David Gibson wrote:
> 
> > On Fri, Oct 14, 2011 at 07:30:09AM +0200, Alexander Graf wrote:
> >> 
> >> On 14.10.2011, at 07:19, David Gibson wrote:
> >> 
> >>> In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
> >>> decrementer interrupts.  This is necessary to implement the decrementer
> >>> properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
> >>> and Book3S-HV KVM variants), because the kernel handles generating and
> >>> delivering decrementer exceptions.
> >>> 
> >>> Under kvm, in fact, the timer causes expensive and unnecessary exits from
> >>> kvm to qemu.  This patch, therefore, disables setting the timer when kvm
> >>> is in use.
> >>> 
> >>> Signed-off-by: Anton Blanchard <anton@au1.ibm.com>
> >>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>> ---
> >>> hw/ppc.c |   25 ++++++++++++++-----------
> >>> 1 files changed, 14 insertions(+), 11 deletions(-)
> >>> 
> >>> diff --git a/hw/ppc.c b/hw/ppc.c
> >>> index 25b59dd..87aa4e5 100644
> >>> --- a/hw/ppc.c
> >>> +++ b/hw/ppc.c
> >>> @@ -658,21 +658,24 @@ static void __cpu_ppc_store_decr (CPUState *env, uint64_t *nextp,
> >> 
> >> Do we ever call store_decr in the kvm case? Isn't that only called
> >> from emulated mtdec?
> > 
> > Yes, from cpu_ppc_set_tb_clk().  Anton observed the kvm exits in the
> > wild, they're not theoretical.
> > 
> > Agh, which reminds me, I forgot to fixup the git author again.  The
> > patch should show authorship by Anton Blanchard <anton@au1.ibm.com>,
> > as in the s-o-b.
> 
> Wouldn't a simple
> 
> if (kvm_enabled()) {
>     return;
> }
> 
> in the beginning of the function make more sense? There's no code
> connecting the in-qemu and the in-kvm decrementors atm, so any logic
> applying to the in-qemu one is moot for kvm.

Uh.. I guess so.  I wasn't 100% sure the last bit of code in the
function wouldn't have some effect on kvm.  But I guess it doesn't;
I'll revise.
David Gibson - Oct. 17, 2011, 5:25 a.m.
On Fri, Oct 14, 2011 at 05:46:14PM +1100, David Gibson wrote:
> On Fri, Oct 14, 2011 at 08:44:06AM +0200, Alexander Graf wrote:
> > 
> > On 14.10.2011, at 08:36, David Gibson wrote:
> > 
> > > On Fri, Oct 14, 2011 at 07:30:09AM +0200, Alexander Graf wrote:
> > >> 
> > >> On 14.10.2011, at 07:19, David Gibson wrote:
> > >> 
> > >>> In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
> > >>> decrementer interrupts.  This is necessary to implement the decrementer
> > >>> properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
> > >>> and Book3S-HV KVM variants), because the kernel handles generating and
> > >>> delivering decrementer exceptions.
> > >>> 
> > >>> Under kvm, in fact, the timer causes expensive and unnecessary exits from
> > >>> kvm to qemu.  This patch, therefore, disables setting the timer when kvm
> > >>> is in use.
> > >>> 
> > >>> Signed-off-by: Anton Blanchard <anton@au1.ibm.com>
> > >>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > >>> ---
> > >>> hw/ppc.c |   25 ++++++++++++++-----------
> > >>> 1 files changed, 14 insertions(+), 11 deletions(-)
> > >>> 
> > >>> diff --git a/hw/ppc.c b/hw/ppc.c
> > >>> index 25b59dd..87aa4e5 100644
> > >>> --- a/hw/ppc.c
> > >>> +++ b/hw/ppc.c
> > >>> @@ -658,21 +658,24 @@ static void __cpu_ppc_store_decr (CPUState *env, uint64_t *nextp,
> > >> 
> > >> Do we ever call store_decr in the kvm case? Isn't that only called
> > >> from emulated mtdec?
> > > 
> > > Yes, from cpu_ppc_set_tb_clk().  Anton observed the kvm exits in the
> > > wild, they're not theoretical.
> > > 
> > > Agh, which reminds me, I forgot to fixup the git author again.  The
> > > patch should show authorship by Anton Blanchard <anton@au1.ibm.com>,
> > > as in the s-o-b.
> > 
> > Wouldn't a simple
> > 
> > if (kvm_enabled()) {
> >     return;
> > }
> > 
> > in the beginning of the function make more sense? There's no code
> > connecting the in-qemu and the in-kvm decrementors atm, so any logic
> > applying to the in-qemu one is moot for kvm.
> 
> Uh.. I guess so.  I wasn't 100% sure the last bit of code in the
> function wouldn't have some effect on kvm.  But I guess it doesn't;
> I'll revise.

Revised patch sent.
Scott Wood - Oct. 25, 2011, 7:38 p.m.
On 10/14/2011 01:44 AM, Alexander Graf wrote:
> Wouldn't a simple
> 
> if (kvm_enabled()) {
>     return;
> }
> 
> in the beginning of the function make more sense? There's no code connecting the in-qemu and the in-kvm decrementors atm, so any logic applying to the in-qemu one is moot for kvm.

On book3e at least, we can use sregs to set the decrementer, and we
probably want this to happen on reset.

-Scott
Alexander Graf - Oct. 28, 2011, 7:27 a.m.
On 25.10.2011, at 21:38, Scott Wood wrote:

> On 10/14/2011 01:44 AM, Alexander Graf wrote:
>> Wouldn't a simple
>> 
>> if (kvm_enabled()) {
>>    return;
>> }
>> 
>> in the beginning of the function make more sense? There's no code connecting the in-qemu and the in-kvm decrementors atm, so any logic applying to the in-qemu one is moot for kvm.
> 
> On book3e at least, we can use sregs to set the decrementer, and we
> probably want this to happen on reset.

Sure.

if (kvm_enabled()) {
    kvmppc_set_dec(x);
    return;
}


Alex

Patch

diff --git a/hw/ppc.c b/hw/ppc.c
index 25b59dd..87aa4e5 100644
--- a/hw/ppc.c
+++ b/hw/ppc.c
@@ -658,21 +658,24 @@  static void __cpu_ppc_store_decr (CPUState *env, uint64_t *nextp,
                                   int is_excp)
 {
     ppc_tb_t *tb_env = env->tb_env;
-    uint64_t now, next;
 
     LOG_TB("%s: %08" PRIx32 " => %08" PRIx32 "\n", __func__,
                 decr, value);
-    now = qemu_get_clock_ns(vm_clock);
-    next = now + muldiv64(value, get_ticks_per_sec(), tb_env->decr_freq);
-    if (is_excp) {
-        next += *nextp - now;
-    }
-    if (next == now) {
-        next++;
+    if (!kvm_enabled()) {
+        uint64_t now, next;
+
+        now = qemu_get_clock_ns(vm_clock);
+        next = now + muldiv64(value, get_ticks_per_sec(), tb_env->decr_freq);
+        if (is_excp) {
+            next += *nextp - now;
+        }
+        if (next == now) {
+            next++;
+        }
+        *nextp = next;
+        /* Adjust timer */
+        qemu_mod_timer(timer, next);
     }
-    *nextp = next;
-    /* Adjust timer */
-    qemu_mod_timer(timer, next);
 
     /* If we set a negative value and the decrementer was positive, raise an
      * exception.