diff mbox

target-arm: Use common CPU cycle infrastructure

Message ID 1443123824-26866-1-git-send-email-cov@codeaurora.org
State New
Headers show

Commit Message

Christopher Covington Sept. 24, 2015, 7:43 p.m. UTC
cpu_get_ticks() provides a common interface across targets for
calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
is specified (previously a non-increasing value was returned).

Signed-off-by: Christopher Covington <cov@codeaurora.org>
---
 target-arm/helper.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Comments

Alistair Francis Sept. 28, 2015, 10:05 p.m. UTC | #1
On Thu, Sep 24, 2015 at 12:43 PM, Christopher Covington
<cov@codeaurora.org> wrote:
> cpu_get_ticks() provides a common interface across targets for
> calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
> is specified (previously a non-increasing value was returned).
>
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> ---
>  target-arm/helper.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 7dc49cb..32923fb 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -729,8 +729,7 @@ void pmccntr_sync(CPUARMState *env)
>  {
>      uint64_t temp_ticks;
>
> -    temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
> -                          get_ticks_per_sec(), 1000000);
> +    temp_ticks = cpu_get_ticks();

This patch doesn't apply anymore, you will need to rebase it.

Also I don't think this is correct. cpu_get_ticks() returns the host
CPU cycle counter, when in this case we want the guest cycles.

Thanks,

Alistair

>
>      if (env->cp15.c9_pmcr & PMCRD) {
>          /* Increment once every 64 processor clock cycles */
> @@ -768,8 +767,7 @@ static uint64_t pmccntr_read(CPUARMState *env, const ARMCPRegInfo *ri)
>          return env->cp15.c15_ccnt;
>      }
>
> -    total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
> -                           get_ticks_per_sec(), 1000000);
> +    total_ticks = cpu_get_ticks();
>
>      if (env->cp15.c9_pmcr & PMCRD) {
>          /* Increment once every 64 processor clock cycles */
> @@ -789,8 +787,7 @@ static void pmccntr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>          return;
>      }
>
> -    total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
> -                           get_ticks_per_sec(), 1000000);
> +    total_ticks = cpu_get_ticks();
>
>      if (env->cp15.c9_pmcr & PMCRD) {
>          /* Increment once every 64 processor clock cycles */
> --
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
>
Christopher Covington Sept. 29, 2015, 2:07 p.m. UTC | #2
On 09/28/2015 06:05 PM, Alistair Francis wrote:
> On Thu, Sep 24, 2015 at 12:43 PM, Christopher Covington
> <cov@codeaurora.org> wrote:
>> cpu_get_ticks() provides a common interface across targets for
>> calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
>> is specified (previously a non-increasing value was returned).
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> ---
>>  target-arm/helper.c | 9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/target-arm/helper.c b/target-arm/helper.c
>> index 7dc49cb..32923fb 100644
>> --- a/target-arm/helper.c
>> +++ b/target-arm/helper.c
>> @@ -729,8 +729,7 @@ void pmccntr_sync(CPUARMState *env)
>>  {
>>      uint64_t temp_ticks;
>>
>> -    temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
>> -                          get_ticks_per_sec(), 1000000);
>> +    temp_ticks = cpu_get_ticks();

> Also I don't think this is correct. cpu_get_ticks() returns the host
> CPU cycle counter, when in this case we want the guest cycles.

I too find the use of host CPU cycles quite perplexing. Paolo suggested it
[1]. Maybe there are timeouts in some software that tend to work better in
such a mode. Perhaps it is faster, although my intuition is that it's just
providing a facade of frequency scaling to the guest.

1. https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00162.html

I like to declare guest instructions per guest CPU cycles = 1. As I understand
it, an "-icount 0" pair of parameters is how to do this in QEMU for x86. I'd
like it to work for ARM.

I wrote a simple assembly test case which I'm working on porting it to the
kvm-unit-tests framework. In the non-icount case, I saw roughly the same order
of magnitude for guest IPC before and after the patch. I'd like to also write
CPU frequency (guest CPU cycles per generic timer guest seconds) and (M)IPS
(guest instructions per generic timer guest seconds) tests.

Thanks,
Christopher Covington
Christopher Covington Oct. 2, 2015, 4:44 p.m. UTC | #3
On 09/29/2015 10:07 AM, Christopher Covington wrote:
> On 09/28/2015 06:05 PM, Alistair Francis wrote:
>> On Thu, Sep 24, 2015 at 12:43 PM, Christopher Covington
>> <cov@codeaurora.org> wrote:
>>> cpu_get_ticks() provides a common interface across targets for
>>> calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
>>> is specified (previously a non-increasing value was returned).
>>>
>>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>>> ---
>>>  target-arm/helper.c | 9 +++------
>>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/target-arm/helper.c b/target-arm/helper.c
>>> index 7dc49cb..32923fb 100644
>>> --- a/target-arm/helper.c
>>> +++ b/target-arm/helper.c
>>> @@ -729,8 +729,7 @@ void pmccntr_sync(CPUARMState *env)
>>>  {
>>>      uint64_t temp_ticks;
>>>
>>> -    temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
>>> -                          get_ticks_per_sec(), 1000000);
>>> +    temp_ticks = cpu_get_ticks();
> 
>> Also I don't think this is correct. cpu_get_ticks() returns the host
>> CPU cycle counter, when in this case we want the guest cycles.
> 
> I too find the use of host CPU cycles quite perplexing. Paolo suggested it
> [1]. Maybe there are timeouts in some software that tend to work better in
> such a mode. Perhaps it is faster, although my intuition is that it's just
> providing a facade of frequency scaling to the guest.
> 
> 1. https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg00162.html
> 
> I like to declare guest instructions per guest CPU cycles = 1. As I understand
> it, an "-icount 0" pair of parameters is how to do this in QEMU for x86. I'd
> like it to work for ARM.
> 
> I wrote a simple assembly test case which I'm working on porting it to the
> kvm-unit-tests framework. In the non-icount case, I saw roughly the same order
> of magnitude for guest IPC before and after the patch. I'd like to also write
> CPU frequency (guest CPU cycles per generic timer guest seconds) and (M)IPS
> (guest instructions per generic timer guest seconds) tests.

I've sent out the CPI test case and while exercising it I noticed that
Laurent's patch fixed -icount. So my original goal has been accomplished. I'm
happy to rebase this patch if the authorities (Peter Maydell?) think using
cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to move
on to support for the instructions event in the ARM PMU.

Thanks,
Christopher Covington
Peter Maydell Oct. 2, 2015, 4:56 p.m. UTC | #4
On 2 October 2015 at 17:44, Christopher Covington <cov@codeaurora.org> wrote:
> I've sent out the CPI test case and while exercising it I noticed that
> Laurent's patch fixed -icount. So my original goal has been accomplished. I'm
> happy to rebase this patch if the authorities (Peter Maydell?) think using
> cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to move
> on to support for the instructions event in the ARM PMU.

Authority here is probably Peter Crosthwaite. I can produce an
opinion but I'd have to go and research a bunch of stuff to do
that, so I'm hoping to avoid it...

thanks
-- PMM
Peter Crosthwaite Oct. 2, 2015, 5:25 p.m. UTC | #5
On Fri, Oct 2, 2015 at 9:56 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 2 October 2015 at 17:44, Christopher Covington <cov@codeaurora.org> wrote:
>> I've sent out the CPI test case and while exercising it I noticed that
>> Laurent's patch fixed -icount. So my original goal has been accomplished. I'm
>> happy to rebase this patch if the authorities (Peter Maydell?) think using
>> cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to move
>> on to support for the instructions event in the ARM PMU.
>
> Authority here is probably Peter Crosthwaite. I can produce an
> opinion but I'd have to go and research a bunch of stuff to do
> that, so I'm hoping to avoid it...
>

So my idea here is the CPU input frequency should be a property of the CPU.

Some experimental results confirm that the PMCCNTR on many common ARM
implementations is directly connected to the input clock and can be
relied on as a straight free-running counter. I think a genuine
instruction counter is something else and this timer should be
independent of any core provider of cycle count.

Regards,
Peter

> thanks
> -- PMM
Peter Maydell Oct. 2, 2015, 6:08 p.m. UTC | #6
On 2 October 2015 at 18:25, Peter Crosthwaite
<crosthwaitepeter@gmail.com> wrote:
> So my idea here is the CPU input frequency should be a property of the CPU.
>
> Some experimental results confirm that the PMCCNTR on many common ARM
> implementations is directly connected to the input clock and can be
> relied on as a straight free-running counter. I think a genuine
> instruction counter is something else and this timer should be
> independent of any core provider of cycle count.

Architecturally, the PMCCNTR counter is counting the hardware processor
clock. It's definitely not an instruction counter. (It's also not
counting Processor Element clock cycles, though that only makes a
difference if you have a multi-threaded hw implementation.) It is
generally subject to any hw changes in clock frequency (including if
your WFI/WFE do clock stopping).

What that means for QEMU I'm not totally sure :-)

thanks
-- PMM
Peter Crosthwaite Oct. 2, 2015, 6:14 p.m. UTC | #7
On Fri, Oct 2, 2015 at 11:08 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 2 October 2015 at 18:25, Peter Crosthwaite
> <crosthwaitepeter@gmail.com> wrote:
>> So my idea here is the CPU input frequency should be a property of the CPU.
>>
>> Some experimental results confirm that the PMCCNTR on many common ARM
>> implementations is directly connected to the input clock and can be
>> relied on as a straight free-running counter. I think a genuine
>> instruction counter is something else and this timer should be
>> independent of any core provider of cycle count.
>
> Architecturally, the PMCCNTR counter is counting the hardware processor
> clock. It's definitely not an instruction counter. (It's also not
> counting Processor Element clock cycles, though that only makes a
> difference if you have a multi-threaded hw implementation.) It is
> generally subject to any hw changes in clock frequency (including if
> your WFI/WFE do clock stopping).
>

WFI/WFE halting could be easily implemented directly as that, in
similar way to EL filtering.

> What that means for QEMU I'm not totally sure :-)

I think this all points to it being just another normal timer (like
those in hw/timer).

Regards,
Peter

>
> thanks
> -- PMM
Christopher Covington Oct. 2, 2015, 7:25 p.m. UTC | #8
On 10/02/2015 01:25 PM, Peter Crosthwaite wrote:
> On Fri, Oct 2, 2015 at 9:56 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 2 October 2015 at 17:44, Christopher Covington <cov@codeaurora.org> wrote:
>>> I've sent out the CPI test case and while exercising it I noticed that
>>> Laurent's patch fixed -icount. So my original goal has been accomplished. I'm
>>> happy to rebase this patch if the authorities (Peter Maydell?) think using
>>> cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to move
>>> on to support for the instructions event in the ARM PMU.
>>
>> Authority here is probably Peter Crosthwaite. I can produce an
>> opinion but I'd have to go and research a bunch of stuff to do
>> that, so I'm hoping to avoid it...
> 
> So my idea here is the CPU input frequency should be a property of the CPU.
> 
> Some experimental results confirm that the PMCCNTR on many common ARM
> implementations is directly connected to the input clock and can be
> relied on as a straight free-running counter. I think a genuine
> instruction counter is something else

Yes, the "genuine" instruction counter is something else. The instruction
count is only relevant for folks trying to get deterministic execution by
using the -icount option. QEMU TCG mode does not emulate a cycle-level input
clock for the guest (the whole class of functional models skip this
time-consuming step) but rather operates a block at a time. By doing a little
extra, I think it also interpolates the exact instruction count. Specifying a
fixed IPC = n is the most sensible way of deterministically calculating a
PMCCNTR_EL0 value that I know of. The -icount option allows users to choose
such deterministic behavior.

> and this timer should be independent of any core provider of cycle count.

What, if anything, do you think should be hooked up to the core provider of
cycle count?

Christopher Covington
Peter Crosthwaite Oct. 2, 2015, 7:56 p.m. UTC | #9
On Fri, Oct 2, 2015 at 12:25 PM, Christopher Covington
<cov@codeaurora.org> wrote:
> On 10/02/2015 01:25 PM, Peter Crosthwaite wrote:
>> On Fri, Oct 2, 2015 at 9:56 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On 2 October 2015 at 17:44, Christopher Covington <cov@codeaurora.org> wrote:
>>>> I've sent out the CPI test case and while exercising it I noticed that
>>>> Laurent's patch fixed -icount. So my original goal has been accomplished. I'm
>>>> happy to rebase this patch if the authorities (Peter Maydell?) think using
>>>> cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to move
>>>> on to support for the instructions event in the ARM PMU.
>>>
>>> Authority here is probably Peter Crosthwaite. I can produce an
>>> opinion but I'd have to go and research a bunch of stuff to do
>>> that, so I'm hoping to avoid it...
>>
>> So my idea here is the CPU input frequency should be a property of the CPU.
>>
>> Some experimental results confirm that the PMCCNTR on many common ARM
>> implementations is directly connected to the input clock and can be
>> relied on as a straight free-running counter. I think a genuine
>> instruction counter is something else
>
> Yes, the "genuine" instruction counter is something else. The instruction
> count is only relevant for folks trying to get deterministic execution by
> using the -icount option. QEMU TCG mode does not emulate a cycle-level input
> clock for the guest (the whole class of functional models skip this
> time-consuming step) but rather operates a block at a time. By doing a little
> extra, I think it also interpolates the exact instruction count. Specifying a
> fixed IPC = n is the most sensible way of deterministically calculating a
> PMCCNTR_EL0 value that I know of. The -icount option allows users to choose
> such deterministic behavior.
>
>> and this timer should be independent of any core provider of cycle count.
>
> What, if anything, do you think should be hooked up to the core provider of
> cycle count?
>

Depends, Is this a virtual-machine only concept, or do you have
something with a real-hardware analogue?

Regards,
Peter

> Christopher Covington
>
> --
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
Christopher Covington Oct. 2, 2015, 8:48 p.m. UTC | #10
On 10/02/2015 03:56 PM, Peter Crosthwaite wrote:
> On Fri, Oct 2, 2015 at 12:25 PM, Christopher Covington
> <cov@codeaurora.org> wrote:
>> On 10/02/2015 01:25 PM, Peter Crosthwaite wrote:
>>> On Fri, Oct 2, 2015 at 9:56 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> On 2 October 2015 at 17:44, Christopher Covington <cov@codeaurora.org> wrote:
>>>>> I've sent out the CPI test case and while exercising it I noticed that
>>>>> Laurent's patch fixed -icount. So my original goal has been accomplished. I'm
>>>>> happy to rebase this patch if the authorities (Peter Maydell?) think using
>>>>> cpu_get_ticks() is the right thing to do. Otherwise I'll probably try to move
>>>>> on to support for the instructions event in the ARM PMU.
>>>>
>>>> Authority here is probably Peter Crosthwaite. I can produce an
>>>> opinion but I'd have to go and research a bunch of stuff to do
>>>> that, so I'm hoping to avoid it...
>>>
>>> So my idea here is the CPU input frequency should be a property of the CPU.
>>>
>>> Some experimental results confirm that the PMCCNTR on many common ARM
>>> implementations is directly connected to the input clock and can be
>>> relied on as a straight free-running counter. I think a genuine
>>> instruction counter is something else
>>
>> Yes, the "genuine" instruction counter is something else. The instruction
>> count is only relevant for folks trying to get deterministic execution by
>> using the -icount option. QEMU TCG mode does not emulate a cycle-level input
>> clock for the guest (the whole class of functional models skip this
>> time-consuming step) but rather operates a block at a time. By doing a little
>> extra, I think it also interpolates the exact instruction count. Specifying a
>> fixed IPC = n is the most sensible way of deterministically calculating a
>> PMCCNTR_EL0 value that I know of. The -icount option allows users to choose
>> such deterministic behavior.
>>
>>> and this timer should be independent of any core provider of cycle count.
>>
>> What, if anything, do you think should be hooked up to the core provider of
>> cycle count?
> 
> Depends, Is this a virtual-machine only concept, or do you have
> something with a real-hardware analogue?

What I meant to ask was, do you see any reason for cpu_get_ticks() to exist?
If no architecture besides i386 wants to use it, perhaps the code should be
moved there.

Thanks,
Christopher Covington
Peter Maydell Oct. 2, 2015, 10:41 p.m. UTC | #11
On 2 October 2015 at 21:48, Christopher Covington <cov@codeaurora.org> wrote:
> What I meant to ask was, do you see any reason for cpu_get_ticks() to exist?
> If no architecture besides i386 wants to use it, perhaps the code should be
> moved there.

OTOH various non-x86 things do use the closely related cpu_get_real_ticks(),
and the implementation of cpu_get_ticks() is very closely related to
the other clock code in cpus.c.

-- PMM
Paolo Bonzini Oct. 5, 2015, 2:09 p.m. UTC | #12
On 03/10/2015 00:41, Peter Maydell wrote:
> > What I meant to ask was, do you see any reason for cpu_get_ticks() to exist?
> > If no architecture besides i386 wants to use it, perhaps the code should be
> > moved there.
>
> OTOH various non-x86 things do use the closely related cpu_get_real_ticks(),
> and the implementation of cpu_get_ticks() is very closely related to
> the other clock code in cpus.c.

cpu_get_real_ticks() is returning the host cycle counter;
cpu_get_ticks() is stopping/resuming it when the VM is stopped/resumed.
 In other words, cpu_get_real_ticks() is to cpu_get_ticks() what
QEMU_CLOCK_REALTIME is to QEMU_CLOCK_VIRTUAL.

They are also similar in that both cpu_get_ticks() and
QEMU_CLOCK_VIRTUAL "morph" to the instruction count in icount mode.

cpu_get_ticks() should be the right implementation for the ARM PMCCNTR
cycle counter, since: 1) PMCCNTR is roughly the same as the x86 RDTSC;
2) cpu_get_ticks() is the only monotonically increasing value outside
icount mode.

Paolo
Peter Maydell Oct. 5, 2015, 2:11 p.m. UTC | #13
On 5 October 2015 at 15:09, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 03/10/2015 00:41, Peter Maydell wrote:
>> > What I meant to ask was, do you see any reason for cpu_get_ticks() to exist?
>> > If no architecture besides i386 wants to use it, perhaps the code should be
>> > moved there.
>>
>> OTOH various non-x86 things do use the closely related cpu_get_real_ticks(),
>> and the implementation of cpu_get_ticks() is very closely related to
>> the other clock code in cpus.c.
>
> cpu_get_real_ticks() is returning the host cycle counter;
> cpu_get_ticks() is stopping/resuming it when the VM is stopped/resumed.
>  In other words, cpu_get_real_ticks() is to cpu_get_ticks() what
> QEMU_CLOCK_REALTIME is to QEMU_CLOCK_VIRTUAL.

...but it seems wrong to have anything in the simulation care
about the host cycle counter, especially since on some hosts
the underlying implementation is terrible.

thanks
-- PMM
Paolo Bonzini Oct. 5, 2015, 2:27 p.m. UTC | #14
On 05/10/2015 16:11, Peter Maydell wrote:
> > > OTOH various non-x86 things do use the closely related cpu_get_real_ticks(),
> > > and the implementation of cpu_get_ticks() is very closely related to
> > > the other clock code in cpus.c.
> >
> > cpu_get_real_ticks() is returning the host cycle counter;
> > cpu_get_ticks() is stopping/resuming it when the VM is stopped/resumed.
> >  In other words, cpu_get_real_ticks() is to cpu_get_ticks() what
> > QEMU_CLOCK_REALTIME is to QEMU_CLOCK_VIRTUAL.
>
> ...but it seems wrong to have anything in the simulation care
> about the host cycle counter,

I agree; instruction count is much better.  Unfortunately, -icount has a
relatively hefty performance penalty.  It is also common that code using
PMCCNTR/RDTSC wants the increment between two reads to be large-ish and
at least remotely related to the number of instructions that were
executed in between.

I've also used rdtsc in the guest from time to time to benchmark changes
to TCG. Having it map to the host cycle counter is somewhat useful for
that purpose, though one might say this usecase is niche.

> especially since on some hosts
> the underlying implementation is terrible.

I remember seeing a bug where this terrible implementation caused
divisions by zero on the host.  The default implementation in
include/qemu/timer.h should be changed to
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL).

Note that in practice this terrible implementation is only used on
ARM/AArch64.  Is PMCCNTR or something similar accessible from userspace?
 I guess no, since even write access is enabled via PMUSERENR (and in
general PMUSERENR ought to be false on a preemptive-multitasking OS).

In the end I guess qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) would work just
as well for PMCCNTR, but I think non-Linux OSes still have a relatively
slow clock_gettime() so there is still an advantage in using
cpu_get_ticks().

Paolo

ps: on x86, a long time ago rdtsc was reliable and clock_gettime() was
slow, so it was a no-brainer for benchmarks and the like; then rdtsc
became unreliable and clock_gettime() became fast on Linux; but now at
least on new machines rdtsc is usually reliable again.
Peter Maydell Oct. 6, 2015, 12:49 p.m. UTC | #15
On 5 October 2015 at 15:27, Paolo Bonzini <pbonzini@redhat.com> wrote:
> I remember seeing a bug where this terrible implementation caused
> divisions by zero on the host.  The default implementation in
> include/qemu/timer.h should be changed to
> qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL).
>
> Note that in practice this terrible implementation is only used on
> ARM/AArch64.  Is PMCCNTR or something similar accessible from userspace?
>  I guess no, since even write access is enabled via PMUSERENR (and in
> general PMUSERENR ought to be false on a preemptive-multitasking OS).

Yeah, there's no guarantee on userspace access. I think the
fastest way to get some kind of count is to use a library
function that boils down to gettimeofday(2), which we will
do purely in userspace in the kernel VDSO if possible.

thanks
-- PMM
Paolo Bonzini Oct. 6, 2015, 12:58 p.m. UTC | #16
On 06/10/2015 14:49, Peter Maydell wrote:
> Yeah, there's no guarantee on userspace access. I think the
> fastest way to get some kind of count is to use a library
> function that boils down to gettimeofday(2), which we will
> do purely in userspace in the kernel VDSO if possible.

Could we just use CNTVCT_EL0?  Which cores have it?

Paolo
Peter Maydell Oct. 6, 2015, 1:06 p.m. UTC | #17
On 6 October 2015 at 13:58, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 06/10/2015 14:49, Peter Maydell wrote:
>> Yeah, there's no guarantee on userspace access. I think the
>> fastest way to get some kind of count is to use a library
>> function that boils down to gettimeofday(2), which we will
>> do purely in userspace in the kernel VDSO if possible.

Looking closer, clock_gettime() also has a userspace
only fastpath in the VDSO.

> Could we just use CNTVCT_EL0?  Which cores have it?

Not guaranteed to be enabled for userspace access by the kernel,
and even if it is enabled, the kernel folks don't (last time I
checked) consider this userspace ABI -- it's only there for
the benefit of the VDSO. (CNTVCT_EL0 is how the fast clock_gettime
and getimeofday paths are implemented.)

thanks
-- PMM
Paolo Bonzini Oct. 6, 2015, 1:10 p.m. UTC | #18
On 06/10/2015 15:06, Peter Maydell wrote:
> Looking closer, clock_gettime() also has a userspace
> only fastpath in the VDSO.

Yup, hence the patch that changed the default cpu_get_real_ticks()
implementation to get_clock().

>> > Could we just use CNTVCT_EL0?  Which cores have it?
>
> Not guaranteed to be enabled for userspace access by the kernel,
> and even if it is enabled, the kernel folks don't (last time I
> checked) consider this userspace ABI -- it's only there for
> the benefit of the VDSO.

Ok, thanks.

> (CNTVCT_EL0 is how the fast clock_gettime
> and getimeofday paths are implemented.)

Yes.

Paolo
Peter Maydell Oct. 13, 2015, 8:53 p.m. UTC | #19
On 24 September 2015 at 20:43, Christopher Covington <cov@codeaurora.org> wrote:
> cpu_get_ticks() provides a common interface across targets for
> calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
> is specified (previously a non-increasing value was returned).
>
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> ---
>  target-arm/helper.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)

So, I think the conclusion from this thread was that we should
(a) change the default cpu_get_host_ticks() implementation to
call get_clock()
(b) rebase this patch and apply it

Is anybody planning to do that?

In any case, I'm taking this thread off my "must-review" list :-)

thanks
-- PMM
Christopher Covington Oct. 14, 2015, 12:10 p.m. UTC | #20
On 10/13/2015 04:53 PM, Peter Maydell wrote:
> On 24 September 2015 at 20:43, Christopher Covington <cov@codeaurora.org> wrote:
>> cpu_get_ticks() provides a common interface across targets for
>> calculating CPU cycles. Using this fixes PMCCNTR reads when -icount
>> is specified (previously a non-increasing value was returned).
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> ---
>>  target-arm/helper.c | 9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> So, I think the conclusion from this thread was that we should
> (a) change the default cpu_get_host_ticks() implementation to
> call get_clock()
> (b) rebase this patch and apply it
> 
> Is anybody planning to do that?

It's not in my plans at the moment.

Thanks,
Christopher Covington
diff mbox

Patch

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 7dc49cb..32923fb 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -729,8 +729,7 @@  void pmccntr_sync(CPUARMState *env)
 {
     uint64_t temp_ticks;
 
-    temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
-                          get_ticks_per_sec(), 1000000);
+    temp_ticks = cpu_get_ticks();
 
     if (env->cp15.c9_pmcr & PMCRD) {
         /* Increment once every 64 processor clock cycles */
@@ -768,8 +767,7 @@  static uint64_t pmccntr_read(CPUARMState *env, const ARMCPRegInfo *ri)
         return env->cp15.c15_ccnt;
     }
 
-    total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
-                           get_ticks_per_sec(), 1000000);
+    total_ticks = cpu_get_ticks();
 
     if (env->cp15.c9_pmcr & PMCRD) {
         /* Increment once every 64 processor clock cycles */
@@ -789,8 +787,7 @@  static void pmccntr_write(CPUARMState *env, const ARMCPRegInfo *ri,
         return;
     }
 
-    total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL),
-                           get_ticks_per_sec(), 1000000);
+    total_ticks = cpu_get_ticks();
 
     if (env->cp15.c9_pmcr & PMCRD) {
         /* Increment once every 64 processor clock cycles */