Message ID | 1430417667-4245-5-git-send-email-christopher.covington@linaro.org |
---|---|
State | New |
Headers | show |
On 30 April 2015 at 19:14, Christopher Covington <christopher.covington@linaro.org> wrote: > Present a system with an instructions per cycle of exactly one. > This makes it less likely a user will mistake the cycle counter > values as meaningful and makes calculations involving cycles > trivial while preserving the necessary property of the cycle > counter register as monotonically increasing. > > Signed-off-by: Christopher Covington <christopher.covington@linaro.org> > --- > target-arm/helper.c | 9 +++------ > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/target-arm/helper.c b/target-arm/helper.c > index 3e6fb0b..a027a19 100644 > --- a/target-arm/helper.c > +++ b/target-arm/helper.c > @@ -648,8 +648,7 @@ void pmccntr_sync(CPUARMState *env) > { > uint64_t temp_ticks; > > - temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), > - get_ticks_per_sec(), 1000000); > + temp_ticks = cpu_get_icount_raw(); Are you really really sure the _raw function is the correct one? Nowhere else in the codebase calls it except the other wrappers in cpu.c which provide a sane view of the instruction count... (I suspect cpu_get_icount_raw() should really be static.) PS: it would be helpful if you could make sure you include a cover letter for future patchseries: patchsets without a cover letter are awkward for my workflow... (A single standalone patch doesn't need a coverletter.) thanks -- PMM
Hi Peter, Thanks for taking a look. On Apr 30, 2015 2:28 PM, "Peter Maydell" <peter.maydell@linaro.org> wrote: > > On 30 April 2015 at 19:14, Christopher Covington > <christopher.covington@linaro.org> wrote: > > Present a system with an instructions per cycle of exactly one. > > This makes it less likely a user will mistake the cycle counter > > values as meaningful and makes calculations involving cycles > > trivial while preserving the necessary property of the cycle > > counter register as monotonically increasing. > > > > Signed-off-by: Christopher Covington <christopher.covington@linaro.org> > > --- > > target-arm/helper.c | 9 +++------ > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > > diff --git a/target-arm/helper.c b/target-arm/helper.c > > index 3e6fb0b..a027a19 100644 > > --- a/target-arm/helper.c > > +++ b/target-arm/helper.c > > @@ -648,8 +648,7 @@ void pmccntr_sync(CPUARMState *env) > > { > > uint64_t temp_ticks; > > > > - temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), > > - get_ticks_per_sec(), 1000000); > > + temp_ticks = cpu_get_icount_raw(); > > Are you really really sure the _raw function is the correct one? > Nowhere else in the codebase calls it except the other wrappers > in cpu.c which provide a sane view of the instruction count... > (I suspect cpu_get_icount_raw() should really be static.) I thought it wasn't being used because it was new, having been introduced by Pavel Dovgalyuk's determinism patches. When you talk about sanity, what would this patch make insane? The instructions per second and cycles per second that would result? If so, what if seconds/timer ticks were changed in the same patch to be derived from the instruction count. All of this could potentially only apply with -icount specified. > PS: it would be helpful if you could make sure you include > a cover letter for future patchseries: patchsets without a > cover letter are awkward for my workflow... (A single > standalone patch doesn't need a coverletter.) Will do. For this series it should have been that the first four patches are hopefully not controversial, but so far have only been tested by some very simple bare metal code (using PSCI exit instead of Angel exit at the end) and booting the Linux kernel and looking at the printks, but not yet by running `perf stat`. The last patch (this one) I realize may be controversial as it tries to make QEMU conform to certain expectations such as determinism that it has not historically met. Thanks, Chris
On 30 April 2015 at 22:33, Christopher Covington <christopher.covington@linaro.org> wrote: > On Apr 30, 2015 2:28 PM, "Peter Maydell" <peter.maydell@linaro.org> wrote: >> Are you really really sure the _raw function is the correct one? >> Nowhere else in the codebase calls it except the other wrappers >> in cpu.c which provide a sane view of the instruction count... >> (I suspect cpu_get_icount_raw() should really be static.) > > I thought it wasn't being used because it was new, having been introduced by > Pavel Dovgalyuk's determinism patches. > > When you talk about sanity, what would this patch make insane? The > instructions per second and cycles per second that would result? If so, what > if seconds/timer ticks were changed in the same patch to be derived from the > instruction count. All of this could potentially only apply with -icount > specified. I don't really know for certain how the code here works, but it makes me very dubious when I see a function that is being used literally nowhere else in any other target CPU, and where every other code path to it goes via taking a lock. Paolo: can you suggest what the right function to call to implement a cycle counter is? thanks -- PMM
On Thu, Apr 30, 2015 at 11:14 AM, Christopher Covington <christopher.covington@linaro.org> wrote: > Present a system with an instructions per cycle of exactly one. > This makes it less likely a user will mistake the cycle counter > values as meaningful and makes calculations involving cycles > trivial while preserving the necessary property of the cycle > counter register as monotonically increasing. > > Signed-off-by: Christopher Covington <christopher.covington@linaro.org> > --- > target-arm/helper.c | 9 +++------ > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/target-arm/helper.c b/target-arm/helper.c > index 3e6fb0b..a027a19 100644 > --- a/target-arm/helper.c > +++ b/target-arm/helper.c > @@ -648,8 +648,7 @@ void pmccntr_sync(CPUARMState *env) > { > uint64_t temp_ticks; > > - temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), > - get_ticks_per_sec(), 1000000); > + temp_ticks = cpu_get_icount_raw(); So I have guests (for better or worse) that make assumptions about the rate of the PMCCNTR WRT real time. But isn't the PMCCNTR really a clock cycle counter rather than an instruction counter? That clock rate is well defined even if it is just the trivial get_ticks_per_sec() at the moment. Ideally we should have a configurable clock rate in there instead of get_ticks_per_sec(). This is a major change in definition. I can see your use case though, where you actually want this to mean something WRT program performance. Should we add a switch between the two behaviours? Regards, Peter > > if (env->cp15.c9_pmcr & PMCRD) { > /* Increment once every 64 processor clock cycles */ > @@ -687,8 +686,7 @@ static uint64_t pmccntr_read(CPUARMState *env, const ARMCPRegInfo *ri) > return env->cp15.c15_ccnt; > } > > - total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), > - get_ticks_per_sec(), 1000000); > + total_ticks = cpu_get_icount_raw(); > > if (env->cp15.c9_pmcr & PMCRD) { > /* Increment once every 64 processor clock cycles */ > @@ -708,8 +706,7 @@ static void pmccntr_write(CPUARMState *env, const ARMCPRegInfo *ri, > return; > } > > - total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), > - get_ticks_per_sec(), 1000000); > + total_ticks = cpu_get_icount_raw(); > > if (env->cp15.c9_pmcr & PMCRD) { > /* Increment once every 64 processor clock cycles */ > -- > 1.9.1 > >
On Thu, Apr 30, 2015 at 9:24 PM, Peter Crosthwaite <peter.crosthwaite@xilinx.com> wrote: > On Thu, Apr 30, 2015 at 11:14 AM, Christopher Covington > <christopher.covington@linaro.org> wrote: >> Present a system with an instructions per cycle of exactly one. >> This makes it less likely a user will mistake the cycle counter >> values as meaningful and makes calculations involving cycles >> trivial while preserving the necessary property of the cycle >> counter register as monotonically increasing. >> >> Signed-off-by: Christopher Covington <christopher.covington@linaro.org> >> --- >> target-arm/helper.c | 9 +++------ >> 1 file changed, 3 insertions(+), 6 deletions(-) >> >> diff --git a/target-arm/helper.c b/target-arm/helper.c >> index 3e6fb0b..a027a19 100644 >> --- a/target-arm/helper.c >> +++ b/target-arm/helper.c >> @@ -648,8 +648,7 @@ void pmccntr_sync(CPUARMState *env) >> { >> uint64_t temp_ticks; >> >> - temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), >> - get_ticks_per_sec(), 1000000); >> + temp_ticks = cpu_get_icount_raw(); > > So I have guests (for better or worse) that make assumptions about the > rate of the PMCCNTR WRT real time. But isn't the PMCCNTR really a > clock cycle counter rather than an instruction counter? That clock > rate is well defined even if it is just the trivial > get_ticks_per_sec() at the moment. Ideally we should have a > configurable clock rate in there instead of get_ticks_per_sec(). This > is a major change in definition. Is this with KVM, user emulation, or system emulation mode? I'm mainly looking at things from the TCG (and mostly system mode) perspective. If not using KVM, would the assumptions of your guests hold if you ran them on a hardware device instead of QEMU TCG? I'm still trying to understand all the code involved, but I don't see get_ticks_per_sec() or any other constants as problematic. > I can see your use case though, where you actually want this to mean > something WRT program performance. Should we add a switch between the > two behaviours? This switch already exists, in the form of -icount, and I really should have already been using it. What initially scared me away was qemu_icount_bias and icount_adjust(). Accounting for halting seems like a good thing but the accounting for "speed" by referencing a host clock doesn't make sense to me. If I have an ARM development board hooked up to an x86 desktop, it's not querying the desktop for time. So why does it make sense for QEMU system emulation mode behave like that? The -icount help text makes it sound like adjustment for speed reasons should only take place when "auto" is specified, but I have yet to understand the code well enough to verify that. Chris
On 01/05/2015 00:02, Peter Maydell wrote: > On 30 April 2015 at 22:33, Christopher Covington > <christopher.covington@linaro.org> wrote: >> On Apr 30, 2015 2:28 PM, "Peter Maydell" <peter.maydell@linaro.org> wrote: >>> Are you really really sure the _raw function is the correct one? >>> Nowhere else in the codebase calls it except the other wrappers >>> in cpu.c which provide a sane view of the instruction count... >>> (I suspect cpu_get_icount_raw() should really be static.) >> >> I thought it wasn't being used because it was new, having been introduced by >> Pavel Dovgalyuk's determinism patches. >> >> When you talk about sanity, what would this patch make insane? The right function is cpu_get_ticks(). This one works without icount as well. At the top there is: if (use_icount) { return cpu_get_icount(); } and perhaps it would be correct to use cpu_get_icount_raw() if you do not want the cycle counter to increase in halted state. But at least x86 wants it to increase. Paolo The >> instructions per second and cycles per second that would result? If so, what >> if seconds/timer ticks were changed in the same patch to be derived from the >> instruction count. All of this could potentially only apply with -icount >> specified. > > I don't really know for certain how the code here works, but > it makes me very dubious when I see a function that is being > used literally nowhere else in any other target CPU, and > where every other code path to it goes via taking a lock. > > Paolo: can you suggest what the right function to call to implement > a cycle counter is? > > thanks > -- PMM >
On Fri, May 1, 2015 at 7:35 AM, Christopher Covington <christopher.covington@linaro.org> wrote: > On Thu, Apr 30, 2015 at 9:24 PM, Peter Crosthwaite > <peter.crosthwaite@xilinx.com> wrote: >> On Thu, Apr 30, 2015 at 11:14 AM, Christopher Covington >> <christopher.covington@linaro.org> wrote: >>> Present a system with an instructions per cycle of exactly one. >>> This makes it less likely a user will mistake the cycle counter >>> values as meaningful and makes calculations involving cycles >>> trivial while preserving the necessary property of the cycle >>> counter register as monotonically increasing. >>> >>> Signed-off-by: Christopher Covington <christopher.covington@linaro.org> >>> --- >>> target-arm/helper.c | 9 +++------ >>> 1 file changed, 3 insertions(+), 6 deletions(-) >>> >>> diff --git a/target-arm/helper.c b/target-arm/helper.c >>> index 3e6fb0b..a027a19 100644 >>> --- a/target-arm/helper.c >>> +++ b/target-arm/helper.c >>> @@ -648,8 +648,7 @@ void pmccntr_sync(CPUARMState *env) >>> { >>> uint64_t temp_ticks; >>> >>> - temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), >>> - get_ticks_per_sec(), 1000000); >>> + temp_ticks = cpu_get_icount_raw(); >> >> So I have guests (for better or worse) that make assumptions about the >> rate of the PMCCNTR WRT real time. But isn't the PMCCNTR really a >> clock cycle counter rather than an instruction counter? That clock >> rate is well defined even if it is just the trivial >> get_ticks_per_sec() at the moment. Ideally we should have a >> configurable clock rate in there instead of get_ticks_per_sec(). This >> is a major change in definition. > > Is this with KVM, user emulation, or system emulation mode? I'm mainly > looking at things from the TCG (and mostly system mode) perspective. The same. My guest has only ever run on TCG. > If not using KVM, would the assumptions of your guests hold if you ran > them on a hardware device instead of QEMU TCG? I run them as a matter of course on hardware devices. What I am most worried about (and I need to run some tests to really confirm facts) is what happens if a CPU WFIs. Should the PMCCNTR continue on or hold its value? If we match instruction execution to PMCCNTR to the PMCCNTR will freeze. Regards, Peter > I'm still trying to > understand all the code involved, but I don't see get_ticks_per_sec() > or any other constants as problematic. > >> I can see your use case though, where you actually want this to mean >> something WRT program performance. Should we add a switch between the >> two behaviours? > > This switch already exists, in the form of -icount, and I really > should have already been using it. What initially scared me away was > qemu_icount_bias and icount_adjust(). Accounting for halting seems > like a good thing but the accounting for "speed" by referencing a host > clock doesn't make sense to me. If I have an ARM development board > hooked up to an x86 desktop, it's not querying the desktop for time. > So why does it make sense for QEMU system emulation mode behave like > that? The -icount help text makes it sound like adjustment for speed > reasons should only take place when "auto" is specified, but I have > yet to understand the code well enough to verify that. > > Chris >
On 6 May 2015 at 15:05, Peter Crosthwaite <peter.crosthwaite@xilinx.com> wrote: > What I am most worried about (and I need to run some tests to really > confirm facts) is what happens if a CPU WFIs. Should the PMCCNTR > continue on or hold its value? If we match instruction execution to > PMCCNTR to the PMCCNTR will freeze. See the v8 ARM ARM D5.1.1: this doesn't count PE clock cycles, it's linked to the hardware processor clock. The exact relationship is IMPDEF so we have some leeway for doing whatever seems reasonable here. Permitted things: * counter can stop on WFI (see D5.1.3) * counter can continue to run on WFI (it's impdef whether the PE clock is gated when in WFI, though I would expect that to be a popular implementation) * counter can read the same value if read twice in a row * counter can run forward a lot even if no insns executed (the example given is of a hyperthreading implementation) So we should do whatever seems most convenient implementation-wise I think... -- PMM
diff --git a/target-arm/helper.c b/target-arm/helper.c index 3e6fb0b..a027a19 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -648,8 +648,7 @@ void pmccntr_sync(CPUARMState *env) { uint64_t temp_ticks; - temp_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), - get_ticks_per_sec(), 1000000); + temp_ticks = cpu_get_icount_raw(); if (env->cp15.c9_pmcr & PMCRD) { /* Increment once every 64 processor clock cycles */ @@ -687,8 +686,7 @@ static uint64_t pmccntr_read(CPUARMState *env, const ARMCPRegInfo *ri) return env->cp15.c15_ccnt; } - total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), - get_ticks_per_sec(), 1000000); + total_ticks = cpu_get_icount_raw(); if (env->cp15.c9_pmcr & PMCRD) { /* Increment once every 64 processor clock cycles */ @@ -708,8 +706,7 @@ static void pmccntr_write(CPUARMState *env, const ARMCPRegInfo *ri, return; } - total_ticks = muldiv64(qemu_clock_get_us(QEMU_CLOCK_VIRTUAL), - get_ticks_per_sec(), 1000000); + total_ticks = cpu_get_icount_raw(); if (env->cp15.c9_pmcr & PMCRD) { /* Increment once every 64 processor clock cycles */
Present a system with an instructions per cycle of exactly one. This makes it less likely a user will mistake the cycle counter values as meaningful and makes calculations involving cycles trivial while preserving the necessary property of the cycle counter register as monotonically increasing. Signed-off-by: Christopher Covington <christopher.covington@linaro.org> --- target-arm/helper.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)