Message ID | 20190716072038.8408-2-felipe.balbi@linux.intel.com |
---|---|
State | RFC |
Delegated to: | David Miller |
Headers | show |
Series | PTP: add support for Intel's TGPIO controller | expand |
Felipe, On Tue, 16 Jul 2019, Felipe Balbi wrote: -ENOCHANGELOG As you said in the cover letter: > (3) The change in arch/x86/kernel/tsc.c needs to be reviewed at length > before going in. So some information what those interfaces are used for and why they are needed would be really helpful. > +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns) > +{ > + u64 tmp, res, rem; > + u64 cycles; > + > + tsc_counterval->cycles = clocksource_tsc.read(NULL); > + cycles = tsc_counterval->cycles; > + tsc_counterval->cs = art_related_clocksource; > + > + rem = do_div(cycles, tsc_khz); > + > + res = cycles * USEC_PER_SEC; > + tmp = rem * USEC_PER_SEC; > + > + do_div(tmp, tsc_khz); > + res += tmp; > + > + *tsc_ns = res; > +} > +EXPORT_SYMBOL(get_tsc_ns); > + > +u64 get_art_ns_now(void) > +{ > + struct system_counterval_t tsc_cycles; > + u64 tsc_ns; > + > + get_tsc_ns(&tsc_cycles, &tsc_ns); > + > + return tsc_ns; > +} > +EXPORT_SYMBOL(get_art_ns_now); While the changes look innocuous I'm missing the big picture why this needs to emulate ART instead of simply using TSC directly. Thanks, tglx
Hi, Thomas Gleixner <tglx@linutronix.de> writes: > Felipe, > > On Tue, 16 Jul 2019, Felipe Balbi wrote: > > -ENOCHANGELOG > > As you said in the cover letter: > >> (3) The change in arch/x86/kernel/tsc.c needs to be reviewed at length >> before going in. > > So some information what those interfaces are used for and why they are > needed would be really helpful. Okay, I have some more details about this. The TGPIO device itself uses ART since TSC is not directly available to anything other than the CPU. The 'problem' here is that reading ART incurs extra latency which we would like to avoid. Therefore, we use TSC and scale it to nanoseconds which, would be the same as ART to ns. >> +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns) >> +{ >> + u64 tmp, res, rem; >> + u64 cycles; >> + >> + tsc_counterval->cycles = clocksource_tsc.read(NULL); >> + cycles = tsc_counterval->cycles; >> + tsc_counterval->cs = art_related_clocksource; >> + >> + rem = do_div(cycles, tsc_khz); >> + >> + res = cycles * USEC_PER_SEC; >> + tmp = rem * USEC_PER_SEC; >> + >> + do_div(tmp, tsc_khz); >> + res += tmp; >> + >> + *tsc_ns = res; >> +} >> +EXPORT_SYMBOL(get_tsc_ns); >> + >> +u64 get_art_ns_now(void) >> +{ >> + struct system_counterval_t tsc_cycles; >> + u64 tsc_ns; >> + >> + get_tsc_ns(&tsc_cycles, &tsc_ns); >> + >> + return tsc_ns; >> +} >> +EXPORT_SYMBOL(get_art_ns_now); > > While the changes look innocuous I'm missing the big picture why this needs > to emulate ART instead of simply using TSC directly. i don't think we're emulating ART here (other than the name in the function). We're just reading TSC and converting to nanoseconds, right? Cheers
Felipe, On Thu, 15 Aug 2019, Felipe Balbi wrote: > Thomas Gleixner <tglx@linutronix.de> writes: > > On Tue, 16 Jul 2019, Felipe Balbi wrote: > > > > So some information what those interfaces are used for and why they are > > needed would be really helpful. > > Okay, I have some more details about this. The TGPIO device itself uses > ART since TSC is not directly available to anything other than the > CPU. The 'problem' here is that reading ART incurs extra latency which > we would like to avoid. Therefore, we use TSC and scale it to > nanoseconds which, would be the same as ART to ns. Fine. But that's not really correct: TSC = art_to_tsc_offset + ART * scale; > >> +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns) Why is this not returning the result instead of having that pointer indirection? > >> +{ > >> + u64 tmp, res, rem; > >> + u64 cycles; > >> + > >> + tsc_counterval->cycles = clocksource_tsc.read(NULL); > >> + cycles = tsc_counterval->cycles; > >> + tsc_counterval->cs = art_related_clocksource; So this does more than returning the TSC time converted to nanoseconds. The function name should reflect this. Plus both functions want kernel-doc explaining what they do. > >> + rem = do_div(cycles, tsc_khz); > >> + > >> + res = cycles * USEC_PER_SEC; > >> + tmp = rem * USEC_PER_SEC; > >> + > >> + do_div(tmp, tsc_khz); > >> + res += tmp; > >> + > >> + *tsc_ns = res; > >> +} > >> +EXPORT_SYMBOL(get_tsc_ns); > >> + > >> +u64 get_art_ns_now(void) > >> +{ > >> + struct system_counterval_t tsc_cycles; > >> + u64 tsc_ns; > >> + > >> + get_tsc_ns(&tsc_cycles, &tsc_ns); > >> + > >> + return tsc_ns; > >> +} > >> +EXPORT_SYMBOL(get_art_ns_now); > > > > While the changes look innocuous I'm missing the big picture why this needs > > to emulate ART instead of simply using TSC directly. > > i don't think we're emulating ART here (other than the name in the > function). We're just reading TSC and converting to nanoseconds, right? Well, the function name says clearly: get_art_ns_now(). But you are not using ART, you use the TSC and derive the ART value from it (incorrectly). Thanks, tglx
Hi, (sorry for the long delay, got caught up in other tasks) Thomas Gleixner <tglx@linutronix.de> writes: > On Thu, 15 Aug 2019, Felipe Balbi wrote: >> Thomas Gleixner <tglx@linutronix.de> writes: >> > On Tue, 16 Jul 2019, Felipe Balbi wrote: >> > >> > So some information what those interfaces are used for and why they are >> > needed would be really helpful. >> >> Okay, I have some more details about this. The TGPIO device itself uses >> ART since TSC is not directly available to anything other than the >> CPU. The 'problem' here is that reading ART incurs extra latency which >> we would like to avoid. Therefore, we use TSC and scale it to >> nanoseconds which, would be the same as ART to ns. > > Fine. But that's not really correct: > > TSC = art_to_tsc_offset + ART * scale; From silicon folks I got the equation: ART = ECX * EBX / EAX; If I'm reading this correctly, that's basically what native_calibrate_tsc() does (together with some error checking the safe defaults). Couldn't we, instead, just have a single function like below? u64 convert_tsc_to_art_ns() { return x86_platform.calibrate_tsc(); } Another way would be extract the important parts from native_calibrate_tsc() into a separate helper. This would safe another call to cpuid(0x15,...); >> >> +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns) > > Why is this not returning the result instead of having that pointer > indirection? That can be changed easily, no worries. >> >> +{ >> >> + u64 tmp, res, rem; >> >> + u64 cycles; >> >> + >> >> + tsc_counterval->cycles = clocksource_tsc.read(NULL); >> >> + cycles = tsc_counterval->cycles; >> >> + tsc_counterval->cs = art_related_clocksource; > > So this does more than returning the TSC time converted to nanoseconds. The > function name should reflect this. Plus both functions want kernel-doc > explaining what they do. convert_tsc_to_art_ns()? That would be analogous to convert_art_to_tsc() and convert_art_ns_to_tsc(). cheers
Hi, On Tue, 1 Oct 2019, Felipe Balbi wrote: > (sorry for the long delay, got caught up in other tasks) Delayed by vacation :) > Thomas Gleixner <tglx@linutronix.de> writes: > > On Thu, 15 Aug 2019, Felipe Balbi wrote: > >> Thomas Gleixner <tglx@linutronix.de> writes: > >> > On Tue, 16 Jul 2019, Felipe Balbi wrote: > >> > > >> > So some information what those interfaces are used for and why they are > >> > needed would be really helpful. > >> > >> Okay, I have some more details about this. The TGPIO device itself uses > >> ART since TSC is not directly available to anything other than the > >> CPU. The 'problem' here is that reading ART incurs extra latency which > >> we would like to avoid. Therefore, we use TSC and scale it to > >> nanoseconds which, would be the same as ART to ns. > > > > Fine. But that's not really correct: > > > > TSC = art_to_tsc_offset + ART * scale; > > From silicon folks I got the equation: > > ART = ECX * EBX / EAX; What is the content of ECX/EBX/EAX and where is it coming from? > If I'm reading this correctly, that's basically what > native_calibrate_tsc() does (together with some error checking the safe > defaults). Couldn't we, instead, just have a single function like below? > > u64 convert_tsc_to_art_ns() > { > return x86_platform.calibrate_tsc(); > } Huch? How is that supposed to work? calibrate_tsc() returns the TSC frequency. > Another way would be extract the important parts from > native_calibrate_tsc() into a separate helper. This would safe another > call to cpuid(0x15,...); What for? The relation between TSC and ART is already established via detect_art() which reads all relevant data out of CPUID(ART_CPUID_LEAF). We use exactly that information for convert_art_to_tsc() so the obvious solution for calculating ART from TSC is to do the reverse operation. convert_art_to_tsc() { rem = do_div(art, art_to_tsc_denominator); res = art * art_to_tsc_numerator; tmp = rem * art_to_tsc_numerator; do_div(tmp, art_to_tsc_denominator); res += tmp + art_to_tsc_offset; } which is translated into math: TSC = ART * SCALE + OFFSET where SCALE = N / D and N = CPUID(ART_CPUID_LEAF).EAX D = CPUID(ART_CPUID_LEAF).EBX So the obvious reverse operation is: ART = (TSC - OFFSET) / SCALE; Translating that into code should not be rocket science. Thanks, tglx
Hi, Thomas Gleixner <tglx@linutronix.de> writes: >> Thomas Gleixner <tglx@linutronix.de> writes: >> > On Thu, 15 Aug 2019, Felipe Balbi wrote: >> >> Thomas Gleixner <tglx@linutronix.de> writes: >> >> > On Tue, 16 Jul 2019, Felipe Balbi wrote: >> >> > >> >> > So some information what those interfaces are used for and why they are >> >> > needed would be really helpful. >> >> >> >> Okay, I have some more details about this. The TGPIO device itself uses >> >> ART since TSC is not directly available to anything other than the >> >> CPU. The 'problem' here is that reading ART incurs extra latency which >> >> we would like to avoid. Therefore, we use TSC and scale it to >> >> nanoseconds which, would be the same as ART to ns. >> > >> > Fine. But that's not really correct: >> > >> > TSC = art_to_tsc_offset + ART * scale; >> >> From silicon folks I got the equation: >> >> ART = ECX * EBX / EAX; > > What is the content of ECX/EBX/EAX and where is it coming from? Since last email, I got a bit of extra information about how all of these should work. ECX contains crystal rate of TSC, EBX and EAX contain constants for converting between TSC and ART. So, ART = tsc_cycles * EBX/EAX, this will give me ART cycles. Also, the time gpio IP needs ART cycles to be written to its comparator registers, but the PTP framework wants time in nanoseconds. Therefore we need two new conversion functions: convert_tsc_to_art_cycles() and something which gives us current TSC in nanoseconds. >> If I'm reading this correctly, that's basically what >> native_calibrate_tsc() does (together with some error checking the safe >> defaults). Couldn't we, instead, just have a single function like below? >> >> u64 convert_tsc_to_art_ns() >> { >> return x86_platform.calibrate_tsc(); >> } > > Huch? How is that supposed to work? calibrate_tsc() returns the TSC > frequency. Yup, that was a total brain fart. >> Another way would be extract the important parts from >> native_calibrate_tsc() into a separate helper. This would safe another >> call to cpuid(0x15,...); > > What for? > > The relation between TSC and ART is already established via detect_art() > which reads all relevant data out of CPUID(ART_CPUID_LEAF). > > We use exactly that information for convert_art_to_tsc() so the obvious > solution for calculating ART from TSC is to do the reverse operation. > > convert_art_to_tsc() > { > rem = do_div(art, art_to_tsc_denominator); > > res = art * art_to_tsc_numerator; > tmp = rem * art_to_tsc_numerator; > > do_div(tmp, art_to_tsc_denominator); > res += tmp + art_to_tsc_offset; > } > > which is translated into math: > > TSC = ART * SCALE + OFFSET > > where > > SCALE = N / D > > and > > N = CPUID(ART_CPUID_LEAF).EAX > D = CPUID(ART_CPUID_LEAF).EBX > > So the obvious reverse operation is: > > ART = (TSC - OFFSET) / SCALE; > > Translating that into code should not be rocket science. Right, that's where we got after talking to other folks. Do you have any preferences for the function names? Or does convert_tsc_to_art() sound ok?
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h index 8a0c25c6bf09..b7a9f4385a82 100644 --- a/arch/x86/include/asm/tsc.h +++ b/arch/x86/include/asm/tsc.h @@ -32,6 +32,8 @@ static inline cycles_t get_cycles(void) extern struct system_counterval_t convert_art_to_tsc(u64 art); extern struct system_counterval_t convert_art_ns_to_tsc(u64 art_ns); +extern void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns); +extern u64 get_art_ns_now(void); extern void tsc_early_init(void); extern void tsc_init(void); diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 0b29e58f288e..333fffc1db7c 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1215,6 +1215,38 @@ struct system_counterval_t convert_art_to_tsc(u64 art) } EXPORT_SYMBOL(convert_art_to_tsc); +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns) +{ + u64 tmp, res, rem; + u64 cycles; + + tsc_counterval->cycles = clocksource_tsc.read(NULL); + cycles = tsc_counterval->cycles; + tsc_counterval->cs = art_related_clocksource; + + rem = do_div(cycles, tsc_khz); + + res = cycles * USEC_PER_SEC; + tmp = rem * USEC_PER_SEC; + + do_div(tmp, tsc_khz); + res += tmp; + + *tsc_ns = res; +} +EXPORT_SYMBOL(get_tsc_ns); + +u64 get_art_ns_now(void) +{ + struct system_counterval_t tsc_cycles; + u64 tsc_ns; + + get_tsc_ns(&tsc_cycles, &tsc_ns); + + return tsc_ns; +} +EXPORT_SYMBOL(get_art_ns_now); + /** * convert_art_ns_to_tsc() - Convert ART in nanoseconds to TSC. * @art_ns: ART (Always Running Timer) in unit of nanoseconds
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> --- arch/x86/include/asm/tsc.h | 2 ++ arch/x86/kernel/tsc.c | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+)