diff mbox series

[v6,28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

Message ID 20220506000008.30892-29-ricardo.neri-calderon@linux.intel.com (mailing list archive)
State Handled Elsewhere
Headers show
Series x86: Implement an HPET-based hardlockup detector | expand

Commit Message

Ricardo Neri May 6, 2022, midnight UTC
The HPET hardlockup detector relies on tsc_khz to estimate the value of
that the TSC will have when its HPET channel fires. A refined tsc_khz
helps to estimate better the expected TSC value.

Using the early value of tsc_khz may lead to a large error in the expected
TSC value. Restarting the NMI watchdog detector has the effect of kicking
its HPET channel and make use of the refined tsc_khz.

When the HPET hardlockup is not in use, restarting the NMI watchdog is
a noop.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v5:
 * Introduced this patch

Changes since v4
 * N/A

Changes since v3
 * N/A

Changes since v2:
 * N/A

Changes since v1:
 * N/A
---
 arch/x86/kernel/tsc.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Nicholas Piggin May 10, 2022, 11:16 a.m. UTC | #1
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> The HPET hardlockup detector relies on tsc_khz to estimate the value of
> that the TSC will have when its HPET channel fires. A refined tsc_khz
> helps to estimate better the expected TSC value.
> 
> Using the early value of tsc_khz may lead to a large error in the expected
> TSC value. Restarting the NMI watchdog detector has the effect of kicking
> its HPET channel and make use of the refined tsc_khz.
> 
> When the HPET hardlockup is not in use, restarting the NMI watchdog is
> a noop.
> 
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Stephane Eranian <eranian@google.com>
> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> Cc: iommu@lists.linux-foundation.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: x86@kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v5:
>  * Introduced this patch
> 
> Changes since v4
>  * N/A
> 
> Changes since v3
>  * N/A
> 
> Changes since v2:
>  * N/A
> 
> Changes since v1:
>  * N/A
> ---
>  arch/x86/kernel/tsc.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index cafacb2e58cc..cc1843044d88 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct work_struct *work)
>  	/* Inform the TSC deadline clockevent devices about the recalibration */
>  	lapic_update_tsc_freq();
>  
> +	/*
> +	 * If in use, the HPET hardlockup detector relies on tsc_khz.
> +	 * Reconfigure it to make use of the refined tsc_khz.
> +	 */
> +	lockup_detector_reconfigure();

I don't know if the API is conceptually good.

You change something that the lockup detector is currently using, 
*while* the detector is running asynchronously, and then reconfigure
it. What happens in the window? If this code is only used for small
adjustments maybe it does not really matter but in principle it's
a bad API to export.

lockup_detector_reconfigure as an internal API is okay because it
reconfigures things while the watchdog is stopped [actually that
looks untrue for soft dog which uses watchdog_thresh in
is_softlockup(), but that should be fixed].

You're the arch so you're allowed to stop the watchdog and configure
it, e.g., hardlockup_detector_perf_stop() is called in arch/.

So you want to disable HPET watchdog if it was enabled, then update
wherever you're using tsc_khz, then re-enable.

Thanks,
Nick
Thomas Gleixner May 10, 2022, 11:44 a.m. UTC | #2
On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
>> +	/*
>> +	 * If in use, the HPET hardlockup detector relies on tsc_khz.
>> +	 * Reconfigure it to make use of the refined tsc_khz.
>> +	 */
>> +	lockup_detector_reconfigure();
>
> I don't know if the API is conceptually good.
>
> You change something that the lockup detector is currently using, 
> *while* the detector is running asynchronously, and then reconfigure
> it. What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter but in principle it's
> a bad API to export.
>
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped [actually that
> looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].
>
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.
>
> So you want to disable HPET watchdog if it was enabled, then update
> wherever you're using tsc_khz, then re-enable.

The real question is whether making this refined tsc_khz value
immediately effective matters at all. IMO, it does not because up to
that point the watchdog was happily using the coarse calibrated value
and the whole use TSC to assess whether the HPET fired mechanism is just
a guestimate anyway. So what's the point of trying to guess 'more
correct'.

Thanks,

        tglx
Ricardo Neri May 17, 2022, 10:08 p.m. UTC | #3
On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> > The HPET hardlockup detector relies on tsc_khz to estimate the value of
> > that the TSC will have when its HPET channel fires. A refined tsc_khz
> > helps to estimate better the expected TSC value.
> > 
> > Using the early value of tsc_khz may lead to a large error in the expected
> > TSC value. Restarting the NMI watchdog detector has the effect of kicking
> > its HPET channel and make use of the refined tsc_khz.
> > 
> > When the HPET hardlockup is not in use, restarting the NMI watchdog is
> > a noop.
> > 
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Cc: Stephane Eranian <eranian@google.com>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> > Cc: iommu@lists.linux-foundation.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v5:
> >  * Introduced this patch
> > 
> > Changes since v4
> >  * N/A
> > 
> > Changes since v3
> >  * N/A
> > 
> > Changes since v2:
> >  * N/A
> > 
> > Changes since v1:
> >  * N/A
> > ---
> >  arch/x86/kernel/tsc.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index cafacb2e58cc..cc1843044d88 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> >  	/* Inform the TSC deadline clockevent devices about the recalibration */
> >  	lapic_update_tsc_freq();
> >  
> > +	/*
> > +	 * If in use, the HPET hardlockup detector relies on tsc_khz.
> > +	 * Reconfigure it to make use of the refined tsc_khz.
> > +	 */
> > +	lockup_detector_reconfigure();
> 
> I don't know if the API is conceptually good.
> 
> You change something that the lockup detector is currently using, 
> *while* the detector is running asynchronously, and then reconfigure
> it. 

Yes, this is what happens.

> What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter

Please see my comment

> but in principle it's a bad API to export.
> 
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped

I see.

> [actually that  looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].

Perhaps there should be a watchdog_thresh_user. When the user updates it,
the detector is stopped, watchdog_thresh is updated, and then the detector
is resumed.

> 
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.

I had it like this but it did not look right to me. You are right, however,
I can stop/restart the watchdog when needed.

Thanks and BR,
Ricardo
Ricardo Neri May 17, 2022, 10:53 p.m. UTC | #4
On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> >> +	/*
> >> +	 * If in use, the HPET hardlockup detector relies on tsc_khz.
> >> +	 * Reconfigure it to make use of the refined tsc_khz.
> >> +	 */
> >> +	lockup_detector_reconfigure();
> >
> > I don't know if the API is conceptually good.
> >
> > You change something that the lockup detector is currently using, 
> > *while* the detector is running asynchronously, and then reconfigure
> > it. What happens in the window? If this code is only used for small
> > adjustments maybe it does not really matter but in principle it's
> > a bad API to export.
> >
> > lockup_detector_reconfigure as an internal API is okay because it
> > reconfigures things while the watchdog is stopped [actually that
> > looks untrue for soft dog which uses watchdog_thresh in
> > is_softlockup(), but that should be fixed].
> >
> > You're the arch so you're allowed to stop the watchdog and configure
> > it, e.g., hardlockup_detector_perf_stop() is called in arch/.
> >
> > So you want to disable HPET watchdog if it was enabled, then update
> > wherever you're using tsc_khz, then re-enable.
> 
> The real question is whether making this refined tsc_khz value
> immediately effective matters at all. IMO, it does not because up to
> that point the watchdog was happily using the coarse calibrated value
> and the whole use TSC to assess whether the HPET fired mechanism is just
> a guestimate anyway. So what's the point of trying to guess 'more
> correct'.

In some of my test systems I observed that, the TSC value does not fall
within the expected error window the first time the HPET channel expires.

I inferred that the error computed using the coarser tsc_khz was wrong.
Recalculating the error window with refined tsc_khz would correct it.

However, restarting the timer has the side-effect of kicking the timer and,
therefore pushing the first HPET NMI further in the future.

Perhaps kicking HPET channel, not recomputing the error window, corrected
(masked?) the problem.

I will investigate further and rework or drop this patch as needed.

Thanks and BR,
Ricardo
diff mbox series

Patch

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cafacb2e58cc..cc1843044d88 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1386,6 +1386,12 @@  static void tsc_refine_calibration_work(struct work_struct *work)
 	/* Inform the TSC deadline clockevent devices about the recalibration */
 	lapic_update_tsc_freq();
 
+	/*
+	 * If in use, the HPET hardlockup detector relies on tsc_khz.
+	 * Reconfigure it to make use of the refined tsc_khz.
+	 */
+	lockup_detector_reconfigure();
+
 	/* Update the sched_clock() rate to match the clocksource one */
 	for_each_possible_cpu(cpu)
 		set_cyc2ns_scale(tsc_khz, cpu, tsc_stop);