[15/15] sched/cputime: Handle dyntick-idle steal time correctly

Message ID 20260116145208.87445-16-frederic@kernel.org
State Handled Elsewhere
Series tick/sched: Refactor idle cputime accounting

Checks

Context | Check | Description
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu | fail | boot (corenet32_smp_defconfig, e500mc, qemu-system-ppc openbios-ppc, ppc-rootfs.cpio.gz, fedora-42) failed at step Run qemu-e500mc with fedora-42 build kernel.
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 5 jobs.

Commit Message

Frederic Weisbecker Jan. 16, 2026, 2:52 p.m. UTC
The dyntick-idle steal time is currently accounted when the tick
restarts, but the stolen idle time is not subtracted from the idle time
that was already accounted. This is to avoid observing the idle time
going backward, as the dyntick-idle cputime accessors can't reliably know
the stolen idle time in advance.

In order to maintain a forward-progressing idle cputime while
subtracting idle steal time from it, keep track of the previously
accounted idle steal time and subtract it from _later_ idle cputime
accounting.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/kernel_stat.h | 1 +
 kernel/sched/cputime.c      | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)
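
Not part of the patch: a minimal user-space sketch of the carry-over
scheme described in the changelog, for illustration only. The struct
idle_acct type and the idle_stop(), idle_read() and min_u64() helpers
are hypothetical stand-ins for the kernel_cpustat fields and for
kcpustat_idle_stop() and kcpustat_field_dyntick(). Locking, the iowait
split and the real steal_account_process_time() call are left out, and
the steal value is assumed to be passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel_cpustat fields touched by the patch. */
struct idle_acct {
	uint64_t idle;       /* idle time accounted so far, in ns */
	uint64_t idle_steal; /* steal time seen while idle, not yet deducted */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Close an idle period of 'delta' ns during which 'steal' ns were stolen.
 * Steal carried over from earlier periods is deducted from this period,
 * and the steal observed now is carried over to later periods.
 */
static void idle_stop(struct idle_acct *a, uint64_t delta, uint64_t steal)
{
	uint64_t steal_delta = min_u64(a->idle_steal, delta);

	/* The deduction is clamped, so 'delta' cannot underflow. */
	assert(steal_delta <= delta);
	delta -= steal_delta;
	a->idle_steal -= steal_delta;

	a->idle += delta;
	a->idle_steal += steal;
}

/*
 * Reader side: idle time as seen 'elapsed' ns into a still-running idle
 * period, with the pending steal deducted from the elapsed part only.
 */
static uint64_t idle_read(const struct idle_acct *a, uint64_t elapsed)
{
	return a->idle + elapsed - min_u64(a->idle_steal, elapsed);
}
```

Since the deduction is clamped to the time elapsed in the current
period, a reader in this sketch never sees less than the idle time
already accounted, even though the steal time is only known after the
fact.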
Patch

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index e59916477075..a5b5a25c3cc1 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -39,6 +39,7 @@  struct kernel_cpustat {
 	bool		idle_elapse;
 	seqcount_t	idle_sleeptime_seq;
 	ktime_t		idle_entrytime;
+	u64		idle_steal;
 #endif
 	u64		cpustat[NR_STATS];
 };
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 3dadfaa92b27..749a6ed4d2fa 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -424,18 +424,25 @@  static void kcpustat_idle_stop(struct kernel_cpustat *kc, ktime_t now)
 {
 	u64 *cpustat = kc->cpustat;
 	ktime_t delta;
+	u64 steal, steal_delta;
 
 	if (!kc->idle_elapse)
 		return;
 
 	delta = ktime_sub(now, kc->idle_entrytime);
+	steal = steal_account_process_time(delta);
 
 	write_seqcount_begin(&kc->idle_sleeptime_seq);
+	steal_delta = min_t(u64, kc->idle_steal, delta);
+	delta -= steal_delta;
+	kc->idle_steal -= steal_delta;
+
 	if (nr_iowait_cpu(smp_processor_id()) > 0)
 		cpustat[CPUTIME_IOWAIT] = ktime_add(cpustat[CPUTIME_IOWAIT], delta);
 	else
 		cpustat[CPUTIME_IDLE] = ktime_add(cpustat[CPUTIME_IDLE], delta);
 
+	kc->idle_steal += steal;
 	kc->idle_entrytime = now;
 	kc->idle_elapse = false;
 	write_seqcount_end(&kc->idle_sleeptime_seq);
@@ -459,7 +466,6 @@  void kcpustat_dyntick_stop(ktime_t now)
 		kc->idle_dyntick = false;
 		irqtime_dyntick_stop();
 		vtime_dyntick_stop();
-		steal_account_process_time(ULONG_MAX);
 	}
 }
 
@@ -507,6 +513,7 @@  static u64 kcpustat_field_dyntick(int cpu, enum cpu_usage_stat idx,
 		if (kc->idle_elapse && compute_delta) {
 			ktime_t delta = ktime_sub(now, kc->idle_entrytime);
 
+			delta -= min_t(u64, kc->idle_steal, (u64)delta);
 			idle = ktime_add(cpustat[idx], delta);
 		} else {
 			idle = cpustat[idx];