[RFC,10/11] um: Delay timer_read only in possible busy loops in TT-mode

Message ID: 20231103-bb-timetravel-patches-v1-10-e2c68efcf664@uni-rostock.de
State: Changes Requested
Series: Several Time Travel Mode Enhancements

Commit Message

Benjamin Beichler Nov. 3, 2023, 4:41 p.m. UTC
Some userspace programs use the current timestamp as a looping condition
(directly or indirectly), which can lead to infinite loops in TT-mode
since the timestamp doesn't change until the next time step.
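
For illustration only (this is not code from the patch or series), such a
loop in userspace can be as simple as:

	/* Illustrative snippet: without a time step on clock reads, this
	 * loop never terminates in TT-mode, because the returned timestamp
	 * does not change within one time step. */
	#include <time.h>

	static void spin_for_a_millisecond(void)
	{
		struct timespec start, now;

		clock_gettime(CLOCK_MONOTONIC, &start);
		do {
			clock_gettime(CLOCK_MONOTONIC, &now);
		} while ((now.tv_sec - start.tv_sec) * 1000000000LL +
			 (now.tv_nsec - start.tv_nsec) < 1000000);
	}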

Commit 065038706f77 ("um: Support time travel mode") introduced a
workaround by always inserting a time step when the current timestamp is
read. However, this introduces delays, even in cases where such a loop
doesn't exist, for example, in filesystem code that updates access
timestamps.

This slows down external TT-mode as more simulation roundtrips are
required, and it unnecessarily affects the determinism and accuracy of
the simulation.

This patch attempts to detect potential busy loops by checking whether
the currently executed syscall is one that returns a timestamp. Only
after 10 consecutive such calls at the same timestamp is timer_read
delayed.

Since the patch does not consider the PID of the caller, it might be
fooled by legitimate processes that take a single timestamp. However,
the result is no worse than the original behavior.

Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
 arch/um/kernel/time.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

Comments

Johannes Berg Nov. 6, 2023, 8:51 p.m. UTC | #1
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> This slows down external TT-mode as more simulation roundtrips are
> required, and it unnecessarily affects the determinism and accuracy of
> the simulation.

I still don't think this is really true, it doesn't really affect
determinism? It makes it ... different, sure, but not non-deterministic?



> +static const int suspicious_busy_loop_syscalls[] = {
> +	36, //sys_getitimer
> +	96, //sys_gettimeofday
> +	201, //sys_time
> +	224, //sys_timer_gettime
> +	228, //sys_clock_gettime
> +	287, //sys_timerfd_gettime
> +};

That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
least?

> +static bool suspicious_busy_loop(void)
> +{
> +	int i;
> +	int syscall = syscall_get_nr(current, task_pt_regs(current));
> +
> +	for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
> +		if (suspicious_busy_loop_syscalls[i] == syscall)
> +			return true;
> +	}

Might also be faster to have a bitmap? But ... also kind of awkward I
guess.

I dunno. I'm not even sure what you're trying to achieve - apart from
"determinism" which seems odd or even wrong, and speed, which is
probably easier done with a better free-until and the shared memory
calendar we have been working on.

johannes
Benjamin Beichler Nov. 10, 2023, 3:54 p.m. UTC | #2
On 06.11.2023 at 21:51, Johannes Berg wrote:
> On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
>> This slows down external TT-mode as more simulation roundtrips are
>> required, and it unnecessarily affects the determinism and accuracy of
>> the simulation.
> I still don't think this is really true, it doesn't really affect
> determinism? It makes it ... different, sure, but not non-deterministic?
I intentionally kept it vague, but what I meant is that it's 
unnecessarily challenging to determine.

Perhaps I should mention that I'm running an unmodified Ubuntu rootfs 
with systemd, which starts many daemons and other processes.

To me, it seems illogical to delay everything just because one process 
is waiting for a timestamp.

At the moment, we haven't patched the random device that fetches random 
bytes from the host (do you already have a patch for this?), so complete 
repeatability isn't guaranteed. However, that could be a logical next step.
>> +static const int suspicious_busy_loop_syscalls[] = {
>> +     36, //sys_getitimer
>> +     96, //sys_gettimeofday
>> +     201, //sys_time
>> +     224, //sys_timer_gettime
>> +     228, //sys_clock_gettime
>> +     287, //sys_timerfd_gettime
>> +};
> That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
> least?
Actually, this was a quick attempt to address the issue, and at the time 
I couldn't locate the appropriate macros.

These numbers are generated from arch/x86/entry/syscalls/syscall_64.tbl 
(or syscall_32.tbl, if configured that way).

I might be overlooking something, but it seems that __NR_timer_gettime 
isn't defined in the kernel. If you have a better reference for this 
translation, I'd appreciate it.

I could verify whether the current syscall translates into the 
corresponding function symbol in the UML syscall table.
>> +static bool suspicious_busy_loop(void)
>> +{
>> +     int i;
>> +     int syscall = syscall_get_nr(current, task_pt_regs(current));
>> +
>> +     for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
>> +             if (suspicious_busy_loop_syscalls[i] == syscall)
>> +                     return true;
>> +     }
> Might also be faster to have a bitmap? But ... also kind of awkward I
> guess.
Actually, a short fixed-size array should be optimized quite well with 
loop unrolling or similar, shouldn't it? I could also use a switch over 
all the calls, but the loop seems easiest to me.
> I dunno. I'm not even sure what you're trying to achieve - apart from
> "determinism" which seems odd or even wrong, and speed, which is
> probably easier done with a better free-until and the shared memory
> calendar we have been working on.
From my perspective, delaying timer_read only serves as a tie-breaker for 
poorly behaving software that resorts to a busy loop while waiting for 
time to advance.

While this behavior might not be uncommon, why penalize all processes 
for it?

Consider an experiment where I aim to measure the impact of network 
latency on software. Sometimes, the response latency fluctuates
because a background task was scheduled randomly but deterministically 
at the same time, obtaining a timestamp that blocks all other
processes and advances simulation time. This action simply undermines 
the utility of the time travel mode unnecessarily.

However, software actively waiting for time advancement in a busy loop 
achieves its goal. It’s almost a win-win situation, isn't it?

Benjamin
Benjamin Berg Nov. 10, 2023, 4:39 p.m. UTC | #3
On Fri, 2023-11-10 at 16:54 +0100, Benjamin Beichler wrote:
> At the moment, we haven't patched the random device that fetches random 
> bytes from the host (do you already have a patch for this?), so complete 
> repeatability isn't guaranteed. However, that could be a logical next step.

Right, we have the attached kernel patches internally. They simply
disable some of the random sources and replace os_getrandom with
returning static random bytes from the UML_RANDOM environment variable.
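
Roughly, the idea is along these lines (a sketch only, not the actual
internal patches; it assumes os_getrandom() keeps a buf/len/flags
signature on the os-Linux side, and the fallback seed string here is
purely hypothetical):

	/* Sketch of the idea: deterministic "random" bytes taken from the
	 * UML_RANDOM environment variable instead of the host's getrandom().
	 * The os-Linux side of UML is built as userspace code, so libc is
	 * available here. */
	#include <stdlib.h>
	#include <string.h>

	int os_getrandom(void *buf, unsigned int len, unsigned int flags)
	{
		static const char *seed;
		static size_t seed_len, pos;
		unsigned int i;

		if (!seed) {
			seed = getenv("UML_RANDOM");
			if (!seed)
				seed = "uml-fixed-seed"; /* hypothetical fallback */
			seed_len = strlen(seed);
		}

		/* repeat the seed bytes to fill the request */
		for (i = 0; i < len; i++)
			((char *)buf)[i] = seed[(pos + i) % seed_len];
		pos = (pos + len) % seed_len;

		return len;
	}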

I doubt that it makes sense to upstream these patches, but maybe we can
include them as patch files in USFSTL or so.

The second piece is using a mount namespace to ensure that the linux
command line is identical between runs and that the location of all
files that are accessed directly from the host through hostfs never
changes.

The last piece was setting GLIBC_TUNABLES=-AVX512CD in the environment
just in case the CPU feature set is slightly different. That would
cause ld.so to search for a different set of optimized library versions
(affecting syscalls and, with that, randomness).

Benjamin
Johannes Berg Nov. 10, 2023, 5:29 p.m. UTC | #4
On Fri, 2023-11-10 at 16:54 +0100, Benjamin Beichler wrote:
> Am 06.11.2023 um 21:51 schrieb Johannes Berg:
> > On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> > > This slows down external TT-mode as more simulation roundtrips are
> > > required, and it unnecessarily affects the determinism and accuracy of
> > > the simulation.
> > I still don't think this is really true, it doesn't really affect
> > determinism? It makes it ... different, sure, but not non-deterministic?
> I intentionally kept it vague, but what I meant is that it's 
> unnecessarily challenging to determine.

Yeah, ok, fair enough.

> Perhaps I should mention that I'm running an unmodified Ubuntu rootfs 
> with systemd, which starts many daemons and other processes.

That sounds like a bit of a nightmare, to be honest. Wouldn't you want
to keep things under tighter control? But I guess it really depends on
what you're trying to achieve.

> To me, it seems illogical to delay everything just because one process 
> is waiting for a timestamp.

Yeah I guess we'll just have to disagree ;-) You're running some
process, so you've kind of decided to "give it time" of sorts, and in a
normal system reading time will always take time, just like everything
else :)

But anyway, I'm not really opposed to this patch, it's just ... not
great, I guess? And like I said, makes more sense to squash 9 and 10?

> At the moment, we haven't patched the random device that fetches random 
> bytes from the host (do you already have a patch for this?), so complete 
> repeatability isn't guaranteed. However, that could be a logical next step.
> > > +static const int suspicious_busy_loop_syscalls[] = {
> > > +     36, //sys_getitimer
> > > +     96, //sys_gettimeofday
> > > +     201, //sys_time
> > > +     224, //sys_timer_gettime
> > > +     228, //sys_clock_gettime
> > > +     287, //sys_timerfd_gettime
> > > +};
> > That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
> > least?
> Actually, this was a quick attempt to address the issue, and during that 
> period, I couldn't locate the appropriate macros.
> 
> These numbers are generated from arch/x86/entry/syscalls/syscall_64.tbl 
> (or 32 if configured in that manner).
> 
> I might be overlooking something, but it seems that __NR_timer_gettime 
> isn't defined in the kernel. If you have a better reference for this 
> translation, I'd appreciate it.

Look at the arch/x86/include/generated/uapi/asm/unistd*.h files after
you build the tree. How do they actually get generated? Beats me.
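
With those headers in place, the table could be written along these
lines (a sketch, assuming the generated x86_64 __NR_* constants match
the numbers used above):

	/* Sketch only: the same table as in the patch, but using the
	 * generated __NR_* constants instead of magic numbers. */
	#include <asm/unistd.h>

	static const int suspicious_busy_loop_syscalls[] = {
		__NR_getitimer,
		__NR_gettimeofday,
		__NR_time,
		__NR_timer_gettime,
		__NR_clock_gettime,
		__NR_timerfd_gettime,
	};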

> > > +static bool suspicious_busy_loop(void)
> > > +{
> > > +     int i;
> > > +     int syscall = syscall_get_nr(current, task_pt_regs(current));
> > > +
> > > +     for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
> > > +             if (suspicious_busy_loop_syscalls[i] == syscall)
> > > +                     return true;
> > > +     }
> > Might also be faster to have a bitmap? But ... also kind of awkward I
> > guess.
> Actually, a short fixed-size array should be optimized quite well with 
> loop unrolling or similar, shouldn't it? I could also use a switch over 
> all the calls, but the loop seems easiest to me.

Yeah, maybe. I haven't checked the output.
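
For reference, a bitmap variant could look roughly like this (a sketch
only; it reuses the array and the syscall_get_nr() lookup from the
patch, and BUSY_LOOP_NR_MAX is an arbitrary bound chosen here):

	/* Sketch of the bitmap variant: one-time init plus an O(1) lookup.
	 * Reuses suspicious_busy_loop_syscalls[] from the patch and
	 * syscall_get_nr() from <asm/syscall-generic.h>. */
	#include <linux/bitmap.h>
	#include <linux/bitops.h>
	#include <linux/init.h>
	#include <linux/kernel.h>

	#define BUSY_LOOP_NR_MAX 512 /* generous bound for the syscall numbers */

	static DECLARE_BITMAP(busy_loop_syscall_map, BUSY_LOOP_NR_MAX);

	static int __init busy_loop_syscall_map_init(void)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++)
			__set_bit(suspicious_busy_loop_syscalls[i],
				  busy_loop_syscall_map);
		return 0;
	}
	arch_initcall(busy_loop_syscall_map_init);

	static bool suspicious_busy_loop(void)
	{
		int nr = syscall_get_nr(current, task_pt_regs(current));

		return nr >= 0 && nr < BUSY_LOOP_NR_MAX &&
		       test_bit(nr, busy_loop_syscall_map);
	}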

> > I dunno. I'm not even sure what you're trying to achieve - apart from
> > "determinism" which seems odd or even wrong, and speed, which is
> > probably easier done with a better free-until and the shared memory
> > calendar we have been working on.
> From my perspective, delaying timer_read only serves as a tie-breaker for 
> poorly behaving software that resorts to a busy loop while waiting for 
> time to advance.

Yeah I guess that's true.

> While this behavior might not be uncommon, why penalize all processes 
> for it?

Well I think we have a different sense of "penalize", I wouldn't say
that. I mean, you can't reasonably expect getting a timestamp doesn't
take any time at all, that's just not how physical reality works? Now
we're bending the rules here in that a lot of things that normally take
time suddenly don't, but I guess I don't fully understand why you're so
keen on bending the rules _all the way_.

But I think that's this:

> Consider an experiment where I aim to measure the impact of network 
> latency on software. Sometimes, the response latency fluctuates
> because a background task was scheduled randomly but deterministically 

"randomly but deterministically" is kind of fun 😂️

> at the same time, obtaining a timestamp that blocks all other
> processes and advances simulation time. This action simply undermines 
> the utility of the time travel mode unnecessarily.
> 
> However, software actively waiting for time advancement in a busy loop 
> achieves its goal. It’s almost a win-win situation, isn't it?

Fair enough.

johannes

Patch

diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index f76237cfc1ea..1782f1ed140e 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -22,6 +22,7 @@ 
 #include <linux/time-internal.h>
 #include <linux/um_timetravel.h>
 #include <shared/init.h>
+#include <asm/syscall-generic.h>
 
 #ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
 enum time_travel_mode time_travel_mode;
@@ -721,6 +722,27 @@  static irqreturn_t um_timer(int irq, void *dev)
 	return IRQ_HANDLED;
 }
 
+static const int suspicious_busy_loop_syscalls[] = {
+	36, //sys_getitimer
+	96, //sys_gettimeofday
+	201, //sys_time
+	224, //sys_timer_gettime
+	228, //sys_clock_gettime
+	287, //sys_timerfd_gettime
+};
+
+static bool suspicious_busy_loop(void)
+{
+	int i;
+	int syscall = syscall_get_nr(current, task_pt_regs(current));
+
+	for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
+		if (suspicious_busy_loop_syscalls[i] == syscall)
+			return true;
+	}
+	return false;
+};
+
 static u64 timer_read(struct clocksource *cs)
 {
 	static unsigned long long last_timer_read;
@@ -742,11 +764,12 @@  static u64 timer_read(struct clocksource *cs)
 		if (last_timer_read != time_travel_time) {
 			last_timer_read = time_travel_time;
 			consecutive_calls_at_same_time = 0;
-		} else {
+		} else if (suspicious_busy_loop()) {
 			consecutive_calls_at_same_time++;
 		}
 		if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
-		    !time_travel_ext_waiting && consecutive_calls_at_same_time > 10)
+		    !time_travel_ext_waiting && consecutive_calls_at_same_time > 10 &&
+		    suspicious_busy_loop())
 			time_travel_update_time(time_travel_time +
 						TIMER_MULTIPLIER,
 						false);