[RFC,04/11] um: Handle UM_TIMETRAVEL_RUN only in idle loop, signal success in ACK

Message ID 20231103-bb-timetravel-patches-v1-4-e2c68efcf664@uni-rostock.de
State: Changes Requested
Series: Several Time Travel Mode Enhancements

Commit Message

Benjamin Beichler Nov. 3, 2023, 4:41 p.m. UTC
The Timetravel socket can be read from the SIGIO handler, which may
catch a RUN message that was expected to be received in the wait loop.

It can happen that the device simulation only "created" one interrupt,
but an uncertain (i.e., hard to predict by the calendar/device
simulation) number of SIGIOs are emitted, with each SIGIO interrupt
producing a GET message. The calendar might try to send a RUN message
after the first GET message, which is then processed in the SIGIO
handler, waiting for the response to a subsequent GET message. However,
time is advanced by the received RUN message. Typically, this doesn't
pose problems because the interrupt implies that the next RUN message
should be at the current time (as sent by the calendar with the GET
message). But there are corner cases, such as simultaneous other
interrupts, which may desynchronize the current time in the UML instance
from the expected time from the calendar. Since there is no real use
case for advancing time in the SIGIO handler with RUN messages (in
contrast to UPDATE messages), we logically only expect time to advance
in the idle loop in TT-ext mode. Therefore, with this patch, we restrict
time changes to the idle loop.
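As an illustration, here is a minimal userspace sketch of the race
(hypothetical code; the message layout and opcode values follow
include/uapi/linux/um_timetravel.h, while the real logic lives in
arch/um/kernel/time.c). Both paths do blocking reads on the same
time travel socket:

#include <stdint.h>
#include <unistd.h>

struct um_timetravel_msg {
	uint32_t op;	/* UM_TIMETRAVEL_GET, _RUN, _ACK, ... */
	uint32_t seq;
	uint64_t time;
};

#define UM_TIMETRAVEL_GET	4
#define UM_TIMETRAVEL_RUN	6

/* SIGIO path: sends a GET for the interrupt, then blocks for the
 * answer - and may read the RUN that was meant for the idle loop. */
static void sigio_get_time(int fd)
{
	struct um_timetravel_msg msg = { .op = UM_TIMETRAVEL_GET };

	write(fd, &msg, sizeof(msg));
	read(fd, &msg, sizeof(msg));	/* may be RUN, not the GET's ACK */
}

/* Idle-loop path: blocks until a RUN arrives; if the SIGIO path
 * above already consumed it, time never advances here. */
static void idle_wait_for_run(int fd)
{
	struct um_timetravel_msg msg;

	do
		read(fd, &msg, sizeof(msg));
	while (msg.op != UM_TIMETRAVEL_RUN);
}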

Additionally, since both the idle loop and the signal/interrupt handlers
do blocking reads from the TT socket, a deadlock can occur if a RUN
message intended for the idle loop is received in the SIGIO handler. In
this situation, the calendar expects the UML instance to run, but it
actually waits for another message, either in the SIGIO handler (e.g.,
a second interrupt) or in a poll in the idle loop, as the previous
message was handled by the signal handler, which returned execution to
the main loop and ultimately entered the idle loop.

Therefore, this patch also makes the ACK of the RUN message indicate
whether the RUN was handled by the idle loop or by a signal/interrupt
handler.

With the information in the ACK of the RUN message, the calendar knows
whether the RUN was answered in a signal handler and can act
accordingly.
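For illustration, a minimal calendar-side sketch of how the controller
could evaluate the new ACK (hypothetical code; the function name is
made up, the message layout again follows
include/uapi/linux/um_timetravel.h):

#include <stdint.h>
#include <unistd.h>

struct um_timetravel_msg {
	uint32_t op;
	uint32_t seq;
	uint64_t time;
};

#define UM_TIMETRAVEL_ACK	0

/* Returns 1 if the UML instance saw the RUN in its idle/wait loop,
 * 0 if a signal-handler read answered it, -1 on a protocol error. */
static int calendar_check_run_ack(int fd)
{
	struct um_timetravel_msg ack;

	if (read(fd, &ack, sizeof(ack)) != sizeof(ack) ||
	    ack.op != UM_TIMETRAVEL_ACK)
		return -1;
	return ack.time ? 1 : 0;
}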

Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
 arch/um/kernel/time.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Johannes Berg Nov. 6, 2023, 8:53 p.m. UTC | #1
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> The Timetravel socket can be read from the SIGIO handler, which may
> catch a RUN message that was expected to be received in the wait loop.
> 
> It can happen that the device simulation only "created" one interrupt,
> but an uncertain (i.e., hard to predict by the calendar/device
> simulation) number of SIGIOs are emitted, with each SIGIO interrupt
> producing a GET message. The calendar might try to send a RUN message
> after the first GET message, which is then processed in the SIGIO
> handler, waiting for the response to a subsequent GET message. However,
> time is advanced by the received RUN message. Typically, this doesn't
> pose problems because the interrupt implies that the next RUN message
> should be at the current time (as sent by the calendar with the GET
> message). But there are corner cases, such as simultaneous other
> interrupts, which may desynchronize the current time in the UML instance
> from the expected time from the calendar. Since there is no real use
> case for advancing time in the SIGIO handler with RUN messages (in
> contrast to UPDATE messages), we logically only expect time to advance
> in the idle loop in TT-ext mode. Therefore, with this patch, we restrict
> time changes to the idle loop.
> 
> Additionally, since both the idle loop and the signal/interrupt handlers
> do blocking reads from the TT socket, a deadlock can occur if a RUN
> message intended for the idle loop is received in the SIGIO handler. In
> this situation, the calendar expects the UML instance to run, but it
> actually waits for another message, either in the SIGIO handler (e.g.,
> a second interrupt) or in a poll in the idle loop, as the previous
> message was handled by the signal handler, which returned execution to
> the main loop and ultimately entered the idle loop.
> 
> Therefore, this patch also makes the ACK of the RUN message indicate
> whether the RUN was handled by the idle loop or by a signal/interrupt
> handler.
> 
> With the information in the ACK of the RUN message, the calendar knows
> whether the RUN was answered in a signal handler and can act
> accordingly.
> 

This is going to take a few more review cycles, tbh, still processing
it and probably need to sleep on it :)

johannes
Johannes Berg Nov. 10, 2023, 6:41 p.m. UTC | #2
So I guess now you reminded me of this thread ...

> The Timetravel socket can be read from the SIGIO handler, which may
> catch a RUN message that was expected to be received in the wait loop.
> 
> It can happen that the device simulation only "created" one interrupt,
> but an uncertain (i.e., hard to predict by the calendar/device
> simulation) number of SIGIOs are emitted, with each SIGIO interrupt
> producing a GET message. The calendar might try to send a RUN message
> after the first GET message, which is then processed in the SIGIO
> handler, waiting for the response to a subsequent GET message. However,
> time is advanced by the received RUN message. Typically, this doesn't
> pose problems because the interrupt implies that the next RUN message
> should be at the current time (as sent by the calendar with the GET
> message). But there are corner cases, such as simultaneous other
> interrupts, which may desynchronize the current time in the UML instance
> from the expected time from the calendar. Since there is no real use
> case for advancing time in the SIGIO handler with RUN messages (in
> contrast to UPDATE messages), we logically only expect time to advance
> in the idle loop in TT-ext mode. Therefore, with this patch, we restrict
> time changes to the idle loop.

OK, that mostly makes sense, so far. Although IIUC now you lost the time
update, so you have to recover from that again somehow when you actually
do the RUN (which we don't do from the SIGIO handler).

> Additionally, since both the idle loop and the signal/interrupt handlers
> do blocking reads from the TT socket, a deadlock can occur if a RUN
> message intended for the idle loop is received in the SIGIO handler. In
> this situation, the calendar expects the UML instance to run, but it
> actually waits for another message, either in the SIGIO handler (e.g.,
> a second interrupt) or in a poll in the idle loop, as the previous
> message was handled by the signal handler, which returned execution to
> the main loop and ultimately entered the idle loop.

I'm not sure I understand this part. How can you get a RUN message when
someone else is still running and sending you interrupts? Or are you
sending interrupts (with time travel handlers because of patch 3?) from
_outside_ of the simulation?! I guess that'd be a case of "don't do that
then"? :)

Or ... I'm still wrapping my head around the system you built ... are
you saying that interrupts are actually delayed and we have something
like this:

Participant A   --- message ----------------> UML (socket)
Participant A   --- WAIT --> calendar
                                  SIGIO ----> UML
calendar        <-- GET --------------------- UML (time_travel_handler)
calendar        --- RUN --------------------> UML
                                              (reads "RUN" instead
                                               of GET response)

But then that's again just because "you're doing it wrong" - Participant
A should be waiting for an ACK message? At least that was the intent.
You're not using it that way with the line interrupts.

But honestly, I think that's just another reason not to do things the
way you're doing in patches 2 and 3.

> Therefore, this patch also makes the ACK of the RUN message indicate
> whether the RUN was handled by the idle loop or by a signal/interrupt
> handler.
> 
> With the information in the ACK of the RUN message, the calendar knows
> whether the RUN was answered in a signal handler and can act
> accordingly.

TBH, I don't like this at all. If I'm not completely off with my
analysis above (or it's a similar situation), this just puts workaround
requirements on the calendar because you don't synchronise communication
in _other_ parts of the system properly.

In the simulation that we had intended to build, the exact timing of the
SIGIO wouldn't matter because UML would _always_ be in WAIT state when
getting SIGIO, and would always have to fully process the SIGIO (and
send an ACK on the respective socket) before the other side continues
and we can get to any other part of the system.

That's a pretty fundamental rule of the system because it allows us to
reason how information flows, which part can be doing what at which
time, and (mostly) ignore the inherently asynchronous nature of today's
computing in the system, since things are always locally serialised.
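To illustrate the rule, a hypothetical participant-side sketch (the
helper name is made up, but the pattern is the one described above):

#include <stddef.h>
#include <unistd.h>

/* Inject an event into UML and block until UML has fully processed
 * the resulting SIGIO and sent its ACK on this same socket; only
 * then return, so nothing else in the system can proceed meanwhile. */
static void participant_send_event(int uml_fd, const void *ev, size_t len)
{
	char ack;

	write(uml_fd, ev, len);	/* raises SIGIO in the UML instance */
	read(uml_fd, &ack, 1);	/* serialise on UML's ACK */
}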

You broke that rule, and the system is coming apart at the seams. You
already found - as I pointed out in the other email - that you now have
to make some assumptions on the sender side of the message about how the
interrupt processing will be done on the receiver (although that
assumption is perhaps not more than "may be immediate so I need to
return to the calendar").

You also found that when you broke that rule, you got other asynchronous
things going on, and now need to handle that suddenly you can get a RUN
message inside a SIGIO handler, because that's async and nothing waits
for it to have finished.

I honestly don't believe that removing this rule will do the system good
in the long term. You'll almost certainly find other corner cases - we
still find corner cases in the system _with_ the rule, and that's a
_much_ simpler system.

I also tend to think that the more complex system (without that rule)
will be much harder to optimise. For example, with the shared memory
table of run requests (yes yes, I'll send it in a bit), you don't even
have a GET message in time_travel_add_irq_event() any more. In fact you
have no messages at all there! Maybe that makes this part easier? But it
also means you now have to reason about why it's actually still OK to
not get the RUN message where you previously expected it, etc.


Or if we consider further optimisations we've been thinking about (but
not implemented yet) - one of them is to use signals for wait/run so we
can hand off running between processes without going through the
calendar (with the calendar still maintaining the new distributed
protocol on behalf of clients that don't participate in a distributed
protocol yet). Not having the ACK would actually _defeat_ this
optimisation, because you wouldn't actually know who to hand it off to;
only the calendar would be able to do the handoff, since it's the only
one communicating with everyone. With the ACK you know that as soon as
you have the ACK, the shared table must be updated, and you can make the
hand-off decision locally instead of going to the calendar.
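Purely illustrative (all names are made up, the table format is not
designed yet): with that guarantee, the local hand-off decision could
be as simple as

#include <stdint.h>

struct run_request {
	uint64_t time;		/* requested run time */
	int	 client_id;	/* who asked for it */
};

/* After our ACK arrived, the shared table is known to be up to date,
 * so pick the earliest requester locally and signal it directly,
 * without a round trip through the calendar. */
static int pick_next_runner(const struct run_request *tbl, int n)
{
	int best = -1;

	for (int i = 0; i < n; i++)
		if (best < 0 || tbl[i].time < tbl[best].time)
			best = i;
	return best;
}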


So ... I think all of this is an even longer-winded way than the last
discussion over on patch 3 of saying I don't like your simulation model
that doesn't require an ACK, and I don't think we should support it.


FWIW, it really shouldn't be that hard to replace the 'lines' with
virtio serial consoles or so. We've actually done that, we have a
device-side implementation of this:

#define VIRTIO_ID_RPROC_SERIAL          11 /* virtio remoteproc serial link */

(which is really just a virtio console; it shows up as /dev/vport*p0
inside the virtual system, rather than /dev/ssl* or whatever you have
there now).

With the usfstl vhost device abstraction [1] it's less than 150 LOC, and
much of it is boilerplate/struct filling...

[1] https://github.com/linux-test-project/usfstl/blob/main/include/usfstl/vhost.h


Although our implementation only supports a single connection and is
built into whatever generates the messages, if you want multiple UML
instances to communicate you'd have to have an implementation that has a
socket for each and a config file describing which is connected to
where.

johannes

Patch

diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index fddd1dec27e6..ff5ef75bbb94 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -94,7 +94,10 @@  static void time_travel_handle_message(struct um_timetravel_msg *msg,
 	case UM_TIMETRAVEL_ACK:
 		return;
 	case UM_TIMETRAVEL_RUN:
-		time_travel_set_time(msg->time);
+		/* The caller, time_travel_ext_wait(), now advances the time.
+		 * Report 1 in the ACK if the RUN arrived in the wait loop
+		 * (idle/poll), 0 if it was read elsewhere (signal handler). */
+		resp.time = (mode != TTMH_READ) ? 1 : 0;
 		break;
 	case UM_TIMETRAVEL_FREE_UNTIL:
 		time_travel_ext_free_until_valid = true;
@@ -238,7 +241,7 @@  static void time_travel_ext_wait(bool idle)
 	 */
 	while (msg.op != UM_TIMETRAVEL_RUN)
 		time_travel_handle_message(&msg, idle ? TTMH_IDLE : TTMH_POLL);
-
+	time_travel_set_time(msg.time);
 	time_travel_ext_waiting--;
 
 	/* we might request more stuff while polling - reset when we run */