
[RFC,03/11] um: Use a simple time travel handler for line interrupts

Message ID: 20231103-bb-timetravel-patches-v1-3-e2c68efcf664@uni-rostock.de
State: Changes Requested
Series: Several Time Travel Mode Enhancements

Commit Message

Benjamin Beichler Nov. 3, 2023, 4:41 p.m. UTC
This change permits interrupts on serial lines in time travel mode,
especially in external mode. However, these interrupts are processed
with the simple handler that does not provide any acknowledgment.

Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
 arch/um/drivers/line.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Johannes Berg Nov. 6, 2023, 8:26 p.m. UTC | #1
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> This change permits interrupts on serial lines in time travel mode,
> especially in external mode. However, these interrupts are processed
> with the simple handler that does not provide any acknowledgment.
> 

Yeah... we had this discussion in the other thread.

At least in my mental model this is broken because the sender of the
event basically has to prepare the calendar for it happening, which
feels ... odd.

But I can see where you're coming from, and switching to virtio console
that does it all right might be tricky, so ... perhaps this still makes
some sense.

I'd still think we should put a warning into the code somewhere when you
actually use line interrupts though?

johannes
Benjamin Beichler Nov. 10, 2023, 4:53 p.m. UTC | #2
On 06.11.2023 21:26, Johannes Berg wrote:
>
> On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
>> This change permits interrupts on serial lines in time travel mode,
>> especially in external mode. However, these interrupts are processed
>> with the simple handler that does not provide any acknowledgment.
>>
> Yeah... we had this discussion in the other thread.
>
> At least in my mental model this is broken because the sender of the
> event basically has to prepare the calendar for it happening, which
> feels ... odd.
Actually, for me, this would make kind of sense. Why couldn't we be
more explicit in the time travel protocol to the calendar about
interrupts? We could use the ID from the start-msg to identify um
instances and tell the calendar, in an additional msg, that the device
simulation is going to trigger an interrupt on that instance at the
current time.
Another way would be to serve such information (if the device
simulation really needs it) over the facilities we already established.
Why not send the ACK as MSG_OOB or ancillary data?
>
> But I can see where you're coming from, and switching to virtio console
> that does it all right might be tricky, so ... perhaps this still makes
> some sense.
>
> I'd still think we should put a warning into the code somewhere when you
> actually use line interrupts though?
>
> johannes
>
Benjamin
Johannes Berg Nov. 10, 2023, 5:16 p.m. UTC | #3
On Fri, 2023-11-10 at 17:53 +0100, Benjamin Beichler wrote:
> > 
> > At least in my mental model this is broken because the sender of the
> > event basically has to prepare the calendar for it happening, which
> > feels ... odd.

> Actually, for me, this would make kind of sense.

I'm not sure I agree, I guess. To me, the sender is just telling someone
else "here's something going on I want you to look at". That doesn't
really mean the receiver needs to actually handle it "right away". Maybe
the receiver implements a model where it's modelling an interrupt
latency (which _would_ be more realistic), so it only handles the
message some (small amount of) time later. Maybe it's in a polling model
right now and has disabled interrupt sources, and just wakes every so
often to check for messages. Maybe something else.

So I don't really agree that baking a "must be scheduled immediately"
into any protocol really makes sense.

> Why couldn't we be more
> explicit in the time travel protocol to the calendar about interrupts?
> We could use the ID from the start-msg to identify um instances and
> tell the calendar, in an additional msg, that the device simulation is
> going to trigger an interrupt on that instance at the current time.

But see above - I basically just don't think the sender of the event is
the right position to know that information.

> Another way would be to serve such information (if the device
> simulation really needs it) over the facilities we already established.
> Why not send the ACK as MSG_OOB or ancillary data?

Don't think I understand that. Which "facilities" are you referring to
here?

johannes
Benjamin Beichler Nov. 13, 2023, 11:46 a.m. UTC | #4
On 10.11.2023 18:16, Johannes Berg wrote:
>
> On Fri, 2023-11-10 at 17:53 +0100, Benjamin Beichler wrote:
>>> At least in my mental model this is broken because the sender of the
>>> event basically has to prepare the calendar for it happening, which
>>> feels ... odd.
>> Actually, for me, this would make kind of sense.
> I'm not sure I agree, I guess. To me, the sender is just telling someone
> else "here's something going on I want you to look at". That doesn't
> really mean the receiver needs to actually handle it "right away". Maybe
> the receiver implements a model where it's modelling an interrupt
> latency (which _would_ be more realistic), so it only handles the
> message some (small amount of) time later. Maybe it's in a polling model
> right now and has disabled interrupt sources, and just wakes every so
> often to check for messages. Maybe something else.
>
> So I don't really agree that baking a "must be scheduled immediately"
> into any protocol really makes sense.
I think you are mixing some levels of abstraction here. Some entity in
the "system" needs to react to an interrupt immediately. In the real
world, this is mostly done by some kind of interrupt controller, which
may latch the interrupt (if interrupts are deactivated from the CPU
side). Of course, some more sophisticated bus protocols (e.g.
virtio/vhost in this case) can also signal the device to raise no
interrupts, but in the most basic case, e.g. a button on a GPIO pin,
the device cannot handle that. Most of the interrupt abstraction in um
is realized via signals, but some more seems (IMHO) to be needed from
the tt-protocol if we have external device simulations. My suggestion
was to add the virtual interrupt line state change as a message to the
calendar.

Of course, you could argue that the device simulation needs to be
clever enough (thus distributing the task of the interrupt controller),
but even then the device simulation needs some more information to
reflect delayed interrupts, deactivated interrupts, and so on.

What confuses me is that you explicitly use the immediate synchronous
response model for the interrupts from the vhost/virtio driver, but you
seem to refuse it for other drivers.
>
>> Why couldn't we be more
>> explicit in the time travel protocol to the calendar about interrupts?
>> We could use the ID from the start-msg to identify um instances and
>> tell the calendar, in an additional msg, that the device simulation
>> is going to trigger an interrupt on that instance at the current
>> time.
> But see above - I basically just don't think the sender of the event is
> the right position to know that information.
Mhh, if I apply that to the vhost/virtio simulations, that is what is
implicitly modeled by the exchanged file descriptors and the
vhost/virtio protocol state.

We may change the whole tt-ext-mode to only accept vhost/virtio
drivers, but that needs to be stated explicitly in the documentation.
I'm not really in favor of that, but if it is documented it is easier
to understand.
>
>> Another way would be to serve such information (if the device
>> simulation really needs it) over the facilities we already established.
>> Why not send the ACK as MSG_OOB or ancillary data?
> Don't think I understand that. Which "facilities" are you referring to
> here?
I hadn't researched that on Friday, but we could use a custom control
msg (cmsg) on the Unix sockets attached to the (serial) line devices.
The device simulation can then wait on that ancillary data for
synchronous interrupt handling, as you have proposed for vhost/virtio.
My first idea was to use MSG_OOB, which does not exist on Unix domain
sockets. If someone does not use UDS as transport, we may give a
warning.
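
Roughly, I imagine the device-simulation side like this (only a sketch
with made-up names; since arbitrary cmsg types are not delivered on
AF_UNIX, passing an ack fd once via SCM_RIGHTS stands in for the
ancillary-data idea):

/*
 * Sketch, hypothetical names: receive the ack fd that the other
 * side passed once over the line's Unix domain socket via
 * SCM_RIGHTS, then block on it after each write to the line.
 */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

static int recv_ack_fd(int line_sock)
{
	char c;
	struct iovec iov = { .iov_base = &c, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd = -1;

	if (recvmsg(line_sock, &msg, 0) < 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS)
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}

/* after writing to the line, wait until the interrupt was seen */
static int wait_irq_ack(int ack_fd)
{
	uint64_t v;	/* eventfd counter value */

	return read(ack_fd, &v, sizeof(v)) == sizeof(v) ? 0 : -1;
}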


Benjamin
Johannes Berg Nov. 13, 2023, 9:22 p.m. UTC | #5
On Mon, 2023-11-13 at 12:46 +0100, Benjamin Beichler wrote:
> > So I don't really agree that baking a "must be scheduled immediately"
> > into any protocol really makes sense.

> I think you are mixing some levels of abstraction here. Some entity
> in the "system" needs to react to an interrupt immediately.

Don't think I agree, that's not really necessary.

> In the real world, this is mostly done by some kind of interrupt
> controller, which may latch the interrupt (if interrupts are
> deactivated from the CPU side).

Even that itself may just be polling some lines, theoretically.

But that's mostly theoretical - if you wanted to simulate some interrupt
latency, you'd simply not react 'immediately' but rather some amount of
time later; the interrupt controller and all that is basically all the
same piece of code.

> Of course, some more sophisticated bus protocols (e.g. virtio/vhost
> in this case) can also signal the device to raise no interrupts, but
> in the most basic case, e.g. a button on a GPIO pin, the device cannot
> handle that.

Not sure I follow - how's that related to the protocol? Either you get
an interrupt or you don't, but once you do get it, there's also nothing
that says you will handle it immediately.


> Most of the interrupt abstraction in um is realized via signals,

Which is actually unnecessary for this kind of simulation since the UML
instance must be in idle by definition for someone else to be running
and sending it 'real' interrupts. We could as well just directly build
an epoll loop there, but that's more complicated surgery in the innards
of UML's interrupt model that isn't really needed.

Without time-travel, it _is_ needed since you want to have 'real'
interrupts that actually stop a process from running and you get into
the device driver and all.

> but some more seems (IMHO) to be needed 
> from the tt-protocol, if we have external
> device simulations. 

Well, I just said above _less_ is actually needed ;-)

The _more_ that's needed is to actually ensure the "only one thing is
running" part, which I've built into the system via acknowledgement (or
perhaps requirement for those).

> My suggestion was to add the virtual interrupt line 
> state change as a message to the calendar.

Yes but it doesn't work unless the other side _already_ knows that it
will happen, because it breaks the rule of "only one thing is running".

Again you can argue that it's fine to return to the calendar even if the
receiving process won't actually do anything, but because the receiving
process is still thinking about it, you end up with all the contortions
you have to do in patches 4 and 7 ... because even the _calendar_ doesn't
know when the request will actually come to it.

Perhaps one way of handling this that doesn't require all those
contortions would be for the sender of the event to actually
_completely_ handle the calendar request for the interrupt on behalf of
the receiver, so that the receiver doesn't actually do _anything_ but
mark "this IRQ is pending" in this case. Once it actually gets a RUN
message it will actually start running since it assumes that it will not
get a RUN message without having requested a calendar entry. If the
calendar entry were already handled on its behalf, you'd not need the
request and therefore not need the special handling for the request from
patch 4.
You'd need a different implementation of patches 2/3, and get rid of
patches 4, 6, 7 and 8.


So ... thinking about that more, the issue isn't so much that you're
making an assumption that the interrupt should happen right away, it's
that only some parts of your system are making that assumption (the
sender of the event that causes the interrupt), and then you're leaving
the rest of the system to catch up with something that actually gets
processed _asynchronously_, causing all kinds of trouble.

> Of course, you could argue that the device simulation needs to be
> clever enough (thus distributing the task of the interrupt
> controller), but even then the device simulation needs some more
> information to reflect delayed interrupts, deactivated interrupts,
> and so on.
> 
> What confuses me is that you explicitly use the immediate synchronous
> response model for the interrupts from the vhost/virtio driver, but
> you seem to refuse it for other drivers.

I didn't mean to _refuse_ it, and actually you'll note that I made the
usfstl device abstraction able to trivially implement interrupt latency
by setting a single parameter (named "interrupt_latency" :-) ), so it's
been something that was definitely on my mind, but so far no testing
scenario has needed it, and the simulation runs faster without it ;-)

So no, I'm not actually trying to tell you that you must implement
interrupt latency. Sorry I came across that way.

I was just trying to use the interrupt latency as a reasoning for why
the recipient should be in control, but now having thought about it
more, I don't even think the recipient _needs_ to be in control, just
that one side needs to be _fully_ in control, and the model you've built
has both sides in control and racing against each other (and in the case
of UML even with itself internally processing the signal.)

> > > Why couldn't we be more
> > > explicit in the time travel protocol to the calendar about interrupts?
> > > We could use the ID from the start-msg to identify um instances and
> > > tell the calendar, in an additional msg, that the device simulation
> > > is going to trigger an interrupt on that instance at the current time.
> > But see above - I basically just don't think the sender of the event is
> > the right position to know that information.

Yeah you know what, I partially retract that statement. I still think
kind of the right way to think about it is to have the recipient in
control, but there are indeed things like the line interrupts where
perhaps it's better to put the sender in control. It's perhaps not
trivial since you might want to do a few writes? You'd have to combine
that into making a single calendar interrupt processing entry on behalf
of the receiver.


> Mhh, if I apply that to the vhost/virtio simulations, that is what is
> implicitly modeled by the exchanged file descriptors and the
> vhost/virtio protocol state.

Not sure I follow? The whole file descriptors and all are just
establishing communication mechanisms?

> We may change the whole tt-ext-mode to only accept vhost/virtio drivers, 
> but that needs to
> be stated explicitly in the documentation. I'm not really in favor of 
> that, but if it is documented
> it is easier to understand.

See above. I guess we don't have to, if we put a bit more requirement on
the sender.

> > Don't think I understand that. Which "facilities" are you referring to
> > here?
> I hadn't researched that on Friday, but we could use a custom control 
> msg (cmsg) on the unix sockets
> attached to the (serial) line devices. The device simulation can then 
> wait on that ancillary data for
> synchronous interrupt handling, as you have proposed for vhost/virtio. 
> My first idea was to use
> MSG_OOB, which do not exit on Unix-Domain-Sockets. If someone does not 
> use uds as transport
> we may give a warning.

Not sure that really works - what would you transfer in the other
direction? Or send an fd and then wait for it to get a message on that
fd or something?

johannes
Johannes Berg Nov. 13, 2023, 9:57 p.m. UTC | #6
On Mon, 2023-11-13 at 22:22 +0100, Johannes Berg wrote:

> > My suggestion was to add the virtual interrupt line 
> > state change as a message to the calendar.
> 
> Yes but it doesn't work unless the other side _already_ knows that it
> will happen, because it breaks the rule of "only one thing is running".
> 
> Again you can argue that it's fine to return to the calendar even if the
> receiving process won't actually do anything, but because the receiving
> process is still thinking about it, you end up with all the contortions
> you have to do in patches 4 and 7 ... because even the _calendar_ doesn't
> know when the request will actually come to it.
> 
> Perhaps one way of handling this that doesn't require all those
> contortions would be for the sender of the event to actually
> _completely_ handle the calendar request for the interrupt on behalf of
> the receiver, so that the receiver doesn't actually do _anything_ but
> mark "this IRQ is pending" in this case. Once it actually gets a RUN
> message it will actually start running since it assumes that it will not
> get a RUN message without having requested a calendar entry. If the
> calendar entry were already handled on its behalf, you'd not need the
> request and therefore not need the special handling for the request from
> patch 4.
> You'd need a different implementation of patches 2/3, and get rid of
> patches 4, 6, 7 and 8.
> 

So maybe for my future self and all the bystanders, I'll try to explain
how I see the issue that causes patches 4, 6, 7 and 8 to be needed:


Actors: UML, ES (Event Sender), Cal (Calendar)

In UML, we're using a line-based protocol, as in drivers/line.c.
The ES is connected to that line and can send it messages.
(The same scenario would probably apply the other way around, in theory,
so we might need to have a way to implement this with roles reversed.)

Let's start by sending a message to UML, that's the whole point of this
discussion:

    ES ---> UML
    // so far nothing happened, the host kernel queued SIGIO to UML
    ES: continues running until idle and returns to Cal

Now we already have a few options, and I don't know which one got
implemented.

A. Perhaps ES also told Cal to expect that UML is running:
    Cal ---RUN--> UML

B. Perhaps Cal just asks everyone what their current request is
    Cal ---GET--> UML

C. Perhaps something else?

In any case, we already see a first race here, let's say A happened, and
now:

    ! SIGIO -> UML
    UML calls simple_timetravel_handler -> time_travel_add_irq_event
    UML ---REQUEST--> Cal
    UML <----RUN----- Cal
    // UML really confused -> patch 4

Or maybe:

    UML <-- RUN ----- Cal
    UML: starts running, starts manipulating time event list
    ! SIGIO -> UML
    UML calls simple_timetravel_handler -> time_travel_add_irq_event
        manipulates time event list
    // UML corrupts data structures -> patch 8

Or:

    UML <-- RUN ------ Cal
    UML runs only really briefly, sends new request,
        time_travel_ext_prev_request_valid = true
    ! SIGIO -> UML
    UML calls simple_timetravel_handler -> time_travel_add_irq_event
        time_travel_ext_prev_request_valid == true
    // no new request, Cal confused -> patch 6, and maybe 7


Not sure if that's really all completely accurate, and almost certainly
there are more scenarios that cause issues?

But in all cases the root cause is the asynchronous nature of doing
this; partially internally in UML (list protection etc.) and partially
outside UML (calendar doesn't know what's supposed to happen next until
async SIGIO is processed.)

In contrast, with virtio, you get

    ES -- event ----> UML
    // so far nothing happened, the host kernel queued SIGIO to UML
    // ES waits for ACK
    ! SIGIO -> UML
    UML calls simple_timetravel_handler -> time_travel_add_irq_event
    UML ---REQUEST--> Cal
    UML <--ACK------- Cal
    ES <---ACK------- UML
    ES: continues running until idle and returns to Cal
    Cal: schedules next runnable entry, likely UML

Note how due to the "// ES waits for ACK" nothing bad happens, because
even if the host doesn't schedule the SIGIO immediately to the UML
process, or that takes some time, _nothing_ else in the simulation makes
progress either until UML has.

IMNSHO that's the far simpler model than taking into account all the
potential races of which I outlined some above and trying to work with
them.



Now then I went on to say that we could just basically make it _all_ the
sender's responsibility on behalf of the receiver, and then we'd get


    ES -- event ----> UML
    // so far nothing happened, the host kernel queued SIGIO to UML
    ES -- add UML --> Cal
    ES <--- ACK ----- Cal
    ES: continues running until idle and returns to Cal
    Cal: schedules next runnable entry, likely UML

Now the only thing we need to handle in terms of concurrency are two
scenarios that continue from that scenario:

1.
    Cal --- RUN ----> UML
    ! SIGIO -> UML

and

2.
    ! SIGIO -> UML
    Cal --- RUN ----> UML

Obviously 2 is what you expect, but again you can even have races like
in 1 and the SIGIO can happen at roughly any time.

I'm tempted to say the only reasonable way to handle that would be to
basically not do _anything_ in the SIGIO but go poll all the file
descriptors this might happen for upon receiving a RUN message.

We could even go so far as to add a new RUN_TRIGGERED_BY_OTHER message
that would make it clear that someone else entered the entry into the
calendar, and only then epoll for interrupts for this message, but I'm
not sure that's needed or makes sense (if you were going to wake up
anyway at that time, you'd still handle interrupts.)

In any case, that feels like a _far_ more tractable problem, and the
only concurrency would be between running and the SIGIO, where the SIGIO
basically no longer matters.
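
In code, that idea (do nothing in SIGIO, poll everything on RUN) might
look roughly like this; a sketch with hypothetical names, not the
actual UML implementation:

/*
 * Sketch, hypothetical names: the SIGIO handler does nothing at
 * all; pending events are instead discovered by a zero-timeout
 * epoll when a RUN message arrives.
 */
#include <sys/epoll.h>

void deliver_line_interrupt(void *line);	/* hypothetical */

static void sigio_handler(int sig)
{
	(void)sig;
	/* intentionally empty: events are only polled on RUN */
}

static void handle_run_message(int epoll_fd)
{
	struct epoll_event events[16];
	int i, n;

	/* timeout 0: only drain what is already pending right now */
	n = epoll_wait(epoll_fd, events, 16, 0);
	for (i = 0; i < n; i++)
		deliver_line_interrupt(events[i].data.ptr);
}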



However ... if we consider this the other way around, we can actually
see that it's much harder to implement than it sounds - now suddenly
instead of having to connect the sockets with each other etc. you also
have to give the implementation knowledge about who is on the other
side, and how to even do a calendar request on their behalf! We don't
even have a protocol for doing a request on someone else's behalf today
(though with the shared memory you could just enter one), and actually
changing the protocol to support it might even be tricky since it
doesn't have much space for extra data in the messages ... maybe if we
split 'op' or use 'seq' for it... But you'd need to have a
(pre-assigned?) ID for each, or have them exchange IDs over something,
ideally the socket that's actually used for communication, but that
limits the kinds of sockets you can use, etc. So it's by no means
trivial either. And I understand that. I just don't think making the
protocol and implementations handle all the races that happen when you
start doing things asynchronously is really a good idea.
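
For reference, if I remember the header correctly, a message on the
wire today is just this (include/uapi/linux/um_timetravel.h):

#include <linux/types.h>

struct um_timetravel_msg {
	__u32 op;
	__u32 seq;
	__u64 time;
};

so a request on someone else's behalf would have to squeeze a target ID
in somewhere, purely hypothetically something like:

/* purely hypothetical wire extension, not implemented anywhere */
#define UM_TIMETRAVEL_REQUEST_FOR	0x100	/* made-up op */

struct um_timetravel_msg_for {
	__u32 op;	/* UM_TIMETRAVEL_REQUEST_FOR */
	__u32 target;	/* instance ID, reusing the 'seq' slot */
	__u64 time;	/* absolute time of the entry to add */
};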

johannes
Benjamin Beichler Nov. 20, 2023, 1:42 p.m. UTC | #7
On 13.11.2023 22:57, Johannes Berg wrote:
>
> So maybe for my future self and all the bystanders, I'll try to 
> explain how I see the issue that causes patches 4, 6, 7 and 8 to be 
> needed: 
--- snip ---
Hello Johannes,
Sorry for my delayed response to your detailed email. I find it quite 
hard to discuss such a complex topic via mailing lists without it 
sounding impolite.

Maybe also as some basis for my reasoning: I'm quite familiar with
discrete event-based simulation, with a special focus on SystemC
simulation. There are some common (old) principles of DES that map
perfectly to the time travel mode, so my mental model has always been
shaped around those constraints.
- Every instance/object/element in a DES has some inputs/signals that
activate it either at the current moment or at some later point in time.
- Every instance/object/element in a DES will run, starting from its
activation time until it has "finished", without advancing the global
simulation time.
- Events (or activations) occurring at the exact same point in
simulation time would happen in parallel. Therefore, the actual order
of execution in a sequential simulator is more or less unimportant
(though some implementations may require a deterministic order,
ensuring simulations with the same parameters yield the exact same
output, e.g., SystemC). The parallel execution of events at the same
time in the simulation is an optimization that may be implemented, but
is not always utilized.


After reviewing all your analyses, I believe the most significant
difference in my implementation lies in the last point. I did not
enforce the order of message processing when messages occur exactly at
the same simulation time. Consequently, I modified my implementation to
eliminate the synchronous (and, to be honest, quite hacky) read
operation with special handling on the timetravel socket. Instead, I
implemented a central epoll routine, which is called by my master
simulation kernel (NS3). My rationale was that if I haven't received
the request from the TT-protocol, I cannot advance time.
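
As a sketch (hypothetical names; the real integration with the NS3
event loop looks different), the scheduler side does roughly:

/*
 * Sketch, hypothetical names: do not advance simulation time
 * before every instance's TT socket has delivered its pending
 * message (e.g. REQUEST or WAIT).
 */
#include <sys/epoll.h>

void handle_tt_message(void *instance);	/* hypothetical */

static void drain_tt_sockets(int epoll_fd, int n_pending)
{
	struct epoll_event ev;

	while (n_pending > 0) {
		if (epoll_wait(epoll_fd, &ev, 1, -1) == 1) {
			handle_tt_message(ev.data.ptr);
			n_pending--;
		}
	}
	/* only now may the calendar pick the next entry to run */
}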


In conjunction with running not a single UML instance but many (my
current use case consists of 10 node pairs == 20 UML nodes), this can
create all sorts of race/deadlock conditions, which we have identified.
For me, the ACK of vhost/virtio seemed somewhat redundant, as it
provides the same information as the TT-protocol, assuming my device
simulation resides within the scheduler. I must admit my assumption was
incorrect, primarily because the implementation of the TT-protocol in
the kernel is somewhat fragile and, most importantly, your requirement
that ***nothing*** is allowed to interfere (creating SIGIOs) at certain
states of a time-traveling UML instance is not well-documented. We need
the ACK not for an acknowledgment of registering the interrupt but to
know that we are allowed to send the next TT-msg. This very tight
coupling of these two protocols does not appear to be the best design
or, at the very least, is poorly documented.
The prohibition of interference in certain TT-states also led to my
second mistake. I relaxed my second DES requirement and allowed
interrupts when the UML instance is in the RUN-state. This decision was
based on the impression that UML was built to work this way without TT,
so why should it break when in TT-Mode (which you proved was wrong).
Whether this is semantically reasonable or not can be questioned, but
it triggered technical problems with the current implementation.

With this realization, I tend to agree that maybe the whole set of
patches to ensure thread-safe (or reentrant-safe) access to the event
list might be dropped.
Still, we should ensure that the SIGIO is simply processed synchronously
in the idle loop. This aligns with my last DES constraint: since
everything happens at the same moment in simulation time, we do not
need "real" interrupts but can process interrupts (SIGIOs) later (but
at the same simulation time).
I think this approach only works in ext or cpu-inf mode and may be
problematic in "normal" timetravel mode. I might even consider dropping
the signal handler, which marks the interrupts pending, and processing
the signals with a signalfd, but that is, again, only an optimization.
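
(As a sketch of that optimization, assuming nothing else needs the
async handler:)

/*
 * Sketch: block SIGIO and read it from a signalfd at a
 * well-defined point in the idle loop instead of taking an
 * asynchronous signal handler.
 */
#include <signal.h>
#include <sys/signalfd.h>

static int make_sigio_fd(void)
{
	sigset_t mask;

	sigemptyset(&mask);
	sigaddset(&mask, SIGIO);
	/* once blocked, the signal is only visible via the fd */
	sigprocmask(SIG_BLOCK, &mask, NULL);
	return signalfd(-1, &mask, SFD_CLOEXEC);
}

The idle loop would then read struct signalfd_siginfo records from that
fd at a point where it is safe to process interrupts.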
Additionally, to address the interrupt acknowledgment for the serial
line, I'd like to propose this: why not add an extra file descriptor on
the command line that the kernel could write to, such as an eventfd or
a pipe, to signal the acknowledgment of the interrupt. For example, the
command line changes to ssl0=fd:0,fd:1,fd:3. If somebody uses the
serial line driver with timetravel mode but without that acknowledgment
fd, we can emit a warning or an error.
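
On the kernel side that would only be a small helper, roughly like this
(a sketch; the ack_fd field and its parsing from the extra ',fd:N'
argument are hypothetical):

/*
 * Sketch, not a real driver change: after the line interrupt has
 * been registered, tell the device simulation by writing to the
 * extra ack fd from the command line (e.g. ssl0=fd:0,fd:1,fd:3).
 */
static void line_ack_interrupt(struct line *line)
{
	u64 one = 1;	/* eventfd semantics: add 1 to the counter */

	if (line->ack_fd < 0)	/* hypothetical field */
		return;
	os_write_file(line->ack_fd, &one, sizeof(one));
}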

I believe all these changes should work well with the shared memory
optimization and should make the entire time travel ext protocol a bit
more robust, easier to use, and harder to misuse. ;-)

However, even after the lengthy discussion on "when" interrupts should
be processed, I still hold the opinion that the response to raising an
interrupt should be immediate to the device simulation and not delayed
in simulation time. This only makes the device simulation harder
without real benefit. If you want to delay the interrupt handling (ISR
and so on), that's still possible and in both situations highly
dependent on the UML implementation. If we want to add an interrupt
delay, we need to implement something in UML anyway. If you want to
delay it in the device driver, you can also always do it, but you are
not at the mercy of some hard-to-determine extra delay from UML.

Overall, if you think that makes sense, I could start on some patches, 
or perhaps you feel more comfortable doing that.


Benjamin
Johannes Berg Nov. 24, 2023, 2:53 p.m. UTC | #8
Hi,

> Sorry for my delayed response to your detailed email. I find it quite 
> hard to discuss such a complex topic via mailing lists without it 
> sounding impolite.


Heh, no worries about either :)


> Maybe also as some basis for my reasoning: I'm quite familiar with 
> discrete event-based simulation, with a special focus on SystemC 
> simulation.


Yeah OK, fair. I've only tangentially heard about SystemC :)


> There are
> some common (old) principles of DES that map perfectly to the time
> travel mode, so my mental model has always been shaped around those
> constraints.
> - Every instance/object/element in a DES has some inputs/signals that
> activate it either at the current moment or at some later point in
> time.
> - Every instance/object/element in a DES will run, starting from its
> activation time until it has "finished", without advancing the global
> simulation time.
> - Events (or activations) occurring at the exact same point in
> simulation time would happen in parallel. Therefore, the actual order
> of execution in a sequential simulator is more or less unimportant
> (though some implementations may require a deterministic order,
> ensuring simulations with the same parameters yield the exact same
> output, e.g., SystemC). The parallel execution of events at the same
> time in the simulation is an optimization that may be implemented,
> but is not always utilized.


Sure.


> After reviewing all your analyses, I believe the most significant
> difference in my implementation lies in the last point. I did not
> enforce the order of message processing when messages occur exactly
> at the same simulation time.


I'm not sure I fully agree with this - yes there's an ordering aspect to
it wrt. processing events that happen at the same simulation time, but
I'm not even sure that's the worst part of what you saw. Given the use
of SIGIO, the harder part is that you don't even have a guarantee that
an event is processed _at all_ at the same simulation time it was sent.
It might get processed later, at a different simulation time.


> Consequently, I modified my implementation to eliminate the
> synchronous (and, to be honest, quite hacky) read operation with
> special handling on the timetravel socket. Instead, I implemented a
> central epoll routine, which is called by my master simulation kernel
> (NS3). My rationale was that if I haven't received the request from
> the TT-protocol, I cannot advance time.


Yes, but even then I believe that a SIGIO event could be delayed to be
processed at a later simulation time than it should be.


> In conjunction with running not a single UML instance but many (my
> current use case consists of 10 node pairs == 20 UML nodes), this can
> create all sorts of race/deadlock conditions, which we have
> identified.


FWIW, I'm typically running 5-6 nodes with a simulated WiFi device each,
so the scale isn't that much different.


> For me, the ACK of vhost/virtio seemed somewhat redundant, as it 
> provides the same information as the TT-protocol, 


Actually, no, I don't agree that it provides the same information. The
ACK provides - to the sender of the event - the acknowledgement that the
event was actually seen. However, I will say that it's actually more a
consequence of UML using SIGIO than a fundamental part of the
simulation. If you were actually able to say "UML can do nothing upon
exiting idle state without checking for events first", you'd probably be
correct. However, that's an incorrect assumption given how UML's
interrupt model is implemented now.

So yes, we could remove the SIGIO model from UML when time-travel=ext or
=infcpu is in use, but that'd be a much more invasive rework.

However, if we'd actually do such a rework, then I believe you'd be
_almost_ correct with that sentence. The only other remaining lack of
information would be an updated free-until value, though you could
assume that the receiver of the event always wants to process it
immediately and update free-until=now from the sender of the event.


> assuming my device simulation
> resides within the scheduler.


I'm not sure that really makes a big difference. You could always set it
up in a way that the sender of the event causes free-until=now and
returns to the scheduler, though the scheduler would then have to
actually ask everyone for their current request [1] if it doesn't have
information about who an event was sent to.

[1] and each participant in the simulation would have to check for
pending events before answering such a question


> I must admit my assumption was incorrect,
> primarily because the implementation of the TT-protocol in the kernel
> is somewhat fragile and, most importantly, your requirement that
> ***nothing*** is allowed to interfere (creating SIGIOs) at certain
> states of a time-traveling UML instance is not well-documented.


I guess I don't agree with it being fragile, but that's OK :)

You make it sound like that's an inherent fault of the protocol, when
really it's a consequence of the distributed scheduling model, IMHO.


And I think maybe this is also where SystemC-based intuition fails,
because things are distributed here.


> We need the ACK not for an
> acknowledgment of registering the interrupt but to know that we are
> allowed to send the next TT-msg. This very tight coupling of these two
> protocols does not appear to be the best design or, at the very least,
> is poorly documented.


I think you're mixing up the design and the implementation a bit. We do
need the ACK for saying that the interrupt was actually registered,
because otherwise - given SIGIO - we have no chance to know it was
processed synchronously. But I think I've described the possible
variants of the model above already.

If you didn't use SIGIO and processed events and TT-msgs in the same
mainloop, you'd not have to worry about the states, and that's true of
all of my device implementations, see e.g. network in my scheduler. But
that still doesn't get rid of the requirement for ACKs given how we want
to have a free-until optimisation (and thus a form of distributed
scheduling.)


> The prohibition of interference in certain TT-states also led to my 
> second mistake. I relaxed my second DES requirement and allowed 
> interrupts when the UML instance is in the RUN-state. 

I tend to think this was also another bug though; as I described in my
other email, this can lead to events not even registering at the same
simulation time they were sent at, due to the async nature of SIGIO and
the lack of polling when exiting the WAIT state.


> This decision was based on the
> impression that UML was built to work this way without TT, so why
> should it break when in TT-Mode (which you proved was wrong). Whether
> this is semantically reasonable or not can be questioned, but it
> triggered technical problems with the current implementation.
>
> With this realization, I tend to agree that maybe the whole set of
> patches to ensure thread-safe (or reentrant-safe) access to the event
> list might be dropped.


OK.


> Still, we should ensure that the SIGIO is simply processed
> synchronously in the idle loop.


I think you mean something else, but as written, this is wrong. You
cannot "process SIGIO", but you can (otherwise) notice (poll for) the
event that caused SIGIO, or even get rid of the SIGIO entirely.

So I think you mean "all events are processed synchronously in or on
exit from the idle loop", and then I tend to agree, but that's a pretty
big departure from how interrupt processing works in UML normally.


> This aligns with my last DES constraint: since everything
> happens at the same moment in simulation time, we do not need "real" 
> interrupts but can process interrupts (SIGIOs) later (but at the same 
> simulation time).


(again, not SIGIO, but events on FDs)

Well, it's a trade-off. Yes we could do this, but the only advantage is
getting rid of the ACK messages, which are also needed for the
distributed scheduling model.


In a SystemC model I believe everything happens in a single process, so
sending an event basically can influence the scheduler already? Or
perhaps "scheduling" in that model doesn't actually just consist of
picking the next task from an existing list, but having each task check
for events first?

Either way, I'm not sure this works in the distributed model here.


> I think this approach only works in ext or cpu-inf mode and may be 
> problematic in "normal" timetravel mode.

Yes, for sure, 'normal' time-travel is not suitable for this kind of
simulation, and vice versa.


> I might even consider dropping the signal handler,
> which marks the interrupts pending, and processing the signals with a
> signalfd, but that is, again, only an optimization.


No, if you want to get away from the ACK requirement, you _have_ to drop
the signal handler, or at least make it do nothing, and understand that
events were queued in a different way, synchronously. The async nature
of the signal handlers causes all the problems we've been discussing -
unless, as I have done, you always wait for them to finish first with
the ACK requirement.


> Additionally, to address the interrupt acknowledgment for the serial
> line, I'd like to propose this: why not add an extra file descriptor
> on the command line that the kernel could write to, such as an
> eventfd or a pipe, to signal the acknowledgment of the interrupt. For
> example, the command line changes to ssl0=fd:0,fd:1,fd:3. If somebody
> uses the serial line driver with timetravel mode but without that
> acknowledgment fd, we can emit a warning or an error.


Sure, that works. But above you were arguing against the ACKs and now
you want to implement them? Which is it? ;-)


> I believe all these changes should work well with the shared memory 
> optimization and should make the entire time travel ext protocol a bit
> more robust, easier to use, and harder to misuse. ;-)

I'm not sure which changes really are left?

Adding the ACKs to the serial port, sure, that makes sense to me.


Getting rid of the ACKs overall, dunno, maybe it's possible but it
requires a different scheduling model, where _any_ external event makes
the sender (which was running) give up its free-until without knowing
that was needed [2], and then the scheduler has to ask everyone to
process pending events before actually picking the next task to run.
This was a bit implicitly done by your scheduling request message, so
that at that point your UML had time to process events. I think it was
still racy though since it was still SIGIO based, which has no ordering
guarantee vs. the read from the TT socket.

[2] though you argue below it's always needed


> However, even after the lengthy discussion on "when" interrupts
> should be processed, I still hold the opinion that the response to
> raising an interrupt should be immediate to the device simulation and
> not delayed in simulation time. This only makes the device simulation
> harder without real benefit. If you want to delay the interrupt
> handling (ISR and so on), that's still possible and in both situations
> highly dependent on the UML implementation. If we want to add an
> interrupt delay, we need to implement something in UML anyway. If you
> want to delay it in the device driver, you can also always do it, but
> you are not at the mercy of some hard-to-determine extra delay from
> UML.

I think this is really completely orthogonal to the other discussion
apart from what I wrote at [2] above, but I don't necessarily think that
it always makes sense to process all events immediately, or to even
schedule for that. It doesn't even make the implementation harder to
request the time slot for handling the event at now + epsilon instead of
now. But anyway, no need to get into this.

> Overall, if you think that makes sense, I could start on some patches,
> or perhaps you feel more comfortable doing that.

I'm honestly not sure what you had in mind now, apart from the ack-fd on
serial?

johannes

Patch

diff --git a/arch/um/drivers/line.c b/arch/um/drivers/line.c
index b98545f3edb5..3cda0ae41824 100644
--- a/arch/um/drivers/line.c
+++ b/arch/um/drivers/line.c
@@ -260,9 +260,9 @@  int line_setup_irq(int fd, int input, int output, struct line *line, void *data)
 	int err;
 
 	if (input) {
-		err = um_request_irq(UM_IRQ_ALLOC, fd, IRQ_READ,
+		err = um_request_irq_tt(UM_IRQ_ALLOC, fd, IRQ_READ,
 				     line_interrupt, 0,
-				     driver->read_irq_name, data);
+				     driver->read_irq_name, data, simple_timetravel_handler);
 		if (err < 0)
 			return err;
 
@@ -270,9 +270,9 @@  int line_setup_irq(int fd, int input, int output, struct line *line, void *data)
 	}
 
 	if (output) {
-		err = um_request_irq(UM_IRQ_ALLOC, fd, IRQ_WRITE,
+		err = um_request_irq_tt(UM_IRQ_ALLOC, fd, IRQ_WRITE,
 				     line_write_interrupt, 0,
-				     driver->write_irq_name, data);
+				     driver->write_irq_name, data, simple_timetravel_handler);
 		if (err < 0)
 			return err;