diff mbox series

[1/4] um: irqs: process outstanding IRQs when unblocking signals

Message ID 20231018123643.1255813-1-benjamin@sipsolutions.net
State Needs Review / ACK
Headers show
Series [1/4] um: irqs: process outstanding IRQs when unblocking signals | expand

Commit Message

Benjamin Berg Oct. 18, 2023, 12:36 p.m. UTC
From: Benjamin Berg <benjamin.berg@intel.com>

When in time-travel mode, the eventfd events are read even when signals
are blocked as SIGIO still needs to be processed. In this case, the
event is cleared on the eventfd but the IRQ still needs to be fired
later.

We did already ensure that the SIGIO handler is run again. However, the
FDs are configured to be level triggered, so that eventfd will not
notify again. As such, add some logic to mark the IRQ as pending and
process it at the next opportunity.

To avoid duplication, reuse the logic used for the suspend/resume case.
This does not really change anything except for delaying running the
IRQs with timetravel_handler at a slightly later point in time (and
possibly running non-timetravel IRQs that shouldn't happen earlier).
While at it, move marking as pending into irq_event_handler as that is
the more logical place for it to happen.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
 arch/um/kernel/irq.c | 78 ++++++++++++++++++++++++++++----------------
 1 file changed, 49 insertions(+), 29 deletions(-)

Comments

Benjamin Beichler Oct. 20, 2023, 9:15 a.m. UTC | #1
Am 18.10.2023 um 14:36 schrieb benjamin@sipsolutions.net:
> From: Benjamin Berg <benjamin.berg@intel.com>
> 
> When in time-travel mode, the eventfd events are read even when signals
> are blocked as SIGIO still needs to be processed. In this case, the
> event is cleared on the eventfd but the IRQ still needs to be fired
> later.
> 
> We did already ensure that the SIGIO handler is run again. However, the
> FDs are configured to be level triggered, so that eventfd will not
> notify again. As such, add some logic to mark the IRQ as pending and
> process it at the next opportunity.
> 
> To avoid duplication, reuse the logic used for the suspend/resume case.
> This does not really change anything except for delaying running the
> IRQs with timetravel_handler at a slightly later point in time (and
> possibly running non-timetravel IRQs that shouldn't happen earlier).
> While at it, move marking as pending into irq_event_handler as that is
> the more logical place for it to happen.
> 
> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>

I also noticed this problem, but after a discussion with Johannes Berg 
about this, we come to the conclusion, that all drivers with interrupts 
need to deal with time travel mode via their own time travel handler. 
What I actually did, was to write a trivial handler for e.g., the serial 
line, which simply only call time_travel_add_irq_event(ev).

I'm not entirely sure, what the right way to do it, although the outcome 
seems to be the same, as with no time advance the actual simulation 
time, when the interrupt is handled is the same.

I actually also have some other significant bug fixes on time travel in 
my tree, but I was too lazy to send them here, are you interested in 
taking a look at them?
Anton Ivanov Oct. 20, 2023, 9:26 a.m. UTC | #2
On 20/10/2023 10:15, Benjamin Beichler wrote:
> Am 18.10.2023 um 14:36 schrieb benjamin@sipsolutions.net:
>> From: Benjamin Berg <benjamin.berg@intel.com>
>>
>> When in time-travel mode, the eventfd events are read even when signals
>> are blocked as SIGIO still needs to be processed. In this case, the
>> event is cleared on the eventfd but the IRQ still needs to be fired
>> later.
>>
>> We did already ensure that the SIGIO handler is run again. However, the
>> FDs are configured to be level triggered, so that eventfd will not
>> notify again. As such, add some logic to mark the IRQ as pending and
>> process it at the next opportunity.
>>
>> To avoid duplication, reuse the logic used for the suspend/resume case.
>> This does not really change anything except for delaying running the
>> IRQs with timetravel_handler at a slightly later point in time (and
>> possibly running non-timetravel IRQs that shouldn't happen earlier).
>> While at it, move marking as pending into irq_event_handler as that is
>> the more logical place for it to happen.
>>
>> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
> 
> I also noticed this problem, but after a discussion with Johannes Berg about this, we come to the conclusion, that all drivers with interrupts need to deal with time travel mode via their own time travel handler. What I actually did, was to write a trivial handler for e.g., the serial line, which simply only call time_travel_add_irq_event(ev).
> 
> I'm not entirely sure, what the right way to do it, although the outcome seems to be the same, as with no time advance the actual simulation time, when the interrupt is handled is the same.

Add an extra registration flag for um_register_irq?


enum um_irq_type {
	IRQ_READ,
	IRQ_WRITE,
	IRQ_HAVE_TT_HANDLER,
	NUM_IRQ_TYPES,
};

Ones that do not register get the default one which advances the time. Ones that do - have it left to them and do with time whatever they need.

> 
> I actually also have some other significant bug fixes on time travel in my tree, but I was too lazy to send them here, are you interested in taking a look at them?
> 
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
Benjamin Berg Oct. 20, 2023, 9:59 a.m. UTC | #3
On Fri, 2023-10-20 at 11:15 +0200, Benjamin Beichler wrote:
> Am 18.10.2023 um 14:36 schrieb benjamin@sipsolutions.net:
> > From: Benjamin Berg <benjamin.berg@intel.com>
> > 
> > When in time-travel mode, the eventfd events are read even when signals
> > are blocked as SIGIO still needs to be processed. In this case, the
> > event is cleared on the eventfd but the IRQ still needs to be fired
> > later.
> > 
> > We did already ensure that the SIGIO handler is run again. However, the
> > FDs are configured to be level triggered, so that eventfd will not
> > notify again. As such, add some logic to mark the IRQ as pending and
> > process it at the next opportunity.
> > 
> > To avoid duplication, reuse the logic used for the suspend/resume case.
> > This does not really change anything except for delaying running the
> > IRQs with timetravel_handler at a slightly later point in time (and
> > possibly running non-timetravel IRQs that shouldn't happen earlier).
> > While at it, move marking as pending into irq_event_handler as that is
> > the more logical place for it to happen.
> > 
> > Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
> 
> I also noticed this problem, but after a discussion with Johannes Berg 
> about this, we come to the conclusion, that all drivers with interrupts 
> need to deal with time travel mode via their own time travel handler.
> What I actually did, was to write a trivial handler for e.g., the serial 
> line, which simply only call time_travel_add_irq_event(ev).

Yes, we could instead just assert that we do not have any IRQ without a
timetravel handler. We really shouldn't have, but we do have one for
stdin unfortunately (which is at least used by the hostap hwsim tests,
so we might want to not remove it until we have a different solution).

What we could do is just add an stdin timetravel handler as you have
done, but print a warning when the stdin IRQ is registered[1]. And
then, if someone tries to register an IRQ without a time-travel handler
we just panic(), it should never happen.

Benjamin

[1] We'll also need to set stdin to be a "null" channel and then
somehow avoid registering the IRQ in that case.

> I'm not entirely sure, what the right way to do it, although the outcome 
> seems to be the same, as with no time advance the actual simulation 
> time, when the interrupt is handled is the same.
> 
> I actually also have some other significant bug fixes on time travel in 
> my tree, but I was too lazy to send them here, are you interested in 
> taking a look at them?
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
Benjamin Beichler Oct. 20, 2023, 10:33 a.m. UTC | #4
Am 20.10.2023 um 11:26 schrieb Anton Ivanov:

>> I also noticed this problem, but after a discussion with Johannes Berg 
>> about this, we come to the conclusion, that all drivers with 
>> interrupts need to deal with time travel mode via their own time 
>> travel handler. What I actually did, was to write a trivial handler 
>> for e.g., the serial line, which simply only call 
>> time_travel_add_irq_event(ev).
>>
>> I'm not entirely sure, what the right way to do it, although the 
>> outcome seems to be the same, as with no time advance the actual 
>> simulation time, when the interrupt is handled is the same.
> 
> Add an extra registration flag for um_register_irq?
> 
> 
> enum um_irq_type {
>      IRQ_READ,
>      IRQ_WRITE,
>      IRQ_HAVE_TT_HANDLER,
>      NUM_IRQ_TYPES,
> };
> 
> Ones that do not register get the default one which advances the time. 
> Ones that do - have it left to them and do with time whatever they need.
> 
There is already mechanics for it, as we have um_request_irq_tt and 
um_request_irq. I don't think an additional type enhance the semantics.


>>
>> I actually also have some other significant bug fixes on time travel 
>> in my tree, but I was too lazy to send them here, are you interested 
>> in taking a look at them?
>>
Benjamin Beichler Oct. 20, 2023, 10:38 a.m. UTC | #5
Am 20.10.2023 um 11:59 schrieb Benjamin Berg:
>>
>> I also noticed this problem, but after a discussion with Johannes Berg
>> about this, we come to the conclusion, that all drivers with interrupts
>> need to deal with time travel mode via their own time travel handler.
>> What I actually did, was to write a trivial handler for e.g., the serial
>> line, which simply only call time_travel_add_irq_event(ev).
> 
> Yes, we could instead just assert that we do not have any IRQ without a
> timetravel handler. We really shouldn't have, but we do have one for
> stdin unfortunately (which is at least used by the hostap hwsim tests,
> so we might want to not remove it until we have a different solution).
> 
> What we could do is just add an stdin timetravel handler as you have
> done, but print a warning when the stdin IRQ is registered[1]. And
> then, if someone tries to register an IRQ without a time-travel handler
> we just panic(), it should never happen.
Can you explain, why a time travel handler for stdin may be bad? It 
sounds like you want to avoid it, but I see no immediate problem.

> 
> Benjamin
> 
> [1] We'll also need to set stdin to be a "null" channel and then
> somehow avoid registering the IRQ in that case.
>
Johannes Berg Oct. 20, 2023, 11:39 a.m. UTC | #6
On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
> 
> Can you explain, why a time travel handler for stdin may be bad? It 
> sounds like you want to avoid it, but I see no immediate problem.
> 

I need to read the thread, but this one's easy ;-)

The thing is that on such a channel you don't send an ACK when you've
seen that there's something to handle. As a result, the sender will
continue running while you're trying to request a new schedule entry
from the controller. As a result, it may run past your new schedule
entry because it didn't know about it yet (this would likely bring down
the controller and crash the simulation), or the relative order of the
two entries is undefined, in the sense that it depends on the process
scheduling of the host.

johannes
Benjamin Beichler Oct. 20, 2023, 12:06 p.m. UTC | #7
Am 20.10.2023 um 13:39 schrieb Johannes Berg:
> On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
>>
>> Can you explain, why a time travel handler for stdin may be bad? It
>> sounds like you want to avoid it, but I see no immediate problem.
>>
> 
> I need to read the thread, but this one's easy ;-)
> 
> The thing is that on such a channel you don't send an ACK when you've
> seen that there's something to handle. As a result, the sender will
> continue running while you're trying to request a new schedule entry
> from the controller. As a result, it may run past your new schedule
> entry because it didn't know about it yet (this would likely bring down
> the controller and crash the simulation), or the relative order of the
> two entries is undefined, in the sense that it depends on the process
> scheduling of the host.
Sorry, but I did not get this. What may run past the schedule entry? Is 
your assumption, that the "thing" connected to stdin is always totally 
unaware of the time travel mode?

When our (to be published) simulation send something on serial lines (I 
think it does not matter whether it is a socket or a pipe), we expect 
that the uml instance needs to be run as long as it changes back to 
idle/wait state before the simulation time is advanced. Since the basis 
model of the time travel mode is, that you have infinite amount of 
processing power, the interrupt needs to be always handled at the 
current time.

Maybe my think-model only holds for "smaller" amounts of data (maybe one 
page or something?) or not the free-running mode, but I'm not completely 
convinced. :-D

Maybe we need to define, a bit more formally, how the (designed) 
processing model of interrupts in time travel mode is.

> 
> johannes
> 
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
Johannes Berg Oct. 20, 2023, 12:20 p.m. UTC | #8
On Fri, 2023-10-20 at 14:06 +0200, Benjamin Beichler wrote:
> Am 20.10.2023 um 13:39 schrieb Johannes Berg:
> > On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
> > > 
> > > Can you explain, why a time travel handler for stdin may be bad? It
> > > sounds like you want to avoid it, but I see no immediate problem.
> > > 
> > 
> > I need to read the thread, but this one's easy ;-)
> > 
> > The thing is that on such a channel you don't send an ACK when you've
> > seen that there's something to handle. As a result, the sender will
> > continue running while you're trying to request a new schedule entry
> > from the controller. As a result, it may run past your new schedule
> > entry because it didn't know about it yet (this would likely bring down
> > the controller and crash the simulation), or the relative order of the
> > two entries is undefined, in the sense that it depends on the process
> > scheduling of the host.

> Sorry, but I did not get this. What may run past the schedule entry? Is 
> your assumption, that the "thing" connected to stdin is always totally 
> unaware of the time travel mode?

No, I'm not assuming that, if that's done all bets are off anyway. I'm
assuming that both are connected to the controller.

> When our (to be published) simulation send something on serial lines (I 
> think it does not matter whether it is a socket or a pipe), we expect 
> that the uml instance needs to be run as long as it changes back to 
> idle/wait state before the simulation time is advanced. Since the basis 
> model of the time travel mode is, that you have infinite amount of 
> processing power, the interrupt needs to be always handled at the 
> current time.

Yes but you need to schedule for the interrupt, and you don't
necessarily know what 'current time' is at interrupt time.

So let's say you have "something" that's scheduled to run at times
 - 1000
 - 2000
 - 3000

and free-until is 10000 or something high.

Now it sends a message to Linux stdio at 1000.

But it doesn't have a way to wait for ack. So it continues, checks the
schedule and free-until, and can advance time to 2000 since it doesn't
yet know Linux requested time to run at 1000.

This gives you something not predictable - it depends on the host
scheduling, whichever ran first (Linux started next or the other
continued, or both and it gets messed up).

> Maybe my think-model only holds for "smaller" amounts of data (maybe one 
> page or something?) or not the free-running mode, but I'm not completely 
> convinced. :-D

Nah it doesn't really matter how much data there is.

> Maybe we need to define, a bit more formally, how the (designed) 
> processing model of interrupts in time travel mode is.

That's probably a separate Master's thesis ;-)

johannes
Johannes Berg Oct. 20, 2023, 12:23 p.m. UTC | #9
On Fri, 2023-10-20 at 14:20 +0200, Johannes Berg wrote:
> 
> Yes but you need to schedule for the interrupt, and you don't
> necessarily know what 'current time' is at interrupt time.
> 
> So let's say you have "something" that's scheduled to run at times
>  - 1000
>  - 2000
>  - 3000
> 
> and free-until is 10000 or something high.
> 

It can also happen without free-until, then it just depends which one of
the two - they're running in parallel now (linux doing time-travel
interrupt handling and adding the event, the other thing continuing to
schedule and doing the next entry)  - asks the controller first.

johannes
Benjamin Beichler Oct. 20, 2023, 12:43 p.m. UTC | #10
>> Yes but you need to schedule for the interrupt, and you don't
>> necessarily know what 'current time' is at interrupt time.
>>
>> So let's say you have "something" that's scheduled to run at times
>>   - 1000
>>   - 2000
>>   - 3000
>>
>> and free-until is 10000 or something high.
>>
> It can also happen without free-until, then it just depends which one of
> the two - they're running in parallel now (linux doing time-travel
> interrupt handling and adding the event, the other thing continuing to
> schedule and doing the next entry)  - asks the controller first.

Since they happen at the same time, most discrete event simulations make 
the assumption, that the actual order of execution should not be part of 
the semantics.

Although, most of the implementations demand a deterministic order (for 
repetition) but even that requirement is sometimes lifted. So this is no 
contradiction to common discrete event simulations.
Johannes Berg Oct. 20, 2023, 12:58 p.m. UTC | #11
On Fri, 2023-10-20 at 14:43 +0200, Benjamin Beichler wrote:
> > > Yes but you need to schedule for the interrupt, and you don't
> > > necessarily know what 'current time' is at interrupt time.
> > > 
> > > So let's say you have "something" that's scheduled to run at times
> > >   - 1000
> > >   - 2000
> > >   - 3000
> > > 
> > > and free-until is 10000 or something high.
> > > 
> > It can also happen without free-until, then it just depends which one of
> > the two - they're running in parallel now (linux doing time-travel
> > interrupt handling and adding the event, the other thing continuing to
> > schedule and doing the next entry)  - asks the controller first.
> 
> Since they happen at the same time, most discrete event simulations make 
> the assumption, that the actual order of execution should not be part of 
> the semantics.
> 

Well, fair, but they can bounce more messages around them, and that
should change things. I don't think we've built a simulation system here
that can strictly assume this is true.

In any case, it can get worse - if he sender process actually updated
the controller to 2000 by the time the linux actually calls
time_travel_add_irq_event(), the linux event will run at 2000 instead of
1000.

All this is why we have the vhost-user extensions for ACK messages etc.

johannes
Benjamin Beichler Oct. 20, 2023, 12:58 p.m. UTC | #12
> On Fri, 2023-10-20 at 14:06 +0200, Benjamin Beichler wrote:
>> Am 20.10.2023 um 13:39 schrieb Johannes Berg:
>>> On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
>>>> Can you explain, why a time travel handler for stdin may be bad? It
>>>> sounds like you want to avoid it, but I see no immediate problem.
>>>>
>>> I need to read the thread, but this one's easy ;-)
>>>
>>> The thing is that on such a channel you don't send an ACK when you've
>>> seen that there's something to handle. As a result, the sender will
>>> continue running while you're trying to request a new schedule entry
>>> from the controller. As a result, it may run past your new schedule
>>> entry because it didn't know about it yet (this would likely bring down
>>> the controller and crash the simulation), or the relative order of the
>>> two entries is undefined, in the sense that it depends on the process
>>> scheduling of the host.
>> Sorry, but I did not get this. What may run past the schedule entry? Is
>> your assumption, that the "thing" connected to stdin is always totally
>> unaware of the time travel mode?
> No, I'm not assuming that, if that's done all bets are off anyway. I'm
> assuming that both are connected to the controller.
>
>> When our (to be published) simulation send something on serial lines (I
>> think it does not matter whether it is a socket or a pipe), we expect
>> that the uml instance needs to be run as long as it changes back to
>> idle/wait state before the simulation time is advanced. Since the basis
>> model of the time travel mode is, that you have infinite amount of
>> processing power, the interrupt needs to be always handled at the
>> current time.
> Yes but you need to schedule for the interrupt, and you don't
> necessarily know what 'current time' is at interrupt time.
>
> So let's say you have "something" that's scheduled to run at times
>   - 1000
>   - 2000
>   - 3000
>
> and free-until is 10000 or something high.
>
> Now it sends a message to Linux stdio at 1000.
>
> But it doesn't have a way to wait for ack. So it continues, checks the
> schedule and free-until, and can advance time to 2000 since it doesn't
> yet know Linux requested time to run at 1000.

Do I get it right, that you anticipate that all parts of the simulation 
may run at different paces and the controller is only needed for 
synchronization?

My mind model is, that simulation time advances in the controller are 
only done, If all parts of the simulation reached the current simulation 
time and are at the event processed/idle/wait state. Therefore, the 
thing connected to stdio will not advance to some other time, if some 
other component has outstanding events (i.e., an interrupt at this 
case). Of course free-running is a problematic in this mind model, but 
in this mode you willingly sacrifice the precision of events for 
performance.

>
> This gives you something not predictable - it depends on the host
> scheduling, whichever ran first (Linux started next or the other
> continued, or both and it gets messed up).
>
>> Maybe my think-model only holds for "smaller" amounts of data (maybe one
>> page or something?) or not the free-running mode, but I'm not completely
>> convinced. :-D
> Nah it doesn't really matter how much data there is.
>
>> Maybe we need to define, a bit more formally, how the (designed)
>> processing model of interrupts in time travel mode is.
> That's probably a separate Master's thesis ;-)

It could, but some basic definition for a common understanding might help.

However, do you like to have a look at my current state of time travel 
changes in github or do you think an RFC to the mailing list is better? 
I like github for a quick dive into patches, but your whole workflow is 
maybe different ;-)
Johannes Berg Oct. 20, 2023, 1:39 p.m. UTC | #13
On Fri, 2023-10-20 at 14:58 +0200, Benjamin Beichler wrote:
> > On Fri, 2023-10-20 at 14:06 +0200, Benjamin Beichler wrote:
> > > Am 20.10.2023 um 13:39 schrieb Johannes Berg:
> > > > On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
> > > > > Can you explain, why a time travel handler for stdin may be bad? It
> > > > > sounds like you want to avoid it, but I see no immediate problem.
> > > > > 
> > > > I need to read the thread, but this one's easy ;-)
> > > > 
> > > > The thing is that on such a channel you don't send an ACK when you've
> > > > seen that there's something to handle. As a result, the sender will
> > > > continue running while you're trying to request a new schedule entry
> > > > from the controller. As a result, it may run past your new schedule
> > > > entry because it didn't know about it yet (this would likely bring down
> > > > the controller and crash the simulation), or the relative order of the
> > > > two entries is undefined, in the sense that it depends on the process
> > > > scheduling of the host.
> > > Sorry, but I did not get this. What may run past the schedule entry? Is
> > > your assumption, that the "thing" connected to stdin is always totally
> > > unaware of the time travel mode?
> > No, I'm not assuming that, if that's done all bets are off anyway. I'm
> > assuming that both are connected to the controller.
> > 
> > > When our (to be published) simulation send something on serial lines (I
> > > think it does not matter whether it is a socket or a pipe), we expect
> > > that the uml instance needs to be run as long as it changes back to
> > > idle/wait state before the simulation time is advanced. Since the basis
> > > model of the time travel mode is, that you have infinite amount of
> > > processing power, the interrupt needs to be always handled at the
> > > current time.
> > Yes but you need to schedule for the interrupt, and you don't
> > necessarily know what 'current time' is at interrupt time.
> > 
> > So let's say you have "something" that's scheduled to run at times
> >   - 1000
> >   - 2000
> >   - 3000
> > 
> > and free-until is 10000 or something high.
> > 
> > Now it sends a message to Linux stdio at 1000.
> > 
> > But it doesn't have a way to wait for ack. So it continues, checks the
> > schedule and free-until, and can advance time to 2000 since it doesn't
> > yet know Linux requested time to run at 1000.
> 
> Do I get it right, that you anticipate that all parts of the simulation 
> may run at different paces and the controller is only needed for 
> synchronization?

Not sure what you mean by that.

What I _wanted_ is that only one can run at a time (from a host
perspective), so e.g. virtio stuff works like

A is running
A -> B virtio:
A puts data on ring virtio ring
A sends message on vhost-user socket to indicate ring update to B
A waits for ACK for that message
B sees the ring update message
B requests time from controller
B sends ACK to A
A continues running
A gets to the next scheduling point but now sees free-until==now
A releases time slice to controller
controller assigns time slice to B
B runs to process the virtio ring data


So you see that only one is running at a time.

Without the ACK, you'd have
 B sees the ring update message
 B requests time from the controller

in parallel with

 A continues running
 A gets to the next scheduling point

but it might either not see free-until==now, or other things.


> My mind model is, that simulation time advances in the controller are 
> only done, If all parts of the simulation reached the current simulation 
> time and are at the event processed/idle/wait state. 

Yeah that's a simpler model, and in what we implemented it's true if you
do have the ACK messages. If don't have free-until, that's also possibly
true, although we might have implementation bugs in a sense with
time_travel_add_irq_event() since that looks at the current time. And
also, if you request something the controller thinks is already in the
past because the other side updated the controller due to free-until,
the simulation just crashes.

It's also possible that B (in the situation above) actually *does* ask
the controller who should run next, and the controller simply doesn't
yet know:

 B sends message to A's stdin (say at 1000)
 B requests to run at 2000 from controller
 B releases time to controller
 controller sets time to 2000 and tells B to run
 A requests scheduling at 1000
 controller *crashes*

This is actually quite likely if the host is running 6.6 with the new
EEVDF scheduler, because it will prefer to let B continue running, and
then it just needs to pick the controller over A next.

This again is because of the lack of synchronization in editing the
schedule (sending messages to the controller) which the ACK message
solves.

Hence our internal use cases for "text protocol" actually use virtio-
console rather than stdio, so you do have ACK on the vhost-user protocol
again.

> Therefore, the 
> thing connected to stdio will not advance to some other time, if some 
> other component has outstanding events (i.e., an interrupt at this 
> case). 

Yes, but by the time it actually asks "are there outstanding events",
they might no be added yet, due to hos scheduling order!

> Of course free-running is a problematic in this mind model, but 
> in this mode you willingly sacrifice the precision of events for 
> performance.

If you're referring to free-until: that isn't free-running, it's just
meant to reduce the communication overhead if you know nobody else is
running. The controller maintains this.

Btw we have further overhead reductions with shared memory, need to push
all that out.

> > > Maybe we need to define, a bit more formally, how the (designed)
> > > processing model of interrupts in time travel mode is.
> > That's probably a separate Master's thesis ;-)
> 
> It could, but some basic definition for a common understanding might help.

:)
Does the above help a bit? But yeah maybe I should try to write it down
more formally.

> However, do you like to have a look at my current state of time travel 
> changes in github or do you think an RFC to the mailing list is better? 
> I like github for a quick dive into patches, but your whole workflow is 
> maybe different ;-)
> 

RFC to the list is probably easier to comment on etc.

johannes
Benjamin Beichler Oct. 20, 2023, 3:47 p.m. UTC | #14
>> My mind model is, that simulation time advances in the controller are 
>> only done, If all parts of the simulation reached the current 
>> simulation time and are at the event processed/idle/wait state. 
> Yeah that's a simpler model, and in what we implemented it's true if 
> you do have the ACK messages. If don't have free-until, that's also 
> possibly true, although we might have implementation bugs in a sense 
> with time_travel_add_irq_event() since that looks at the current time. 
> And also, if you request something the controller thinks is already in 
> the past because the other side updated the controller due to 
> free-until, the simulation just crashes. It's also possible that B (in 
> the situation above) actually *does* ask the controller who should run 
> next, and the controller simply doesn't yet know: B sends message to 
> A's stdin (say at 1000) B requests to run at 2000 from controller B 
> releases time to controller controller sets time to 2000 and tells B 
> to run A requests scheduling at 1000 controller *crashes* 

I think, I get now, why I don't have this Problem: My virtio simulation 
and my controller/simulation-master are tightly coupled and indeed the 
same process. When I create a new event for the UML-instance, it is 
anticipated in the simulation master.

So my sequence is more like:
B sends message to A's stdin (say at 1000)
B tells the controller, that it activated A, and time should not advance 
until A has sent a request for processing an interrupt and has gone back 
to waiting state
B requests to run at 2000 from controller
B releases time to controller
...

I see that without further information from the device simulation to the 
controller, it is quite harder.

Nonetheless, I'm not totally sure, how this interacts with the timing 
semantics of the interrupts here. Either the solution of Benjamin Berg 
or mine (with the trivial tt-handler have the same problem).

I will post my patches the next days, you may have a look on them, maybe 
you also experienced some of my problems :-)
Johannes Berg Oct. 20, 2023, 3:51 p.m. UTC | #15
On Fri, 2023-10-20 at 17:47 +0200, Benjamin Beichler wrote:
> 
> I think, I get now, why I don't have this Problem: My virtio simulation 
> and my controller/simulation-master are tightly coupled and indeed the 
> same process. When I create a new event for the UML-instance, it is 
> anticipated in the simulation master.
> 
> So my sequence is more like:
> B sends message to A's stdin (say at 1000)
> B tells the controller, that it activated A, and time should not advance 
> until A has sent a request for processing an interrupt and has gone back 
> to waiting state
> B requests to run at 2000 from controller
> B releases time to controller
> ...

Ahh, OK. I thought you were probably using our controller from the
usfstl, but of course you don't have to. I think I did push the shared
memory optimisation there though, but I suspect I haven't posted the
Linux client version.

> I see that without further information from the device simulation to the 
> controller, it is quite harder.

Right, I wanted to handle that in the other side (hence the ACK) since
you might not always know _when_ the interrupt is scheduled there, even
if it's currently always "immediately".

> Nonetheless, I'm not totally sure, how this interacts with the timing 
> semantics of the interrupts here. Either the solution of Benjamin Berg 
> or mine (with the trivial tt-handler have the same problem).

Oh, yeah, his changes do have the same problem. He's just fixing that
interrupts got lost completely in some already _otherwise_ broken cases,
to make it work better without hanging, not to make it work correctly.

To make it work correctly you shouldn't use stdio at all, or I guess
have some external helper thing like you do :)

johannes
diff mbox series

Patch

diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 635d44606bfe..ceda4bd2e5ed 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -37,7 +37,7 @@  struct irq_reg {
 	bool pending;
 	bool wakeup;
 #ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
-	bool pending_on_resume;
+	bool pending_event;
 	void (*timetravel_handler)(int, int, void *,
 				   struct time_travel_event *);
 	struct time_travel_event event;
@@ -56,6 +56,9 @@  static DEFINE_SPINLOCK(irq_lock);
 static LIST_HEAD(active_fds);
 static DECLARE_BITMAP(irqs_allocated, UM_LAST_SIGNAL_IRQ);
 static bool irqs_suspended;
+#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+static bool irqs_pending;
+#endif
 
 static void irq_io_loop(struct irq_reg *irq, struct uml_pt_regs *regs)
 {
@@ -84,9 +87,12 @@  static void irq_event_handler(struct time_travel_event *ev)
 {
 	struct irq_reg *reg = container_of(ev, struct irq_reg, event);
 
-	/* do nothing if suspended - just to cause a wakeup */
-	if (irqs_suspended)
+	/* do nothing if suspended; just cause a wakeup and mark as pending */
+	if (irqs_suspended) {
+		irqs_pending = true;
+		reg->pending_event = true;
 		return;
+	}
 
 	generic_handle_irq(reg->irq);
 }
@@ -110,16 +116,47 @@  static bool irq_do_timetravel_handler(struct irq_entry *entry,
 	if (!reg->event.pending)
 		return false;
 
-	if (irqs_suspended)
-		reg->pending_on_resume = true;
 	return true;
 }
+
+static void irq_do_pending_events(bool timetravel_handlers_only)
+{
+	struct irq_entry *entry;
+
+	if (!irqs_pending || timetravel_handlers_only)
+		return;
+
+	irqs_pending = false;
+
+	list_for_each_entry(entry, &active_fds, list) {
+		enum um_irq_type t;
+
+		for (t = 0; t < NUM_IRQ_TYPES; t++) {
+			struct irq_reg *reg = &entry->reg[t];
+
+			/*
+			 * Any timetravel_handler was invoked already, just
+			 * directly run the IRQ.
+			 */
+			if (reg->pending_event) {
+				irq_enter();
+				generic_handle_irq(reg->irq);
+				irq_exit();
+				reg->pending_event = false;
+			}
+		}
+	}
+}
 #else
 static bool irq_do_timetravel_handler(struct irq_entry *entry,
 				      enum um_irq_type t)
 {
 	return false;
 }
+
+static void irq_do_pending_events(bool timetravel_handlers_only)
+{
+}
 #endif
 
 static void sigio_reg_handler(int idx, struct irq_entry *entry, enum um_irq_type t,
@@ -145,6 +182,8 @@  static void sigio_reg_handler(int idx, struct irq_entry *entry, enum um_irq_type
 	 */
 	if (timetravel_handlers_only) {
 #ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+		reg->pending_event = true;
+		irqs_pending = true;
 		mark_sigio_pending();
 #endif
 		return;
@@ -162,6 +201,10 @@  static void _sigio_handler(struct uml_pt_regs *regs,
 	if (timetravel_handlers_only && !um_irq_timetravel_handler_used())
 		return;
 
+	/* Flush out pending events that were ignored due to time-travel. */
+	if (!irqs_suspended)
+		irq_do_pending_events(timetravel_handlers_only);
+
 	while (1) {
 		/* This is now lockless - epoll keeps back-referencesto the irqs
 		 * which have trigger it so there is no need to walk the irq
@@ -543,30 +586,7 @@  void um_irqs_resume(void)
 	unsigned long flags;
 
 
-	local_irq_save(flags);
-#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
-	/*
-	 * We don't need to lock anything here since we're in resume
-	 * and nothing else is running, but have disabled IRQs so we
-	 * don't try anything else with the interrupt list from there.
-	 */
-	list_for_each_entry(entry, &active_fds, list) {
-		enum um_irq_type t;
-
-		for (t = 0; t < NUM_IRQ_TYPES; t++) {
-			struct irq_reg *reg = &entry->reg[t];
-
-			if (reg->pending_on_resume) {
-				irq_enter();
-				generic_handle_irq(reg->irq);
-				irq_exit();
-				reg->pending_on_resume = false;
-			}
-		}
-	}
-#endif
-
-	spin_lock(&irq_lock);
+	spin_lock_irqsave(&irq_lock, flags);
 	list_for_each_entry(entry, &active_fds, list) {
 		if (entry->suspended) {
 			int err = os_set_fd_async(entry->fd);