Patchwork [11/22] i8259: Update IRQ state after reset

login
register
mail settings
Submitter Jan Kiszka
Date Sept. 28, 2011, 11 a.m.
Message ID <bcaf89ca8b39b60707d5d3b3f4fa4d2dbd56ffb6.1317207666.git.jan.kiszka@siemens.com>
Download mbox | patch
Permalink /patch/116781/
State New
Headers show

Comments

Jan Kiszka - Sept. 28, 2011, 11 a.m.
As we clearly modify the PIC state on pic_reset, we also have to update
the IRQ output. This only happened on init so far. Apply this
consistently.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/i8259.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)
Blue Swirl - Sept. 28, 2011, 6:01 p.m.
On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> As we clearly modify the PIC state on pic_reset, we also have to update
> the IRQ output. This only happened on init so far. Apply this
> consistently.

Nack, IRQ lines shouldn't be touched on reset. The other side may not
be ready for receiving the interrupt change and qemu_irqs are
stateless anyway.

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  hw/i8259.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/hw/i8259.c b/hw/i8259.c
> index b7a011f..3498c6b 100644
> --- a/hw/i8259.c
> +++ b/hw/i8259.c
> @@ -283,6 +283,7 @@ static void pic_reset(void *opaque)
>     s->init4 = 0;
>     s->single_mode = 0;
>     /* Note: ELCR is not reset */
> +    pic_update_irq(s->pics_state);
>  }
>
>  static void pic_ioport_write(void *opaque, target_phys_addr_t addr64,
> @@ -298,8 +299,6 @@ static void pic_ioport_write(void *opaque, target_phys_addr_t addr64,
>         if (val & 0x10) {
>             /* init */
>             pic_reset(s);
> -            /* deassert a pending interrupt */
> -            qemu_irq_lower(s->pics_state->pics[0].int_out);
>             s->init_state = 1;
>             s->init4 = val & 1;
>             s->single_mode = val & 2;
> --
> 1.7.3.4
>
>
Peter Maydell - Sept. 28, 2011, 6:09 p.m.
On 28 September 2011 19:01, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> As we clearly modify the PIC state on pic_reset, we also have to update
>> the IRQ output. This only happened on init so far. Apply this
>> consistently.
>
> Nack, IRQ lines shouldn't be touched on reset. The other side may not
> be ready for receiving the interrupt change and qemu_irqs are
> stateless anyway.

So how does this work for a device whose reset state is "this
irq/gpio line is asserted" ?

-- PMM
Blue Swirl - Sept. 28, 2011, 6:42 p.m.
On Wed, Sep 28, 2011 at 6:09 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 28 September 2011 19:01, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> As we clearly modify the PIC state on pic_reset, we also have to update
>>> the IRQ output. This only happened on init so far. Apply this
>>> consistently.
>>
>> Nack, IRQ lines shouldn't be touched on reset. The other side may not
>> be ready for receiving the interrupt change and qemu_irqs are
>> stateless anyway.
>
> So how does this work for a device whose reset state is "this
> irq/gpio line is asserted" ?

The polarity could be reversed but that wouldn't work if the signal is
used in several places.

But the duration of reset in QEMU is infinitely small, nothing is
expected to happen during that. This doesn't match real HW though, the
reset could take several hundred milliseconds but I doubt modeling
that would be interesting.

We also assume that the reset states of devices and their outputs are
coherent i.e. the output of one device during reset would not change
the state of another device from its reset state.
Jan Kiszka - Sept. 28, 2011, 9:18 p.m.
On 2011-09-28 20:01, Blue Swirl wrote:
> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka<jan.kiszka@siemens.com>  wrote:
>> As we clearly modify the PIC state on pic_reset, we also have to update
>> the IRQ output. This only happened on init so far. Apply this
>> consistently.
>
> Nack, IRQ lines shouldn't be touched on reset. The other side may not
> be ready for receiving the interrupt change and qemu_irqs are
> stateless anyway.

Sorry, but failing to clear the line (this is what pic_update_irq will 
effectively do) is a clear bug in the current code. This patch is 100% 
analogue to what, e.g. the PCI layer does on reset. Please re-read.

Jan
Peter Maydell - Sept. 28, 2011, 9:38 p.m.
On 28 September 2011 19:42, Blue Swirl <blauwirbel@gmail.com> wrote:
> We also assume that the reset states of devices and their outputs are
> coherent i.e. the output of one device during reset would not change
> the state of another device from its reset state.

This seems a very dubious assumption to make. Consider the
realview board: the 'card present' line from the pl181
MMC card controller is wired up to (1) a GPIO input on the
'sysctl' system register module [which causes it to appear
as a status bit in a register] and (2) via an inverter to
an input on a pl061 GPIO module. On coming out of reset
something has to be done to make the reset value of the
output from the pl181 be reflected in the internal status
of the sysctl and pl061 devices.

(On real hardware this happens when the pl061/sysctl
devices come out of reset and sample their input pins.
If we wanted to model it that way we'd need internal state
on all gpio lines and a two phase "enter reset"+"leave reset"
or something.)

-- PMM
Blue Swirl - Sept. 29, 2011, 7:35 p.m.
On Wed, Sep 28, 2011 at 9:38 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 28 September 2011 19:42, Blue Swirl <blauwirbel@gmail.com> wrote:
>> We also assume that the reset states of devices and their outputs are
>> coherent i.e. the output of one device during reset would not change
>> the state of another device from its reset state.
>
> This seems a very dubious assumption to make. Consider the
> realview board: the 'card present' line from the pl181
> MMC card controller is wired up to (1) a GPIO input on the
> 'sysctl' system register module [which causes it to appear
> as a status bit in a register] and (2) via an inverter to
> an input on a pl061 GPIO module. On coming out of reset
> something has to be done to make the reset value of the
> output from the pl181 be reflected in the internal status
> of the sysctl and pl061 devices.
>
> (On real hardware this happens when the pl061/sysctl
> devices come out of reset and sample their input pins.
> If we wanted to model it that way we'd need internal state
> on all gpio lines and a two phase "enter reset"+"leave reset"
> or something.)

Maybe the rule should be that during reset, all inputs should be in
high impedance mode. We'd need a two-phase reset for QEMU for that
too, or to disconnect all signals before reset and then reconnect them
after that.
Blue Swirl - Sept. 29, 2011, 7:45 p.m.
On Wed, Sep 28, 2011 at 9:18 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2011-09-28 20:01, Blue Swirl wrote:
>>
>> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka<jan.kiszka@siemens.com>
>>  wrote:
>>>
>>> As we clearly modify the PIC state on pic_reset, we also have to update
>>> the IRQ output. This only happened on init so far. Apply this
>>> consistently.
>>
>> Nack, IRQ lines shouldn't be touched on reset. The other side may not
>> be ready for receiving the interrupt change and qemu_irqs are
>> stateless anyway.
>
> Sorry, but failing to clear the line (this is what pic_update_irq will
> effectively do) is a clear bug in the current code. This patch is 100%
> analogue to what, e.g. the PCI layer does on reset. Please re-read.

Reset will happen also when the devices are created. At that time,
qemu_irq callback triggered by changing of the state may produce
undesired effects on the other side. There have been bugs earlier, see
bc26e55a6615dc594be425d293db40d5cdcdb84b and
42f1ced228c9b616cfa2b69846025271618e4ef5 and discussion in
http://lists.nongnu.org/archive/html/qemu-devel/2009-06/msg01024.html.
Jan Kiszka - Sept. 30, 2011, 6:47 a.m.
On 2011-09-29 21:45, Blue Swirl wrote:
> On Wed, Sep 28, 2011 at 9:18 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
>> On 2011-09-28 20:01, Blue Swirl wrote:
>>>
>>> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka<jan.kiszka@siemens.com>
>>>  wrote:
>>>>
>>>> As we clearly modify the PIC state on pic_reset, we also have to update
>>>> the IRQ output. This only happened on init so far. Apply this
>>>> consistently.
>>>
>>> Nack, IRQ lines shouldn't be touched on reset. The other side may not
>>> be ready for receiving the interrupt change and qemu_irqs are
>>> stateless anyway.
>>
>> Sorry, but failing to clear the line (this is what pic_update_irq will
>> effectively do) is a clear bug in the current code. This patch is 100%
>> analogue to what, e.g. the PCI layer does on reset. Please re-read.
> 
> Reset will happen also when the devices are created. At that time,
> qemu_irq callback triggered by changing of the state may produce
> undesired effects on the other side.

All those potential effects will be cleared again when the receiver is
reset as well. If not, that would be a bug which requires fixing.

> There have been bugs earlier, see
> bc26e55a6615dc594be425d293db40d5cdcdb84b and
> 42f1ced228c9b616cfa2b69846025271618e4ef5 and discussion in
> http://lists.nongnu.org/archive/html/qemu-devel/2009-06/msg01024.html.

Deasserting IRQs at PCI device level is indeed useless as we already do
this in the PCI core.

There is no difference between system reset after power-up or later on.
And there should be no difference between per device, per group of
devices or system-wide reset. A device model cannot tell these apart.
That the devices reset handler is only called on system reset is an odd
and fragile assumption - not that fragile for platform devices like the
i8259, but definitely bogus for pluggable ones.

My series does not depend on this cleanup/fix, but the reason not to
apply remains wrong IMHO.

Jan
Peter Maydell - Sept. 30, 2011, 9:14 a.m.
On 30 September 2011 07:47, Jan Kiszka <jan.kiszka@web.de> wrote:
> There is no difference between system reset after power-up or later on.
> And there should be no difference between per device, per group of
> devices or system-wide reset. A device model cannot tell these apart.

I think it's also useful to distinguish "system reset" meaning 'simulated
reset button was pressed, causing the reset line to be asserted on various
devices' from "simulation reset" meaning 'qemu effectively tore down the
whole simulation and put it into the same state it would be when qemu
initially starts'. I don't think you can usefully reset a single device
except by asserting its simulated reset signal, except possibly for
hotplug type situations where you're simulating applying power to it.

> My series does not depend on this cleanup/fix, but the reason not to
> apply remains wrong IMHO.

(just to be clear) I'm not arguing against your patch, just more
generally that I'm not sure our current model of reset is entirely
coherent.

-- PMM
Blue Swirl - Sept. 30, 2011, 8:47 p.m.
On Fri, Sep 30, 2011 at 6:47 AM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2011-09-29 21:45, Blue Swirl wrote:
>> On Wed, Sep 28, 2011 at 9:18 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
>>> On 2011-09-28 20:01, Blue Swirl wrote:
>>>>
>>>> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka<jan.kiszka@siemens.com>
>>>>  wrote:
>>>>>
>>>>> As we clearly modify the PIC state on pic_reset, we also have to update
>>>>> the IRQ output. This only happened on init so far. Apply this
>>>>> consistently.
>>>>
>>>> Nack, IRQ lines shouldn't be touched on reset. The other side may not
>>>> be ready for receiving the interrupt change and qemu_irqs are
>>>> stateless anyway.
>>>
>>> Sorry, but failing to clear the line (this is what pic_update_irq will
>>> effectively do) is a clear bug in the current code. This patch is 100%
>>> analogue to what, e.g. the PCI layer does on reset. Please re-read.
>>
>> Reset will happen also when the devices are created. At that time,
>> qemu_irq callback triggered by changing of the state may produce
>> undesired effects on the other side.
>
> All those potential effects will be cleared again when the receiver is
> reset as well. If not, that would be a bug which requires fixing.

No, the order of resetting devices is not known so the IRQ may or may
not have the effect. If the other device still has pre-reset
configuration, it may spread incorrect effects to devices which
already were reset.

If reliable post reset effects are needed, reset should be divided
into two separate phases. So far no interesting cases have been
demonstrated.

>> There have been bugs earlier, see
>> bc26e55a6615dc594be425d293db40d5cdcdb84b and
>> 42f1ced228c9b616cfa2b69846025271618e4ef5 and discussion in
>> http://lists.nongnu.org/archive/html/qemu-devel/2009-06/msg01024.html.
>
> Deasserting IRQs at PCI device level is indeed useless as we already do
> this in the PCI core.
>
> There is no difference between system reset after power-up or later on.
> And there should be no difference between per device, per group of
> devices or system-wide reset. A device model cannot tell these apart.
> That the devices reset handler is only called on system reset is an odd
> and fragile assumption - not that fragile for platform devices like the
> i8259, but definitely bogus for pluggable ones.

That part of the discussion is obsolete (or at least uninteresting
here). For example this message has a relevant example:
http://lists.nongnu.org/archive/html/qemu-devel/2009-06/msg01081.html

It's about VM restore, but the situation is similar during reset.

> My series does not depend on this cleanup/fix, but the reason not to
> apply remains wrong IMHO.

Sorry, but if a patch is incorrect, it is not a cleanup nor a fix.
Blue Swirl - Sept. 30, 2011, 8:52 p.m.
On Fri, Sep 30, 2011 at 9:14 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 30 September 2011 07:47, Jan Kiszka <jan.kiszka@web.de> wrote:
>> There is no difference between system reset after power-up or later on.
>> And there should be no difference between per device, per group of
>> devices or system-wide reset. A device model cannot tell these apart.
>
> I think it's also useful to distinguish "system reset" meaning 'simulated
> reset button was pressed, causing the reset line to be asserted on various
> devices' from "simulation reset" meaning 'qemu effectively tore down the
> whole simulation and put it into the same state it would be when qemu
> initially starts'. I don't think you can usefully reset a single device
> except by asserting its simulated reset signal, except possibly for
> hotplug type situations where you're simulating applying power to it.

In the 2009 thread, i286 triple fault was mentioned. It's also
possible to reset a single device, which can then reset a slave device
and so forth, like Sparc32 DMA controller can reset ESP device and
that should perform a SCSI bus reset.

>> My series does not depend on this cleanup/fix, but the reason not to
>> apply remains wrong IMHO.
>
> (just to be clear) I'm not arguing against your patch, just more
> generally that I'm not sure our current model of reset is entirely
> coherent.
>
> -- PMM
>
Jan Kiszka - Oct. 1, 2011, 6:47 a.m.
On 2011-09-30 22:47, Blue Swirl wrote:
> That part of the discussion is obsolete (or at least uninteresting
> here). For example this message has a relevant example:
> http://lists.nongnu.org/archive/html/qemu-devel/2009-06/msg01081.html
> 
> It's about VM restore, but the situation is similar during reset.

Actually, that is not comparable as we are entering the device's
quiescent state.

> 
>> My series does not depend on this cleanup/fix, but the reason not to
>> apply remains wrong IMHO.
> 
> Sorry, but if a patch is incorrect, it is not a cleanup nor a fix.

Well, both current MIPS and PREP look like they would benefit from it.

Jan
Blue Swirl - Oct. 1, 2011, 7:31 a.m.
On Sat, Oct 1, 2011 at 6:47 AM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2011-09-30 22:47, Blue Swirl wrote:
>> That part of the discussion is obsolete (or at least uninteresting
>> here). For example this message has a relevant example:
>> http://lists.nongnu.org/archive/html/qemu-devel/2009-06/msg01081.html
>>
>> It's about VM restore, but the situation is similar during reset.
>
> Actually, that is not comparable as we are entering the device's
> quiescent state.

It is. Here's an example for the reset case based on the Paul's original one.

Because devices are reset in unpredictable order that they should not
be communicating with other devices (e.g. by modifying IRQ lines).

Consider a system with a device (DEV) and a level triggered interrupt
controller (PIC1) with the ability to toggle the level where
triggering happens, chained to a rising edge triggered interrupt
controller (PIC2).

(DEV) ->  (PIC1) -> (PIC2)

Before reset, DEV output is high, PIC1 has the interrupt unmasked (but
high) and the trigger level is configured as active low, PIC2 has no
pending interrupts.

We now reset, so the state should be that DEV output is low, PIC1 has
masked all interrupts and its input set to active high, and PIC2 has
no pending interrupts. Devices are reset in the order PIC2, DEV, PIC1.

If devices toggle their interrupts on reset then we get incorrect
state after the reset:

PIC2 is reset to the desired no-interrupts-pending state.

DEV is reset. This lowers the IRQ, which is passed to PIC1. PIC1 still
has the old interrupt mask and level set to active low, so it passes
the IRQ through to PIC2, which detects the edge event and marks the
interrupt as pending.

PIC1 is reset, updates the new mask, sets the input level to active
high and lowers its output. However this event does not clear the
internal PIC2 pending interrupt flag, so machine state will be wrong
after reset.

Therefore it is incorrect to perform any qemu_irq activities during
reset (also VM restore like the original example), don't you agree?

If we continued to reset all the devices (call the reset handlers
multiple times), eventually machine state should stabilize (equivalent
of real HW with nice long reset pulses), but on QEMU the reset event
is infinitely short so we have to be more careful.

Actually I don't think that even a two-phase reset with qemu_irq or
Pin activity on the second phase would produce correct results in
every obscure case. Though this may be detectable since the start
state would be known.

>>> My series does not depend on this cleanup/fix, but the reason not to
>>> apply remains wrong IMHO.
>>
>> Sorry, but if a patch is incorrect, it is not a cleanup nor a fix.
>
> Well, both current MIPS and PREP look like they would benefit from it.

How?
Peter Maydell - Oct. 1, 2011, 11:20 a.m.
On 30 September 2011 21:47, Blue Swirl <blauwirbel@gmail.com> wrote:
> If reliable post reset effects are needed, reset should be divided
> into two separate phases. So far no interesting cases have been
> demonstrated.

I've already cited one in this thread [realview mmc card present]
and that's just the one I happen to know about because I made some
patches in that area. I bet there are more in the codebase that
we're not handling properly; they're not easy to find because they
depend on an interaction between how two devices happen to be wired
up rather than being a bug in a particular device.

Another random example: lan9118_reset() calls phy_reset() calls
phy_update_link() calls phy_update_irq() calls lan9118_update()
calls qemu_set_irq(). That's a bug at the moment (ie I agree
that with qdev as it is currently trying to assert outgoing
gpio lines in the reset callback is broken), but we do somehow
need to drive the irq line high on reset (both of the
"write-to-register" triggered kind and the plain old power-on
kind) because the signal defaults to active-low on reset.
(we can't just say our model of the irq line tracks the logical
state of the line, because there's a device config register
that lets you flip it from active-low to active-high, so
our model of the irq line has to have the same sense that the
interrupt controller expects.)

-- PMM
Jan Kiszka - Oct. 2, 2011, 4:27 p.m.
On 2011-10-01 09:31, Blue Swirl wrote:
> On Sat, Oct 1, 2011 at 6:47 AM, Jan Kiszka <jan.kiszka@web.de> wrote:
>> On 2011-09-30 22:47, Blue Swirl wrote:
>>> That part of the discussion is obsolete (or at least uninteresting
>>> here). For example this message has a relevant example:
>>> http://lists.nongnu.org/archive/html/qemu-devel/2009-06/msg01081.html
>>>
>>> It's about VM restore, but the situation is similar during reset.
>>
>> Actually, that is not comparable as we are entering the device's
>> quiescent state.
> 
> It is. Here's an example for the reset case based on the Paul's original one.
> 
> Because devices are reset in unpredictable order that they should not
> be communicating with other devices (e.g. by modifying IRQ lines).
> 
> Consider a system with a device (DEV) and a level triggered interrupt
> controller (PIC1) with the ability to toggle the level where
> triggering happens, chained to a rising edge triggered interrupt
> controller (PIC2).
> 
> (DEV) ->  (PIC1) -> (PIC2)
> 
> Before reset, DEV output is high, PIC1 has the interrupt unmasked (but
> high) and the trigger level is configured as active low, PIC2 has no
> pending interrupts.
> 
> We now reset, so the state should be that DEV output is low, PIC1 has
> masked all interrupts and its input set to active high, and PIC2 has
> no pending interrupts. Devices are reset in the order PIC2, DEV, PIC1.
> 
> If devices toggle their interrupts on reset then we get incorrect
> state after the reset:
> 
> PIC2 is reset to the desired no-interrupts-pending state.
> 
> DEV is reset. This lowers the IRQ, which is passed to PIC1. PIC1 still
> has the old interrupt mask and level set to active low, so it passes
> the IRQ through to PIC2, which detects the edge event and marks the
> interrupt as pending.
> 
> PIC1 is reset, updates the new mask, sets the input level to active
> high and lowers its output. However this event does not clear the
> internal PIC2 pending interrupt flag, so machine state will be wrong
> after reset.
> 
> Therefore it is incorrect to perform any qemu_irq activities during
> reset (also VM restore like the original example), don't you agree?

A rather odd but valid counterexample. Have you seen such a setup already?

But I'll provide a real example where the model "no IRQ change
propagated on reset, devices handle this internally" fails as well:

PIC -> CPU

We have a level-triggered active-high line in this case. When the CPU is
reset, it "somehow" knows that it is attached to the PIC and assumes
that this device is reset as well. Therefore, the CPU clears its cached
input state on reset. That works if both devices are actually reset. But
it fails if only the CPU is reset while the PIC output is active.

That's likely the reason why MIPS and PPC/PREP do no touch the cached
interrupt line state on reset but expect that the source will inform
them whenever the line goes down - e.g. due to reset.

The conflict we are in with the current reset model is hard-coding the
board wiring and source knowledge into sink device models vs.
propagating reset states. I agree that both have their corner cases.

But in order to continue with properly disentangling board knowledge
from generic device models, we should head for the latter variant where
already possible (like in the i8259 case). On the long term, this should
be resolved using a two-stage model where every root of an interrupt
line signals its state down the chain at the end of a reset phase.

Jan
Avi Kivity - Oct. 2, 2011, 4:39 p.m.
On 09/28/2011 09:01 PM, Blue Swirl wrote:
> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka<jan.kiszka@siemens.com>  wrote:
> >  As we clearly modify the PIC state on pic_reset, we also have to update
> >  the IRQ output. This only happened on init so far. Apply this
> >  consistently.
>
> Nack, IRQ lines shouldn't be touched on reset. The other side may not
> be ready for receiving the interrupt change and qemu_irqs are
> stateless anyway.
>

The way to fix it is two-phase reset:

phase 1: reset internal state (-> move all outputs to reset values), 
don't sample inputs yet
phase 2: allow sampling inputs
Avi Kivity - Oct. 2, 2011, 4:56 p.m.
On 10/01/2011 10:31 AM, Blue Swirl wrote:
> Therefore it is incorrect to perform any qemu_irq activities during
> reset (also VM restore like the original example), don't you agree?

It is not incorrect.  Real hardware updates outputs on RESET assertion, 
and real hardware deals with devices entering reset at different times 
(due to signal propagation delay or slow devices).

> If we continued to reset all the devices (call the reset handlers
> multiple times), eventually machine state should stabilize (equivalent
> of real HW with nice long reset pulses), but on QEMU the reset event
> is infinitely short so we have to be more careful.

calling qemu_irq_pulse(reset) simulates a reset signal of any length 
(since nothing happens between the two edges).

> Actually I don't think that even a two-phase reset with qemu_irq or
> Pin activity on the second phase would produce correct results in
> every obscure case. Though this may be detectable since the start
> state would be known.

The output signals have to stabilize before the second edge of the reset 
signal.
Jan Kiszka - Oct. 2, 2011, 5:46 p.m.
On 2011-10-02 18:39, Avi Kivity wrote:
> On 09/28/2011 09:01 PM, Blue Swirl wrote:
>> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka<jan.kiszka@siemens.com> 
>> wrote:
>> >  As we clearly modify the PIC state on pic_reset, we also have to
>> update
>> >  the IRQ output. This only happened on init so far. Apply this
>> >  consistently.
>>
>> Nack, IRQ lines shouldn't be touched on reset. The other side may not
>> be ready for receiving the interrupt change and qemu_irqs are
>> stateless anyway.
>>
> 
> The way to fix it is two-phase reset:
> 
> phase 1: reset internal state (-> move all outputs to reset values),
> don't sample inputs yet
> phase 2: allow sampling inputs

As far as I understood Anthony's QOM plans, phase 1 will correspond to
"unrealize", phase 2 to "realize".

However, we do not depend on two phases in this particular case (i8259)
and can live with a coalescing both for now.

Jan
Blue Swirl - Oct. 2, 2011, 7:01 p.m.
On Sun, Oct 2, 2011 at 4:27 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2011-10-01 09:31, Blue Swirl wrote:
>> On Sat, Oct 1, 2011 at 6:47 AM, Jan Kiszka <jan.kiszka@web.de> wrote:
>>> On 2011-09-30 22:47, Blue Swirl wrote:
>>>> That part of the discussion is obsolete (or at least uninteresting
>>>> here). For example this message has a relevant example:
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2009-06/msg01081.html
>>>>
>>>> It's about VM restore, but the situation is similar during reset.
>>>
>>> Actually, that is not comparable as we are entering the device's
>>> quiescent state.
>>
>> It is. Here's an example for the reset case based on the Paul's original one.
>>
>> Because devices are reset in unpredictable order that they should not
>> be communicating with other devices (e.g. by modifying IRQ lines).
>>
>> Consider a system with a device (DEV) and a level triggered interrupt
>> controller (PIC1) with the ability to toggle the level where
>> triggering happens, chained to a rising edge triggered interrupt
>> controller (PIC2).
>>
>> (DEV) ->  (PIC1) -> (PIC2)
>>
>> Before reset, DEV output is high, PIC1 has the interrupt unmasked (but
>> high) and the trigger level is configured as active low, PIC2 has no
>> pending interrupts.
>>
>> We now reset, so the state should be that DEV output is low, PIC1 has
>> masked all interrupts and its input set to active high, and PIC2 has
>> no pending interrupts. Devices are reset in the order PIC2, DEV, PIC1.
>>
>> If devices toggle their interrupts on reset then we get incorrect
>> state after the reset:
>>
>> PIC2 is reset to the desired no-interrupts-pending state.
>>
>> DEV is reset. This lowers the IRQ, which is passed to PIC1. PIC1 still
>> has the old interrupt mask and level set to active low, so it passes
>> the IRQ through to PIC2, which detects the edge event and marks the
>> interrupt as pending.
>>
>> PIC1 is reset, updates the new mask, sets the input level to active
>> high and lowers its output. However this event does not clear the
>> internal PIC2 pending interrupt flag, so machine state will be wrong
>> after reset.
>>
>> Therefore it is incorrect to perform any qemu_irq activities during
>> reset (also VM restore like the original example), don't you agree?
>
> A rather odd but valid counterexample. Have you seen such a setup already?
>
> But I'll provide a real example where the model "no IRQ change
> propagated on reset, devices handle this internally" fails as well:
>
> PIC -> CPU
>
> We have a level-triggered active-high line in this case. When the CPU is
> reset, it "somehow" knows that it is attached to the PIC and assumes
> that this device is reset as well. Therefore, the CPU clears its cached
> input state on reset. That works if both devices are actually reset. But
> it fails if only the CPU is reset while the PIC output is active.

OK, we have a positive incorrect case and negative one. This means
that the current model is broken.

> That's likely the reason why MIPS and PPC/PREP do no touch the cached
> interrupt line state on reset but expect that the source will inform
> them whenever the line goes down - e.g. due to reset.
>
> The conflict we are in with the current reset model is hard-coding the
> board wiring and source knowledge into sink device models vs.
> propagating reset states. I agree that both have their corner cases.
>
> But in order to continue with properly disentangling board knowledge
> from generic device models, we should head for the latter variant where
> already possible (like in the i8259 case). On the long term, this should
> be resolved using a two-stage model where every root of an interrupt
> line signals its state down the chain at the end of a reset phase.

I don't think that even a two phase reset can solve the instability in
all possible cases. If the qemu_irq lines form a complex network,
several cycles could be needed until the effects have propagated and
the network has stabilized. In a defective network (loop with NOT in
the middle), the network could oscillate forever and never stabilize
(or until qemu_irq callbacks fill the stack and QEMU crashes), just
like real HW.
Blue Swirl - Oct. 2, 2011, 7:06 p.m.
On Sun, Oct 2, 2011 at 4:39 PM, Avi Kivity <avi@redhat.com> wrote:
> On 09/28/2011 09:01 PM, Blue Swirl wrote:
>>
>> On Wed, Sep 28, 2011 at 11:00 AM, Jan Kiszka<jan.kiszka@siemens.com>
>>  wrote:
>> >  As we clearly modify the PIC state on pic_reset, we also have to update
>> >  the IRQ output. This only happened on init so far. Apply this
>> >  consistently.
>>
>> Nack, IRQ lines shouldn't be touched on reset. The other side may not
>> be ready for receiving the interrupt change and qemu_irqs are
>> stateless anyway.
>>
>
> The way to fix it is two-phase reset:
>
> phase 1: reset internal state (-> move all outputs to reset values), don't
> sample inputs yet

This solves the problem of old state accidentally interfering with reset state.

> phase 2: allow sampling inputs

This could lead to incorrect state for complex networks. It would
still be better than what we have now.
Avi Kivity - Oct. 2, 2011, 7:07 p.m.
> > 
> > The way to fix it is two-phase reset:
> > 
> > phase 1: reset internal state (-> move all outputs to reset
> > values),
> > don't sample inputs yet
> > phase 2: allow sampling inputs
> 
> As far as I understood Anthony's QOM plans, phase 1 will correspond
> to
> "unrealize", phase 2 to "realize".

That smells of abusing mechanism used for construction for reset purposes.

Why not use an ordinary qemu_irq?  It reresents a pin; 0->1 edge (assert) enters phase 1, 1->0 edge (deassert) enters phase 2.  Exactly like real hardware.

> 
> However, we do not depend on two phases in this particular case
> (i8259)
> and can live with a coalescing both for now.
> 

Agree.
Avi Kivity - Oct. 2, 2011, 7:08 p.m.
> > phase 1: reset internal state (-> move all outputs to reset
> > values), don't
> > sample inputs yet
> 
> This solves the problem of old state accidentally interfering with
> reset state.
> 
> > phase 2: allow sampling inputs
> 
> This could lead to incorrect state for complex networks. It would
> still be better than what we have now.
> 

Can you give an example?  Can be theoretical, doesn't have to refer to real hardware.
Blue Swirl - Oct. 2, 2011, 7:13 p.m.
On Sun, Oct 2, 2011 at 4:56 PM, Avi Kivity <avi@redhat.com> wrote:
> On 10/01/2011 10:31 AM, Blue Swirl wrote:
>>
>> Therefore it is incorrect to perform any qemu_irq activities during
>> reset (also VM restore like the original example), don't you agree?
>
> It is not incorrect.  Real hardware updates outputs on RESET assertion, and
> real hardware deals with devices entering reset at different times (due to
> signal propagation delay or slow devices).

Yes, but on real hardware, during the propagation of any effects, the
reset line is held asserted for millions of clock cycles in order to
stabilize the machine.

>> If we continued to reset all the devices (call the reset handlers
>> multiple times), eventually machine state should stabilize (equivalent
>> of real HW with nice long reset pulses), but on QEMU the reset event
>> is infinitely short so we have to be more careful.
>
> calling qemu_irq_pulse(reset) simulates a reset signal of any length (since
> nothing happens between the two edges).

Not really. The first edge could trigger the reset (but don't sample
inputs) phase, this would be equal to forcefully stabilizing any
device internal state. The second would release the device inputs, but
that's also a source of problems.

>> Actually I don't think that even a two-phase reset with qemu_irq or
>> Pin activity on the second phase would produce correct results in
>> every obscure case. Though this may be detectable since the start
>> state would be known.
>
> The output signals have to stabilize before the second edge of the reset
> signal.

They can't since the devices' inputs are ignored at that phase.
Blue Swirl - Oct. 2, 2011, 7:15 p.m.
On Sun, Oct 2, 2011 at 7:07 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> > The way to fix it is two-phase reset:
>> >
>> > phase 1: reset internal state (-> move all outputs to reset
>> > values),
>> > don't sample inputs yet
>> > phase 2: allow sampling inputs
>>
>> As far as I understood Anthony's QOM plans, phase 1 will correspond
>> to
>> "unrealize", phase 2 to "realize".
>
> That smells of abusing mechanism used for construction for reset purposes.
>
> Why not use an ordinary qemu_irq?  It reresents a pin; 0->1 edge (assert) enters phase 1, 1->0 edge (deassert) enters phase 2.  Exactly like real hardware.

Fully agree. I also proposed using qemu_irq for reset (but without
phases) a long time ago.

>>
>> However, we do not depend on two phases in this particular case
>> (i8259)
>> and can live with a coalescing both for now.
>>
>
> Agree.
>
Avi Kivity - Oct. 2, 2011, 7:20 p.m.
----- Original Message -----
> On Sun, Oct 2, 2011 at 4:56 PM, Avi Kivity <avi@redhat.com> wrote:
> > On 10/01/2011 10:31 AM, Blue Swirl wrote:
> >>
> >> Therefore it is incorrect to perform any qemu_irq activities
> >> during
> >> reset (also VM restore like the original example), don't you
> >> agree?
> >
> > It is not incorrect.  Real hardware updates outputs on RESET
> > assertion, and
> > real hardware deals with devices entering reset at different times
> > (due to
> > signal propagation delay or slow devices).
> 
> Yes, but on real hardware, during the propagation of any effects, the
> reset line is held asserted for millions of clock cycles in order to
> stabilize the machine.

But nothing can happen in these cycles, since qemu emulates everything that happens in the on the edge, and any timers will be cancelled.


> 
> >> If we continued to reset all the devices (call the reset handlers
> >> multiple times), eventually machine state should stabilize
> >> (equivalent
> >> of real HW with nice long reset pulses), but on QEMU the reset
> >> event
> >> is infinitely short so we have to be more careful.
> >
> > calling qemu_irq_pulse(reset) simulates a reset signal of any
> > length (since
> > nothing happens between the two edges).
> 
> Not really. The first edge could trigger the reset (but don't sample
> inputs) phase, this would be equal to forcefully stabilizing any
> device internal state. The second would release the device inputs,
> but
> that's also a source of problems.


Can you elaborate on the problems?

> 
> >> Actually I don't think that even a two-phase reset with qemu_irq
> >> or
> >> Pin activity on the second phase would produce correct results in
> >> every obscure case. Though this may be detectable since the start
> >> state would be known.
> >
> > The output signals have to stabilize before the second edge of the
> > reset
> > signal.
> 
> They can't since the devices' inputs are ignored at that phase.
> 

A real device also ignores inputs during reset (or if it doesn't, we can just emulate that).


I would really like to see a concrete example we can discuss.
Blue Swirl - Oct. 2, 2011, 7:26 p.m.
On Sun, Oct 2, 2011 at 7:08 PM, Avi Kivity <avi@redhat.com> wrote:
>> > phase 1: reset internal state (-> move all outputs to reset
>> > values), don't
>> > sample inputs yet
>>
>> This solves the problem of old state accidentally interfering with
>> reset state.
>>
>> > phase 2: allow sampling inputs
>>
>> This could lead to incorrect state for complex networks. It would
>> still be better than what we have now.
>>
>
> Can you give an example?  Can be theoretical, doesn't have to refer to real hardware.

For example, outputs A and B should both be driven high by reset. They
are connected to a XNOR gate, whose output is fed to edge triggered
device. The device should not see any edges outside of the reset
cycle, during reset cycle they are ignored.
Avi Kivity - Oct. 2, 2011, 7:35 p.m.
> >
> > Can you give an example?  Can be theoretical, doesn't have to refer
> > to real hardware.
> 
> For example, outputs A and B should both be driven high by reset.
> They
> are connected to a XNOR gate, whose output is fed to edge triggered
> device. The device should not see any edges outside of the reset
> cycle, during reset cycle they are ignored.
> 

I don't see the issue?  After phase 1 the two outputs will be high, after phase two they will be whatever the device logic computes.

During phase 1 you may see an edge, but that also happens with real hardware.  The target device my see A and B driven high before it sees the reset pulse, and A and B (and the inputs of the XNOR gate) may have different timings.

The device may see an edge immediately before reset, but then it will be reset itself.
Blue Swirl - Oct. 2, 2011, 7:39 p.m.
On Sun, Oct 2, 2011 at 7:20 PM, Avi Kivity <avi@redhat.com> wrote:
>
>
> ----- Original Message -----
>> On Sun, Oct 2, 2011 at 4:56 PM, Avi Kivity <avi@redhat.com> wrote:
>> > On 10/01/2011 10:31 AM, Blue Swirl wrote:
>> >>
>> >> Therefore it is incorrect to perform any qemu_irq activities
>> >> during
>> >> reset (also VM restore like the original example), don't you
>> >> agree?
>> >
>> > It is not incorrect.  Real hardware updates outputs on RESET
>> > assertion, and
>> > real hardware deals with devices entering reset at different times
>> > (due to
>> > signal propagation delay or slow devices).
>>
>> Yes, but on real hardware, during the propagation of any effects, the
>> reset line is held asserted for millions of clock cycles in order to
>> stabilize the machine.
>
> But nothing can happen in these cycles, since qemu emulates everything that happens in the on the edge, and any timers will be cancelled.

There are some wild activities in the beginning of the reset, but
after that nothing will happen.

>>
>> >> If we continued to reset all the devices (call the reset handlers
>> >> multiple times), eventually machine state should stabilize
>> >> (equivalent
>> >> of real HW with nice long reset pulses), but on QEMU the reset
>> >> event
>> >> is infinitely short so we have to be more careful.
>> >
>> > calling qemu_irq_pulse(reset) simulates a reset signal of any
>> > length (since
>> > nothing happens between the two edges).
>>
>> Not really. The first edge could trigger the reset (but don't sample
>> inputs) phase, this would be equal to forcefully stabilizing any
>> device internal state. The second would release the device inputs,
>> but
>> that's also a source of problems.
>
>
> Can you elaborate on the problems?
>
>>
>> >> Actually I don't think that even a two-phase reset with qemu_irq
>> >> or
>> >> Pin activity on the second phase would produce correct results in
>> >> every obscure case. Though this may be detectable since the start
>> >> state would be known.
>> >
>> > The output signals have to stabilize before the second edge of the
>> > reset
>> > signal.
>>
>> They can't since the devices' inputs are ignored at that phase.
>>
>
> A real device also ignores inputs during reset (or if it doesn't, we can just emulate that).

Maybe this could work:
1 - issue start of reset cycle (raise qemu_irq, unrealize): internal
states reset, no I/O.
2 - issue start of I/O, reset held (new phase): evaluate inputs and
post outputs, but consider also that reset is still active, so
transition is not complete for all devices
3 - release reset line (lower qemu_irq, realize): gentlemen, start your engines

During phase 2 some of the network effects could be contained.

> I would really like to see a concrete example we can discuss.

See my previous message.
Blue Swirl - Oct. 2, 2011, 7:40 p.m.
On Sun, Oct 2, 2011 at 7:35 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> > Can you give an example?  Can be theoretical, doesn't have to refer
>> > to real hardware.
>>
>> For example, outputs A and B should both be driven high by reset.
>> They
>> are connected to a XNOR gate, whose output is fed to edge triggered
>> device. The device should not see any edges outside of the reset
>> cycle, during reset cycle they are ignored.
>>
>
> I don't see the issue?  After phase 1 the two outputs will be high, after phase two they will be whatever the device logic computes.
>
> During phase 1 you may see an edge, but that also happens with real hardware.  The target device my see A and B driven high before it sees the reset pulse, and A and B (and the inputs of the XNOR gate) may have different timings.
>
> The device may see an edge immediately before reset, but then it will be reset itself.

The difference is that on real HW the edge will happen when the reset
line is still active, so it will be ignored.
Avi Kivity - Oct. 2, 2011, 7:44 p.m.
> >
> > A real device also ignores inputs during reset (or if it doesn't,
> > we can just emulate that).
> 
> Maybe this could work:
> 1 - issue start of reset cycle (raise qemu_irq, unrealize): internal
> states reset, no I/O.
> 2 - issue start of I/O, reset held (new phase): evaluate inputs and
> post outputs, but consider also that reset is still active, so
> transition is not complete for all devices

This doesn't correspond to real hardware.  Each device detects the reset edge independently.  There is no signal that says "all devices have seen the edge".

We should either have just 1, or merge 1 and 2.
Avi Kivity - Oct. 2, 2011, 7:47 p.m.
> >> For example, outputs A and B should both be driven high by reset.
> >> They
> >> are connected to a XNOR gate, whose output is fed to edge
> >> triggered
> >> device. The device should not see any edges outside of the reset
> >> cycle, during reset cycle they are ignored.
> >>
> >
> > I don't see the issue?  After phase 1 the two outputs will be high,
> > after phase two they will be whatever the device logic computes.
> >
> > During phase 1 you may see an edge, but that also happens with real
> > hardware.  The target device my see A and B driven high before it
> > sees the reset pulse, and A and B (and the inputs of the XNOR
> > gate) may have different timings.
> >
> > The device may see an edge immediately before reset, but then it
> > will be reset itself.
> 
> The difference is that on real HW the edge will happen when the reset
> line is still active, so it will be ignored.
> 

There is no way to guarantee this.  If A is driven high before the target device detects RESET, it will see the edge.

Of course real hardware has timing specs, but these are maximum latencies.  If the XNOR gate is especially fast today it can overtake the target device's reset edge detector.
Jan Kiszka - Oct. 2, 2011, 7:47 p.m.
On 2011-10-02 21:07, Avi Kivity wrote:
>>>
>>> The way to fix it is two-phase reset:
>>>
>>> phase 1: reset internal state (-> move all outputs to reset
>>> values),
>>> don't sample inputs yet
>>> phase 2: allow sampling inputs
>>
>> As far as I understood Anthony's QOM plans, phase 1 will correspond
>> to
>> "unrealize", phase 2 to "realize".
> 
> That smells of abusing mechanism used for construction for reset purposes.
> 
> Why not use an ordinary qemu_irq?  It reresents a pin; 0->1 edge (assert) enters phase 1, 1->0 edge (deassert) enters phase 2.  Exactly like real hardware.

QEMU makes no difference between power-on reset and system reset right
now. At least I am not aware of any device model that has problems with
this. Thus there is apparently no need to treat reset and create
differently. As QOM requires a two-phase device creation anyway (to let
properties to "settle"), it looks like there is some reuse potential.

Jan
Blue Swirl - Oct. 2, 2011, 7:49 p.m.
On Sun, Oct 2, 2011 at 7:44 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> > A real device also ignores inputs during reset (or if it doesn't,
>> > we can just emulate that).
>>
>> Maybe this could work:
>> 1 - issue start of reset cycle (raise qemu_irq, unrealize): internal
>> states reset, no I/O.
>> 2 - issue start of I/O, reset held (new phase): evaluate inputs and
>> post outputs, but consider also that reset is still active, so
>> transition is not complete for all devices
>
> This doesn't correspond to real hardware.   Each device detects the reset edge independently.  There is no signal that says "all devices have seen the edge".

Reset is not edge triggered but level signal.

> We should either have just 1, or merge 1 and 2.
>

Then we'd be back to where the problems started.
Avi Kivity - Oct. 2, 2011, 7:50 p.m.
> > 
> > Why not use an ordinary qemu_irq?  It reresents a pin; 0->1 edge
> > (assert) enters phase 1, 1->0 edge (deassert) enters phase 2.
> >  Exactly like real hardware.
> 
> QEMU makes no difference between power-on reset and system reset
> right
> now. At least I am not aware of any device model that has problems
> with
> this. Thus there is apparently no need to treat reset and create
> differently. As QOM requires a two-phase device creation anyway (to
> let
> properties to "settle"), it looks like there is some reuse potential.
> 

I'm all for sharing this.  But let's model it as a qemu_irq or Pin.

btw, we could have qemu_irqs that take an extra qemu_irq reset input, so we can do this without much coding.
Blue Swirl - Oct. 2, 2011, 7:52 p.m.
On Sun, Oct 2, 2011 at 7:47 PM, Avi Kivity <avi@redhat.com> wrote:
>> >> For example, outputs A and B should both be driven high by reset.
>> >> They
>> >> are connected to a XNOR gate, whose output is fed to edge
>> >> triggered
>> >> device. The device should not see any edges outside of the reset
>> >> cycle, during reset cycle they are ignored.
>> >>
>> >
>> > I don't see the issue?  After phase 1 the two outputs will be high,
>> > after phase two they will be whatever the device logic computes.
>> >
>> > During phase 1 you may see an edge, but that also happens with real
>> > hardware.  The target device my see A and B driven high before it
>> > sees the reset pulse, and A and B (and the inputs of the XNOR
>> > gate) may have different timings.
>> >
>> > The device may see an edge immediately before reset, but then it
>> > will be reset itself.
>>
>> The difference is that on real HW the edge will happen when the reset
>> line is still active, so it will be ignored.
>>
>
> There is no way to guarantee this.  If A is driven high before the target device detects RESET, it will see the edge.

That case is not what we have here, it would be equivalent of pulsing
qemu_irq reset lines for each device in order. This would be even
worse than what we have now.

> Of course real hardware has timing specs, but these are maximum latencies.  If the XNOR gate is especially fast today it can overtake the target device's reset edge detector.

Devices don't have reset edge detectors.
Avi Kivity - Oct. 2, 2011, 7:52 p.m.
----- Original Message -----
> On Sun, Oct 2, 2011 at 7:44 PM, Avi Kivity <avi@redhat.com> wrote:
> >> >
> >> > A real device also ignores inputs during reset (or if it
> >> > doesn't,
> >> > we can just emulate that).
> >>
> >> Maybe this could work:
> >> 1 - issue start of reset cycle (raise qemu_irq, unrealize):
> >> internal
> >> states reset, no I/O.
> >> 2 - issue start of I/O, reset held (new phase): evaluate inputs
> >> and
> >> post outputs, but consider also that reset is still active, so
> >> transition is not complete for all devices
> >
> > This doesn't correspond to real hardware.   Each device detects the
> > reset edge independently.  There is no signal that says "all
> > devices have seen the edge".
> 
> Reset is not edge triggered but level signal.

It still has propagation delay.  If your XNOR gate connects to the NORAD master launch controller, your design may have side effects.

> 
> > We should either have just 1, or merge 1 and 2.
> >
> 
> Then we'd be back to where the problems started.
> 

Sorry, I don't see the problem yet.
Avi Kivity - Oct. 2, 2011, 7:58 p.m.
> >
> > There is no way to guarantee this.  If A is driven high before the
> > target device detects RESET, it will see the edge.
> 
> That case is not what we have here, it would be equivalent of pulsing
> qemu_irq reset lines for each device in order. This would be even
> worse than what we have now.


That's  not what I'm saying.


> 
> > Of course real hardware has timing specs, but these are maximum
> > latencies.  If the XNOR gate is especially fast today it can
> > overtake the target device's reset edge detector.
> 
> Devices don't have reset edge detectors.
> 

Say the target device's output has an AND connecting #RESET and an input, to the output.  When #RESET is asserted, the input is driven low.  The output is connected to a counter.

When #RESET is asserted, the source device's A and B are raised high, with delay Da and Db.  If they are different, the XNOR gate generates a pulse with delay Dx.  If Dx is smaller than the AND gate's delay Drm, then the counter will count.
Blue Swirl - Oct. 2, 2011, 7:59 p.m.
On Sun, Oct 2, 2011 at 7:52 PM, Avi Kivity <avi@redhat.com> wrote:
>
>
> ----- Original Message -----
>> On Sun, Oct 2, 2011 at 7:44 PM, Avi Kivity <avi@redhat.com> wrote:
>> >> >
>> >> > A real device also ignores inputs during reset (or if it
>> >> > doesn't,
>> >> > we can just emulate that).
>> >>
>> >> Maybe this could work:
>> >> 1 - issue start of reset cycle (raise qemu_irq, unrealize):
>> >> internal
>> >> states reset, no I/O.
>> >> 2 - issue start of I/O, reset held (new phase): evaluate inputs
>> >> and
>> >> post outputs, but consider also that reset is still active, so
>> >> transition is not complete for all devices
>> >
>> > This doesn't correspond to real hardware.   Each device detects the
>> > reset edge independently.  There is no signal that says "all
>> > devices have seen the edge".
>>
>> Reset is not edge triggered but level signal.
>
> It still has propagation delay.  If your XNOR gate connects to the NORAD master launch controller, your design may have side effects.

Hopefully the reset signal would control also the edge triggered
launch controller.

The delays are one reason why real reset pulses are so long. The other
reasons are the stabilization effects, internal and external.

>> > We should either have just 1, or merge 1 and 2.
>> >
>>
>> Then we'd be back to where the problems started.
>>
>
> Sorry, I don't see the problem yet.

I'm not sure we have a complete solution either. Phase 1 would solve
most problems with old state which is the most important issue here.
If the machine refuses to wake up from reset, this should be noticed
quite soon, so we could move to dual phase system while still
searching for the best possible system in all possible worlds.
Avi Kivity - Oct. 2, 2011, 8:03 p.m.
> >
> > It still has propagation delay.  If your XNOR gate connects to the
> > NORAD master launch controller, your design may have side effects.
> 
> Hopefully the reset signal would control also the edge triggered
> launch controller.

It doesn't help.  There is propagation delay there as well.  If the input wins the race against reset, the launch sequence is started.

> 
> The delays are one reason why real reset pulses are so long. The
> other
> reasons are the stabilization effects, internal and external.
> 
> >> > We should either have just 1, or merge 1 and 2.
> >> >
> >>
> >> Then we'd be back to where the problems started.
> >>
> >
> > Sorry, I don't see the problem yet.
> 
> I'm not sure we have a complete solution either. Phase 1 would solve
> most problems with old state which is the most important issue here.
> If the machine refuses to wake up from reset, this should be noticed
> quite soon, so we could move to dual phase system while still
> searching for the best possible system in all possible worlds.
> 


Okay.  I still don't think there is a problem.
Blue Swirl - Oct. 2, 2011, 8:05 p.m.
On Sun, Oct 2, 2011 at 7:58 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> > There is no way to guarantee this.  If A is driven high before the
>> > target device detects RESET, it will see the edge.
>>
>> That case is not what we have here, it would be equivalent of pulsing
>> qemu_irq reset lines for each device in order. This would be even
>> worse than what we have now.
>
>
> That's  not what I'm saying.
>
>
>>
>> > Of course real hardware has timing specs, but these are maximum
>> > latencies.  If the XNOR gate is especially fast today it can
>> > overtake the target device's reset edge detector.
>>
>> Devices don't have reset edge detectors.
>>
>
> Say the target device's output has an AND connecting #RESET and an input, to the output.  When #RESET is asserted, the input is driven low.  The output is connected to a counter.
>
> When #RESET is asserted, the source device's A and B are raised high, with delay Da and Db.  If they are different, the XNOR gate generates a pulse with delay Dx.  If Dx is smaller than the AND gate's delay Drm, then the counter will count.

On a real device, the reset and clocks are the few global signals
available, distributed to most places. The counter would be
constructed of JK flip flops and the reset line would be connected to
the clear line of the flip flops. Then the counters don't count while
reset is active.
Blue Swirl - Oct. 2, 2011, 8:11 p.m.
On Sun, Oct 2, 2011 at 8:03 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> > It still has propagation delay.  If your XNOR gate connects to the
>> > NORAD master launch controller, your design may have side effects.
>>
>> Hopefully the reset signal would control also the edge triggered
>> launch controller.
>
> It doesn't help.  There is propagation delay there as well.  If the input wins the race against reset, the launch sequence is started.

To bring the example back to QEMU, a disk write could be issued or
network packet could be sent. But still, this only confirms that
during the initial phase of reset, no output activities can be allowed
to avoid such problems.

>> The delays are one reason why real reset pulses are so long. The
>> other
>> reasons are the stabilization effects, internal and external.
>>
>> >> > We should either have just 1, or merge 1 and 2.
>> >> >
>> >>
>> >> Then we'd be back to where the problems started.
>> >>
>> >
>> > Sorry, I don't see the problem yet.
>>
>> I'm not sure we have a complete solution either. Phase 1 would solve
>> most problems with old state which is the most important issue here.
>> If the machine refuses to wake up from reset, this should be noticed
>> quite soon, so we could move to dual phase system while still
>> searching for the best possible system in all possible worlds.
>>
>
>
> Okay.  I still don't think there is a problem.

There is, Jan's real example and my theoretical cases show that.
Avi Kivity - Oct. 2, 2011, 8:14 p.m.
> >
> > Say the target device's output has an AND connecting #RESET and an
> > input, to the output.  When #RESET is asserted, the input is
> > driven low.  The output is connected to a counter.
> >
> > When #RESET is asserted, the source device's A and B are raised
> > high, with delay Da and Db.  If they are different, the XNOR gate
> > generates a pulse with delay Dx.  If Dx is smaller than the AND
> > gate's delay Drm, then the counter will count.
> 
> On a real device, the reset and clocks are the few global signals
> available, distributed to most places. The counter would be
> constructed of JK flip flops and the reset line would be connected to
> the clear line of the flip flops. Then the counters don't count while
> reset is active.
> 
> 

You could have a momentary count if the RESET input's gate delay to the counter is slower than the other inputs.  It would be cleared immediately afterwards (when phase 1 sweeps into the counter as well).

What I'm saying is that RESET order isn't defined on real hardware either, due to signal propagation effects.
Avi Kivity - Oct. 2, 2011, 8:17 p.m.
> >
> > It doesn't help.  There is propagation delay there as well.  If the
> > input wins the race against reset, the launch sequence is started.
> 
> To bring the example back to QEMU, a disk write could be issued or
> network packet could be sent. But still, this only confirms that
> during the initial phase of reset, no output activities can be
> allowed
> to avoid such problems.

In fact these aren't problems.  The packet may be sent or data written, as long as they aren't corrupted.  A device is allowed to "delay" a reset (but not indefinitely).


> > Okay.  I still don't think there is a problem.
> 
> There is, Jan's real example and my theoretical cases show that.
> 

I believe your example has the same problem on real hardware.  Jan, is your problem not solved by two-phase reset?
Blue Swirl - Oct. 2, 2011, 8:18 p.m.
On Sun, Oct 2, 2011 at 8:14 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> > Say the target device's output has an AND connecting #RESET and an
>> > input, to the output.  When #RESET is asserted, the input is
>> > driven low.  The output is connected to a counter.
>> >
>> > When #RESET is asserted, the source device's A and B are raised
>> > high, with delay Da and Db.  If they are different, the XNOR gate
>> > generates a pulse with delay Dx.  If Dx is smaller than the AND
>> > gate's delay Drm, then the counter will count.
>>
>> On a real device, the reset and clocks are the few global signals
>> available, distributed to most places. The counter would be
>> constructed of JK flip flops and the reset line would be connected to
>> the clear line of the flip flops. Then the counters don't count while
>> reset is active.
>>
>>
>
> You could have a momentary count if the RESET input's gate delay to the counter is slower than the other inputs.  It would be cleared immediately afterwards (when phase 1 sweeps into the counter as well).
>
> What I'm saying is that RESET order isn't defined on real hardware either, due to signal propagation effects.

Yes, but there the millions of reset cycles help immensely to suppress
the effects.
Avi Kivity - Oct. 2, 2011, 8:21 p.m.
> >
> > What I'm saying is that RESET order isn't defined on real hardware
> > either, due to signal propagation effects.
> 
> Yes, but there the millions of reset cycles help immensely to
> suppress
> the effects.
> 

That's modeled correctly.  After the end of phase 1, everything is settled.  During phase 1, you can see some spikes, but you can see them on real hardware as well.
Blue Swirl - Oct. 2, 2011, 8:26 p.m.
On Sun, Oct 2, 2011 at 8:17 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> > It doesn't help.  There is propagation delay there as well.  If the
>> > input wins the race against reset, the launch sequence is started.
>>
>> To bring the example back to QEMU, a disk write could be issued or
>> network packet could be sent. But still, this only confirms that
>> during the initial phase of reset, no output activities can be
>> allowed
>> to avoid such problems.
>
> In fact these aren't problems.  The packet may be sent or data written, as long as they aren't corrupted.  A device is allowed to "delay" a reset (but not indefinitely).

Oh, but corruption could easily happen. Consider for example a disk
controller waiting for DMA ready signal a device separate from the DMA
controller. Due to reset glitches in the device or signal chain, the
DMA ready signal arrives but the DMA controller still contains old
information, writing the data to disk from wrong memory location.

>> > Okay.  I still don't think there is a problem.
>>
>> There is, Jan's real example and my theoretical cases show that.
>>
>
> I believe your example has the same problem on real hardware.  Jan, is your problem not solved by two-phase reset?

On real hardware the problem is masked by the long reset pulse. We
have an infinitely short pulse.

Jan's case should be OK, a CPU doesn't have any outputs.
Blue Swirl - Oct. 2, 2011, 8:30 p.m.
On Sun, Oct 2, 2011 at 8:21 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> > What I'm saying is that RESET order isn't defined on real hardware
>> > either, due to signal propagation effects.
>>
>> Yes, but there the millions of reset cycles help immensely to
>> suppress
>> the effects.
>>
>
> That's modeled correctly.  After the end of phase 1, everything is settled.  During phase 1, you can see some spikes, but you can see them on real hardware as well.

No. With two phase reset (like I understood it), at the beginning of
phase, everything internal is settled (registers reset), no I/O. After
the phase 1, starting the external I/O activities cause spikes but
these are not suppressed.

In my 3-phase version, the reset would be held during phase 2 just to
handle the spikes (registers continue to be reset as needed).
Avi Kivity - Oct. 2, 2011, 8:31 p.m.
> >
> > In fact these aren't problems.  The packet may be sent or data
> > written, as long as they aren't corrupted.  A device is allowed to
> > "delay" a reset (but not indefinitely).
> 
> Oh, but corruption could easily happen. Consider for example a disk
> controller waiting for DMA ready signal a device separate from the
> DMA
> controller. Due to reset glitches in the device or signal chain, the
> DMA ready signal arrives but the DMA controller still contains old
> information, writing the data to disk from wrong memory location.


Would not this corruption also happen on real hardware?  If reset to the disk controller is delayed by a slow gate or extra capacitance on a line?
Blue Swirl - Oct. 2, 2011, 8:36 p.m.
On Sun, Oct 2, 2011 at 8:31 PM, Avi Kivity <avi@redhat.com> wrote:
>
>> >
>> > In fact these aren't problems.  The packet may be sent or data
>> > written, as long as they aren't corrupted.  A device is allowed to
>> > "delay" a reset (but not indefinitely).
>>
>> Oh, but corruption could easily happen. Consider for example a disk
>> controller waiting for DMA ready signal a device separate from the
>> DMA
>> controller. Due to reset glitches in the device or signal chain, the
>> DMA ready signal arrives but the DMA controller still contains old
>> information, writing the data to disk from wrong memory location.
>
>
> Would not this corruption also happen on real hardware?  If reset to the disk controller is delayed by a slow gate or extra capacitance on a line?

Maybe, but the delays are probably too short on real HW before any
packets are sent or disk gets written. On QEMU, I/O can be
instantaneous.
Avi Kivity - Oct. 2, 2011, 8:39 p.m.
----- Original Message -----
> On Sun, Oct 2, 2011 at 8:21 PM, Avi Kivity <avi@redhat.com> wrote:
> >> >
> >> > What I'm saying is that RESET order isn't defined on real
> >> > hardware
> >> > either, due to signal propagation effects.
> >>
> >> Yes, but there the millions of reset cycles help immensely to
> >> suppress
> >> the effects.
> >>
> >
> > That's modeled correctly.  After the end of phase 1, everything is
> > settled.  During phase 1, you can see some spikes, but you can see
> > them on real hardware as well.
> 
> No. With two phase reset (like I understood it), at the beginning of
> phase, everything internal is settled (registers reset), no I/O.
> After
> the phase 1, starting the external I/O activities cause spikes but
> these are not suppressed.

Phase 1 is qemu_irq_raise() on all RESET inputs.  During this period, outputs of devices that have seen the edge move to their RESET values, and they ignore inputs.  Because it doesn't happen simultaneously, some devices see those outputs moving and may act on them.  This corresponds to different devices having different propagation delays.

At the end of phase 1, everything is settled.  This corresponds to T = max(reset latency of all devices).

Phase 2 is qemu_irq_lower() on all RESET inputs.  At this time, inputs begin to be considered.

Between phase 1 and phase 2 are those millions of cycles (minus T above).


So yes, you see spikes, but you also see them on real hardware.
Avi Kivity - Oct. 2, 2011, 8:41 p.m.
> >
> >
> > Would not this corruption also happen on real hardware?  If reset
> > to the disk controller is delayed by a slow gate or extra
> > capacitance on a line?
> 
> Maybe, but the delays are probably too short on real HW before any
> packets are sent or disk gets written. On QEMU, I/O can be
> instantaneous.
> 

Right, this is a real difference.  If any hardware actually depends on this, we can model it by launching a timer instead of issuing the I/O.  When the reset arrives to the disk controller, it will cancel the timer.

This is expensive both to code and in run-time performance, but we can afford the expense since we don't have such a case, yes?
Blue Swirl - Oct. 2, 2011, 8:53 p.m.
On Sun, Oct 2, 2011 at 8:39 PM, Avi Kivity <avi@redhat.com> wrote:
>
>
> ----- Original Message -----
>> On Sun, Oct 2, 2011 at 8:21 PM, Avi Kivity <avi@redhat.com> wrote:
>> >> >
>> >> > What I'm saying is that RESET order isn't defined on real
>> >> > hardware
>> >> > either, due to signal propagation effects.
>> >>
>> >> Yes, but there the millions of reset cycles help immensely to
>> >> suppress
>> >> the effects.
>> >>
>> >
>> > That's modeled correctly.  After the end of phase 1, everything is
>> > settled.  During phase 1, you can see some spikes, but you can see
>> > them on real hardware as well.
>>
>> No. With two phase reset (like I understood it), at the beginning of
>> phase, everything internal is settled (registers reset), no I/O.
>> After
>> the phase 1, starting the external I/O activities cause spikes but
>> these are not suppressed.
>
> Phase 1 is qemu_irq_raise() on all RESET inputs.  During this period, outputs of devices that have seen the edge move to their RESET values, and they ignore inputs.  Because it doesn't happen simultaneously, some devices see those outputs moving and may act on them.  This corresponds to different devices having different propagation delays.
> At the end of phase 1, everything is settled.  This corresponds to T = max(reset latency of all devices).
>
> Phase 2 is qemu_irq_lower() on all RESET inputs.  At this time, inputs begin to be considered.
>
> Between phase 1 and phase 2 are those millions of cycles (minus T above).
>
>
> So yes, you see spikes, but you also see them on real hardware.

With this definition of phases, at least the DEV/PIC1/PIC2 example I
presented earlier isn't possible. PIC2 would ignore the new input, so
it would retain reset state.
Blue Swirl - Oct. 2, 2011, 8:55 p.m.
On Sun, Oct 2, 2011 at 8:41 PM, Avi Kivity <avi@redhat.com> wrote:
>> >
>> >
>> > Would not this corruption also happen on real hardware?  If reset
>> > to the disk controller is delayed by a slow gate or extra
>> > capacitance on a line?
>>
>> Maybe, but the delays are probably too short on real HW before any
>> packets are sent or disk gets written. On QEMU, I/O can be
>> instantaneous.
>>
>
> Right, this is a real difference.  If any hardware actually depends on this, we can model it by launching a timer instead of issuing the I/O.  When the reset arrives to the disk controller, it will cancel the timer.
>
> This is expensive both to code and in run-time performance, but we can afford the expense since we don't have such a case, yes?

It's already in, see -win2k-hack and floppy DMA hack.
Paolo Bonzini - Oct. 3, 2011, 7:21 a.m.
On 10/02/2011 10:55 PM, Blue Swirl wrote:
> It's already in, see -win2k-hack and floppy DMA hack.

Is the floppy DMA hack just a consequence of the weird idle BHs, or is 
it to actually please a real BIOS?

Paolo
Avi Kivity - Oct. 4, 2011, 12:12 p.m.
On 10/02/2011 10:36 PM, Blue Swirl wrote:
> On Sun, Oct 2, 2011 at 8:31 PM, Avi Kivity<avi@redhat.com>  wrote:
> >
> >>  >
> >>  >  In fact these aren't problems.  The packet may be sent or data
> >>  >  written, as long as they aren't corrupted.  A device is allowed to
> >>  >  "delay" a reset (but not indefinitely).
> >>
> >>  Oh, but corruption could easily happen. Consider for example a disk
> >>  controller waiting for DMA ready signal a device separate from the
> >>  DMA
> >>  controller. Due to reset glitches in the device or signal chain, the
> >>  DMA ready signal arrives but the DMA controller still contains old
> >>  information, writing the data to disk from wrong memory location.
> >
> >
> >  Would not this corruption also happen on real hardware?  If reset to the disk controller is delayed by a slow gate or extra capacitance on a line?
>
> Maybe, but the delays are probably too short on real HW before any
> packets are sent or disk gets written. On QEMU, I/O can be
> instantaneous.

As an anecdote, while reading a chip errata I came upon this:



15. CPU May Record Signal Glitches When MCH is Being Reset

Problem: When the MCH is reset via RSTIN# the CPU may record any one or 
more of the
following errors: address strobe glitch (MSR IA32_MCi_STATUS bit 23), 
data strobe
glitch (MSR IA32_MCi_STATUS bit 22), P/N data strobes out of sync (MSR
IA32_MCi_STATUS bit 21), PIC & FSB data parity (MSR IA32_MCi_STATUS bit 
19), RSP
parity (MSR IA32_MCi_STATUS bit 18), or FSB address parity (MSR 
IA32_MCi_STATUS
bit 16).  This can happen when the MCH asserts CPURST# just after the 
MCH drives an
FSB transaction.  This may happen because RSTIN# and CPURST# maintain an
asynchronous relationship with each other.

Workaround: None.

Status: No Fix

(of course the situation there is different, there is no global reset 
signal)

Patch

diff --git a/hw/i8259.c b/hw/i8259.c
index b7a011f..3498c6b 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -283,6 +283,7 @@  static void pic_reset(void *opaque)
     s->init4 = 0;
     s->single_mode = 0;
     /* Note: ELCR is not reset */
+    pic_update_irq(s->pics_state);
 }
 
 static void pic_ioport_write(void *opaque, target_phys_addr_t addr64,
@@ -298,8 +299,6 @@  static void pic_ioport_write(void *opaque, target_phys_addr_t addr64,
         if (val & 0x10) {
             /* init */
             pic_reset(s);
-            /* deassert a pending interrupt */
-            qemu_irq_lower(s->pics_state->pics[0].int_out);
             s->init_state = 1;
             s->init4 = val & 1;
             s->single_mode = val & 2;