diff mbox

[09/10] piix: do not raise irq while loading vmstate

Message ID 1410265809-27247-10-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Sept. 9, 2014, 12:30 p.m. UTC
From: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>

This patch disables raising an irq while loading the state of PCI bridge.
The aim of this patch is preserving the same behavior while saving and
restoring the VM state. IRQ is not raised while saving the state of the bridge.
That's why the behavior of the restored system will differ from
the original one. This patch eliminates raising an irq and just restores
the calculated state fields in post_load function.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-host/piix.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Juan Quintela Sept. 9, 2014, 1:37 p.m. UTC | #1
Paolo Bonzini <pbonzini@redhat.com> wrote:
> From: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
>
> This patch disables raising an irq while loading the state of PCI bridge.
> The aim of this patch is preserving the same behavior while saving and
> restoring the VM state. IRQ is not raised while saving the state of the bridge.
> That's why the behavior of the restored system will differ from
> the original one. This patch eliminates raising an irq and just restores
> the calculated state fields in post_load function.
>
> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

No problem from migration point of view.
I assume that the pci bits are ok, so

Acked-by: Juan Quintela <quintela@redhat.com>
Michael S. Tsirkin Sept. 9, 2014, 1:54 p.m. UTC | #2
On Tue, Sep 09, 2014 at 02:30:08PM +0200, Paolo Bonzini wrote:
> From: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
> 
> This patch disables raising an irq while loading the state of PCI bridge.
> The aim of this patch is preserving the same behavior while saving and
> restoring the VM state. IRQ is not raised while saving the state of the bridge.
> That's why the behavior of the restored system will differ from
> the original one.


Hmm I don't understand.
You are removing call to piix3_update_irq_levels
this call is supposed to just sync up bus to
irq level.

How can this change system state? Saved state
is supposed to already be in sync.


> This patch eliminates raising an irq and just restores
> the calculated state fields in post_load function.
> 
> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/pci-host/piix.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
> index e0e0946..cd50435 100644
> --- a/hw/pci-host/piix.c
> +++ b/hw/pci-host/piix.c
> @@ -409,7 +409,7 @@ static void piix3_set_irq_pic(PIIX3State *piix3, int pic_irq)
>                       (pic_irq * PIIX_NUM_PIRQS))));
>  }
>  
> -static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level)
> +static void piix3_set_irq_level_internal(PIIX3State *piix3, int pirq, int level)
>  {
>      int pic_irq;
>      uint64_t mask;
> @@ -422,6 +422,18 @@ static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level)
>      mask = 1ULL << ((pic_irq * PIIX_NUM_PIRQS) + pirq);
>      piix3->pic_levels &= ~mask;
>      piix3->pic_levels |= mask * !!level;
> +}
> +
> +static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level)
> +{
> +    int pic_irq;
> +
> +    pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];
> +    if (pic_irq >= PIIX_NUM_PIC_IRQS) {
> +        return;
> +    }
> +
> +    piix3_set_irq_level_internal(piix3, pirq, level);
>  
>      piix3_set_irq_pic(piix3, pic_irq);
>  }
> @@ -527,7 +539,18 @@ static void piix3_reset(void *opaque)
>  static int piix3_post_load(void *opaque, int version_id)
>  {
>      PIIX3State *piix3 = opaque;
> -    piix3_update_irq_levels(piix3);
> +    int pirq;
> +
> +    /* Update irq levels without raising an interrupt which could
> +     * happen in piix3_update_irq_levels.  Raising an IRQ will
> +     * bring the system to a different state than the saved one.

I don't like comments like this: don't discuss what could
have been if code was different. Explain why code
is like it is.

> +     * Interrupt state is serialized separately through the i8259.
> +     */
> +    piix3->pic_levels = 0;
> +    for (pirq = 0; pirq < PIIX_NUM_PIRQS; pirq++) {
> +        piix3_set_irq_level_internal(piix3, pirq,
> +                            pci_bus_get_irq_level(piix3->dev.bus, pirq));
> +    }
>      return 0;
>  }
>  
> -- 
> 2.1.0
> 
>
Paolo Bonzini Sept. 9, 2014, 5:16 p.m. UTC | #3
Il 09/09/2014 15:54, Michael S. Tsirkin ha scritto:
> 
> Hmm I don't understand.
> You are removing call to piix3_update_irq_levels
> this call is supposed to just sync up bus to
> irq level.
> 
> How can this change system state? Saved state
> is supposed to already be in sync.

i440FX/PIIX3 state is loaded before i8259, so the interrupt will never
be in the i8259 ISR.  I am not sure why it is a problem for
record/replay, but I think it's plausible to consider this a bug.  i8259
state should not be affected by the load of PIIX3 state, since i8259 is
migrated separately.

Paolo
Michael S. Tsirkin Sept. 9, 2014, 8:51 p.m. UTC | #4
On Tue, Sep 09, 2014 at 07:16:19PM +0200, Paolo Bonzini wrote:
> Il 09/09/2014 15:54, Michael S. Tsirkin ha scritto:
> > 
> > Hmm I don't understand.
> > You are removing call to piix3_update_irq_levels
> > this call is supposed to just sync up bus to
> > irq level.
> > 
> > How can this change system state? Saved state
> > is supposed to already be in sync.
> 
> i440FX/PIIX3 state is loaded before i8259, so the interrupt will never
> be in the i8259 ISR.  I am not sure why it is a problem for
> record/replay, but I think it's plausible to consider this a bug.  i8259
> state should not be affected by the load of PIIX3 state, since i8259 is
> migrated separately.
> 
> Paolo

Sorry I still don't understand. Why do stuff from vmstate callback then?
How is it different?

I'd like to see a description of a scenario where this patch makes
a difference.
Paolo Bonzini Sept. 10, 2014, 8:38 a.m. UTC | #5
Il 09/09/2014 22:51, Michael S. Tsirkin ha scritto:
> > i440FX/PIIX3 state is loaded before i8259, so the interrupt will never
> > be in the i8259 ISR.  I am not sure why it is a problem for
> > record/replay, but I think it's plausible to consider this a bug.  i8259
> > state should not be affected by the load of PIIX3 state, since i8259 is
> > migrated separately.
> 
> Sorry I still don't understand. Why do stuff from vmstate callback then?
> How is it different?

Reconstructing internal state from post_load is okay.

What is not okay (and I think it should be a rule) is to touch other
devices from post_load, unless you know that they are deserialized
first.  For example it's okay for a PCI device to talk to the parent
bridge in its post_load function.

In the case of PIIX3 vs. i8259, however, you know that i8259 is
deserialized _last_ because i8259 is an ISA device and PIIX3 provides
the ISA bus.  So it's incorrect, even though it's currently harmless, to
touch the i8259 before it's deserialized.

> I'd like to see a description of a scenario where this patch makes
> a difference.

Of course it would be nice to have testcases for this, but I guess one
case could be:

- LAPIC configured in ExtINT mode

- interrupts are masked in the i8259, but the i8259 doesn't know that
yet because it's not been loaded yet

- the PIIX3 loads the state and the interrupt is set.  pic_set_irq is
called, calls pic_update_irq

- pic_update_irq calls pic_get_irq, which uses IMR=0 and thus raises LINT0

- the APIC has been loaded already, so LINT0 is injected incorrectly


Another case could be:

- i8259 is processing IRQ0.  The lower-priority interrupt from PIIX3 is
in IRR.  Machine is migrated.

- the PIIX3 loads the state and sets the interrupt in the i8259.
pic_set_irq is called, calls pic_update_irq, calls pic_get_irq

- because i8259 has not been loaded yet, pic_get_irq sees ISR=0 and the
interrupt is injected even though IRQ0 (higher priority) is being serviced.


In both cases, the saved i8259 state will have the PIIX3 interrupt in
IRR, so the interrupt is not lost, just held (as it would have been on
the source machine).

Paolo
Peter Maydell Sept. 10, 2014, 8:51 a.m. UTC | #6
On 10 September 2014 09:38, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 09/09/2014 22:51, Michael S. Tsirkin ha scritto:
>> Sorry I still don't understand. Why do stuff from vmstate callback then?
>> How is it different?
>
> Reconstructing internal state from post_load is okay.
>
> What is not okay (and I think it should be a rule) is to touch other
> devices from post_load, unless you know that they are deserialized
> first.  For example it's okay for a PCI device to talk to the parent
> bridge in its post_load function.

I don't think it's right to talk to another device even if you do
know it's deserialized first. Talking to it might make it change
its state, which would be wrong (since its correct state is
the state it's just deserialized). I would suggest the rule should
be "never do something that can change the state of another
device in post-load".

(We have similar issues with reset, except worse in that we
don't have a coherent rule to cause everything to come out
of reset in the right state.)

thanks
-- PMM
Paolo Bonzini Sept. 10, 2014, 9:05 a.m. UTC | #7
Il 10/09/2014 10:51, Peter Maydell ha scritto:
> > What is not okay (and I think it should be a rule) is to touch other
> > devices from post_load, unless you know that they are deserialized
> > first.  For example it's okay for a PCI device to talk to the parent
> > bridge in its post_load function.
> 
> I don't think it's right to talk to another device even if you do
> know it's deserialized first. Talking to it might make it change
> its state, which would be wrong (since its correct state is
> the state it's just deserialized). I would suggest the rule should
> be "never do something that can change the state of another
> device in post-load".

That's harder to do, but if it is possible to do it, it would be great
as well.

It would not surprise me to find a case where the parent device actually
_expects_ the children's post_load to inform it about something, instead
of serializing that part of state on its own.

Paolo

> (We have similar issues with reset, except worse in that we
> don't have a coherent rule to cause everything to come out
> of reset in the right state.)
Paolo Bonzini Sept. 10, 2014, 9:58 a.m. UTC | #8
Il 10/09/2014 12:50, Michael S. Tsirkin ha scritto:
> OK, got this, thanks for the explanation!
> So the reason i8259 might be out of sync is
> because it's not yet deserialized.

Yes, especially the IMR/IRR/ISR fields.

> I think it's a good idea to put (at least the
> last part) in the commit log.

Like this:

    This patch disables raising an irq while loading the state of PCI bridge.
    Because the i8259 has not been deserialized yet, raising an interrupt
    could bring the system out-of-sync with the migration source.  For example,
    the migration source could have masked the interrupt in the i8259. On the
    destination, the i8259 device model would not know that yet and would
    trigger an interrupt in the CPU.
    
    This patch eliminates raising an irq and just restores the calculated
    state fields in post_load function.  Interrupt state will be deserialized
    separately through the IRR field of the i8259.

> Also it's updating irq state, not just raising irq,
> that might be problematic, right?

Well, the i8259 is in the reset state so ISR=IRR=0, aka all IRQ lines 
are known to be low.  But yes, in general it's the update that is 
problematic.
Paolo Bonzini Sept. 10, 2014, 10:07 a.m. UTC | #9
Il 10/09/2014 13:04, Michael S. Tsirkin ha scritto:
>> >     This patch disables raising an irq while loading the state of PCI bridge.
>> >     Because the i8259 has not been deserialized yet, raising an interrupt
>> >     could bring the system out-of-sync with the migration source.  For example,
>> >     the migration source could have masked the interrupt in the i8259. On the
>> >     destination, the i8259 device model would not know that yet and would
>> >     trigger an interrupt in the CPU.
>> >     
>> >     This patch eliminates raising an irq and just restores the calculated
>> >     state fields in post_load function.  Interrupt state will be deserialized
>> >     separately through the IRR field of the i8259.
> Yes, thanks!
> Except imho it's a bit better to s/raising/setting/ in the last paragraph.

And pretty much everywhere else, not just in the last paragraph.

Thanks!

Paolo
Michael S. Tsirkin Sept. 10, 2014, 10:20 a.m. UTC | #10
On Wed, Sep 10, 2014 at 11:05:39AM +0200, Paolo Bonzini wrote:
> Il 10/09/2014 10:51, Peter Maydell ha scritto:
> > > What is not okay (and I think it should be a rule) is to touch other
> > > devices from post_load, unless you know that they are deserialized
> > > first.  For example it's okay for a PCI device to talk to the parent
> > > bridge in its post_load function.
> > 
> > I don't think it's right to talk to another device even if you do
> > know it's deserialized first. Talking to it might make it change
> > its state, which would be wrong (since its correct state is
> > the state it's just deserialized). I would suggest the rule should
> > be "never do something that can change the state of another
> > device in post-load".
> 
> That's harder to do, but if it is possible to do it, it would be great
> as well.
> 
> It would not surprise me to find a case where the parent device actually
> _expects_ the children's post_load to inform it about something, instead
> of serializing that part of state on its own.
> 
> Paolo

Absolutely, I don't think we can require that.

For example, at the moment, for PCI bridges, we serialize the
state of all interrupt lines, but that's just a function
of all devices connected to each line.
So we are transmitting redundant information, and I have plans
to discard that and recompute parent state based on child state.



> > (We have similar issues with reset, except worse in that we
> > don't have a coherent rule to cause everything to come out
> > of reset in the right state.)
Michael S. Tsirkin Sept. 10, 2014, 10:50 a.m. UTC | #11
On Wed, Sep 10, 2014 at 10:38:46AM +0200, Paolo Bonzini wrote:
> Il 09/09/2014 22:51, Michael S. Tsirkin ha scritto:
> > > i440FX/PIIX3 state is loaded before i8259, so the interrupt will never
> > > be in the i8259 ISR.  I am not sure why it is a problem for
> > > record/replay, but I think it's plausible to consider this a bug.  i8259
> > > state should not be affected by the load of PIIX3 state, since i8259 is
> > > migrated separately.
> > 
> > Sorry I still don't understand. Why do stuff from vmstate callback then?
> > How is it different?
> 
> Reconstructing internal state from post_load is okay.
> 
> What is not okay (and I think it should be a rule) is to touch other
> devices from post_load, unless you know that they are deserialized
> first.  For example it's okay for a PCI device to talk to the parent
> bridge in its post_load function.
> 
> In the case of PIIX3 vs. i8259, however, you know that i8259 is
> deserialized _last_ because i8259 is an ISA device and PIIX3 provides
> the ISA bus.  So it's incorrect, even though it's currently harmless, to
> touch the i8259 before it's deserialized.

OK, got this, thanks for the explanation!
So the reason i8259 might be out of sync is
because it's not yet deserialized.

I think it's a good idea to put (at least the
last part) in the commit log.
Also it's updating irq state, not just raising irq,
that might be problematic, right?

So also, something like this for the comment:
+    /* We update irq levels in PIIX3 but don't set IRQ, since
+     *	IRQ state is serialized separately through the i8259,
+     * which is not deserialized yet, at this point.
+     */





> > I'd like to see a description of a scenario where this patch makes
> > a difference.
> 
> Of course it would be nice to have testcases for this, but I guess one
> case could be:
> 
> - LAPIC configured in ExtINT mode
> 
> - interrupts are masked in the i8259, but the i8259 doesn't know that
> yet because it's not been loaded yet
> 
> - the PIIX3 loads the state and the interrupt is set.  pic_set_irq is
> called, calls pic_update_irq
> 
> - pic_update_irq calls pic_get_irq, which uses IMR=0 and thus raises LINT0
> 
> - the APIC has been loaded already, so LINT0 is injected incorrectly
> 
> 
> Another case could be:
> 
> - i8259 is processing IRQ0.  The lower-priority interrupt from PIIX3 is
> in IRR.  Machine is migrated.
> 
> - the PIIX3 loads the state and sets the interrupt in the i8259.
> pic_set_irq is called, calls pic_update_irq, calls pic_get_irq
> 
> - because i8259 has not been loaded yet, pic_get_irq sees ISR=0 and the
> interrupt is injected even though IRQ0 (higher priority) is being serviced.
> 
> 
> In both cases, the saved i8259 state will have the PIIX3 interrupt in
> IRR, so the interrupt is not lost, just held (as it would have been on
> the source machine).
> 
> Paolo
Michael S. Tsirkin Sept. 10, 2014, 11:04 a.m. UTC | #12
On Wed, Sep 10, 2014 at 11:58:34AM +0200, Paolo Bonzini wrote:
> Il 10/09/2014 12:50, Michael S. Tsirkin ha scritto:
> > OK, got this, thanks for the explanation!
> > So the reason i8259 might be out of sync is
> > because it's not yet deserialized.
> 
> Yes, especially the IMR/IRR/ISR fields.
> 
> > I think it's a good idea to put (at least the
> > last part) in the commit log.
> 
> Like this:
> 
>     This patch disables raising an irq while loading the state of PCI bridge.
>     Because the i8259 has not been deserialized yet, raising an interrupt
>     could bring the system out-of-sync with the migration source.  For example,
>     the migration source could have masked the interrupt in the i8259. On the
>     destination, the i8259 device model would not know that yet and would
>     trigger an interrupt in the CPU.
>     
>     This patch eliminates raising an irq and just restores the calculated
>     state fields in post_load function.  Interrupt state will be deserialized
>     separately through the IRR field of the i8259.

Yes, thanks!
Except imho it's a bit better to s/raising/setting/ in the last paragraph.

> > Also it's updating irq state, not just raising irq,
> > that might be problematic, right?
> 
> Well, the i8259 is in the reset state so ISR=IRR=0, aka all IRQ lines 
> are known to be low.

By luck, yes.

>  But yes, in general it's the update that is 
> problematic.
Michael S. Tsirkin Sept. 10, 2014, 11:26 a.m. UTC | #13
On Wed, Sep 10, 2014 at 12:07:22PM +0200, Paolo Bonzini wrote:
> Il 10/09/2014 13:04, Michael S. Tsirkin ha scritto:
> >> >     This patch disables raising an irq while loading the state of PCI bridge.
> >> >     Because the i8259 has not been deserialized yet, raising an interrupt
> >> >     could bring the system out-of-sync with the migration source.  For example,
> >> >     the migration source could have masked the interrupt in the i8259. On the
> >> >     destination, the i8259 device model would not know that yet and would
> >> >     trigger an interrupt in the CPU.
> >> >     
> >> >     This patch eliminates raising an irq and just restores the calculated
> >> >     state fields in post_load function.  Interrupt state will be deserialized
> >> >     separately through the IRR field of the i8259.
> > Yes, thanks!
> > Except imho it's a bit better to s/raising/setting/ in the last paragraph.
> 
> And pretty much everywhere else, not just in the last paragraph.
> 
> Thanks!
> 
> Paolo

Right. With these minor changes, you can attach

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
diff mbox

Patch

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index e0e0946..cd50435 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -409,7 +409,7 @@  static void piix3_set_irq_pic(PIIX3State *piix3, int pic_irq)
                      (pic_irq * PIIX_NUM_PIRQS))));
 }
 
-static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level)
+static void piix3_set_irq_level_internal(PIIX3State *piix3, int pirq, int level)
 {
     int pic_irq;
     uint64_t mask;
@@ -422,6 +422,18 @@  static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level)
     mask = 1ULL << ((pic_irq * PIIX_NUM_PIRQS) + pirq);
     piix3->pic_levels &= ~mask;
     piix3->pic_levels |= mask * !!level;
+}
+
+static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level)
+{
+    int pic_irq;
+
+    pic_irq = piix3->dev.config[PIIX_PIRQC + pirq];
+    if (pic_irq >= PIIX_NUM_PIC_IRQS) {
+        return;
+    }
+
+    piix3_set_irq_level_internal(piix3, pirq, level);
 
     piix3_set_irq_pic(piix3, pic_irq);
 }
@@ -527,7 +539,18 @@  static void piix3_reset(void *opaque)
 static int piix3_post_load(void *opaque, int version_id)
 {
     PIIX3State *piix3 = opaque;
-    piix3_update_irq_levels(piix3);
+    int pirq;
+
+    /* Update irq levels without raising an interrupt which could
+     * happen in piix3_update_irq_levels.  Raising an IRQ will
+     * bring the system to a different state than the saved one.
+     * Interrupt state is serialized separately through the i8259.
+     */
+    piix3->pic_levels = 0;
+    for (pirq = 0; pirq < PIIX_NUM_PIRQS; pirq++) {
+        piix3_set_irq_level_internal(piix3, pirq,
+                            pci_bus_get_irq_level(piix3->dev.bus, pirq));
+    }
     return 0;
 }