Message ID | 1410265809-27247-10-git-send-email-pbonzini@redhat.com |
---|---|
State | New |
Headers | show |
Paolo Bonzini <pbonzini@redhat.com> wrote: > From: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> > > This patch disables raising an irq while loading the state of PCI bridge. > The aim of this patch is preserving the same behavior while saving and > restoring the VM state. IRQ is not raised while saving the state of the bridge. > That's why the behavior of the restored system will differ from > the original one. This patch eliminates raising an irq and just restores > the calculated state fields in post_load function. > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> No problem from migration point of view. I assume that the pci bits are ok, so Acked-by: Juan Quintela <quintela@redhat.com>
On Tue, Sep 09, 2014 at 02:30:08PM +0200, Paolo Bonzini wrote: > From: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> > > This patch disables raising an irq while loading the state of PCI bridge. > The aim of this patch is preserving the same behavior while saving and > restoring the VM state. IRQ is not raised while saving the state of the bridge. > That's why the behavior of the restored system will differ from > the original one. Hmm I don't understand. You are removing call to piix3_update_irq_levels this call is supposed to just sync up bus to irq level. How can this change system state? Saved state is supposed to already be in sync. > This patch eliminates raising an irq and just restores > the calculated state fields in post_load function. > > Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > hw/pci-host/piix.c | 26 ++++++++++++++++++++++++-- > 1 file changed, 24 insertions(+), 2 deletions(-) > > diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c > index e0e0946..cd50435 100644 > --- a/hw/pci-host/piix.c > +++ b/hw/pci-host/piix.c > @@ -409,7 +409,7 @@ static void piix3_set_irq_pic(PIIX3State *piix3, int pic_irq) > (pic_irq * PIIX_NUM_PIRQS)))); > } > > -static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level) > +static void piix3_set_irq_level_internal(PIIX3State *piix3, int pirq, int level) > { > int pic_irq; > uint64_t mask; > @@ -422,6 +422,18 @@ static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level) > mask = 1ULL << ((pic_irq * PIIX_NUM_PIRQS) + pirq); > piix3->pic_levels &= ~mask; > piix3->pic_levels |= mask * !!level; > +} > + > +static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level) > +{ > + int pic_irq; > + > + pic_irq = piix3->dev.config[PIIX_PIRQC + pirq]; > + if (pic_irq >= PIIX_NUM_PIC_IRQS) { > + return; > + } > + > + piix3_set_irq_level_internal(piix3, pirq, level); > > piix3_set_irq_pic(piix3, pic_irq); > } > @@ -527,7 +539,18 @@ static void piix3_reset(void *opaque) > static int piix3_post_load(void *opaque, int version_id) > { > PIIX3State *piix3 = opaque; > - piix3_update_irq_levels(piix3); > + int pirq; > + > + /* Update irq levels without raising an interrupt which could > + * happen in piix3_update_irq_levels. Raising an IRQ will > + * bring the system to a different state than the saved one. I don't like comments like this: don't discuss what could have been if code was different. Explain why code is like it is. > + * Interrupt state is serialized separately through the i8259. > + */ > + piix3->pic_levels = 0; > + for (pirq = 0; pirq < PIIX_NUM_PIRQS; pirq++) { > + piix3_set_irq_level_internal(piix3, pirq, > + pci_bus_get_irq_level(piix3->dev.bus, pirq)); > + } > return 0; > } > > -- > 2.1.0 > >
Il 09/09/2014 15:54, Michael S. Tsirkin ha scritto: > > Hmm I don't understand. > You are removing call to piix3_update_irq_levels > this call is supposed to just sync up bus to > irq level. > > How can this change system state? Saved state > is supposed to already be in sync. i440FX/PIIX3 state is loaded before i8259, so the interrupt will never be in the i8259 ISR. I am not sure why it is a problem for record/replay, but I think it's plausible to consider this a bug. i8259 state should not be affected by the load of PIIX3 state, since i8259 is migrated separately. Paolo
On Tue, Sep 09, 2014 at 07:16:19PM +0200, Paolo Bonzini wrote: > Il 09/09/2014 15:54, Michael S. Tsirkin ha scritto: > > > > Hmm I don't understand. > > You are removing call to piix3_update_irq_levels > > this call is supposed to just sync up bus to > > irq level. > > > > How can this change system state? Saved state > > is supposed to already be in sync. > > i440FX/PIIX3 state is loaded before i8259, so the interrupt will never > be in the i8259 ISR. I am not sure why it is a problem for > record/replay, but I think it's plausible to consider this a bug. i8259 > state should not be affected by the load of PIIX3 state, since i8259 is > migrated separately. > > Paolo Sorry I still don't understand. Why do stuff from vmstate callback then? How is it different? I'd like to see a description of a scenario where this patch makes a difference.
Il 09/09/2014 22:51, Michael S. Tsirkin ha scritto: > > i440FX/PIIX3 state is loaded before i8259, so the interrupt will never > > be in the i8259 ISR. I am not sure why it is a problem for > > record/replay, but I think it's plausible to consider this a bug. i8259 > > state should not be affected by the load of PIIX3 state, since i8259 is > > migrated separately. > > Sorry I still don't understand. Why do stuff from vmstate callback then? > How is it different? Reconstructing internal state from post_load is okay. What is not okay (and I think it should be a rule) is to touch other devices from post_load, unless you know that they are deserialized first. For example it's okay for a PCI device to talk to the parent bridge in its post_load function. In the case of PIIX3 vs. i8259, however, you know that i8259 is deserialized _last_ because i8259 is an ISA device and PIIX3 provides the ISA bus. So it's incorrect, even though it's currently harmless, to touch the i8259 before it's deserialized. > I'd like to see a description of a scenario where this patch makes > a difference. Of course it would be nice to have testcases for this, but I guess one case could be: - LAPIC configured in ExtINT mode - interrupts are masked in the i8259, but the i8259 doesn't know that yet because it's not been loaded yet - the PIIX3 loads the state and the interrupt is set. pic_set_irq is called, calls pic_update_irq - pic_update_irq calls pic_get_irq, which uses IMR=0 and thus raises LINT0 - the APIC has been loaded already, so LINT0 is injected incorrectly Another case could be: - i8259 is processing IRQ0. The lower-priority interrupt from PIIX3 is in IRR. Machine is migrated. - the PIIX3 loads the state and sets the interrupt in the i8259. pic_set_irq is called, calls pic_update_irq, calls pic_get_irq - because i8259 has not been loaded yet, pic_get_irq sees ISR=0 and the interrupt is injected even though IRQ0 (higher priority) is being serviced. In both cases, the saved i8259 state will have the PIIX3 interrupt in IRR, so the interrupt is not lost, just held (as it would have been on the source machine). Paolo
On 10 September 2014 09:38, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 09/09/2014 22:51, Michael S. Tsirkin ha scritto: >> Sorry I still don't understand. Why do stuff from vmstate callback then? >> How is it different? > > Reconstructing internal state from post_load is okay. > > What is not okay (and I think it should be a rule) is to touch other > devices from post_load, unless you know that they are deserialized > first. For example it's okay for a PCI device to talk to the parent > bridge in its post_load function. I don't think it's right to talk to another device even if you do know it's deserialized first. Talking to it might make it change its state, which would be wrong (since its correct state is the state it's just deserialized). I would suggest the rule should be "never do something that can change the state of another device in post-load". (We have similar issues with reset, except worse in that we don't have a coherent rule to cause everything to come out of reset in the right state.) thanks -- PMM
Il 10/09/2014 10:51, Peter Maydell ha scritto: > > What is not okay (and I think it should be a rule) is to touch other > > devices from post_load, unless you know that they are deserialized > > first. For example it's okay for a PCI device to talk to the parent > > bridge in its post_load function. > > I don't think it's right to talk to another device even if you do > know it's deserialized first. Talking to it might make it change > its state, which would be wrong (since its correct state is > the state it's just deserialized). I would suggest the rule should > be "never do something that can change the state of another > device in post-load". That's harder to do, but if it is possible to do it, it would be great as well. It would not surprise me to find a case where the parent device actually _expects_ the children's post_load to inform it about something, instead of serializing that part of state on its own. Paolo > (We have similar issues with reset, except worse in that we > don't have a coherent rule to cause everything to come out > of reset in the right state.)
Il 10/09/2014 12:50, Michael S. Tsirkin ha scritto: > OK, got this, thanks for the explanation! > So the reason i8259 might be out of sync is > because it's not yet deserialized. Yes, especially the IMR/IRR/ISR fields. > I think it's a good idea to put (at least the > last part) in the commit log. Like this: This patch disables raising an irq while loading the state of PCI bridge. Because the i8259 has not been deserialized yet, raising an interrupt could bring the system out-of-sync with the migration source. For example, the migration source could have masked the interrupt in the i8259. On the destination, the i8259 device model would not know that yet and would trigger an interrupt in the CPU. This patch eliminates raising an irq and just restores the calculated state fields in post_load function. Interrupt state will be deserialized separately through the IRR field of the i8259. > Also it's updating irq state, not just raising irq, > that might be problematic, right? Well, the i8259 is in the reset state so ISR=IRR=0, aka all IRQ lines are known to be low. But yes, in general it's the update that is problematic.
Il 10/09/2014 13:04, Michael S. Tsirkin ha scritto: >> > This patch disables raising an irq while loading the state of PCI bridge. >> > Because the i8259 has not been deserialized yet, raising an interrupt >> > could bring the system out-of-sync with the migration source. For example, >> > the migration source could have masked the interrupt in the i8259. On the >> > destination, the i8259 device model would not know that yet and would >> > trigger an interrupt in the CPU. >> > >> > This patch eliminates raising an irq and just restores the calculated >> > state fields in post_load function. Interrupt state will be deserialized >> > separately through the IRR field of the i8259. > Yes, thanks! > Except imho it's a bit better to s/raising/setting/ in the last paragraph. And pretty much everywhere else, not just in the last paragraph. Thanks! Paolo
On Wed, Sep 10, 2014 at 11:05:39AM +0200, Paolo Bonzini wrote: > Il 10/09/2014 10:51, Peter Maydell ha scritto: > > > What is not okay (and I think it should be a rule) is to touch other > > > devices from post_load, unless you know that they are deserialized > > > first. For example it's okay for a PCI device to talk to the parent > > > bridge in its post_load function. > > > > I don't think it's right to talk to another device even if you do > > know it's deserialized first. Talking to it might make it change > > its state, which would be wrong (since its correct state is > > the state it's just deserialized). I would suggest the rule should > > be "never do something that can change the state of another > > device in post-load". > > That's harder to do, but if it is possible to do it, it would be great > as well. > > It would not surprise me to find a case where the parent device actually > _expects_ the children's post_load to inform it about something, instead > of serializing that part of state on its own. > > Paolo Absolutely, I don't think we can require that. For example, at the moment, for PCI bridges, we serialize the state of all interrupt lines, but that's just a function of all devices connected to each line. So we are transmitting redundant information, and I have plans to discard that and recompute parent state based on child state. > > (We have similar issues with reset, except worse in that we > > don't have a coherent rule to cause everything to come out > > of reset in the right state.)
On Wed, Sep 10, 2014 at 10:38:46AM +0200, Paolo Bonzini wrote: > Il 09/09/2014 22:51, Michael S. Tsirkin ha scritto: > > > i440FX/PIIX3 state is loaded before i8259, so the interrupt will never > > > be in the i8259 ISR. I am not sure why it is a problem for > > > record/replay, but I think it's plausible to consider this a bug. i8259 > > > state should not be affected by the load of PIIX3 state, since i8259 is > > > migrated separately. > > > > Sorry I still don't understand. Why do stuff from vmstate callback then? > > How is it different? > > Reconstructing internal state from post_load is okay. > > What is not okay (and I think it should be a rule) is to touch other > devices from post_load, unless you know that they are deserialized > first. For example it's okay for a PCI device to talk to the parent > bridge in its post_load function. > > In the case of PIIX3 vs. i8259, however, you know that i8259 is > deserialized _last_ because i8259 is an ISA device and PIIX3 provides > the ISA bus. So it's incorrect, even though it's currently harmless, to > touch the i8259 before it's deserialized. OK, got this, thanks for the explanation! So the reason i8259 might be out of sync is because it's not yet deserialized. I think it's a good idea to put (at least the last part) in the commit log. Also it's updating irq state, not just raising irq, that might be problematic, right? So also, something like this for the comment: + /* We update irq levels in PIIX3 but don't set IRQ, since + * IRQ state is serialized separately through the i8259, + * which is not deserialized yet, at this point. + */ > > I'd like to see a description of a scenario where this patch makes > > a difference. > > Of course it would be nice to have testcases for this, but I guess one > case could be: > > - LAPIC configured in ExtINT mode > > - interrupts are masked in the i8259, but the i8259 doesn't know that > yet because it's not been loaded yet > > - the PIIX3 loads the state and the interrupt is set. pic_set_irq is > called, calls pic_update_irq > > - pic_update_irq calls pic_get_irq, which uses IMR=0 and thus raises LINT0 > > - the APIC has been loaded already, so LINT0 is injected incorrectly > > > Another case could be: > > - i8259 is processing IRQ0. The lower-priority interrupt from PIIX3 is > in IRR. Machine is migrated. > > - the PIIX3 loads the state and sets the interrupt in the i8259. > pic_set_irq is called, calls pic_update_irq, calls pic_get_irq > > - because i8259 has not been loaded yet, pic_get_irq sees ISR=0 and the > interrupt is injected even though IRQ0 (higher priority) is being serviced. > > > In both cases, the saved i8259 state will have the PIIX3 interrupt in > IRR, so the interrupt is not lost, just held (as it would have been on > the source machine). > > Paolo
On Wed, Sep 10, 2014 at 11:58:34AM +0200, Paolo Bonzini wrote: > Il 10/09/2014 12:50, Michael S. Tsirkin ha scritto: > > OK, got this, thanks for the explanation! > > So the reason i8259 might be out of sync is > > because it's not yet deserialized. > > Yes, especially the IMR/IRR/ISR fields. > > > I think it's a good idea to put (at least the > > last part) in the commit log. > > Like this: > > This patch disables raising an irq while loading the state of PCI bridge. > Because the i8259 has not been deserialized yet, raising an interrupt > could bring the system out-of-sync with the migration source. For example, > the migration source could have masked the interrupt in the i8259. On the > destination, the i8259 device model would not know that yet and would > trigger an interrupt in the CPU. > > This patch eliminates raising an irq and just restores the calculated > state fields in post_load function. Interrupt state will be deserialized > separately through the IRR field of the i8259. Yes, thanks! Except imho it's a bit better to s/raising/setting/ in the last paragraph. > > Also it's updating irq state, not just raising irq, > > that might be problematic, right? > > Well, the i8259 is in the reset state so ISR=IRR=0, aka all IRQ lines > are known to be low. By luck, yes. > But yes, in general it's the update that is > problematic.
On Wed, Sep 10, 2014 at 12:07:22PM +0200, Paolo Bonzini wrote: > Il 10/09/2014 13:04, Michael S. Tsirkin ha scritto: > >> > This patch disables raising an irq while loading the state of PCI bridge. > >> > Because the i8259 has not been deserialized yet, raising an interrupt > >> > could bring the system out-of-sync with the migration source. For example, > >> > the migration source could have masked the interrupt in the i8259. On the > >> > destination, the i8259 device model would not know that yet and would > >> > trigger an interrupt in the CPU. > >> > > >> > This patch eliminates raising an irq and just restores the calculated > >> > state fields in post_load function. Interrupt state will be deserialized > >> > separately through the IRR field of the i8259. > > Yes, thanks! > > Except imho it's a bit better to s/raising/setting/ in the last paragraph. > > And pretty much everywhere else, not just in the last paragraph. > > Thanks! > > Paolo Right. With these minor changes, you can attach Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index e0e0946..cd50435 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -409,7 +409,7 @@ static void piix3_set_irq_pic(PIIX3State *piix3, int pic_irq) (pic_irq * PIIX_NUM_PIRQS)))); } -static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level) +static void piix3_set_irq_level_internal(PIIX3State *piix3, int pirq, int level) { int pic_irq; uint64_t mask; @@ -422,6 +422,18 @@ static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level) mask = 1ULL << ((pic_irq * PIIX_NUM_PIRQS) + pirq); piix3->pic_levels &= ~mask; piix3->pic_levels |= mask * !!level; +} + +static void piix3_set_irq_level(PIIX3State *piix3, int pirq, int level) +{ + int pic_irq; + + pic_irq = piix3->dev.config[PIIX_PIRQC + pirq]; + if (pic_irq >= PIIX_NUM_PIC_IRQS) { + return; + } + + piix3_set_irq_level_internal(piix3, pirq, level); piix3_set_irq_pic(piix3, pic_irq); } @@ -527,7 +539,18 @@ static void piix3_reset(void *opaque) static int piix3_post_load(void *opaque, int version_id) { PIIX3State *piix3 = opaque; - piix3_update_irq_levels(piix3); + int pirq; + + /* Update irq levels without raising an interrupt which could + * happen in piix3_update_irq_levels. Raising an IRQ will + * bring the system to a different state than the saved one. + * Interrupt state is serialized separately through the i8259. + */ + piix3->pic_levels = 0; + for (pirq = 0; pirq < PIIX_NUM_PIRQS; pirq++) { + piix3_set_irq_level_internal(piix3, pirq, + pci_bus_get_irq_level(piix3->dev.bus, pirq)); + } return 0; }