Patchwork [3/3] apic: always update the in-kernel status after loading

login
register
mail settings
Submitter Paolo Bonzini
Date Oct. 30, 2012, 12:16 p.m.
Message ID <1351599394-24876-4-git-send-email-pbonzini@redhat.com>
Download mbox | patch
Permalink /patch/195472/
State New
Headers show

Comments

Paolo Bonzini - Oct. 30, 2012, 12:16 p.m.
The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
when restoring the CPU the dummy post-reset state is passed to the
in-kernel APIC.

This can cause MSI injection to fail if attempted during the restore of
another device, because the LAPIC believes it's not enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/apic_common.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
Avi Kivity - Oct. 30, 2012, 12:38 p.m.
On 10/30/2012 02:16 PM, Paolo Bonzini wrote:
> The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
> when restoring the CPU the dummy post-reset state is passed to the
> in-kernel APIC.
> 
> This can cause MSI injection to fail if attempted during the restore of
> another device, because the LAPIC believes it's not enabled.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/apic_common.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/apic_common.c b/hw/apic_common.c
> index f373ba8..1ef52b2 100644
> --- a/hw/apic_common.c
> +++ b/hw/apic_common.c
> @@ -362,6 +362,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>      if (info->post_load) {
>          info->post_load(s);
>      }
> +    cpu_put_apic_state(DEVICE(s));
>      return 0;
>  }

Aren't we still dependent on the order of processing?  If the APIC is
restored after the device, won't we get the same problem?
Paolo Bonzini - Oct. 30, 2012, 2:16 p.m.
Il 30/10/2012 13:38, Avi Kivity ha scritto:
> On 10/30/2012 02:16 PM, Paolo Bonzini wrote:
>> The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
>> when restoring the CPU the dummy post-reset state is passed to the
>> in-kernel APIC.
>>
>> This can cause MSI injection to fail if attempted during the restore of
>> another device, because the LAPIC believes it's not enabled.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  hw/apic_common.c |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>> index f373ba8..1ef52b2 100644
>> --- a/hw/apic_common.c
>> +++ b/hw/apic_common.c
>> @@ -362,6 +362,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>>      if (info->post_load) {
>>          info->post_load(s);
>>      }
>> +    cpu_put_apic_state(DEVICE(s));
>>      return 0;
>>  }
> 
> Aren't we still dependent on the order of processing?  If the APIC is
> restored after the device, won't we get the same problem?

Strictly speaking yes, but CPUs and APICs are always the first devices
to be saved.

Paolo
Jan Kiszka - Oct. 30, 2012, 6:17 p.m.
On 2012-10-30 13:16, Paolo Bonzini wrote:
> The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
> when restoring the CPU the dummy post-reset state is passed to the
> in-kernel APIC.
> 
> This can cause MSI injection to fail if attempted during the restore of
> another device, because the LAPIC believes it's not enabled.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/apic_common.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/apic_common.c b/hw/apic_common.c
> index f373ba8..1ef52b2 100644
> --- a/hw/apic_common.c
> +++ b/hw/apic_common.c
> @@ -362,6 +362,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>      if (info->post_load) {
>          info->post_load(s);
>      }
> +    cpu_put_apic_state(DEVICE(s));

Just implement a post_load handler for the KVM APIC and trigger putting
from there.

Jan

>      return 0;
>  }
>  
>
Jan Kiszka - Oct. 30, 2012, 6:21 p.m.
On 2012-10-30 15:16, Paolo Bonzini wrote:
> Il 30/10/2012 13:38, Avi Kivity ha scritto:
>> On 10/30/2012 02:16 PM, Paolo Bonzini wrote:
>>> The LAPIC is loaded separately from the rest of the VCPU state.  Thus,
>>> when restoring the CPU the dummy post-reset state is passed to the
>>> in-kernel APIC.
>>>
>>> This can cause MSI injection to fail if attempted during the restore of
>>> another device, because the LAPIC believes it's not enabled.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  hw/apic_common.c |    1 +
>>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>>> index f373ba8..1ef52b2 100644
>>> --- a/hw/apic_common.c
>>> +++ b/hw/apic_common.c
>>> @@ -362,6 +362,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
>>>      if (info->post_load) {
>>>          info->post_load(s);
>>>      }
>>> +    cpu_put_apic_state(DEVICE(s));
>>>      return 0;
>>>  }
>>
>> Aren't we still dependent on the order of processing?  If the APIC is
>> restored after the device, won't we get the same problem?
> 
> Strictly speaking yes, but CPUs and APICs are always the first devices
> to be saved.

Hmm, thinking about this again: Why is the MSI event injected at all
during restore, specifically while the device models are in transitional
state. Can you explain this? Does the same pattern then also apply on
INTx injection?

Jan
Paolo Bonzini - Nov. 2, 2012, 2:53 p.m.
Il 30/10/2012 19:21, Jan Kiszka ha scritto:
> > > Aren't we still dependent on the order of processing?  If the APIC is
> > > restored after the device, won't we get the same problem?
> > 
> > Strictly speaking yes, but CPUs and APICs are always the first devices
> > to be saved.
> Hmm, thinking about this again: Why is the MSI event injected at all
> during restore, specifically while the device models are in transitional
> state. Can you explain this?

Because the (virtio-serial) port was connected on the source and
disconnected on the destination, or vice versa.

In my simplified reproducer, I'm really using different command-lines on
the source and destination, but it is not necessary.  For example, if
you have a socket backend, the destination will usually be disconnected
at the time the machine loads.

One alternative fix is a vm_clock timer that expires immediately.  It
would fix both MSI and INTx, on the other hand I thought it was an APIC
bug because the QEMU APIC works nicely.

> Does the same pattern then also apply on INTx injection?

Yes.

Paolo
Jan Kiszka - Nov. 2, 2012, 2:59 p.m.
On 2012-11-02 15:53, Paolo Bonzini wrote:
> Il 30/10/2012 19:21, Jan Kiszka ha scritto:
>>>> Aren't we still dependent on the order of processing?  If the APIC is
>>>> restored after the device, won't we get the same problem?
>>>
>>> Strictly speaking yes, but CPUs and APICs are always the first devices
>>> to be saved.
>> Hmm, thinking about this again: Why is the MSI event injected at all
>> during restore, specifically while the device models are in transitional
>> state. Can you explain this?
> 
> Because the (virtio-serial) port was connected on the source and
> disconnected on the destination, or vice versa.
> 
> In my simplified reproducer, I'm really using different command-lines on
> the source and destination, but it is not necessary.  For example, if
> you have a socket backend, the destination will usually be disconnected
> at the time the machine loads.
> 
> One alternative fix is a vm_clock timer that expires immediately.  It
> would fix both MSI and INTx, on the other hand I thought it was an APIC
> bug because the QEMU APIC works nicely.

I think deferring IRQ events to the point when the complete vmstate is
loaded is the cleaner and more robust approach.

Jan
Gerd Hoffmann - Nov. 2, 2012, 3:07 p.m.
Hi,

> I think deferring IRQ events to the point when the complete vmstate is
> loaded is the cleaner and more robust approach.

Agree.  Just schedule a bh in post_load.
See also a229c0535bd336efaec786dd6e352a54e0a8187d

cheers,
  Gerd
Paolo Bonzini - Nov. 2, 2012, 3:13 p.m.
> Hi,
> 
> > I think deferring IRQ events to the point when the complete vmstate
> > is
> > loaded is the cleaner and more robust approach.
> 
> Agree.  Just schedule a bh in post_load.
> See also a229c0535bd336efaec786dd6e352a54e0a8187d

No, it cannot a bh.  Right now incoming migration is blocking,
but this will change in 1.3.  There is no guarantee that a
bottom half will run after migration has completed.

Paolo
Gerd Hoffmann - Nov. 2, 2012, 3:17 p.m.
On 11/02/12 16:13, Paolo Bonzini wrote:
>> Hi,
>>
>>> I think deferring IRQ events to the point when the complete vmstate
>>> is
>>> loaded is the cleaner and more robust approach.
>>
>> Agree.  Just schedule a bh in post_load.
>> See also a229c0535bd336efaec786dd6e352a54e0a8187d
> 
> No, it cannot a bh.  Right now incoming migration is blocking,
> but this will change in 1.3.  There is no guarantee that a
> bottom half will run after migration has completed.

Then we'll need some new way to do this, maybe a new post_load handler
which is called once _all_ state is loaded.

cheers,
  Gerd
Paolo Bonzini - Nov. 2, 2012, 3:21 p.m.
Il 02/11/2012 16:17, Gerd Hoffmann ha scritto:
> On 11/02/12 16:13, Paolo Bonzini wrote:
>>> >> Hi,
>>> >>
>>>> >>> I think deferring IRQ events to the point when the complete vmstate
>>>> >>> is loaded is the cleaner and more robust approach.
>>> >>
>>> >> Agree.  Just schedule a bh in post_load.
>>> >> See also a229c0535bd336efaec786dd6e352a54e0a8187d
>> > 
>> > No, it cannot a bh.  Right now incoming migration is blocking,
>> > but this will change in 1.3.  There is no guarantee that a
>> > bottom half will run after migration has completed.
> Then we'll need some new way to do this, maybe a new post_load handler
> which is called once _all_ state is loaded.

The simplest is a vm_clock timer that expires at time 0.

Paolo

Patch

diff --git a/hw/apic_common.c b/hw/apic_common.c
index f373ba8..1ef52b2 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -362,6 +362,7 @@  static int apic_dispatch_post_load(void *opaque, int version_id)
     if (info->post_load) {
         info->post_load(s);
     }
+    cpu_put_apic_state(DEVICE(s));
     return 0;
 }