Patchwork [KVM,v2,1/4] KVM: fix i8254 IRQ0 to be normally high

login
register
mail settings
Submitter Matthew Ogilvie
Date Dec. 27, 2012, 5:39 a.m.
Message ID <1356586796-7631-2-git-send-email-mmogilvi_qemu@miniinfo.net>
Download mbox | patch
Permalink /patch/208251/
State New
Headers show

Comments

Matthew Ogilvie - Dec. 27, 2012, 5:39 a.m.
Reading the spec, it is clear that most modes normally leave the IRQ
output line high, and only pulse it low to generate a leading edge.
Especially the most commonly used mode 2.

The KVM i8254 model does not try to emulate the duration of the pulse at
all, so just swap the high/low settings it to leave it high most of
the time.

This fix is a prerequisite to improving the i8259 model to handle
the trailing edge of an interupt request as indicated in its spec:
If it gets a trailing edge of an IRQ line before it starts to service
the interrupt, the request should be canceled.

See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
or search the net for 23124406.pdf.

Risks:

There is a risk that migrating a running guest between versions
with and without this patch will lose or gain a single timer
interrupt during the migration process.  The only case where
this is likely to be serious is probably losing a single-shot (mode 4)
interrupt, but if my understanding of how things work is good, then
that should only be possible if a whole slew of conditions are
all met:

 1. The guest is configured to run in a "tickless" mode (like
    modern Linux).
 2. The guest is for some reason still using the i8254 rather
    than something more modern like an HPET.  (The combination
    of 1 and 2 should be rare.)
 3. The migration is going from a fixed version back to the
    old version.  (Not sure how common this is, but it should
    be rarer than migrating from old to new.)
 4. There are not going to be any "timely" events/interrupts
    (keyboard, network, process sleeps, etc) that cause the guest
    to reset the PIT mode 4 one-shot counter "soon enough".

This combination should be rare enough that more complicated
solutions are not worth the effort.

Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
---
 arch/x86/kvm/i8254.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
Gleb Natapov - Jan. 7, 2013, 9:39 a.m.
On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> Reading the spec, it is clear that most modes normally leave the IRQ
> output line high, and only pulse it low to generate a leading edge.
> Especially the most commonly used mode 2.
> 
> The KVM i8254 model does not try to emulate the duration of the pulse at
> all, so just swap the high/low settings it to leave it high most of
> the time.
> 
> This fix is a prerequisite to improving the i8259 model to handle
> the trailing edge of an interupt request as indicated in its spec:
> If it gets a trailing edge of an IRQ line before it starts to service
> the interrupt, the request should be canceled.
> 
> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> or search the net for 23124406.pdf.
> 
> Risks:
> 
> There is a risk that migrating a running guest between versions
> with and without this patch will lose or gain a single timer
> interrupt during the migration process.  The only case where
Can you elaborate on how exactly this can happen? Do not see it.

> this is likely to be serious is probably losing a single-shot (mode 4)
> interrupt, but if my understanding of how things work is good, then
> that should only be possible if a whole slew of conditions are
> all met:
> 
>  1. The guest is configured to run in a "tickless" mode (like
>     modern Linux).
>  2. The guest is for some reason still using the i8254 rather
>     than something more modern like an HPET.  (The combination
>     of 1 and 2 should be rare.)
This is not so rare. For performance reason it is better to not have
HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.

>  3. The migration is going from a fixed version back to the
>     old version.  (Not sure how common this is, but it should
>     be rarer than migrating from old to new.)
>  4. There are not going to be any "timely" events/interrupts
>     (keyboard, network, process sleeps, etc) that cause the guest
>     to reset the PIT mode 4 one-shot counter "soon enough".
> 
> This combination should be rare enough that more complicated
> solutions are not worth the effort.
> 
> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> ---
>  arch/x86/kvm/i8254.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index c1d30b2..cd4ec60 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
>  	}
>  	spin_unlock(&ps->inject_lock);
>  	if (inject) {
> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> +		/* Clear previous interrupt, then create a rising
> +		 * edge to request another interupt, and leave it at
> +		 * level=1 until time to inject another one.
> +		 */
>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>  
>  		/*
>  		 * Provides NMI watchdog support via Virtual Wire mode.
> -- 
> 1.7.10.2.484.gcd07cc5

--
			Gleb.
Gleb Natapov - Jan. 7, 2013, 12:48 p.m.
On Mon, Jan 07, 2013 at 11:39:18AM +0200, Gleb Natapov wrote:
> On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> > Reading the spec, it is clear that most modes normally leave the IRQ
> > output line high, and only pulse it low to generate a leading edge.
> > Especially the most commonly used mode 2.
> > 
> > The KVM i8254 model does not try to emulate the duration of the pulse at
> > all, so just swap the high/low settings it to leave it high most of
> > the time.
> > 
> > This fix is a prerequisite to improving the i8259 model to handle
> > the trailing edge of an interupt request as indicated in its spec:
> > If it gets a trailing edge of an IRQ line before it starts to service
> > the interrupt, the request should be canceled.
> > 
> > See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> > or search the net for 23124406.pdf.
> > 
> > Risks:
> > 
> > There is a risk that migrating a running guest between versions
> > with and without this patch will lose or gain a single timer
> > interrupt during the migration process.  The only case where
> Can you elaborate on how exactly this can happen? Do not see it.
> 
> > this is likely to be serious is probably losing a single-shot (mode 4)
> > interrupt, but if my understanding of how things work is good, then
> > that should only be possible if a whole slew of conditions are
> > all met:
> > 
> >  1. The guest is configured to run in a "tickless" mode (like
> >     modern Linux).
> >  2. The guest is for some reason still using the i8254 rather
> >     than something more modern like an HPET.  (The combination
> >     of 1 and 2 should be rare.)
> This is not so rare. For performance reason it is better to not have
> HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.
> 
It looks like Linux prefer to use APIC timer anyway.

> >  3. The migration is going from a fixed version back to the
> >     old version.  (Not sure how common this is, but it should
> >     be rarer than migrating from old to new.)
> >  4. There are not going to be any "timely" events/interrupts
> >     (keyboard, network, process sleeps, etc) that cause the guest
> >     to reset the PIT mode 4 one-shot counter "soon enough".
> > 
> > This combination should be rare enough that more complicated
> > solutions are not worth the effort.
> > 
> > Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> > ---
> >  arch/x86/kvm/i8254.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > index c1d30b2..cd4ec60 100644
> > --- a/arch/x86/kvm/i8254.c
> > +++ b/arch/x86/kvm/i8254.c
> > @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
> >  	}
> >  	spin_unlock(&ps->inject_lock);
> >  	if (inject) {
> > -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > +		/* Clear previous interrupt, then create a rising
> > +		 * edge to request another interupt, and leave it at
> > +		 * level=1 until time to inject another one.
> > +		 */
> >  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> > +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> >  
> >  		/*
> >  		 * Provides NMI watchdog support via Virtual Wire mode.
> > -- 
> > 1.7.10.2.484.gcd07cc5
> 
> --
> 			Gleb.

--
			Gleb.
mmogilvi@miniinfo.net - Jan. 8, 2013, 12:17 a.m.
On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <gleb@redhat.com> wrote:
> On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
>> Reading the spec, it is clear that most modes normally leave the IRQ
>> output line high, and only pulse it low to generate a leading edge.
>> Especially the most commonly used mode 2.
>> 
>> The KVM i8254 model does not try to emulate the duration of the pulse at
>> all, so just swap the high/low settings it to leave it high most of
>> the time.
>> 
>> This fix is a prerequisite to improving the i8259 model to handle
>> the trailing edge of an interupt request as indicated in its spec:
>> If it gets a trailing edge of an IRQ line before it starts to service
>> the interrupt, the request should be canceled.
>> 
>> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
>> or search the net for 23124406.pdf.
>> 
>> Risks:
>> 
>> There is a risk that migrating a running guest between versions
>> with and without this patch will lose or gain a single timer
>> interrupt during the migration process.  The only case where
> Can you elaborate on how exactly this can happen? Do not see it.
> 

KVM 8254: In the corrected model, when the count expires, the model
briefly pulses output low and then high again, with the low to high
transition being what triggers the interrupt.  In the old model,
when the count expires, the model expects the output line
to already be low, and briefly pulses it high (triggering the
interrupt) and then low again.  But if the line was already
high (because it migrated from the corrected model),
this won't generate a new leading edge (low to high) and won't
trigger a new interrupt (the first post-back-migration pulse turns
into a simple trailing edge instead of a pulse).

Unless there is something I'm missing?

The qemu 8254 model actually models each edge at independent
clock ticks instead of combining both into a very brief pulse at one time.
I've found it handy to draw out old and new timing diagrams on paper
(for each mode), and then carefully think about what happens with respect
to levels and edges when you transition back and forth between old and
new models at various points in the timing cycle.  (Note I've spent more
time examining the qemu models rather than the kvm models.)

>> this is likely to be serious is probably losing a single-shot (mode 4)
>> interrupt, but if my understanding of how things work is good, then
>> that should only be possible if a whole slew of conditions are
>> all met:
>> 
>>  1. The guest is configured to run in a "tickless" mode (like
>>     modern Linux).
>>  2. The guest is for some reason still using the i8254 rather
>>     than something more modern like an HPET.  (The combination
>>     of 1 and 2 should be rare.)
> This is not so rare. For performance reason it is better to not have
> HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.

In a later email you mention that Linux prefers a timer in the APIC.
I don't know much about the APIC (advanced interrupt controller), and
wasn't
even aware had it's own timer.

The big question is if we can safely just fix the i825* models, or if
we need something more subtle to avoid breaking commonly used guests
like modern Linux (support both corrected and old models,
or only fix IRQ2 instead of all IRQs, or similar subtlety).

> 
>>  3. The migration is going from a fixed version back to the
>>     old version.  (Not sure how common this is, but it should
>>     be rarer than migrating from old to new.)
>>  4. There are not going to be any "timely" events/interrupts
>>     (keyboard, network, process sleeps, etc) that cause the guest
>>     to reset the PIT mode 4 one-shot counter "soon enough".
>> 
>> This combination should be rare enough that more complicated
>> solutions are not worth the effort.
>> 
>> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
>> ---
>>  arch/x86/kvm/i8254.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>> index c1d30b2..cd4ec60 100644
>> --- a/arch/x86/kvm/i8254.c
>> +++ b/arch/x86/kvm/i8254.c
>> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
>>  	}
>>  	spin_unlock(&ps->inject_lock);
>>  	if (inject) {
>> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>> +		/* Clear previous interrupt, then create a rising
>> +		 * edge to request another interupt, and leave it at
>> +		 * level=1 until time to inject another one.
>> +		 */
>>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
>> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>>  
>>  		/*
>>  		 * Provides NMI watchdog support via Virtual Wire mode.
>> -- 
>> 1.7.10.2.484.gcd07cc5
> 
> --
> 			Gleb.
Gleb Natapov - Jan. 8, 2013, 7:31 a.m.
On Mon, Jan 07, 2013 at 06:17:22PM -0600, mmogilvi@miniinfo.net wrote:
> On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <gleb@redhat.com> wrote:
> > On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> >> Reading the spec, it is clear that most modes normally leave the IRQ
> >> output line high, and only pulse it low to generate a leading edge.
> >> Especially the most commonly used mode 2.
> >> 
> >> The KVM i8254 model does not try to emulate the duration of the pulse at
> >> all, so just swap the high/low settings it to leave it high most of
> >> the time.
> >> 
> >> This fix is a prerequisite to improving the i8259 model to handle
> >> the trailing edge of an interupt request as indicated in its spec:
> >> If it gets a trailing edge of an IRQ line before it starts to service
> >> the interrupt, the request should be canceled.
> >> 
> >> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> >> or search the net for 23124406.pdf.
> >> 
> >> Risks:
> >> 
> >> There is a risk that migrating a running guest between versions
> >> with and without this patch will lose or gain a single timer
> >> interrupt during the migration process.  The only case where
> > Can you elaborate on how exactly this can happen? Do not see it.
> > 
> 
> KVM 8254: In the corrected model, when the count expires, the model
> briefly pulses output low and then high again, with the low to high
> transition being what triggers the interrupt.  In the old model,
> when the count expires, the model expects the output line
> to already be low, and briefly pulses it high (triggering the
> interrupt) and then low again.  But if the line was already
> high (because it migrated from the corrected model),
> this won't generate a new leading edge (low to high) and won't
> trigger a new interrupt (the first post-back-migration pulse turns
> into a simple trailing edge instead of a pulse).
> 
> Unless there is something I'm missing?
> 
No, I missed that pic->last_irr/ioapic->irr  will be migrated as 1. But
this means that next interrupt after migration from new to old will
always be lost.  What about clearing pit bit from last_irr/irr before
migration? Should not affect new->new migration and should fix new->old
one. The only problem is that we may need to consult irq routing table
to know how pit is connected to ioapic.

Still do not see how can we gain one interrupt.

> The qemu 8254 model actually models each edge at independent
> clock ticks instead of combining both into a very brief pulse at one time.
> I've found it handy to draw out old and new timing diagrams on paper
> (for each mode), and then carefully think about what happens with respect
> to levels and edges when you transition back and forth between old and
> new models at various points in the timing cycle.  (Note I've spent more
> time examining the qemu models rather than the kvm models.)
> 
Yes, drawing it definitely helps :)

> >> this is likely to be serious is probably losing a single-shot (mode 4)
> >> interrupt, but if my understanding of how things work is good, then
> >> that should only be possible if a whole slew of conditions are
> >> all met:
> >> 
> >>  1. The guest is configured to run in a "tickless" mode (like
> >>     modern Linux).
> >>  2. The guest is for some reason still using the i8254 rather
> >>     than something more modern like an HPET.  (The combination
> >>     of 1 and 2 should be rare.)
> > This is not so rare. For performance reason it is better to not have
> > HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.
> 
> In a later email you mention that Linux prefers a timer in the APIC.
> I don't know much about the APIC (advanced interrupt controller), and
> wasn't
> even aware had it's own timer.
> 
> The big question is if we can safely just fix the i825* models, or if
> we need something more subtle to avoid breaking commonly used guests
> like modern Linux (support both corrected and old models,
> or only fix IRQ2 instead of all IRQs, or similar subtlety).
Migration may happen while guest is running firmaware. Who knows what
those are doing. If the fix is as easy as I described above we should go
for it.

> 
> > 
> >>  3. The migration is going from a fixed version back to the
> >>     old version.  (Not sure how common this is, but it should
> >>     be rarer than migrating from old to new.)
> >>  4. There are not going to be any "timely" events/interrupts
> >>     (keyboard, network, process sleeps, etc) that cause the guest
> >>     to reset the PIT mode 4 one-shot counter "soon enough".
> >> 
> >> This combination should be rare enough that more complicated
> >> solutions are not worth the effort.
> >> 
> >> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> >> ---
> >>  arch/x86/kvm/i8254.c | 6 +++++-
> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> >> index c1d30b2..cd4ec60 100644
> >> --- a/arch/x86/kvm/i8254.c
> >> +++ b/arch/x86/kvm/i8254.c
> >> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
> >>  	}
> >>  	spin_unlock(&ps->inject_lock);
> >>  	if (inject) {
> >> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> >> +		/* Clear previous interrupt, then create a rising
> >> +		 * edge to request another interupt, and leave it at
> >> +		 * level=1 until time to inject another one.
> >> +		 */
> >>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> >> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> >>  
> >>  		/*
> >>  		 * Provides NMI watchdog support via Virtual Wire mode.
> >> -- 
> >> 1.7.10.2.484.gcd07cc5
> > 
> > --
> > 			Gleb.

--
			Gleb.
Matthew Ogilvie - Jan. 11, 2013, 6:40 a.m.
On Tue, Jan 08, 2013 at 09:31:19AM +0200, Gleb Natapov wrote:
> On Mon, Jan 07, 2013 at 06:17:22PM -0600, mmogilvi@miniinfo.net wrote:
> > On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <gleb@redhat.com> wrote:
> > > On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> > >> Reading the spec, it is clear that most modes normally leave the IRQ
> > >> output line high, and only pulse it low to generate a leading edge.
> > >> Especially the most commonly used mode 2.
> > >> 
> > >> The KVM i8254 model does not try to emulate the duration of the pulse at
> > >> all, so just swap the high/low settings it to leave it high most of
> > >> the time.
> > >> 
> > >> This fix is a prerequisite to improving the i8259 model to handle
> > >> the trailing edge of an interupt request as indicated in its spec:
> > >> If it gets a trailing edge of an IRQ line before it starts to service
> > >> the interrupt, the request should be canceled.
> > >> 
> > >> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> > >> or search the net for 23124406.pdf.
> > >> 
> > >> Risks:
> > >> 
> > >> There is a risk that migrating a running guest between versions
> > >> with and without this patch will lose or gain a single timer
> > >> interrupt during the migration process.  The only case where
> > > Can you elaborate on how exactly this can happen? Do not see it.
> > > 
> > 
> > KVM 8254: In the corrected model, when the count expires, the model
> > briefly pulses output low and then high again, with the low to high
> > transition being what triggers the interrupt.  In the old model,
> > when the count expires, the model expects the output line
> > to already be low, and briefly pulses it high (triggering the
> > interrupt) and then low again.  But if the line was already
> > high (because it migrated from the corrected model),
> > this won't generate a new leading edge (low to high) and won't
> > trigger a new interrupt (the first post-back-migration pulse turns
> > into a simple trailing edge instead of a pulse).
> > 
> > Unless there is something I'm missing?
> > 
> No, I missed that pic->last_irr/ioapic->irr  will be migrated as 1. But
> this means that next interrupt after migration from new to old will
> always be lost.  What about clearing pit bit from last_irr/irr before
> migration? Should not affect new->new migration and should fix new->old
> one. The only problem is that we may need to consult irq routing table
> to know how pit is connected to ioapic.

We should not clear both last_irr and irr.  That
cancels the interrupt early if the CPU hasn't started servicing
it yet:  If the guest CPU has interrupts disabled
and hasn't begun to service the interrupt, a new->new migration could
lose the interrupt much earlier in the countdown cycle than it otherwise
might be lost.

Potentially we could clear the last_irr bit only.  This effectively
makes migration behave like the old unfixed code.  But if this is
considered acceptable, I would be more inclined to not fix IRQ0 at all,
rather than make IRQ0 work subtly differently normally vs during
migration.  One of my earlier patch series (qemu v7 dated Nov 25)
attempts to use basically this strategy for the qemu model, at
least in the short term (and then potentially fix it properly
in the longer term), although the details of series might
be tweaked.

Or the minimal-risk strategy is to ONLY fix the cascade IRQ2's
trailing edge [the original i825* problem that spawned this whole
thing], leaving all other IRQs as-is.

> 
> Still do not see how can we gain one interrupt.

In most cases I don't think we get an extra interrupt from
a direct fix.  But some kinds of attempts to work around missing
interrupts during migration can cause cause extra interrupts in other
cases.  In the old qemu model (but perhaps not kvm), the mode 2 leading
edge occurs one clock tick earlier than it ought to.  So if you
try to be tricky with adjusting levels during migration, you
may introduce possible cases where it gets an interrupt in
the old model, then migrates and gets another interrupt one tick
later in the new model...

Also, it occurs to me that maybe there might be an off-by-one issue in
the kvm model of mode 2 that ought to be fixed as well?  Although the  
zero length pulse in kvm suggests that one-off issues in counters do  
not matter.  In the qemu model, the leading edge of OUT in mode 2
moves by one tick...

> 
> > The qemu 8254 model actually models each edge at independent
> > clock ticks instead of combining both into a very brief pulse at one time.
> > I've found it handy to draw out old and new timing diagrams on paper
> > (for each mode), and then carefully think about what happens with respect
> > to levels and edges when you transition back and forth between old and
> > new models at various points in the timing cycle.  (Note I've spent more
> > time examining the qemu models rather than the kvm models.)
> > 
> Yes, drawing it definitely helps :)
> 
> > >> this is likely to be serious is probably losing a single-shot (mode 4)
> > >> interrupt, but if my understanding of how things work is good, then
> > >> that should only be possible if a whole slew of conditions are
> > >> all met:
> > >> 
> > >>  1. The guest is configured to run in a "tickless" mode (like
> > >>     modern Linux).
> > >>  2. The guest is for some reason still using the i8254 rather
> > >>     than something more modern like an HPET.  (The combination
> > >>     of 1 and 2 should be rare.)
> > > This is not so rare. For performance reason it is better to not have
> > > HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.
> > 
> > In a later email you mention that Linux prefers a timer in the APIC.
> > I don't know much about the APIC (advanced interrupt controller), and
> > wasn't
> > even aware had it's own timer.
> > 
> > The big question is if we can safely just fix the i825* models, or if
> > we need something more subtle to avoid breaking commonly used guests
> > like modern Linux (support both corrected and old models,
> > or only fix IRQ2 instead of all IRQs, or similar subtlety).
> Migration may happen while guest is running firmaware. Who knows what
> those are doing. If the fix is as easy as I described above we should go
> for it.
> 
> > 
> > > 
> > >>  3. The migration is going from a fixed version back to the
> > >>     old version.  (Not sure how common this is, but it should
> > >>     be rarer than migrating from old to new.)
> > >>  4. There are not going to be any "timely" events/interrupts
> > >>     (keyboard, network, process sleeps, etc) that cause the guest
> > >>     to reset the PIT mode 4 one-shot counter "soon enough".
> > >> 
> > >> This combination should be rare enough that more complicated
> > >> solutions are not worth the effort.
> > >> 
> > >> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> > >> ---
> > >>  arch/x86/kvm/i8254.c | 6 +++++-
> > >>  1 file changed, 5 insertions(+), 1 deletion(-)
> > >> 
> > >> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > >> index c1d30b2..cd4ec60 100644
> > >> --- a/arch/x86/kvm/i8254.c
> > >> +++ b/arch/x86/kvm/i8254.c
> > >> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
> > >>  	}
> > >>  	spin_unlock(&ps->inject_lock);
> > >>  	if (inject) {
> > >> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > >> +		/* Clear previous interrupt, then create a rising
> > >> +		 * edge to request another interupt, and leave it at
> > >> +		 * level=1 until time to inject another one.
> > >> +		 */
> > >>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> > >> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > >>  
> > >>  		/*
> > >>  		 * Provides NMI watchdog support via Virtual Wire mode.
> > >> -- 
> > >> 1.7.10.2.484.gcd07cc5
> > > 
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.
Gleb Natapov - Jan. 11, 2013, 3:45 p.m.
On Thu, Jan 10, 2013 at 11:40:07PM -0700, Matthew Ogilvie wrote:
> On Tue, Jan 08, 2013 at 09:31:19AM +0200, Gleb Natapov wrote:
> > On Mon, Jan 07, 2013 at 06:17:22PM -0600, mmogilvi@miniinfo.net wrote:
> > > On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <gleb@redhat.com> wrote:
> > > > On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> > > >> Reading the spec, it is clear that most modes normally leave the IRQ
> > > >> output line high, and only pulse it low to generate a leading edge.
> > > >> Especially the most commonly used mode 2.
> > > >> 
> > > >> The KVM i8254 model does not try to emulate the duration of the pulse at
> > > >> all, so just swap the high/low settings it to leave it high most of
> > > >> the time.
> > > >> 
> > > >> This fix is a prerequisite to improving the i8259 model to handle
> > > >> the trailing edge of an interupt request as indicated in its spec:
> > > >> If it gets a trailing edge of an IRQ line before it starts to service
> > > >> the interrupt, the request should be canceled.
> > > >> 
> > > >> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> > > >> or search the net for 23124406.pdf.
> > > >> 
> > > >> Risks:
> > > >> 
> > > >> There is a risk that migrating a running guest between versions
> > > >> with and without this patch will lose or gain a single timer
> > > >> interrupt during the migration process.  The only case where
> > > > Can you elaborate on how exactly this can happen? Do not see it.
> > > > 
> > > 
> > > KVM 8254: In the corrected model, when the count expires, the model
> > > briefly pulses output low and then high again, with the low to high
> > > transition being what triggers the interrupt.  In the old model,
> > > when the count expires, the model expects the output line
> > > to already be low, and briefly pulses it high (triggering the
> > > interrupt) and then low again.  But if the line was already
> > > high (because it migrated from the corrected model),
> > > this won't generate a new leading edge (low to high) and won't
> > > trigger a new interrupt (the first post-back-migration pulse turns
> > > into a simple trailing edge instead of a pulse).
> > > 
> > > Unless there is something I'm missing?
> > > 
> > No, I missed that pic->last_irr/ioapic->irr  will be migrated as 1. But
> > this means that next interrupt after migration from new to old will
> > always be lost.  What about clearing pit bit from last_irr/irr before
> > migration? Should not affect new->new migration and should fix new->old
> > one. The only problem is that we may need to consult irq routing table
> > to know how pit is connected to ioapic.
> 
> We should not clear both last_irr and irr.  That
> cancels the interrupt early if the CPU hasn't started servicing
> it yet:  If the guest CPU has interrupts disabled
> and hasn't begun to service the interrupt, a new->new migration could
> lose the interrupt much earlier in the countdown cycle than it otherwise
> might be lost.
> 
I am talking about last_irr in pic and irr in ioapic. Of course we
shouldn't clear pic->irr on migration. ioapic uses irr to detect edge.

> Potentially we could clear the last_irr bit only.  This effectively
> makes migration behave like the old unfixed code.  But if this is
> considered acceptable, I would be more inclined to not fix IRQ0 at all,
If we do not fix IRQ0 next timer interrupt after migration will always be
lost.

> rather than make IRQ0 work subtly differently normally vs during
> migration.  One of my earlier patch series (qemu v7 dated Nov 25)
> attempts to use basically this strategy for the qemu model, at
> least in the short term (and then potentially fix it properly
> in the longer term), although the details of series might
> be tweaked.
> 
> Or the minimal-risk strategy is to ONLY fix the cascade IRQ2's
> trailing edge [the original i825* problem that spawned this whole
> thing], leaving all other IRQs as-is.
> 
> > 
> > Still do not see how can we gain one interrupt.
> 
> In most cases I don't think we get an extra interrupt from
> a direct fix.  But some kinds of attempts to work around missing
> interrupts during migration can cause cause extra interrupts in other
> cases.  In the old qemu model (but perhaps not kvm), the mode 2 leading
> edge occurs one clock tick earlier than it ought to.  So if you
> try to be tricky with adjusting levels during migration, you
> may introduce possible cases where it gets an interrupt in
> the old model, then migrates and gets another interrupt one tick
> later in the new model...
> 
> Also, it occurs to me that maybe there might be an off-by-one issue in
> the kvm model of mode 2 that ought to be fixed as well?  Although the  
> zero length pulse in kvm suggests that one-off issues in counters do  
> not matter.  In the qemu model, the leading edge of OUT in mode 2
> moves by one tick...
> 
> > 
> > > The qemu 8254 model actually models each edge at independent
> > > clock ticks instead of combining both into a very brief pulse at one time.
> > > I've found it handy to draw out old and new timing diagrams on paper
> > > (for each mode), and then carefully think about what happens with respect
> > > to levels and edges when you transition back and forth between old and
> > > new models at various points in the timing cycle.  (Note I've spent more
> > > time examining the qemu models rather than the kvm models.)
> > > 
> > Yes, drawing it definitely helps :)
> > 
> > > >> this is likely to be serious is probably losing a single-shot (mode 4)
> > > >> interrupt, but if my understanding of how things work is good, then
> > > >> that should only be possible if a whole slew of conditions are
> > > >> all met:
> > > >> 
> > > >>  1. The guest is configured to run in a "tickless" mode (like
> > > >>     modern Linux).
> > > >>  2. The guest is for some reason still using the i8254 rather
> > > >>     than something more modern like an HPET.  (The combination
> > > >>     of 1 and 2 should be rare.)
> > > > This is not so rare. For performance reason it is better to not have
> > > > HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.
> > > 
> > > In a later email you mention that Linux prefers a timer in the APIC.
> > > I don't know much about the APIC (advanced interrupt controller), and
> > > wasn't
> > > even aware had it's own timer.
> > > 
> > > The big question is if we can safely just fix the i825* models, or if
> > > we need something more subtle to avoid breaking commonly used guests
> > > like modern Linux (support both corrected and old models,
> > > or only fix IRQ2 instead of all IRQs, or similar subtlety).
> > Migration may happen while guest is running firmaware. Who knows what
> > those are doing. If the fix is as easy as I described above we should go
> > for it.
> > 
> > > 
> > > > 
> > > >>  3. The migration is going from a fixed version back to the
> > > >>     old version.  (Not sure how common this is, but it should
> > > >>     be rarer than migrating from old to new.)
> > > >>  4. There are not going to be any "timely" events/interrupts
> > > >>     (keyboard, network, process sleeps, etc) that cause the guest
> > > >>     to reset the PIT mode 4 one-shot counter "soon enough".
> > > >> 
> > > >> This combination should be rare enough that more complicated
> > > >> solutions are not worth the effort.
> > > >> 
> > > >> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> > > >> ---
> > > >>  arch/x86/kvm/i8254.c | 6 +++++-
> > > >>  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >> 
> > > >> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > > >> index c1d30b2..cd4ec60 100644
> > > >> --- a/arch/x86/kvm/i8254.c
> > > >> +++ b/arch/x86/kvm/i8254.c
> > > >> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
> > > >>  	}
> > > >>  	spin_unlock(&ps->inject_lock);
> > > >>  	if (inject) {
> > > >> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > > >> +		/* Clear previous interrupt, then create a rising
> > > >> +		 * edge to request another interupt, and leave it at
> > > >> +		 * level=1 until time to inject another one.
> > > >> +		 */
> > > >>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> > > >> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > > >>  
> > > >>  		/*
> > > >>  		 * Provides NMI watchdog support via Virtual Wire mode.
> > > >> -- 
> > > >> 1.7.10.2.484.gcd07cc5
> > > > 
> > > > --
> > > > 			Gleb.
> > 
> > --
> > 			Gleb.

--
			Gleb.
Matthew Ogilvie - Jan. 15, 2013, 9:49 a.m.
On Fri, Jan 11, 2013 at 05:45:28PM +0200, Gleb Natapov wrote:
> On Thu, Jan 10, 2013 at 11:40:07PM -0700, Matthew Ogilvie wrote:
> > On Tue, Jan 08, 2013 at 09:31:19AM +0200, Gleb Natapov wrote:
> > > On Mon, Jan 07, 2013 at 06:17:22PM -0600, mmogilvi@miniinfo.net wrote:
> > > > On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <gleb@redhat.com> wrote:
> > > > > On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> > > > >> Reading the spec, it is clear that most modes normally leave the IRQ
> > > > >> output line high, and only pulse it low to generate a leading edge.
> > > > >> Especially the most commonly used mode 2.
> > > > >> 
> > > > >> The KVM i8254 model does not try to emulate the duration of the pulse at
> > > > >> all, so just swap the high/low settings it to leave it high most of
> > > > >> the time.
> > > > >> 
> > > > >> This fix is a prerequisite to improving the i8259 model to handle
> > > > >> the trailing edge of an interupt request as indicated in its spec:
> > > > >> If it gets a trailing edge of an IRQ line before it starts to service
> > > > >> the interrupt, the request should be canceled.
> > > > >> 
> > > > >> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> > > > >> or search the net for 23124406.pdf.
> > > > >> 
> > > > >> Risks:
> > > > >> 
> > > > >> There is a risk that migrating a running guest between versions
> > > > >> with and without this patch will lose or gain a single timer
> > > > >> interrupt during the migration process.  The only case where
> > > > > Can you elaborate on how exactly this can happen? Do not see it.
> > > > > 
> > > > 
> > > > KVM 8254: In the corrected model, when the count expires, the model
> > > > briefly pulses output low and then high again, with the low to high
> > > > transition being what triggers the interrupt.  In the old model,
> > > > when the count expires, the model expects the output line
> > > > to already be low, and briefly pulses it high (triggering the
> > > > interrupt) and then low again.  But if the line was already
> > > > high (because it migrated from the corrected model),
> > > > this won't generate a new leading edge (low to high) and won't
> > > > trigger a new interrupt (the first post-back-migration pulse turns
> > > > into a simple trailing edge instead of a pulse).
> > > > 
> > > > Unless there is something I'm missing?
> > > > 
> > > No, I missed that pic->last_irr/ioapic->irr  will be migrated as 1. But
> > > this means that next interrupt after migration from new to old will
> > > always be lost.  What about clearing pit bit from last_irr/irr before
> > > migration? Should not affect new->new migration and should fix new->old
> > > one. The only problem is that we may need to consult irq routing table
> > > to know how pit is connected to ioapic.
> > 
> > We should not clear both last_irr and irr.  That
> > cancels the interrupt early if the CPU hasn't started servicing
> > it yet:  If the guest CPU has interrupts disabled
> > and hasn't begun to service the interrupt, a new->new migration could
> > lose the interrupt much earlier in the countdown cycle than it otherwise
> > might be lost.
> > 
> I am talking about last_irr in pic and irr in ioapic. Of course we
> shouldn't clear pic->irr on migration. ioapic uses irr to detect edge.

I probably need to look into the details of how the ioapic is supposed
to work.  Sigh.

> 
> > Potentially we could clear the last_irr bit only.  This effectively
> > makes migration behave like the old unfixed code.  But if this is
> > considered acceptable, I would be more inclined to not fix IRQ0 at all,
> If we do not fix IRQ0 next timer interrupt after migration will always be
> lost.

Obviously true if the trailing edge consumption logic for the
IRQ0 line is corrected in the 8259.  But if it isn't:

I probably could have been clearer that we could leave the 8254 unchanged,
and adjust the 8259 fix to leave IRQ0 as-is (with the incorrect handling
of a trailing edge - only other IRQ lines would be fixed), and then
timer interrupts would just work exactly as they do now.
This minimal fix would would probably be the lowest risk of
breaking something that currently works, but I don't know if
people would go for a patch that intentionally leaves in known
breakage in IRQ0...  This is option 1 below.

------

If we cleared last_irr in the qemu model during migration,
we risk getting an EXTRA interrupt when
migrating mode 4 (misremembered as mode 2 in an earlier
email below) from old to new models.  (If the sequence goes: line
is asserted, guest migrates, interrupt is handled [all in less than
a 8254 clock tick (approx 1 MHz)], then the off-by-one edge
in the new model triggers another leading edge...)  In contrast,
this might not be a risk in the KVM model as currently implemented.

Maybe this objection is not important, and you like option 4
listed below.

-------

So I'm still not sure what overall fix strategy the main qemu and kvm
maintainers would like the best.  There are several possibilities,
but they all seem to have notable downsides.

Some possible fix strategies:

1. Only fix the IRQ2 (cascade) line in 8259.  Leave trailing edge
   logic for other lines as-is, and don't touch the 8254.  (Or
   similar: only fix lines that don't risk breaking
   anything [everything except IRQ0?])
    * For: This has no risk of breaking anything that currently
      works, and fixes my own problem.
    * Against: It somehow feels wrong to intentionally leave incorrect
      models for IRQ0 (both generation and consumption).  It also feels
      wrong to add more assumptions about PC-style
      slave/master PIC and PIT wiring in the device model code.
2. Just fix the 8254 and 8259 directly, with nothing fancy for
   compatibility.  My v8 (Dec 16) patch series does this.
    * For: simple
    * For: A single lost interrupt during migration between versions
      is probably not serious.
    * Against: Unless it looses a one-shot mode interrupt, like with
      a "tickless" Linux kernel.  (Although if the guest gets
      some other interrupt "soon", and it resets the 8254
      as part of handling that interrupt, then maybe
      this isn't serious either.)
    * For: But on the other hand, I get the impression that
      anything new enough to do such a "tickless" thing is
      likely to prefer using some other timer besides the 8254.  But
      maybe this impression is wrong?
3. Fix it in stages, with years between stages, where it can safely
   migrate between adjacent stages, but not as safely migrate between
   skipped stages.  My v7 (Nov 25) patch series was attempting one way of
   doing this, although there are probably other possibilities.
4. Fix the 8254 and 8259, but add in some migration logic to force
   IRQ0's pic->last_irr bit to 0 when migrating.
    * For: Avoid lost interrupts.
    * Against: This basically sets the migration state as if we did
      not fix the IRQ0 line trailing edge handling consumption
      logic in the 8259 at all, which is subtly different from
      normal operation.  I would tend prefer to avoid making
      post-migration state any different from normal
      operation state.
    * For: But maybe this could be the first stage of a
      multi-stage approach, where we can eventually
      eliminate the special logic in the migration someday
      in the future (when migrating to versions with the old
      models is no longer something we care about).  So the
      ugly inconsistency would only be in the code for a
      few years.  (See also strategy 3 above.)
    * Against: In the qemu model (but not the kvm model), clearing
      last_irr may occasionally cause an EXTRA interrupt for mode
      4 during part of the output cycle.  [Sequence: Old version
      raises IRQ0 one clock tick early, guest migrates, interrupt
      is handled, then new version raises IRQ0 at the correct
      time...]
    * Question: How to handle the migration hack for the
      KVM model?  It may needs to do this correction both in
      the kernel and in qemu, which feels ugly.   Or potentially
      add new ioctl()'s (and fallback code if the ioctl is
      missing) to support both the "compatible" state and
      the "actual" state, but then that is drifting towards
      option 5 below:
5. Make the IRQ0 generation/consumption behavior user-selectable
   with a command line option (the cross-version migration "promise"
   is allowed to be broken if qemu is invoked with different options).
   Sometime in the future, perhaps change the default, and/or
   eliminate the old option.
    * For: By defaulting to current model behavior, we could avoid
      breaking anything that currently depends on current
      behavior in hard-to-forsee ways.
    * Against: More complicated code, and depending on details
      may either be slightly slower (more conditional branches in common
      code paths), or even more complicated code (another level
      of abstraction and more C-style virtual methods...).
    * Against: The non-default case probably wont be tested much,
      and risks bit-rotting.
    * Question: What should the command line option look like?


Anyone have any preference for any of these?  Or other alternatives?

-------------------
Note that my tentative assignment of requirement priorities is
as described below.

High priority goals:

1. Handle the "trailing edge" of IRQ2 in the qemu 8259 model correctly,
   specifically to fix the case of a guest (Microport UNIX) that
   dynamically manipulated the IMR (mask register) of the slave
   while an IRQ line is asserted and unhandled.
2. Do not break any common use cases, at least not badly (perhaps
   a single lost or gained interrupt would be OK as long as it
   doesn't cause longer term breakage in a guest).
    * all device models should still work
    * basic operation of modern/common guest operating systems
    * cross version migration when KVM is in use

Moderate priority goals (I wouldn't mind if these
weren't met, but I would like to try to meet them):

3. Support cross version migration with plain qemu (not KVM). (or
   is this high priority like KVM migration?)
4. Completely fix all IRQ lines in both 8259 models.  (I only really need
   the cascade line (IRQ2) to be fixed.)
5. Update the KVM models to continue to match the qemu models, at least
   to the same extent they are similar now.  (I don't actually care about
   the KVM 825* models much, but I would like to avoid making them
   any less consistent.  [see also 7 below])
6. Improve the output logic of 8254 models to match the spec better,
   instead of depending on the broken trailing edge logic in the
   8259 models in order to work.  And as long as it is changing,
   fix some other small related details, like the exact clock
   tick on which edges occur, some GATE stuff, etc.

Low priority (non)-goals (I'm not inclined to worry about these):

7. Enhance the KVM 8254 model to be able to model the leading and trailing
   edges of the output line at distinct times (similar to the
   qemu 8254 model).  Perhaps it only does this when it isn't
   trying to "catch up" with missed interrupts.  It could still
   do zero-time pulses when catching up.
8. Accurately model the 8254's GATE line.  PC hardware hardwires it
   in the main channel 0, so modeling changes to it isn't
   particularly important.  Currently the models don't have
   a good way to accurately model a paused countdown when GATE is low.
    * (exception): However, there is already some imperfect handling
      of GATE in the 8254 models, and the speaker channel has a
      software-controlled GATE.  So I don't want to make GATE any
      worse than it already is, and if I can make partial improvements
      by changing just a couple of lines, I'm inclined to do so as
      part of item 6 above.


- Matthew Ogilvie

> 
> > rather than make IRQ0 work subtly differently normally vs during
> > migration.  One of my earlier patch series (qemu v7 dated Nov 25)
> > attempts to use basically this strategy for the qemu model, at
> > least in the short term (and then potentially fix it properly
> > in the longer term), although the details of series might
> > be tweaked.
> > 
> > Or the minimal-risk strategy is to ONLY fix the cascade IRQ2's
> > trailing edge [the original i825* problem that spawned this whole
> > thing], leaving all other IRQs as-is.
> > 
> > > 
> > > Still do not see how can we gain one interrupt.
> > 
> > In most cases I don't think we get an extra interrupt from
> > a direct fix.  But some kinds of attempts to work around missing
> > interrupts during migration can cause cause extra interrupts in other
> > cases.  In the old qemu model (but perhaps not kvm), the mode 2 leading
> > edge occurs one clock tick earlier than it ought to.  So if you
> > try to be tricky with adjusting levels during migration, you
> > may introduce possible cases where it gets an interrupt in
> > the old model, then migrates and gets another interrupt one tick
> > later in the new model...
> > 
> > Also, it occurs to me that maybe there might be an off-by-one issue in
> > the kvm model of mode 2 that ought to be fixed as well?  Although the  
> > zero length pulse in kvm suggests that one-off issues in counters do  
> > not matter.  In the qemu model, the leading edge of OUT in mode 2
> > moves by one tick...
> > 
> > > 
> > > > The qemu 8254 model actually models each edge at independent
> > > > clock ticks instead of combining both into a very brief pulse at one time.
> > > > I've found it handy to draw out old and new timing diagrams on paper
> > > > (for each mode), and then carefully think about what happens with respect
> > > > to levels and edges when you transition back and forth between old and
> > > > new models at various points in the timing cycle.  (Note I've spent more
> > > > time examining the qemu models rather than the kvm models.)
> > > > 
> > > Yes, drawing it definitely helps :)
> > > 
> > > > >> this is likely to be serious is probably losing a single-shot (mode 4)
> > > > >> interrupt, but if my understanding of how things work is good, then
> > > > >> that should only be possible if a whole slew of conditions are
> > > > >> all met:
> > > > >> 
> > > > >>  1. The guest is configured to run in a "tickless" mode (like
> > > > >>     modern Linux).
> > > > >>  2. The guest is for some reason still using the i8254 rather
> > > > >>     than something more modern like an HPET.  (The combination
> > > > >>     of 1 and 2 should be rare.)
> > > > > This is not so rare. For performance reason it is better to not have
> > > > > HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.
> > > > 
> > > > In a later email you mention that Linux prefers a timer in the APIC.
> > > > I don't know much about the APIC (advanced interrupt controller), and
> > > > wasn't
> > > > even aware had it's own timer.
> > > > 
> > > > The big question is if we can safely just fix the i825* models, or if
> > > > we need something more subtle to avoid breaking commonly used guests
> > > > like modern Linux (support both corrected and old models,
> > > > or only fix IRQ2 instead of all IRQs, or similar subtlety).
> > > Migration may happen while guest is running firmaware. Who knows what
> > > those are doing. If the fix is as easy as I described above we should go
> > > for it.
> > > 
> > > > 
> > > > > 
> > > > >>  3. The migration is going from a fixed version back to the
> > > > >>     old version.  (Not sure how common this is, but it should
> > > > >>     be rarer than migrating from old to new.)
> > > > >>  4. There are not going to be any "timely" events/interrupts
> > > > >>     (keyboard, network, process sleeps, etc) that cause the guest
> > > > >>     to reset the PIT mode 4 one-shot counter "soon enough".
> > > > >> 
> > > > >> This combination should be rare enough that more complicated
> > > > >> solutions are not worth the effort.
> > > > >> 
> > > > >> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> > > > >> ---
> > > > >>  arch/x86/kvm/i8254.c | 6 +++++-
> > > > >>  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > >> 
> > > > >> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > > > >> index c1d30b2..cd4ec60 100644
> > > > >> --- a/arch/x86/kvm/i8254.c
> > > > >> +++ b/arch/x86/kvm/i8254.c
> > > > >> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
> > > > >>  	}
> > > > >>  	spin_unlock(&ps->inject_lock);
> > > > >>  	if (inject) {
> > > > >> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > > > >> +		/* Clear previous interrupt, then create a rising
> > > > >> +		 * edge to request another interupt, and leave it at
> > > > >> +		 * level=1 until time to inject another one.
> > > > >> +		 */
> > > > >>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> > > > >> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > > > >>  
> > > > >>  		/*
> > > > >>  		 * Provides NMI watchdog support via Virtual Wire mode.
> > > > >> -- 
> > > > >> 1.7.10.2.484.gcd07cc5
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > 
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.

Patch

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index c1d30b2..cd4ec60 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -290,8 +290,12 @@  static void pit_do_work(struct kthread_work *work)
 	}
 	spin_unlock(&ps->inject_lock);
 	if (inject) {
-		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
+		/* Clear previous interrupt, then create a rising
+		 * edge to request another interupt, and leave it at
+		 * level=1 until time to inject another one.
+		 */
 		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
+		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
 
 		/*
 		 * Provides NMI watchdog support via Virtual Wire mode.