Patchwork [v5,5/6] i8259: fix so that dropping IRQ level always clears the interrupt request

login
register
mail settings
Submitter Matthew Ogilvie
Date Sept. 10, 2012, 1:27 a.m.
Message ID <1347240466-6152-6-git-send-email-mmogilvi_qemu@miniinfo.net>
Download mbox | patch
Permalink /patch/182773/
State New
Headers show

Comments

Matthew Ogilvie - Sept. 10, 2012, 1:27 a.m.
Intel's definition of "edge triggered" means: "asserted with a
low-to-high transition at the time an interrupt is registered and
then kept high until the interrupt is served via one of the
EOI mechanisms or goes away unhandled."

So the only difference between edge triggered and level triggered
is in the leading edge, with no difference in the trailing edge.

This bug manifested itself when the guest was Microport UNIX
System V/386 v2.1 (ca. 1987), because it would sometimes mask
off IRQ14 in the slave IMR after it had already been asserted.
The master would still try to deliver an interrupt even though
IRQ2 had dropped again, resulting in a spurious interupt
(IRQ15) and a panicked kernel.

Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
---

If you missed the previous thread about this, see
http://www.mail-archive.com/qemu-devel@nongnu.org/msg129071.html

 hw/i8259.c | 1 +
 1 file changed, 1 insertion(+)
Avi Kivity - Sept. 10, 2012, 8:56 a.m.
On 09/10/2012 04:27 AM, Matthew Ogilvie wrote:
> Intel's definition of "edge triggered" means: "asserted with a
> low-to-high transition at the time an interrupt is registered and
> then kept high until the interrupt is served via one of the
> EOI mechanisms or goes away unhandled."
> 
> So the only difference between edge triggered and level triggered
> is in the leading edge, with no difference in the trailing edge.

Hard to believe.  So an edge while cpu interrupts are disabled is ignored?

> 
> This bug manifested itself when the guest was Microport UNIX
> System V/386 v2.1 (ca. 1987), because it would sometimes mask
> off IRQ14 in the slave IMR after it had already been asserted.
> The master would still try to deliver an interrupt even though
> IRQ2 had dropped again, resulting in a spurious interupt
> (IRQ15) and a panicked kernel.

This is something else.  It means that setting an edge must happen after
IMR is cleared to be picked up.  But this is not what the patch is doing.

> 
> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> ---
> 
> If you missed the previous thread about this, see
> http://www.mail-archive.com/qemu-devel@nongnu.org/msg129071.html
> 
>  hw/i8259.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/i8259.c b/hw/i8259.c
> index 6587666..c011787 100644
> --- a/hw/i8259.c
> +++ b/hw/i8259.c
> @@ -157,6 +157,7 @@ static void pic_set_irq(void *opaque, int irq, int level)
>              }
>              s->last_irr |= mask;
>          } else {
> +            s->irr &= ~mask;
>              s->last_irr &= ~mask;
>          }
>      }
>
Jan Kiszka - Sept. 10, 2012, 9:09 a.m.
On 2012-09-10 10:56, Avi Kivity wrote:
> On 09/10/2012 04:27 AM, Matthew Ogilvie wrote:
>> Intel's definition of "edge triggered" means: "asserted with a
>> low-to-high transition at the time an interrupt is registered and
>> then kept high until the interrupt is served via one of the
>> EOI mechanisms or goes away unhandled."
>>
>> So the only difference between edge triggered and level triggered
>> is in the leading edge, with no difference in the trailing edge.
> 
> Hard to believe.  So an edge while cpu interrupts are disabled is ignored?

No, this is about the PIC, not the CPU interrupt inputs.

Matthew, did you verify this on real hardware by reading back the IRR as
I suggested?

Jan
Avi Kivity - Sept. 10, 2012, 9:18 a.m.
On 09/10/2012 12:09 PM, Jan Kiszka wrote:
> On 2012-09-10 10:56, Avi Kivity wrote:
>> On 09/10/2012 04:27 AM, Matthew Ogilvie wrote:
>>> Intel's definition of "edge triggered" means: "asserted with a
>>> low-to-high transition at the time an interrupt is registered and
>>> then kept high until the interrupt is served via one of the
>>> EOI mechanisms or goes away unhandled."
>>>
>>> So the only difference between edge triggered and level triggered
>>> is in the leading edge, with no difference in the trailing edge.
>> 
>> Hard to believe.  So an edge while cpu interrupts are disabled is ignored?
> 
> No, this is about the PIC, not the CPU interrupt inputs.

I see, the interrupt is still sent to the processor; but IRR reflects
that status of the input line, not a "pending interrupt" status.

Will this survive live migration?  If we clear IRR, then we must rely on
the other end to remember the IRQ, but if processor interrupts are
disabled there won't be an INTACK and the signal is lost.
Jan Kiszka - Sept. 10, 2012, 9:33 a.m.
On 2012-09-10 11:18, Avi Kivity wrote:
> On 09/10/2012 12:09 PM, Jan Kiszka wrote:
>> On 2012-09-10 10:56, Avi Kivity wrote:
>>> On 09/10/2012 04:27 AM, Matthew Ogilvie wrote:
>>>> Intel's definition of "edge triggered" means: "asserted with a
>>>> low-to-high transition at the time an interrupt is registered and
>>>> then kept high until the interrupt is served via one of the
>>>> EOI mechanisms or goes away unhandled."
>>>>
>>>> So the only difference between edge triggered and level triggered
>>>> is in the leading edge, with no difference in the trailing edge.
>>>
>>> Hard to believe.  So an edge while cpu interrupts are disabled is ignored?
>>
>> No, this is about the PIC, not the CPU interrupt inputs.
> 
> I see, the interrupt is still sent to the processor; but IRR reflects
> that status of the input line, not a "pending interrupt" status.
> 
> Will this survive live migration?  If we clear IRR, then we must rely on
> the other end to remember the IRQ, but if processor interrupts are
> disabled there won't be an INTACK and the signal is lost.

We clear the IRR as there is nothing to deliver to the CPU anymore. No
IRQ source will drop its line as long as there is a reason for the IRQ,
I checked the edge-using devices. So we can't lose anything.

Jan
Maciej W. Rozycki - Sept. 10, 2012, 1:09 p.m.
On Mon, 10 Sep 2012, Avi Kivity wrote:

> >>> So the only difference between edge triggered and level triggered
> >>> is in the leading edge, with no difference in the trailing edge.
> >> 
> >> Hard to believe.  So an edge while cpu interrupts are disabled is ignored?

 Please note that x86 CPU's INT input is level-triggered, not 
edge-triggered -- I mean the "naked" input to the core, not anything that 
may be presented to the outside world by any local APIC present (IOW the 
line at the *output* of the local APIC where one is present).

 There's more confusion about the ExtINTA mode (even the name itself is 
spelled differently as ExtINTA vs ExtINT across various documents) as its 
trigger mode is implied and not configurable with the vector/redirection 
table entry's trigger-mode bit and various APIC implementations treat it 
differently and hardwire as either edge-triggered or level-triggered as 
the designers saw fit.

> > No, this is about the PIC, not the CPU interrupt inputs.
> 
> I see, the interrupt is still sent to the processor; but IRR reflects
> that status of the input line, not a "pending interrupt" status.

 Not really, this is still a "pending interrupt" status.

 For level-triggered inputs the state of IRR bits do indeed follow the 
respective IRx inputs (taking the IMR into account).  For edge-triggered 
inputs the relevant IRR bit is set by a leading edge on its corresponding 
IRx input and cleared when the interrupt is acknowledged (either with an 
INTA bus cycle or by a data read bus cycle issued to the PIC armed with an 
OCW3 that has had the POLL command bit set) OR with a trailing edge on IRx 
(again, all this takes the IMR into account).  At this point another 
leading edge is required for the IRR bit to be set again, that is merely 
keeping the IRx input's level active won't trigger another interrupt.

  Maciej
Matthew Ogilvie - Sept. 11, 2012, 4:32 a.m.
On Mon, Sep 10, 2012 at 11:09:27AM +0200, Jan Kiszka wrote:
> On 2012-09-10 10:56, Avi Kivity wrote:
> > On 09/10/2012 04:27 AM, Matthew Ogilvie wrote:
> >> Intel's definition of "edge triggered" means: "asserted with a
> >> low-to-high transition at the time an interrupt is registered and
> >> then kept high until the interrupt is served via one of the
> >> EOI mechanisms or goes away unhandled."
> >>
> >> So the only difference between edge triggered and level triggered
> >> is in the leading edge, with no difference in the trailing edge.
> > 
> > Hard to believe.  So an edge while cpu interrupts are disabled is ignored?
> 
> No, this is about the PIC, not the CPU interrupt inputs.
> 
> Matthew, did you verify this on real hardware by reading back the IRR as
> I suggested?
> 
> Jan

I hadn't before, but now that I've checked, it's as expected:

-----------
Real hardware [Pentium 4]:

  cmdRead unmask IRR=4005 mask IRR=4001 sti unmask irq14 IRR=0001 DONE

[I also see a final IRR=0000 occasionally.  Probably just happened to
ask it while the timer (IRQ0) line is low (without the new understanding
of the trailing edge of an edge triggered interrupt, this would
have been confusing).  I have most IRQ's (including
timer) masked off.]

-----------
Unpatched qemu:

  cmdRead unmask IRR=4015 mask IRR=4015 sti irq15 unmask IRR=4015 DONE

[Presumably IRQ4 (0x10 - qemu's serial device model?) had a transient
edge during initialization, but had been masked off even before I
masked it off?]

-----------
Patched qemu:

  cmdRead unmask IRR=4005 mask IRR=4001 sti unmask irq14 IRR=0001 DONE


          - Matthew
Jan Kiszka - Sept. 11, 2012, 9:05 a.m.
On 2012-09-11 06:32, Matthew Ogilvie wrote:
> On Mon, Sep 10, 2012 at 11:09:27AM +0200, Jan Kiszka wrote:
>> On 2012-09-10 10:56, Avi Kivity wrote:
>>> On 09/10/2012 04:27 AM, Matthew Ogilvie wrote:
>>>> Intel's definition of "edge triggered" means: "asserted with a
>>>> low-to-high transition at the time an interrupt is registered and
>>>> then kept high until the interrupt is served via one of the
>>>> EOI mechanisms or goes away unhandled."
>>>>
>>>> So the only difference between edge triggered and level triggered
>>>> is in the leading edge, with no difference in the trailing edge.
>>>
>>> Hard to believe.  So an edge while cpu interrupts are disabled is ignored?
>>
>> No, this is about the PIC, not the CPU interrupt inputs.
>>
>> Matthew, did you verify this on real hardware by reading back the IRR as
>> I suggested?
>>
>> Jan
> 
> I hadn't before, but now that I've checked, it's as expected:
> 
> -----------
> Real hardware [Pentium 4]:
> 
>   cmdRead unmask IRR=4005 mask IRR=4001 sti unmask irq14 IRR=0001 DONE
> 
> [I also see a final IRR=0000 occasionally.  Probably just happened to
> ask it while the timer (IRQ0) line is low (without the new understanding
> of the trailing edge of an edge triggered interrupt, this would
> have been confusing).  I have most IRQ's (including
> timer) masked off.]
> 
> -----------
> Unpatched qemu:
> 
>   cmdRead unmask IRR=4015 mask IRR=4015 sti irq15 unmask IRR=4015 DONE
> 
> [Presumably IRQ4 (0x10 - qemu's serial device model?) had a transient
> edge during initialization, but had been masked off even before I
> masked it off?]
> 
> -----------
> Patched qemu:
> 
>   cmdRead unmask IRR=4005 mask IRR=4001 sti unmask irq14 IRR=0001 DONE
> 

I think this is convincing, maybe worth documenting in the related
changelogs of QEMU and KVM. Avi, doubts remaining?

Jan
Avi Kivity - Sept. 11, 2012, 12:57 p.m.
On 09/10/2012 04:09 PM, Maciej W. Rozycki wrote:
> 
>> > No, this is about the PIC, not the CPU interrupt inputs.
>> 
>> I see, the interrupt is still sent to the processor; but IRR reflects
>> that status of the input line, not a "pending interrupt" status.
> 
>  Not really, this is still a "pending interrupt" status.
> 
>  For level-triggered inputs the state of IRR bits do indeed follow the 
> respective IRx inputs (taking the IMR into account).  For edge-triggered 
> inputs the relevant IRR bit is set by a leading edge on its corresponding 
> IRx input and cleared when the interrupt is acknowledged (either with an 
> INTA bus cycle or by a data read bus cycle issued to the PIC armed with an 
> OCW3 that has had the POLL command bit set) OR with a trailing edge on IRx 
> (again, all this takes the IMR into account).  At this point another 
> leading edge is required for the IRR bit to be set again, that is merely 
> keeping the IRx input's level active won't trigger another interrupt.


Ok, thanks, that explains it for me.
Avi Kivity - Sept. 11, 2012, 12:58 p.m.
On 09/11/2012 12:05 PM, Jan Kiszka wrote:

> I think this is convincing, maybe worth documenting in the related
> changelogs of QEMU and KVM. Avi, doubts remaining?
> 

Nope, I think I've understood it finally.
BALATON Zoltan - Nov. 19, 2012, 3:28 p.m.
On Sun, 9 Sep 2012, Matthew Ogilvie wrote:
> Intel's definition of "edge triggered" means: "asserted with a
> low-to-high transition at the time an interrupt is registered and
> then kept high until the interrupt is served via one of the
> EOI mechanisms or goes away unhandled."
>
> So the only difference between edge triggered and level triggered
> is in the leading edge, with no difference in the trailing edge.
>
> This bug manifested itself when the guest was Microport UNIX
> System V/386 v2.1 (ca. 1987), because it would sometimes mask
> off IRQ14 in the slave IMR after it had already been asserted.
> The master would still try to deliver an interrupt even though
> IRQ2 had dropped again, resulting in a spurious interupt
> (IRQ15) and a panicked kernel.
>
> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net>
> ---
>
> If you missed the previous thread about this, see
> http://www.mail-archive.com/qemu-devel@nongnu.org/msg129071.html
>
> hw/i8259.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/hw/i8259.c b/hw/i8259.c
> index 6587666..c011787 100644
> --- a/hw/i8259.c
> +++ b/hw/i8259.c
> @@ -157,6 +157,7 @@ static void pic_set_irq(void *opaque, int irq, int level)
>             }
>             s->last_irr |= mask;
>         } else {
> +            s->irr &= ~mask;
>             s->last_irr &= ~mask;
>         }
>     }
>

What happened to this patch? Any chance it will be in 1.3?

Thanks,
BALATON Zoltan
Matthew Ogilvie - Nov. 20, 2012, 5:05 a.m.
On Mon, Nov 19, 2012 at 04:28:31PM +0100, BALATON Zoltan wrote:
> On Sun, 9 Sep 2012, Matthew Ogilvie wrote:
> > Intel's definition of "edge triggered" means: "asserted with a
> > low-to-high transition at the time an interrupt is registered and
> > then kept high until the interrupt is served via one of the
> > EOI mechanisms or goes away unhandled."
> >
> > So the only difference between edge triggered and level triggered
> > is in the leading edge, with no difference in the trailing edge.
> 
> What happened to this patch? Any chance it will be in 1.3?
> 
> Thanks,
> BALATON Zoltan

I kind of doubt it will make it into 1.3, although I would like to
get it in eventually.  Maybe the first few essentially unrelated
trivial patches can make it in?

Yours is the first comment I've seen since I've posted version 6 of
the series (this particular patch is the same as version 5).  The lack
of feedback combined with other demands on my time have prevented me
from progressing on this for over a month and a half.

Status:

   * We can't completely fix the i8259 model without addressing some
     issues in the i8254 model as well.  The i8254 issues (very short
     duty cycle on IRQ0 causing lost timer ticks) become rather
     severe if you try to fix the i8259 without fixing the i8254.
   * But the obvious direct fixes to the i8254 may risk causing issues with
     breaking migration between pre-fix and post-fix versions of qemu,
     due to lost and/or extra timer interrupts.  I'm not sure how big
     of an issue this is in practice, but it is a concern.
   * I've outlined some possible ways to address the migration issue
     in the cover letter of version 6 of the patch series.
   * Similar issue cascades exist in the in-kernel KVM model as well.
     The i8254 issues are even more severe in KVM: timer interrupts
     are always lost [0-length duty cycle in KVM model], instead of
     sometimes lost [non-0 but tiny duty cycle in qemu model].
   * I'm not aware of any similar problems in other device interrupt
     models, but I haven't really checked them, either.

Next step?:

In the absense of any other suggestions, I'm thinking about rolling
a version of the series that leaves IRQ0 (timer) working the way
it does in qemu 1.2.  Except in the i8254 I'll immediately preceed
each "intended to be leading edge transition" with an
explicit lowering of IRQ0, so that if migrating from some
future really-fixed version that leaves IRQ0 high most
of the time, it will still recognize the new edge.  A "real" fix
would then be delayed (years?) until versions without this first
stage fix are no longer in production.  I can probably
get this done this weekend.  [Although I'm not sure how to
deal with the issue that some of the i8254 modes' leading edge
transitions are off by one count.  Perhaps we'll need multiple
intermediate stages, or one of the other strategies outlined in
the version 6 cover letter.]

Or does anyone have a better suggestion?

                         - Matthew Ogilvie

Patch

diff --git a/hw/i8259.c b/hw/i8259.c
index 6587666..c011787 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -157,6 +157,7 @@  static void pic_set_irq(void *opaque, int irq, int level)
             }
             s->last_irr |= mask;
         } else {
+            s->irr &= ~mask;
             s->last_irr &= ~mask;
         }
     }