Message ID | 1346640974-30974-6-git-send-email-mmogilvi_qemu@miniinfo.net |
---|---|
State | New |
Headers | show |
Il 03/09/2012 04:56, Matthew Ogilvie ha scritto: > Although I haven't found any specs that say so, on real hardware > I have a test program that shows if you mask off the slave > interrupt (say IRQ14) in the IMR after it has already been raised, > the master (IRQ2) gets canceled (that is, IRQ2 acts like it is level > triggered). Without this patch, qemu delivers a > spurious interrupt (IRQ15) instead when running the test program. > > Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net> Nice testing, thanks! KVM's i8259 emulation can be patched at arch/x86/kvm/i8259.c in the Linux tree. You can write a test for kvm-unit-tests. These tests can be written in C, including interrupt handlers (see lib/x86/isr.h). The git tree is at git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git. Paolo
Am 03.09.2012 04:56, schrieb Matthew Ogilvie: > diff --git a/hw/i8259_common.c b/hw/i8259_common.c > index ab3d98b..dcde5f2 100644 > --- a/hw/i8259_common.c > +++ b/hw/i8259_common.c [...] > @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { > VMSTATE_UINT8(isr, PICCommonState), > VMSTATE_UINT8(priority_add, PICCommonState), > VMSTATE_UINT8(irq_base, PICCommonState), > + VMSTATE_UINT8(icw3, PICCommonState), > VMSTATE_UINT8(read_reg_select, PICCommonState), > VMSTATE_UINT8(poll, PICCommonState), > VMSTATE_UINT8(special_mask, PICCommonState), Additional VMState needs to be versioned by incrementing .version_id and by specifying the new version number here. Otherwise it breaks migration. Regards, Andreas
On 2012-09-03 04:56, Matthew Ogilvie wrote: > Although I haven't found any specs that say so, on real hardware > I have a test program that shows if you mask off the slave > interrupt (say IRQ14) in the IMR after it has already been raised, > the master (IRQ2) gets canceled (that is, IRQ2 acts like it is level > triggered). Without this patch, qemu delivers a > spurious interrupt (IRQ15) instead when running the test program. > > Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net> > --- > > I've written a test program (in the form of a floppy disk boot sector) > that demonstrates that qemu's emulation of IRQ2 propagation from the > slave i8259 to the master does not work correctly when the CPU has > interrupts disabled and it masks off the original interrupt (IRQ14) > in the slave's IMR register. This was based on simplifying breakage > observed when trying to run an old Microport UNIX system (ca 1987). > > Earlier I speculated that maybe the ELCR bit for IRQ2 was incorrect, > but now I don't think changing the bit (from the target's > perspective) would be a good idea. See below. Yes, you cannot set IRQ0..2 to level triggered mode on modern chipsets, look e.g. at the ICH9 spec: "In edge mode, (bit[x] = 0), the interrupt is recognized by a low to high transition. In level mode (bit[x] = 1), the interrupt is recognized by a high level. The cascade channel, IRQ2, the heart beat timer (IRQ0), and the keyboard controller (IRQ1), cannot be put into level mode." So, fiddling with ELCR from guest perspective cannot be the solution. > > You can download the source code for the test program from > http://home.comcast.net/~mmogilvi/downloads/i8259-imr-test-2012-09-02.tar.bz2 > It can be compiled using a standard GNU i386 toolchain on Linux. As Paolo said, we are better off with a KVM unit test. That should be loadable via grub & co. on real HW as well. > > The heart of the test program is: > Initialization is also important to exclude leftovers of the BIOS. > cli > > # i8259:imm: mask off everything except IRQ2 > movb $0xfb,%al # master (only IRQ2 clear) > outb %al,$0x21 > movb $0xff,%al # slave > outb %al,$0xa1 > > mov $.msgCmdRead,%ax > call print > call initIrqHandlers > call scheduleIrq14 > > call .largeDelay # note: IRQ14 raised while this is waiting > > mov $.msgUnmask,%ax > call print > movb $0x3f,%al # unmask IRQ14 and IRQ15 > outb %al,$0xa1 > > call .largeDelay # (probably not important) > > mov $.msgMask,%ax > call print > movb $0xff,%al # mask IRQ14 and IRQ15 again > outb %al,$0xa1 And now check IRR of the master by reading it back. That would be a smoking gun if bit 2 is set before and cleared afterward. > > call .largeDelay # (probably not important) > > mov $.msgSti,%ax > call print > sti > > call .largeDelay # note: spurious interrupt (IRQ15) from qemu > > cli > mov $.msgUnmask,%ax > call print > movb $0x3f,%al # unmask IRQ14 and IRQ15 > outb %al,$0xa1 > sti > > call .largeDelay # we should finally see IRQ14 here? > > # DONE: > cli > movw $.msgDone, %ax > call print > .Lhalt: > hlt > jmp .Lhalt > > In qemu, the master treats IRQ2 as edge triggered, and delivers a > spurious interrupt if the CPU re-enables interrupts after > the slave's IMR is masked off (it also doesn't seem to deliver > the real interrupt when the IMR is unmasked, but maybe that is > because I'm not correctly acknowledging the spurious interrupt). Yes, that is required. > > - Qemu output (without this patch): > elcr=0c00 cmdRead ummask mask sti irq15 unmask DONE > > But on real hardware, the master seems to treat IRQ2 as level triggered, > and doesn't deliver an interrupt to the CPU until the slave unmasks IRQ14. > I've tried this on several machines, including a non-PCI 486 that > does not seem to have ELCR registers. > > - Typical output (pentium/pentium pro/dual AMD Athlon/etc; slight > variations in some elcr bits depending on machine, but bit 0x04 > is always clear): > elcr=0c20 cmdRead unmask mask sti unmask irq14 DONE > > - 486 without elcr (just an ISA bus): > elcr=e0e0 cmdRead unmask mask sti unmask irq14 DONE > > - One failure: It doesn't boot properly (no output) with a USB floppy > drive on my Intel Core I7. Guess: The test program just barely > fits in a sector, with no room for any tables (partition/etc) > that BIOS might check for if it isn't an original, native floppy > drive. > > ----------------------- > > I've found a few descriptions of programming the i8259. I was mostly following the official "8259A PROGRAMMABLE INTERRUPT CONTROLLER (8259A 8259A-2)" by Intel. But it also doesn't mention any trigger mode changes for the cascading input lines. > The closest thing I've found to a detailed spec is in > "iAPX 86, 88 User's Manual", dated August 1981: > http://ebookbrowse.com/1981-iapx-86-88-users-manual-pdf-d3089962 > It has a significant section titled "Using the 8259A Programmable > Interrupt Controller" starting on page 438 or A-135. > > But none of my sources seem to specify how master/slave cascading > interacts with the IMR (mask register) and edge trigger mode, > although it talks about those things individually. > Also, given the date it isn't surprising that it doesn't mention > the elcr registers at all. > > I have not found any real specs for the ELCR registers, but I've found a > few hints: Intel chipset docs describe it, though only briefly. > > - Two 8 bit registers: one for master (0x4d0) and one for slave (0x4d1). > - One bit per IRQ line: 0 for edge trigger, 1 for level trigger. > - Not present unless the machine has EISA or PCI buses. Plain > ISA doesn't have it. > - Might be implemented completely external to the actual i8259s. > - As seen in test program output above, ELCR bit 0x04 is clear, > indicating that IRQ2 is supposedly edge triggered, even though > actual tested behavior is level triggered. > - A google search (8259 elcr -linux -qemu) [exclude the > dominant Linux and qemu related pages] found at least one operating > system that checks several ELCR bits (including IRQ2) as part > of a probe to determine if ELCR exists. So simply setting > the 0x04 bit is probably a bad idea. For example, line 119 of: > https://bitbucket.org/npe/nix/src/35d39df17077/src/9kron2/k8/i8259.c > > ----------------------- > > My guess is that if a master IRQ line is marked as coming from a slave > (in the ICW3 register), then the i8259 treats that line as level > triggered regardless of ELCR registers or the LTIM bit of ICW1 (in > addition to other special slave line logic). Otherwise, the slave > IMR register would seem rather useless. That IMR is surely not useless, even if we do not automatically revoke the IRQ2 event. It would just require more careful unmasking in that case. > > Under that theory, something like the following patch would be appropriate > for qemu. This fixes both the test program, and the original Microport > UNIX problem. > > Possible concerns with this patch: > > - KVM still needs to be fixed. I haven't researched this at all, > beyond noticing that it has the same broken behavior running the > test program, and the high level symptoms from Microport UNIX > are slightly different with KVM. > - It might also be a good idea to add something like my test > program to qemu/tests somewhere. This would be a separate patch. > - Best icw3 configuration strategy?: Should it be stored, or > just assume it is correctly configured based on > PICCommonState::master and standard PC IRQ2 configuration? Let's avoid adding more logic that assumes cascading IRQ == 2. > Perhaps add sanity checks for reasonable values when the > guest is configuring the 8259s? Did I catch all the places No need for sanity checks if the logic is generic enough. > that need to deal with a new state variable (assuming we > decide to store it)? > - This patch is adding code to what may be a relatively common > code path (every time interrupts raised/lowered/acknowledged, masks > changed, etc). Potentially it could be optimized by adding an > "effective_elcr" state variable, that is maintined as a combination > of icw3 and elcr (and maybe add support for LTIM bit from ICW1 at the > same time), so that we don't need to have extra code in the > critical path. Does anyone think this is worth it? I'm leaning > towards "no". No need for such a micro-optimization. > > ========================================== > > hw/i8259.c | 12 ++++++++---- > hw/i8259_common.c | 2 ++ > hw/i8259_internal.h | 1 + > 3 files changed, 11 insertions(+), 4 deletions(-) > > diff --git a/hw/i8259.c b/hw/i8259.c > index 6587666..8e2f9f4 100644 > --- a/hw/i8259.c > +++ b/hw/i8259.c > @@ -140,7 +140,7 @@ static void pic_set_irq(void *opaque, int irq, int level) > } > #endif > > - if (s->elcr & mask) { > + if ((s->elcr | s->icw3) & mask) { > /* level triggered */ > if (level) { > s->irr |= mask; > @@ -174,7 +174,7 @@ static void pic_intack(PICCommonState *s, int irq) > s->isr |= (1 << irq); > } > /* We don't clear a level sensitive interrupt here */ > - if (!(s->elcr & (1 << irq))) { > + if (!((s->elcr | s->icw3) & (1 << irq))) { > s->irr &= ~(1 << irq); > } > pic_update_irq(s); > @@ -314,6 +314,10 @@ static void pic_ioport_write(void *opaque, target_phys_addr_t addr64, > s->init_state = s->single_mode ? (s->init4 ? 3 : 0) : 2; > break; > case 2: > + s->icw3 = val; > + /* TODO: Enforce constraints?: Master is typically > + * 0x04 for IRQ2 (maybe 0 is also OK). Slave must be 0. > + */ No constraints, see above (and slave must be 2, its ID == cascading IRQ#). > if (s->init4) { > s->init_state = 3; > } else { > @@ -419,9 +423,9 @@ void pic_info(Monitor *mon) > for (i = 0; i < 2; i++) { > s = i == 0 ? DO_UPCAST(PICCommonState, dev.qdev, isa_pic) : slave_pic; > monitor_printf(mon, "pic%d: irr=%02x imr=%02x isr=%02x hprio=%d " > - "irq_base=%02x rr_sel=%d elcr=%02x fnm=%d\n", > + "irq_base=%02x icw3=%02x rr_sel=%d elcr=%02x fnm=%d\n", > i, s->irr, s->imr, s->isr, s->priority_add, > - s->irq_base, s->read_reg_select, s->elcr, > + s->irq_base, s->icw3, s->read_reg_select, s->elcr, > s->special_fully_nested_mode); > } > } > diff --git a/hw/i8259_common.c b/hw/i8259_common.c > index ab3d98b..dcde5f2 100644 > --- a/hw/i8259_common.c > +++ b/hw/i8259_common.c > @@ -33,6 +33,7 @@ void pic_reset_common(PICCommonState *s) > s->isr = 0; > s->priority_add = 0; > s->irq_base = 0; > + s->icw3 = 0; > s->read_reg_select = 0; > s->poll = 0; > s->special_mask = 0; > @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { > VMSTATE_UINT8(isr, PICCommonState), > VMSTATE_UINT8(priority_add, PICCommonState), > VMSTATE_UINT8(irq_base, PICCommonState), > + VMSTATE_UINT8(icw3, PICCommonState), Let's add this as an optional subsection, only written when it's not 0x04 (for a master) or 0x2 (for a slave). See target-i386/machine.c for examples, or read docs/migration.txt. That will mean you need to set icw3 to 0x4 before loading a vmstate (=> pre_load handler). The effect is that, generally, no additional state will be written in practice - unless we save right after reset or some guest messes its PIC configuration up. > VMSTATE_UINT8(read_reg_select, PICCommonState), > VMSTATE_UINT8(poll, PICCommonState), > VMSTATE_UINT8(special_mask, PICCommonState), > diff --git a/hw/i8259_internal.h b/hw/i8259_internal.h > index 4137b61..de67d49 100644 > --- a/hw/i8259_internal.h > +++ b/hw/i8259_internal.h > @@ -55,6 +55,7 @@ struct PICCommonState { > uint8_t isr; /* interrupt service register */ > uint8_t priority_add; /* highest irq priority */ > uint8_t irq_base; > + uint8_t icw3; /* bit set if line is connected to a slave */ Comment is wrong. This word is used for masters and slaves. > uint8_t read_reg_select; > uint8_t poll; > uint8_t special_mask; > The only thing that worries me is that we just consider the PC so far while the i8259 is also used on different architectures (PPC, MIPS, Alpha?). Jan
On 2012-09-03 10:51, Jan Kiszka wrote: >> diff --git a/hw/i8259_common.c b/hw/i8259_common.c >> index ab3d98b..dcde5f2 100644 >> --- a/hw/i8259_common.c >> +++ b/hw/i8259_common.c >> @@ -33,6 +33,7 @@ void pic_reset_common(PICCommonState *s) >> s->isr = 0; >> s->priority_add = 0; >> s->irq_base = 0; >> + s->icw3 = 0; >> s->read_reg_select = 0; >> s->poll = 0; >> s->special_mask = 0; >> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { >> VMSTATE_UINT8(isr, PICCommonState), >> VMSTATE_UINT8(priority_add, PICCommonState), >> VMSTATE_UINT8(irq_base, PICCommonState), >> + VMSTATE_UINT8(icw3, PICCommonState), > > Let's add this as an optional subsection, only written when it's not > 0x04 (for a master) or 0x2 (for a slave). See target-i386/machine.c for > examples, or read docs/migration.txt. That will mean you need to set > icw3 to 0x4 before loading a vmstate (=> pre_load handler). Oh, and this code affects also the kvm-pic. So you have to maintain icw3 for that model as well, setting it to the PC defaults. Please test KVM after your changes, including migration. Jan
Il 03/09/2012 10:51, Jan Kiszka ha scritto: > The only thing that worries me is that we just consider the PC so far > while the i8259 is also used on different architectures (PPC, MIPS, Alpha?). Why is this a problem? All of them use IRQ2 for a cascade, and initialize icw3 to 0x4/0x2 (I checked OpenBIOS, rth's palcode for Alpha, and Linux). BTW, from the palcode it looks like Alpha wants LTIM=1, so it would be nice to implement that one as well: /* ??? MILO initializes the PIC as edge triggered; I do not know how SRM initializes them. However, Linux seems to expect that these are level triggered. That may be a kernel bug, but level triggers are more reliable anyway so lets go with that. */ /* Initialize level triggers. The CY82C693UB that's on real alpha hardware doesn't have this; this is a PIIX extension. However, QEMU doesn't implement regular level triggers. */ outb(0xff, PORT_PIC2_ELCR); outb(0xff, PORT_PIC1_ELCR); Paolo
On 2012-09-03 11:34, Paolo Bonzini wrote: > Il 03/09/2012 10:51, Jan Kiszka ha scritto: >> The only thing that worries me is that we just consider the PC so far >> while the i8259 is also used on different architectures (PPC, MIPS, Alpha?). > > Why is this a problem? All of them use IRQ2 for a cascade, and initialize > icw3 to 0x4/0x2 (I checked OpenBIOS, rth's palcode for Alpha, and Linux). IRQ2 is already hard-coded in QEMU (we had to patch this for our machine emulation, but less in recent versions), that is not the point. I'm concerned about the behavioral changes we are discussing here, ie. the special handling of cascading interrupt inputs. > > BTW, from the palcode it looks like Alpha wants LTIM=1, so it would be nice > to implement that one as well: > > /* ??? MILO initializes the PIC as edge triggered; I do not know how SRM > initializes them. However, Linux seems to expect that these are level > triggered. That may be a kernel bug, but level triggers are more > reliable anyway so lets go with that. */ > > /* Initialize level triggers. The CY82C693UB that's on real alpha > hardware doesn't have this; this is a PIIX extension. However, > QEMU doesn't implement regular level triggers. */ > outb(0xff, PORT_PIC2_ELCR); > outb(0xff, PORT_PIC1_ELCR); Just takes someone to write the patch, as usual. :) Jan
Il 03/09/2012 12:34, Jan Kiszka ha scritto: >> > Why is this a problem? All of them use IRQ2 for a cascade, and initialize >> > icw3 to 0x4/0x2 (I checked OpenBIOS, rth's palcode for Alpha, and Linux). > IRQ2 is already hard-coded in QEMU (we had to patch this for our machine > emulation, but less in recent versions), that is not the point. I'm > concerned about the behavioral changes we are discussing here, ie. the > special handling of cascading interrupt inputs. Yeah, it's quite interesting that the behavior isn't mentioned in the 8259 datasheets. Still in retrospect it's hard to see how it can possibly work with edge-triggered cascaded inputs. Paolo
On 2012-09-03 13:11, Paolo Bonzini wrote: > Il 03/09/2012 12:34, Jan Kiszka ha scritto: >>>> Why is this a problem? All of them use IRQ2 for a cascade, and initialize >>>> icw3 to 0x4/0x2 (I checked OpenBIOS, rth's palcode for Alpha, and Linux). >> IRQ2 is already hard-coded in QEMU (we had to patch this for our machine >> emulation, but less in recent versions), that is not the point. I'm >> concerned about the behavioral changes we are discussing here, ie. the >> special handling of cascading interrupt inputs. > > Yeah, it's quite interesting that the behavior isn't mentioned in the > 8259 datasheets. Still in retrospect it's hard to see how it can > possibly work with edge-triggered cascaded inputs. As I said: by avoiding the pattern Matthew generated in his test case. That's not impossible. All the other OSes running fine against out PIC models are proving this. Jan
On 09/03/2012 11:40 AM, Andreas Färber wrote: > Am 03.09.2012 04:56, schrieb Matthew Ogilvie: >> diff --git a/hw/i8259_common.c b/hw/i8259_common.c >> index ab3d98b..dcde5f2 100644 >> --- a/hw/i8259_common.c >> +++ b/hw/i8259_common.c > [...] >> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { >> VMSTATE_UINT8(isr, PICCommonState), >> VMSTATE_UINT8(priority_add, PICCommonState), >> VMSTATE_UINT8(irq_base, PICCommonState), >> + VMSTATE_UINT8(icw3, PICCommonState), >> VMSTATE_UINT8(read_reg_select, PICCommonState), >> VMSTATE_UINT8(poll, PICCommonState), >> VMSTATE_UINT8(special_mask, PICCommonState), > > Additional VMState needs to be versioned by incrementing .version_id and > by specifying the new version number here. Otherwise it breaks migration. And incrementing the version ID breaks backwards migration. The correct solution is subsections, copying Juan and booking a trip to the Mariana trench.
Avi Kivity <avi@redhat.com> wrote: > On 09/03/2012 11:40 AM, Andreas Färber wrote: >> Am 03.09.2012 04:56, schrieb Matthew Ogilvie: >>> diff --git a/hw/i8259_common.c b/hw/i8259_common.c >>> index ab3d98b..dcde5f2 100644 >>> --- a/hw/i8259_common.c >>> +++ b/hw/i8259_common.c >> [...] >>> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { >>> VMSTATE_UINT8(isr, PICCommonState), >>> VMSTATE_UINT8(priority_add, PICCommonState), >>> VMSTATE_UINT8(irq_base, PICCommonState), >>> + VMSTATE_UINT8(icw3, PICCommonState), >>> VMSTATE_UINT8(read_reg_select, PICCommonState), >>> VMSTATE_UINT8(poll, PICCommonState), >>> VMSTATE_UINT8(special_mask, PICCommonState), >> >> Additional VMState needs to be versioned by incrementing .version_id and >> by specifying the new version number here. Otherwise it breaks migration. For the subsection, only sending this when icw3 != 0 is enough? I am searching for a test about when we need to send it. > And incrementing the version ID breaks backwards migration. The correct > solution is subsections, copying Juan and booking a trip to the Mariana > trench. Book also for me, no need for the return ticket. Later, Juan.
On 2012-09-03 17:42, Juan Quintela wrote: > Avi Kivity <avi@redhat.com> wrote: >> On 09/03/2012 11:40 AM, Andreas Färber wrote: >>> Am 03.09.2012 04:56, schrieb Matthew Ogilvie: >>>> diff --git a/hw/i8259_common.c b/hw/i8259_common.c >>>> index ab3d98b..dcde5f2 100644 >>>> --- a/hw/i8259_common.c >>>> +++ b/hw/i8259_common.c >>> [...] >>>> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { >>>> VMSTATE_UINT8(isr, PICCommonState), >>>> VMSTATE_UINT8(priority_add, PICCommonState), >>>> VMSTATE_UINT8(irq_base, PICCommonState), >>>> + VMSTATE_UINT8(icw3, PICCommonState), >>>> VMSTATE_UINT8(read_reg_select, PICCommonState), >>>> VMSTATE_UINT8(poll, PICCommonState), >>>> VMSTATE_UINT8(special_mask, PICCommonState), >>> >>> Additional VMState needs to be versioned by incrementing .version_id and >>> by specifying the new version number here. Otherwise it breaks migration. > > For the subsection, only sending this when icw3 != 0 is enough? I am > searching for a test about when we need to send it. See my reply on this topic in the other branch of this thread. > >> And incrementing the version ID breaks backwards migration. The correct >> solution is subsections, copying Juan and booking a trip to the Mariana >> trench. > > Book also for me, no need for the return ticket. > Hey, this scenario is rather harmless (famous last words...). ;) Jan
On 09/03/2012 06:42 PM, Juan Quintela wrote: > Avi Kivity <avi@redhat.com> wrote: >> On 09/03/2012 11:40 AM, Andreas Färber wrote: >>> Am 03.09.2012 04:56, schrieb Matthew Ogilvie: >>>> diff --git a/hw/i8259_common.c b/hw/i8259_common.c >>>> index ab3d98b..dcde5f2 100644 >>>> --- a/hw/i8259_common.c >>>> +++ b/hw/i8259_common.c >>> [...] >>>> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { >>>> VMSTATE_UINT8(isr, PICCommonState), >>>> VMSTATE_UINT8(priority_add, PICCommonState), >>>> VMSTATE_UINT8(irq_base, PICCommonState), >>>> + VMSTATE_UINT8(icw3, PICCommonState), >>>> VMSTATE_UINT8(read_reg_select, PICCommonState), >>>> VMSTATE_UINT8(poll, PICCommonState), >>>> VMSTATE_UINT8(special_mask, PICCommonState), >>> >>> Additional VMState needs to be versioned by incrementing .version_id and >>> by specifying the new version number here. Otherwise it breaks migration. > > For the subsection, only sending this when icw3 != 0 is enough? I am > searching for a test about when we need to send it. Looks like the optimal condition is ((s->icw3 & ~s->eclr) != 0) (i.e. bit set in icw3 but clear in eclr).
On 2012-09-03 17:52, Avi Kivity wrote: > On 09/03/2012 06:42 PM, Juan Quintela wrote: >> Avi Kivity <avi@redhat.com> wrote: >>> On 09/03/2012 11:40 AM, Andreas Färber wrote: >>>> Am 03.09.2012 04:56, schrieb Matthew Ogilvie: >>>>> diff --git a/hw/i8259_common.c b/hw/i8259_common.c >>>>> index ab3d98b..dcde5f2 100644 >>>>> --- a/hw/i8259_common.c >>>>> +++ b/hw/i8259_common.c >>>> [...] >>>>> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { >>>>> VMSTATE_UINT8(isr, PICCommonState), >>>>> VMSTATE_UINT8(priority_add, PICCommonState), >>>>> VMSTATE_UINT8(irq_base, PICCommonState), >>>>> + VMSTATE_UINT8(icw3, PICCommonState), >>>>> VMSTATE_UINT8(read_reg_select, PICCommonState), >>>>> VMSTATE_UINT8(poll, PICCommonState), >>>>> VMSTATE_UINT8(special_mask, PICCommonState), >>>> >>>> Additional VMState needs to be versioned by incrementing .version_id and >>>> by specifying the new version number here. Otherwise it breaks migration. >> >> For the subsection, only sending this when icw3 != 0 is enough? I am >> searching for a test about when we need to send it. > > Looks like the optimal condition is ((s->icw3 & ~s->eclr) != 0) (i.e. > bit set in icw3 but clear in eclr). The standard PC values are optimal: 4 for master, 2 for slave. Jan
On 09/03/2012 06:54 PM, Jan Kiszka wrote: > On 2012-09-03 17:52, Avi Kivity wrote: >> On 09/03/2012 06:42 PM, Juan Quintela wrote: >>> Avi Kivity <avi@redhat.com> wrote: >>>> On 09/03/2012 11:40 AM, Andreas Färber wrote: >>>>> Am 03.09.2012 04:56, schrieb Matthew Ogilvie: >>>>>> diff --git a/hw/i8259_common.c b/hw/i8259_common.c >>>>>> index ab3d98b..dcde5f2 100644 >>>>>> --- a/hw/i8259_common.c >>>>>> +++ b/hw/i8259_common.c >>>>> [...] >>>>>> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { >>>>>> VMSTATE_UINT8(isr, PICCommonState), >>>>>> VMSTATE_UINT8(priority_add, PICCommonState), >>>>>> VMSTATE_UINT8(irq_base, PICCommonState), >>>>>> + VMSTATE_UINT8(icw3, PICCommonState), >>>>>> VMSTATE_UINT8(read_reg_select, PICCommonState), >>>>>> VMSTATE_UINT8(poll, PICCommonState), >>>>>> VMSTATE_UINT8(special_mask, PICCommonState), >>>>> >>>>> Additional VMState needs to be versioned by incrementing .version_id and >>>>> by specifying the new version number here. Otherwise it breaks migration. >>> >>> For the subsection, only sending this when icw3 != 0 is enough? I am >>> searching for a test about when we need to send it. >> >> Looks like the optimal condition is ((s->icw3 & ~s->eclr) != 0) (i.e. >> bit set in icw3 but clear in eclr). > > The standard PC values are optimal: 4 for master, 2 for slave. Can you explain why? I saw that icw3 is always ORed with eclr, so my condition will catch exactly those cases where a change in behaviour occurs, and no more.
On 2012-09-03 17:57, Avi Kivity wrote: > On 09/03/2012 06:54 PM, Jan Kiszka wrote: >> On 2012-09-03 17:52, Avi Kivity wrote: >>> On 09/03/2012 06:42 PM, Juan Quintela wrote: >>>> Avi Kivity <avi@redhat.com> wrote: >>>>> On 09/03/2012 11:40 AM, Andreas Färber wrote: >>>>>> Am 03.09.2012 04:56, schrieb Matthew Ogilvie: >>>>>>> diff --git a/hw/i8259_common.c b/hw/i8259_common.c >>>>>>> index ab3d98b..dcde5f2 100644 >>>>>>> --- a/hw/i8259_common.c >>>>>>> +++ b/hw/i8259_common.c >>>>>> [...] >>>>>>> @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { >>>>>>> VMSTATE_UINT8(isr, PICCommonState), >>>>>>> VMSTATE_UINT8(priority_add, PICCommonState), >>>>>>> VMSTATE_UINT8(irq_base, PICCommonState), >>>>>>> + VMSTATE_UINT8(icw3, PICCommonState), >>>>>>> VMSTATE_UINT8(read_reg_select, PICCommonState), >>>>>>> VMSTATE_UINT8(poll, PICCommonState), >>>>>>> VMSTATE_UINT8(special_mask, PICCommonState), >>>>>> >>>>>> Additional VMState needs to be versioned by incrementing .version_id and >>>>>> by specifying the new version number here. Otherwise it breaks migration. >>>> >>>> For the subsection, only sending this when icw3 != 0 is enough? I am >>>> searching for a test about when we need to send it. >>> >>> Looks like the optimal condition is ((s->icw3 & ~s->eclr) != 0) (i.e. >>> bit set in icw3 but clear in eclr). >> >> The standard PC values are optimal: 4 for master, 2 for slave. > > Can you explain why? I saw that icw3 is always ORed with eclr, so my > condition will catch exactly those cases where a change in behaviour > occurs, and no more. The values above are what every user of the PIC cascaded on our targets must program to use them. So We will find them in the state once any relevant guest code was able to run (e.g. the BIOS). Jan
On 09/03/2012 07:02 PM, Jan Kiszka wrote: >>>> Looks like the optimal condition is ((s->icw3 & ~s->eclr) != 0) (i.e. >>>> bit set in icw3 but clear in eclr). >>> >>> The standard PC values are optimal: 4 for master, 2 for slave. >> >> Can you explain why? I saw that icw3 is always ORed with eclr, so my >> condition will catch exactly those cases where a change in behaviour >> occurs, and no more. > > The values above are what every user of the PIC cascaded on our targets > must program to use them. So We will find them in the state once any > relevant guest code was able to run (e.g. the BIOS). > Suppose the bios has not run yet?
Il 03/09/2012 18:15, Avi Kivity ha scritto: >> > The values above are what every user of the PIC cascaded on our targets >> > must program to use them. So We will find them in the state once any >> > relevant guest code was able to run (e.g. the BIOS). >> > > Suppose the bios has not run yet? Then you transmit the subsection. BTW this also means that simply checking against eclr|icw3 is wrong; the right test is: * against elcr if !s->master * against elcr|icw3 if s->master This makes precomputing the value more appealing. Similarly, perhaps this: if (s->special_fully_nested_mode && s->master) { mask &= ~(1 << 2); } should be changed to if (s->special_fully_nested_mode && s->master) { mask &= ~s->icw3; } ? Paolo
On 2012-09-03 18:15, Avi Kivity wrote: > On 09/03/2012 07:02 PM, Jan Kiszka wrote: > >>>>> Looks like the optimal condition is ((s->icw3 & ~s->eclr) != 0) (i.e. >>>>> bit set in icw3 but clear in eclr). >>>> >>>> The standard PC values are optimal: 4 for master, 2 for slave. >>> >>> Can you explain why? I saw that icw3 is always ORed with eclr, so my >>> condition will catch exactly those cases where a change in behaviour >>> occurs, and no more. >> >> The values above are what every user of the PIC cascaded on our targets >> must program to use them. So We will find them in the state once any >> relevant guest code was able to run (e.g. the BIOS). >> > > Suppose the bios has not run yet? Then we must save the 0. Your logic will force us to save in all standard cases (ELCR's bit for IRQ2 must not be set by a guest). So it's not really helpful. Jan
On 09/03/2012 07:23 PM, Paolo Bonzini wrote: > Il 03/09/2012 18:15, Avi Kivity ha scritto: >>> > The values above are what every user of the PIC cascaded on our targets >>> > must program to use them. So We will find them in the state once any >>> > relevant guest code was able to run (e.g. the BIOS). >>> > >> Suppose the bios has not run yet? > > Then you transmit the subsection. And the migration fails. Needlessly, since icw3 == 0 doesn't affect guest operation.
Il 03/09/2012 18:30, Avi Kivity ha scritto: >>>>> The values above are what every user of the PIC cascaded on our targets >>>>> >>> > must program to use them. So We will find them in the state once any >>>>> >>> > relevant guest code was able to run (e.g. the BIOS). >>>>> >>> > >>> >> Suppose the bios has not run yet? >> > >> > Then you transmit the subsection. > And the migration fails. Needlessly, since icw3 == 0 doesn't affect > guest operation. But the point of subsections is to succeed migration in the common case, assuming there is more than one case that doesn't affect guest operation. Paolo
On 2012-09-03 18:33, Paolo Bonzini wrote: > Il 03/09/2012 18:30, Avi Kivity ha scritto: >>>>>> The values above are what every user of the PIC cascaded on our targets >>>>>>>>>> must program to use them. So We will find them in the state once any >>>>>>>>>> relevant guest code was able to run (e.g. the BIOS). >>>>>>>>>> >>>>>> Suppose the bios has not run yet? >>>> >>>> Then you transmit the subsection. >> And the migration fails. Needlessly, since icw3 == 0 doesn't affect >> guest operation. > > But the point of subsections is to succeed migration in the common case, > assuming there is more than one case that doesn't affect guest operation. The point is that the common case is icw3 = 4 (for the master), not 0. And if we don't save that case, we must save the rest. If we don't save this as well, we lose state information. This can't work. Jan
Il 03/09/2012 18:40, Jan Kiszka ha scritto: >>> >> And the migration fails. Needlessly, since icw3 == 0 doesn't affect >>> >> guest operation. >> > >> > But the point of subsections is to succeed migration in the common case, >> > assuming there is more than one case that doesn't affect guest operation. > The point is that the common case is icw3 = 4 (for the master), not 0. > And if we don't save that case, we must save the rest. If we don't save > this as well, we lose state information. This can't work. I was agreeing with you, not Avi. :) Paolo
On 09/03/2012 07:33 PM, Paolo Bonzini wrote: > Il 03/09/2012 18:30, Avi Kivity ha scritto: >>>>>> The values above are what every user of the PIC cascaded on our targets >>>>>> >>> > must program to use them. So We will find them in the state once any >>>>>> >>> > relevant guest code was able to run (e.g. the BIOS). >>>>>> >>> > >>>> >> Suppose the bios has not run yet? >>> > >>> > Then you transmit the subsection. >> And the migration fails. Needlessly, since icw3 == 0 doesn't affect >> guest operation. > > But the point of subsections is to succeed migration in the common case, > assuming there is more than one case that doesn't affect guest operation. According to the patch, if icw3 == 4 && !(eclr & 4), then behaviour will change. With the standard configuration, if two pci interrupts hit at once, then before the patch irr.2 will be clear, and afterwards set. So we do have a behavioural change. Is the rest of the code masking this change under the standard configuration?
Il 04/09/2012 10:16, Avi Kivity ha scritto: >> > But the point of subsections is to succeed migration in the common case, >> > assuming there is more than one case that doesn't affect guest operation. > According to the patch, if icw3 == 4 && !(eclr & 4), then behaviour will > change. With the standard configuration, if two pci interrupts hit at > once, then before the patch irr.2 will be clear, and afterwards set. > > So we do have a behavioural change. Is the rest of the code masking > this change under the standard configuration? No, it is not masking the change. The assumption is that nothing should care about irr.2 or isr.2, because nothing attaches an handler to the cascade interrupt. You have to choose between assuming this, and breaking backwards migration. I would rather break backwards migration, but others disagree... Paolo
On 09/04/2012 12:15 PM, Paolo Bonzini wrote: > Il 04/09/2012 10:16, Avi Kivity ha scritto: >>> > But the point of subsections is to succeed migration in the common case, >>> > assuming there is more than one case that doesn't affect guest operation. >> According to the patch, if icw3 == 4 && !(eclr & 4), then behaviour will >> change. With the standard configuration, if two pci interrupts hit at >> once, then before the patch irr.2 will be clear, and afterwards set. >> >> So we do have a behavioural change. Is the rest of the code masking >> this change under the standard configuration? > > No, it is not masking the change. The assumption is that nothing should > care about irr.2 or isr.2, because nothing attaches an handler to the > cascade interrupt. Won't the next call to pic_get_irq() notice the difference in s->irr? > You have to choose between assuming this, and breaking backwards > migration. I would rather break backwards migration, but others disagree... Normally I'd agree, but if the only known breakee is a 1987 guest then I'd make an exception.
On Tue, 4 Sep 2012, Avi Kivity wrote: > On 09/04/2012 12:15 PM, Paolo Bonzini wrote: >> Il 04/09/2012 10:16, Avi Kivity ha scritto: >>>>> But the point of subsections is to succeed migration in the common case, >>>>> assuming there is more than one case that doesn't affect guest operation. >>> According to the patch, if icw3 == 4 && !(eclr & 4), then behaviour will >>> change. With the standard configuration, if two pci interrupts hit at >>> once, then before the patch irr.2 will be clear, and afterwards set. >>> >>> So we do have a behavioural change. Is the rest of the code masking >>> this change under the standard configuration? >> >> No, it is not masking the change. The assumption is that nothing should >> care about irr.2 or isr.2, because nothing attaches an handler to the >> cascade interrupt. > > Won't the next call to pic_get_irq() notice the difference in s->irr? > >> You have to choose between assuming this, and breaking backwards >> migration. I would rather break backwards migration, but others disagree... > > Normally I'd agree, but if the only known breakee is a 1987 guest then > I'd make an exception. Another one affected by this is OpenStep 4.2 (probably NeXTstep and Rhapsody too) which are not exactly recent either but there are more than only one "breakee". Regards, BALATON Zoltan
On 09/04/2012 12:29 PM, BALATON Zoltan wrote: > On Tue, 4 Sep 2012, Avi Kivity wrote: >> On 09/04/2012 12:15 PM, Paolo Bonzini wrote: >>> Il 04/09/2012 10:16, Avi Kivity ha scritto: >>>>>> But the point of subsections is to succeed migration in the common >>>>>> case, >>>>>> assuming there is more than one case that doesn't affect guest >>>>>> operation. >>>> According to the patch, if icw3 == 4 && !(eclr & 4), then behaviour >>>> will >>>> change. With the standard configuration, if two pci interrupts hit at >>>> once, then before the patch irr.2 will be clear, and afterwards set. >>>> >>>> So we do have a behavioural change. Is the rest of the code masking >>>> this change under the standard configuration? >>> >>> No, it is not masking the change. The assumption is that nothing should >>> care about irr.2 or isr.2, because nothing attaches an handler to the >>> cascade interrupt. >> >> Won't the next call to pic_get_irq() notice the difference in s->irr? >> >>> You have to choose between assuming this, and breaking backwards >>> migration. I would rather break backwards migration, but others >>> disagree... >> >> Normally I'd agree, but if the only known breakee is a 1987 guest then >> I'd make an exception. > > Another one affected by this is OpenStep 4.2 (probably NeXTstep and > Rhapsody too) which are not exactly recent either but there are more > than only one "breakee". Those are all filed under "esoterics". I don't mean to say we shouldn't care about them, but there are likely to be a lot more users doing backwards migration than users running those guests, let alone migrating them (forwards or backwards). The pragmatic choice is clear.
On 2012-09-04 11:37, Avi Kivity wrote: > On 09/04/2012 12:29 PM, BALATON Zoltan wrote: >> On Tue, 4 Sep 2012, Avi Kivity wrote: >>> On 09/04/2012 12:15 PM, Paolo Bonzini wrote: >>>> Il 04/09/2012 10:16, Avi Kivity ha scritto: >>>>>>> But the point of subsections is to succeed migration in the common >>>>>>> case, >>>>>>> assuming there is more than one case that doesn't affect guest >>>>>>> operation. >>>>> According to the patch, if icw3 == 4 && !(eclr & 4), then behaviour >>>>> will >>>>> change. With the standard configuration, if two pci interrupts hit at >>>>> once, then before the patch irr.2 will be clear, and afterwards set. >>>>> >>>>> So we do have a behavioural change. Is the rest of the code masking >>>>> this change under the standard configuration? >>>> >>>> No, it is not masking the change. The assumption is that nothing should >>>> care about irr.2 or isr.2, because nothing attaches an handler to the >>>> cascade interrupt. >>> >>> Won't the next call to pic_get_irq() notice the difference in s->irr? >>> >>>> You have to choose between assuming this, and breaking backwards >>>> migration. I would rather break backwards migration, but others >>>> disagree... >>> >>> Normally I'd agree, but if the only known breakee is a 1987 guest then >>> I'd make an exception. >> >> Another one affected by this is OpenStep 4.2 (probably NeXTstep and >> Rhapsody too) which are not exactly recent either but there are more >> than only one "breakee". > > Those are all filed under "esoterics". > > I don't mean to say we shouldn't care about them, but there are likely > to be a lot more users doing backwards migration than users running > those guests, let alone migrating them (forwards or backwards). The > pragmatic choice is clear. BTW, did anyone actually test backward migration recently? I thought to remember I effectively broke it in 1.1 with some changes to the i8259 (or was it the PIT?) vmstate, and no one really cared about this or my first proposals to fix it. Jan
Il 04/09/2012 11:51, Jan Kiszka ha scritto: >> > >> > I don't mean to say we shouldn't care about them, but there are likely >> > to be a lot more users doing backwards migration than users running >> > those guests, let alone migrating them (forwards or backwards). The >> > pragmatic choice is clear. > BTW, did anyone actually test backward migration recently? I thought to > remember I effectively broke it in 1.1 with some changes to the i8259 > (or was it the PIT?) vmstate, and no one really cared about this or my > first proposals to fix it. Correct: commit ce967e2 (i8254: Rework & fix interaction with HPET in legacy mode, 2012-02-01) bumped the PIT version from 2 to 3. RTC changes will break it more in 1.3. Honestly, backwards migration only works on "enterprise" qemu-kvm because it is tested only there. And so far the only sample across major releases is that RHEL6->RHEL5 migration didn't work. Here the choice is between changing guest behavior by defaulting to 4/2, and always transmitting the subsection by defaulting to 0/0. The latter makes the subsection useless, so at that point we might as well bump the version number and board said flight to the Pacific. Paolo
On 09/04/2012 01:06 PM, Paolo Bonzini wrote: > Il 04/09/2012 11:51, Jan Kiszka ha scritto: >>> > >>> > I don't mean to say we shouldn't care about them, but there are likely >>> > to be a lot more users doing backwards migration than users running >>> > those guests, let alone migrating them (forwards or backwards). The >>> > pragmatic choice is clear. >> BTW, did anyone actually test backward migration recently? I thought to >> remember I effectively broke it in 1.1 with some changes to the i8259 >> (or was it the PIT?) vmstate, and no one really cared about this or my >> first proposals to fix it. > > Correct: commit ce967e2 (i8254: Rework & fix interaction with HPET in > legacy mode, 2012-02-01) bumped the PIT version from 2 to 3. Gah, this is 1.1, so there's no way to fix it. > RTC > changes will break it more in 1.3. > > Honestly, backwards migration only works on "enterprise" qemu-kvm > because it is tested only there. And so far the only sample across > major releases is that RHEL6->RHEL5 migration didn't work. I expect that we will want 7->6, though we may have to settle for minor releases only. > Here the choice is between changing guest behavior by defaulting to 4/2, > and always transmitting the subsection by defaulting to 0/0. The latter > makes the subsection useless, so at that point we might as well bump the > version number and board said flight to the Pacific. So we do the 4/2 thing. I expect if we go they'll tell us they don't do backwards flights.
On Mon, 3 Sep 2012, Jan Kiszka wrote: > > - Qemu output (without this patch): > > elcr=0c00 cmdRead ummask mask sti irq15 unmask DONE > > > > But on real hardware, the master seems to treat IRQ2 as level triggered, That is not universally true, however in reality it does not matter, more on that below. > > and doesn't deliver an interrupt to the CPU until the slave unmasks IRQ14. > > I've tried this on several machines, including a non-PCI 486 that > > does not seem to have ELCR registers. > > > > - Typical output (pentium/pentium pro/dual AMD Athlon/etc; slight > > variations in some elcr bits depending on machine, but bit 0x04 > > is always clear): > > elcr=0c20 cmdRead unmask mask sti unmask irq14 DONE > > > > - 486 without elcr (just an ISA bus): > > elcr=e0e0 cmdRead unmask mask sti unmask irq14 DONE > > > > - One failure: It doesn't boot properly (no output) with a USB floppy > > drive on my Intel Core I7. Guess: The test program just barely > > fits in a sector, with no room for any tables (partition/etc) > > that BIOS might check for if it isn't an original, native floppy > > drive. > > > > ----------------------- > > > > I've found a few descriptions of programming the i8259. > > I was mostly following the official "8259A PROGRAMMABLE INTERRUPT > CONTROLLER (8259A 8259A-2)" by Intel. But it also doesn't mention any > trigger mode changes for the cascading input lines. The datasheet is vague on the subject and there is apparently a common misconception on how the 8259A trigger modes work. PIC core modifications, especially the ELCR (originally added for EISA support), seen in chipsets obfuscate the matter further. So first of all, the *output* of the 8259A is always edge triggered, regardless of whether it's the master or one of the slaves (only one slave is used in the PC/AT architecture, but up to eight are supported; the PC/XT had none). If once an interrupt has been served any interrupts remain in the pending state, then the 8259A deasserts its output briefly before reasserting it. Secondly, for inputs the original 8259A only allowed a global (i.e. for all at once) edge/level selection via ICW1. That somehow had to be compatible with the master/slave mode. However as of the 8259A "edge-triggered" in Intel's nomenclature meant: "asserted with a low-to-high transition at the time an interrupt is registered and then kept high until the interrupt is served via one of the EOI mechanisms (there are a few, including AEOI) or goes away unhandled." This is important for the output because the x86 architecture's definition for the INT input is level-triggered. The edge-triggered mode was also the only one supported for inputs with the original 8080-compatible 8259; note that for inputs a high-to-low transition cancels the interrupt as in the level-triggered mode. The "level-triggered" mode was then added with the usual meaning -- essentially copying the interrupt input onto the output, ORing with the other inputs. So thus defined "edge-triggered" mode is in fact compatible with both edge-triggered and level-triggered inputs -- the former work by means of observing the low-to-high transition and the latter also work by only seeing the interrupt go away briefly -- but that does not really matter for level-triggered interrupts as the 8259A still knows an IRQ is pending and if a race causes an INTA cycle to arrive at the time the PIC's output is low it will still correctly serve the vector (I think in practice it does not really happen as the vector is served with the second INTA cycle only and by then the interrupt output must have recovered). There were some interesting errata in some early embedded 8259A cores in EISA chipses where the brief deassertion of the output did not happen. That worked just fine when wired directly to an x86 processor (because as noted above the INT input is level-triggered), however when connected to an i82489DX discrete APIC, that interprets the edge-triggered mode in the traditional way and implies that mode for inputs cascaded to 8259A, the problem had to be worked around in hardware, by adding extra logic to regenerate the missing high-to-low-to-high transition. Some of the information presented here was only noted by Intel in APIC documentation; I may track down and quote references, but these documents have never been available online anyway. Did that help? Maciej
========================================== hw/i8259.c | 12 ++++++++---- hw/i8259_common.c | 2 ++ hw/i8259_internal.h | 1 + 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/hw/i8259.c b/hw/i8259.c index 6587666..8e2f9f4 100644 --- a/hw/i8259.c +++ b/hw/i8259.c @@ -140,7 +140,7 @@ static void pic_set_irq(void *opaque, int irq, int level) } #endif - if (s->elcr & mask) { + if ((s->elcr | s->icw3) & mask) { /* level triggered */ if (level) { s->irr |= mask; @@ -174,7 +174,7 @@ static void pic_intack(PICCommonState *s, int irq) s->isr |= (1 << irq); } /* We don't clear a level sensitive interrupt here */ - if (!(s->elcr & (1 << irq))) { + if (!((s->elcr | s->icw3) & (1 << irq))) { s->irr &= ~(1 << irq); } pic_update_irq(s); @@ -314,6 +314,10 @@ static void pic_ioport_write(void *opaque, target_phys_addr_t addr64, s->init_state = s->single_mode ? (s->init4 ? 3 : 0) : 2; break; case 2: + s->icw3 = val; + /* TODO: Enforce constraints?: Master is typically + * 0x04 for IRQ2 (maybe 0 is also OK). Slave must be 0. + */ if (s->init4) { s->init_state = 3; } else { @@ -419,9 +423,9 @@ void pic_info(Monitor *mon) for (i = 0; i < 2; i++) { s = i == 0 ? DO_UPCAST(PICCommonState, dev.qdev, isa_pic) : slave_pic; monitor_printf(mon, "pic%d: irr=%02x imr=%02x isr=%02x hprio=%d " - "irq_base=%02x rr_sel=%d elcr=%02x fnm=%d\n", + "irq_base=%02x icw3=%02x rr_sel=%d elcr=%02x fnm=%d\n", i, s->irr, s->imr, s->isr, s->priority_add, - s->irq_base, s->read_reg_select, s->elcr, + s->irq_base, s->icw3, s->read_reg_select, s->elcr, s->special_fully_nested_mode); } } diff --git a/hw/i8259_common.c b/hw/i8259_common.c index ab3d98b..dcde5f2 100644 --- a/hw/i8259_common.c +++ b/hw/i8259_common.c @@ -33,6 +33,7 @@ void pic_reset_common(PICCommonState *s) s->isr = 0; s->priority_add = 0; s->irq_base = 0; + s->icw3 = 0; s->read_reg_select = 0; s->poll = 0; s->special_mask = 0; @@ -111,6 +112,7 @@ static const VMStateDescription vmstate_pic_common = { VMSTATE_UINT8(isr, PICCommonState), VMSTATE_UINT8(priority_add, PICCommonState), VMSTATE_UINT8(irq_base, PICCommonState), + VMSTATE_UINT8(icw3, PICCommonState), VMSTATE_UINT8(read_reg_select, PICCommonState), VMSTATE_UINT8(poll, PICCommonState), VMSTATE_UINT8(special_mask, PICCommonState), diff --git a/hw/i8259_internal.h b/hw/i8259_internal.h index 4137b61..de67d49 100644 --- a/hw/i8259_internal.h +++ b/hw/i8259_internal.h @@ -55,6 +55,7 @@ struct PICCommonState { uint8_t isr; /* interrupt service register */ uint8_t priority_add; /* highest irq priority */ uint8_t irq_base; + uint8_t icw3; /* bit set if line is connected to a slave */ uint8_t read_reg_select; uint8_t poll; uint8_t special_mask;
Although I haven't found any specs that say so, on real hardware I have a test program that shows if you mask off the slave interrupt (say IRQ14) in the IMR after it has already been raised, the master (IRQ2) gets canceled (that is, IRQ2 acts like it is level triggered). Without this patch, qemu delivers a spurious interrupt (IRQ15) instead when running the test program. Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@miniinfo.net> --- I've written a test program (in the form of a floppy disk boot sector) that demonstrates that qemu's emulation of IRQ2 propagation from the slave i8259 to the master does not work correctly when the CPU has interrupts disabled and it masks off the original interrupt (IRQ14) in the slave's IMR register. This was based on simplifying breakage observed when trying to run an old Microport UNIX system (ca 1987). Earlier I speculated that maybe the ELCR bit for IRQ2 was incorrect, but now I don't think changing the bit (from the target's perspective) would be a good idea. See below. You can download the source code for the test program from http://home.comcast.net/~mmogilvi/downloads/i8259-imr-test-2012-09-02.tar.bz2 It can be compiled using a standard GNU i386 toolchain on Linux. The heart of the test program is: cli # i8259:imm: mask off everything except IRQ2 movb $0xfb,%al # master (only IRQ2 clear) outb %al,$0x21 movb $0xff,%al # slave outb %al,$0xa1 mov $.msgCmdRead,%ax call print call initIrqHandlers call scheduleIrq14 call .largeDelay # note: IRQ14 raised while this is waiting mov $.msgUnmask,%ax call print movb $0x3f,%al # unmask IRQ14 and IRQ15 outb %al,$0xa1 call .largeDelay # (probably not important) mov $.msgMask,%ax call print movb $0xff,%al # mask IRQ14 and IRQ15 again outb %al,$0xa1 call .largeDelay # (probably not important) mov $.msgSti,%ax call print sti call .largeDelay # note: spurious interrupt (IRQ15) from qemu cli mov $.msgUnmask,%ax call print movb $0x3f,%al # unmask IRQ14 and IRQ15 outb %al,$0xa1 sti call .largeDelay # we should finally see IRQ14 here? # DONE: cli movw $.msgDone, %ax call print .Lhalt: hlt jmp .Lhalt In qemu, the master treats IRQ2 as edge triggered, and delivers a spurious interrupt if the CPU re-enables interrupts after the slave's IMR is masked off (it also doesn't seem to deliver the real interrupt when the IMR is unmasked, but maybe that is because I'm not correctly acknowledging the spurious interrupt). - Qemu output (without this patch): elcr=0c00 cmdRead ummask mask sti irq15 unmask DONE But on real hardware, the master seems to treat IRQ2 as level triggered, and doesn't deliver an interrupt to the CPU until the slave unmasks IRQ14. I've tried this on several machines, including a non-PCI 486 that does not seem to have ELCR registers. - Typical output (pentium/pentium pro/dual AMD Athlon/etc; slight variations in some elcr bits depending on machine, but bit 0x04 is always clear): elcr=0c20 cmdRead unmask mask sti unmask irq14 DONE - 486 without elcr (just an ISA bus): elcr=e0e0 cmdRead unmask mask sti unmask irq14 DONE - One failure: It doesn't boot properly (no output) with a USB floppy drive on my Intel Core I7. Guess: The test program just barely fits in a sector, with no room for any tables (partition/etc) that BIOS might check for if it isn't an original, native floppy drive. ----------------------- I've found a few descriptions of programming the i8259. The closest thing I've found to a detailed spec is in "iAPX 86, 88 User's Manual", dated August 1981: http://ebookbrowse.com/1981-iapx-86-88-users-manual-pdf-d3089962 It has a significant section titled "Using the 8259A Programmable Interrupt Controller" starting on page 438 or A-135. But none of my sources seem to specify how master/slave cascading interacts with the IMR (mask register) and edge trigger mode, although it talks about those things individually. Also, given the date it isn't surprising that it doesn't mention the elcr registers at all. I have not found any real specs for the ELCR registers, but I've found a few hints: - Two 8 bit registers: one for master (0x4d0) and one for slave (0x4d1). - One bit per IRQ line: 0 for edge trigger, 1 for level trigger. - Not present unless the machine has EISA or PCI buses. Plain ISA doesn't have it. - Might be implemented completely external to the actual i8259s. - As seen in test program output above, ELCR bit 0x04 is clear, indicating that IRQ2 is supposedly edge triggered, even though actual tested behavior is level triggered. - A google search (8259 elcr -linux -qemu) [exclude the dominant Linux and qemu related pages] found at least one operating system that checks several ELCR bits (including IRQ2) as part of a probe to determine if ELCR exists. So simply setting the 0x04 bit is probably a bad idea. For example, line 119 of: https://bitbucket.org/npe/nix/src/35d39df17077/src/9kron2/k8/i8259.c ----------------------- My guess is that if a master IRQ line is marked as coming from a slave (in the ICW3 register), then the i8259 treats that line as level triggered regardless of ELCR registers or the LTIM bit of ICW1 (in addition to other special slave line logic). Otherwise, the slave IMR register would seem rather useless. Under that theory, something like the following patch would be appropriate for qemu. This fixes both the test program, and the original Microport UNIX problem. Possible concerns with this patch: - KVM still needs to be fixed. I haven't researched this at all, beyond noticing that it has the same broken behavior running the test program, and the high level symptoms from Microport UNIX are slightly different with KVM. - It might also be a good idea to add something like my test program to qemu/tests somewhere. This would be a separate patch. - Best icw3 configuration strategy?: Should it be stored, or just assume it is correctly configured based on PICCommonState::master and standard PC IRQ2 configuration? Perhaps add sanity checks for reasonable values when the guest is configuring the 8259s? Did I catch all the places that need to deal with a new state variable (assuming we decide to store it)? - This patch is adding code to what may be a relatively common code path (every time interrupts raised/lowered/acknowledged, masks changed, etc). Potentially it could be optimized by adding an "effective_elcr" state variable, that is maintined as a combination of icw3 and elcr (and maybe add support for LTIM bit from ICW1 at the same time), so that we don't need to have extra code in the critical path. Does anyone think this is worth it? I'm leaning towards "no".