Patchwork [7/7] ahci: work around bug with level interrupts

login
register
mail settings
Submitter Alexander Graf
Date Feb. 1, 2011, 2:51 p.m.
Message ID <1296571892-12702-8-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/81321/
State New
Headers show

Comments

Alexander Graf - Feb. 1, 2011, 2:51 p.m.
When using level based interrupts, the interrupt is treated the same as an
edge triggered one: leaving the line up does not retrigger the interrupt.

In fact, when not lowering the line, we won't ever get a new interrupt inside
the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
something on the device. This way we're sure the guest doesn't starve on
interrupts until someone fixes the actual interrupt path.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 hw/ide/ahci.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)
Aurelien Jarno - Feb. 1, 2011, 4:34 p.m.
On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
> When using level based interrupts, the interrupt is treated the same as an
> edge triggered one: leaving the line up does not retrigger the interrupt.
> 
> In fact, when not lowering the line, we won't ever get a new interrupt inside
> the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
> something on the device. This way we're sure the guest doesn't starve on
> interrupts until someone fixes the actual interrupt path.

Given this issue mostly concerns x86 and not other architectures where
the SATA emulation can probably be used, what about putting the two
versions of the codes like in i8259.c:

| * all targets should do this rather than acking the IRQ in the cpu */
| #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)

The list of architectures here is reduced given the few architectures 
that actually use the i8259, so for ahci.c it should probably be #if not
defined(TARGET_I386) instead.
Alexander Graf - Feb. 1, 2011, 4:53 p.m.
On 01.02.2011, at 17:34, Aurelien Jarno wrote:

> On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
>> When using level based interrupts, the interrupt is treated the same as an
>> edge triggered one: leaving the line up does not retrigger the interrupt.
>> 
>> In fact, when not lowering the line, we won't ever get a new interrupt inside
>> the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
>> something on the device. This way we're sure the guest doesn't starve on
>> interrupts until someone fixes the actual interrupt path.
> 
> Given this issue mostly concerns x86 and not other architectures where
> the SATA emulation can probably be used, what about putting the two
> versions of the codes like in i8259.c:
> 
> | * all targets should do this rather than acking the IRQ in the cpu */
> | #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
> 
> The list of architectures here is reduced given the few architectures 
> that actually use the i8259, so for ahci.c it should probably be #if not
> defined(TARGET_I386) instead.

Because then we'd have to build the ahci code conditionally on the architecture. Right now it builds into libhw :)


Alex
Aurelien Jarno - Feb. 1, 2011, 5:06 p.m.
On Tue, Feb 01, 2011 at 05:53:43PM +0100, Alexander Graf wrote:
> 
> On 01.02.2011, at 17:34, Aurelien Jarno wrote:
> 
> > On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
> >> When using level based interrupts, the interrupt is treated the same as an
> >> edge triggered one: leaving the line up does not retrigger the interrupt.
> >> 
> >> In fact, when not lowering the line, we won't ever get a new interrupt inside
> >> the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
> >> something on the device. This way we're sure the guest doesn't starve on
> >> interrupts until someone fixes the actual interrupt path.
> > 
> > Given this issue mostly concerns x86 and not other architectures where
> > the SATA emulation can probably be used, what about putting the two
> > versions of the codes like in i8259.c:
> > 
> > | * all targets should do this rather than acking the IRQ in the cpu */
> > | #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
> > 
> > The list of architectures here is reduced given the few architectures 
> > that actually use the i8259, so for ahci.c it should probably be #if not
> > defined(TARGET_I386) instead.
> 
> Because then we'd have to build the ahci code conditionally on the architecture. Right now it builds into libhw :)

If we want to keep it in libhw, it's probably better to disable it on
other architectures than i386.
Alexander Graf - Feb. 1, 2011, 5:10 p.m.
On 01.02.2011, at 18:06, Aurelien Jarno wrote:

> On Tue, Feb 01, 2011 at 05:53:43PM +0100, Alexander Graf wrote:
>> 
>> On 01.02.2011, at 17:34, Aurelien Jarno wrote:
>> 
>>> On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
>>>> When using level based interrupts, the interrupt is treated the same as an
>>>> edge triggered one: leaving the line up does not retrigger the interrupt.
>>>> 
>>>> In fact, when not lowering the line, we won't ever get a new interrupt inside
>>>> the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
>>>> something on the device. This way we're sure the guest doesn't starve on
>>>> interrupts until someone fixes the actual interrupt path.
>>> 
>>> Given this issue mostly concerns x86 and not other architectures where
>>> the SATA emulation can probably be used, what about putting the two
>>> versions of the codes like in i8259.c:
>>> 
>>> | * all targets should do this rather than acking the IRQ in the cpu */
>>> | #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
>>> 
>>> The list of architectures here is reduced given the few architectures 
>>> that actually use the i8259, so for ahci.c it should probably be #if not
>>> defined(TARGET_I386) instead.
>> 
>> Because then we'd have to build the ahci code conditionally on the architecture. Right now it builds into libhw :)
> 
> If we want to keep it in libhw, it's probably better to disable it on
> other architectures than i386.

How so? The workaround simply triggers a superfluous interrupt event, but shouldn't break anything for other architectures. In fact, last time I checked it worked just fine on ppc.


Alex
Aurelien Jarno - Feb. 1, 2011, 5:15 p.m.
On Tue, Feb 01, 2011 at 06:10:56PM +0100, Alexander Graf wrote:
> 
> On 01.02.2011, at 18:06, Aurelien Jarno wrote:
> 
> > On Tue, Feb 01, 2011 at 05:53:43PM +0100, Alexander Graf wrote:
> >> 
> >> On 01.02.2011, at 17:34, Aurelien Jarno wrote:
> >> 
> >>> On Tue, Feb 01, 2011 at 03:51:32PM +0100, Alexander Graf wrote:
> >>>> When using level based interrupts, the interrupt is treated the same as an
> >>>> edge triggered one: leaving the line up does not retrigger the interrupt.
> >>>> 
> >>>> In fact, when not lowering the line, we won't ever get a new interrupt inside
> >>>> the guest. So let's always retrigger an interrupt as soon as the OS ack'ed
> >>>> something on the device. This way we're sure the guest doesn't starve on
> >>>> interrupts until someone fixes the actual interrupt path.
> >>> 
> >>> Given this issue mostly concerns x86 and not other architectures where
> >>> the SATA emulation can probably be used, what about putting the two
> >>> versions of the codes like in i8259.c:
> >>> 
> >>> | * all targets should do this rather than acking the IRQ in the cpu */
> >>> | #if defined(TARGET_MIPS) || defined(TARGET_PPC) || defined(TARGET_ALPHA)
> >>> 
> >>> The list of architectures here is reduced given the few architectures 
> >>> that actually use the i8259, so for ahci.c it should probably be #if not
> >>> defined(TARGET_I386) instead.
> >> 
> >> Because then we'd have to build the ahci code conditionally on the architecture. Right now it builds into libhw :)
> > 
> > If we want to keep it in libhw, it's probably better to disable it on
> > other architectures than i386.
> 
> How so? The workaround simply triggers a superfluous interrupt event, but shouldn't break anything for other architectures. In fact, last time I checked it worked just fine on ppc.
> 

Superfluous interrupt events can have a lot of consequences, it's often
a kernel panic (see recent zilog serial port issues for example), and
when it comes to disk controllers, it might mean data loss.

Anyway the way to go there is cleary to fix the x86 interrupt model,
it's the only one broken among all targets. If we can't find anyone to
do that, we can also declare x86 unsupported :)

Patch

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 98bdf70..bce7fba 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -152,11 +152,10 @@  static void ahci_check_irq(AHCIState *s)
         }
     }
 
+    ahci_irq_lower(s, NULL);
     if (s->control_regs.irqstatus &&
         (s->control_regs.ghc & HOST_CTL_IRQ_EN)) {
             ahci_irq_raise(s, NULL);
-    } else {
-        ahci_irq_lower(s, NULL);
     }
 }