diff mbox

sparc32 irq clearing (guest Solaris performance+NetBSD) fix

Message ID fb8d4f70911131703m12f56e36ofeb0172bcc9d0734@mail.gmail.com
State New
Headers show

Commit Message

Artyom Tarasenko Nov. 14, 2009, 1:03 a.m. UTC
According to NCR89C105 documentation
http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt

Interrupts are cleared by disabling and then re-enabling them.
This patch implements the specified behaviour. The most visible effects:
- NetBSD 1.3.3 - 1.5.3 boots successfully
- Solaris 2.5.1 - 7 boots ~1500 times faster (~20 seconds instead of ~8 hours)

Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
---
                 s->intregm_disabled);

Comments

Blue Swirl Nov. 14, 2009, 3:26 p.m. UTC | #1
On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
<atar4qemu@googlemail.com> wrote:
> According to NCR89C105 documentation
> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>
> Interrupts are cleared by disabling and then re-enabling them.
> This patch implements the specified behaviour. The most visible effects:

I think the current version also implements this behaviour. The
difference is that now we clear on disable, with your version, the
interrupts are cleared when re-enabling them.

With the current version, the interrupts which arrived after getting
cleared during disable but before re-enabling become visible after
re-enabling. It looks like esp driver in Aurora 1.0, 2.0 and 2.1
depend on this.

> - NetBSD 1.3.3 - 1.5.3 boots successfully

I didn't find those even on ftp.netbsd.org, the first I have is 1.6.
Thus I could not test if any of the changes I did had any positive
effect except if some test failed.

> - Solaris 2.5.1 - 7 boots ~1500 times faster (~20 seconds instead of ~8 hours)
>
> Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
> ---
> diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
> index 9680392..779c661 100644
> --- a/hw/slavio_intctl.c
> +++ b/hw/slavio_intctl.c
> @@ -177,19 +177,19 @@ static void slavio_intctlm_mem_writel(void
> *opaque, target_phys_addr_t addr,
>     saddr = addr >> 2;
>     DPRINTF("write system reg 0x" TARGET_FMT_plx " = %x\n", addr, val);
>     switch (saddr) {
> -    case 2: // clear (enable)
> +    case 2: // clear (enable, clear formerly disabled pending)
>         // Force clear unused bits
>         val &= MASTER_IRQ_MASK;
> +        s->intregm_pending &= (s->intregm_disabled & val);

This looks buggy, the AND operation will clear a lot of bits unrelated
to val. I think you are missing a ~ here. I tried a few combinations,
but none of them passed my tests.

>         s->intregm_disabled &= ~val;
>         DPRINTF("Enabled master irq mask %x, curmask %x\n", val,
>                 s->intregm_disabled);
>         slavio_check_interrupts(s, 1);
>         break;
> -    case 3: // set (disable, clear pending)
> +    case 3: // set (disable, do not clear pending)
>         // Force clear unused bits
>         val &= MASTER_IRQ_MASK;
>         s->intregm_disabled |= val;
> -        s->intregm_pending &= ~val;

Here
s->intregm_pending &= ~s->intregm_disabled;
would also pass my tests. Does that change anything?

>         slavio_check_interrupts(s, 1);
>         DPRINTF("Disabled master irq mask %x, curmask %x\n", val,
>                 s->intregm_disabled);
>
Artyom Tarasenko Nov. 14, 2009, 11:15 p.m. UTC | #2
2009/11/14 Blue Swirl <blauwirbel@gmail.com>:
> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
> <atar4qemu@googlemail.com> wrote:
>> According to NCR89C105 documentation
>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>>
>> Interrupts are cleared by disabling and then re-enabling them.
>> This patch implements the specified behaviour. The most visible effects:
>
> I think the current version also implements this behaviour. The
> difference is that now we clear on disable, with your version, the
> interrupts are cleared when re-enabling them.

Doesn't this imply that the current version does not implement this
("Interrupts are cleared by disabling and then re-enabling them") behavior? ;-)

> With the current version, the interrupts which arrived after getting
> cleared during disable but before re-enabling become visible after
> re-enabling. It looks like esp driver in Aurora 1.0, 2.0 and 2.1
> depend on this.

Hm. It certainly makes sense. But I don't see how does it agree with
the NCR89C105 docs. Can it be the documentation is not precise?

>> - NetBSD 1.3.3 - 1.5.3 boots successfully
>
> I didn't find those even on ftp.netbsd.org, the first I have is 1.6.
> Thus I could not test if any of the changes I did had any positive
> effect except if some test failed.

Have you looked in the NetBSD-archive at ftp.netbsd.org?
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-archive/

>> - Solaris 2.5.1 - 7 boots ~1500 times faster (~20 seconds instead of ~8 hours)
>>
>> Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
>> ---
>> diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
>> index 9680392..779c661 100644
>> --- a/hw/slavio_intctl.c
>> +++ b/hw/slavio_intctl.c
>> @@ -177,19 +177,19 @@ static void slavio_intctlm_mem_writel(void
>> *opaque, target_phys_addr_t addr,
>>     saddr = addr >> 2;
>>     DPRINTF("write system reg 0x" TARGET_FMT_plx " = %x\n", addr, val);
>>     switch (saddr) {
>> -    case 2: // clear (enable)
>> +    case 2: // clear (enable, clear formerly disabled pending)
>>         // Force clear unused bits
>>         val &= MASTER_IRQ_MASK;
>> +        s->intregm_pending &= (s->intregm_disabled & val);
>
> This looks buggy, the AND operation will clear a lot of bits unrelated
> to val. I think you are missing a ~ here. I tried a few combinations,
> but none of them passed my tests.

My bad. It had to be s->intregm_pending &= ~(s->intregm_disabled &
val) . But unfortunately it doesn't work.

>>         s->intregm_disabled &= ~val;
>>         DPRINTF("Enabled master irq mask %x, curmask %x\n", val,
>>                 s->intregm_disabled);
>>         slavio_check_interrupts(s, 1);
>>         break;
>> -    case 3: // set (disable, clear pending)
>> +    case 3: // set (disable, do not clear pending)
>>         // Force clear unused bits
>>         val &= MASTER_IRQ_MASK;
>>         s->intregm_disabled |= val;
>> -        s->intregm_pending &= ~val;
>
> Here
> s->intregm_pending &= ~s->intregm_disabled;
> would also pass my tests. Does that change anything?

No, doesn't work for me either.

Also I put some additional printfs into sun4m.c and see that
interrupts are both set while being already set, and reset while not being set.
Looks like a bug?

static void cpu_set_irq(void *opaque, int irq, int level)
{
    CPUState *env = opaque;

    if (level) {
        DPRINTF("Raise CPU IRQ %d\n", irq);
        env->halted = 0;
        if(env->pil_in & (1 << irq)) {printf("already set: %d\n",irq);return;}
        env->pil_in |= 1 << irq;
        cpu_check_irqs(env);
    } else {
        DPRINTF("Lower CPU IRQ %d\n", irq);
        if(~(env->pil_in & (1 << irq))) {printf("already low:
%d\n",irq);return;}
        env->pil_in &= ~(1 << irq);
        cpu_check_irqs(env);
    }
}
Blue Swirl Nov. 15, 2009, 10:13 p.m. UTC | #3
On Sun, Nov 15, 2009 at 1:15 AM, Artyom Tarasenko
<atar4qemu@googlemail.com> wrote:
> 2009/11/14 Blue Swirl <blauwirbel@gmail.com>:
>> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
>> <atar4qemu@googlemail.com> wrote:
>>> According to NCR89C105 documentation
>>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>>>
>>> Interrupts are cleared by disabling and then re-enabling them.
>>> This patch implements the specified behaviour. The most visible effects:
>>
>> I think the current version also implements this behaviour. The
>> difference is that now we clear on disable, with your version, the
>> interrupts are cleared when re-enabling them.
>
> Doesn't this imply that the current version does not implement this
> ("Interrupts are cleared by disabling and then re-enabling them") behavior? ;-)

The specification only says that the sequence disable-enable clears
interrupts, but not which of these is true:
- clearing happens in the moment of disabling (and interrupts after
that are not cleared, current version)
- clearing happens  in the moment of re-enabling (your version, sort of)
- clearing happens in both cases (lose interrupts)

It's also interesting to think what happens between the interrupt
controller and the devices. Clearing an interrupt at controller level
does not clear the interrupt condition at the device. Aren't the
interrupts level triggered on Sparc, so the interrupt is still posted?
Is the interrupt actually masked by clearing until the level is
deactivated?

Or maybe the controller latches the interrupt so that even after the
device releases the interrupt line, interrupt is still active towards
the CPU. Then the clearing would make sense. This does not match
current code.

>> With the current version, the interrupts which arrived after getting
>> cleared during disable but before re-enabling become visible after
>> re-enabling. It looks like esp driver in Aurora 1.0, 2.0 and 2.1
>> depend on this.
>
> Hm. It certainly makes sense. But I don't see how does it agree with
> the NCR89C105 docs. Can it be the documentation is not precise?
>
>>> - NetBSD 1.3.3 - 1.5.3 boots successfully
>>
>> I didn't find those even on ftp.netbsd.org, the first I have is 1.6.
>> Thus I could not test if any of the changes I did had any positive
>> effect except if some test failed.
>
> Have you looked in the NetBSD-archive at ftp.netbsd.org?
> ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-archive/

Yes, but there are no ISOs.

>>> - Solaris 2.5.1 - 7 boots ~1500 times faster (~20 seconds instead of ~8 hours)
>>>
>>> Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
>>> ---
>>> diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
>>> index 9680392..779c661 100644
>>> --- a/hw/slavio_intctl.c
>>> +++ b/hw/slavio_intctl.c
>>> @@ -177,19 +177,19 @@ static void slavio_intctlm_mem_writel(void
>>> *opaque, target_phys_addr_t addr,
>>>     saddr = addr >> 2;
>>>     DPRINTF("write system reg 0x" TARGET_FMT_plx " = %x\n", addr, val);
>>>     switch (saddr) {
>>> -    case 2: // clear (enable)
>>> +    case 2: // clear (enable, clear formerly disabled pending)
>>>         // Force clear unused bits
>>>         val &= MASTER_IRQ_MASK;
>>> +        s->intregm_pending &= (s->intregm_disabled & val);
>>
>> This looks buggy, the AND operation will clear a lot of bits unrelated
>> to val. I think you are missing a ~ here. I tried a few combinations,
>> but none of them passed my tests.
>
> My bad. It had to be s->intregm_pending &= ~(s->intregm_disabled &
> val) . But unfortunately it doesn't work.
>
>>>         s->intregm_disabled &= ~val;
>>>         DPRINTF("Enabled master irq mask %x, curmask %x\n", val,
>>>                 s->intregm_disabled);
>>>         slavio_check_interrupts(s, 1);
>>>         break;
>>> -    case 3: // set (disable, clear pending)
>>> +    case 3: // set (disable, do not clear pending)
>>>         // Force clear unused bits
>>>         val &= MASTER_IRQ_MASK;
>>>         s->intregm_disabled |= val;
>>> -        s->intregm_pending &= ~val;
>>
>> Here
>> s->intregm_pending &= ~s->intregm_disabled;
>> would also pass my tests. Does that change anything?
>
> No, doesn't work for me either.
>
> Also I put some additional printfs into sun4m.c and see that
> interrupts are both set while being already set, and reset while not being set.
> Looks like a bug?
>
> static void cpu_set_irq(void *opaque, int irq, int level)
> {
>    CPUState *env = opaque;
>
>    if (level) {
>        DPRINTF("Raise CPU IRQ %d\n", irq);
>        env->halted = 0;
>        if(env->pil_in & (1 << irq)) {printf("already set: %d\n",irq);return;}
>        env->pil_in |= 1 << irq;
>        cpu_check_irqs(env);
>    } else {
>        DPRINTF("Lower CPU IRQ %d\n", irq);
>        if(~(env->pil_in & (1 << irq))) {printf("already low:
> %d\n",irq);return;}
>        env->pil_in &= ~(1 << irq);
>        cpu_check_irqs(env);
>    }
> }
>

It should be harmless, though it may suck performance.
Artyom Tarasenko Nov. 16, 2009, 11:47 a.m. UTC | #4
2009/11/15 Blue Swirl <blauwirbel@gmail.com>:
> On Sun, Nov 15, 2009 at 1:15 AM, Artyom Tarasenko
> <atar4qemu@googlemail.com> wrote:
>> 2009/11/14 Blue Swirl <blauwirbel@gmail.com>:
>>> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
>>> <atar4qemu@googlemail.com> wrote:
>>>> According to NCR89C105 documentation
>>>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>>>>
>>>> Interrupts are cleared by disabling and then re-enabling them.
>>>> This patch implements the specified behaviour. The most visible effects:
>>>
>>> I think the current version also implements this behaviour. The
>>> difference is that now we clear on disable, with your version, the
>>> interrupts are cleared when re-enabling them.
>>
>> Doesn't this imply that the current version does not implement this
>> ("Interrupts are cleared by disabling and then re-enabling them") behavior? ;-)
>
> The specification only says that the sequence disable-enable clears
> interrupts, but not which of these is true:
> - clearing happens in the moment of disabling (and interrupts after
> that are not cleared, current version)
> - clearing happens  in the moment of re-enabling (your version, sort of)
> - clearing happens in both cases (lose interrupts)

English is not my native language, but fwiw I think "and then
re-enabling" can only be the second variant. Without "then" it could
be either one or three. And if the first variant is what they really
meant, the part with "and then" is totally redundant and misleading.

> It's also interesting to think what happens between the interrupt
> controller and the devices. Clearing an interrupt at controller level
> does not clear the interrupt condition at the device. Aren't the
> interrupts level triggered on Sparc, so the interrupt is still posted?
> Is the interrupt actually masked by clearing until the level is
> deactivated?

Looks unlikely to me. In the book "Panic! Unix system crash dump
analysis" the authors write that the first thing interrupt handler has
to do is disable the interrupt, and yes wrting "unix" they mean
"SunOS/Solaris".

That's also what I observe debugging the Solaris kernel code
(Solaris kernel debugger is a really powerful tool).
Looks like interrupts can be shared between devices, so the general
handler disables the interrupt and then calls multiple device-specific
handlers sequentially and checks if any of then claims the interrupt.
If no one does it writes the message "Spurious interrupt %d\n".


> Or maybe the controller latches the interrupt so that even after the
> device releases the interrupt line, interrupt is still active towards
> the CPU. Then the clearing would make sense.

Looks very realistic to me. I think that's the way the interrupts are
handled at least under x86.

> This does not match current code.
>
>>> With the current version, the interrupts which arrived after getting
>>> cleared during disable but before re-enabling become visible after
>>> re-enabling. It looks like esp driver in Aurora 1.0, 2.0 and 2.1
>>> depend on this.
>>
>> Hm. It certainly makes sense. But I don't see how does it agree with
>> the NCR89C105 docs. Can it be the documentation is not precise?
>>
>>>> - NetBSD 1.3.3 - 1.5.3 boots successfully
>>>
>>> I didn't find those even on ftp.netbsd.org, the first I have is 1.6.
>>> Thus I could not test if any of the changes I did had any positive
>>> effect except if some test failed.
>>
>> Have you looked in the NetBSD-archive at ftp.netbsd.org?
>> ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-archive/
>
> Yes, but there are no ISOs.

Don't have ISOs either. I think they did never exist for the early
versions. Miniroots can be used.
One nice [missing] feature of OpenBIOS is that it doesn't check
whether a disk label is valid.
The disk labels of NetBSD miniroots are invalid, so it's not possible
to boot them on a real hw,
or under OBP (it drove me crazy when I was fixing the scsi layer to
work with OBP because
I thought miniroots were hdd images).
But, under OpenBIOS it works like a charm:

qemu-system-sparc -nographic  -hda miniroot-133.fs -m 64

>>>> - Solaris 2.5.1 - 7 boots ~1500 times faster (~20 seconds instead of ~8 hours)
>>>>
>>>> Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
>>>> ---
>>>> diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
>>>> index 9680392..779c661 100644
>>>> --- a/hw/slavio_intctl.c
>>>> +++ b/hw/slavio_intctl.c
>>>> @@ -177,19 +177,19 @@ static void slavio_intctlm_mem_writel(void
>>>> *opaque, target_phys_addr_t addr,
>>>>     saddr = addr >> 2;
>>>>     DPRINTF("write system reg 0x" TARGET_FMT_plx " = %x\n", addr, val);
>>>>     switch (saddr) {
>>>> -    case 2: // clear (enable)
>>>> +    case 2: // clear (enable, clear formerly disabled pending)
>>>>         // Force clear unused bits
>>>>         val &= MASTER_IRQ_MASK;
>>>> +        s->intregm_pending &= (s->intregm_disabled & val);
>>>
>>> This looks buggy, the AND operation will clear a lot of bits unrelated
>>> to val. I think you are missing a ~ here. I tried a few combinations,
>>> but none of them passed my tests.
>>
>> My bad. It had to be s->intregm_pending &= ~(s->intregm_disabled &
>> val) . But unfortunately it doesn't work.
>>
>>>>         s->intregm_disabled &= ~val;
>>>>         DPRINTF("Enabled master irq mask %x, curmask %x\n", val,
>>>>                 s->intregm_disabled);
>>>>         slavio_check_interrupts(s, 1);
>>>>         break;
>>>> -    case 3: // set (disable, clear pending)
>>>> +    case 3: // set (disable, do not clear pending)
>>>>         // Force clear unused bits
>>>>         val &= MASTER_IRQ_MASK;
>>>>         s->intregm_disabled |= val;
>>>> -        s->intregm_pending &= ~val;
>>>
>>> Here
>>> s->intregm_pending &= ~s->intregm_disabled;
>>> would also pass my tests. Does that change anything?
>>
>> No, doesn't work for me either.
>>
>> Also I put some additional printfs into sun4m.c and see that
>> interrupts are both set while being already set, and reset while not being set.
>> Looks like a bug?
>>
>> static void cpu_set_irq(void *opaque, int irq, int level)
>> {
>>    CPUState *env = opaque;
>>
>>    if (level) {
>>        DPRINTF("Raise CPU IRQ %d\n", irq);
>>        env->halted = 0;
>>        if(env->pil_in & (1 << irq)) {printf("already set: %d\n",irq);return;}
>>        env->pil_in |= 1 << irq;
>>        cpu_check_irqs(env);
>>    } else {
>>        DPRINTF("Lower CPU IRQ %d\n", irq);
>>        if(~(env->pil_in & (1 << irq))) {printf("already low:
>> %d\n",irq);return;}
>>        env->pil_in &= ~(1 << irq);
>>        cpu_check_irqs(env);
>>    }
>> }
>>
>
> It should be harmless, though it may suck performance.

Do you mean the check against old_interrupt in the cpu_check_irqs()?
To me it looks harmless only in the case when we have screaming
interrupts of the same level. Or do you mean something else?

The problem is, with "return" after "pritnf" above at least Linux,
NetBSD and OBP
hang (or booting so slow that it looks like they hang).

It looks like something is depending on this broken behaviour.
Blue Swirl Nov. 16, 2009, 4:14 p.m. UTC | #5
On Mon, Nov 16, 2009 at 1:47 PM, Artyom Tarasenko
<atar4qemu@googlemail.com> wrote:
> 2009/11/15 Blue Swirl <blauwirbel@gmail.com>:
>> On Sun, Nov 15, 2009 at 1:15 AM, Artyom Tarasenko
>> <atar4qemu@googlemail.com> wrote:
>>> 2009/11/14 Blue Swirl <blauwirbel@gmail.com>:
>>>> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
>>>> <atar4qemu@googlemail.com> wrote:
>>>>> According to NCR89C105 documentation
>>>>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>>>>>
>>>>> Interrupts are cleared by disabling and then re-enabling them.
>>>>> This patch implements the specified behaviour. The most visible effects:
>>>>
>>>> I think the current version also implements this behaviour. The
>>>> difference is that now we clear on disable, with your version, the
>>>> interrupts are cleared when re-enabling them.
>>>
>>> Doesn't this imply that the current version does not implement this
>>> ("Interrupts are cleared by disabling and then re-enabling them") behavior? ;-)
>>
>> The specification only says that the sequence disable-enable clears
>> interrupts, but not which of these is true:
>> - clearing happens in the moment of disabling (and interrupts after
>> that are not cleared, current version)
>> - clearing happens  in the moment of re-enabling (your version, sort of)
>> - clearing happens in both cases (lose interrupts)
>
> English is not my native language, but fwiw I think "and then
> re-enabling" can only be the second variant. Without "then" it could
> be either one or three. And if the first variant is what they really
> meant, the part with "and then" is totally redundant and misleading.

Still, this is user documentation, not implementation specification.
I'm open to both versions, if they work.

>> It's also interesting to think what happens between the interrupt
>> controller and the devices. Clearing an interrupt at controller level
>> does not clear the interrupt condition at the device. Aren't the
>> interrupts level triggered on Sparc, so the interrupt is still posted?
>> Is the interrupt actually masked by clearing until the level is
>> deactivated?
>
> Looks unlikely to me. In the book "Panic! Unix system crash dump
> analysis" the authors write that the first thing interrupt handler has
> to do is disable the interrupt, and yes wrting "unix" they mean
> "SunOS/Solaris".
>
> That's also what I observe debugging the Solaris kernel code
> (Solaris kernel debugger is a really powerful tool).
> Looks like interrupts can be shared between devices, so the general
> handler disables the interrupt and then calls multiple device-specific
> handlers sequentially and checks if any of then claims the interrupt.
> If no one does it writes the message "Spurious interrupt %d\n".
>
>
>> Or maybe the controller latches the interrupt so that even after the
>> device releases the interrupt line, interrupt is still active towards
>> the CPU. Then the clearing would make sense.
>
> Looks very realistic to me. I think that's the way the interrupts are
> handled at least under x86.

It's a must on x86, because the interrupts are edge triggered.

>> This does not match current code.
>>
>>>> With the current version, the interrupts which arrived after getting
>>>> cleared during disable but before re-enabling become visible after
>>>> re-enabling. It looks like esp driver in Aurora 1.0, 2.0 and 2.1
>>>> depend on this.
>>>
>>> Hm. It certainly makes sense. But I don't see how does it agree with
>>> the NCR89C105 docs. Can it be the documentation is not precise?
>>>
>>>>> - NetBSD 1.3.3 - 1.5.3 boots successfully
>>>>
>>>> I didn't find those even on ftp.netbsd.org, the first I have is 1.6.
>>>> Thus I could not test if any of the changes I did had any positive
>>>> effect except if some test failed.
>>>
>>> Have you looked in the NetBSD-archive at ftp.netbsd.org?
>>> ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-archive/
>>
>> Yes, but there are no ISOs.
>
> Don't have ISOs either. I think they did never exist for the early
> versions. Miniroots can be used.
> One nice [missing] feature of OpenBIOS is that it doesn't check
> whether a disk label is valid.
> The disk labels of NetBSD miniroots are invalid, so it's not possible
> to boot them on a real hw,
> or under OBP (it drove me crazy when I was fixing the scsi layer to
> work with OBP because
> I thought miniroots were hdd images).
> But, under OpenBIOS it works like a charm:
>
> qemu-system-sparc -nographic  -hda miniroot-133.fs -m 64

Thanks, now I can try something.

>>>>> - Solaris 2.5.1 - 7 boots ~1500 times faster (~20 seconds instead of ~8 hours)
>>>>>
>>>>> Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
>>>>> ---
>>>>> diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
>>>>> index 9680392..779c661 100644
>>>>> --- a/hw/slavio_intctl.c
>>>>> +++ b/hw/slavio_intctl.c
>>>>> @@ -177,19 +177,19 @@ static void slavio_intctlm_mem_writel(void
>>>>> *opaque, target_phys_addr_t addr,
>>>>>     saddr = addr >> 2;
>>>>>     DPRINTF("write system reg 0x" TARGET_FMT_plx " = %x\n", addr, val);
>>>>>     switch (saddr) {
>>>>> -    case 2: // clear (enable)
>>>>> +    case 2: // clear (enable, clear formerly disabled pending)
>>>>>         // Force clear unused bits
>>>>>         val &= MASTER_IRQ_MASK;
>>>>> +        s->intregm_pending &= (s->intregm_disabled & val);
>>>>
>>>> This looks buggy, the AND operation will clear a lot of bits unrelated
>>>> to val. I think you are missing a ~ here. I tried a few combinations,
>>>> but none of them passed my tests.
>>>
>>> My bad. It had to be s->intregm_pending &= ~(s->intregm_disabled &
>>> val) . But unfortunately it doesn't work.
>>>
>>>>>         s->intregm_disabled &= ~val;
>>>>>         DPRINTF("Enabled master irq mask %x, curmask %x\n", val,
>>>>>                 s->intregm_disabled);
>>>>>         slavio_check_interrupts(s, 1);
>>>>>         break;
>>>>> -    case 3: // set (disable, clear pending)
>>>>> +    case 3: // set (disable, do not clear pending)
>>>>>         // Force clear unused bits
>>>>>         val &= MASTER_IRQ_MASK;
>>>>>         s->intregm_disabled |= val;
>>>>> -        s->intregm_pending &= ~val;
>>>>
>>>> Here
>>>> s->intregm_pending &= ~s->intregm_disabled;
>>>> would also pass my tests. Does that change anything?
>>>
>>> No, doesn't work for me either.
>>>
>>> Also I put some additional printfs into sun4m.c and see that
>>> interrupts are both set while being already set, and reset while not being set.
>>> Looks like a bug?
>>>
>>> static void cpu_set_irq(void *opaque, int irq, int level)
>>> {
>>>    CPUState *env = opaque;
>>>
>>>    if (level) {
>>>        DPRINTF("Raise CPU IRQ %d\n", irq);
>>>        env->halted = 0;
>>>        if(env->pil_in & (1 << irq)) {printf("already set: %d\n",irq);return;}
>>>        env->pil_in |= 1 << irq;
>>>        cpu_check_irqs(env);
>>>    } else {
>>>        DPRINTF("Lower CPU IRQ %d\n", irq);
>>>        if(~(env->pil_in & (1 << irq))) {printf("already low:
>>> %d\n",irq);return;}
>>>        env->pil_in &= ~(1 << irq);
>>>        cpu_check_irqs(env);
>>>    }
>>> }
>>>
>>
>> It should be harmless, though it may suck performance.
>
> Do you mean the check against old_interrupt in the cpu_check_irqs()?
> To me it looks harmless only in the case when we have screaming
> interrupts of the same level. Or do you mean something else?

If the old level is unchanged, there should be no action. What's the
case where you see problems?

> The problem is, with "return" after "pritnf" above at least Linux,
> NetBSD and OBP
> hang (or booting so slow that it looks like they hang).
>
> It looks like something is depending on this broken behaviour.
Artyom Tarasenko Nov. 16, 2009, 4:45 p.m. UTC | #6
> It looks like something is depending on this broken behaviour.

Or that I shouldn't have used bitwise "not" where boolean "not" were
actually meant. cpu_set_irq() is not the source of spurious
interrupts, sorry for the noise.
Artyom Tarasenko Nov. 16, 2009, 5:19 p.m. UTC | #7
2009/11/16 Blue Swirl <blauwirbel@gmail.com>:
> On Mon, Nov 16, 2009 at 1:47 PM, Artyom Tarasenko
> <atar4qemu@googlemail.com> wrote:
>> 2009/11/15 Blue Swirl <blauwirbel@gmail.com>:
>>> On Sun, Nov 15, 2009 at 1:15 AM, Artyom Tarasenko
>>> <atar4qemu@googlemail.com> wrote:
>>>> 2009/11/14 Blue Swirl <blauwirbel@gmail.com>:
>>>>> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
>>>>> <atar4qemu@googlemail.com> wrote:
>>>>>> According to NCR89C105 documentation
>>>>>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>>>>>>
>>>>>> Interrupts are cleared by disabling and then re-enabling them.
>>>>>> This patch implements the specified behaviour. The most visible effects:
>>>>>
>>>>> I think the current version also implements this behaviour. The
>>>>> difference is that now we clear on disable, with your version, the
>>>>> interrupts are cleared when re-enabling them.
>>>>
>>>> Doesn't this imply that the current version does not implement this
>>>> ("Interrupts are cleared by disabling and then re-enabling them") behavior? ;-)
>>>
>>> The specification only says that the sequence disable-enable clears
>>> interrupts, but not which of these is true:
>>> - clearing happens in the moment of disabling (and interrupts after
>>> that are not cleared, current version)
>>> - clearing happens  in the moment of re-enabling (your version, sort of)
>>> - clearing happens in both cases (lose interrupts)
>>
>> English is not my native language, but fwiw I think "and then
>> re-enabling" can only be the second variant. Without "then" it could
>> be either one or three. And if the first variant is what they really
>> meant, the part with "and then" is totally redundant and misleading.
>
> Still, this is user documentation, not implementation specification.
> I'm open to both versions, if they work.
>
>>> It's also interesting to think what happens between the interrupt
>>> controller and the devices. Clearing an interrupt at controller level
>>> does not clear the interrupt condition at the device. Aren't the
>>> interrupts level triggered on Sparc, so the interrupt is still posted?
>>> Is the interrupt actually masked by clearing until the level is
>>> deactivated?
>>
>> Looks unlikely to me. In the book "Panic! Unix system crash dump
>> analysis" the authors write that the first thing interrupt handler has
>> to do is disable the interrupt, and yes wrting "unix" they mean
>> "SunOS/Solaris".
>>
>> That's also what I observe debugging the Solaris kernel code
>> (Solaris kernel debugger is a really powerful tool).
>> Looks like interrupts can be shared between devices, so the general
>> handler disables the interrupt and then calls multiple device-specific
>> handlers sequentially and checks if any of then claims the interrupt.
>> If no one does it writes the message "Spurious interrupt %d\n".
>>
>>
>>> Or maybe the controller latches the interrupt so that even after the
>>> device releases the interrupt line, interrupt is still active towards
>>> the CPU. Then the clearing would make sense.
>>
>> Looks very realistic to me. I think that's the way the interrupts are
>> handled at least under x86.
>
> It's a must on x86, because the interrupts are edge triggered.

I don't know, how the real sun4m reacts in the case where irq stays
on, not being cleared.
It can not be though that it would try to process irq for every next
tick. The CPU must have some time to clear the pending irq, so it must
be edge triggered too, at least in a way.

>>> This does not match current code.
>>>
>>>>> With the current version, the interrupts which arrived after getting
>>>>> cleared during disable but before re-enabling become visible after
>>>>> re-enabling. It looks like esp driver in Aurora 1.0, 2.0 and 2.1
>>>>> depend on this.
>>>>
>>>> Hm. It certainly makes sense. But I don't see how does it agree with
>>>> the NCR89C105 docs. Can it be the documentation is not precise?
>>>>
>>>>>> - NetBSD 1.3.3 - 1.5.3 boots successfully
>>>>>
>>>>> I didn't find those even on ftp.netbsd.org, the first I have is 1.6.
>>>>> Thus I could not test if any of the changes I did had any positive
>>>>> effect except if some test failed.
>>>>
>>>> Have you looked in the NetBSD-archive at ftp.netbsd.org?
>>>> ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-archive/
>>>
>>> Yes, but there are no ISOs.
>>
>> Don't have ISOs either. I think they did never exist for the early
>> versions. Miniroots can be used.
>> One nice [missing] feature of OpenBIOS is that it doesn't check
>> whether a disk label is valid.
>> The disk labels of NetBSD miniroots are invalid, so it's not possible
>> to boot them on a real hw,
>> or under OBP (it drove me crazy when I was fixing the scsi layer to
>> work with OBP because
>> I thought miniroots were hdd images).
>> But, under OpenBIOS it works like a charm:
>>
>> qemu-system-sparc -nographic  -hda miniroot-133.fs -m 64
>
> Thanks, now I can try something.
>
>>>>>> - Solaris 2.5.1 - 7 boots ~1500 times faster (~20 seconds instead of ~8 hours)
>>>>>>
>>>>>> Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
>>>>>> ---
>>>>>> diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
>>>>>> index 9680392..779c661 100644
>>>>>> --- a/hw/slavio_intctl.c
>>>>>> +++ b/hw/slavio_intctl.c
>>>>>> @@ -177,19 +177,19 @@ static void slavio_intctlm_mem_writel(void
>>>>>> *opaque, target_phys_addr_t addr,
>>>>>>     saddr = addr >> 2;
>>>>>>     DPRINTF("write system reg 0x" TARGET_FMT_plx " = %x\n", addr, val);
>>>>>>     switch (saddr) {
>>>>>> -    case 2: // clear (enable)
>>>>>> +    case 2: // clear (enable, clear formerly disabled pending)
>>>>>>         // Force clear unused bits
>>>>>>         val &= MASTER_IRQ_MASK;
>>>>>> +        s->intregm_pending &= (s->intregm_disabled & val);
>>>>>
>>>>> This looks buggy, the AND operation will clear a lot of bits unrelated
>>>>> to val. I think you are missing a ~ here. I tried a few combinations,
>>>>> but none of them passed my tests.
>>>>
>>>> My bad. It had to be s->intregm_pending &= ~(s->intregm_disabled &
>>>> val) . But unfortunately it doesn't work.
>>>>
>>>>>>         s->intregm_disabled &= ~val;
>>>>>>         DPRINTF("Enabled master irq mask %x, curmask %x\n", val,
>>>>>>                 s->intregm_disabled);
>>>>>>         slavio_check_interrupts(s, 1);
>>>>>>         break;
>>>>>> -    case 3: // set (disable, clear pending)
>>>>>> +    case 3: // set (disable, do not clear pending)
>>>>>>         // Force clear unused bits
>>>>>>         val &= MASTER_IRQ_MASK;
>>>>>>         s->intregm_disabled |= val;
>>>>>> -        s->intregm_pending &= ~val;
>>>>>
>>>>> Here
>>>>> s->intregm_pending &= ~s->intregm_disabled;
>>>>> would also pass my tests. Does that change anything?
>>>>
>>>> No, doesn't work for me either.
>>>>
>>>> Also I put some additional printfs into sun4m.c and see that
>>>> interrupts are both set while being already set, and reset while not being set.
>>>> Looks like a bug?
>>>>
>>>> static void cpu_set_irq(void *opaque, int irq, int level)
>>>> {
>>>>    CPUState *env = opaque;
>>>>
>>>>    if (level) {
>>>>        DPRINTF("Raise CPU IRQ %d\n", irq);
>>>>        env->halted = 0;
>>>>        if(env->pil_in & (1 << irq)) {printf("already set: %d\n",irq);return;}
>>>>        env->pil_in |= 1 << irq;
>>>>        cpu_check_irqs(env);
>>>>    } else {
>>>>        DPRINTF("Lower CPU IRQ %d\n", irq);
>>>>        if(~(env->pil_in & (1 << irq))) {printf("already low:
>>>> %d\n",irq);return;}
>>>>        env->pil_in &= ~(1 << irq);
>>>>        cpu_check_irqs(env);
>>>>    }
>>>> }
>>>>
>>>
>>> It should be harmless, though it may suck performance.
>>
>> Do you mean the check against old_interrupt in the cpu_check_irqs()?
>> To me it looks harmless only in the case when we have screaming
>> interrupts of the same level. Or do you mean something else?
>
> If the old level is unchanged, there should be no action. What's the
> case where you see problems?


In the case where irq 10 and 14 are both being set it can happen that
old_interrupt==14 and another call cpu_set_irq(10)  would call
do_interrupt() ?
Blue Swirl Nov. 16, 2009, 9:53 p.m. UTC | #8
On Mon, Nov 16, 2009 at 7:19 PM, Artyom Tarasenko
<atar4qemu@googlemail.com> wrote:
> 2009/11/16 Blue Swirl <blauwirbel@gmail.com>:
>> On Mon, Nov 16, 2009 at 1:47 PM, Artyom Tarasenko
>> <atar4qemu@googlemail.com> wrote:
>>> 2009/11/15 Blue Swirl <blauwirbel@gmail.com>:
>>>> On Sun, Nov 15, 2009 at 1:15 AM, Artyom Tarasenko
>>>> <atar4qemu@googlemail.com> wrote:
>>>>> 2009/11/14 Blue Swirl <blauwirbel@gmail.com>:
>>>>>> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
>>>>>> <atar4qemu@googlemail.com> wrote:
>>>>>>> According to NCR89C105 documentation
>>>>>>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>>>>>>>
>>>>>>> Interrupts are cleared by disabling and then re-enabling them.
>>>>>>> This patch implements the specified behaviour. The most visible effects:
>>>>>>
>>>>>> I think the current version also implements this behaviour. The
>>>>>> difference is that now we clear on disable, with your version, the
>>>>>> interrupts are cleared when re-enabling them.
>>>>>
>>>>> Doesn't this imply that the current version does not implement this
>>>>> ("Interrupts are cleared by disabling and then re-enabling them") behavior? ;-)
>>>>
>>>> The specification only says that the sequence disable-enable clears
>>>> interrupts, but not which of these is true:
>>>> - clearing happens in the moment of disabling (and interrupts after
>>>> that are not cleared, current version)
>>>> - clearing happens  in the moment of re-enabling (your version, sort of)
>>>> - clearing happens in both cases (lose interrupts)
>>>
>>> English is not my native language, but fwiw I think "and then
>>> re-enabling" can only be the second variant. Without "then" it could
>>> be either one or three. And if the first variant is what they really
>>> meant, the part with "and then" is totally redundant and misleading.
>>
>> Still, this is user documentation, not implementation specification.
>> I'm open to both versions, if they work.
>>
>>>> It's also interesting to think what happens between the interrupt
>>>> controller and the devices. Clearing an interrupt at controller level
>>>> does not clear the interrupt condition at the device. Aren't the
>>>> interrupts level triggered on Sparc, so the interrupt is still posted?
>>>> Is the interrupt actually masked by clearing until the level is
>>>> deactivated?
>>>
>>> Looks unlikely to me. In the book "Panic! Unix system crash dump
>>> analysis" the authors write that the first thing interrupt handler has
>>> to do is disable the interrupt, and yes wrting "unix" they mean
>>> "SunOS/Solaris".
>>>
>>> That's also what I observe debugging the Solaris kernel code
>>> (Solaris kernel debugger is a really powerful tool).
>>> Looks like interrupts can be shared between devices, so the general
>>> handler disables the interrupt and then calls multiple device-specific
>>> handlers sequentially and checks if any of then claims the interrupt.
>>> If no one does it writes the message "Spurious interrupt %d\n".
>>>
>>>
>>>> Or maybe the controller latches the interrupt so that even after the
>>>> device releases the interrupt line, interrupt is still active towards
>>>> the CPU. Then the clearing would make sense.
>>>
>>> Looks very realistic to me. I think that's the way the interrupts are
>>> handled at least under x86.
>>
>> It's a must on x86, because the interrupts are edge triggered.
>
> I don't know, how the real sun4m reacts in the case where irq stays
> on, not being cleared.
> It can not be though that it would try to process irq for every next
> tick. The CPU must have some time to clear the pending irq, so it must
> be edge triggered too, at least in a way.

This patch makes the interrupts latch: ignore source clearing the
interrupt. It seems be ~okay for my usual test setup, but does not
help NetBSD 1.3.3. Some other NetBSD tests are changed, but they
crashed before.
Jamie Lokier Nov. 16, 2009, 10:27 p.m. UTC | #9
Artyom Tarasenko wrote:
> 2009/11/15 Blue Swirl <blauwirbel@gmail.com>:
> > On Sun, Nov 15, 2009 at 1:15 AM, Artyom Tarasenko
> > <atar4qemu@googlemail.com> wrote:
> >> 2009/11/14 Blue Swirl <blauwirbel@gmail.com>:
> >>> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
> >>> <atar4qemu@googlemail.com> wrote:
> >>>> According to NCR89C105 documentation
> >>>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
> >>>>
> >>>> Interrupts are cleared by disabling and then re-enabling them.
> >>>> This patch implements the specified behaviour. The most visible effects:
> >>>
> >>> I think the current version also implements this behaviour. The
> >>> difference is that now we clear on disable, with your version, the
> >>> interrupts are cleared when re-enabling them.
> >>
> >> Doesn't this imply that the current version does not implement this
> >> ("Interrupts are cleared by disabling and then re-enabling them") behavior? ;-)
> >
> > The specification only says that the sequence disable-enable clears
> > interrupts, but not which of these is true:
> > - clearing happens in the moment of disabling (and interrupts after
> > that are not cleared, current version)
> > - clearing happens  in the moment of re-enabling (your version, sort of)
> > - clearing happens in both cases (lose interrupts)
> 
> English is not my native language, but fwiw I think "and then
> re-enabling" can only be the second variant. Without "then" it could
> be either one or three. And if the first variant is what they really
> meant, the part with "and then" is totally redundant and misleading.

It could also be a fourth:

- Clearing happens continuously _during_ the time the interrupt is disabled.

Note that neither this, nor the third possibility, have to cause lost
interrupts - that depends on whether the code which enables interrupts
checks for them being pending after enabling them, or checks the
devices generating them.

> > It's also interesting to think what happens between the interrupt
> > controller and the devices. Clearing an interrupt at controller level
> > does not clear the interrupt condition at the device. Aren't the
> > interrupts level triggered on Sparc, so the interrupt is still posted?
> > Is the interrupt actually masked by clearing until the level is
> > deactivated?
> 
> Looks unlikely to me. In the book "Panic! Unix system crash dump
> analysis" the authors write that the first thing interrupt handler has
> to do is disable the interrupt, and yes wrting "unix" they mean
> "SunOS/Solaris".
>
> That's also what I observe debugging the Solaris kernel code
> (Solaris kernel debugger is a really powerful tool).
> Looks like interrupts can be shared between devices, so the general
> handler disables the interrupt and then calls multiple device-specific
> handlers sequentially and checks if any of then claims the interrupt.
> If no one does it writes the message "Spurious interrupt %d\n".

That's consistent with level triggered interrupts, and may require the
interrupt to be cleared at the device before it is re-enabled at the
interrupt controller.

> > Or maybe the controller latches the interrupt so that even after the
> > device releases the interrupt line, interrupt is still active towards
> > the CPU. Then the clearing would make sense.
> 
> Looks very realistic to me.

From what you said above about Solaris, I don't think you can
distinguish between interrupts being asserted only while a device
continues to assert it, and interrupts remaining asserted after the
device stops.

-- Jamie
Jamie Lokier Nov. 16, 2009, 10:31 p.m. UTC | #10
Blue Swirl wrote:
> This patch makes the interrupts latch: ignore source clearing the
> interrupt. It seems be ~okay for my usual test setup, but does not
> help NetBSD 1.3.3. Some other NetBSD tests are changed, but they
> crashed before.

I'd expect the worst that would happen is "spurious interrupt"
messages under some kernel, according to the earlier description of
how Solaris handles interrupts (which is quite usual).

At least you'd expect the behaviour to be fine because no interrupts
are lost.

Drivers which go wrong because their interrupt handler is called
unnecessarily (or at least the "test device has condition pending"
function) tend to be rather fragile, because it's expected when
interrupt lines are shared between devices.

Then again, some old devices never did shared interrupt lines much.

-- Jamie
Jamie Lokier Nov. 16, 2009, 10:35 p.m. UTC | #11
Artyom Tarasenko wrote:
> I don't know, how the real sun4m reacts in the case where irq stays
> on, not being cleared.
> It can not be though that it would try to process irq for every next
> tick. The CPU must have some time to clear the pending irq, so it must
> be edge triggered too, at least in a way.

In general, most CPUs have a "disable interrupts" flag which is set at
the same time as calling the irq handler.

That's enough, even if everything about interrupts is level triggered,
so you can't assume anything is edge triggered just from that alone.

However if a CPU doesn't have an "disable interrupts" flag, then of
course the triggering must be edge triggered somewhere.

-- Jamie
Artyom Tarasenko Nov. 16, 2009, 10:44 p.m. UTC | #12
2009/11/16 Jamie Lokier <jamie@shareable.org>:
> Artyom Tarasenko wrote:
>> I don't know, how the real sun4m reacts in the case where irq stays
>> on, not being cleared.
>> It can not be though that it would try to process irq for every next
>> tick. The CPU must have some time to clear the pending irq, so it must
>> be edge triggered too, at least in a way.
>
> In general, most CPUs have a "disable interrupts" flag which is set at
> the same time as calling the irq handler.

I see. The handler has enough time to clear pending irq then. Maybe
there is really no latch.
Artyom Tarasenko Nov. 16, 2009, 10:50 p.m. UTC | #13
2009/11/16 Blue Swirl <blauwirbel@gmail.com>:
> On Mon, Nov 16, 2009 at 7:19 PM, Artyom Tarasenko
> <atar4qemu@googlemail.com> wrote:
>> 2009/11/16 Blue Swirl <blauwirbel@gmail.com>:
>>> On Mon, Nov 16, 2009 at 1:47 PM, Artyom Tarasenko
>>> <atar4qemu@googlemail.com> wrote:
>>>> 2009/11/15 Blue Swirl <blauwirbel@gmail.com>:
>>>>> On Sun, Nov 15, 2009 at 1:15 AM, Artyom Tarasenko
>>>>> <atar4qemu@googlemail.com> wrote:
>>>>>> 2009/11/14 Blue Swirl <blauwirbel@gmail.com>:
>>>>>>> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
>>>>>>> <atar4qemu@googlemail.com> wrote:
>>>>>>>> According to NCR89C105 documentation
>>>>>>>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>>>>>>>>
>>>>>>>> Interrupts are cleared by disabling and then re-enabling them.
>>>>>>>> This patch implements the specified behaviour. The most visible effects:
>>>>>>>
>>>>>>> I think the current version also implements this behaviour. The
>>>>>>> difference is that now we clear on disable, with your version, the
>>>>>>> interrupts are cleared when re-enabling them.
>>>>>>
>>>>>> Doesn't this imply that the current version does not implement this
>>>>>> ("Interrupts are cleared by disabling and then re-enabling them") behavior? ;-)
>>>>>
>>>>> The specification only says that the sequence disable-enable clears
>>>>> interrupts, but not which of these is true:
>>>>> - clearing happens in the moment of disabling (and interrupts after
>>>>> that are not cleared, current version)
>>>>> - clearing happens  in the moment of re-enabling (your version, sort of)
>>>>> - clearing happens in both cases (lose interrupts)
>>>>
>>>> English is not my native language, but fwiw I think "and then
>>>> re-enabling" can only be the second variant. Without "then" it could
>>>> be either one or three. And if the first variant is what they really
>>>> meant, the part with "and then" is totally redundant and misleading.
>>>
>>> Still, this is user documentation, not implementation specification.
>>> I'm open to both versions, if they work.
>>>
>>>>> It's also interesting to think what happens between the interrupt
>>>>> controller and the devices. Clearing an interrupt at controller level
>>>>> does not clear the interrupt condition at the device. Aren't the
>>>>> interrupts level triggered on Sparc, so the interrupt is still posted?
>>>>> Is the interrupt actually masked by clearing until the level is
>>>>> deactivated?
>>>>
>>>> Looks unlikely to me. In the book "Panic! Unix system crash dump
>>>> analysis" the authors write that the first thing interrupt handler has
>>>> to do is disable the interrupt, and yes wrting "unix" they mean
>>>> "SunOS/Solaris".
>>>>
>>>> That's also what I observe debugging the Solaris kernel code
>>>> (Solaris kernel debugger is a really powerful tool).
>>>> Looks like interrupts can be shared between devices, so the general
>>>> handler disables the interrupt and then calls multiple device-specific
>>>> handlers sequentially and checks if any of then claims the interrupt.
>>>> If no one does it writes the message "Spurious interrupt %d\n".
>>>>
>>>>
>>>>> Or maybe the controller latches the interrupt so that even after the
>>>>> device releases the interrupt line, interrupt is still active towards
>>>>> the CPU. Then the clearing would make sense.
>>>>
>>>> Looks very realistic to me. I think that's the way the interrupts are
>>>> handled at least under x86.
>>>
>>> It's a must on x86, because the interrupts are edge triggered.
>>
>> I don't know, how the real sun4m reacts in the case where irq stays
>> on, not being cleared.
>> It can not be though that it would try to process irq for every next
>> tick. The CPU must have some time to clear the pending irq, so it must
>> be edge triggered too, at least in a way.
>
> This patch makes the interrupts latch: ignore source clearing the
> interrupt. It seems be ~okay for my usual test setup, but does not
> help NetBSD 1.3.3. Some other NetBSD tests are changed, but they
> crashed before.

Makes no difference in my tests. Except that my broken patch really
clears too many interrupts when combined with it. Will play with it
further.
Blue Swirl Nov. 17, 2009, 8:45 p.m. UTC | #14
On Tue, Nov 17, 2009 at 12:35 AM, Jamie Lokier <jamie@shareable.org> wrote:
> Artyom Tarasenko wrote:
>> I don't know, how the real sun4m reacts in the case where irq stays
>> on, not being cleared.
>> It can not be though that it would try to process irq for every next
>> tick. The CPU must have some time to clear the pending irq, so it must
>> be edge triggered too, at least in a way.
>
> In general, most CPUs have a "disable interrupts" flag which is set at
> the same time as calling the irq handler.
>
> That's enough, even if everything about interrupts is level triggered,
> so you can't assume anything is edge triggered just from that alone.
>
> However if a CPU doesn't have an "disable interrupts" flag, then of
> course the triggering must be edge triggered somewhere.

Sparc has a flag for disabling all traps. Also the minimum allowed
interrupt level (PIL) can be selected from 0 to 14, level 15 is
non-maskable from CPU point of view. This level allows interrupt
handlers to be interrupted by a higher priority interrupt, if the
traps are enabled.
Artyom Tarasenko Jan. 15, 2010, 10:37 p.m. UTC | #15
after running some OBP/forth tests on a real SS-20 I must say that
most of our (especially my) speculations were wrong, as well as what
is written in
http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt :

1. SS-20 may loose interrupts. At least if a timer interrupt was lowered while
still being masked, it is not triggered when enabled.

2. Neither masking, nor unmasking interrupt clears pending bits in the
status register. NCR89C105.txt claims it should, but
Sun4M_SystemArchitecture_edited2.pdf doesn't. I guess the later one is
more reliable.
diff mbox

Patch

diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
index 9680392..779c661 100644
--- a/hw/slavio_intctl.c
+++ b/hw/slavio_intctl.c
@@ -177,19 +177,19 @@  static void slavio_intctlm_mem_writel(void
*opaque, target_phys_addr_t addr,
     saddr = addr >> 2;
     DPRINTF("write system reg 0x" TARGET_FMT_plx " = %x\n", addr, val);
     switch (saddr) {
-    case 2: // clear (enable)
+    case 2: // clear (enable, clear formerly disabled pending)
         // Force clear unused bits
         val &= MASTER_IRQ_MASK;
+        s->intregm_pending &= (s->intregm_disabled & val);
         s->intregm_disabled &= ~val;
         DPRINTF("Enabled master irq mask %x, curmask %x\n", val,
                 s->intregm_disabled);
         slavio_check_interrupts(s, 1);
         break;
-    case 3: // set (disable, clear pending)
+    case 3: // set (disable, do not clear pending)
         // Force clear unused bits
         val &= MASTER_IRQ_MASK;
         s->intregm_disabled |= val;
-        s->intregm_pending &= ~val;
         slavio_check_interrupts(s, 1);
         DPRINTF("Disabled master irq mask %x, curmask %x\n", val,