diff mbox

[v3,0/2] 40p: fix PCI interrupt routing

Message ID 20180910204631.24106-1-mark.cave-ayland@ilande.co.uk
State New
Headers show

Commit Message

Mark Cave-Ayland Sept. 10, 2018, 8:46 p.m. UTC
According to the PReP specification section 6.1.6 "System Interrupt
Assignments", all PCI interrupts are routed via IRQ 15.

In the case of the 40p machine this isn't quite true in that it has a routing
quirk: the LSI SCSI device is always routed to IRQ 13. At least Linux and
NetBSD compare the model name presented by the firmware to "IBM PPS Model
6015", and if it matches will active this quirk.

In order for guest OSs to make use of the fixed IRQ routing, the model name
in the residual data must be changed in OpenBIOS using the diff below:


With the above OpenBIOS patch applied as well as this patchset, it is now
possible to boot the sandalfoot zImage all the way through to a working
userspace when using OpenBIOS.

(Note: this patchset requires the changes in my previous patchset "scsi:
replace lsi53c895a_create() and lsi53c810_create() functions)

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Based-on: <20180907125653.5010-1-mark.cave-ayland@ilande.co.uk>

v3:
- Add external IRQ to LSI SCSI device instead of hacking the PCI interrupt
  routing as suggested by David
- Rebase onto the patches from v2 already applied to ppc-for-3.1

v2:
- Add OR gate as recommended by Zoltan and implement routing quirk by hacking
  the raven PCI interrupt routing


Mark Cave-Ayland (2):
  lsi53c895a: add optional external IRQ via qdev
  40p: add fixed IRQ routing for LSI SCSI device

 hw/ppc/prep.c        | 11 ++++++-----
 hw/scsi/lsi53c895a.c | 16 ++++++++++++++--
 2 files changed, 20 insertions(+), 7 deletions(-)

Comments

Hervé Poussineau Sept. 12, 2018, 7:23 p.m. UTC | #1
Le 10/09/2018 à 22:46, Mark Cave-Ayland a écrit :
> According to the PReP specification section 6.1.6 "System Interrupt
> Assignments", all PCI interrupts are routed via IRQ 15.
> 
> In the case of the 40p machine this isn't quite true in that it has a routing
> quirk: the LSI SCSI device is always routed to IRQ 13. At least Linux and
> NetBSD compare the model name presented by the firmware to "IBM PPS Model
> 6015", and if it matches will active this quirk.
> 
> In order for guest OSs to make use of the fixed IRQ routing, the model name
> in the residual data must be changed in OpenBIOS using the diff below:
> 
> diff --git a/arch/ppc/qemu/context.c b/arch/ppc/qemu/context.c
> index 06e0122..5815895 100644
> --- a/arch/ppc/qemu/context.c
> +++ b/arch/ppc/qemu/context.c
> @@ -111,7 +111,7 @@ static void *
>   residual_build(uint32_t memsize, uint32_t load_base, uint32_t load_size)
>   {
>       residual_t *res;
> -    const unsigned char model[] = "Qemu\0PPC\0";
> +    const unsigned char model[] = "IBM PPS Model 6015\0";
>       int i;
>   
>       res = malloc(sizeof(residual_t));
> 
> With the above OpenBIOS patch applied as well as this patchset, it is now
> possible to boot the sandalfoot zImage all the way through to a working
> userspace when using OpenBIOS.
> 
> (Note: this patchset requires the changes in my previous patchset "scsi:
> replace lsi53c895a_create() and lsi53c810_create() functions)
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Based-on: <20180907125653.5010-1-mark.cave-ayland@ilande.co.uk>
> 
> v3:
> - Add external IRQ to LSI SCSI device instead of hacking the PCI interrupt
>    routing as suggested by David
> - Rebase onto the patches from v2 already applied to ppc-for-3.1
> 
> v2:
> - Add OR gate as recommended by Zoltan and implement routing quirk by hacking
>    the raven PCI interrupt routing
> 
> 
> Mark Cave-Ayland (2):
>    lsi53c895a: add optional external IRQ via qdev
>    40p: add fixed IRQ routing for LSI SCSI device
> 
>   hw/ppc/prep.c        | 11 ++++++-----
>   hw/scsi/lsi53c895a.c | 16 ++++++++++++++--
>   2 files changed, 20 insertions(+), 7 deletions(-)
> 

Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Hervé Poussineau <hpoussin@reactos.org>
David Gibson Sept. 17, 2018, 3:54 a.m. UTC | #2
On Mon, Sep 10, 2018 at 09:46:29PM +0100, Mark Cave-Ayland wrote:
> According to the PReP specification section 6.1.6 "System Interrupt
> Assignments", all PCI interrupts are routed via IRQ 15.
> 
> In the case of the 40p machine this isn't quite true in that it has a routing
> quirk: the LSI SCSI device is always routed to IRQ 13. At least Linux and
> NetBSD compare the model name presented by the firmware to "IBM PPS Model
> 6015", and if it matches will active this quirk.
> 
> In order for guest OSs to make use of the fixed IRQ routing, the model name
> in the residual data must be changed in OpenBIOS using the diff below:
> 
> diff --git a/arch/ppc/qemu/context.c b/arch/ppc/qemu/context.c
> index 06e0122..5815895 100644
> --- a/arch/ppc/qemu/context.c
> +++ b/arch/ppc/qemu/context.c
> @@ -111,7 +111,7 @@ static void *
>  residual_build(uint32_t memsize, uint32_t load_base, uint32_t load_size)
>  {
>      residual_t *res;
> -    const unsigned char model[] = "Qemu\0PPC\0";
> +    const unsigned char model[] = "IBM PPS Model 6015\0";
>      int i;
>  
>      res = malloc(sizeof(residual_t));
> 
> With the above OpenBIOS patch applied as well as this patchset, it is now
> possible to boot the sandalfoot zImage all the way through to a working
> userspace when using OpenBIOS.
> 
> (Note: this patchset requires the changes in my previous patchset "scsi:
> replace lsi53c895a_create() and lsi53c810_create() functions)
> 
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Based-on: <20180907125653.5010-1-mark.cave-ayland@ilande.co.uk>

Mark,

I think we have all the necessary acks to go ahead with this.
However, I'm afraid I've lost track of the various prereq patches that
were necessary here.  Can you resend with all the necessary pieces
rebased against ppc-for-3.1 and the appropriate acked-bys included?

> 
> v3:
> - Add external IRQ to LSI SCSI device instead of hacking the PCI interrupt
>   routing as suggested by David
> - Rebase onto the patches from v2 already applied to ppc-for-3.1
> 
> v2:
> - Add OR gate as recommended by Zoltan and implement routing quirk by hacking
>   the raven PCI interrupt routing
> 
> 
> Mark Cave-Ayland (2):
>   lsi53c895a: add optional external IRQ via qdev
>   40p: add fixed IRQ routing for LSI SCSI device
> 
>  hw/ppc/prep.c        | 11 ++++++-----
>  hw/scsi/lsi53c895a.c | 16 ++++++++++++++--
>  2 files changed, 20 insertions(+), 7 deletions(-)
>
Mark Cave-Ayland Sept. 17, 2018, 4:29 a.m. UTC | #3
On 17/09/18 04:54, David Gibson wrote:

> On Mon, Sep 10, 2018 at 09:46:29PM +0100, Mark Cave-Ayland wrote:
>> According to the PReP specification section 6.1.6 "System Interrupt
>> Assignments", all PCI interrupts are routed via IRQ 15.
>>
>> In the case of the 40p machine this isn't quite true in that it has a routing
>> quirk: the LSI SCSI device is always routed to IRQ 13. At least Linux and
>> NetBSD compare the model name presented by the firmware to "IBM PPS Model
>> 6015", and if it matches will active this quirk.
>>
>> In order for guest OSs to make use of the fixed IRQ routing, the model name
>> in the residual data must be changed in OpenBIOS using the diff below:
>>
>> diff --git a/arch/ppc/qemu/context.c b/arch/ppc/qemu/context.c
>> index 06e0122..5815895 100644
>> --- a/arch/ppc/qemu/context.c
>> +++ b/arch/ppc/qemu/context.c
>> @@ -111,7 +111,7 @@ static void *
>>  residual_build(uint32_t memsize, uint32_t load_base, uint32_t load_size)
>>  {
>>      residual_t *res;
>> -    const unsigned char model[] = "Qemu\0PPC\0";
>> +    const unsigned char model[] = "IBM PPS Model 6015\0";
>>      int i;
>>  
>>      res = malloc(sizeof(residual_t));
>>
>> With the above OpenBIOS patch applied as well as this patchset, it is now
>> possible to boot the sandalfoot zImage all the way through to a working
>> userspace when using OpenBIOS.
>>
>> (Note: this patchset requires the changes in my previous patchset "scsi:
>> replace lsi53c895a_create() and lsi53c810_create() functions)
>>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> Based-on: <20180907125653.5010-1-mark.cave-ayland@ilande.co.uk>
> 
> Mark,
> 
> I think we have all the necessary acks to go ahead with this.
> However, I'm afraid I've lost track of the various prereq patches that
> were necessary here.  Can you resend with all the necessary pieces
> rebased against ppc-for-3.1 and the appropriate acked-bys included?

Sure, no problem. I'll resend it this evening as a new 40p PCI routing
"roll-up" patchset.


ATB,

Mark.
Mark Cave-Ayland Sept. 18, 2018, 9:12 p.m. UTC | #4
On 17/09/2018 04:54, David Gibson wrote:

> Mark,
> 
> I think we have all the necessary acks to go ahead with this.
> However, I'm afraid I've lost track of the various prereq patches that
> were necessary here.  Can you resend with all the necessary pieces
> rebased against ppc-for-3.1 and the appropriate acked-bys included?

Apologies for the delay on this one, but once I rebased onto your ppc
for-3.1 branch I noticed a regression whereby the 40p sandalfoot image
would fail to boot to userspace - it would hang towards the end of the
boot with:

Bad inittab entry:
Bad inittab entry:
No more tasks for init -- sleeping forever

In the end I bisected it down to this patch:

commit d12a22c5c7ccb6cb7e8438b7e6e1011c6b712ae2
Author: Roman Kapl <rka@sysgo.com>
Date:   Tue Sep 11 13:34:51 2018 +0200

    target/ppc: add external PID support

Playing with this a bit more, it appears that the bug is only triggered
if I configure QEMU with --enable-debug which is my development default.

Roman, can you reproduce this locally at all? My setup is nothing
special, just Debian Stretch on amd64.


ATB,

Mark.
Mark Cave-Ayland Sept. 19, 2018, 6:57 a.m. UTC | #5
On 18/09/2018 22:12, Mark Cave-Ayland wrote:

> Roman, can you reproduce this locally at all? My setup is nothing
> special, just Debian Stretch on amd64.

I've just realised that because this bug is still dependent upon queued patches, it
would make sense for me to push a slightly modified version of David's ppc-for-3.1
branch to github to help reproduce the issue. Below are the instructions on how to
reproduce the bug locally:


1) Grab the 40p test image from http://www.juneau-lug.org/zImage.initrd.sandalfoot

2) Fetch my slightly-modified ppc-for-3.1 branch from
https://github.com/mcayland/qemu/tree/ppc-for-3.1-regression

3) Build QEMU with --enable-debug:

./configure --target-list=ppc-softmmu --enable-debug
make

4) Boot the 40p test image:

./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p

You will see that the test image fails at the end of boot with messages like this:

Bad inittab entry:
Bad inittab entry:
No more tasks for init -- sleeping forever

5) Now remove the "target/ppc: add external PID support" patch from the above branch:

git checkout HEAD~1
make

6) Boot the 40p test image again:

./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p

You will see that the test image now boots successfully to the "Please press Enter to
activate this console" message.


ATB,

Mark.
Roman Kapl Sept. 19, 2018, 7:34 a.m. UTC | #6
Hi,

On 09/19/2018 08:57 AM, Mark Cave-Ayland wrote:
> On 18/09/2018 22:12, Mark Cave-Ayland wrote:
> 
>> Roman, can you reproduce this locally at all? My setup is nothing
>> special, just Debian Stretch on amd64.
> 
> I've just realised that because this bug is still dependent upon queued patches, it
> would make sense for me to push a slightly modified version of David's ppc-for-3.1
> branch to github to help reproduce the issue. Below are the instructions on how to
> reproduce the bug locally:
> 
> 
> 1) Grab the 40p test image from http://www.juneau-lug.org/zImage.initrd.sandalfoot
> 
> 2) Fetch my slightly-modified ppc-for-3.1 branch from
> https://github.com/mcayland/qemu/tree/ppc-for-3.1-regression
> 
> 3) Build QEMU with --enable-debug:
> 
> ./configure --target-list=ppc-softmmu --enable-debug
> make
> 
> 4) Boot the 40p test image:
> 
> ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p
> 
> You will see that the test image fails at the end of boot with messages like this:
> 
> Bad inittab entry:
> Bad inittab entry:
> No more tasks for init -- sleeping forever
> 
> 5) Now remove the "target/ppc: add external PID support" patch from the above branch:
> 
> git checkout HEAD~1
> make
> 
> 6) Boot the 40p test image again:
> 
> ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p
> 
> You will see that the test image now boots successfully to the "Please press Enter to
> activate this console" message.
> 

Hm... that indeed looks bad, I will have a look at it.

Thanks for notifying me, Roman
Roman Kapl Sept. 19, 2018, 2:47 p.m. UTC | #7
Hi,

On 09/19/2018 08:57 AM, Mark Cave-Ayland wrote:
> On 18/09/2018 22:12, Mark Cave-Ayland wrote:
> 
>> Roman, can you reproduce this locally at all? My setup is nothing
>> special, just Debian Stretch on amd64.

Ok, so I am able to reproduce it with your image and --enable-debug, but 
I was not able to find the root cause, just narrow it.

It seems that the `dcbz` instruction is not emulated correctly (which 
may lead to some garbage in inittab?). However, if I manualy inline the 
`helper_dcbz_common` code into `helper dcbz`, it starts to work. I just 
literally copy it and add `int mmu_idx = env->dmmu_idx;` at the 
beginning. That could be related to the `--disable-debug` flag, since 
the compiler will inline the code when optimizations are enabled.

If you have any debugging ideas, they would be welcome. Maybe there are 
some helper call limitations? Anyway, I will look at it again tomorrow.

Apart from that, I've found some problems in my EPID patch, I will send 
a fix shortly. Namely the instructions are enabled outside of Booke206, 
one unrelated instruction had its opcode chcnaged by mistake and the 
slow path for dbczep is wrong. However, these problems are not related 
to the problem with your image.

Thanks, Roman Kapl

> 
> I've just realised that because this bug is still dependent upon queued patches, it
> would make sense for me to push a slightly modified version of David's ppc-for-3.1
> branch to github to help reproduce the issue. Below are the instructions on how to
> reproduce the bug locally:
> 
> 
> 1) Grab the 40p test image from http://www.juneau-lug.org/zImage.initrd.sandalfoot
> 
> 2) Fetch my slightly-modified ppc-for-3.1 branch from
> https://github.com/mcayland/qemu/tree/ppc-for-3.1-regression
> 
> 3) Build QEMU with --enable-debug:
> 
> ./configure --target-list=ppc-softmmu --enable-debug
> make
> 
> 4) Boot the 40p test image:
> 
> ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p
> 
> You will see that the test image fails at the end of boot with messages like this:
> 
> Bad inittab entry:
> Bad inittab entry:
> No more tasks for init -- sleeping forever
> 
> 5) Now remove the "target/ppc: add external PID support" patch from the above branch:
> 
> git checkout HEAD~1
> make
> 
> 6) Boot the 40p test image again:
> 
> ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p
> 
> You will see that the test image now boots successfully to the "Please press Enter to
> activate this console" message.
> 
> 
> ATB,
> 
> Mark.
>
Peter Maydell Sept. 19, 2018, 3:33 p.m. UTC | #8
On 19 September 2018 at 07:47, Roman Kapl <roman.kapl@sysgo.com> wrote:
> It seems that the `dcbz` instruction is not emulated correctly (which may
> lead to some garbage in inittab?). However, if I manualy inline the
> `helper_dcbz_common` code into `helper dcbz`, it starts to work.

This is because helper_dcbz_common() uses GETPC() to get the
return address inside generated code which will be used when
an exception occurs. This only works from a function called
directly from generated code. If you want to abstract out
into a second function, then you need:
 * the second function to take a retaddr argument, which it
   can then pass to cpu_stq_data_ra()
 * the top level helpers called from TCG to pass GETPC() as
   that retaddr parameter

Incidentally, calling your secondary helper function
"helper_dcbz_common" is not ideal -- the "helper_" prefix
is generally used to indicate functions which are directly
called from TCG generated code as helper functions (which
does matter for some purposes, like this one).

thanks
-- PMM
Mark Cave-Ayland Sept. 19, 2018, 4:54 p.m. UTC | #9
On 19/09/2018 15:47, Roman Kapl wrote:

>>> Roman, can you reproduce this locally at all? My setup is nothing
>>> special, just Debian Stretch on amd64.
> 
> Ok, so I am able to reproduce it with your image and --enable-debug, but I was not
> able to find the root cause, just narrow it.
> 
> It seems that the `dcbz` instruction is not emulated correctly (which may lead to
> some garbage in inittab?). However, if I manualy inline the `helper_dcbz_common` code
> into `helper dcbz`, it starts to work. I just literally copy it and add `int mmu_idx
> = env->dmmu_idx;` at the beginning. That could be related to the `--disable-debug`
> flag, since the compiler will inline the code when optimizations are enabled.
> 
> If you have any debugging ideas, they would be welcome. Maybe there are some helper
> call limitations? Anyway, I will look at it again tomorrow.

Great, glad you managed to reproduce the issue. I'm not sure why the above would make
any difference at all, but I quickly tried your inline test above and that also
caused it to start to work for me? I'm still new at TCG so I'm not exactly sure what
to suggest right now - maybe someone else will have a better idea?

> Apart from that, I've found some problems in my EPID patch, I will send a fix
> shortly. Namely the instructions are enabled outside of Booke206, one unrelated
> instruction had its opcode chcnaged by mistake and the slow path for dbczep is wrong.
> However, these problems are not related to the problem with your image.

No worries.


ATB,

Mark.
Mark Cave-Ayland Sept. 19, 2018, 5:22 p.m. UTC | #10
On 19/09/2018 16:33, Peter Maydell wrote:

> On 19 September 2018 at 07:47, Roman Kapl <roman.kapl@sysgo.com> wrote:
>> It seems that the `dcbz` instruction is not emulated correctly (which may
>> lead to some garbage in inittab?). However, if I manualy inline the
>> `helper_dcbz_common` code into `helper dcbz`, it starts to work.
> 
> This is because helper_dcbz_common() uses GETPC() to get the
> return address inside generated code which will be used when
> an exception occurs. This only works from a function called
> directly from generated code. If you want to abstract out
> into a second function, then you need:
>  * the second function to take a retaddr argument, which it
>    can then pass to cpu_stq_data_ra()
>  * the top level helpers called from TCG to pass GETPC() as
>    that retaddr parameter
> 
> Incidentally, calling your secondary helper function
> "helper_dcbz_common" is not ideal -- the "helper_" prefix
> is generally used to indicate functions which are directly
> called from TCG generated code as helper functions (which
> does matter for some purposes, like this one).

Brilliant! I had no idea about the restrictions on GETPC(). Thanks so much for
helping here, Peter.


ATB,

Mark.
David Gibson Sept. 20, 2018, 1:35 a.m. UTC | #11
On Wed, Sep 19, 2018 at 04:47:36PM +0200, Roman Kapl wrote:
> Hi,
> 
> On 09/19/2018 08:57 AM, Mark Cave-Ayland wrote:
> > On 18/09/2018 22:12, Mark Cave-Ayland wrote:
> > 
> > > Roman, can you reproduce this locally at all? My setup is nothing
> > > special, just Debian Stretch on amd64.
> 
> Ok, so I am able to reproduce it with your image and --enable-debug, but I
> was not able to find the root cause, just narrow it.
> 
> It seems that the `dcbz` instruction is not emulated correctly (which may
> lead to some garbage in inittab?). However, if I manualy inline the
> `helper_dcbz_common` code into `helper dcbz`, it starts to work. I just
> literally copy it and add `int mmu_idx = env->dmmu_idx;` at the beginning.
> That could be related to the `--disable-debug` flag, since the compiler will
> inline the code when optimizations are enabled.

Ouch.  That almost sounds like a compiler bug - have you tried with
some different compiler versions?

> If you have any debugging ideas, they would be welcome. Maybe there are some
> helper call limitations? Anyway, I will look at it again tomorrow.
> 
> Apart from that, I've found some problems in my EPID patch, I will send a
> fix shortly. Namely the instructions are enabled outside of Booke206, one
> unrelated instruction had its opcode chcnaged by mistake and the slow path
> for dbczep is wrong. However, these problems are not related to the problem
> with your image.
> 
> Thanks, Roman Kapl
> 
> > 
> > I've just realised that because this bug is still dependent upon queued patches, it
> > would make sense for me to push a slightly modified version of David's ppc-for-3.1
> > branch to github to help reproduce the issue. Below are the instructions on how to
> > reproduce the bug locally:
> > 
> > 
> > 1) Grab the 40p test image from http://www.juneau-lug.org/zImage.initrd.sandalfoot
> > 
> > 2) Fetch my slightly-modified ppc-for-3.1 branch from
> > https://github.com/mcayland/qemu/tree/ppc-for-3.1-regression
> > 
> > 3) Build QEMU with --enable-debug:
> > 
> > ./configure --target-list=ppc-softmmu --enable-debug
> > make
> > 
> > 4) Boot the 40p test image:
> > 
> > ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p
> > 
> > You will see that the test image fails at the end of boot with messages like this:
> > 
> > Bad inittab entry:
> > Bad inittab entry:
> > No more tasks for init -- sleeping forever
> > 
> > 5) Now remove the "target/ppc: add external PID support" patch from the above branch:
> > 
> > git checkout HEAD~1
> > make
> > 
> > 6) Boot the 40p test image again:
> > 
> > ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p
> > 
> > You will see that the test image now boots successfully to the "Please press Enter to
> > activate this console" message.
> > 
> > 
> > ATB,
> > 
> > Mark.
> > 
>
diff mbox

Patch

diff --git a/arch/ppc/qemu/context.c b/arch/ppc/qemu/context.c
index 06e0122..5815895 100644
--- a/arch/ppc/qemu/context.c
+++ b/arch/ppc/qemu/context.c
@@ -111,7 +111,7 @@  static void *
 residual_build(uint32_t memsize, uint32_t load_base, uint32_t load_size)
 {
     residual_t *res;
-    const unsigned char model[] = "Qemu\0PPC\0";
+    const unsigned char model[] = "IBM PPS Model 6015\0";
     int i;
 
     res = malloc(sizeof(residual_t));