Message ID | 20180910204631.24106-1-mark.cave-ayland@ilande.co.uk |
---|---|
State | New |
Headers | show |
Le 10/09/2018 à 22:46, Mark Cave-Ayland a écrit : > According to the PReP specification section 6.1.6 "System Interrupt > Assignments", all PCI interrupts are routed via IRQ 15. > > In the case of the 40p machine this isn't quite true in that it has a routing > quirk: the LSI SCSI device is always routed to IRQ 13. At least Linux and > NetBSD compare the model name presented by the firmware to "IBM PPS Model > 6015", and if it matches will active this quirk. > > In order for guest OSs to make use of the fixed IRQ routing, the model name > in the residual data must be changed in OpenBIOS using the diff below: > > diff --git a/arch/ppc/qemu/context.c b/arch/ppc/qemu/context.c > index 06e0122..5815895 100644 > --- a/arch/ppc/qemu/context.c > +++ b/arch/ppc/qemu/context.c > @@ -111,7 +111,7 @@ static void * > residual_build(uint32_t memsize, uint32_t load_base, uint32_t load_size) > { > residual_t *res; > - const unsigned char model[] = "Qemu\0PPC\0"; > + const unsigned char model[] = "IBM PPS Model 6015\0"; > int i; > > res = malloc(sizeof(residual_t)); > > With the above OpenBIOS patch applied as well as this patchset, it is now > possible to boot the sandalfoot zImage all the way through to a working > userspace when using OpenBIOS. > > (Note: this patchset requires the changes in my previous patchset "scsi: > replace lsi53c895a_create() and lsi53c810_create() functions) > > Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> > Based-on: <20180907125653.5010-1-mark.cave-ayland@ilande.co.uk> > > v3: > - Add external IRQ to LSI SCSI device instead of hacking the PCI interrupt > routing as suggested by David > - Rebase onto the patches from v2 already applied to ppc-for-3.1 > > v2: > - Add OR gate as recommended by Zoltan and implement routing quirk by hacking > the raven PCI interrupt routing > > > Mark Cave-Ayland (2): > lsi53c895a: add optional external IRQ via qdev > 40p: add fixed IRQ routing for LSI SCSI device > > hw/ppc/prep.c | 11 ++++++----- > hw/scsi/lsi53c895a.c | 16 ++++++++++++++-- > 2 files changed, 20 insertions(+), 7 deletions(-) > Reviewed-by: Hervé Poussineau <hpoussin@reactos.org> Tested-by: Hervé Poussineau <hpoussin@reactos.org>
On Mon, Sep 10, 2018 at 09:46:29PM +0100, Mark Cave-Ayland wrote: > According to the PReP specification section 6.1.6 "System Interrupt > Assignments", all PCI interrupts are routed via IRQ 15. > > In the case of the 40p machine this isn't quite true in that it has a routing > quirk: the LSI SCSI device is always routed to IRQ 13. At least Linux and > NetBSD compare the model name presented by the firmware to "IBM PPS Model > 6015", and if it matches will active this quirk. > > In order for guest OSs to make use of the fixed IRQ routing, the model name > in the residual data must be changed in OpenBIOS using the diff below: > > diff --git a/arch/ppc/qemu/context.c b/arch/ppc/qemu/context.c > index 06e0122..5815895 100644 > --- a/arch/ppc/qemu/context.c > +++ b/arch/ppc/qemu/context.c > @@ -111,7 +111,7 @@ static void * > residual_build(uint32_t memsize, uint32_t load_base, uint32_t load_size) > { > residual_t *res; > - const unsigned char model[] = "Qemu\0PPC\0"; > + const unsigned char model[] = "IBM PPS Model 6015\0"; > int i; > > res = malloc(sizeof(residual_t)); > > With the above OpenBIOS patch applied as well as this patchset, it is now > possible to boot the sandalfoot zImage all the way through to a working > userspace when using OpenBIOS. > > (Note: this patchset requires the changes in my previous patchset "scsi: > replace lsi53c895a_create() and lsi53c810_create() functions) > > Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> > Based-on: <20180907125653.5010-1-mark.cave-ayland@ilande.co.uk> Mark, I think we have all the necessary acks to go ahead with this. However, I'm afraid I've lost track of the various prereq patches that were necessary here. Can you resend with all the necessary pieces rebased against ppc-for-3.1 and the appropriate acked-bys included? > > v3: > - Add external IRQ to LSI SCSI device instead of hacking the PCI interrupt > routing as suggested by David > - Rebase onto the patches from v2 already applied to ppc-for-3.1 > > v2: > - Add OR gate as recommended by Zoltan and implement routing quirk by hacking > the raven PCI interrupt routing > > > Mark Cave-Ayland (2): > lsi53c895a: add optional external IRQ via qdev > 40p: add fixed IRQ routing for LSI SCSI device > > hw/ppc/prep.c | 11 ++++++----- > hw/scsi/lsi53c895a.c | 16 ++++++++++++++-- > 2 files changed, 20 insertions(+), 7 deletions(-) >
On 17/09/18 04:54, David Gibson wrote: > On Mon, Sep 10, 2018 at 09:46:29PM +0100, Mark Cave-Ayland wrote: >> According to the PReP specification section 6.1.6 "System Interrupt >> Assignments", all PCI interrupts are routed via IRQ 15. >> >> In the case of the 40p machine this isn't quite true in that it has a routing >> quirk: the LSI SCSI device is always routed to IRQ 13. At least Linux and >> NetBSD compare the model name presented by the firmware to "IBM PPS Model >> 6015", and if it matches will active this quirk. >> >> In order for guest OSs to make use of the fixed IRQ routing, the model name >> in the residual data must be changed in OpenBIOS using the diff below: >> >> diff --git a/arch/ppc/qemu/context.c b/arch/ppc/qemu/context.c >> index 06e0122..5815895 100644 >> --- a/arch/ppc/qemu/context.c >> +++ b/arch/ppc/qemu/context.c >> @@ -111,7 +111,7 @@ static void * >> residual_build(uint32_t memsize, uint32_t load_base, uint32_t load_size) >> { >> residual_t *res; >> - const unsigned char model[] = "Qemu\0PPC\0"; >> + const unsigned char model[] = "IBM PPS Model 6015\0"; >> int i; >> >> res = malloc(sizeof(residual_t)); >> >> With the above OpenBIOS patch applied as well as this patchset, it is now >> possible to boot the sandalfoot zImage all the way through to a working >> userspace when using OpenBIOS. >> >> (Note: this patchset requires the changes in my previous patchset "scsi: >> replace lsi53c895a_create() and lsi53c810_create() functions) >> >> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> >> Based-on: <20180907125653.5010-1-mark.cave-ayland@ilande.co.uk> > > Mark, > > I think we have all the necessary acks to go ahead with this. > However, I'm afraid I've lost track of the various prereq patches that > were necessary here. Can you resend with all the necessary pieces > rebased against ppc-for-3.1 and the appropriate acked-bys included? Sure, no problem. I'll resend it this evening as a new 40p PCI routing "roll-up" patchset. ATB, Mark.
On 17/09/2018 04:54, David Gibson wrote: > Mark, > > I think we have all the necessary acks to go ahead with this. > However, I'm afraid I've lost track of the various prereq patches that > were necessary here. Can you resend with all the necessary pieces > rebased against ppc-for-3.1 and the appropriate acked-bys included? Apologies for the delay on this one, but once I rebased onto your ppc for-3.1 branch I noticed a regression whereby the 40p sandalfoot image would fail to boot to userspace - it would hang towards the end of the boot with: Bad inittab entry: Bad inittab entry: No more tasks for init -- sleeping forever In the end I bisected it down to this patch: commit d12a22c5c7ccb6cb7e8438b7e6e1011c6b712ae2 Author: Roman Kapl <rka@sysgo.com> Date: Tue Sep 11 13:34:51 2018 +0200 target/ppc: add external PID support Playing with this a bit more, it appears that the bug is only triggered if I configure QEMU with --enable-debug which is my development default. Roman, can you reproduce this locally at all? My setup is nothing special, just Debian Stretch on amd64. ATB, Mark.
On 18/09/2018 22:12, Mark Cave-Ayland wrote: > Roman, can you reproduce this locally at all? My setup is nothing > special, just Debian Stretch on amd64. I've just realised that because this bug is still dependent upon queued patches, it would make sense for me to push a slightly modified version of David's ppc-for-3.1 branch to github to help reproduce the issue. Below are the instructions on how to reproduce the bug locally: 1) Grab the 40p test image from http://www.juneau-lug.org/zImage.initrd.sandalfoot 2) Fetch my slightly-modified ppc-for-3.1 branch from https://github.com/mcayland/qemu/tree/ppc-for-3.1-regression 3) Build QEMU with --enable-debug: ./configure --target-list=ppc-softmmu --enable-debug make 4) Boot the 40p test image: ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p You will see that the test image fails at the end of boot with messages like this: Bad inittab entry: Bad inittab entry: No more tasks for init -- sleeping forever 5) Now remove the "target/ppc: add external PID support" patch from the above branch: git checkout HEAD~1 make 6) Boot the 40p test image again: ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p You will see that the test image now boots successfully to the "Please press Enter to activate this console" message. ATB, Mark.
Hi, On 09/19/2018 08:57 AM, Mark Cave-Ayland wrote: > On 18/09/2018 22:12, Mark Cave-Ayland wrote: > >> Roman, can you reproduce this locally at all? My setup is nothing >> special, just Debian Stretch on amd64. > > I've just realised that because this bug is still dependent upon queued patches, it > would make sense for me to push a slightly modified version of David's ppc-for-3.1 > branch to github to help reproduce the issue. Below are the instructions on how to > reproduce the bug locally: > > > 1) Grab the 40p test image from http://www.juneau-lug.org/zImage.initrd.sandalfoot > > 2) Fetch my slightly-modified ppc-for-3.1 branch from > https://github.com/mcayland/qemu/tree/ppc-for-3.1-regression > > 3) Build QEMU with --enable-debug: > > ./configure --target-list=ppc-softmmu --enable-debug > make > > 4) Boot the 40p test image: > > ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p > > You will see that the test image fails at the end of boot with messages like this: > > Bad inittab entry: > Bad inittab entry: > No more tasks for init -- sleeping forever > > 5) Now remove the "target/ppc: add external PID support" patch from the above branch: > > git checkout HEAD~1 > make > > 6) Boot the 40p test image again: > > ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p > > You will see that the test image now boots successfully to the "Please press Enter to > activate this console" message. > Hm... that indeed looks bad, I will have a look at it. Thanks for notifying me, Roman
Hi, On 09/19/2018 08:57 AM, Mark Cave-Ayland wrote: > On 18/09/2018 22:12, Mark Cave-Ayland wrote: > >> Roman, can you reproduce this locally at all? My setup is nothing >> special, just Debian Stretch on amd64. Ok, so I am able to reproduce it with your image and --enable-debug, but I was not able to find the root cause, just narrow it. It seems that the `dcbz` instruction is not emulated correctly (which may lead to some garbage in inittab?). However, if I manualy inline the `helper_dcbz_common` code into `helper dcbz`, it starts to work. I just literally copy it and add `int mmu_idx = env->dmmu_idx;` at the beginning. That could be related to the `--disable-debug` flag, since the compiler will inline the code when optimizations are enabled. If you have any debugging ideas, they would be welcome. Maybe there are some helper call limitations? Anyway, I will look at it again tomorrow. Apart from that, I've found some problems in my EPID patch, I will send a fix shortly. Namely the instructions are enabled outside of Booke206, one unrelated instruction had its opcode chcnaged by mistake and the slow path for dbczep is wrong. However, these problems are not related to the problem with your image. Thanks, Roman Kapl > > I've just realised that because this bug is still dependent upon queued patches, it > would make sense for me to push a slightly modified version of David's ppc-for-3.1 > branch to github to help reproduce the issue. Below are the instructions on how to > reproduce the bug locally: > > > 1) Grab the 40p test image from http://www.juneau-lug.org/zImage.initrd.sandalfoot > > 2) Fetch my slightly-modified ppc-for-3.1 branch from > https://github.com/mcayland/qemu/tree/ppc-for-3.1-regression > > 3) Build QEMU with --enable-debug: > > ./configure --target-list=ppc-softmmu --enable-debug > make > > 4) Boot the 40p test image: > > ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p > > You will see that the test image fails at the end of boot with messages like this: > > Bad inittab entry: > Bad inittab entry: > No more tasks for init -- sleeping forever > > 5) Now remove the "target/ppc: add external PID support" patch from the above branch: > > git checkout HEAD~1 > make > > 6) Boot the 40p test image again: > > ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p > > You will see that the test image now boots successfully to the "Please press Enter to > activate this console" message. > > > ATB, > > Mark. >
On 19 September 2018 at 07:47, Roman Kapl <roman.kapl@sysgo.com> wrote: > It seems that the `dcbz` instruction is not emulated correctly (which may > lead to some garbage in inittab?). However, if I manualy inline the > `helper_dcbz_common` code into `helper dcbz`, it starts to work. This is because helper_dcbz_common() uses GETPC() to get the return address inside generated code which will be used when an exception occurs. This only works from a function called directly from generated code. If you want to abstract out into a second function, then you need: * the second function to take a retaddr argument, which it can then pass to cpu_stq_data_ra() * the top level helpers called from TCG to pass GETPC() as that retaddr parameter Incidentally, calling your secondary helper function "helper_dcbz_common" is not ideal -- the "helper_" prefix is generally used to indicate functions which are directly called from TCG generated code as helper functions (which does matter for some purposes, like this one). thanks -- PMM
On 19/09/2018 15:47, Roman Kapl wrote: >>> Roman, can you reproduce this locally at all? My setup is nothing >>> special, just Debian Stretch on amd64. > > Ok, so I am able to reproduce it with your image and --enable-debug, but I was not > able to find the root cause, just narrow it. > > It seems that the `dcbz` instruction is not emulated correctly (which may lead to > some garbage in inittab?). However, if I manualy inline the `helper_dcbz_common` code > into `helper dcbz`, it starts to work. I just literally copy it and add `int mmu_idx > = env->dmmu_idx;` at the beginning. That could be related to the `--disable-debug` > flag, since the compiler will inline the code when optimizations are enabled. > > If you have any debugging ideas, they would be welcome. Maybe there are some helper > call limitations? Anyway, I will look at it again tomorrow. Great, glad you managed to reproduce the issue. I'm not sure why the above would make any difference at all, but I quickly tried your inline test above and that also caused it to start to work for me? I'm still new at TCG so I'm not exactly sure what to suggest right now - maybe someone else will have a better idea? > Apart from that, I've found some problems in my EPID patch, I will send a fix > shortly. Namely the instructions are enabled outside of Booke206, one unrelated > instruction had its opcode chcnaged by mistake and the slow path for dbczep is wrong. > However, these problems are not related to the problem with your image. No worries. ATB, Mark.
On 19/09/2018 16:33, Peter Maydell wrote: > On 19 September 2018 at 07:47, Roman Kapl <roman.kapl@sysgo.com> wrote: >> It seems that the `dcbz` instruction is not emulated correctly (which may >> lead to some garbage in inittab?). However, if I manualy inline the >> `helper_dcbz_common` code into `helper dcbz`, it starts to work. > > This is because helper_dcbz_common() uses GETPC() to get the > return address inside generated code which will be used when > an exception occurs. This only works from a function called > directly from generated code. If you want to abstract out > into a second function, then you need: > * the second function to take a retaddr argument, which it > can then pass to cpu_stq_data_ra() > * the top level helpers called from TCG to pass GETPC() as > that retaddr parameter > > Incidentally, calling your secondary helper function > "helper_dcbz_common" is not ideal -- the "helper_" prefix > is generally used to indicate functions which are directly > called from TCG generated code as helper functions (which > does matter for some purposes, like this one). Brilliant! I had no idea about the restrictions on GETPC(). Thanks so much for helping here, Peter. ATB, Mark.
On Wed, Sep 19, 2018 at 04:47:36PM +0200, Roman Kapl wrote: > Hi, > > On 09/19/2018 08:57 AM, Mark Cave-Ayland wrote: > > On 18/09/2018 22:12, Mark Cave-Ayland wrote: > > > > > Roman, can you reproduce this locally at all? My setup is nothing > > > special, just Debian Stretch on amd64. > > Ok, so I am able to reproduce it with your image and --enable-debug, but I > was not able to find the root cause, just narrow it. > > It seems that the `dcbz` instruction is not emulated correctly (which may > lead to some garbage in inittab?). However, if I manualy inline the > `helper_dcbz_common` code into `helper dcbz`, it starts to work. I just > literally copy it and add `int mmu_idx = env->dmmu_idx;` at the beginning. > That could be related to the `--disable-debug` flag, since the compiler will > inline the code when optimizations are enabled. Ouch. That almost sounds like a compiler bug - have you tried with some different compiler versions? > If you have any debugging ideas, they would be welcome. Maybe there are some > helper call limitations? Anyway, I will look at it again tomorrow. > > Apart from that, I've found some problems in my EPID patch, I will send a > fix shortly. Namely the instructions are enabled outside of Booke206, one > unrelated instruction had its opcode chcnaged by mistake and the slow path > for dbczep is wrong. However, these problems are not related to the problem > with your image. > > Thanks, Roman Kapl > > > > > I've just realised that because this bug is still dependent upon queued patches, it > > would make sense for me to push a slightly modified version of David's ppc-for-3.1 > > branch to github to help reproduce the issue. Below are the instructions on how to > > reproduce the bug locally: > > > > > > 1) Grab the 40p test image from http://www.juneau-lug.org/zImage.initrd.sandalfoot > > > > 2) Fetch my slightly-modified ppc-for-3.1 branch from > > https://github.com/mcayland/qemu/tree/ppc-for-3.1-regression > > > > 3) Build QEMU with --enable-debug: > > > > ./configure --target-list=ppc-softmmu --enable-debug > > make > > > > 4) Boot the 40p test image: > > > > ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p > > > > You will see that the test image fails at the end of boot with messages like this: > > > > Bad inittab entry: > > Bad inittab entry: > > No more tasks for init -- sleeping forever > > > > 5) Now remove the "target/ppc: add external PID support" patch from the above branch: > > > > git checkout HEAD~1 > > make > > > > 6) Boot the 40p test image again: > > > > ./ppc-softmmu/qemu-system-ppc -cdrom zImage.initrd.sandalfoot -boot d -M 40p > > > > You will see that the test image now boots successfully to the "Please press Enter to > > activate this console" message. > > > > > > ATB, > > > > Mark. > > >
diff --git a/arch/ppc/qemu/context.c b/arch/ppc/qemu/context.c index 06e0122..5815895 100644 --- a/arch/ppc/qemu/context.c +++ b/arch/ppc/qemu/context.c @@ -111,7 +111,7 @@ static void * residual_build(uint32_t memsize, uint32_t load_base, uint32_t load_size) { residual_t *res; - const unsigned char model[] = "Qemu\0PPC\0"; + const unsigned char model[] = "IBM PPS Model 6015\0"; int i; res = malloc(sizeof(residual_t));
According to the PReP specification section 6.1.6 "System Interrupt Assignments", all PCI interrupts are routed via IRQ 15. In the case of the 40p machine this isn't quite true in that it has a routing quirk: the LSI SCSI device is always routed to IRQ 13. At least Linux and NetBSD compare the model name presented by the firmware to "IBM PPS Model 6015", and if it matches will active this quirk. In order for guest OSs to make use of the fixed IRQ routing, the model name in the residual data must be changed in OpenBIOS using the diff below: With the above OpenBIOS patch applied as well as this patchset, it is now possible to boot the sandalfoot zImage all the way through to a working userspace when using OpenBIOS. (Note: this patchset requires the changes in my previous patchset "scsi: replace lsi53c895a_create() and lsi53c810_create() functions) Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Based-on: <20180907125653.5010-1-mark.cave-ayland@ilande.co.uk> v3: - Add external IRQ to LSI SCSI device instead of hacking the PCI interrupt routing as suggested by David - Rebase onto the patches from v2 already applied to ppc-for-3.1 v2: - Add OR gate as recommended by Zoltan and implement routing quirk by hacking the raven PCI interrupt routing Mark Cave-Ayland (2): lsi53c895a: add optional external IRQ via qdev 40p: add fixed IRQ routing for LSI SCSI device hw/ppc/prep.c | 11 ++++++----- hw/scsi/lsi53c895a.c | 16 ++++++++++++++-- 2 files changed, 20 insertions(+), 7 deletions(-)