Message ID | 20210520090557.435689-1-aik@ozlabs.ru |
---|---|
State | New |
Series | [qemu,v20] spapr: Implement Open Firmware client interface |
On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
> The PAPR platform describes an OS environment that's presented by
> a combination of a hypervisor and firmware. The features it specifies
> require collaboration between the firmware and the hypervisor.
>
> Since the beginning, the runtime component of the firmware (RTAS) has
> been implemented as a 20 byte shim which simply forwards it to
> a hypercall implemented in qemu. The boot time firmware component is
> SLOF - but a build that's specific to qemu, and has always needed to be
> updated in sync with it. Even though we've managed to limit the amount
> of runtime communication we need between qemu and SLOF, there's some,
> and it has become increasingly awkward to handle as we've implemented
> new features.
>
> This implements a boot time OF client interface (CI) which is
> enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
> Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall
> which implements Open Firmware Client Interface (OF CI). This allows
> using a smaller stateless firmware which does not have to manage
> the device tree.
>
> The new "vof.bin" firmware image is included with source code under
> pc-bios/. It also includes RTAS blob.
>
> This implements a handful of CI methods just to get -kernel/-initrd
> working. In particular, this implements the device tree fetching and
> simple memory allocator - "claim" (an OF CI memory allocator) and updates
> "/memory@0/available" to report the client about available memory.
>
> This implements changing some device tree properties which we know how
> to deal with, the rest is ignored. To allow changes, this skips
> fdt_pack() when x-vof=on as not packing the blob leaves some room for
> appending.
>
> In absence of SLOF, this assigns phandles to device tree nodes to make
> device tree traversing work.
>
> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
>
> This adds basic instances support which are managed by a hash map
> ihandle -> [phandle].
>
> Before the guest started, the used memory is:
> 0..e60 - the initial firmware
> 8000..10000 - stack
> 400000.. - kernel
> 3ea0000.. - initramdisk
>
> This OF CI does not implement "interpret".
>
> Unlike SLOF, this does not format uninitialized nvram. Instead, this
> includes a disk image with pre-formatted nvram.
>
> With this basic support, this can only boot into kernel directly.
> However this is just enough for the petitboot kernel and initramdisk to
> boot from any possible source. Note this requires reasonably recent guest
> kernel with:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
>
> The immediate benefit is much faster booting time which is especially
> crucial with fully emulated early CPU bring up environments. Also this
> may come handy when/if GRUB-in-the-userspace sees light of the day.
>
> This separates VOF and sPAPR in a hope that VOF bits may be reused by
> other POWERPC boards which do not support pSeries.
>
> This is coded in assumption that later on we might be adding support for
> booting from QEMU backends (blockdev is the first candidate) without
> devices/drivers in between as OF1275 does not require that and
> it is quite easy to do so.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>
> The example command line is:
>
> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
> -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> -mon id=MON0,chardev=STDIO0,mode=readline \
> -nographic \
> -vga none \
> -enable-kvm \
> -m 8G \
> -machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
> -kernel pbuild/kernel-le-guest/vmlinux \
> -initrd pb/rootfs.cpio.xz \
> -drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
> -global spapr-nvram.drive=DRIVE0 \
> -snapshot \
> -smp 8,threads=8 \
> -L /home/aik/t/qemu-ppc64-bios/ \
> -trace events=qemu_trace_events \
> -d guest_errors \
> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
> -mon chardev=SOCKET0,mode=control
>
> ---
> Changes:
> v20:
> * compile vof.bin with -mcpu=power4 for better compatibility
> * s/std/stw/ in entry.S to make it work on ppc32
> * fixed dt_available property to support both 32 and 64bit
> * shuffled prom_args handling code
> * do not enforce 32bit in MSR (again, to support 32bit platforms)
>
[...]
> diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak
> index ae0841fa3a18..9fb201dfacfa 100644
> --- a/default-configs/devices/ppc64-softmmu.mak
> +++ b/default-configs/devices/ppc64-softmmu.mak
> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y
> # For pSeries
> CONFIG_PSERIES=y
> CONFIG_NVDIMM=y
> +CONFIG_VOF=y
> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
> index e51e0e5e5ac6..964510dfc73d 100644
> --- a/hw/ppc/Kconfig
> +++ b/hw/ppc/Kconfig
> @@ -143,3 +143,6 @@ config FW_CFG_PPC
>
> config FDT_PPC
> bool
> +
> +config VOF
> + bool

I think you should just add "select VOF" to config PSERIES section in
Kconfig instead of adding it to default-configs/devices/ppc64-softmmu.mak.
That should do it, it works in my updated pegasos2 patch:

https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61

[...]
> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S
> new file mode 100644
> index 000000000000..569688714c91
> --- /dev/null
> +++ b/pc-bios/vof/entry.S
> @@ -0,0 +1,51 @@
> +#define LOAD32(rn, name) \
> + lis rn,name##@h; \
> + ori rn,rn,name##@l
> +
> +#define ENTRY(func_name) \
> + .text; \
> + .align 2; \
> + .globl .func_name; \
> + .func_name: \
> + .globl func_name; \
> + func_name:
> +
> +#define KVMPPC_HCALL_BASE 0xf000
> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0)
> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5)
> +
> + . = 0x100 /* Do exactly as SLOF does */
> +
> +ENTRY(_start)
> +# LOAD32(%r31, 0) /* Go 32bit mode */
> +# mtmsrd %r31,0
> + LOAD32(2, __toc_start)
> + b entry_c
> +
> +ENTRY(_prom_entry)
> + LOAD32(2, __toc_start)
> + stwu %r1,-112(%r1)
> + stw %r31,104(%r1)
> + mflr %r31
> + bl prom_entry
> + nop
> + mtlr %r31
> + ld %r31,104(%r1)

It's getting there, now I see the first client call from the guest boot
code but then it crashes on this ld opcode which apparently is 64 bit only:

IN:
0x00c00214: 9421ffd0 stwu r1, -0x30(r1)
0x00c00218: 7c691b78 mr r9, r3
0x00c0021c: 7c0802a6 mflr r0
0x00c00220: 7d2903a6 mtctr r9
0x00c00224: 3d2000d9 lis r9, 0xd9
0x00c00228: 39400001 li r10, 1
0x00c0022c: 3929fc58 addi r9, r9, 0xfc58
0x00c00230: 90010034 stw r0, 0x34(r1)
0x00c00234: 38610008 addi r3, r1, 8
0x00c00238: 39000000 li r8, 0
0x00c0023c: 90810014 stw r4, 0x14(r1)
0x00c00240: 91210008 stw r9, 8(r1)
0x00c00244: 9141000c stw r10, 0xc(r1)
0x00c00248: 91410010 stw r10, 0x10(r1)
0x00c0024c: 91010018 stw r8, 0x18(r1)
0x00c00250: 4e800421 bctrl
----------------
IN:
0x0000010c: 3c400000 lis r2, 0
0x00000110: 60428e00 ori r2, r2, 0x8e00
0x00000114: 9421ff90 stwu r1, -0x70(r1)
0x00000118: 93e10068 stw r31, 0x68(r1)
0x0000011c: 7fe802a6 mflr r31
0x00000120: 4800028d bl 0x3ac
[...]
IN:
0x000003e4: 7c691b78 mr r9, r3
0x000003e8: 2c090000 cmpwi r9, 0
0x000003ec: 4182000c beq 0x3f8
----------------
IN:
0x000003f0: 807f0008 lwz r3, 8(r31)
0x000003f4: 4bfffd45 bl 0x138
Raise exception at 00000144 => 00000008 (01)
hypercall r3=000000000000f005 r4=000000000000fae8 r5=000000000000010c r6=0000000000000005
r7=0000000000000e80 r8=0000000000000000 r9=00000000ffffffff r10=0000000000000063
r11=000000000000fa50 r12=0000000000000040
nip=00000144
vof_finddevice "/" => ph=0x1
----------------
IN:
0x000003f8: 60000000 nop
0x000003fc: 397f0020 addi r11, r31, 0x20
0x00000400: 800b0004 lwz r0, 4(r11)
0x00000404: 7c0803a6 mtlr r0
0x00000408: 83cbfff8 lwz r30, -8(r11)
0x0000040c: 83ebfffc lwz r31, -4(r11)
0x00000410: 7d615b78 mr r1, r11
0x00000414: 4e800020 blr
invalid/unsupported opcode: 3a - 14 - 01 - 01 (ebe10068) 0000012c
----------------
IN:
0x00000124: 60000000 nop
0x00000128: 7fe803a6 mtlr r31
0x0000012c: ebe10068 ld r31, 0x68(r1)
Raise exception at 0000012c => 00000060 (21)
invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) fff00700
----------------
IN:
0xfff00700: 00000000 .byte 0x00, 0x00, 0x00, 0x00

Hopefully this is the last such opcode left before I can really test this.
Do you have some info on how the stdout works in VOF? I think I'll need
that to test with Linux and get output but I'm not sure what's needed on
the machine side.

Regards,
BALATON Zoltan

> + addi %r1,%r1,112
> + blr
> +
> +ENTRY(ci_entry)
> + mr 4,3
> + LOAD32(3,KVMPPC_H_VOF_CLIENT)
> + sc 1
> + blr
> +
> +/* This is the actual RTAS blob copied to the OS at instantiate-rtas */
> +ENTRY(hv_rtas)
> + mr %r4,%r3
> + LOAD32(3,KVMPPC_H_RTAS)
> + sc 1
> + blr
> + .globl hv_rtas_size
> +hv_rtas_size:
> + .long . - hv_rtas;
> diff --git a/pc-bios/vof/l.lds b/pc-bios/vof/l.lds
> new file mode 100644
> index 000000000000..10b557a81f78
> --- /dev/null
> +++ b/pc-bios/vof/l.lds
> @@ -0,0 +1,48 @@
> +OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc")
> +OUTPUT_ARCH(powerpc:common)
> +
> +/* set the entry point */
> +ENTRY ( __start )
> +
> +SECTIONS {
> + __executable_start = .;
> +
> + .text : {
> + *(.text)
> + }
> +
> + __etext = .;
> +
> + . = ALIGN(8);
> +
> + .data : {
> + *(.data)
> + *(.rodata .rodata.*)
> + *(.got1)
> + *(.sdata)
> + *(.opd)
> + }
> +
> + /* FIXME bss at end ??? */
> +
> + . = ALIGN(8);
> + __bss_start = .;
> + .bss : {
> + *(.sbss) *(.scommon)
> + *(.dynbss)
> + *(.bss)
> + }
> +
> + . = ALIGN(8);
> + __bss_end = .;
> + __bss_size = (__bss_end - __bss_start);
> +
> + . = ALIGN(256);
> + __toc_start = DEFINED (.TOC.) ? .TOC. : ADDR (.got) + 0x8000;
> + .got :
> + {
> + *(.toc .got)
> + }
> + . = ALIGN(8);
> + __toc_end = .;
> +}
> --
> 2.30.2
>
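
A side note on the trace in this message: the hypercall line shows r3=000000000000f005, and the disassembly at 0x10c shows "lis r2, 0" followed by "ori r2, r2, 0x8e00". Both are what the LOAD32 macro in the quoted entry.S should produce. The sketch below models that macro in C to check the arithmetic. The helper names (half_h, half_l, load32, check_trace_values) are made up for illustration, and the model only covers the low 32 bits of the register, so it says nothing about sign extension by lis in 64-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the quoted entry.S. */
#define KVMPPC_HCALL_BASE   0xf000u
#define KVMPPC_H_RTAS       (KVMPPC_HCALL_BASE + 0x0u)
#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5u)

/* Upper and lower 16 bits, as the assembler's @h and @l operators pick them. */
static uint16_t half_h(uint32_t v) { return (uint16_t)(v >> 16); }
static uint16_t half_l(uint32_t v) { return (uint16_t)(v & 0xffffu); }

/*
 * 32-bit model of LOAD32(rn, name):
 *   lis rn,name@h    -> rn = name@h << 16
 *   ori rn,rn,name@l -> rn |= name@l (zero-extended)
 */
static uint32_t load32(uint32_t v)
{
    uint32_t reg = (uint32_t)half_h(v) << 16; /* lis */
    reg |= half_l(v);                          /* ori */
    return reg;
}

/* Cross-check against the values visible in the trace. */
static void check_trace_values(void)
{
    /* "hypercall r3=000000000000f005" */
    assert(load32(KVMPPC_H_VOF_CLIENT) == 0xf005u);
    /* "lis r2, 0" / "ori r2, r2, 0x8e00", so __toc_start was 0x8e00 there */
    assert(half_h(0x8e00u) == 0x0000u);
    assert(half_l(0x8e00u) == 0x8e00u);
}
```

Calling check_trace_values() from a test program passes silently if the model agrees with the trace. It only confirms that ci_entry reached the hypercall with the expected number; it does not say anything about what QEMU does with it.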
On 21/05/2021 07:59, BALATON Zoltan wrote:
> On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
>> [...]
>
> [...]
>
>> diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak
>> index ae0841fa3a18..9fb201dfacfa 100644
>> --- a/default-configs/devices/ppc64-softmmu.mak
>> +++ b/default-configs/devices/ppc64-softmmu.mak
>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y
>> # For pSeries
>> CONFIG_PSERIES=y
>> CONFIG_NVDIMM=y
>> +CONFIG_VOF=y
>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
>> index e51e0e5e5ac6..964510dfc73d 100644
>> --- a/hw/ppc/Kconfig
>> +++ b/hw/ppc/Kconfig
>> @@ -143,3 +143,6 @@ config FW_CFG_PPC
>>
>> config FDT_PPC
>> bool
>> +
>> +config VOF
>> + bool
>
> I think you should just add "select VOF" to config PSERIES section in
> Kconfig instead of adding it to
> default-configs/devices/ppc64-softmmu.mak.

Oh well, can do that too.

> That should do it, it works
> in my updated pegasos2 patch:
>
> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61
>
> [...]
>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S
>> new file mode 100644
>> index 000000000000..569688714c91
>> --- /dev/null
>> +++ b/pc-bios/vof/entry.S
>> @@ -0,0 +1,51 @@
>> +#define LOAD32(rn, name) \
>> + lis rn,name##@h; \
>> + ori rn,rn,name##@l
>> +
>> +#define ENTRY(func_name) \
>> + .text; \
>> + .align 2; \
>> + .globl .func_name; \
>> + .func_name: \
>> + .globl func_name; \
>> + func_name:
>> +
>> +#define KVMPPC_HCALL_BASE 0xf000
>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0)
>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5)
>> +
>> + . = 0x100 /* Do exactly as SLOF does */
>> +
>> +ENTRY(_start)
>> +# LOAD32(%r31, 0) /* Go 32bit mode */
>> +# mtmsrd %r31,0
>> + LOAD32(2, __toc_start)
>> + b entry_c
>> +
>> +ENTRY(_prom_entry)
>> + LOAD32(2, __toc_start)
>> + stwu %r1,-112(%r1)
>> + stw %r31,104(%r1)
>> + mflr %r31
>> + bl prom_entry
>> + nop
>> + mtlr %r31
>> + ld %r31,104(%r1)
>
> It's getting there, now I see the first client call from the guest boot
> code but then it crashes on this ld opcode which apparently is 64 bit only:

Oh right.

> Hopefully this is the last such opcode left before I can really test this.

Make it lwz, and test it?

> Do you have some info on how the stdout works in VOF? I think I'll need
> that to test with Linux and get output but I'm not sure what's needed on
> the machine side.

VOF opens stdout and stores the ihandle (in fdt) which the client (==kernel)
uses for writing. To make it work properly, you need to hook up that instance
to a device backend similar to what I have for spapr-vty:

https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd39cc5dad5b6385965d

This is not a part of this patch as I'm trying to keep things simpler and
accessing backends from VOF is still unsettled. But there is a workaround
which is trace_vof_write, I use this. Thanks,
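
For anyone following the ld versus lwz exchange: the faulting word in the trace is ebe10068, and QEMU reports primary opcode 3a for it. A rough C sketch of the field extraction is below, assuming the usual Power ISA layout where ld is DS-form with primary opcode 58 (0x3a) and lwz is D-form with primary opcode 32. The struct and function names are invented for this note and are not taken from QEMU.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_OPCD_LWZ 32u /* D-form load word and zero */
#define PPC_OPCD_LD  58u /* DS-form family: ld, ldu, lwa (64-bit only) */

/* Fields common to the D-form and DS-form load encodings. */
struct ppc_load {
    unsigned opcd; /* primary opcode, top 6 bits of the word */
    unsigned rt;   /* target register */
    unsigned ra;   /* base register */
    int32_t disp;  /* signed byte displacement */
};

static struct ppc_load decode_load(uint32_t insn)
{
    struct ppc_load f;
    int32_t d = (int32_t)(insn & 0xffffu);

    f.opcd = insn >> 26;
    f.rt = (insn >> 21) & 0x1fu;
    f.ra = (insn >> 16) & 0x1fu;
    if (f.opcd == PPC_OPCD_LD) {
        d &= ~0x3; /* DS-form: the low two bits select ld/ldu/lwa */
    }
    if (d & 0x8000) {
        d -= 0x10000; /* sign-extend the 16-bit field */
    }
    f.disp = d;
    return f;
}

/* Encode "lwz rt, disp(ra)". */
static uint32_t encode_lwz(unsigned rt, unsigned ra, int32_t disp)
{
    assert(rt < 32u && ra < 32u);
    assert(disp >= -32768 && disp <= 32767);
    return (PPC_OPCD_LWZ << 26) | (rt << 21) | (ra << 16) |
           ((uint32_t)disp & 0xffffu);
}
```

With these helpers, decode_load(0xebe10068) yields opcd 0x3a, rt 31, ra 1, disp 0x68, which matches "ld r31, 0x68(r1)" in the trace, and encode_lwz(31, 1, 104) gives the 32-bit replacement word.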
On Fri, 21 May 2021, Alexey Kardashevskiy wrote:
> On 21/05/2021 07:59, BALATON Zoltan wrote:
>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
>>> [...]
>>
>> [...]
>>
>>> diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak
>>> index ae0841fa3a18..9fb201dfacfa 100644
>>> --- a/default-configs/devices/ppc64-softmmu.mak
>>> +++ b/default-configs/devices/ppc64-softmmu.mak
>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y
>>> # For pSeries
>>> CONFIG_PSERIES=y
>>> CONFIG_NVDIMM=y
>>> +CONFIG_VOF=y
>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
>>> index e51e0e5e5ac6..964510dfc73d 100644
>>> --- a/hw/ppc/Kconfig
>>> +++ b/hw/ppc/Kconfig
>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC
>>>
>>> config FDT_PPC
>>> bool
>>> +
>>> +config VOF
>>> + bool
>>
>> I think you should just add "select VOF" to config PSERIES section in
>> Kconfig instead of adding it to default-configs/devices/ppc64-softmmu.mak.
>
> Oh well, can do that too.

I think most config options should be selected by Kconfig and the default
config should only include machines; otherwise VOF would also be added when
you don't compile PSERIES or PEGASOS2. With select in Kconfig it will be
added only when needed. That's why it's better to use select in this case.

>> That should do it, it works in my updated pegasos2 patch:
>>
>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61
>>
>> [...]
>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S
>>> new file mode 100644
>>> index 000000000000..569688714c91
>>> --- /dev/null
>>> +++ b/pc-bios/vof/entry.S
>>> @@ -0,0 +1,51 @@
>>> +#define LOAD32(rn, name) \
>>> + lis rn,name##@h; \
>>> + ori rn,rn,name##@l
>>> +
>>> +#define ENTRY(func_name) \
>>> + .text; \
>>> + .align 2; \
>>> + .globl .func_name; \
>>> + .func_name: \
>>> + .globl func_name; \
>>> + func_name:
>>> +
>>> +#define KVMPPC_HCALL_BASE 0xf000
>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0)
>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5)
>>> +
>>> + . = 0x100 /* Do exactly as SLOF does */
>>> +
>>> +ENTRY(_start)
>>> +# LOAD32(%r31, 0) /* Go 32bit mode */
>>> +# mtmsrd %r31,0
>>> + LOAD32(2, __toc_start)
>>> + b entry_c
>>> +
>>> +ENTRY(_prom_entry)
>>> + LOAD32(2, __toc_start)
>>> + stwu %r1,-112(%r1)
>>> + stw %r31,104(%r1)
>>> + mflr %r31
>>> + bl prom_entry
>>> + nop
>>> + mtlr %r31
>>> + ld %r31,104(%r1)
>>
>> It's getting there, now I see the first client call from the guest boot
>> code but then it crashes on this ld opcode which apparently is 64 bit only:
>
> Oh right.
>
>> Hopefully this is the last such opcode left before I can really test this.
>
> Make it lwz, and test it?

Yes, figured that out too after sending this message. Replacing it with lwz
works, but I wonder: now that you have stwu and lwz, do the stack offsets
need adjusting too, or do you just waste 4 bytes now? With lwz here I found
no further 64 bit opcodes and the guest boot code could walk the device
tree. It failed later, but I think that's because I'll need to fill in more
info about the machine in the device tree. I'll experiment with that, but it
looks like it could work at least for MorphOS. I'll have to try Linux too.

>> Do you have some info on how the stdout works in VOF? I think I'll need
>> that to test with Linux and get output but I'm not sure what's needed on
>> the machine side.
>
> VOF opens stdout and stores the ihandle (in fdt) which the client (==kernel)
> uses for writing. To make it work properly, you need to hook up that instance
> to a device backend similar to what I have for spapr-vty:
>
> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd39cc5dad5b6385965d
>
> This is not a part of this patch as I'm trying to keep things simpler and
> accessing backends from VOF is still unsettled. But there is a workaround
> which is trace_vof_write, I use this. Thanks,

The above patch is about stdin but stdout seems to be added by the current
vof patch. What is spapr-vty? I don't think I have something similar in
pegasos2, where I just have a normal serial port created by ISASuperIO in
the vt8231 model. Can I use that backend somehow, or do I have to create
some other serial device to connect to stdout? Does trace_vof_write work for
stuff output by the guest? I guess that's only for things printed by VOF
itself, but to see Linux output do I need a stdout in VOF, or will it just
open the serial port with its own driver and use that? So I'm not sure what
the stdout parts in the current vof patch do and whether I need them for
anything. I'll try to experiment with it some more, but fixing the ld and
Kconfig seems to be enough to get it to work for me.

Regards,
BALATON Zoltan
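
On the stack offset question in this message, the arithmetic can be checked mechanically. The sketch below takes the two numbers from the quoted _prom_entry (a 112 byte frame from "stwu %r1,-112(%r1)" and the r31 slot at offset 104) and reports how much of the frame is left after the slot for a 4 byte and an 8 byte save. The helper names are invented here, and this only checks that the slot stays inside the frame; it does not check the layout against any ABI document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FRAME_SIZE 112u /* stwu %r1,-112(%r1) */
#define R31_SLOT   104u /* stw %r31,104(%r1) and the matching load */

/* True if a save of `width` bytes at `off` stays inside the frame. */
static bool slot_fits(size_t off, size_t width)
{
    return off + width <= FRAME_SIZE;
}

/* Bytes between the end of the slot and the end of the frame. */
static size_t slack_after(size_t off, size_t width)
{
    assert(slot_fits(off, width));
    return FRAME_SIZE - (off + width);
}
```

slack_after(R31_SLOT, 8) is 0 and slack_after(R31_SLOT, 4) is 4, so under these assumptions the stw/lwz pair still fits at offset 104 and simply leaves the last 4 bytes of the frame unused.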
On Fri, 21 May 2021, BALATON Zoltan wrote:
> On Fri, 21 May 2021, Alexey Kardashevskiy wrote:
>> On 21/05/2021 07:59, BALATON Zoltan wrote:
>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
>>>> [...]
>>>
>>> [...]
>>> >>>> diff --git a/default-configs/devices/ppc64-softmmu.mak >>>> b/default-configs/devices/ppc64-softmmu.mak >>>> index ae0841fa3a18..9fb201dfacfa 100644 >>>> --- a/default-configs/devices/ppc64-softmmu.mak >>>> +++ b/default-configs/devices/ppc64-softmmu.mak >>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y >>>> # For pSeries >>>> CONFIG_PSERIES=y >>>> CONFIG_NVDIMM=y >>>> +CONFIG_VOF=y >>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig >>>> index e51e0e5e5ac6..964510dfc73d 100644 >>>> --- a/hw/ppc/Kconfig >>>> +++ b/hw/ppc/Kconfig >>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC >>>> >>>> config FDT_PPC >>>> bool >>>> + >>>> +config VOF >>>> + bool >>> >>> I think you should just add "select VOF" to config PSERIES section in >>> Kconfig instead of adding it to default-configs/devices/ppc64-softmmu.mak. >> >> oh well, can do that too. > > I think most config options should be selected by KConfig and the default > config should only include machines, otherwise VOF would be added also when > you don't compile PSERIES or PEGASOS2. With select in Kconfig it will be > added when needed. That's why it's better to use select in this case. > >>> That should do it, it works in my updated pegasos2 patch: >>> >>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61 >>> [...] >>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S >>>> new file mode 100644 >>>> index 000000000000..569688714c91 >>>> --- /dev/null >>>> +++ b/pc-bios/vof/entry.S >>>> @@ -0,0 +1,51 @@ >>>> +#define LOAD32(rn, name) \ >>>> + lis rn,name##@h; \ >>>> + ori rn,rn,name##@l >>>> + >>>> +#define ENTRY(func_name) \ >>>> + .text; \ >>>> + .align 2; \ >>>> + .globl .func_name; \ >>>> + .func_name: \ >>>> + .globl func_name; \ >>>> + func_name: >>>> + >>>> +#define KVMPPC_HCALL_BASE 0xf000 >>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) >>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) >>>> + >>>> + . 
= 0x100 /* Do exactly as SLOF does */
>>>> +
>>>> +ENTRY(_start)
>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */
>>>> +# mtmsrd %r31,0
>>>> + LOAD32(2, __toc_start)
>>>> + b entry_c
>>>> +
>>>> +ENTRY(_prom_entry)
>>>> + LOAD32(2, __toc_start)
>>>> + stwu %r1,-112(%r1)
>>>> + stw %r31,104(%r1)
>>>> + mflr %r31
>>>> + bl prom_entry
>>>> + nop
>>>> + mtlr %r31
>>>> + ld %r31,104(%r1)
>>>
>>> It's getting there, now I see the first client call from the guest boot
>>> code but then it crashes on this ld opcode which apparently is 64 bit
>>> only:
>>
>> Oh right.
>>
>>> Hopefully this is the last such opcode left before I can really test this.
>>
>> Make it lwz, and test it?
>
> Yes, figured that out too after sending this message. Replacing with lwz
> works but I wonder that now you have stwu lwz do the stack offsets need
> adjusting too or you just waste 4 bytes now? With lwz here I found no further
> 64 bit opcodes and the guest boot code could walk the device tree. It failed
> later but I think that's because I'll need to fill more info about the
> machine in the device tree. I'll experiment with that but it looks like it
> could work at least for MorphOS. I'll have to try Linux too.

I was trying to get a Linux kernel from a Debian powerpc ISO to do something (Debian before 10.0 has Pegasos support), but I've run into the problem that the kernel is loaded at 0x400000 while the start address is at some offset from that. How do I set qemu,boot-kernel in this case? When I set it to the address/size where the kernel is loaded, it jumps to the beginning, not to the correct start address. If I set the address to the start address, then the size will be wrong. So I don't know how to set qemu,boot-kernel in this case; or is there another property to tell the start address? (VOF does not seem to check any other property and seems to assume the entry point is the same as the load address, but for this Linux kernel it's not.)

Regards,
BALATON Zoltan
On 21/05/2021 19:05, BALATON Zoltan wrote:
> On Fri, 21 May 2021, Alexey Kardashevskiy wrote:
>> On 21/05/2021 07:59, BALATON Zoltan wrote:
>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
>>>> [...]
>>> >>>> diff --git a/default-configs/devices/ppc64-softmmu.mak >>>> b/default-configs/devices/ppc64-softmmu.mak >>>> index ae0841fa3a18..9fb201dfacfa 100644 >>>> --- a/default-configs/devices/ppc64-softmmu.mak >>>> +++ b/default-configs/devices/ppc64-softmmu.mak >>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y >>>> # For pSeries >>>> CONFIG_PSERIES=y >>>> CONFIG_NVDIMM=y >>>> +CONFIG_VOF=y >>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig >>>> index e51e0e5e5ac6..964510dfc73d 100644 >>>> --- a/hw/ppc/Kconfig >>>> +++ b/hw/ppc/Kconfig >>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC >>>> >>>> config FDT_PPC >>>> bool >>>> + >>>> +config VOF >>>> + bool >>> >>> I think you should just add "select VOF" to config PSERIES section in >>> Kconfig instead of adding it to >>> default-configs/devices/ppc64-softmmu.mak. >> >> oh well, can do that too. > > I think most config options should be selected by KConfig and the > default config should only include machines, otherwise VOF would be > added also when you don't compile PSERIES or PEGASOS2. With select in > Kconfig it will be added when needed. That's why it's better to use > select in this case. > >>> That should do it, it works in my updated pegasos2 patch: >>> >>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61 >>> >>> [...] >>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S >>>> new file mode 100644 >>>> index 000000000000..569688714c91 >>>> --- /dev/null >>>> +++ b/pc-bios/vof/entry.S >>>> @@ -0,0 +1,51 @@ >>>> +#define LOAD32(rn, name) \ >>>> + lis rn,name##@h; \ >>>> + ori rn,rn,name##@l >>>> + >>>> +#define ENTRY(func_name) \ >>>> + .text; \ >>>> + .align 2; \ >>>> + .globl .func_name; \ >>>> + .func_name: \ >>>> + .globl func_name; \ >>>> + func_name: >>>> + >>>> +#define KVMPPC_HCALL_BASE 0xf000 >>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) >>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) >>>> + >>>> + . 
= 0x100 /* Do exactly as SLOF does */
>>>> +
>>>> +ENTRY(_start)
>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */
>>>> +# mtmsrd %r31,0
>>>> + LOAD32(2, __toc_start)
>>>> + b entry_c
>>>> +
>>>> +ENTRY(_prom_entry)
>>>> + LOAD32(2, __toc_start)
>>>> + stwu %r1,-112(%r1)
>>>> + stw %r31,104(%r1)
>>>> + mflr %r31
>>>> + bl prom_entry
>>>> + nop
>>>> + mtlr %r31
>>>> + ld %r31,104(%r1)
>>>
>>> It's getting there, now I see the first client call from the guest
>>> boot code but then it crashes on this ld opcode which apparently is
>>> 64 bit only:
>>
>> Oh right.
>>
>>> Hopefully this is the last such opcode left before I can really test
>>> this.
>>
>> Make it lwz, and test it?
>
> Yes, figured that out too after sending this message. Replacing with lwz
> works but I wonder that now you have stwu lwz do the stack offsets need
> adjusting too or you just waste 4 bytes now?

Well, this assumes the 64bit client and that ABI. I think ideally the firmware is supposed to use its own stack but I did not bother here. I do not know the 32bit ABI at all, so I cannot say whether the existing code should just work or not :-/

> With lwz here I found no
> further 64 bit opcodes and the guest boot code could walk the device
> tree. It failed later but I think that's because I'll need to fill more
> info about the machine in the device tree. I'll experiment with that but
> it looks like it could work at least for MorphOS. I'll have to try Linux
> too.

There are plenty of tracepoints, enable them all.

>
>>> Do you have some info on how the stdout works in VOF? I think I'll
>>> need that to test with Linux and get output but I'm not sure what's
>>> needed on the machine side.
>>
>> VOF opens stdout and stores the ihandle (in fdt) which the client
>> (==kernel) uses for writing.
>> To make it work properly, you need to
>> hook up that instance to a device backend similar to what I have for
>> spapr-vty:
>>
>> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd39cc5dad5b6385965d
>>
>> This is not a part of this patch as I'm trying to keep things simpler
>> and accessing backends from VOF is still unsettled. But there is a
>> workaround which is trace_vof_write, I use this. Thanks,
>
> The above patch is about stdin but stdout seems to be added by the
> current vof patch. What is spapr-vty?

It is the pseries paravirtual serial device; pegasos does not have it.

> I don't think I have something
> similar in pegasos2 where I just have a normal serial port created by
> ISASuperIO in the vt8231 model.

Correct.

> Can I use that backend somehow or have
> to create some other serial device to connect to stdout?
> Does
> trace_vof_write work for stuff output by the guest?
> I guess that's only
> for things printed by VOF itself

VOF itself does not print anything in this patch.

> but to see Linux output do I need a
> stdout in VOF
> or it will just open the serial with its own driver and
> use that?
> So I'm not sure what's the stdout parts in the current vof
> patch does and if I need that for anything. I'll try to experiment with
> it some more but fixing the ld and Kconfig seems to be enough to get it
> work for me.

So for the client to print something, /chosen/stdout needs to have a valid ihandle.
The only way to get a valid ihandle is to have a valid phandle which vof_client_open() can open.
A valid phandle is the phandle of any node in the device tree. On spapr we pick some spapr-vty, open it and store the ihandle in /chosen/stdout.

From this point on, output from the client can be seen via a tracepoint.

Now if we want proper output without tracepoints, we need to hook it up with some chardev backend (not a device such as vt8231 or spapr-vty, but a backend).

https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd3 does this:
1. when a phandle is opened, QEMU will search for the DeviceState* for the specific FDT node and get a chardev from the device.
2. when write() is called, QEMU calls qemu_chr_fe_write_all() on the chardev from 1.

From this point on you do not need a tracepoint and the output will appear in the console you set up for stdout.

Now if you want input from this console, things get tricky. First, on powernv/pseries we only need this for grub, as otherwise the kernel has all the drivers needed and will not use the client interface. For grub, we need to provide a valid ihandle for /chosen/stdin, which is easy, but implementing read() on it is not, as there is no simple device-type-independent way of reading from a chardev. I hacked it for spapr-vty, but other serial devices will need special handling, or we'll have to introduce some VOF_SERIAL_READ interface for those, which will face opposition :)

Makes sense?
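For illustration only, the open/write flow described above can be modelled in a few lines of standalone C. This is a rough sketch under stated assumptions, not the code from the patch: the names model_open(), model_write(), struct vof_model and the sink backend are all made up, a fixed array stands in for the ihandle -> phandle hash map, and a byte buffer stands in for the write tracepoint. An instance opened without a backend only feeds the "trace" buffer; one opened with a backend forwards the bytes, which is the role qemu_chr_fe_write_all() plays in the commit linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INSTANCES  8
#define TRACE_BUF_SIZE 256

/* Hypothetical backend hook; stands in for a chardev write. */
typedef int (*backend_write_fn)(void *opaque, const uint8_t *buf, int len);

struct instance {
    uint32_t phandle;        /* node the instance was opened on, 0 = free slot */
    backend_write_fn write;  /* NULL when no backend is attached */
    void *opaque;
};

struct vof_model {
    struct instance inst[MAX_INSTANCES]; /* ihandle N lives in inst[N - 1] */
    char trace[TRACE_BUF_SIZE];          /* stand-in for the write tracepoint */
    size_t trace_len;
};

/* Example backend (hypothetical): appends to a small in-memory buffer. */
struct sink {
    char data[64];
    size_t len;
};

int sink_write(void *opaque, const uint8_t *buf, int len)
{
    struct sink *s = opaque;
    size_t room = sizeof(s->data) - s->len;
    size_t n = (size_t)len < room ? (size_t)len : room;

    memcpy(s->data + s->len, buf, n);
    s->len += n;
    return (int)n;
}

/* Open a node: any non-zero phandle is accepted. Returns an ihandle, 0 on failure. */
uint32_t model_open(struct vof_model *m, uint32_t phandle,
                    backend_write_fn w, void *opaque)
{
    assert(m != NULL);
    if (phandle == 0) {
        return 0;
    }
    for (uint32_t i = 0; i < MAX_INSTANCES; i++) {
        if (m->inst[i].phandle == 0) {
            m->inst[i] = (struct instance){ phandle, w, opaque };
            return i + 1;
        }
    }
    return 0; /* table full */
}

/* Write through an ihandle. Returns bytes consumed, -1 on a bad ihandle. */
int model_write(struct vof_model *m, uint32_t ihandle, const void *buf, int len)
{
    assert(m != NULL);
    if (ihandle == 0 || ihandle > MAX_INSTANCES || len < 0) {
        return -1;
    }
    struct instance *in = &m->inst[ihandle - 1];
    if (in->phandle == 0) {
        return -1;
    }
    if (in->write) {
        return in->write(in->opaque, buf, len);
    }
    /* No backend: only record the bytes (truncating if the buffer is full)
     * and report them all as consumed, as a tracepoint would. */
    size_t room = TRACE_BUF_SIZE - m->trace_len;
    size_t n = (size_t)len < room ? (size_t)len : room;
    memcpy(m->trace + m->trace_len, buf, n);
    m->trace_len += n;
    return len;
}
```

In this model a board without spapr-vty would call model_open() on whatever serial node it has and pass a backend hook; how that hook reaches the real chardev is exactly the unsettled part discussed above.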
On 22/05/2021 05:57, BALATON Zoltan wrote:
> On Fri, 21 May 2021, BALATON Zoltan wrote:
>> On Fri, 21 May 2021, Alexey Kardashevskiy wrote:
>>> On 21/05/2021 07:59, BALATON Zoltan wrote:
>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
>>>>> [...]
>>>> >>>>> diff --git a/default-configs/devices/ppc64-softmmu.mak >>>>> b/default-configs/devices/ppc64-softmmu.mak >>>>> index ae0841fa3a18..9fb201dfacfa 100644 >>>>> --- a/default-configs/devices/ppc64-softmmu.mak >>>>> +++ b/default-configs/devices/ppc64-softmmu.mak >>>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y >>>>> # For pSeries >>>>> CONFIG_PSERIES=y >>>>> CONFIG_NVDIMM=y >>>>> +CONFIG_VOF=y >>>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig >>>>> index e51e0e5e5ac6..964510dfc73d 100644 >>>>> --- a/hw/ppc/Kconfig >>>>> +++ b/hw/ppc/Kconfig >>>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC >>>>> >>>>> config FDT_PPC >>>>> bool >>>>> + >>>>> +config VOF >>>>> + bool >>>> >>>> I think you should just add "select VOF" to config PSERIES section >>>> in Kconfig instead of adding it to >>>> default-configs/devices/ppc64-softmmu.mak. >>> >>> oh well, can do that too. >> >> I think most config options should be selected by KConfig and the >> default config should only include machines, otherwise VOF would be >> added also when you don't compile PSERIES or PEGASOS2. With select in >> Kconfig it will be added when needed. That's why it's better to use >> select in this case. >> >>>> That should do it, it works in my updated pegasos2 patch: >>>> >>>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61 >>>> [...] 
>>>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S >>>>> new file mode 100644 >>>>> index 000000000000..569688714c91 >>>>> --- /dev/null >>>>> +++ b/pc-bios/vof/entry.S >>>>> @@ -0,0 +1,51 @@ >>>>> +#define LOAD32(rn, name) \ >>>>> + lis rn,name##@h; \ >>>>> + ori rn,rn,name##@l >>>>> + >>>>> +#define ENTRY(func_name) \ >>>>> + .text; \ >>>>> + .align 2; \ >>>>> + .globl .func_name; \ >>>>> + .func_name: \ >>>>> + .globl func_name; \ >>>>> + func_name: >>>>> + >>>>> +#define KVMPPC_HCALL_BASE 0xf000 >>>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) >>>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) >>>>> + >>>>> + . = 0x100 /* Do exactly as SLOF does */ >>>>> + >>>>> +ENTRY(_start) >>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */ >>>>> +# mtmsrd %r31,0 >>>>> + LOAD32(2, __toc_start) >>>>> + b entry_c >>>>> + >>>>> +ENTRY(_prom_entry) >>>>> + LOAD32(2, __toc_start) >>>>> + stwu %r1,-112(%r1) >>>>> + stw %r31,104(%r1) >>>>> + mflr %r31 >>>>> + bl prom_entry >>>>> + nop >>>>> + mtlr %r31 >>>>> + ld %r31,104(%r1) >>>> >>>> It's getting there, now I see the first client call from the guest >>>> boot code but then it crashes on this ld opcode which apparently is >>>> 64 bit only: >>> >>> Oh right. >>> >>> >>>> Hopefully this is the last such opcode left before I can really test >>>> this. >>> >>> Make it lwz, and test it? >> >> Yes, figured that out too after sending this message. Replacing with >> lwz works but I wonder that now you have stwu lwz do the stack offsets >> need adjusting too or you just waste 4 bytes now? With lwz here I >> found no further 64 bit opcodes and the guest boot code could walk the >> device tree. It failed later but I think that's because I'll need to >> fill more info about the machine in the device tree. I'll experiment >> with that but it looks like it could work at least for MorphOS. I'll >> have to try Linux too. 
>
> I was trying to get a linux kernel from a debian powerpc iso to do
> something (debian before 10.0 has Pegasos support) but I've run into the
> problem that the kernel is loaded at 0x400000 but the start address is
> at some offset from that. How do I set qemu,boot-kernel in this case?

The pseries kernel can work from any location (and it relocates itself to 0 at some point) even though it is linked at c000.0000.0000.0000, and there is no start address offset:

===
> objdump -D ~/pbuild/kernel-le/vmlinux

/home/aik/pbuild/kernel-le/vmlinux: file format elf64-powerpcle

Disassembly of section .head.text:

c000000000000000 <__start>:
c000000000000000: 48 00 00 08 tdi 0,r0,72
c000000000000004: 2c 00 00 48 b c000000000000030 <__start+0x30>
...
===

Not sure about pegasos2 kernels (or any ppc32 really), sorry.

> Because when I set it to the address/size where the kernel is loaded it
> jumps to the beginning not the correct start address. If I set the
> address to the start address then size will be wrong so I don't know how
> to set qemu,boot-kernel in this case or is there another property to
> tell the start address?
> (Vof does not seem to check any other property
> and seems to assume the entry point is the same as the load address but
> for this linux kernel it's not.)

I guess if you really need an offset, you'll have to add a new property ("qemu,boot-kernel-start"?) and look for it in the firmware. Or, say, put it in gpr5 in your version of spapr_cpu_set_entry_state() and make boot_from_memory() use it.
On Sat, 22 May 2021, Alexey Kardashevskiy wrote:
> On 21/05/2021 19:05, BALATON Zoltan wrote:
>> On Fri, 21 May 2021, Alexey Kardashevskiy wrote:
>>> On 21/05/2021 07:59, BALATON Zoltan wrote:
>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
>>>>> [...]
>>>> >>>>> diff --git a/default-configs/devices/ppc64-softmmu.mak >>>>> b/default-configs/devices/ppc64-softmmu.mak >>>>> index ae0841fa3a18..9fb201dfacfa 100644 >>>>> --- a/default-configs/devices/ppc64-softmmu.mak >>>>> +++ b/default-configs/devices/ppc64-softmmu.mak >>>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y >>>>> # For pSeries >>>>> CONFIG_PSERIES=y >>>>> CONFIG_NVDIMM=y >>>>> +CONFIG_VOF=y >>>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig >>>>> index e51e0e5e5ac6..964510dfc73d 100644 >>>>> --- a/hw/ppc/Kconfig >>>>> +++ b/hw/ppc/Kconfig >>>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC >>>>> >>>>> config FDT_PPC >>>>> bool >>>>> + >>>>> +config VOF >>>>> + bool >>>> >>>> I think you should just add "select VOF" to config PSERIES section in >>>> Kconfig instead of adding it to >>>> default-configs/devices/ppc64-softmmu.mak. >>> >>> oh well, can do that too. >> >> I think most config options should be selected by KConfig and the default >> config should only include machines, otherwise VOF would be added also when >> you don't compile PSERIES or PEGASOS2. With select in Kconfig it will be >> added when needed. That's why it's better to use select in this case. >> >>>> That should do it, it works in my updated pegasos2 patch: >>>> >>>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61 >>>> [...] 
>>>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S >>>>> new file mode 100644 >>>>> index 000000000000..569688714c91 >>>>> --- /dev/null >>>>> +++ b/pc-bios/vof/entry.S >>>>> @@ -0,0 +1,51 @@ >>>>> +#define LOAD32(rn, name) \ >>>>> + lis rn,name##@h; \ >>>>> + ori rn,rn,name##@l >>>>> + >>>>> +#define ENTRY(func_name) \ >>>>> + .text; \ >>>>> + .align 2; \ >>>>> + .globl .func_name; \ >>>>> + .func_name: \ >>>>> + .globl func_name; \ >>>>> + func_name: >>>>> + >>>>> +#define KVMPPC_HCALL_BASE 0xf000 >>>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) >>>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) >>>>> + >>>>> + . = 0x100 /* Do exactly as SLOF does */ >>>>> + >>>>> +ENTRY(_start) >>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */ >>>>> +# mtmsrd %r31,0 >>>>> + LOAD32(2, __toc_start) >>>>> + b entry_c >>>>> + >>>>> +ENTRY(_prom_entry) >>>>> + LOAD32(2, __toc_start) >>>>> + stwu %r1,-112(%r1) >>>>> + stw %r31,104(%r1) >>>>> + mflr %r31 >>>>> + bl prom_entry >>>>> + nop >>>>> + mtlr %r31 >>>>> + ld %r31,104(%r1) >>>> >>>> It's getting there, now I see the first client call from the guest boot >>>> code but then it crashes on this ld opcode which apparently is 64 bit >>>> only: >>> >>> Oh right. >>> >>> >>>> Hopefully this is the last such opcode left before I can really test >>>> this. >>> >>> Make it lwz, and test it? >> >> Yes, figured that out too after sending this message. Replacing with lwz >> works but I wonder that now you have stwu lwz do the stack offsets need >> adjusting too or you just waste 4 bytes now? > > Well, this assumes the 64bit client and that ABI. I think ideally the > firmware is supposed to use its own stack but I did not bother here. 
I do not > know 32bit ABI at all so say whether the existing code should just work or > not :-/ It seems to work so that's OK, just thought if the firmware is 32 bit it does not need 64 bit values on stack but if that's also potentially used by a 64 bit kernel then it may be better to keep it that way to avoid confusion. With the 64 bit opcodes replaced it seems to work on pegasos2 and the guest can call CI functions and get a reply so maybe it's just a few wasted bytes that's not a big deal. >> With lwz here I found no further 64 bit opcodes and the guest boot code >> could walk the device tree. It failed later but I think that's because I'll >> need to fill more info about the machine in the device tree. I'll >> experiment with that but it looks like it could work at least for MorphOS. >> I'll have to try Linux too. > > There are plenty of tracepoints, enable them all. I'm running with -trace enable="vof*" but it does not give me too much info as a lot of calls (such as peer, child, etc.) don't log anything other than there was a hypercall so only get info about opening paths and querying some props. The MorphOS boot.img just walks the device tree gathering some data about the machine then calls quiesce and boot into the OS that later tries to use the gathered info at which point it crashes without any logs if some info is not as expected. This does not make it easy to debug but I think once I fill the device tree enough with all needed info it should work. Currently I'm missing info about PCI devices that it may need. >>>> Do you have some info on how the stdout works in VOF? I think I'll need >>>> that to test with Linux and get output but I'm not sure what's needed on >>>> the machine side. >>> >>> VOF opens stsout and stores the ihandle (in fdt) which the client >>> (==kernel) uses for writing. 
>>> To make it work properly, you need to hook up that instance to a device
>>> backend similar to what I have for spapr-vty:
>>>
>>> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd39cc5dad5b6385965d
>>>
>>> This is not a part of this patch as I'm trying to keep things simpler and
>>> accessing backends from VOF is still unsettled. But there is a workaround
>>> which is trace_vof_write, I use this. Thanks,
>>
>> The above patch is about stdin but stdout seems to be added by the current
>> vof patch. What is spapr-vty?
>
> It is pseries' paravirtual serial device, pegasos does not have it.
>
>> I don't think I have something similar in pegasos2 where I just have a
>> normal serial port created by ISASuperIO in the vt8231 model.
>
> Correct.
>
>> Can I use that backend somehow or have to create some other serial device
>> to connect to stdout?
>> Does trace_vof_write work for stuff output by the guest?
>> I guess that's only for things printed by VOF itself
>
> VOF itself does not print anything in this patch.

However, stdout seems to be needed for Linux: the first thing it does is
get /chosen/stdout, and it calls exit if that returns nothing. So I'll
need this at least for Linux. (I think MorphOS may also query it to print
a banner or some messages, but I'm not sure it needs it; at least it does
not abort right away if it is not found.)

>> but to see Linux output do I need a stdout in VOF or it will just open the
>> serial with its own driver and use that?
>> So I'm not sure what the stdout parts in the current vof patch do and
>> if I need that for anything. I'll try to experiment with it some more but
>> fixing the ld and Kconfig seems to be enough to get it work for me.
>
> So for the client to print something, /chosen/stdout needs to have a valid
> ihandle.
> The only way to get a valid ihandle is having a valid phandle which
> vof_client_open() can open.
> A valid phandle is a phandle of any node in the device tree.
> On spapr we pick some spapr-vty, open it and store it in /chosen/stdout.
>
> From this point output from the client can be seen via a tracepoint.
>
> Now if we want proper output without tracepoints, we need to hook it up
> with some chardev backend (not a device such as vt8231 or spapr-vty, but
> a backend).

I don't know much about it, but devices are also connected to some
backend, so is it possible to use the same backend for VOF as the one used
for the normal serial port? I would need a way to find that backend and
connect it to VOF, and I'm not sure how to do that yet. Or do I need to
create a separate serial backend and connect that to VOF? I'll try to look
at spapr-vty to see what it does.

> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd3 does this:
> 1. when a phandle is opened, QEMU searches for the DeviceState* for the
>    specific FDT node and gets a chardev from the device.
> 2. when write() is called, QEMU calls qemu_chr_fe_write_all() on the
>    chardev from step 1.
>
> From this point you do not need a tracepoint and the output will appear
> in the console you set up for stdout.
>
> Now if you want input from this console, things get tricky. First, on
> powernv/pseries we only need this for grub, as otherwise the kernel has
> all the drivers needed and will not use the client interface. For grub,
> we need to provide a valid ihandle for /chosen/stdin, which is easy, but
> implementing read() on it is not, as there is no simple
> device-type-independent way of reading from a chardev. I hacked it for
> spapr-vty but other serial devices will need special handling, or we'll
> have to introduce some VOF_SERIAL_READ interface for those, which will
> face opposition :)
>
> Makes sense?

It explains things a bit, but it is still not entirely clear how I can get
something to add as stdout. With the pegasos2 firmware, the firmware
normally puts the serial device there after it initializes and opens it.
Without that firmware we have to do that from QEMU somehow: find the
serial backend used by the serial device within the vt8231 model (or use a
different backend just for this?), then open it and put it in the device
tree. Whether that's correct, and how to do it, is not clear yet.

Regards,
BALATON Zoltan
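For readers following the phandle/ihandle discussion above, here is a minimal sketch of the relationship being described: opening a node (identified by a phandle) yields an ihandle, and a later write on that ihandle is resolved back to the node. This is not the VOF code from the patch. The names `instance_table`, `instance_open` and `instance_to_phandle` and the fixed-size table are assumptions made for illustration only (the cover letter mentions a hash map, which this sketch replaces with a plain array).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INSTANCES 8   /* arbitrary limit chosen for this sketch */

/* One open instance: remembers which phandle the ihandle was opened on. */
struct instance {
    uint32_t phandle;     /* 0 means the slot is free */
};

/* Hypothetical stand-in for the "ihandle -> phandle" map. */
struct instance_table {
    struct instance slots[MAX_INSTANCES];
};

/* Open a node by phandle; returns a non-zero ihandle, or 0 on failure. */
uint32_t instance_open(struct instance_table *t, uint32_t phandle)
{
    assert(t != NULL);
    if (phandle == 0) {
        return 0;         /* not a valid node, so there is nothing to open */
    }
    for (uint32_t i = 0; i < MAX_INSTANCES; i++) {
        if (t->slots[i].phandle == 0) {
            t->slots[i].phandle = phandle;
            return i + 1; /* ihandles start at 1 so that 0 can mean "none" */
        }
    }
    return 0;             /* table is full */
}

/* Resolve an ihandle to its phandle; returns 0 if the ihandle is invalid. */
uint32_t instance_to_phandle(const struct instance_table *t, uint32_t ihandle)
{
    assert(t != NULL);
    if (ihandle == 0 || ihandle > MAX_INSTANCES) {
        return 0;
    }
    return t->slots[ihandle - 1].phandle;
}
```

In this picture, the value stored in /chosen/stdout would be whatever `instance_open()` returned for the console node, and a write handler would call `instance_to_phandle()` to find out which node (and hence which backend) the output belongs to.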
On Sat, 22 May 2021, Alexey Kardashevskiy wrote: > On 22/05/2021 05:57, BALATON Zoltan wrote: >> On Fri, 21 May 2021, BALATON Zoltan wrote: >>> On Fri, 21 May 2021, Alexey Kardashevskiy wrote: >>>> On 21/05/2021 07:59, BALATON Zoltan wrote: >>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>>>>> The PAPR platform describes an OS environment that's presented by >>>>>> a combination of a hypervisor and firmware. The features it specifies >>>>>> require collaboration between the firmware and the hypervisor. >>>>>> >>>>>> Since the beginning, the runtime component of the firmware (RTAS) has >>>>>> been implemented as a 20 byte shim which simply forwards it to >>>>>> a hypercall implemented in qemu. The boot time firmware component is >>>>>> SLOF - but a build that's specific to qemu, and has always needed to be >>>>>> updated in sync with it. Even though we've managed to limit the amount >>>>>> of runtime communication we need between qemu and SLOF, there's some, >>>>>> and it has become increasingly awkward to handle as we've implemented >>>>>> new features. >>>>>> >>>>>> This implements a boot time OF client interface (CI) which is >>>>>> enabled by a new "x-vof" pseries machine option (stands for "Virtual >>>>>> Open >>>>>> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall >>>>>> which implements Open Firmware Client Interface (OF CI). This allows >>>>>> using a smaller stateless firmware which does not have to manage >>>>>> the device tree. >>>>>> >>>>>> The new "vof.bin" firmware image is included with source code under >>>>>> pc-bios/. It also includes RTAS blob. >>>>>> >>>>>> This implements a handful of CI methods just to get -kernel/-initrd >>>>>> working. In particular, this implements the device tree fetching and >>>>>> simple memory allocator - "claim" (an OF CI memory allocator) and >>>>>> updates >>>>>> "/memory@0/available" to report the client about available memory. 
>>>>>> >>>>>> This implements changing some device tree properties which we know how >>>>>> to deal with, the rest is ignored. To allow changes, this skips >>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some room for >>>>>> appending. >>>>>> >>>>>> In absence of SLOF, this assigns phandles to device tree nodes to make >>>>>> device tree traversing work. >>>>>> >>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree. >>>>>> >>>>>> This adds basic instances support which are managed by a hash map >>>>>> ihandle -> [phandle]. >>>>>> >>>>>> Before the guest started, the used memory is: >>>>>> 0..e60 - the initial firmware >>>>>> 8000..10000 - stack >>>>>> 400000.. - kernel >>>>>> 3ea0000.. - initramdisk >>>>>> >>>>>> This OF CI does not implement "interpret". >>>>>> >>>>>> Unlike SLOF, this does not format uninitialized nvram. Instead, this >>>>>> includes a disk image with pre-formatted nvram. >>>>>> >>>>>> With this basic support, this can only boot into kernel directly. >>>>>> However this is just enough for the petitboot kernel and initradmdisk >>>>>> to >>>>>> boot from any possible source. Note this requires reasonably recent >>>>>> guest >>>>>> kernel with: >>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 >>>>>> The immediate benefit is much faster booting time which especially >>>>>> crucial with fully emulated early CPU bring up environments. Also this >>>>>> may come handy when/if GRUB-in-the-userspace sees light of the day. >>>>>> >>>>>> This separates VOF and sPAPR in a hope that VOF bits may be reused by >>>>>> other POWERPC boards which do not support pSeries. >>>>>> >>>>>> This is coded in assumption that later on we might be adding support >>>>>> for >>>>>> booting from QEMU backends (blockdev is the first candidate) without >>>>>> devices/drivers in between as OF1275 does not require that and >>>>>> it is quite easy to so. 
>>>>>> >>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>> --- >>>>>> >>>>>> The example command line is: >>>>>> >>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \ >>>>>> -nodefaults \ >>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \ >>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \ >>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \ >>>>>> -nographic \ >>>>>> -vga none \ >>>>>> -enable-kvm \ >>>>>> -m 8G \ >>>>>> -machine >>>>>> pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off >>>>>> \ >>>>>> -kernel pbuild/kernel-le-guest/vmlinux \ >>>>>> -initrd pb/rootfs.cpio.xz \ >>>>>> -drive >>>>>> id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw >>>>>> \ >>>>>> -global spapr-nvram.drive=DRIVE0 \ >>>>>> -snapshot \ >>>>>> -smp 8,threads=8 \ >>>>>> -L /home/aik/t/qemu-ppc64-bios/ \ >>>>>> -trace events=qemu_trace_events \ >>>>>> -d guest_errors \ >>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \ >>>>>> -mon chardev=SOCKET0,mode=control >>>>>> >>>>>> --- >>>>>> Changes: >>>>>> v20: >>>>>> * compile vof.bin with -mcpu=power4 for better compatibility >>>>>> * s/std/stw/ in entry.S to make it work on ppc32 >>>>>> * fixed dt_available property to support both 32 and 64bit >>>>>> * shuffled prom_args handling code >>>>>> * do not enforce 32bit in MSR (again, to support 32bit platforms) >>>>>> >>>>> >>>>> [...] 
>>>>> >>>>>> diff --git a/default-configs/devices/ppc64-softmmu.mak >>>>>> b/default-configs/devices/ppc64-softmmu.mak >>>>>> index ae0841fa3a18..9fb201dfacfa 100644 >>>>>> --- a/default-configs/devices/ppc64-softmmu.mak >>>>>> +++ b/default-configs/devices/ppc64-softmmu.mak >>>>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y >>>>>> # For pSeries >>>>>> CONFIG_PSERIES=y >>>>>> CONFIG_NVDIMM=y >>>>>> +CONFIG_VOF=y >>>>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig >>>>>> index e51e0e5e5ac6..964510dfc73d 100644 >>>>>> --- a/hw/ppc/Kconfig >>>>>> +++ b/hw/ppc/Kconfig >>>>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC >>>>>> >>>>>> config FDT_PPC >>>>>> bool >>>>>> + >>>>>> +config VOF >>>>>> + bool >>>>> >>>>> I think you should just add "select VOF" to config PSERIES section in >>>>> Kconfig instead of adding it to >>>>> default-configs/devices/ppc64-softmmu.mak. >>>> >>>> oh well, can do that too. >>> >>> I think most config options should be selected by KConfig and the default >>> config should only include machines, otherwise VOF would be added also >>> when you don't compile PSERIES or PEGASOS2. With select in Kconfig it will >>> be added when needed. That's why it's better to use select in this case. >>> >>>>> That should do it, it works in my updated pegasos2 patch: >>>>> >>>>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61 >>>>> [...] 
>>>>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S >>>>>> new file mode 100644 >>>>>> index 000000000000..569688714c91 >>>>>> --- /dev/null >>>>>> +++ b/pc-bios/vof/entry.S >>>>>> @@ -0,0 +1,51 @@ >>>>>> +#define LOAD32(rn, name) \ >>>>>> + lis rn,name##@h; \ >>>>>> + ori rn,rn,name##@l >>>>>> + >>>>>> +#define ENTRY(func_name) \ >>>>>> + .text; \ >>>>>> + .align 2; \ >>>>>> + .globl .func_name; \ >>>>>> + .func_name: \ >>>>>> + .globl func_name; \ >>>>>> + func_name: >>>>>> + >>>>>> +#define KVMPPC_HCALL_BASE 0xf000 >>>>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) >>>>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) >>>>>> + >>>>>> + . = 0x100 /* Do exactly as SLOF does */ >>>>>> + >>>>>> +ENTRY(_start) >>>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */ >>>>>> +# mtmsrd %r31,0 >>>>>> + LOAD32(2, __toc_start) >>>>>> + b entry_c >>>>>> + >>>>>> +ENTRY(_prom_entry) >>>>>> + LOAD32(2, __toc_start) >>>>>> + stwu %r1,-112(%r1) >>>>>> + stw %r31,104(%r1) >>>>>> + mflr %r31 >>>>>> + bl prom_entry >>>>>> + nop >>>>>> + mtlr %r31 >>>>>> + ld %r31,104(%r1) >>>>> >>>>> It's getting there, now I see the first client call from the guest boot >>>>> code but then it crashes on this ld opcode which apparently is 64 bit >>>>> only: >>>> >>>> Oh right. >>>> >>>> >>>>> Hopefully this is the last such opcode left before I can really test >>>>> this. >>>> >>>> Make it lwz, and test it? >>> >>> Yes, figured that out too after sending this message. Replacing with lwz >>> works but I wonder that now you have stwu lwz do the stack offsets need >>> adjusting too or you just waste 4 bytes now? With lwz here I found no >>> further 64 bit opcodes and the guest boot code could walk the device tree. >>> It failed later but I think that's because I'll need to fill more info >>> about the machine in the device tree. I'll experiment with that but it >>> looks like it could work at least for MorphOS. I'll have to try Linux too. 
>> >> I was trying to get a linux kernel from a debian powerpc iso to do >> something (debian before 10.0 has Pegasos support) but I've run into the >> problem that the kernel is loaded at 0x400000 but the start address is at >> some offset from that. How do I set qemu,boot-kernel in this case? > > > The pseries kernel can work from any location (and it relocates itself to 0 > at some point) even though it is linked at c000.0000.0000.0000, and there is > no start address offset: > > === >> objdump -D ~/pbuild/kernel-le/vmlinux > /home/aik/pbuild/kernel-le/vmlinux: file format elf64-powerpcle > > > Disassembly of section .head.text: > > c000000000000000 <__start>: > c000000000000000: 48 00 00 08 tdi 0,r0,72 > c000000000000004: 2c 00 00 48 b c000000000000030 > <__start+0x30> > ... > === > > Not sure about pegasos2 kernels (or any ppc32 really), sorry. The kernel from Debian 10.0 powerpc used on pegasos looks like this: vmlinuz-chrp.initrd: file format elf32-powerpc vmlinuz-chrp.initrd architecture: powerpc:common, flags 0x00000112: EXEC_P, HAS_SYMS, D_PAGED start address 0x004002fc Program Header: LOAD off 0x00010000 vaddr 0x00400000 paddr 0x00400000 align 2**16 filesz 0x0127b72a memsz 0x0127d5d8 flags rwx STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4 filesz 0x00000000 memsz 0x00000000 flags rwx NOTE off 0x000000b4 vaddr 0x00000000 paddr 0x00000000 align 2**0 filesz 0x0000002c memsz 0x00000000 flags --- NOTE off 0x000000e0 vaddr 0x00000000 paddr 0x00000000 align 2**0 filesz 0x0000002c memsz 0x00000000 flags --- Sections: Idx Name Size VMA LMA File off Algn 0 .text 00008588 00400000 00400000 00010000 2**2 CONTENTS, ALLOC, LOAD, READONLY, CODE 1 .text.unlikely 00000078 00408588 00408588 00018588 2**2 CONTENTS, ALLOC, LOAD, READONLY, CODE 2 .data 00001bec 00409000 00409000 00019000 2**2 CONTENTS, ALLOC, LOAD, DATA 3 .got 0000000c 0040abec 0040abec 0001abec 2**2 CONTENTS, ALLOC, LOAD, DATA 4 __builtin_cmdline 00000800 0040abf8 0040abf8 0001abf8 2**2 
CONTENTS, ALLOC, LOAD, DATA 5 .kernel:vmlinux.strip 0047658e 0040c000 0040c000 0001c000 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 6 .kernel:initrd 00df872a 00883000 00883000 00493000 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 7 .bss 000015d8 0167c000 0167c000 0128b72a 2**2 ALLOC 8 .debug_info 0000e7fd 00000000 00000000 0128b72a 2**0 CONTENTS, READONLY, DEBUGGING 9 .debug_abbrev 00002a4f 00000000 00000000 01299f27 2**0 CONTENTS, READONLY, DEBUGGING 10 .debug_loc 00009df1 00000000 00000000 0129c976 2**0 CONTENTS, READONLY, DEBUGGING 11 .debug_aranges 00000250 00000000 00000000 012a6767 2**0 CONTENTS, READONLY, DEBUGGING 12 .debug_line 000026b8 00000000 00000000 012a69b7 2**0 CONTENTS, READONLY, DEBUGGING 13 .debug_str 00001d9c 00000000 00000000 012a906f 2**0 CONTENTS, READONLY, DEBUGGING 14 .comment 0000001d 00000000 00000000 012aae0b 2**0 CONTENTS, READONLY 15 .gnu.attributes 00000010 00000000 00000000 012aae28 2**0 CONTENTS, READONLY 16 .debug_frame 00001c88 00000000 00000000 012aae38 2**2 CONTENTS, READONLY, DEBUGGING 17 .debug_ranges 00000740 00000000 00000000 012acac0 2**0 CONTENTS, READONLY, DEBUGGING It even seems to have the initrd embedded in it. If I just use 0x400000 as start address it does not work, has to jump to the start address for it to start correctly. >> Because when I set it to the address/size where the kernel is loaded it >> jumps to the beginnig not the correct start address. If I set the address >> to the start address then size will be wrong so I don't know how to set >> qemu,boot-kernel in this case or is there another property to tell the >> start address? >> (Vof does not seem to check any other property and seems to assume the >> entry point is the same as the load address but for this linux kernel it's >> not.) > > I guess if you really need an offset, you'll have to add a new property > ("qemu,boot-kernel-start"?) and look for it in the firmware. 
> Or, say, put it in gpr5 in your version of spapr_cpu_set_entry_state()
> and make boot_from_memory() use it.

Either way would work, but I don't want to recompile vof.bin, so if you
implement either of these in the next version I can use that. For now I've
just set the kernel address to the start address and decreased the size a
bit. The memory for the kernel is still claimed correctly when it's
loaded, so unless something relies on the size in qemu,boot-kernel it does
not matter. This way the kernel starts, but it only gets as far as finding
no /chosen/stdout and exits there, so I can't test further until I resolve
that.

Regards,
BALATON Zoltan
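To make the load address versus entry point difference concrete, here is a small sketch that reads the entry point from a big-endian ELF32 header and computes its offset from the load address. It is only an illustration and not part of VOF or QEMU; the helper names `read_be32`, `elf32be_entry` and `entry_offset` are made up for this example. The only facts relied on are the ELF32 layout (e_entry is at byte offset 24) and the values from the objdump output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value from a byte buffer. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the e_entry field of a big-endian ELF32 header, or 0 if the
 * buffer does not look like one. In ELF32, e_entry is at byte offset 24.
 */
uint32_t elf32be_entry(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < 28) {
        return 0;                       /* too short to contain e_entry */
    }
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F') {
        return 0;                       /* bad magic */
    }
    if (hdr[4] != 1 /* ELFCLASS32 */ || hdr[5] != 2 /* ELFDATA2MSB */) {
        return 0;                       /* not 32-bit big-endian */
    }
    return read_be32(hdr + 24);
}

/* Offset of the entry point from the address the image was loaded at. */
uint32_t entry_offset(uint32_t entry, uint32_t load_addr)
{
    return entry - load_addr;
}
```

With the Debian kernel shown above (start address 0x004002fc, LOAD segment at 0x00400000) the offset is 0x2fc, which is the value a separate "start" property or register would have to carry if the load address and size are kept as they are.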
On Sat, 22 May 2021, BALATON Zoltan wrote: > On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >> VOF itself does not prints anything in this patch. > > However it seems to be needed for linux as the first thing it does seems to > be getting /chosen/stdout and calls exit if it returns nothing. So I'll need > this at least for linux. (I think MorphOS may also query it to print a banner > or some messages but not sure it needs it, at least it does not abort right > away if not found.) > >>> but to see Linux output do I need a stdout in VOF or it will just open the >>> serial with its own driver and use that? >>> So I'm not sure what's the stdout parts in the current vof patch does and >>> if I need that for anything. I'll try to experiment with it some more but >>> fixing the ld and Kconfig seems to be enough to get it work for me. >> >> So for the client to print something, /chosen/stdout needs to have a valid >> ihandle. >> The only way to get a valid ihandle is having a valid phandle which >> vof_client_open() can open. >> A valid phandle is a phandle of any node in the device tree. On spapr we >> pick some spapr-vty, open it and store in /chosen/stdout. >> >> From this point output from the client can be seen via a tracepoint. I've got it now. Looking at the original firmware device tree dump: https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt I see that /chosen/stdout points to "screen" which is an alias to /bootconsole. Just adding an empty /bootconsole node in the device tree and vof_client_open_store() that as /chosen/stdout works and I get output via vof_write traces so this is enough for now to test Linux. Properly connecting a serial backend can thus be postponed. So with this the Linux kernel does not abort on the first device tree access but starts to decompress itself then the embedded initrd and crashes at calling setprop: [...] 
vof_client_handle: setprop

Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  0x0000000000000000 in  ()
#1  0x0000555555a5c2bf in vof_setprop
    (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>,
     pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800)
    at ../hw/ppc/vof.c:308
#2  0x0000555555a5c2bf in vof_client_handle
    (nrets=1, rets=0x7ffff48e93f0, nargs=4, args=0x7ffff48e93c0,
     service=0x7ffff48e9460 "setprop", vof=0x7ffff48e9420,
     fdt=0x7fff8aaff010, ms=0x5555564f8800)
    at ../hw/ppc/vof.c:842
#3  0x0000555555a5c2bf in vof_client_call
    (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0,
     fdt=fdt@entry=0x7fff8aaff010, args_real=args_real@entry=23580472)
    at ../hw/ppc/vof.c:935

It looks like it's trying to set /chosen/linux,initrd-start:

(gdb) up
#1  0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4,
    valaddr=<optimized out>, pname=<optimized out>, nodeph=8,
    fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308
308         if (!vmc->setprop(ms, nodepath, propname, val, vallen)) {
(gdb) p nodepath
$1 = "/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times>
(gdb) p propname
$2 = "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg"
(gdb) p val
$3 = <optimized out>

I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. I can copy
spapr_vof_setprop() but some explanation of why that's needed might help.
Could I just do fdt_setprop in my callback, as vof_setprop() would do
without a machine callback, or is there some special handling needed for
these properties?

Regards,
BALATON Zoltan
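Frame #0 at address 0 in the backtrace above is what a call through a NULL function pointer looks like, which fits a machine that does not provide a setprop hook. Below is a minimal sketch of guarding such an optional hook. It is not the code from vof.c: `struct machine_if`, `setprop_allowed` and `reject_all` are hypothetical names, and treating a missing hook as "allow the change" is an assumption of this sketch rather than something the patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine interface with an optional hook. */
struct machine_if {
    /* Returns true if the machine accepts the property change. */
    bool (*setprop)(const char *path, const char *name,
                    const void *val, int vallen);
};

/* Example hook that refuses every change. */
bool reject_all(const char *path, const char *name,
                const void *val, int vallen)
{
    (void)path; (void)name; (void)val; (void)vallen;
    return false;
}

/*
 * Decide whether a property may be written to the tree.
 * Assumption of this sketch: no interface or no hook means "allow".
 */
bool setprop_allowed(const struct machine_if *mif, const char *path,
                     const char *name, const void *val, int vallen)
{
    assert(path != NULL && name != NULL);
    if (mif == NULL || mif->setprop == NULL) {
        return true;  /* nothing to ask, so never call through NULL */
    }
    return mif->setprop(path, name, val, vallen);
}
```

Whether the right default is to allow or to reject depends on what the machine needs to do with properties such as linux,initrd-start; the sketch only shows that the check has to happen before the indirect call.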
On Sat, 22 May 2021, BALATON Zoltan wrote: > On Sat, 22 May 2021, BALATON Zoltan wrote: >> On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >>> VOF itself does not prints anything in this patch. >> >> However it seems to be needed for linux as the first thing it does seems to >> be getting /chosen/stdout and calls exit if it returns nothing. So I'll >> need this at least for linux. (I think MorphOS may also query it to print a >> banner or some messages but not sure it needs it, at least it does not >> abort right away if not found.) >> >>>> but to see Linux output do I need a stdout in VOF or it will just open >>>> the serial with its own driver and use that? >>>> So I'm not sure what's the stdout parts in the current vof patch does and >>>> if I need that for anything. I'll try to experiment with it some more but >>>> fixing the ld and Kconfig seems to be enough to get it work for me. >>> >>> So for the client to print something, /chosen/stdout needs to have a valid >>> ihandle. >>> The only way to get a valid ihandle is having a valid phandle which >>> vof_client_open() can open. >>> A valid phandle is a phandle of any node in the device tree. On spapr we >>> pick some spapr-vty, open it and store in /chosen/stdout. >>> >>> From this point output from the client can be seen via a tracepoint. > > I've got it now. Looking at the original firmware device tree dump: > > https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt > > I see that /chosen/stdout points to "screen" which is an alias to > /bootconsole. Just adding an empty /bootconsole node in the device tree and > vof_client_open_store() that as /chosen/stdout works and I get output via > vof_write traces so this is enough for now to test Linux. Properly connecting > a serial backend can thus be postponed. Using /failsafe instead of /bootconsole is even better because Linux then adds console=ttyS0 to the bootargs by default as it knows that's a serial port. 
> So with this the Linux kernel does not abort on the first device tree access > but starts to decompress itself then the embedded initrd and crashes at > calling setprop: > > [...] > vof_client_handle: setprop > > Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault. > (gdb) bt > #0 0x0000000000000000 in () > #1 0x0000555555a5c2bf in vof_setprop > (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, pname=<optimized > out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800) > at ../hw/ppc/vof.c:308 > #2 0x0000555555a5c2bf in vof_client_handle > (nrets=1, rets=0x7ffff48e93f0, nargs=4, args=0x7ffff48e93c0, > service=0x7ffff48e9460 "setprop", > vof=0x7ffff48e9420, fdt=0x7fff8aaff010, ms=0x5555564f8800) at > ../hw/ppc/vof.c:842 > #3 0x0000555555a5c2bf in vof_client_call > (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0, > fdt=fdt@entry=0x7fff8aaff010, args_real=args_real@entry=23580472) > at ../hw/ppc/vof.c:935 > > loooks like it's trying to set /chosen/linux,initrd-start: > > (gdb) up > #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4, > valaddr=<optimized out>, pname=<optimized out>, nodeph=8, > fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308 > 308 if (!vmc->setprop(ms, nodepath, propname, val, vallen)) { > (gdb) p nodepath > $1 = "/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times> > (gdb) p propname > $2 = > "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg" > (gdb) p val > $3 = <optimized out> > > I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. I can copy > spapr_vof_setprop() but some explanation on why that's needed might help. > Ciould I just do fdt_setprop in my callback as vof_setprop() would do without > a machine callback or is there some special handling needed for these > properties? Just returning true from the setprop callback of the VofMachineIfClass for now to see what it would do and then it gets to all the way of calling quiesce. 
Unfortunately it then tries to call prom_printf on Pegasos2, as seen here:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.233#n3261

This does not work because I have to shut down vhyp at quiesce, otherwise
it trips an assert on writing sdr1 (and may also interfere with the
guest's usage of syscalls). So I need a way to not generate an exception
if the guest calls back into OF after quiesce. A hacky solution is to
patch out the sc 1, or patch the _prom_entry point to just return after
quiesce, but maybe a better way is needed, such as a switch in vof.bin
that it checks before doing a syscall. Other than this problem it seems to
work for the most part, so making _prom_entry check some global value that
I can set from quiesce, to stop it doing syscalls and just return, may be
the simplest way to avoid this crash in Linux without needing a special
version of vof for pegasos2. (MorphOS does not seem to call OF after
quiesce, which seems the safer thing to do anyway; I don't know why Linux
does that. It could just print that one line before quiesce and then it
would work, but unfortunately that's not what they did.)

Regards,
BALATON Zoltan
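The "global value checked before the syscall" idea can be sketched in C as follows. This only illustrates the control flow and is not code from vof.bin: `ci_state`, `ci_entry`, `ci_quiesce`, `forward_to_hypervisor` and the `CI_FAILED` return value are all assumptions made for the example. In the real firmware the flag would live in guest memory and the forwarding step would be the sc 1 trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return value for client calls made after quiesce. */
#define CI_FAILED (-1)

struct ci_state {
    bool quiesced;   /* set once the client has called "quiesce" */
    int forwarded;   /* counts calls that reached the hypervisor (demo only) */
};

/* Stand-in for the real trap into the hypervisor. */
int forward_to_hypervisor(struct ci_state *s)
{
    s->forwarded++;
    return 0;
}

/* Mark the interface as shut down; later calls are no longer forwarded. */
void ci_quiesce(struct ci_state *s)
{
    assert(s != NULL);
    s->quiesced = true;
}

/* Entry point: check the flag first and return early once quiesced. */
int ci_entry(struct ci_state *s)
{
    assert(s != NULL);
    if (s->quiesced) {
        return CI_FAILED;   /* do not trap, just report failure */
    }
    return forward_to_hypervisor(s);
}
```

With this shape, a late prom_printf from the kernel would get a failure return instead of raising an exception, at the cost of one extra load and compare on every client call.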
On 22/05/2021 23:01, BALATON Zoltan wrote: > On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >> On 21/05/2021 19:05, BALATON Zoltan wrote: >>> On Fri, 21 May 2021, Alexey Kardashevskiy wrote: >>>> On 21/05/2021 07:59, BALATON Zoltan wrote: >>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>>>>> The PAPR platform describes an OS environment that's presented by >>>>>> a combination of a hypervisor and firmware. The features it specifies >>>>>> require collaboration between the firmware and the hypervisor. >>>>>> >>>>>> Since the beginning, the runtime component of the firmware (RTAS) has >>>>>> been implemented as a 20 byte shim which simply forwards it to >>>>>> a hypercall implemented in qemu. The boot time firmware component is >>>>>> SLOF - but a build that's specific to qemu, and has always needed >>>>>> to be >>>>>> updated in sync with it. Even though we've managed to limit the >>>>>> amount >>>>>> of runtime communication we need between qemu and SLOF, there's some, >>>>>> and it has become increasingly awkward to handle as we've implemented >>>>>> new features. >>>>>> >>>>>> This implements a boot time OF client interface (CI) which is >>>>>> enabled by a new "x-vof" pseries machine option (stands for >>>>>> "Virtual Open >>>>>> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall >>>>>> which implements Open Firmware Client Interface (OF CI). This allows >>>>>> using a smaller stateless firmware which does not have to manage >>>>>> the device tree. >>>>>> >>>>>> The new "vof.bin" firmware image is included with source code under >>>>>> pc-bios/. It also includes RTAS blob. >>>>>> >>>>>> This implements a handful of CI methods just to get -kernel/-initrd >>>>>> working. In particular, this implements the device tree fetching and >>>>>> simple memory allocator - "claim" (an OF CI memory allocator) and >>>>>> updates >>>>>> "/memory@0/available" to report the client about available memory. 
>>>>>> >>>>>> This implements changing some device tree properties which we know >>>>>> how >>>>>> to deal with, the rest is ignored. To allow changes, this skips >>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some room for >>>>>> appending. >>>>>> >>>>>> In absence of SLOF, this assigns phandles to device tree nodes to >>>>>> make >>>>>> device tree traversing work. >>>>>> >>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree. >>>>>> >>>>>> This adds basic instances support which are managed by a hash map >>>>>> ihandle -> [phandle]. >>>>>> >>>>>> Before the guest started, the used memory is: >>>>>> 0..e60 - the initial firmware >>>>>> 8000..10000 - stack >>>>>> 400000.. - kernel >>>>>> 3ea0000.. - initramdisk >>>>>> >>>>>> This OF CI does not implement "interpret". >>>>>> >>>>>> Unlike SLOF, this does not format uninitialized nvram. Instead, this >>>>>> includes a disk image with pre-formatted nvram. >>>>>> >>>>>> With this basic support, this can only boot into kernel directly. >>>>>> However this is just enough for the petitboot kernel and >>>>>> initradmdisk to >>>>>> boot from any possible source. Note this requires reasonably >>>>>> recent guest >>>>>> kernel with: >>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 >>>>>> The immediate benefit is much faster booting time which especially >>>>>> crucial with fully emulated early CPU bring up environments. Also >>>>>> this >>>>>> may come handy when/if GRUB-in-the-userspace sees light of the day. >>>>>> >>>>>> This separates VOF and sPAPR in a hope that VOF bits may be reused by >>>>>> other POWERPC boards which do not support pSeries. >>>>>> >>>>>> This is coded in assumption that later on we might be adding >>>>>> support for >>>>>> booting from QEMU backends (blockdev is the first candidate) without >>>>>> devices/drivers in between as OF1275 does not require that and >>>>>> it is quite easy to so. 
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>
>>>>>> The example command line is:
>>>>>>
>>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
>>>>>> -nodefaults \
>>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>>>>> -nographic \
>>>>>> -vga none \
>>>>>> -enable-kvm \
>>>>>> -m 8G \
>>>>>> -machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
>>>>>> -kernel pbuild/kernel-le-guest/vmlinux \
>>>>>> -initrd pb/rootfs.cpio.xz \
>>>>>> -drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
>>>>>> -global spapr-nvram.drive=DRIVE0 \
>>>>>> -snapshot \
>>>>>> -smp 8,threads=8 \
>>>>>> -L /home/aik/t/qemu-ppc64-bios/ \
>>>>>> -trace events=qemu_trace_events \
>>>>>> -d guest_errors \
>>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
>>>>>> -mon chardev=SOCKET0,mode=control
>>>>>>
>>>>>> ---
>>>>>> Changes:
>>>>>> v20:
>>>>>> * compile vof.bin with -mcpu=power4 for better compatibility
>>>>>> * s/std/stw/ in entry.S to make it work on ppc32
>>>>>> * fixed dt_available property to support both 32 and 64bit
>>>>>> * shuffled prom_args handling code
>>>>>> * do not enforce 32bit in MSR (again, to support 32bit platforms)
>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>>> diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak
>>>>>> index ae0841fa3a18..9fb201dfacfa 100644
>>>>>> --- a/default-configs/devices/ppc64-softmmu.mak
>>>>>> +++ b/default-configs/devices/ppc64-softmmu.mak
>>>>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y
>>>>>> # For pSeries
>>>>>> CONFIG_PSERIES=y
>>>>>> CONFIG_NVDIMM=y
>>>>>> +CONFIG_VOF=y
>>>>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
>>>>>> index e51e0e5e5ac6..964510dfc73d 100644
>>>>>> --- a/hw/ppc/Kconfig
>>>>>> +++ b/hw/ppc/Kconfig
>>>>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC
>>>>>>
>>>>>> config FDT_PPC
>>>>>> bool
>>>>>> +
>>>>>> +config VOF
>>>>>> + bool
>>>>>
>>>>> I think you should just add "select VOF" to config PSERIES section in Kconfig instead of adding it to default-configs/devices/ppc64-softmmu.mak.
>>>>
>>>> oh well, can do that too.
>>>
>>> I think most config options should be selected by Kconfig and the default config should only include machines, otherwise VOF would be added also when you don't compile PSERIES or PEGASOS2. With select in Kconfig it will be added when needed. That's why it's better to use select in this case.
>>>
>>>>> That should do it, it works in my updated pegasos2 patch:
>>>>>
>>>>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61
>>>>> [...]
>>>>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S
>>>>>> new file mode 100644
>>>>>> index 000000000000..569688714c91
>>>>>> --- /dev/null
>>>>>> +++ b/pc-bios/vof/entry.S
>>>>>> @@ -0,0 +1,51 @@
>>>>>> +#define LOAD32(rn, name) \
>>>>>> + lis rn,name##@h; \
>>>>>> + ori rn,rn,name##@l
>>>>>> +
>>>>>> +#define ENTRY(func_name) \
>>>>>> + .text; \
>>>>>> + .align 2; \
>>>>>> + .globl .func_name; \
>>>>>> + .func_name: \
>>>>>> + .globl func_name; \
>>>>>> + func_name:
>>>>>> +
>>>>>> +#define KVMPPC_HCALL_BASE 0xf000
>>>>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0)
>>>>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5)
>>>>>> +
>>>>>> + . = 0x100 /* Do exactly as SLOF does */
>>>>>> +
>>>>>> +ENTRY(_start)
>>>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */
>>>>>> +# mtmsrd %r31,0
>>>>>> + LOAD32(2, __toc_start)
>>>>>> + b entry_c
>>>>>> +
>>>>>> +ENTRY(_prom_entry)
>>>>>> + LOAD32(2, __toc_start)
>>>>>> + stwu %r1,-112(%r1)
>>>>>> + stw %r31,104(%r1)
>>>>>> + mflr %r31
>>>>>> + bl prom_entry
>>>>>> + nop
>>>>>> + mtlr %r31
>>>>>> + ld %r31,104(%r1)
>>>>>
>>>>> It's getting there, now I see the first client call from the guest boot code but then it crashes on this ld opcode which apparently is 64 bit only:
>>>>
>>>> Oh right.
>>>>
>>>>> Hopefully this is the last such opcode left before I can really test this.
>>>>
>>>> Make it lwz, and test it?
>>>
>>> Yes, figured that out too after sending this message. Replacing with lwz works but I wonder, now that you have stwu and lwz, whether the stack offsets need adjusting too or you just waste 4 bytes now?
>>
>> Well, this assumes the 64bit client and that ABI. I think ideally the firmware is supposed to use its own stack but I did not bother here.
>> I do not know 32bit ABI at all, so I cannot say whether the existing code should just work or not :-/
>
> It seems to work so that's OK, I just thought that if the firmware is 32 bit it does not need 64 bit values on the stack, but if that's also potentially used by a 64 bit kernel then it may be better to keep it that way to avoid confusion. With the 64 bit opcodes replaced it seems to work on pegasos2 and the guest can call CI functions and get a reply, so maybe it's just a few wasted bytes, which is not a big deal.
>
>>> With lwz here I found no further 64 bit opcodes and the guest boot code could walk the device tree. It failed later but I think that's because I'll need to fill more info about the machine in the device tree. I'll experiment with that but it looks like it could work at least for MorphOS. I'll have to try Linux too.
>>
>> There are plenty of tracepoints, enable them all.
>
> I'm running with -trace enable="vof*" but it does not give me too much info as a lot of calls (such as peer, child, etc.) don't log anything other than that there was a hypercall, so I only get info about opening paths and querying some props. The MorphOS boot.img just walks the device tree gathering some data about the machine, then calls quiesce and boots into the OS, which later tries to use the gathered info, at which point it crashes without any logs if some info is not as expected. This does not make it easy to debug but I think once I fill the device tree with all the needed info it should work. Currently I'm missing info about PCI devices that it may need.

One thing to note about PCI is that normally I think the client expects the firmware to do PCI probing, and SLOF does it. But VOF does not, and Linux scans the PCI bus(es) itself. Might be a problem for your kernel.

>>>>> Do you have some info on how the stdout works in VOF? I think I'll need that to test with Linux and get output but I'm not sure what's needed on the machine side.
>>>>
>>>> VOF opens stdout and stores the ihandle (in fdt) which the client (==kernel) uses for writing. To make it work properly, you need to hook up that instance to a device backend similar to what I have for spapr-vty:
>>>>
>>>> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd39cc5dad5b6385965d
>>>>
>>>> This is not a part of this patch as I'm trying to keep things simpler and accessing backends from VOF is still unsettled. But there is a workaround which is trace_vof_write, I use this. Thanks,
>>>
>>> The above patch is about stdin but stdout seems to be added by the current vof patch. What is spapr-vty?
>>
>> It is pseries' paravirtual serial device, pegasos does not have it.
>>
>>> I don't think I have something similar in pegasos2 where I just have a normal serial port created by ISASuperIO in the vt8231 model.
>>
>> Correct.
>>
>>> Can I use that backend somehow or do I have to create some other serial device to connect to stdout?
>>> Does trace_vof_write work for stuff output by the guest?
>>> I guess that's only for things printed by VOF itself
>>
>> VOF itself does not print anything in this patch.
>
> However it seems to be needed for linux as the first thing it does seems to be getting /chosen/stdout and calls exit if it returns nothing. So

Right, Linux does but VOF (==vof.bin) does not.

> I'll need this at least for linux. (I think MorphOS may also query it to print a banner or some messages but not sure it needs it, at least it does not abort right away if not found.)

Tracepoints print this :)

>>> but to see Linux output do I need a stdout in VOF or will it just open the serial with its own driver and use that?
>>> So I'm not sure what the stdout parts in the current vof patch do and if I need that for anything. I'll try to experiment with it some more but fixing the ld and Kconfig seems to be enough to get it to work for me.
>>
>> So for the client to print something, /chosen/stdout needs to have a valid ihandle.
>> The only way to get a valid ihandle is having a valid phandle which vof_client_open() can open.
>> A valid phandle is a phandle of any node in the device tree. On spapr we pick some spapr-vty, open it and store it in /chosen/stdout.
>>
>> From this point output from the client can be seen via a tracepoint.
>>
>> Now if we want proper output without tracepoints - we need to hook it up with some chardev backend (not a device such as vt8231 or spapr-vty but a backend).
>
> I don't know much about it but devices are also connected to some backend so is it possible to use the same backend for VOF as used for the normal serial port?

Yes but with this initial patch there is no backend support, you only get tracepoints.

> But I need a way to find that and connect it to VOF and I'm not sure how to do that yet.

Pick some device in the machine reset code (or you can open the root - "/"), resolve its FW (==FDT) path, call vof_client_open_store() on it, it will store the ihandle in the FDT. This will enable stdout and the output can be seen via a tracepoint.

> Or do I need to create a separate serial backend and connect that to VOF? I'll try to look at spapr-vty to see what it does.

No additional devices needed.

>> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd3 does this:
>> 1. when a phandle is open, QEMU will search for DeviceState* for the specific FDT node and get a chardev from the device.
>> 2. when write() is called, QEMU calls qemu_chr_fe_write_all() on the chardev from 1.
>>
>> From this point you do not need a tracepoint and the output will appear in the console you set up for stdout.
>>
>> Now if you want input from this console, things get tricky. First, on powernv/pseries we only need this for grub as otherwise the kernel has all the drivers needed and will not use the client interface. For grub, we need to provide a valid ihandle for /chosen/stdin which is easy, but implementing read() on this is not as there is no simple device-type-independent way of reading from a chardev. I hacked it for spapr-vty but other serial devices will need special handling, or we'll have to introduce some VOF_SERIAL_READ interface for those which will face opposition :)
>>
>> Makes sense?
>
> It explains things a bit but it is still not entirely clear how I can get something to add as a stdout. With the pegasos2 firmware it puts the serial device there normally that it inits and opens. Without that firmware we have to somehow do that from QEMU so find the serial backend used by the serial device within the vt8231 model (or use a different backend just for this?) then open it and put it in the device tree. Whether that's correct and how to do it is not clear yet.

spapr looks through all spapr-vty and picks one with the lowest @reg. You can do a similar thing. Or add a machine option with a serial device id which you want to be the default console. So many options :)
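The "pick the one with the lowest @reg" rule mentioned above can be sketched in a few lines. This is only a self-contained illustration, not QEMU code: struct console_candidate and pick_default_console() are hypothetical names, and a real machine would walk its own device list and then pass the chosen FDT path to vof_client_open_store() as described in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a serial device found in the machine. */
struct console_candidate {
    const char *fdt_path;   /* device tree path of the node */
    uint32_t reg;           /* value of the node's "reg" property */
};

/* Return the candidate with the lowest reg, or NULL if there are none. */
static const struct console_candidate *
pick_default_console(const struct console_candidate *c, size_t n)
{
    const struct console_candidate *best = NULL;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || c[i].reg < best->reg) {
            best = &c[i];
        }
    }
    return best;
}
```

The returned fdt_path is what would be opened and stored in /chosen/stdout; a machine option naming the console device would simply bypass this selection.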
On 23/05/2021 01:02, BALATON Zoltan wrote:
> On Sat, 22 May 2021, BALATON Zoltan wrote:
>> On Sat, 22 May 2021, Alexey Kardashevskiy wrote:
>>> VOF itself does not print anything in this patch.
>>
>> However it seems to be needed for linux as the first thing it does seems to be getting /chosen/stdout and calls exit if it returns nothing. So I'll need this at least for linux. (I think MorphOS may also query it to print a banner or some messages but not sure it needs it, at least it does not abort right away if not found.)
>>
>>>> but to see Linux output do I need a stdout in VOF or will it just open the serial with its own driver and use that?
>>>> So I'm not sure what the stdout parts in the current vof patch do and if I need that for anything. I'll try to experiment with it some more but fixing the ld and Kconfig seems to be enough to get it to work for me.
>>>
>>> So for the client to print something, /chosen/stdout needs to have a valid ihandle.
>>> The only way to get a valid ihandle is having a valid phandle which vof_client_open() can open.
>>> A valid phandle is a phandle of any node in the device tree. On spapr we pick some spapr-vty, open it and store it in /chosen/stdout.
>>>
>>> From this point output from the client can be seen via a tracepoint.
>
> I've got it now. Looking at the original firmware device tree dump:
>
> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt
>
> I see that /chosen/stdout points to "screen" which is an alias to /bootconsole. Just adding an empty /bootconsole node in the device tree and vof_client_open_store() that as /chosen/stdout works and I get output via vof_write traces so this is enough for now to test Linux. Properly connecting a serial backend can thus be postponed.
>
> So with this the Linux kernel does not abort on the first device tree access but starts to decompress itself, then the embedded initrd, and crashes at calling setprop:
>
> [...]
> vof_client_handle: setprop
>
> Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> (gdb) bt
> #0 0x0000000000000000 in ()
> #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308
> #2 0x0000555555a5c2bf in vof_client_handle (nrets=1, rets=0x7ffff48e93f0, nargs=4, args=0x7ffff48e93c0, service=0x7ffff48e9460 "setprop", vof=0x7ffff48e9420, fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:842
> #3 0x0000555555a5c2bf in vof_client_call (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0, fdt=fdt@entry=0x7fff8aaff010, args_real=args_real@entry=23580472) at ../hw/ppc/vof.c:935
>
> looks like it's trying to set /chosen/linux,initrd-start:

It is not terribly clear why it crashed though.

> (gdb) up
> #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308
> 308 if (!vmc->setprop(ms, nodepath, propname, val, vallen)) {
> (gdb) p nodepath
> $1 = "/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times>
> (gdb) p propname
> $2 = "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg"
> (gdb) p val
> $3 = <optimized out>
>
> I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. I can copy spapr_vof_setprop() but some explanation on why that's needed might help. Could I just do fdt_setprop in my callback as vof_setprop() would do without a machine callback or is there some special handling needed for these properties?

The short answer is yes, you do not need TYPE_VOF_MACHINE_IF.

The long answer is that we build the FDT on spapr twice:
1. at the reset time and
2. after "ibm,client-architecture-support" (early in the boot the spapr paravirtual client says what it supports - ISA level, MMU features, etc).

Between 1 and 2 the kernel moves the initrd, and if we do not update QEMU's version of its location, the tree at 2 will have the old values. So for that reason I have TYPE_VOF_MACHINE_IF. You most definitely do not need it.
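To make the reason above concrete: a machine that rebuilds its device tree has to remember what the client wrote, otherwise the rebuilt tree carries stale values. The sketch below is an assumption-laden model and not the code of spapr_vof_setprop(): struct boot_state, decode_be_cells() and record_setprop() are hypothetical names, and the only facts relied on are that FDT property values are big-endian and that the trace above shows a 4-byte write to /chosen/linux,initrd-start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical machine state that must survive a device tree rebuild. */
struct boot_state {
    uint64_t initrd_base;
    uint64_t initrd_end;
};

/* Decode a big-endian property value that is 4 or 8 bytes long. */
static bool decode_be_cells(const void *val, uint32_t vallen, uint64_t *out)
{
    const uint8_t *p = val;
    uint64_t v = 0;

    if (vallen != 4 && vallen != 8) {
        return false;
    }
    for (uint32_t i = 0; i < vallen; i++) {
        v = (v << 8) | p[i];
    }
    *out = v;
    return true;
}

/* Record the properties we care about; return true if one was recognised. */
static bool record_setprop(struct boot_state *s, const char *path,
                           const char *propname,
                           const void *val, uint32_t vallen)
{
    uint64_t v;

    assert(s && path && propname && val);
    if (strcmp(path, "/chosen") != 0 || !decode_be_cells(val, vallen, &v)) {
        return false;
    }
    if (strcmp(propname, "linux,initrd-start") == 0) {
        s->initrd_base = v;
        return true;
    }
    if (strcmp(propname, "linux,initrd-end") == 0) {
        s->initrd_end = v;
        return true;
    }
    return false;
}
```

A machine that builds its tree only once has nothing to keep in sync, which matches the advice above that writing the property straight into the FDT is enough there.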
On 23/05/2021 02:46, BALATON Zoltan wrote:
> On Sat, 22 May 2021, BALATON Zoltan wrote:
>> On Sat, 22 May 2021, BALATON Zoltan wrote:
>>> On Sat, 22 May 2021, Alexey Kardashevskiy wrote:
>>>> VOF itself does not print anything in this patch.
>>>
>>> However it seems to be needed for linux as the first thing it does seems to be getting /chosen/stdout and calls exit if it returns nothing. So I'll need this at least for linux. (I think MorphOS may also query it to print a banner or some messages but not sure it needs it, at least it does not abort right away if not found.)
>>>
>>>>> but to see Linux output do I need a stdout in VOF or will it just open the serial with its own driver and use that?
>>>>> So I'm not sure what the stdout parts in the current vof patch do and if I need that for anything. I'll try to experiment with it some more but fixing the ld and Kconfig seems to be enough to get it to work for me.
>>>>
>>>> So for the client to print something, /chosen/stdout needs to have a valid ihandle.
>>>> The only way to get a valid ihandle is having a valid phandle which vof_client_open() can open.
>>>> A valid phandle is a phandle of any node in the device tree. On spapr we pick some spapr-vty, open it and store it in /chosen/stdout.
>>>>
>>>> From this point output from the client can be seen via a tracepoint.
>>
>> I've got it now. Looking at the original firmware device tree dump:
>>
>> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt
>>
>> I see that /chosen/stdout points to "screen" which is an alias to /bootconsole. Just adding an empty /bootconsole node in the device tree and vof_client_open_store() that as /chosen/stdout works and I get output via vof_write traces so this is enough for now to test Linux. Properly connecting a serial backend can thus be postponed.
>
> Using /failsafe instead of /bootconsole is even better because Linux then adds console=ttyS0 to the bootargs by default as it knows that's a serial port.

Once linux has booted far enough to use whatever is passed in "console=", the client interface is pretty much done with and the output happens without it.

>> So with this the Linux kernel does not abort on the first device tree access but starts to decompress itself, then the embedded initrd, and crashes at calling setprop:
>>
>> [...]
>> vof_client_handle: setprop
>>
>> Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
>> (gdb) bt
>> #0 0x0000000000000000 in ()
>> #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308
>> #2 0x0000555555a5c2bf in vof_client_handle (nrets=1, rets=0x7ffff48e93f0, nargs=4, args=0x7ffff48e93c0, service=0x7ffff48e9460 "setprop", vof=0x7ffff48e9420, fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:842
>> #3 0x0000555555a5c2bf in vof_client_call (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0, fdt=fdt@entry=0x7fff8aaff010, args_real=args_real@entry=23580472) at ../hw/ppc/vof.c:935
>>
>> looks like it's trying to set /chosen/linux,initrd-start:
>>
>> (gdb) up
>> #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308
>> 308 if (!vmc->setprop(ms, nodepath, propname, val, vallen)) {
>> (gdb) p nodepath
>> $1 = "/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times>
>> (gdb) p propname
>> $2 = "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg"
>> (gdb) p val
>> $3 = <optimized out>
>>
>> I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. I can copy spapr_vof_setprop() but some explanation on why that's needed might help. Could I just do fdt_setprop in my callback as vof_setprop() would do without a machine callback or is there some special handling needed for these properties?
>
> Just returning true from the setprop callback of the VofMachineIfClass for now to see what it would do, it then gets all the way to calling quiesce. Unfortunately it then tries to call prom_printf on Pegasos2 as seen here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.233#n3261
>
> which does not work because I have to shut down vhyp at quiesce

What is vhyp and why do you have to shut it down?

> otherwise it trips an assert on writing sdr1 (and may also interfere with the guest's usage of syscalls).

Where is that assert? I am a bit lost here. Nothing in the current VOF should touch any actual device, it prints via tracepoints or (with that additional patch) to a chardev backend.

> So I need a way to not generate an exception if the guest calls back into OF after quiesce. A hacky solution is to patch out the sc 1 or _prom_entry point to just return after quiesce but maybe a better way is needed such as a switch in vof.bin that it checks before doing a syscall. Other than this problem it seems to work for the most part so maybe making the _prom_entry check some global value that I can set from quiesce to stop it doing syscalls and just return would be the simplest way to avoid this crash in Linux and not need a special version of vof for pegasos2. (MorphOS does not seem to call OF after quiesce which seems safer to do anyway, don't know why Linux does that. It could just print that one line before quiesce and then it would work, unfortunately that's not what they did.)

quiesce is supposed to wait until ongoing DMA is finished (or something like that); it was (people say) a request from Apple back then and was never really architected.
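The "global value checked in _prom_entry" idea from the quoted text can be modelled as a small gate in front of the hypercall. This is a sketch of the proposal only, not something vof.bin does in this patch: struct ci_gate, ci_gate_call() and counting_backend() are hypothetical, the backend function stands in for the real hypercall, and the -1 returned after quiesce is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stands in for the hypercall that services a client interface request. */
typedef int32_t (*ci_backend_fn)(const char *service);

struct ci_gate {
    bool quiesced;          /* set once the client has called "quiesce" */
    ci_backend_fn backend;
};

/* Forward a request unless the client has already quiesced the firmware. */
static int32_t ci_gate_call(struct ci_gate *g, const char *service)
{
    int32_t ret;

    assert(g && g->backend && service);
    if (g->quiesced) {
        return -1;          /* just return, never reach the backend */
    }
    ret = g->backend(service);
    if (strcmp(service, "quiesce") == 0) {
        g->quiesced = true;
    }
    return ret;
}

/* Illustration-only backend that counts how often it was reached. */
static int backend_calls;

static int32_t counting_backend(const char *service)
{
    (void)service;
    backend_calls++;
    return 0;
}
```

With this gate a prom_printf issued after quiesce fails quietly instead of trapping, at the price of losing that output.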
On 22/05/2021 23:08, BALATON Zoltan wrote:
> On Sat, 22 May 2021, Alexey Kardashevskiy wrote:
>> On 22/05/2021 05:57, BALATON Zoltan wrote:
>>> On Fri, 21 May 2021, BALATON Zoltan wrote:
>>>> On Fri, 21 May 2021, Alexey Kardashevskiy wrote:
>>>>> On 21/05/2021 07:59, BALATON Zoltan wrote:
>>>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
>>>>>>> [...]
>>>>>
>>>>> Make it lwz, and test it?
>>>>
>>>> Yes, figured that out too after sending this message. Replacing with lwz works but I wonder, now that you have stwu and lwz, whether the stack offsets need adjusting too or you just waste 4 bytes now? With lwz here I found no further 64 bit opcodes and the guest boot code could walk the device tree. It failed later but I think that's because I'll need to fill more info about the machine in the device tree. I'll experiment with that but it looks like it could work at least for MorphOS.
>>>> I'll have to try Linux too.
>>>
>>> I was trying to get a linux kernel from a debian powerpc iso to do something (debian before 10.0 has Pegasos support) but I've run into the problem that the kernel is loaded at 0x400000 but the start address is at some offset from that. How do I set qemu,boot-kernel in this case?
>>
>> The pseries kernel can work from any location (and it relocates itself to 0 at some point) even though it is linked at c000.0000.0000.0000, and there is no start address offset:
>>
>> ===
>> > objdump -D ~/pbuild/kernel-le/vmlinux
>> /home/aik/pbuild/kernel-le/vmlinux: file format elf64-powerpcle
>>
>> Disassembly of section .head.text:
>>
>> c000000000000000 <__start>:
>> c000000000000000: 48 00 00 08 tdi 0,r0,72
>> c000000000000004: 2c 00 00 48 b c000000000000030 <__start+0x30>
>> ...
>> ===
>>
>> Not sure about pegasos2 kernels (or any ppc32 really), sorry.
>
> The kernel from Debian 10.0 powerpc used on pegasos looks like this:
>
> vmlinuz-chrp.initrd: file format elf32-powerpc
> vmlinuz-chrp.initrd
> architecture: powerpc:common, flags 0x00000112:
> EXEC_P, HAS_SYMS, D_PAGED
> start address 0x004002fc
>
> Program Header:
> LOAD off 0x00010000 vaddr 0x00400000 paddr 0x00400000 align 2**16
> filesz 0x0127b72a memsz 0x0127d5d8 flags rwx
> STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
> filesz 0x00000000 memsz 0x00000000 flags rwx
> NOTE off 0x000000b4 vaddr 0x00000000 paddr 0x00000000 align 2**0
> filesz 0x0000002c memsz 0x00000000 flags ---
> NOTE off 0x000000e0 vaddr 0x00000000 paddr 0x00000000 align 2**0
> filesz 0x0000002c memsz 0x00000000 flags ---
>
> Sections:
> Idx Name Size VMA LMA File off Algn
> 0 .text 00008588 00400000 00400000 00010000 2**2
> CONTENTS, ALLOC, LOAD, READONLY, CODE
> 1 .text.unlikely 00000078 00408588 00408588 00018588 2**2
> CONTENTS, ALLOC, LOAD, READONLY, CODE
> 2 .data 00001bec 00409000 00409000 00019000 2**2
> CONTENTS, ALLOC, LOAD, DATA
> 3 .got 0000000c 0040abec 0040abec 0001abec 2**2
> CONTENTS, ALLOC, LOAD, DATA
> 4 __builtin_cmdline 00000800 0040abf8 0040abf8 0001abf8 2**2
> CONTENTS, ALLOC, LOAD, DATA
> 5 .kernel:vmlinux.strip 0047658e 0040c000 0040c000 0001c000 2**0
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 6 .kernel:initrd 00df872a 00883000 00883000 00493000 2**0
> CONTENTS, ALLOC, LOAD, READONLY, DATA
> 7 .bss 000015d8 0167c000 0167c000 0128b72a 2**2
> ALLOC
> 8 .debug_info 0000e7fd 00000000 00000000 0128b72a 2**0
> CONTENTS, READONLY, DEBUGGING
> 9 .debug_abbrev 00002a4f 00000000 00000000 01299f27 2**0
> CONTENTS, READONLY, DEBUGGING
> 10 .debug_loc 00009df1 00000000 00000000 0129c976 2**0
> CONTENTS, READONLY, DEBUGGING
> 11 .debug_aranges 00000250 00000000 00000000 012a6767 2**0
> CONTENTS, READONLY, DEBUGGING
> 12 .debug_line 000026b8 00000000 00000000 012a69b7 2**0
> CONTENTS, READONLY, DEBUGGING
> 13 .debug_str 00001d9c 00000000 00000000 012a906f 2**0
> CONTENTS, READONLY, DEBUGGING
> 14 .comment 0000001d 00000000 00000000 012aae0b 2**0
> CONTENTS, READONLY
> 15 .gnu.attributes 00000010 00000000 00000000 012aae28 2**0
> CONTENTS, READONLY
> 16 .debug_frame 00001c88 00000000 00000000 012aae38 2**2
> CONTENTS, READONLY, DEBUGGING
> 17 .debug_ranges 00000740 00000000 00000000 012acac0 2**0
> CONTENTS, READONLY, DEBUGGING
>
> It even seems to have the initrd embedded in it. If I just use 0x400000 as the start address it does not work, it has to jump to the start address for it to start correctly.
>
>>> Because when I set it to the address/size where the kernel is loaded it jumps to the beginning, not the correct start address. If I set the address to the start address then the size will be wrong so I don't know how to set qemu,boot-kernel in this case, or is there another property to tell the start address?
>>> (Vof does not seem to check any other property and seems to assume the entry point is the same as the load address but for this linux kernel it's not.)
>>
>> I guess if you really need an offset, you'll have to add a new property ("qemu,boot-kernel-start"?) and look for it in the firmware. Or, say, put it in gpr5 in your version of spapr_cpu_set_entry_state() and make boot_from_memory() use it.
>
> Either way would work but I don't want to recompile vof.bin so if you

I really do not want to add features with no user for them; having this added together with pegasos2 support makes it clear why it is there. Also recompiling is really simple :)

> implement any of these in the next version I can use that. For now I've just set the kernel address to the start address and decreased the size a bit, the memory for the kernel is still claimed correctly when it's loaded so unless something relies on the size in qemu,boot-kernel it does not matter and this way the kernel starts but only gets to finding no /chosen/stdout and exits there so I can't try it until I resolve that.
On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > On 22/05/2021 23:01, BALATON Zoltan wrote: >> On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >>> On 21/05/2021 19:05, BALATON Zoltan wrote: >>>> On Fri, 21 May 2021, Alexey Kardashevskiy wrote: >>>>> On 21/05/2021 07:59, BALATON Zoltan wrote: >>>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>>>>>> The PAPR platform describes an OS environment that's presented by >>>>>>> a combination of a hypervisor and firmware. The features it specifies >>>>>>> require collaboration between the firmware and the hypervisor. >>>>>>> >>>>>>> Since the beginning, the runtime component of the firmware (RTAS) has >>>>>>> been implemented as a 20 byte shim which simply forwards it to >>>>>>> a hypercall implemented in qemu. The boot time firmware component is >>>>>>> SLOF - but a build that's specific to qemu, and has always needed to >>>>>>> be >>>>>>> updated in sync with it. Even though we've managed to limit the amount >>>>>>> of runtime communication we need between qemu and SLOF, there's some, >>>>>>> and it has become increasingly awkward to handle as we've implemented >>>>>>> new features. >>>>>>> >>>>>>> This implements a boot time OF client interface (CI) which is >>>>>>> enabled by a new "x-vof" pseries machine option (stands for "Virtual >>>>>>> Open >>>>>>> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall >>>>>>> which implements Open Firmware Client Interface (OF CI). This allows >>>>>>> using a smaller stateless firmware which does not have to manage >>>>>>> the device tree. >>>>>>> >>>>>>> The new "vof.bin" firmware image is included with source code under >>>>>>> pc-bios/. It also includes RTAS blob. >>>>>>> >>>>>>> This implements a handful of CI methods just to get -kernel/-initrd >>>>>>> working. 
In particular, this implements the device tree fetching and >>>>>>> simple memory allocator - "claim" (an OF CI memory allocator) and >>>>>>> updates >>>>>>> "/memory@0/available" to report the client about available memory. >>>>>>> >>>>>>> This implements changing some device tree properties which we know how >>>>>>> to deal with, the rest is ignored. To allow changes, this skips >>>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some room for >>>>>>> appending. >>>>>>> >>>>>>> In absence of SLOF, this assigns phandles to device tree nodes to make >>>>>>> device tree traversing work. >>>>>>> >>>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree. >>>>>>> >>>>>>> This adds basic instances support which are managed by a hash map >>>>>>> ihandle -> [phandle]. >>>>>>> >>>>>>> Before the guest started, the used memory is: >>>>>>> 0..e60 - the initial firmware >>>>>>> 8000..10000 - stack >>>>>>> 400000.. - kernel >>>>>>> 3ea0000.. - initramdisk >>>>>>> >>>>>>> This OF CI does not implement "interpret". >>>>>>> >>>>>>> Unlike SLOF, this does not format uninitialized nvram. Instead, this >>>>>>> includes a disk image with pre-formatted nvram. >>>>>>> >>>>>>> With this basic support, this can only boot into kernel directly. >>>>>>> However this is just enough for the petitboot kernel and initradmdisk >>>>>>> to >>>>>>> boot from any possible source. Note this requires reasonably recent >>>>>>> guest >>>>>>> kernel with: >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 >>>>>>> The immediate benefit is much faster booting time which especially >>>>>>> crucial with fully emulated early CPU bring up environments. Also this >>>>>>> may come handy when/if GRUB-in-the-userspace sees light of the day. >>>>>>> >>>>>>> This separates VOF and sPAPR in a hope that VOF bits may be reused by >>>>>>> other POWERPC boards which do not support pSeries. 
>>>>>>> >>>>>>> This is coded in assumption that later on we might be adding support >>>>>>> for >>>>>>> booting from QEMU backends (blockdev is the first candidate) without >>>>>>> devices/drivers in between as OF1275 does not require that and >>>>>>> it is quite easy to so. >>>>>>> >>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>> --- >>>>>>> >>>>>>> The example command line is: >>>>>>> >>>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \ >>>>>>> -nodefaults \ >>>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \ >>>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \ >>>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \ >>>>>>> -nographic \ >>>>>>> -vga none \ >>>>>>> -enable-kvm \ >>>>>>> -m 8G \ >>>>>>> -machine >>>>>>> pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off >>>>>>> \ >>>>>>> -kernel pbuild/kernel-le-guest/vmlinux \ >>>>>>> -initrd pb/rootfs.cpio.xz \ >>>>>>> -drive >>>>>>> id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw >>>>>>> \ >>>>>>> -global spapr-nvram.drive=DRIVE0 \ >>>>>>> -snapshot \ >>>>>>> -smp 8,threads=8 \ >>>>>>> -L /home/aik/t/qemu-ppc64-bios/ \ >>>>>>> -trace events=qemu_trace_events \ >>>>>>> -d guest_errors \ >>>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \ >>>>>>> -mon chardev=SOCKET0,mode=control >>>>>>> >>>>>>> --- >>>>>>> Changes: >>>>>>> v20: >>>>>>> * compile vof.bin with -mcpu=power4 for better compatibility >>>>>>> * s/std/stw/ in entry.S to make it work on ppc32 >>>>>>> * fixed dt_available property to support both 32 and 64bit >>>>>>> * shuffled prom_args handling code >>>>>>> * do not enforce 32bit in MSR (again, to support 32bit platforms) >>>>>>> >>>>>> >>>>>> [...] 
>>>>>> >>>>>>> diff --git a/default-configs/devices/ppc64-softmmu.mak >>>>>>> b/default-configs/devices/ppc64-softmmu.mak >>>>>>> index ae0841fa3a18..9fb201dfacfa 100644 >>>>>>> --- a/default-configs/devices/ppc64-softmmu.mak >>>>>>> +++ b/default-configs/devices/ppc64-softmmu.mak >>>>>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y >>>>>>> # For pSeries >>>>>>> CONFIG_PSERIES=y >>>>>>> CONFIG_NVDIMM=y >>>>>>> +CONFIG_VOF=y >>>>>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig >>>>>>> index e51e0e5e5ac6..964510dfc73d 100644 >>>>>>> --- a/hw/ppc/Kconfig >>>>>>> +++ b/hw/ppc/Kconfig >>>>>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC >>>>>>> >>>>>>> config FDT_PPC >>>>>>> bool >>>>>>> + >>>>>>> +config VOF >>>>>>> + bool >>>>>> >>>>>> I think you should just add "select VOF" to config PSERIES section in >>>>>> Kconfig instead of adding it to >>>>>> default-configs/devices/ppc64-softmmu.mak. >>>>> >>>>> oh well, can do that too. >>>> >>>> I think most config options should be selected by KConfig and the default >>>> config should only include machines, otherwise VOF would be added also >>>> when you don't compile PSERIES or PEGASOS2. With select in Kconfig it >>>> will be added when needed. That's why it's better to use select in this >>>> case. >>>> >>>>>> That should do it, it works in my updated pegasos2 patch: >>>>>> >>>>>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61 >>>>>> [...] 
>>>>>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S >>>>>>> new file mode 100644 >>>>>>> index 000000000000..569688714c91 >>>>>>> --- /dev/null >>>>>>> +++ b/pc-bios/vof/entry.S >>>>>>> @@ -0,0 +1,51 @@ >>>>>>> +#define LOAD32(rn, name) \ >>>>>>> + lis rn,name##@h; \ >>>>>>> + ori rn,rn,name##@l >>>>>>> + >>>>>>> +#define ENTRY(func_name) \ >>>>>>> + .text; \ >>>>>>> + .align 2; \ >>>>>>> + .globl .func_name; \ >>>>>>> + .func_name: \ >>>>>>> + .globl func_name; \ >>>>>>> + func_name: >>>>>>> + >>>>>>> +#define KVMPPC_HCALL_BASE 0xf000 >>>>>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) >>>>>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) >>>>>>> + >>>>>>> + . = 0x100 /* Do exactly as SLOF does */ >>>>>>> + >>>>>>> +ENTRY(_start) >>>>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */ >>>>>>> +# mtmsrd %r31,0 >>>>>>> + LOAD32(2, __toc_start) >>>>>>> + b entry_c >>>>>>> + >>>>>>> +ENTRY(_prom_entry) >>>>>>> + LOAD32(2, __toc_start) >>>>>>> + stwu %r1,-112(%r1) >>>>>>> + stw %r31,104(%r1) >>>>>>> + mflr %r31 >>>>>>> + bl prom_entry >>>>>>> + nop >>>>>>> + mtlr %r31 >>>>>>> + ld %r31,104(%r1) >>>>>> >>>>>> It's getting there, now I see the first client call from the guest boot >>>>>> code but then it crashes on this ld opcode which apparently is 64 bit >>>>>> only: >>>>> >>>>> Oh right. >>>>> >>>>> >>>>>> Hopefully this is the last such opcode left before I can really test >>>>>> this. >>>>> >>>>> Make it lwz, and test it? >>>> >>>> Yes, figured that out too after sending this message. Replacing with lwz >>>> works but I wonder that now you have stwu lwz do the stack offsets need >>>> adjusting too or you just waste 4 bytes now? >>> >>> Well, this assumes the 64bit client and that ABI. I think ideally the >>> firmware is supposed to use its own stack but I did not bother here. 
I do >>> not know 32bit ABI at all so say whether the existing code should just >>> work or not :-/ >> >> It seems to work so that's OK, just thought if the firmware is 32 bit it >> does not need 64 bit values on stack but if that's also potentially used by >> a 64 bit kernel then it may be better to keep it that way to avoid >> confusion. With the 64 bit opcodes replaced it seems to work on pegasos2 >> and the guest can call CI functions and get a reply so maybe it's just a >> few wasted bytes that's not a big deal. >> >>>> With lwz here I found no further 64 bit opcodes and the guest boot code >>>> could walk the device tree. It failed later but I think that's because >>>> I'll need to fill more info about the machine in the device tree. I'll >>>> experiment with that but it looks like it could work at least for >>>> MorphOS. I'll have to try Linux too. >>> >>> There are plenty of tracepoints, enable them all. >> >> I'm running with -trace enable="vof*" but it does not give me too much info >> as a lot of calls (such as peer, child, etc.) don't log anything other than >> there was a hypercall so only get info about opening paths and querying >> some props. The MorphOS boot.img just walks the device tree gathering some >> data about the machine then calls quiesce and boot into the OS that later >> tries to use the gathered info at which point it crashes without any logs >> if some info is not as expected. This does not make it easy to debug but I >> think once I fill the device tree enough with all needed info it should >> work. Currently I'm missing info about PCI devices that it may need. > > > One thing to note about PCI is that normally I think the client expects the > firmware to do PCI probing and SLOF does it. But VOF does not and Linux scans > PCI bus(es) itself. Might be a problem for you kernel. 
I'm not sure what information MorphOS gets from the device tree and what it probes itself, but I think it needs at least the device IDs and information about the PCI bus to be able to access the config registers; after that it should hopefully set the devices up by itself. I could add these to the device tree from the board code so VOF does not need to do anything about it. However, I'm not getting to that point yet because it crashes on something that is missing and I couldn't yet find out what that is. I'd like to get Linux working first, as that would be enough to test this; if MorphOS still needs a ROM, that is not a problem as long as we can at least boot Linux without the original firmware. But I can't make Linux open a serial console and I don't know what it needs for that. Do you happen to know? I've looked at the sources in Linux/arch/powerpc but I'm not sure how it would find and open a serial port on pegasos2. It works with the board firmware, and I can now get it to boot with VOF, but then it does not open the serial port, so it probably needs something in the device tree or expects the firmware to set something up that we should add in pegasos2.c when using VOF.

>>>>>> Do you have some info on how the stdout works in VOF? I think I'll need
>>>>>> that to test with Linux and get output but I'm not sure what's needed
>>>>>> on the machine side.
>>>>>
>>>>> VOF opens stdout and stores the ihandle (in fdt) which the client
>>>>> (==kernel) uses for writing. To make it work properly, you need to hook
>>>>> up that instance to a device backend similar to what I have for
>>>>> spapr-vty:
>>>>>
>>>>> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd39cc5dad5b6385965d
>>>>> This is not a part of this patch as I'm trying to keep things simpler
>>>>> and accessing backends from VOF is still unsettled. But there is a
>>>>> workaround which is trace_vof_write, I use this. Thanks,
>>>>
>>>> The above patch is about stdin but stdout seems to be added by the
>>>> current vof patch. What is spapr-vty?
>>> >>> It is pseries' paravirtual serial device, pegasos does not have it. >>> >>>> I don't think I have something similar in pegasos2 where I just have a >>>> normal serial port created by ISASuperIO in the vt8231 model. >>> >>> Correct. >>> >>>> Can I use that backend somehow or have to create some other serial device >>>> to connect to stdout? >>>> Does trace_vof_write work for stuff output by the guest? >>>> I guess that's only for things printed by VOF itself >>> >>> VOF itself does not prints anything in this patch. >> >> However it seems to be needed for linux as the first thing it does seems to >> be getting /chosen/stdout and calls exit if it returns nothing. So > > Right, Linux does but VOF (==vof.bin) does not. > >> I'll need this at least for linux. (I think MorphOS may also query it to >> print a banner or some messages but not sure it needs it, at least it does >> not abort right away if not found.) > > Tracepoints print this :) The vof_write tracepoints only work until the guest calls quiesce, after that it should open the serial and use that or init the screen but it does not seem to work yet. >>>> but to see Linux output do I need a stdout in VOF or it will just open >>>> the serial with its own driver and use that? >>>> So I'm not sure what's the stdout parts in the current vof patch does and >>>> if I need that for anything. I'll try to experiment with it some more but >>>> fixing the ld and Kconfig seems to be enough to get it work for me. >>> >>> So for the client to print something, /chosen/stdout needs to have a valid >>> ihandle. >>> The only way to get a valid ihandle is having a valid phandle which >>> vof_client_open() can open. >>> A valid phandle is a phandle of any node in the device tree. On spapr we >>> pick some spapr-vty, open it and store in /chosen/stdout. >>> >>> From this point output from the client can be seen via a tracepoint. 
>>> >>> Now if we want proper output without tracepoints - we need to hook it up >>> with some chardev backend (not a device such a vt8231 or spapr-vty but >>> backend). >> >> I don't know much about it but devices are also connected to some backend >> so is it possible to use the same backend for VOF as used for the normal >> serial port? > > Yes but with this initial patch there is no backend support, you only get > tracepoints. OK, I've got that now, traces work and if Linux would open the serial with its own driver then that would be enough for now. >> But I need a way to find that and connect it to VOF and I'm not qure how to >> do that yet. > > Pick some device in the machine reset code (or you can open the root - "/"), > resolve its FW (==FDT) path, call vof_client_open_store() on it, it will > store ihandle in the FDT. This will enable stdout and the output can be seen > via tracepoint. > > >> Or do I need to create a separate serial backend and connect that to VOF? >> I'll try to look at spapr-vty to see what it does. > > No additional devices needed. Yes, as I wrote in a subsequent message I've figured this out. >>> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd3 does this: >>> 1. when a phandle is open, QEMU will search for DeviceState* for the >>> specific FDT node and get a chardev from the device. >>> 2. when write() is called, QEMU calls qemu_chr_fe_write_all() on chardev >>> from 1. >>> >>> From this point you do not need a tracepoint and the output will appears >>> in the console you set up for stdout. >>> >>> Now if you want input from this console, things get tricky. First, on >>> powernv/pseries we only need this for grub as otherwise the kernel has all >>> the drivers needed and will not use the client interface. For the grub, we >>> need to provide a valid ihandle for /chosen/stdin which is easy but >>> implementing read() on this is not as there is no simple >>> device-type-independend way of reading from chardev. 
I hacked it for
>>> spapr-vty but other serial devices will need special handling, or we'll
>>> have to introduce some VOF_SERIAL_READ interface for those which will face
>>> opposition :)
>>>
>>> Makes sense?
>>
>> It explains things a bit but still not entirely clear how can I get
>> something to add as a stdout. With the pegasos2 firmware it puts the serial
>> device there normally that it inits and opens. Without that firmware we
>> have to somehow do that from QEMU so find the serial backend used by the
>> serial device within the vt8231 model (or use a different backend just for
>> this?) then open it and put it in the device tree. If that's correct or how
>> to do it is not clear yet.
>
> spapr looks through all spapr-vty and picks one with the lowest @reg. You can
> do a similar thing. Or add a machine option with a serial device id which you
> want to be the default console. So many options :)

Fortunately pegasos2 has a single serial port, so that's easy to find. For now I'm doing what the board firmware does: I add a /failsafe node with device_type serial and open that. This works for the vof_write traces, and Linux recognizes it as a serial console and adds console=ttyS0 to the command line if it's not there yet, so it should work; but then it does not seem to find the serial device, so I get no output. I don't know what Linux needs from the device tree to find the serial port. I've tried adding it along with some properties I saw it querying but could not make it work yet.

Regards,
BALATON Zoltan
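The "pick the one with the lowest @reg" rule mentioned above could look roughly like the following. This is a self-contained sketch, not the spapr code: `struct console_candidate` and `pick_default_console()` are hypothetical names, and a real implementation would walk QEMU devices or FDT nodes rather than a plain array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a serial device that could act as stdout. */
struct console_candidate {
    const char *fdt_path;  /* device tree path of the node */
    uint32_t reg;          /* value of the node's "reg" property */
};

/* Return the candidate with the lowest reg, or NULL when the list is empty. */
const struct console_candidate *
pick_default_console(const struct console_candidate *c, size_t n)
{
    const struct console_candidate *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || c[i].reg < best->reg) {
            best = &c[i];
        }
    }
    return best;
}
```

On a board with a single serial port the list has one entry and the choice is trivial; the path of the selected node is what would then be opened and stored as /chosen/stdout.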
On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > On 23/05/2021 01:02, BALATON Zoltan wrote: >> On Sat, 22 May 2021, BALATON Zoltan wrote: >>> On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >>>> VOF itself does not prints anything in this patch. >>> >>> However it seems to be needed for linux as the first thing it does seems >>> to be getting /chosen/stdout and calls exit if it returns nothing. So I'll >>> need this at least for linux. (I think MorphOS may also query it to print >>> a banner or some messages but not sure it needs it, at least it does not >>> abort right away if not found.) >>> >>>>> but to see Linux output do I need a stdout in VOF or it will just open >>>>> the serial with its own driver and use that? >>>>> So I'm not sure what's the stdout parts in the current vof patch does >>>>> and if I need that for anything. I'll try to experiment with it some >>>>> more but fixing the ld and Kconfig seems to be enough to get it work for >>>>> me. >>>> >>>> So for the client to print something, /chosen/stdout needs to have a >>>> valid ihandle. >>>> The only way to get a valid ihandle is having a valid phandle which >>>> vof_client_open() can open. >>>> A valid phandle is a phandle of any node in the device tree. On spapr we >>>> pick some spapr-vty, open it and store in /chosen/stdout. >>>> >>>> From this point output from the client can be seen via a tracepoint. >> >> I've got it now. Looking at the original firmware device tree dump: >> >> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt >> >> I see that /chosen/stdout points to "screen" which is an alias to >> /bootconsole. Just adding an empty /bootconsole node in the device tree and >> vof_client_open_store() that as /chosen/stdout works and I get output via >> vof_write traces so this is enough for now to test Linux. Properly >> connecting a serial backend can thus be postponed. 
>> >> So with this the Linux kernel does not abort on the first device tree >> access but starts to decompress itself then the embedded initrd and crashes >> at calling setprop: >> >> [...] >> vof_client_handle: setprop >> >> Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault. >> (gdb) bt >> #0 0x0000000000000000 in () >> #1 0x0000555555a5c2bf in vof_setprop >> (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, >> pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800) >> at ../hw/ppc/vof.c:308 >> #2 0x0000555555a5c2bf in vof_client_handle >> (nrets=1, rets=0x7ffff48e93f0, nargs=4, args=0x7ffff48e93c0, >> service=0x7ffff48e9460 "setprop", >> vof=0x7ffff48e9420, fdt=0x7fff8aaff010, ms=0x5555564f8800) at >> ../hw/ppc/vof.c:842 >> #3 0x0000555555a5c2bf in vof_client_call >> (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0, >> fdt=fdt@entry=0x7fff8aaff010, args_real=args_real@entry=23580472) >> at ../hw/ppc/vof.c:935 >> >> loooks like it's trying to set /chosen/linux,initrd-start: > > It is not horribly clear why it crashed though. It crashed becuase I had TYPE_VOF_MACHINE_IF but did not set a setprop callback and it tried to call that here. Adding a {return true;} empty callback avoids this. >> (gdb) up >> #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4, >> valaddr=<optimized out>, pname=<optimized out>, nodeph=8, >> fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308 >> 308 if (!vmc->setprop(ms, nodepath, propname, val, vallen)) { >> (gdb) p nodepath >> $1 = "/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times> >> (gdb) p propname >> $2 = >> "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg" >> (gdb) p val >> $3 = <optimized out> >> >> I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. I can copy >> spapr_vof_setprop() but some explanation on why that's needed might help. 
>> Could I just do fdt_setprop in my callback as vof_setprop() would do
>> without a machine callback or is there some special handling needed for
>> these properties?
>
> The short answer is yes, you do not need TYPE_VOF_MACHINE_IF.
>
> The long answer is that we build the FDT on spapr twice:
> 1. at the reset time and
> 2. after "ibm,client-architecture-support" (early in the boot the spapr
> paravirtual client says what it supports - ISA level, MMU features, etc)
>
> Between 1 and 2 the kernel moves initrd and we do not update the QEMU's
> version of its location, the tree at 2) will have the old values.
>
> So for that reason I have TYPE_VOF_MACHINE_IF. You most definitely do not
> need it.

I need TYPE_VOF_MACHINE_IF because it has the quiesce callback, which I need to shut VOF down when the guest is finished with it; otherwise it would crash later (more on this in the next message). But since I shut down VOF there, I don't need to remember changes to the FDT, so I can just use an empty setprop callback. (I wouldn't even need that if VOF checked that a callback is non-NULL before calling it.)

Regards,
BALATON Zoltan
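The NULL check suggested in the last remark could be sketched as below. This is a simplified, self-contained model, not the code in hw/ppc/vof.c: `struct machine_hooks`, `machine_allows_setprop()` and `reject_all()` are hypothetical names, and treating a missing callback as "allowed" is an assumption about the desired default.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified machine interface with an optional hook. */
struct machine_hooks {
    /* Returns true if the machine accepts the property change; may be NULL. */
    bool (*setprop)(const char *path, const char *propname,
                    const void *val, int vallen);
};

/*
 * Ask the machine whether a property may be changed. A missing callback
 * is treated as "allowed" instead of being called through a NULL pointer,
 * which is what caused the SIGSEGV at address 0 in the backtrace.
 */
bool machine_allows_setprop(const struct machine_hooks *hooks,
                            const char *path, const char *propname,
                            const void *val, int vallen)
{
    if (hooks == NULL || hooks->setprop == NULL) {
        return true;
    }
    return hooks->setprop(path, propname, val, vallen);
}

/* Example hook that rejects every change. */
bool reject_all(const char *path, const char *propname,
                const void *val, int vallen)
{
    (void)path; (void)propname; (void)val; (void)vallen;
    return false;
}
```

With such a guard, a board that only cares about quiesce could leave the setprop hook unset instead of supplying an empty `{return true;}` callback.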
On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > On 23/05/2021 02:46, BALATON Zoltan wrote: >> On Sat, 22 May 2021, BALATON Zoltan wrote: >>> On Sat, 22 May 2021, BALATON Zoltan wrote: >>>> On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >>>>> VOF itself does not prints anything in this patch. >>>> >>>> However it seems to be needed for linux as the first thing it does seems >>>> to be getting /chosen/stdout and calls exit if it returns nothing. So >>>> I'll need this at least for linux. (I think MorphOS may also query it to >>>> print a banner or some messages but not sure it needs it, at least it >>>> does not abort right away if not found.) >>>> >>>>>> but to see Linux output do I need a stdout in VOF or it will just open >>>>>> the serial with its own driver and use that? >>>>>> So I'm not sure what's the stdout parts in the current vof patch does >>>>>> and if I need that for anything. I'll try to experiment with it some >>>>>> more but fixing the ld and Kconfig seems to be enough to get it work >>>>>> for me. >>>>> >>>>> So for the client to print something, /chosen/stdout needs to have a >>>>> valid ihandle. >>>>> The only way to get a valid ihandle is having a valid phandle which >>>>> vof_client_open() can open. >>>>> A valid phandle is a phandle of any node in the device tree. On spapr we >>>>> pick some spapr-vty, open it and store in /chosen/stdout. >>>>> >>>>> From this point output from the client can be seen via a tracepoint. >>> >>> I've got it now. Looking at the original firmware device tree dump: >>> >>> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt >>> >>> I see that /chosen/stdout points to "screen" which is an alias to >>> /bootconsole. Just adding an empty /bootconsole node in the device tree >>> and vof_client_open_store() that as /chosen/stdout works and I get output >>> via vof_write traces so this is enough for now to test Linux. Properly >>> connecting a serial backend can thus be postponed. 
>> >> Using /failsafe instead of /bootconsole is even better because Linux then >> adds console=ttyS0 to the bootargs by default as it knows that's a serial >> port. > > When linux boots so far that it can use whatever is passed in "console=" - > the client interface is done pretty much and the output happens without it. That's the problem that Linux does not open serial yet when booting with VOF but I don't have everyhing in the device tree yet and devices may be set up differently when the board firmware haven't run so I'm not sure what's missing for Linux to find and open serial. Does anybody happen to know? >>> So with this the Linux kernel does not abort on the first device tree >>> access but starts to decompress itself then the embedded initrd and >>> crashes at calling setprop: >>> >>> [...] >>> vof_client_handle: setprop >>> >>> Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault. >>> (gdb) bt >>> #0 0x0000000000000000 in () >>> #1 0x0000555555a5c2bf in vof_setprop >>> (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, >>> pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800) >>> at ../hw/ppc/vof.c:308 >>> #2 0x0000555555a5c2bf in vof_client_handle >>> (nrets=1, rets=0x7ffff48e93f0, nargs=4, args=0x7ffff48e93c0, >>> service=0x7ffff48e9460 "setprop", >>> vof=0x7ffff48e9420, fdt=0x7fff8aaff010, ms=0x5555564f8800) at >>> ../hw/ppc/vof.c:842 >>> #3 0x0000555555a5c2bf in vof_client_call >>> (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0, >>> fdt=fdt@entry=0x7fff8aaff010, args_real=args_real@entry=23580472) >>> at ../hw/ppc/vof.c:935 >>> >>> loooks like it's trying to set /chosen/linux,initrd-start: >>> >>> (gdb) up >>> #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4, >>> valaddr=<optimized out>, pname=<optimized out>, nodeph=8, >>> fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308 >>> 308 if (!vmc->setprop(ms, nodepath, propname, val, vallen)) { >>> (gdb) p nodepath >>> $1 = 
"/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times> >>> (gdb) p propname >>> $2 = >>> "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg" >>> (gdb) p val >>> $3 = <optimized out> >>> >>> I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. I can copy >>> spapr_vof_setprop() but some explanation on why that's needed might help. >>> Ciould I just do fdt_setprop in my callback as vof_setprop() would do >>> without a machine callback or is there some special handling needed for >>> these properties? >> >> Just returning true from the setprop callback of the VofMachineIfClass for >> now to see what it would do and then it gets to all the way of calling >> quiesce. Unfortunately it then tries to call prom_printf on Pegasos2 as >> seen here: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.233#n3261 >> >> which does not work because I have to shut down vhyp at quiesce > > What is vhyp and why do you have to shut it down? The vhyp is the TYPE_PPC_VIRTUAL_HYPERVISOR interface that I need to get hypercalls working as I don't normally have it on pegasos2 so I need to install that for VOF but have to tear it down on quiece otherwise it would conflict with things later (at least the assert below but guests also use syscalls and I'm not sure that would also conflict). It works though early in the boot when VOF and guest code using VOF runs which is before the guest takes over the CPU and no syscalls are used by guests yet at this point. This is the current version of the patch I'm experimenting with: https://osdn.net/projects/qmiga/scm/git/qemu/commits/dd4ed0901501e12921cbdbe9e1f918167b168197 and the pegasos2.c after the patch: https://osdn.net/projects/qmiga/scm/git/qemu/blobs/pegasos2/hw/ppc/pegasos2.c maybe it explains more what I'm talking about. >> otherwise it trips an assert on writing sdr1 (and may also interfere with >> the guest's usage of syscalls). 
>
> Where is that assert?

It's here on line 73 in ppc_store_sdr1():

https://git.qemu.org/?p=qemu.git;a=blob;f=target/ppc/cpu.c;h=d957d1a687bf8ade79b5f466dd696b56f63d7e1e;hb=HEAD#l73

which I think is called when the guest tries to set up the MMU, and it trips if I still have vhyp set at that point. So I have to remove vhyp on quiesce, but then any further CI call will cause an exception because sc 1 is a normal syscall again. We don't have exception handlers yet, so it becomes a runaway exception: first due to sc 1, then due to an invalid opcode at the handler address.

> I am a bit lost here. Nothing in the current VOF should touch any actual
> device, it prints via tracepoints or (with that additional patch) to a
> chardev backend.
>
>
>> So I need a way to not generate an exception if the guest calls back into
>> OF after quiesce. A hacky solution is to patch out the sc 1 or _prom_entry
>> point to just return after quiesce but maybe a better way is needed such as
>> a switch in vof.bin that it checks before doing a syscall. Other than this
>> problem it seems to work for the most part so maybe making the _prom_entry
>> check some global value that I can set from quiesce to stop it doing
>> syscalls and just return would be the simplest way to avoid this crash in
>> Linux and not need a special version of vof for pegasos2. (MorphOS does not
>> seem to call OF after quiesce which seems safer to do anyway, don't know
>> why Linux does that. It could just print that one line before quiesce and
>> then it would work, unfortunately that's not what they did.)
>
> quiesce is supposed to wait until ongoing DMA is finished (or something like
> that), it was (people say) a request from Apple back then and was never
> really architected.

Still, it's used by guests to signal that they're finished with OF calls, so it's a convenient place to shut down VOF.
Unfortunately, Linux makes another write call after quiesce, which is silly as it does not even work on the real firmware (I'm not seeing the output of that call even with pegasos2.rom, it just does not crash), and the comment in the kernel says that some firmwares do crash. I don't know why they put it there, but it's there, and since there are binaries out there with this bug/feature we should handle it somehow. I can think of two ways. The first is to make ci_entry just return after quiesce without doing the hypercall. I've done that in the patch above by binary patching, which is a hack; a better way would be to have a known address in VOF holding a flag that I can flip to disable ci_entry. It would check the flag and return if it's set, so I would not need to modify the binary or know the address of ci_entry. The second option is to have dummy exception handlers in VOF that ignore this exception, so it won't crash on the CI call that Linux makes after quiesce. Does this make sense? Do you have any other ideas?

Regards,
BALATON Zoltan
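The flag-based variant of the first option might look like this in C. It is a simulation under stated assumptions, not vof.bin code: `ci_disabled`, `do_hypercall()` and the -1 return value are hypothetical, and the real entry point would issue sc 1 instead of calling a stub.

```c
#include <stdint.h>

/* Hypothetical flag at a known location; set once the guest has called quiesce. */
volatile uint32_t ci_disabled;

/* Counts simulated hypercalls so the behaviour can be observed. */
unsigned hypercalls_made;

/* Stand-in for the real "sc 1" hypercall; always reports success here. */
int do_hypercall(void *args)
{
    (void)args;
    hypercalls_made++;
    return 0;
}

/* Client interface entry: return an error without trapping once disabled. */
int ci_entry(void *args)
{
    if (ci_disabled) {
        return -1;
    }
    return do_hypercall(args);
}
```

The machine's quiesce handler would write a non-zero value to the flag, after which a late client call such as the extra write from Linux returns immediately instead of raising an exception with no handler installed.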
On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > On 22/05/2021 23:08, BALATON Zoltan wrote: >> On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >>> On 22/05/2021 05:57, BALATON Zoltan wrote: >>>> On Fri, 21 May 2021, BALATON Zoltan wrote: >>>>> On Fri, 21 May 2021, Alexey Kardashevskiy wrote: >>>>>> On 21/05/2021 07:59, BALATON Zoltan wrote: >>>>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>>>>>>> [...]
>>>>>>> >>>>>>>> diff --git a/default-configs/devices/ppc64-softmmu.mak >>>>>>>> b/default-configs/devices/ppc64-softmmu.mak >>>>>>>> index ae0841fa3a18..9fb201dfacfa 100644 >>>>>>>> --- a/default-configs/devices/ppc64-softmmu.mak >>>>>>>> +++ b/default-configs/devices/ppc64-softmmu.mak >>>>>>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y >>>>>>>> # For pSeries >>>>>>>> CONFIG_PSERIES=y >>>>>>>> CONFIG_NVDIMM=y >>>>>>>> +CONFIG_VOF=y >>>>>>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig >>>>>>>> index e51e0e5e5ac6..964510dfc73d 100644 >>>>>>>> --- a/hw/ppc/Kconfig >>>>>>>> +++ b/hw/ppc/Kconfig >>>>>>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC >>>>>>>> >>>>>>>> config FDT_PPC >>>>>>>> bool >>>>>>>> + >>>>>>>> +config VOF >>>>>>>> + bool >>>>>>> >>>>>>> I think you should just add "select VOF" to config PSERIES section in >>>>>>> Kconfig instead of adding it to >>>>>>> default-configs/devices/ppc64-softmmu.mak. >>>>>> >>>>>> oh well, can do that too. >>>>> >>>>> I think most config options should be selected by KConfig and the >>>>> default config should only include machines, otherwise VOF would be >>>>> added also when you don't compile PSERIES or PEGASOS2. With select in >>>>> Kconfig it will be added when needed. That's why it's better to use >>>>> select in this case. >>>>> >>>>>>> That should do it, it works in my updated pegasos2 patch: >>>>>>> >>>>>>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61 >>>>>>> [...] 
>>>>>>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S >>>>>>>> new file mode 100644 >>>>>>>> index 000000000000..569688714c91 >>>>>>>> --- /dev/null >>>>>>>> +++ b/pc-bios/vof/entry.S >>>>>>>> @@ -0,0 +1,51 @@ >>>>>>>> +#define LOAD32(rn, name) \ >>>>>>>> + lis rn,name##@h; \ >>>>>>>> + ori rn,rn,name##@l >>>>>>>> + >>>>>>>> +#define ENTRY(func_name) \ >>>>>>>> + .text; \ >>>>>>>> + .align 2; \ >>>>>>>> + .globl .func_name; \ >>>>>>>> + .func_name: \ >>>>>>>> + .globl func_name; \ >>>>>>>> + func_name: >>>>>>>> + >>>>>>>> +#define KVMPPC_HCALL_BASE 0xf000 >>>>>>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) >>>>>>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) >>>>>>>> + >>>>>>>> + . = 0x100 /* Do exactly as SLOF does */ >>>>>>>> + >>>>>>>> +ENTRY(_start) >>>>>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */ >>>>>>>> +# mtmsrd %r31,0 >>>>>>>> + LOAD32(2, __toc_start) >>>>>>>> + b entry_c >>>>>>>> + >>>>>>>> +ENTRY(_prom_entry) >>>>>>>> + LOAD32(2, __toc_start) >>>>>>>> + stwu %r1,-112(%r1) >>>>>>>> + stw %r31,104(%r1) >>>>>>>> + mflr %r31 >>>>>>>> + bl prom_entry >>>>>>>> + nop >>>>>>>> + mtlr %r31 >>>>>>>> + ld %r31,104(%r1) >>>>>>> >>>>>>> It's getting there, now I see the first client call from the guest >>>>>>> boot code but then it crashes on this ld opcode which apparently is 64 >>>>>>> bit only: >>>>>> >>>>>> Oh right. >>>>>> >>>>>> >>>>>>> Hopefully this is the last such opcode left before I can really test >>>>>>> this. >>>>>> >>>>>> Make it lwz, and test it? >>>>> >>>>> Yes, figured that out too after sending this message. Replacing with lwz >>>>> works but I wonder that now you have stwu lwz do the stack offsets need >>>>> adjusting too or you just waste 4 bytes now? With lwz here I found no >>>>> further 64 bit opcodes and the guest boot code could walk the device >>>>> tree. It failed later but I think that's because I'll need to fill more >>>>> info about the machine in the device tree. 
I'll experiment with that but >>>>> it looks like it could work at least for MorphOS. I'll have to try Linux >>>>> too. >>>> >>>> I was trying to get a linux kernel from a debian powerpc iso to do >>>> something (debian before 10.0 has Pegasos support) but I've run into the >>>> problem that the kernel is loaded at 0x400000 but the start address is at >>>> some offset from that. How do I set qemu,boot-kernel in this case? >>> >>> >>> The pseries kernel can work from any location (and it relocates itself to >>> 0 at some point) even though it is linked at c000.0000.0000.0000, and >>> there is no start address offset: >>> >>> === >>>> objdump -D ~/pbuild/kernel-le/vmlinux >>> /home/aik/pbuild/kernel-le/vmlinux: file format elf64-powerpcle >>> >>> >>> Disassembly of section .head.text: >>> >>> c000000000000000 <__start>: >>> c000000000000000: 48 00 00 08 tdi 0,r0,72 >>> c000000000000004: 2c 00 00 48 b c000000000000030 >>> <__start+0x30> >>> ... >>> === >>> >>> Not sure about pegasos2 kernels (or any ppc32 really), sorry. 
>> >> The kernel from Debian 10.0 powerpc used on pegasos looks like this: >> >> vmlinuz-chrp.initrd: file format elf32-powerpc >> vmlinuz-chrp.initrd >> architecture: powerpc:common, flags 0x00000112: >> EXEC_P, HAS_SYMS, D_PAGED >> start address 0x004002fc >> >> Program Header: >> LOAD off 0x00010000 vaddr 0x00400000 paddr 0x00400000 align 2**16 >> filesz 0x0127b72a memsz 0x0127d5d8 flags rwx >> STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4 >> filesz 0x00000000 memsz 0x00000000 flags rwx >> NOTE off 0x000000b4 vaddr 0x00000000 paddr 0x00000000 align 2**0 >> filesz 0x0000002c memsz 0x00000000 flags --- >> NOTE off 0x000000e0 vaddr 0x00000000 paddr 0x00000000 align 2**0 >> filesz 0x0000002c memsz 0x00000000 flags --- >> >> Sections: >> Idx Name Size VMA LMA File off Algn >> 0 .text 00008588 00400000 00400000 00010000 2**2 >> CONTENTS, ALLOC, LOAD, READONLY, CODE >> 1 .text.unlikely 00000078 00408588 00408588 00018588 2**2 >> CONTENTS, ALLOC, LOAD, READONLY, CODE >> 2 .data 00001bec 00409000 00409000 00019000 2**2 >> CONTENTS, ALLOC, LOAD, DATA >> 3 .got 0000000c 0040abec 0040abec 0001abec 2**2 >> CONTENTS, ALLOC, LOAD, DATA >> 4 __builtin_cmdline 00000800 0040abf8 0040abf8 0001abf8 2**2 >> CONTENTS, ALLOC, LOAD, DATA >> 5 .kernel:vmlinux.strip 0047658e 0040c000 0040c000 0001c000 2**0 >> CONTENTS, ALLOC, LOAD, READONLY, DATA >> 6 .kernel:initrd 00df872a 00883000 00883000 00493000 2**0 >> CONTENTS, ALLOC, LOAD, READONLY, DATA >> 7 .bss 000015d8 0167c000 0167c000 0128b72a 2**2 >> ALLOC >> 8 .debug_info 0000e7fd 00000000 00000000 0128b72a 2**0 >> CONTENTS, READONLY, DEBUGGING >> 9 .debug_abbrev 00002a4f 00000000 00000000 01299f27 2**0 >> CONTENTS, READONLY, DEBUGGING >> 10 .debug_loc 00009df1 00000000 00000000 0129c976 2**0 >> CONTENTS, READONLY, DEBUGGING >> 11 .debug_aranges 00000250 00000000 00000000 012a6767 2**0 >> CONTENTS, READONLY, DEBUGGING >> 12 .debug_line 000026b8 00000000 00000000 012a69b7 2**0 >> CONTENTS, READONLY, DEBUGGING >> 13 
.debug_str 00001d9c 00000000 00000000 012a906f 2**0 >> CONTENTS, READONLY, DEBUGGING >> 14 .comment 0000001d 00000000 00000000 012aae0b 2**0 >> CONTENTS, READONLY >> 15 .gnu.attributes 00000010 00000000 00000000 012aae28 2**0 >> CONTENTS, READONLY >> 16 .debug_frame 00001c88 00000000 00000000 012aae38 2**2 >> CONTENTS, READONLY, DEBUGGING >> 17 .debug_ranges 00000740 00000000 00000000 012acac0 2**0 >> CONTENTS, READONLY, DEBUGGING >> >> It even seems to have the initrd embedded in it. If I just use 0x400000 as >> start address it does not work, has to jump to the start address for it to >> start correctly. >> >>>> Because when I set it to the address/size where the kernel is loaded it >>>> jumps to the beginnig not the correct start address. If I set the address >>>> to the start address then size will be wrong so I don't know how to set >>>> qemu,boot-kernel in this case or is there another property to tell the >>>> start address? >>>> (Vof does not seem to check any other property and seems to assume the >>>> entry point is the same as the load address but for this linux kernel >>>> it's not.) >>> >>> I guess if you really need an offset, you'll have to add a new property >>> ("qemu,boot-kernel-start"?) and look for it in the firmware. Or, say, put >>> in gpr5 in your version of spapr_cpu_set_entry_state() and make >>> boot_from_memory() use it. >> >> Either way would work but I don't want to recompile vof.bin so if you > > I really do not want to add features with no user for it; and having this > added with pegasos2 support make it clear why it is there. Also recompile is > really simple :) Provided you have a cross compiler set up and do not run into problems you've mentioned before. So I'd prefer to not increase the source of bugs by also modifying VOF. This is not only for pegasos2. 
An ELF file does not necessarily have its entry point equal to its load address, so while you happen to have a kernel that does now, you could have another later that doesn't; supporting this in some way would be the right thing to do anyway. I can't decide which is better, a GPR or a device tree property, so use whichever you prefer. You probably need a respin anyway, so adding it there seems simpler than me starting to compile and test VOF too (I also can't test with spapr, so I can't make sure my changes won't break something). So you seem to be in a better position to make VOF changes. I think I only need these:

1. Get rid of the 64-bit ld opcode in _prom_entry.
2. Support ELF with entry point != load address.
3. Have a way to disable ci_entry after quiesce so it won't do the sc 1 that would otherwise generate an exception in my case, or ignore that exception within VOF.

I think the other issues have already been resolved by your latest patch version, unless I forgot something.

Regards, BALATON Zoltan
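For item 2, a rough sketch of reading the entry point from a big-endian ELF32 header and turning it into an offset from the load address. The helper names (`be32`, `elf32_be_entry`, `entry_offset`) are made up for illustration and are not QEMU or VOF functions; only the header layout (e_entry at byte offset 24) comes from the ELF format itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Extract e_entry from a big-endian ELF32 header.
 * Returns 0 on success, -1 if the buffer is not such a header.
 */
int elf32_be_entry(const uint8_t *hdr, size_t len, uint32_t *entry)
{
    if (len < 28 || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F') {
        return -1;
    }
    if (hdr[4] != 1 /* ELFCLASS32 */ || hdr[5] != 2 /* ELFDATA2MSB */) {
        return -1;
    }
    *entry = be32(hdr + 24); /* e_entry follows e_ident, type, machine, version */
    return 0;
}

/* Offset of the entry point from the address the image was loaded at. */
uint32_t entry_offset(uint32_t load_addr, uint32_t entry)
{
    return entry - load_addr;
}
```

With the numbers from the Debian kernel quoted above (load address 0x00400000, start address 0x004002fc) the offset is 0x2fc. Whether it is then passed in a GPR or in a device tree property is the open question.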
On Sun, 23 May 2021, BALATON Zoltan wrote: > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >> One thing to note about PCI is that normally I think the client expects the >> firmware to do PCI probing and SLOF does it. But VOF does not and Linux >> scans PCI bus(es) itself. Might be a problem for you kernel. > > I'm not sure what info does MorphOS get from the device tree and what it > probes itself but I think it may at least need device ids and info about the > PCI bus to be able to access the config regs, after that it should set the > devices up hopefully. I could add these from the board code to device tree so > VOF does not need to do anything about it. However I'm not getting to that > point yet because it crashes on something that it's missing and couldn't yet > find out what is that. > > I'd like to get Linux working now as that would be enough to test this and > then if for MorphOS we still need a ROM it's not a problem if at least we can > boot Linux without the original firmware. But I can't make Linux open a > serial console and I don't know what it needs for that. Do you happen to > know? I've looked at the sources in Linux/arch/powerpc but not sure how it > would find and open a serial port on pegasos2. It seems to work with the > board firmware and now I can get it to boot with VOF but then it does not > open serial so it probably needs something in the device tree or expects the > firmware to set something up that we should add in pegasos2.c when using VOF. I've now found that Linux uses rtas methods read-pci-config and write-pci-config for PCI access on pegasos2 so this means that we'll probably need rtas too (I hoped we could get away without it if it were only used for shutdown/reboot or so but seems Linux needs it for PCI as well and does not scan the bus and won't find some devices without it). 
While VOF can do rtas, this causes a problem with the hypercall method: sc 1 goes through vhyp, but that trips the assert in ppc_store_sdr1(), so it cannot work once the guest is past quiesce. So the question is: why is that assert there, and would using sc 1 for hypercalls on pegasos2 cause other problems later even if the assert could be removed? Can somebody who knows more about it explain this, please?

If this cannot be resolved, we may need a different hypercall method on pegasos2. (I've considered MOL OSI, or are there other options? I could use some advice from people who know this better, especially about the possible interaction with KVM later, as the long-term goal with pegasos2 is to eventually be able to run with KVM on PPC hardware.) It also means that if the assert cannot be dropped, or if there are other problems with sc 1 hypercalls, we may not be able to use the same vof.bin and would need a separate version, which I would like to avoid if possible. So if there's a simple way to keep it working, or to make vof.bin use an alternate hypercall method without needing a separate binary, that is the direction I'd tend to go. Even if we need a separate version, I'd like to keep as much in common as possible.

I've tested that the missing rtas is not the reason for getting no output via serial: even with rtas disabled, pegasos2.rom boots and I still get serial output; only some PCI devices are not detected (such as USB, the video card and the not emulated ethernet port). These are not fatal, so as a first try it might even work without rtas; just booting a Linux kernel for testing would be enough if I can fix the serial output. I still don't know why it's not finding serial, but I think it may be some missing or wrong info in the device tree I generate. I'll try to focus on this for now and leave the above rtas question for later.

Regards, BALATON Zoltan
On 5/23/21 21:24, BALATON Zoltan wrote: > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >> On 23/05/2021 01:02, BALATON Zoltan wrote: >>> On Sat, 22 May 2021, BALATON Zoltan wrote: >>>> On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >>>>> VOF itself does not prints anything in this patch. >>>> >>>> However it seems to be needed for linux as the first thing it does >>>> seems to be getting /chosen/stdout and calls exit if it returns >>>> nothing. So I'll need this at least for linux. (I think MorphOS may >>>> also query it to print a banner or some messages but not sure it >>>> needs it, at least it does not abort right away if not found.) >>>> >>>>>> but to see Linux output do I need a stdout in VOF or it will just >>>>>> open the serial with its own driver and use that? >>>>>> So I'm not sure what's the stdout parts in the current vof patch >>>>>> does and if I need that for anything. I'll try to experiment with >>>>>> it some more but fixing the ld and Kconfig seems to be enough to >>>>>> get it work for me. >>>>> >>>>> So for the client to print something, /chosen/stdout needs to have >>>>> a valid ihandle. >>>>> The only way to get a valid ihandle is having a valid phandle which >>>>> vof_client_open() can open. >>>>> A valid phandle is a phandle of any node in the device tree. On >>>>> spapr we pick some spapr-vty, open it and store in /chosen/stdout. >>>>> >>>>> From this point output from the client can be seen via a tracepoint. >>> >>> I've got it now. Looking at the original firmware device tree dump: >>> >>> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt >>> >>> I see that /chosen/stdout points to "screen" which is an alias to >>> /bootconsole. Just adding an empty /bootconsole node in the device >>> tree and vof_client_open_store() that as /chosen/stdout works and I >>> get output via vof_write traces so this is enough for now to test >>> Linux. Properly connecting a serial backend can thus be postponed. 
>>> >>> So with this the Linux kernel does not abort on the first device tree >>> access but starts to decompress itself then the embedded initrd and >>> crashes at calling setprop: >>> >>> [...] >>> vof_client_handle: setprop >>> >>> Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault. >>> (gdb) bt >>> #0 0x0000000000000000 in () >>> #1 0x0000555555a5c2bf in vof_setprop >>> (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, >>> pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, ms=0x5555564f8800) >>> at ../hw/ppc/vof.c:308 >>> #2 0x0000555555a5c2bf in vof_client_handle >>> (nrets=1, rets=0x7ffff48e93f0, nargs=4, args=0x7ffff48e93c0, >>> service=0x7ffff48e9460 "setprop", >>> vof=0x7ffff48e9420, fdt=0x7fff8aaff010, ms=0x5555564f8800) at >>> ../hw/ppc/vof.c:842 >>> #3 0x0000555555a5c2bf in vof_client_call >>> (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0, >>> fdt=fdt@entry=0x7fff8aaff010, args_real=args_real@entry=23580472) >>> at ../hw/ppc/vof.c:935 >>> >>> loooks like it's trying to set /chosen/linux,initrd-start: >> >> It is not horribly clear why it crashed though. > > It crashed becuase I had TYPE_VOF_MACHINE_IF but did not set a setprop > callback and it tried to call that here. Adding a {return true;} empty > callback avoids this. Ah ok. > >>> (gdb) up >>> #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, vallen=4, >>> valaddr=<optimized out>, pname=<optimized out>, nodeph=8, >>> fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308 >>> 308 if (!vmc->setprop(ms, nodepath, propname, val, vallen)) { >>> (gdb) p nodepath >>> $1 = "/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times> >>> (gdb) p propname >>> $2 = >>> "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg" >>> (gdb) p val >>> $3 = <optimized out> >>> >>> I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. I can >>> copy spapr_vof_setprop() but some explanation on why that's needed >>> might help. 
Ciould I just do fdt_setprop in my callback as >>> vof_setprop() would do without a machine callback or is there some >>> special handling needed for these properties? >> >> The short answer is yes, you do not need TYPE_VOF_MACHINE_IF. >> >> The long answer is that we build the FDT on spapr twice: >> 1. at the reset time and >> 2. after "ibm,client-arhitecture-support" (early in the boot the spapr >> paravirtual client says what it supports - ISA level, MMU features, etc) >> >> Between 1 and 2 the kernel moves initrd and we do not update the >> QEMU's version of its location, the tree at 2) will have the old values. >> >> So for that reason I have TYPE_VOF_MACHINE_IF. You most definitely do >> not need it. > > I need TYPE_VOF_MACHINE_IF because that has the quiesce callback that I > need to shut VOF down when the guest is finished with it otherwise it > would crash later (more on this in next message). Nah, quiesce() only means stopping IO in VOF. VOF is shut down when the client decides to stop using it (and zero that memory). > But since I shut down > VOF here I don't need to remember changes to the FDT so I can just use > an empty setprop callback. (I wouldn't even need that if VOF would check > that a callback is non-NULL before calling it.) I'll add the check. I'll need some time to go through the other mails, closer to the weekend, there are too many gaps in my knowledge about those 32bit systems. I am really not sure that you need TYPE_PPC_VIRTUAL_HYPERVISOR (is this just to make "sc 1" work? there should be a better way) or RTAS (although it looks like you need it for PCI, you likely do not need it for your serial device which is ISA which I have no idea how it works). Do you have an actual machine? Can you dump its device tree to see what yours is missing?
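The non-NULL check discussed above could look roughly like the following. This is a standalone sketch with made-up types: `MachineIf`, `machine_allows_setprop()` and `reject_all()` are hypothetical stand-ins, not the definitions in hw/ppc/vof.c. It assumes that a missing callback should behave like the empty "return true" callback mentioned in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the machine interface class. */
typedef struct MachineIf {
    bool (*setprop)(const char *path, const char *propname,
                    const void *val, int vallen);
} MachineIf;

/* Example callback that vetoes every property change. */
bool reject_all(const char *path, const char *propname,
                const void *val, int vallen)
{
    (void)path; (void)propname; (void)val; (void)vallen;
    return false;
}

/* Returns true if the property change may proceed. */
bool machine_allows_setprop(const MachineIf *vmc, const char *path,
                            const char *propname, const void *val,
                            int vallen)
{
    /* No interface or no callback registered: nothing can veto the change. */
    if (!vmc || !vmc->setprop) {
        return true;
    }
    return vmc->setprop(path, propname, val, vallen);
}
```

This way a board that only wants the quiesce callback would not have to provide a dummy setprop.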
On Thu, May 20, 2021 at 11:59:07PM +0200, BALATON Zoltan wrote: > On Thu, 20 May 2021, Alexey Kardashevskiy wrote: > > [...]
> > > diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak > > index ae0841fa3a18..9fb201dfacfa 100644 > > --- a/default-configs/devices/ppc64-softmmu.mak > > +++ b/default-configs/devices/ppc64-softmmu.mak > > @@ -9,3 +9,4 @@ CONFIG_POWERNV=y > > # For pSeries > > CONFIG_PSERIES=y > > CONFIG_NVDIMM=y > > +CONFIG_VOF=y > > diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig > > index e51e0e5e5ac6..964510dfc73d 100644 > > --- a/hw/ppc/Kconfig > > +++ b/hw/ppc/Kconfig > > @@ -143,3 +143,6 @@ config FW_CFG_PPC > > > > config FDT_PPC > > bool > > + > > +config VOF > > + bool > > I think you should just add "select VOF" to config PSERIES section in > Kconfig instead of adding it to default-configs/devices/ppc64-softmmu.mak. > That should do it, it works in my updated pegasos2 patch: No, we don't want a "select": PSERIES doesn't require VOF while we still support SLOF, and indeed we're quite a ways from being ready to even make VOF the default pseries firmware.
On Mon, May 24, 2021 at 02:26:42PM +1000, Alexey Kardashevskiy wrote: > > > On 5/23/21 21:24, BALATON Zoltan wrote: > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > On 23/05/2021 01:02, BALATON Zoltan wrote: > > > > On Sat, 22 May 2021, BALATON Zoltan wrote: > > > > > On Sat, 22 May 2021, Alexey Kardashevskiy wrote: > > > > > > VOF itself does not prints anything in this patch. > > > > > > > > > > However it seems to be needed for linux as the first thing > > > > > it does seems to be getting /chosen/stdout and calls exit if > > > > > it returns nothing. So I'll need this at least for linux. (I > > > > > think MorphOS may also query it to print a banner or some > > > > > messages but not sure it needs it, at least it does not > > > > > abort right away if not found.) > > > > > > > > > > > > but to see Linux output do I need a stdout in VOF or > > > > > > > it will just open the serial with its own driver and > > > > > > > use that? > > > > > > > So I'm not sure what's the stdout parts in the > > > > > > > current vof patch does and if I need that for > > > > > > > anything. I'll try to experiment with it some more > > > > > > > but fixing the ld and Kconfig seems to be enough to > > > > > > > get it work for me. > > > > > > > > > > > > So for the client to print something, /chosen/stdout > > > > > > needs to have a valid ihandle. > > > > > > The only way to get a valid ihandle is having a valid > > > > > > phandle which vof_client_open() can open. > > > > > > A valid phandle is a phandle of any node in the device > > > > > > tree. On spapr we pick some spapr-vty, open it and store > > > > > > in /chosen/stdout. > > > > > > > > > > > > From this point output from the client can be seen via a tracepoint. > > > > > > > > I've got it now. 
Looking at the original firmware device tree dump: > > > > > > > > https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt > > > > > > > > I see that /chosen/stdout points to "screen" which is an alias > > > > to /bootconsole. Just adding an empty /bootconsole node in the > > > > device tree and vof_client_open_store() that as /chosen/stdout > > > > works and I get output via vof_write traces so this is enough > > > > for now to test Linux. Properly connecting a serial backend can > > > > thus be postponed. > > > > > > > > So with this the Linux kernel does not abort on the first device > > > > tree access but starts to decompress itself then the embedded > > > > initrd and crashes at calling setprop: > > > > > > > > [...] > > > > vof_client_handle: setprop > > > > > > > > Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault. > > > > (gdb) bt > > > > #0 0x0000000000000000 in () > > > > #1 0x0000555555a5c2bf in vof_setprop > > > > (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, > > > > pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, > > > > ms=0x5555564f8800) > > > > at ../hw/ppc/vof.c:308 > > > > #2 0x0000555555a5c2bf in vof_client_handle > > > > (nrets=1, rets=0x7ffff48e93f0, nargs=4, > > > > args=0x7ffff48e93c0, service=0x7ffff48e9460 "setprop", > > > > vof=0x7ffff48e9420, fdt=0x7fff8aaff010, ms=0x5555564f8800) > > > > at ../hw/ppc/vof.c:842 > > > > #3 0x0000555555a5c2bf in vof_client_call > > > > (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0, > > > > fdt=fdt@entry=0x7fff8aaff010, > > > > args_real=args_real@entry=23580472) > > > > at ../hw/ppc/vof.c:935 > > > > > > > > loooks like it's trying to set /chosen/linux,initrd-start: > > > > > > It is not horribly clear why it crashed though. > > > > It crashed becuase I had TYPE_VOF_MACHINE_IF but did not set a setprop > > callback and it tried to call that here. Adding a {return true;} empty > > callback avoids this. > > > Ah ok. 
> > > > > > > (gdb) up > > > > #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, > > > > vallen=4, valaddr=<optimized out>, pname=<optimized out>, > > > > nodeph=8, > > > > fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308 > > > > 308 if (!vmc->setprop(ms, nodepath, propname, val, vallen)) { > > > > (gdb) p nodepath > > > > $1 = "/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times> > > > > (gdb) p propname > > > > $2 = "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg" > > > > (gdb) p val > > > > $3 = <optimized out> > > > > > > > > I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. > > > > I can copy spapr_vof_setprop() but some explanation on why > > > > that's needed might help. Ciould I just do fdt_setprop in my > > > > callback as vof_setprop() would do without a machine callback or > > > > is there some special handling needed for these properties? > > > > > > The short answer is yes, you do not need TYPE_VOF_MACHINE_IF. > > > > > > The long answer is that we build the FDT on spapr twice: > > > 1. at the reset time and > > > 2. after "ibm,client-arhitecture-support" (early in the boot the > > > spapr paravirtual client says what it supports - ISA level, MMU > > > features, etc) > > > > > > Between 1 and 2 the kernel moves initrd and we do not update the > > > QEMU's version of its location, the tree at 2) will have the old > > > values. > > > > > > So for that reason I have TYPE_VOF_MACHINE_IF. You most definitely > > > do not need it. > > > > I need TYPE_VOF_MACHINE_IF because that has the quiesce callback that I > > need to shut VOF down when the guest is finished with it otherwise it > > would crash later (more on this in next message). > > Nah, quiesce() only means stopping IO in VOF. VOF is shut down when the > client decides to stop using it (and zero that memory). 
>
> > But since I shut down VOF here I don't need to remember changes to the
> > FDT so I can just use an empty setprop callback. (I wouldn't even need
> > that if VOF would check that a callback is non-NULL before calling it.)
>
> I'll add the check.
>
> I'll need some time to go through the other mails, closer to the weekend,
> there are too many gaps in my knowledge about those 32bit systems.
>
> I am really not sure that you need TYPE_PPC_VIRTUAL_HYPERVISOR (is this just
> to make "sc 1" work? there should be a better way) or RTAS (although it
> looks like you need it for PCI, you likely do not need it for your serial
> device which is ISA which I have no idea how it works). Do you have an
> actual machine? Can you dump its device tree to see what yours is missing?

IIUC, it's basically so that the 'sc 1' instructions can be routed
through to VOF. 'sc 1' is an illegal instruction on ppc32, AFAIK, so
we need some sort of hack here. vhyp wasn't really designed for this,
but I suspect it is the simplest way to intercept those 'sc 1' calls.

Unfortunately, shutting it down presents a real problem. Currently
you're relying on quiesce being the last call to OF the client makes.
That's often the case in practice, but not necessarily in all cases,
as you've seen. However, there's no alternative point at which we can
determine that we're done with the client interface.

My inclination for now would be to just leave the vhyp handler in
place. Strictly speaking it won't give you correct behaviour: later
calls to 'sc 1' will invoke VOF rather than giving a 0x700 exception.
But nothing on a 32-bit system should be attempting 'sc 1' anyway, so
I think it will probably work in practice.
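For reference, the setprop crash quoted earlier in this thread came from calling through a NULL callback pointer, and the agreed fix is to check the pointer before the call. Below is a minimal standalone sketch of that pattern. It is not the actual vof.c code: MachineHooks, reject_all() and machine_allows_setprop() are hypothetical names invented for illustration, and "a missing callback means the update is allowed" is an assumption about the intended semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine interface whose hooks are optional. */
typedef struct MachineHooks {
    /* Optional: return false to veto a property update. */
    bool (*setprop)(const char *path, const char *propname,
                    const void *val, size_t vallen);
    /* Optional: called when the client invokes "quiesce". */
    void (*quiesce)(void);
} MachineHooks;

/* Example hook that vetoes every update (illustration only). */
static bool reject_all(const char *path, const char *propname,
                       const void *val, size_t vallen)
{
    (void)path; (void)propname; (void)val; (void)vallen;
    return false;
}

/*
 * Returns true if the generic code may store the property.
 * A missing interface or a missing setprop hook is treated as
 * "no machine-specific handling", so the update is allowed.
 */
static bool machine_allows_setprop(const MachineHooks *hooks,
                                   const char *path, const char *propname,
                                   const void *val, size_t vallen)
{
    assert(path != NULL && propname != NULL);
    if (hooks == NULL || hooks->setprop == NULL) {
        return true;    /* nothing to call, so nothing can crash */
    }
    return hooks->setprop(path, propname, val, vallen);
}
```

With a check like this, a board that only wants the quiesce hook can leave setprop unset instead of supplying an empty "return true" callback.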
On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote:
> On Sun, 23 May 2021, BALATON Zoltan wrote:
> > On Sun, 23 May 2021, Alexey Kardashevskiy wrote:
> > > One thing to note about PCI is that normally I think the client
> > > expects the firmware to do PCI probing and SLOF does it. But VOF
> > > does not and Linux scans PCI bus(es) itself. Might be a problem for
> > > your kernel.
> >
> > I'm not sure what info does MorphOS get from the device tree and what it
> > probes itself but I think it may at least need device ids and info about
> > the PCI bus to be able to access the config regs, after that it should
> > set the devices up hopefully. I could add these from the board code to
> > device tree so VOF does not need to do anything about it. However I'm
> > not getting to that point yet because it crashes on something that it's
> > missing and couldn't yet find out what is that.
> >
> > I'd like to get Linux working now as that would be enough to test this
> > and then if for MorphOS we still need a ROM it's not a problem if at
> > least we can boot Linux without the original firmware. But I can't make
> > Linux open a serial console and I don't know what it needs for that. Do
> > you happen to know? I've looked at the sources in Linux/arch/powerpc but
> > not sure how it would find and open a serial port on pegasos2. It seems
> > to work with the board firmware and now I can get it to boot with VOF
> > but then it does not open serial so it probably needs something in the
> > device tree or expects the firmware to set something up that we should
> > add in pegasos2.c when using VOF.
>
> I've now found that Linux uses rtas methods read-pci-config and
> write-pci-config for PCI access on pegasos2 so this means that we'll
> probably need rtas too (I hoped we could get away without it if it were only
> used for shutdown/reboot or so but seems Linux needs it for PCI as well and
> does not scan the bus and won't find some devices without it).

Yes, definitely sounds like you'll need an RTAS implementation.

> While VOF can do rtas, this causes a problem with the hypercall method using
> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so
> cannot work after guest is past quiesce.

> So the question is why is that
> assert there

Ah.. right. So, vhyp was designed for the PAPR use case, where we
want to model the CPU when it's in supervisor and user mode, but not
when it's in hypervisor mode. We want qemu to mimic the behaviour of
the hypervisor, rather than attempting to actually execute hypervisor
code in the virtual CPU.

On systems that have a hypervisor mode, SDR1 is hypervisor privileged,
so it makes no sense for the guest to attempt to set it. That should
be caught by the general SPR code and turned into a 0x700, hence the
assert() if we somehow reach ppc_store_sdr1().

So, we are seeing a problem here because you want the 'sc 1'
interception of vhyp, but not the rest of the stuff that goes with it.

> and would using sc 1 for hypercalls on pegasos2 cause other
> problems later even if the assert could be removed?

At least in the short term, I think you probably can remove the
assert. In your case the 'sc 1' calls aren't truly to a hypervisor,
but a special case escape to qemu for the firmware emulation. I think
it's unlikely to cause problems later, because nothing on a 32-bit
system should be attempting an 'sc 1'. The only thing I can think of
that would fail is some test case which explicitly verified that 'sc
1' triggered a 0x700 (SIGILL from userspace).

> Can somebody who knows
> more about it explain this please? If this cannot be resolved then we may
> need a different hypercall method on pegasos2 (I've considered MOL OSI or
> are there other options? I may use some advice from people who know it
> better, especially the possible interaction with KVM later as the long term
> goal with pegasos2 is to be able to run with KVM on PPC hardware
> eventually.)

Right, you might need an alternative method eventually. Really any
illegal instruction for your cpu is a possible candidate. Bear in
mind that this is *not* truly a hypercall interface, instead it's
something we're special casing for the purposes of faking the
firmware.

The "attn" instruction used on BookE might be a reasonable candidate
(assuming it doesn't conflict with something on 32-bit BookS) - that's
often used for things like signalling the attention of hardware
debuggers, and this is somewhat akin.

Mostly it's just a matter of working out what would be least messy to
intercept in the TCG instruction decoding path.

> But this also means that if that assert cannot be dropped or
> there may be other problems with sc 1 hypercalls then we maybe cannot have
> the same vof.bin and we'll need a separate version that I would like to
> avoid if possible so if there's a simple way to keep it working or make
> vof.bin use alternate hypercall method without needing a separate binary
> that would be the direction I'd tend to go. Even if we need a separate
> version I'd like to keep as much common as possible.
>
> I've tested that the missing rtas is not the reason for getting no output
> via serial though, as even when disabling rtas on pegasos2.rom it boots and
> I still get serial output just some PCI devices are not detected (such as
> USB, the video card and the not emulated ethernet port but these are not
> fatal so it might even work as a first try without rtas, just to boot a
> Linux kernel for testing it would be enough if I can fix the serial output).
> I still don't know why it's not finding serial but I think it may be some
> missing or wrong info in the device tree I generate. I'll try to focus on
> this for now and leave the above rtas question for later.

Oh.. another thought on that. You have an ISA serial port on Pegasos,
I believe. I wonder if the PCI->ISA bridge needs some configuration /
initialization that the firmware is expected to do. If so you'll need
to mimic that setup in qemu for the VOF case.
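To illustrate why intercepting 'sc 1' should not disturb ordinary system calls, here is a small standalone sketch that separates the two by the LEV field of the instruction word. It assumes the SC-form encoding as I understand it from the Power ISA (primary opcode 17, LEV in the 7 bits starting at bit 5 counted from the least significant bit, so plain 'sc' is 0x44000002 and 'sc 1' is 0x44000022). The names encode_sc(), classify_insn() and InsnKind are made up for this example and are not QEMU's decoder; reserved bits are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SC-form layout: opcode 17 in the top 6 bits, bit 1 set,
 * LEV in the 7-bit field starting at bit 5 (LSB = bit 0). */
#define SC_OPCODE_MASK 0xfc000002u
#define SC_OPCODE      0x44000002u
#define SC_LEV_SHIFT   5
#define SC_LEV_MASK    0x7fu

typedef enum {
    INSN_OTHER,      /* not an sc instruction, or an unhandled LEV */
    INSN_SYSCALL,    /* plain "sc": the guest OS system call path */
    INSN_FW_ESCAPE   /* "sc 1": routed to the firmware emulation */
} InsnKind;

/* Build the instruction word for "sc LEV". */
static uint32_t encode_sc(uint32_t lev)
{
    assert(lev <= SC_LEV_MASK);
    return SC_OPCODE | (lev << SC_LEV_SHIFT);
}

/* Classify an instruction word by opcode and LEV. */
static InsnKind classify_insn(uint32_t insn)
{
    if ((insn & SC_OPCODE_MASK) != SC_OPCODE) {
        return INSN_OTHER;
    }
    uint32_t lev = (insn >> SC_LEV_SHIFT) & SC_LEV_MASK;
    if (lev == 0) {
        return INSN_SYSCALL;
    }
    if (lev == 1) {
        return INSN_FW_ESCAPE;
    }
    return INSN_OTHER;
}
```

Under these assumptions the two instructions differ in a single bit, so a decoder can hand 'sc 1' to the firmware path while plain 'sc' keeps raising the usual system call exception.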
On Mon, 24 May 2021, David Gibson wrote: > On Thu, May 20, 2021 at 11:59:07PM +0200, BALATON Zoltan wrote: >> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>> The PAPR platform describes an OS environment that's presented by >>> a combination of a hypervisor and firmware. The features it specifies >>> require collaboration between the firmware and the hypervisor. >>> >>> Since the beginning, the runtime component of the firmware (RTAS) has >>> been implemented as a 20 byte shim which simply forwards it to >>> a hypercall implemented in qemu. The boot time firmware component is >>> SLOF - but a build that's specific to qemu, and has always needed to be >>> updated in sync with it. Even though we've managed to limit the amount >>> of runtime communication we need between qemu and SLOF, there's some, >>> and it has become increasingly awkward to handle as we've implemented >>> new features. >>> >>> This implements a boot time OF client interface (CI) which is >>> enabled by a new "x-vof" pseries machine option (stands for "Virtual Open >>> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall >>> which implements Open Firmware Client Interface (OF CI). This allows >>> using a smaller stateless firmware which does not have to manage >>> the device tree. >>> >>> The new "vof.bin" firmware image is included with source code under >>> pc-bios/. It also includes RTAS blob. >>> >>> This implements a handful of CI methods just to get -kernel/-initrd >>> working. In particular, this implements the device tree fetching and >>> simple memory allocator - "claim" (an OF CI memory allocator) and updates >>> "/memory@0/available" to report the client about available memory. >>> >>> This implements changing some device tree properties which we know how >>> to deal with, the rest is ignored. To allow changes, this skips >>> fdt_pack() when x-vof=on as not packing the blob leaves some room for >>> appending. 
>>> >>> In absence of SLOF, this assigns phandles to device tree nodes to make >>> device tree traversing work. >>> >>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree. >>> >>> This adds basic instances support which are managed by a hash map >>> ihandle -> [phandle]. >>> >>> Before the guest started, the used memory is: >>> 0..e60 - the initial firmware >>> 8000..10000 - stack >>> 400000.. - kernel >>> 3ea0000.. - initramdisk >>> >>> This OF CI does not implement "interpret". >>> >>> Unlike SLOF, this does not format uninitialized nvram. Instead, this >>> includes a disk image with pre-formatted nvram. >>> >>> With this basic support, this can only boot into kernel directly. >>> However this is just enough for the petitboot kernel and initradmdisk to >>> boot from any possible source. Note this requires reasonably recent guest >>> kernel with: >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 >>> >>> The immediate benefit is much faster booting time which especially >>> crucial with fully emulated early CPU bring up environments. Also this >>> may come handy when/if GRUB-in-the-userspace sees light of the day. >>> >>> This separates VOF and sPAPR in a hope that VOF bits may be reused by >>> other POWERPC boards which do not support pSeries. >>> >>> This is coded in assumption that later on we might be adding support for >>> booting from QEMU backends (blockdev is the first candidate) without >>> devices/drivers in between as OF1275 does not require that and >>> it is quite easy to so. 
>>> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>> --- >>> >>> The example command line is: >>> >>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \ >>> -nodefaults \ >>> -chardev stdio,id=STDIO0,signal=off,mux=on \ >>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \ >>> -mon id=MON0,chardev=STDIO0,mode=readline \ >>> -nographic \ >>> -vga none \ >>> -enable-kvm \ >>> -m 8G \ >>> -machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \ >>> -kernel pbuild/kernel-le-guest/vmlinux \ >>> -initrd pb/rootfs.cpio.xz \ >>> -drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \ >>> -global spapr-nvram.drive=DRIVE0 \ >>> -snapshot \ >>> -smp 8,threads=8 \ >>> -L /home/aik/t/qemu-ppc64-bios/ \ >>> -trace events=qemu_trace_events \ >>> -d guest_errors \ >>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \ >>> -mon chardev=SOCKET0,mode=control >>> >>> --- >>> Changes: >>> v20: >>> * compile vof.bin with -mcpu=power4 for better compatibility >>> * s/std/stw/ in entry.S to make it work on ppc32 >>> * fixed dt_available property to support both 32 and 64bit >>> * shuffled prom_args handling code >>> * do not enforce 32bit in MSR (again, to support 32bit platforms) >>> >> >> [...] 
>>
>>> diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak
>>> index ae0841fa3a18..9fb201dfacfa 100644
>>> --- a/default-configs/devices/ppc64-softmmu.mak
>>> +++ b/default-configs/devices/ppc64-softmmu.mak
>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y
>>> # For pSeries
>>> CONFIG_PSERIES=y
>>> CONFIG_NVDIMM=y
>>> +CONFIG_VOF=y
>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
>>> index e51e0e5e5ac6..964510dfc73d 100644
>>> --- a/hw/ppc/Kconfig
>>> +++ b/hw/ppc/Kconfig
>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC
>>>
>>> config FDT_PPC
>>> bool
>>> +
>>> +config VOF
>>> + bool
>>
>> I think you should just add "select VOF" to config PSERIES section in
>> Kconfig instead of adding it to default-configs/devices/ppc64-softmmu.mak.
>> That should do it, it works in my updated pegasos2 patch:
>
> No, we don't want a "select": PSERIES doesn't require VOF while we
> still support SLOF, and indeed we're quite a ways from being ready to
> even make VOF the default pseries firmware.

Shouldn't you then also make the spapr code that adds x-vof conditional on
CONFIG_VOF, or make sure it cannot be enabled if not compiled in? Otherwise
VOF is always an option, so spapr depends on VOF, and select is the way to
describe that.

Regards,
BALATON Zoltan
On Mon, May 24, 2021 at 11:57:27AM +0200, BALATON Zoltan wrote:
> On Mon, 24 May 2021, David Gibson wrote:
> > On Thu, May 20, 2021 at 11:59:07PM +0200, BALATON Zoltan wrote:
> > > On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
> > > > [...]
> > >
> > > [...]
> > > > > > > diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak > > > > index ae0841fa3a18..9fb201dfacfa 100644 > > > > --- a/default-configs/devices/ppc64-softmmu.mak > > > > +++ b/default-configs/devices/ppc64-softmmu.mak > > > > @@ -9,3 +9,4 @@ CONFIG_POWERNV=y > > > > # For pSeries > > > > CONFIG_PSERIES=y > > > > CONFIG_NVDIMM=y > > > > +CONFIG_VOF=y > > > > diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig > > > > index e51e0e5e5ac6..964510dfc73d 100644 > > > > --- a/hw/ppc/Kconfig > > > > +++ b/hw/ppc/Kconfig > > > > @@ -143,3 +143,6 @@ config FW_CFG_PPC > > > > > > > > config FDT_PPC > > > > bool > > > > + > > > > +config VOF > > > > + bool > > > > > > I think you should just add "select VOF" to config PSERIES section in > > > Kconfig instead of adding it to default-configs/devices/ppc64-softmmu.mak. > > > That should do it, it works in my updated pegasos2 patch: > > > > No, we don't want a "select": PSERIES doesn't require VOF while we > > still support SLOF, and indeed we're quite a ways from being ready to > > even make VOF the default pseries firmware. > > Shouldn't you then also need to make code in spapr adding x-vof conditional > on CONFIG_VOF or make sure it cannot be enabled if not compiled in? > Otherwise it means VOF is always an option so spapr depends on VOF for which > select is the way to describe that. Uh, yes, we probably should.
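One way to "make sure it cannot be enabled if not compiled in" is to have the option setter refuse to turn the feature on when the build lacks it. The sketch below is a standalone illustration of that idea only: Machine and machine_set_vof() are hypothetical names, and the real spapr code would use QEMU's property and Error APIs rather than a plain string. CONFIG_VOF is assumed to be a macro that the build system either defines or leaves undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical machine state holding the x-vof setting. */
typedef struct Machine {
    bool vof_enabled;
} Machine;

/*
 * Setter for the hypothetical "x-vof" option.
 * Returns true on success; on failure stores a message in *errp
 * (if errp is non-NULL) and leaves the feature disabled.
 */
static bool machine_set_vof(Machine *m, bool value, const char **errp)
{
    assert(m != NULL);
#ifdef CONFIG_VOF
    (void)errp;
    m->vof_enabled = value;
    return true;
#else
    if (value) {
        if (errp != NULL) {
            *errp = "VOF support is not compiled in";
        }
        m->vof_enabled = false;
        return false;
    }
    m->vof_enabled = false;     /* turning it off is always fine */
    return true;
#endif
}
```

Whether the guard lives in the setter like this or the option is simply not registered without CONFIG_VOF is a design choice; the sketch shows only the first variant.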
On Mon, 24 May 2021, David Gibson wrote:
> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote:
>> On Sun, 23 May 2021, BALATON Zoltan wrote:
>>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote:
>>>> One thing to note about PCI is that normally I think the client
>>>> expects the firmware to do PCI probing and SLOF does it. But VOF
>>>> does not and Linux scans PCI bus(es) itself. Might be a problem for
>>>> your kernel.
>>>
>>> I'm not sure what info does MorphOS get from the device tree and what it
>>> probes itself but I think it may at least need device ids and info about
>>> the PCI bus to be able to access the config regs, after that it should
>>> set the devices up hopefully. I could add these from the board code to
>>> device tree so VOF does not need to do anything about it. However I'm
>>> not getting to that point yet because it crashes on something that it's
>>> missing and couldn't yet find out what is that.
>>>
>>> I'd like to get Linux working now as that would be enough to test this
>>> and then if for MorphOS we still need a ROM it's not a problem if at
>>> least we can boot Linux without the original firmware. But I can't make
>>> Linux open a serial console and I don't know what it needs for that. Do
>>> you happen to know? I've looked at the sources in Linux/arch/powerpc but
>>> not sure how it would find and open a serial port on pegasos2. It seems
>>> to work with the board firmware and now I can get it to boot with VOF
>>> but then it does not open serial so it probably needs something in the
>>> device tree or expects the firmware to set something up that we should
>>> add in pegasos2.c when using VOF.
>>
>> I've now found that Linux uses rtas methods read-pci-config and
>> write-pci-config for PCI access on pegasos2 so this means that we'll
>> probably need rtas too (I hoped we could get away without it if it were only
>> used for shutdown/reboot or so but seems Linux needs it for PCI as well and
>> does not scan the bus and won't find some devices without it).
>
> Yes, definitely sounds like you'll need an RTAS implementation.

I plan to fix that after I've managed to get serial working, as that seems
not to need it. If I delete the rtas-size property from /rtas on the original
firmware, that makes Linux skip instantiating rtas, but I still get serial
output, it just doesn't access PCI devices. So I think it should work without
rtas, and that keeps things simpler at first. Then I'll try rtas later.

>> While VOF can do rtas, this causes a problem with the hypercall method using
>> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so
>> cannot work after guest is past quiesce.
>
>> So the question is why is that
>> assert there
>
> Ah.. right. So, vhyp was designed for the PAPR use case, where we
> want to model the CPU when it's in supervisor and user mode, but not
> when it's in hypervisor mode. We want qemu to mimic the behaviour of
> the hypervisor, rather than attempting to actually execute hypervisor
> code in the virtual CPU.
>
> On systems that have a hypervisor mode, SDR1 is hypervisor privileged,
> so it makes no sense for the guest to attempt to set it. That should
> be caught by the general SPR code and turned into a 0x700, hence the
> assert() if we somehow reach ppc_store_sdr1().
>
> So, we are seeing a problem here because you want the 'sc 1'
> interception of vhyp, but not the rest of the stuff that goes with it.
>
>> and would using sc 1 for hypercalls on pegasos2 cause other
>> problems later even if the assert could be removed?
>
> At least in the short term, I think you probably can remove the
> assert. In your case the 'sc 1' calls aren't truly to a hypervisor,
> but a special case escape to qemu for the firmware emulation. I think
> it's unlikely to cause problems later, because nothing on a 32-bit
> system should be attempting an 'sc 1'. The only thing I can think of
> that would fail is some test case which explicitly verified that 'sc
> 1' triggered a 0x700 (SIGILL from userspace).

OK so the assert should check if the CPU has an HV bit. I think there was a
#define for that somewhere that I can add to the assert, then I can try that.
What I wasn't sure about is whether sc 1 would conflict with the guest's usage
of normal sc calls, or whether these go through different paths so that only
sc 1 triggers the vhyp callback without affecting normal sc calls. (Or if this
causes an otherwise unnecessary VM exit on KVM even when it works, then maybe
looking for a different way in the future might be needed. But for now, if
this works with modifying the assert to allow this on ppc32, then I'll go for
that as that's the simplest way.)

>> Can somebody who knows
>> more about it explain this please? If this cannot be resolved then we may
>> need a different hypercall method on pegasos2 (I've considered MOL OSI or
>> are there other options? I may use some advice from people who know it
>> better, especially the possible interaction with KVM later as the long term
>> goal with pegasos2 is to be able to run with KVM on PPC hardware
>> eventually.)
>
> Right, you might need an alternative method eventually. Really any
> illegal instruction for your cpu is a possible candidate. Bear in
> mind that this is *not* truly a hypercall interface, instead it's
> something we're special casing for the purposes of faking the
> firmware.
>
> The "attn" instruction used on BookE might be a reasonable candidate
> (assuming it doesn't conflict with something on 32-bit BookS) - that's
> often used for things like signalling the attention of hardware
> debuggers, and this is somewhat akin.
>
> Mostly it's just a matter of working out what would be least messy to
> intercept in the TCG instruction decoding path.

I'll wait for the current ongoing reorganisations to settle for that. If an
alternative is needed I was considering the interface used by Mac on Linux:

https://lists.nongnu.org/archive/html/qemu-ppc/2021-03/msg00047.html

because I think there are some paravirtual drivers that use these on Mac OS X,
so this might also be useful for that use case for Mac emulation. But that
seems very similar, just checking for magic values at a normal syscall, which
means all syscalls will be intercepted anyway. In that case, if sc 1 does not
interfere with normal sc instructions, then it may be better to keep that as
the invalid instruction we trap on.

>> But this also means that if that assert cannot be dropped or
>> there may be other problems with sc 1 hypercalls then we maybe cannot have
>> the same vof.bin and we'll need a separate version that I would like to
>> avoid if possible so if there's a simple way to keep it working or make
>> vof.bin use alternate hypercall method without needing a separate binary
>> that would be the direction I'd tend to go. Even if we need a separate
>> version I'd like to keep as much common as possible.
>>
>> I've tested that the missing rtas is not the reason for getting no output
>> via serial though, as even when disabling rtas on pegasos2.rom it boots and
>> I still get serial output just some PCI devices are not detected (such as
>> USB, the video card and the not emulated ethernet port but these are not
>> fatal so it might even work as a first try without rtas, just to boot a
>> Linux kernel for testing it would be enough if I can fix the serial output).
>> I still don't know why it's not finding serial but I think it may be some
>> missing or wrong info in the device tree I generate. I'll try to focus on
>> this for now and leave the above rtas question for later.
>
> Oh.. another thought on that. You have an ISA serial port on Pegasos,
> I believe. I wonder if the PCI->ISA bridge needs some configuration /
> initialization that the firmware is expected to do. If so you'll need
> to mimic that setup in qemu for the VOF case.

That's what I'm beginning to think, because I've added everything to the
device tree that I thought could be needed and I still don't get it working,
so it may need some config from the firmware. But how do I access device
registers from board code? I've tried adding a machine reset method and
writing to memory mapped device registers but all my attempts failed. I've
tried cpu_stl_le_data and even memory_region_dispatch_write but these did not
get to the device. What's the way to access guest mmio regs from QEMU?

Regards,
BALATON Zoltan
On Mon, 24 May 2021, David Gibson wrote: > On Mon, May 24, 2021 at 02:26:42PM +1000, Alexey Kardashevskiy wrote: >> On 5/23/21 21:24, BALATON Zoltan wrote: >>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>> On 23/05/2021 01:02, BALATON Zoltan wrote: >>>>> On Sat, 22 May 2021, BALATON Zoltan wrote: >>>>>> On Sat, 22 May 2021, Alexey Kardashevskiy wrote: >>>>>>> VOF itself does not prints anything in this patch. >>>>>> >>>>>> However it seems to be needed for linux as the first thing >>>>>> it does seems to be getting /chosen/stdout and calls exit if >>>>>> it returns nothing. So I'll need this at least for linux. (I >>>>>> think MorphOS may also query it to print a banner or some >>>>>> messages but not sure it needs it, at least it does not >>>>>> abort right away if not found.) >>>>>> >>>>>>>> but to see Linux output do I need a stdout in VOF or >>>>>>>> it will just open the serial with its own driver and >>>>>>>> use that? >>>>>>>> So I'm not sure what's the stdout parts in the >>>>>>>> current vof patch does and if I need that for >>>>>>>> anything. I'll try to experiment with it some more >>>>>>>> but fixing the ld and Kconfig seems to be enough to >>>>>>>> get it work for me. >>>>>>> >>>>>>> So for the client to print something, /chosen/stdout >>>>>>> needs to have a valid ihandle. >>>>>>> The only way to get a valid ihandle is having a valid >>>>>>> phandle which vof_client_open() can open. >>>>>>> A valid phandle is a phandle of any node in the device >>>>>>> tree. On spapr we pick some spapr-vty, open it and store >>>>>>> in /chosen/stdout. >>>>>>> >>>>>>> From this point output from the client can be seen via a tracepoint. >>>>> >>>>> I've got it now. Looking at the original firmware device tree dump: >>>>> >>>>> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2/attach/PegasosII_OFW-Dump.txt >>>>> >>>>> I see that /chosen/stdout points to "screen" which is an alias >>>>> to /bootconsole. 
Just adding an empty /bootconsole node in the >>>>> device tree and vof_client_open_store() that as /chosen/stdout >>>>> works and I get output via vof_write traces so this is enough >>>>> for now to test Linux. Properly connecting a serial backend can >>>>> thus be postponed. >>>>> >>>>> So with this the Linux kernel does not abort on the first device >>>>> tree access but starts to decompress itself then the embedded >>>>> initrd and crashes at calling setprop: >>>>> >>>>> [...] >>>>> vof_client_handle: setprop >>>>> >>>>> Thread 4 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault. >>>>> (gdb) bt >>>>> #0 0x0000000000000000 in () >>>>> #1 0x0000555555a5c2bf in vof_setprop >>>>> (vof=0x7ffff48e9420, vallen=4, valaddr=<optimized out>, >>>>> pname=<optimized out>, nodeph=8, fdt=0x7fff8aaff010, >>>>> ms=0x5555564f8800) >>>>> at ../hw/ppc/vof.c:308 >>>>> #2 0x0000555555a5c2bf in vof_client_handle >>>>> (nrets=1, rets=0x7ffff48e93f0, nargs=4, >>>>> args=0x7ffff48e93c0, service=0x7ffff48e9460 "setprop", >>>>> vof=0x7ffff48e9420, fdt=0x7fff8aaff010, ms=0x5555564f8800) >>>>> at ../hw/ppc/vof.c:842 >>>>> #3 0x0000555555a5c2bf in vof_client_call >>>>> (ms=0x5555564f8800, vof=vof@entry=0x55555662a3d0, >>>>> fdt=fdt@entry=0x7fff8aaff010, >>>>> args_real=args_real@entry=23580472) >>>>> at ../hw/ppc/vof.c:935 >>>>> >>>>> loooks like it's trying to set /chosen/linux,initrd-start: >>>> >>>> It is not horribly clear why it crashed though. >>> >>> It crashed becuase I had TYPE_VOF_MACHINE_IF but did not set a setprop >>> callback and it tried to call that here. Adding a {return true;} empty >>> callback avoids this. >> >> >> Ah ok. 
>> >>> >>>>> (gdb) up >>>>> #1 0x0000555555a5c2bf in vof_setprop (vof=0x7ffff48e9420, >>>>> vallen=4, valaddr=<optimized out>, pname=<optimized out>, >>>>> nodeph=8, >>>>> fdt=0x7fff8aaff010, ms=0x5555564f8800) at ../hw/ppc/vof.c:308 >>>>> 308 if (!vmc->setprop(ms, nodepath, propname, val, vallen)) { >>>>> (gdb) p nodepath >>>>> $1 = "/chosen\000\060/rPC,750CXE/", '\000' <repeats 234 times> >>>>> (gdb) p propname >>>>> $2 = "linux,initrd-start\000linux,initrd-end\000linux,cmdline-timeout\000bootarg" >>>>> (gdb) p val >>>>> $3 = <optimized out> >>>>> >>>>> I think I need the callback for setprop in TYPE_VOF_MACHINE_IF. >>>>> I can copy spapr_vof_setprop() but some explanation on why >>>>> that's needed might help. Ciould I just do fdt_setprop in my >>>>> callback as vof_setprop() would do without a machine callback or >>>>> is there some special handling needed for these properties? >>>> >>>> The short answer is yes, you do not need TYPE_VOF_MACHINE_IF. >>>> >>>> The long answer is that we build the FDT on spapr twice: >>>> 1. at the reset time and >>>> 2. after "ibm,client-arhitecture-support" (early in the boot the >>>> spapr paravirtual client says what it supports - ISA level, MMU >>>> features, etc) >>>> >>>> Between 1 and 2 the kernel moves initrd and we do not update the >>>> QEMU's version of its location, the tree at 2) will have the old >>>> values. >>>> >>>> So for that reason I have TYPE_VOF_MACHINE_IF. You most definitely >>>> do not need it. >>> >>> I need TYPE_VOF_MACHINE_IF because that has the quiesce callback that I >>> need to shut VOF down when the guest is finished with it otherwise it >>> would crash later (more on this in next message). >> >> Nah, quiesce() only means stopping IO in VOF. VOF is shut down when the >> client decides to stop using it (and zero that memory). >> >>> But since I shut down VOF here I don't need to remember changes to the >>> FDT so I can just use an empty setprop callback. 
(I wouldn't even need >>> that if VOF would check that a callback is non-NULL before calling it.) >> >> I'll add the check. >> >> I'll need some time to go though the other mails, closer to the weekend, >> there are too many gaps in my knowledge about those 32bit systems. >> >> I am really not sure that you need TYPE_PPC_VIRTUAL_HYPERVISOR (is this just >> to make "sc 1" work? there should be a better way) or RTAS (although it >> looks like you need it for PCI, you likely do not need it for your serial >> device which is ISA which I have no idea how it works). Do you have an >> actual machine? Can you dump its device tree to see what yours is missing? > > IIUC, it's basicaly so that the 'sc 1' instructions can be routed > through to VOF. 'sc 1' is an illegal instruction on ppc32, AFAIK, so > we need some sort of hack here. Yes correct, I'm just using vhyp as that was already there and is the simplest way to get it working without any other changes needed to target/ppc or vof (apart from small changes to vof to make it work on ppc32 and correctly handle ELF with entry != load address that the pegasos2 kernel happens to do which are probably bugs in vof anyway so could be fixed). > vhyp wasn't really designed for this, but I suspect it is the simplest > way to intercept those 'sc 1' calls. > > Unfortunately, shutting it down presents a real problem. Currently > you're relying on quiesce being the last call to OF the client makes. > That's often the case in practice, but not necessarily in all cases, > as you've seen. However, there's no alternative point at which we can > determine that we're done with the client interface. > > My inclination for now would be to just leave the vhyp handler in > place. Strictly speaking it won't give you correct behaviour: later > calls to 'sc 1' will invoke VOF rather than giving a 0x700 exception. > But nothing on a 32-bit system should be attempting 'sc 1' anyway, so > I think it will probably work in practice. 
I agree with that if it works. Until we find a reason to replace it I think this is the simplest way and so far I could get it mostly working. I'll keep trying. Regards, BALATON Zoltan
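For reference, the NULL setprop callback crash discussed above comes down to calling an optional hook without checking it first. Below is a minimal standalone sketch of the kind of check Alexey says he will add. It is not the actual vof.c code: MachineHooks, hooks_allow_setprop() and veto_all_setprop() are hypothetical names made up for illustration, and treating a missing hook as "allowed" is an assumption based on the empty {return true;} callback mentioned in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine interface with optional hooks. */
typedef struct MachineHooks {
    bool (*setprop)(const char *path, const char *name,
                    const void *val, int vallen);
} MachineHooks;

/*
 * Returns true if the property change is accepted.
 * A missing interface or a missing hook is treated as "no veto",
 * which mirrors an empty callback that just returns true.
 */
static bool hooks_allow_setprop(const MachineHooks *hooks,
                                const char *path, const char *name,
                                const void *val, int vallen)
{
    if (!hooks || !hooks->setprop) {
        return true;            /* nothing to call, do not crash */
    }
    return hooks->setprop(path, name, val, vallen);
}

/* Example hook that rejects every change (illustration only). */
static bool veto_all_setprop(const char *path, const char *name,
                             const void *val, int vallen)
{
    (void)path; (void)name; (void)val; (void)vallen;
    return false;
}
```

With a check like this, a board that only wants the quiesce hook could leave setprop unset instead of providing an empty callback.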
On Mon, 24 May 2021, David Gibson wrote: > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >> On Sun, 23 May 2021, BALATON Zoltan wrote: >>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>> One thing to note about PCI is that normally I think the client >>>> expects the firmware to do PCI probing and SLOF does it. But VOF >>>> does not and Linux scans PCI bus(es) itself. Might be a problem for >>>> you kernel. >>> >>> I'm not sure what info does MorphOS get from the device tree and what it >>> probes itself but I think it may at least need device ids and info about >>> the PCI bus to be able to access the config regs, after that it should >>> set the devices up hopefully. I could add these from the board code to >>> device tree so VOF does not need to do anything about it. However I'm >>> not getting to that point yet because it crashes on something that it's >>> missing and couldn't yet find out what is that. >>> >>> I'd like to get Linux working now as that would be enough to test this >>> and then if for MorphOS we still need a ROM it's not a problem if at >>> least we can boot Linux without the original firmware. But I can't make >>> Linux open a serial console and I don't know what it needs for that. Do >>> you happen to know? I've looked at the sources in Linux/arch/powerpc but >>> not sure how it would find and open a serial port on pegasos2. It seems >>> to work with the board firmware and now I can get it to boot with VOF >>> but then it does not open serial so it probably needs something in the >>> device tree or expects the firmware to set something up that we should >>> add in pegasos2.c when using VOF. 
>> >> I've now found that Linux uses rtas methods read-pci-config and >> write-pci-config for PCI access on pegasos2 so this means that we'll >> probably need rtas too (I hoped we could get away without it if it were only >> used for shutdown/reboot or so but seems Linux needs it for PCI as well and >> does not scan the bus and won't find some devices without it). > > Yes, definitely sounds like you'll need an RTAS implementation. > >> While VOF can do rtas, this causes a problem with the hypercall method using >> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so >> cannot work after guest is past quiesce. > >> So the question is why is that >> assert there > > Ah.. right. So, vhyp was designed for the PAPR use case, where we > want to model the CPU when it's in supervisor and user mode, but not > when it's in hypervisor mode. We want qemu to mimic the behaviour of > the hypervisor, rather than attempting to actually execute hypervisor > code in the virtual CPU. > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > so it makes no sense for the guest to attempt to set it. That should > be caught by the general SPR code and turned into a 0x700, hence the > assert() if we somehow reach ppc_store_sdr1().

This seems to work to avoid my problem so I can leave vhyp enabled after quiesce for now:

diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index d957d1a687..13b87b9b36 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -70,7 +70,7 @@ void ppc_store_sdr1(CPUPPCState *env, target_ulong value)
 {
     PowerPCCPU *cpu = env_archcpu(env);
     qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__, value);
-    assert(!cpu->vhyp);
+    assert(!cpu->env.has_hv_mode || !cpu->vhyp);
 #if defined(TARGET_PPC64)
     if (mmu_is_64bit(env->mmu_model)) {
         target_ulong sdr_mask = SDR_64_HTABORG | SDR_64_HTABSIZE;

But I wonder if the assert should also be moved within the TARGET_PPC64 block and if we may need to generate some exception here instead.
Not sure what a real CPU would do in this case, but if accessing sdr1 is privileged in HV mode then there should be an exception, or if that's caught elsewhere then this assert may not be needed at all. I can make a patch if you tell me what it should do. Regards, BALATON Zoltan
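To make the effect of the changed assert in the patch above easier to see, here is a small standalone sketch that evaluates the old and the new condition outside of QEMU. CpuModel and both helper functions are hypothetical names for illustration only. The two flags stand in for cpu->env.has_hv_mode and cpu->vhyp from the patch, reduced to plain booleans.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of the two facts the assert looks at. */
typedef struct CpuModel {
    bool has_hv_mode;   /* CPU implements a hypervisor privilege level */
    bool vhyp;          /* a virtual hypervisor handler is attached */
} CpuModel;

/* Original condition: any SDR1 store with vhyp attached is a bug. */
static bool sdr1_store_ok_original(const CpuModel *cpu)
{
    return !cpu->vhyp;
}

/*
 * Relaxed condition from the patch: only object when the CPU really
 * has a hypervisor mode, where SDR1 is hypervisor privileged.
 */
static bool sdr1_store_ok_relaxed(const CpuModel *cpu)
{
    return !cpu->has_hv_mode || !cpu->vhyp;
}
```

The only combination where the two conditions differ is a CPU without hypervisor mode that has vhyp attached, which is the pegasos2 case described in this thread.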
On 24/05/2021 20:55, BALATON Zoltan wrote: > On Mon, 24 May 2021, David Gibson wrote: >> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>>> One thing to note about PCI is that normally I think the client >>>>> expects the firmware to do PCI probing and SLOF does it. But VOF >>>>> does not and Linux scans PCI bus(es) itself. Might be a problem for >>>>> you kernel. >>>> >>>> I'm not sure what info does MorphOS get from the device tree and >>>> what it >>>> probes itself but I think it may at least need device ids and info >>>> about >>>> the PCI bus to be able to access the config regs, after that it should >>>> set the devices up hopefully. I could add these from the board code to >>>> device tree so VOF does not need to do anything about it. However I'm >>>> not getting to that point yet because it crashes on something that it's >>>> missing and couldn't yet find out what is that. >>>> >>>> I'd like to get Linux working now as that would be enough to test this >>>> and then if for MorphOS we still need a ROM it's not a problem if at >>>> least we can boot Linux without the original firmware. But I can't make >>>> Linux open a serial console and I don't know what it needs for that. Do >>>> you happen to know? I've looked at the sources in Linux/arch/powerpc >>>> but >>>> not sure how it would find and open a serial port on pegasos2. It seems >>>> to work with the board firmware and now I can get it to boot with VOF >>>> but then it does not open serial so it probably needs something in the >>>> device tree or expects the firmware to set something up that we should >>>> add in pegasos2.c when using VOF. 
>>> >>> I've now found that Linux uses rtas methods read-pci-config and >>> write-pci-config for PCI access on pegasos2 so this means that we'll >>> probably need rtas too (I hoped we could get away without it if it >>> were only >>> used for shutdown/reboot or so but seems Linux needs it for PCI as >>> well and >>> does not scan the bus and won't find some devices without it). >> >> Yes, definitely sounds like you'll need an RTAS implementation. > > I plan to fix that after managed to get serial working as that seems to > not need it. If I delete the rtas-size property from /rtas on the > original firmware that makes Linux skip instantiating rtas, but I still > get serial output just not accessing PCI devices. So I think it should > work and keeps things simpler at first. Then I'll try rtas later. > >>> While VOF can do rtas, this causes a problem with the hypercall >>> method using >>> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so >>> cannot work after guest is past quiesce. >> >>> So the question is why is that >>> assert there >> >> Ah.. right. So, vhyp was designed for the PAPR use case, where we >> want to model the CPU when it's in supervisor and user mode, but not >> when it's in hypervisor mode. We want qemu to mimic the behaviour of >> the hypervisor, rather than attempting to actually execute hypervisor >> code in the virtual CPU. >> >> On systems that have a hypervisor mode, SDR1 is hypervisor privileged, >> so it makes no sense for the guest to attempt to set it. That should >> be caught by the general SPR code and turned into a 0x700, hence the >> assert() if we somehow reach ppc_store_sdr1(). >> >> So, we are seeing a problem here because you want the 'sc 1' >> interception of vhyp, but not the rest of the stuff that goes with it. >> >>> and would using sc 1 for hypercalls on pegasos2 cause other >>> problems later even if the assert could be removed? 
>> >> At least in the short term, I think you probably can remove the >> assert. In your case the 'sc 1' calls aren't truly to a hypervisor, >> but a special case escape to qemu for the firmware emulation. I think >> it's unlikely to cause problems later, because nothing on a 32-bit >> system should be attempting an 'sc 1'. The only thing I can think of >> that would fail is some test case which explicitly verified that 'sc >> 1' triggered a 0x700 (SIGILL from userspace). > > OK so the assert should check if the CPU has an HV bit. I think there > was a #detine for that somewhere that I can add to the assert then I can > try that. What I wasn't sure about is that sc 1 would conflict with the > guest's usage of normal sc calls or are these going through different > paths and only sc 1 will trigger vhyp callback not affecting notmal sc > calls? (Or if this causes an otherwise unnecessary VM exit on KVM even > when it works then maybe looking for a different way in the future might > be needed. But for now if this works with modifying the assert to allow > this on ppc32 then I go for that as that's the simplest way for now.) > >>> Can somebody who knows >>> more about it explain this please? If this cannot be resolved then we >>> may >>> need a different hypercall method on pegasos2 (I've considered MOL >>> OSI or >>> are there other options? I may use some advice from people who know it >>> better, especially the possible interaction with KVM later as the >>> long term >>> goal with pegasos2 is to be able to run with KVM on PPC hardware >>> eventually.) >> >> Right, you might need an alternative method eventually. Really any >> illegal instruction for your cpu is a possible candidate. Bear in >> mind that this is *not* truly a hypercall interface, instead it's >> something we're special casing for the purposes of faking the >> firmware. 
>> >> The "attn" instruction used on BookE might be a reasonable candidate >> (assuming it doesn't conflict with something on 32-bit BookS) - that's >> often used for things like signalling the attention of hardware >> debuggers, and this is somewhat akin. >> >> Mostly it's just a matter of working out what would be least messy to >> intercept in the TCG instruction decoding path. > > I'll wait for the current ongoing reorganisations to settle for that. If > an alternative is needed I was considering the interface used by Mac on > Linux: > > https://lists.nongnu.org/archive/html/qemu-ppc/2021-03/msg00047.html > > becuase there are some paravirtual drivers I think that use these on Mac > OS X so this might also be useful for that use case for Mac emulation. > But that seems very similar just checking for magic values at a normal > syscall which means all syscalls will be intercepted anyway. In that > case if sc 1 does not interfere with normal sc instructions then it may > be better to keep that as the invalid instruction we trap on. > >>> But this also means that if that assert cannot be dropped or >>> there may be other problems with sc 1 hypercalls then we maybe cannot >>> have >>> the same vof.bin and we'll need a separate version that I would like to >>> avoid if possible so if there's a simple way to keep it working or make >>> vof.bin use alternate hypercall method without needing a separate binary >>> that would be the direction I'd tend to go. Even if we need a seoarate >>> version I'd like to keep as much common as possible. 
>>> >>> I've tested that the missing rtas is not the reason for getting no >>> output >>> via serial though, as even when disabling rtas on pegasos2.rom it >>> boots and >>> I still get serial output just some PCI devices are not detected >>> (such as >>> USB, the video card and the not emulated ethernet port but these are not >>> fatal so it might even work as a first try without rtas, just to boot a >>> Linux kernel for testing it would be enough if I can fix the serial >>> output). >>> I still don't know why it's not finding serial but I think it may be >>> some >>> missing or wrong info in the device tree I generat. I'll try to focus on >>> this for now and leave the above rtas question for later. >> >> Oh.. another thought on that. You have an ISA serial port on Pegasos, >> I believe. I wonder if the PCI->ISA bridge needs some configuration / >> initialization that the firmware is expected to do. If so you'll need >> to mimic that setup in qemu for the VOF case. > > That's what I begin to think because I've added everything to the device > tree that I thought could be needed and I still don't get it working so > it may need some config from the firmware. But how do I access device > registers from board code? I've tried adding a machine reset method and > write to memory mapped device registers but all my attempts failed. I've > tried cpu_stl_le_data and even memory_region_dispatch_write but these > did not get to the device. What's the way to access guest mmio regs from > QEMU? If we know that that serial is sitting behind PCI->ISA bridge (is it?), I think you need to assign a BAR to that bridge, do some ISA setup (no idea which) and enable that bridge (write MEMORY to PCI_COMMAND), this should enable its registers. In pseries we add "linux,pci-probe-only"=0 which makes Linux do all the above instead of relying on the firmware doing BAR assignment.
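The "write MEMORY to PCI_COMMAND" step suggested above can be sketched as a read-modify-write on the 16-bit command register in a device's config space. This is a standalone model over a plain byte array, not QEMU code: cfg_read16(), cfg_write16() and pci_enable_decoding() are hypothetical helpers. The register offset and bit values follow the standard PCI config header layout. How the bridge's config space is actually reached from board code is not covered here.

```c
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE 256
#define PCI_COMMAND        0x04  /* 16-bit command register offset */
#define PCI_COMMAND_IO     0x1   /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY 0x2   /* respond to memory space accesses */

/* PCI config space is little-endian regardless of host byte order. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, unsigned off, uint16_t val)
{
    cfg[off] = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Read-modify-write so the other command bits stay untouched. */
static void pci_enable_decoding(uint8_t *cfg, uint16_t bits)
{
    cfg_write16(cfg, PCI_COMMAND,
                (uint16_t)(cfg_read16(cfg, PCI_COMMAND) | bits));
}
```

For an ISA bridge whose legacy devices live in I/O space, setting PCI_COMMAND_IO together with PCI_COMMAND_MEMORY would be the natural choice, though which bits the Pegasos bridge needs is an assumption here.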
On Mon, 24 May 2021, Alexey Kardashevskiy wrote: > On 24/05/2021 20:55, BALATON Zoltan wrote: >> On Mon, 24 May 2021, David Gibson wrote: >>> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>>>> One thing to note about PCI is that normally I think the client >>>>>> expects the firmware to do PCI probing and SLOF does it. But VOF >>>>>> does not and Linux scans PCI bus(es) itself. Might be a problem for >>>>>> you kernel. >>>>> >>>>> I'm not sure what info does MorphOS get from the device tree and what it >>>>> probes itself but I think it may at least need device ids and info about >>>>> the PCI bus to be able to access the config regs, after that it should >>>>> set the devices up hopefully. I could add these from the board code to >>>>> device tree so VOF does not need to do anything about it. However I'm >>>>> not getting to that point yet because it crashes on something that it's >>>>> missing and couldn't yet find out what is that. >>>>> >>>>> I'd like to get Linux working now as that would be enough to test this >>>>> and then if for MorphOS we still need a ROM it's not a problem if at >>>>> least we can boot Linux without the original firmware. But I can't make >>>>> Linux open a serial console and I don't know what it needs for that. Do >>>>> you happen to know? I've looked at the sources in Linux/arch/powerpc but >>>>> not sure how it would find and open a serial port on pegasos2. It seems >>>>> to work with the board firmware and now I can get it to boot with VOF >>>>> but then it does not open serial so it probably needs something in the >>>>> device tree or expects the firmware to set something up that we should >>>>> add in pegasos2.c when using VOF. 
>>>> >>>> I've now found that Linux uses rtas methods read-pci-config and >>>> write-pci-config for PCI access on pegasos2 so this means that we'll >>>> probably need rtas too (I hoped we could get away without it if it were >>>> only >>>> used for shutdown/reboot or so but seems Linux needs it for PCI as well >>>> and >>>> does not scan the bus and won't find some devices without it). >>> >>> Yes, definitely sounds like you'll need an RTAS implementation. >> >> I plan to fix that after managed to get serial working as that seems to not >> need it. If I delete the rtas-size property from /rtas on the original >> firmware that makes Linux skip instantiating rtas, but I still get serial >> output just not accessing PCI devices. So I think it should work and keeps >> things simpler at first. Then I'll try rtas later. >> >>>> While VOF can do rtas, this causes a problem with the hypercall method >>>> using >>>> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so >>>> cannot work after guest is past quiesce. >>> >>>> So the question is why is that >>>> assert there >>> >>> Ah.. right. So, vhyp was designed for the PAPR use case, where we >>> want to model the CPU when it's in supervisor and user mode, but not >>> when it's in hypervisor mode. We want qemu to mimic the behaviour of >>> the hypervisor, rather than attempting to actually execute hypervisor >>> code in the virtual CPU. >>> >>> On systems that have a hypervisor mode, SDR1 is hypervisor privileged, >>> so it makes no sense for the guest to attempt to set it. That should >>> be caught by the general SPR code and turned into a 0x700, hence the >>> assert() if we somehow reach ppc_store_sdr1(). >>> >>> So, we are seeing a problem here because you want the 'sc 1' >>> interception of vhyp, but not the rest of the stuff that goes with it. >>> >>>> and would using sc 1 for hypercalls on pegasos2 cause other >>>> problems later even if the assert could be removed? 
>>> >>> At least in the short term, I think you probably can remove the >>> assert. In your case the 'sc 1' calls aren't truly to a hypervisor, >>> but a special case escape to qemu for the firmware emulation. I think >>> it's unlikely to cause problems later, because nothing on a 32-bit >>> system should be attempting an 'sc 1'. The only thing I can think of >>> that would fail is some test case which explicitly verified that 'sc >>> 1' triggered a 0x700 (SIGILL from userspace). >> >> OK so the assert should check if the CPU has an HV bit. I think there was a >> #detine for that somewhere that I can add to the assert then I can try >> that. What I wasn't sure about is that sc 1 would conflict with the guest's >> usage of normal sc calls or are these going through different paths and >> only sc 1 will trigger vhyp callback not affecting notmal sc calls? (Or if >> this causes an otherwise unnecessary VM exit on KVM even when it works then >> maybe looking for a different way in the future might be needed. But for >> now if this works with modifying the assert to allow this on ppc32 then I >> go for that as that's the simplest way for now.) >> >>>> Can somebody who knows >>>> more about it explain this please? If this cannot be resolved then we may >>>> need a different hypercall method on pegasos2 (I've considered MOL OSI or >>>> are there other options? I may use some advice from people who know it >>>> better, especially the possible interaction with KVM later as the long >>>> term >>>> goal with pegasos2 is to be able to run with KVM on PPC hardware >>>> eventually.) >>> >>> Right, you might need an alternative method eventually. Really any >>> illegal instruction for your cpu is a possible candidate. Bear in >>> mind that this is *not* truly a hypercall interface, instead it's >>> something we're special casing for the purposes of faking the >>> firmware. 
>>> >>> The "attn" instruction used on BookE might be a reasonable candidate >>> (assuming it doesn't conflict with something on 32-bit BookS) - that's >>> often used for things like signalling the attention of hardware >>> debuggers, and this is somewhat akin. >>> >>> Mostly it's just a matter of working out what would be least messy to >>> intercept in the TCG instruction decoding path. >> >> I'll wait for the current ongoing reorganisations to settle for that. If an >> alternative is needed I was considering the interface used by Mac on Linux: >> >> https://lists.nongnu.org/archive/html/qemu-ppc/2021-03/msg00047.html >> >> becuase there are some paravirtual drivers I think that use these on Mac OS >> X so this might also be useful for that use case for Mac emulation. But >> that seems very similar just checking for magic values at a normal syscall >> which means all syscalls will be intercepted anyway. In that case if sc 1 >> does not interfere with normal sc instructions then it may be better to >> keep that as the invalid instruction we trap on. >> >>>> But this also means that if that assert cannot be dropped or >>>> there may be other problems with sc 1 hypercalls then we maybe cannot >>>> have >>>> the same vof.bin and we'll need a separate version that I would like to >>>> avoid if possible so if there's a simple way to keep it working or make >>>> vof.bin use alternate hypercall method without needing a separate binary >>>> that would be the direction I'd tend to go. Even if we need a seoarate >>>> version I'd like to keep as much common as possible. 
>>>> >>>> I've tested that the missing rtas is not the reason for getting no output >>>> via serial though, as even when disabling rtas on pegasos2.rom it boots >>>> and >>>> I still get serial output just some PCI devices are not detected (such as >>>> USB, the video card and the not emulated ethernet port but these are not >>>> fatal so it might even work as a first try without rtas, just to boot a >>>> Linux kernel for testing it would be enough if I can fix the serial >>>> output). >>>> I still don't know why it's not finding serial but I think it may be some >>>> missing or wrong info in the device tree I generat. I'll try to focus on >>>> this for now and leave the above rtas question for later. >>> >>> Oh.. another thought on that. You have an ISA serial port on Pegasos, >>> I believe. I wonder if the PCI->ISA bridge needs some configuration / >>> initialization that the firmware is expected to do. If so you'll need >>> to mimic that setup in qemu for the VOF case. >> >> That's what I begin to think because I've added everything to the device >> tree that I thought could be needed and I still don't get it working so it >> may need some config from the firmware. But how do I access device >> registers from board code? I've tried adding a machine reset method and >> write to memory mapped device registers but all my attempts failed. I've >> tried cpu_stl_le_data and even memory_region_dispatch_write but these did >> not get to the device. What's the way to access guest mmio regs from QEMU? > > If we know that that serial is sitting behind PCI->ISA bridge (is it?), I > think you need to assign a BAR to that bridge, do some ISA setup (no idea > which) and enable that bridge (write MEMORY to PCI_COMMAND), this should > enable its registers. > > In pseries we add "linux,pci-probe-only"=0 which makes Linux do all the above > instead of relying on the firmware doing BAR assignment. Turns out it was not that. 
Configuring the serial device is not implemented in the ISA bridge model because it could not be done cleanly (I had a hacky way that was rejected), so the port is always enabled and the other defaults seem to work, at least for getting serial output without further device configuration. What was missing is a bus-range property in the device tree for the /pci nodes. That's seemingly unrelated, but Linux needs it to get past trying to detect PCI, even if it can't probe devices without rtas; without it, Linux never gets to write anything to serial even if it detects the port. Also, the order in which PCI buses are added to the device tree seems to matter regardless of their properties, or there's still some problem with this bus-range property that I'll need to check again. But now I can get serial output with Linux under VOF and it boots, although I will need to implement rtas for PCI devices and RTC access. I've started with the general rtas callbacks infrastructure but I'll need to implement PCI access methods. (MorphOS is still not happy with it, maybe it needs more device info in the device tree, but as long as Linux boots with it I don't care, as those who want MorphOS could use a firmware ROM image.) Regards, BALATON Zoltan
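As an illustration of the bus-range property mentioned above: in a flattened device tree it is two 32-bit big-endian cells, the first and last bus number behind the host bridge. The sketch below only builds those 8 bytes. put_be32() and encode_bus_range() are hypothetical helpers, and the actual values for the pegasos2 /pci nodes are not given in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Device tree cells are stored big-endian. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Encode bus-range = <first last> into out[].
 * Returns the property length in bytes (always 8).
 */
static size_t encode_bus_range(uint8_t out[8], uint32_t first, uint32_t last)
{
    put_be32(out, first);
    put_be32(out + 4, last);
    return 8;
}
```

The resulting buffer and length are what one would hand to fdt_setprop() for the "bus-range" property of a /pci node.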
On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: > On Mon, 24 May 2021, David Gibson wrote: > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > > > One thing to note about PCI is that normally I think the client > > > > > expects the firmware to do PCI probing and SLOF does it. But VOF > > > > > does not and Linux scans PCI bus(es) itself. Might be a problem for > > > > > you kernel. > > > > > > > > I'm not sure what info does MorphOS get from the device tree and what it > > > > probes itself but I think it may at least need device ids and info about > > > > the PCI bus to be able to access the config regs, after that it should > > > > set the devices up hopefully. I could add these from the board code to > > > > device tree so VOF does not need to do anything about it. However I'm > > > > not getting to that point yet because it crashes on something that it's > > > > missing and couldn't yet find out what is that. > > > > > > > > I'd like to get Linux working now as that would be enough to test this > > > > and then if for MorphOS we still need a ROM it's not a problem if at > > > > least we can boot Linux without the original firmware. But I can't make > > > > Linux open a serial console and I don't know what it needs for that. Do > > > > you happen to know? I've looked at the sources in Linux/arch/powerpc but > > > > not sure how it would find and open a serial port on pegasos2. It seems > > > > to work with the board firmware and now I can get it to boot with VOF > > > > but then it does not open serial so it probably needs something in the > > > > device tree or expects the firmware to set something up that we should > > > > add in pegasos2.c when using VOF. 
> > > > > > I've now found that Linux uses rtas methods read-pci-config and > > > write-pci-config for PCI access on pegasos2 so this means that we'll > > > probably need rtas too (I hoped we could get away without it if it were only > > > used for shutdown/reboot or so but seems Linux needs it for PCI as well and > > > does not scan the bus and won't find some devices without it). > > > > Yes, definitely sounds like you'll need an RTAS implementation. > > I plan to fix that after managed to get serial working as that seems to not > need it. If I delete the rtas-size property from /rtas on the original > firmware that makes Linux skip instantiating rtas, but I still get serial > output just not accessing PCI devices. So I think it should work and keeps > things simpler at first. Then I'll try rtas later. > > > > While VOF can do rtas, this causes a problem with the hypercall method using > > > sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so > > > cannot work after guest is past quiesce. > > > > > So the question is why is that > > > assert there > > > > Ah.. right. So, vhyp was designed for the PAPR use case, where we > > want to model the CPU when it's in supervisor and user mode, but not > > when it's in hypervisor mode. We want qemu to mimic the behaviour of > > the hypervisor, rather than attempting to actually execute hypervisor > > code in the virtual CPU. > > > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > > so it makes no sense for the guest to attempt to set it. That should > > be caught by the general SPR code and turned into a 0x700, hence the > > assert() if we somehow reach ppc_store_sdr1(). > > > > So, we are seeing a problem here because you want the 'sc 1' > > interception of vhyp, but not the rest of the stuff that goes with it. > > > > > and would using sc 1 for hypercalls on pegasos2 cause other > > > problems later even if the assert could be removed? 
> > > > At least in the short term, I think you probably can remove the > > assert. In your case the 'sc 1' calls aren't truly to a hypervisor, > > but a special case escape to qemu for the firmware emulation. I think > > it's unlikely to cause problems later, because nothing on a 32-bit > > system should be attempting an 'sc 1'. The only thing I can think of > > that would fail is some test case which explicitly verified that 'sc > > 1' triggered a 0x700 (SIGILL from userspace). > > OK so the assert should check if the CPU has an HV bit. I think there was a > #define for that somewhere that I can add to the assert then I can try that. > What I wasn't sure about is whether sc 1 would conflict with the guest's usage > of normal sc calls or are these going through different paths and only sc 1 > will trigger vhyp callback not affecting normal sc calls?

The vhyp shouldn't affect normal system calls: 'sc 1' is specifically for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the vhyp only intercepts the hypercall version (after all, Linux on PAPR certainly uses its own system calls, and hypercalls are active for the lifetime of the guest there).

> (Or if this causes > an otherwise unnecessary VM exit on KVM even when it works then maybe > looking for a different way in the future might be needed.

What you're doing here won't work with KVM as it stands. There are basically two ways into the vhyp hypercall path: 1) from TCG, if we interpret an 'sc 1' instruction we enter vhyp; 2) from KVM, if we get a KVM_EXIT_PAPR_HCALL exit then we also go to the vhyp path.

The second path is specific to the PAPR (ppc64) implementation of KVM, and will not work for a non-PAPR platform without substantial modification of the KVM code.

> But for now if > this works with modifying the assert to allow this on ppc32 then I go for > that as that's the simplest way for now.) > > > > Can somebody who knows > > > more about it explain this please?
If this cannot be resolved then we may > > > need a different hypercall method on pegasos2 (I've considered MOL OSI or > > > are there other options? I may use some advice from people who know it > > > better, especially the possible interaction with KVM later as the long term > > > goal with pegasos2 is to be able to run with KVM on PPC hardware > > > eventually.) > > > > Right, you might need an alternative method eventually. Really any > > illegal instruction for your cpu is a possible candidate. Bear in > > mind that this is *not* truly a hypercall interface, instead it's > > something we're special casing for the purposes of faking the > > firmware. > > > > The "attn" instruction used on BookE might be a reasonable candidate > > (assuming it doesn't conflict with something on 32-bit BookS) - that's > > often used for things like signalling the attention of hardware > > debuggers, and this is somewhat akin. > > > > Mostly it's just a matter of working out what would be least messy to > > intercept in the TCG instruction decoding path. > > I'll wait for the current ongoing reorganisations to settle for that. If an > alternative is needed I was considering the interface used by Mac on Linux: > > https://lists.nongnu.org/archive/html/qemu-ppc/2021-03/msg00047.html > > becuase there are some paravirtual drivers I think that use these on Mac OS > X so this might also be useful for that use case for Mac emulation. But that > seems very similar just checking for magic values at a normal syscall which > means all syscalls will be intercepted anyway. In that case if sc 1 does not > interfere with normal sc instructions then it may be better to keep that as > the invalid instruction we trap on. 
> > > > But this also means that if that assert cannot be dropped or > > > there may be other problems with sc 1 hypercalls then we maybe cannot have > > > the same vof.bin and we'll need a separate version that I would like to > > > avoid if possible so if there's a simple way to keep it working or make > > > vof.bin use alternate hypercall method without needing a separate binary > > > that would be the direction I'd tend to go. Even if we need a seoarate > > > version I'd like to keep as much common as possible. > > > > > > I've tested that the missing rtas is not the reason for getting no output > > > via serial though, as even when disabling rtas on pegasos2.rom it boots and > > > I still get serial output just some PCI devices are not detected (such as > > > USB, the video card and the not emulated ethernet port but these are not > > > fatal so it might even work as a first try without rtas, just to boot a > > > Linux kernel for testing it would be enough if I can fix the serial output). > > > I still don't know why it's not finding serial but I think it may be some > > > missing or wrong info in the device tree I generat. I'll try to focus on > > > this for now and leave the above rtas question for later. > > > > Oh.. another thought on that. You have an ISA serial port on Pegasos, > > I believe. I wonder if the PCI->ISA bridge needs some configuration / > > initialization that the firmware is expected to do. If so you'll need > > to mimic that setup in qemu for the VOF case. > > That's what I begin to think because I've added everything to the device > tree that I thought could be needed and I still don't get it working so it > may need some config from the firmware. But how do I access device registers > from board code? I've tried adding a machine reset method and write to > memory mapped device registers but all my attempts failed. I've tried > cpu_stl_le_data and even memory_region_dispatch_write but these did not get > to the device. 
> What's the way to access guest mmio regs from QEMU?

That's odd, cpu_stl() and memory_region_dispatch_write() should work from board code (after the relevant memory regions are configured, of course). As an ISA serial port, it's probably accessed through IO space, not memory space, though, so you'd need &address_space_io. And if there is some bridge configuration, then it's the bridge control registers you need to look at, not the serial registers - you'd have to look at the bridge documentation for that. Or, I guess, the bridge implementation in qemu, which you wrote part of.
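To illustrate the point made earlier in this message that plain 'sc' and 'sc 1' are told apart before anything reaches vhyp, here is a minimal standalone sketch that decodes the LEV field of an 'sc' instruction word. The function and enum names are hypothetical and this is not QEMU's decoder; it only assumes the standard SC-form encoding (primary opcode 17, LEV in bits 20:26).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode 17 with bit 30 (IBM numbering) set identifies 'sc'. */
static bool is_sc(uint32_t insn)
{
    return (insn >> 26) == 17 && (insn & 0x2) != 0;
}

/* Extract the 7-bit LEV field (IBM bits 20:26, i.e. shifted right by 5). */
static unsigned sc_lev(uint32_t insn)
{
    assert(is_sc(insn));
    return (insn >> 5) & 0x7f;
}

/* Hypothetical classification: LEV == 1 is the hypercall form. */
enum sc_kind { SC_SYSCALL, SC_HYPERCALL };

static enum sc_kind classify_sc(uint32_t insn)
{
    return sc_lev(insn) == 1 ? SC_HYPERCALL : SC_SYSCALL;
}
```

With this model, 0x44000002 (plain 'sc') classifies as a normal system call and 0x44000022 ('sc 1') as a hypercall, so intercepting the latter leaves the guest's own system calls alone.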
On Mon, May 24, 2021 at 10:46:26PM +1000, Alexey Kardashevskiy wrote: > > > On 24/05/2021 20:55, BALATON Zoltan wrote: > > On Mon, 24 May 2021, David Gibson wrote: > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > > > > One thing to note about PCI is that normally I think the client > > > > > > expects the firmware to do PCI probing and SLOF does it. But VOF > > > > > > does not and Linux scans PCI bus(es) itself. Might be a problem for > > > > > > you kernel. > > > > > > > > > > I'm not sure what info does MorphOS get from the device tree > > > > > and what it > > > > > probes itself but I think it may at least need device ids > > > > > and info about > > > > > the PCI bus to be able to access the config regs, after that it should > > > > > set the devices up hopefully. I could add these from the board code to > > > > > device tree so VOF does not need to do anything about it. However I'm > > > > > not getting to that point yet because it crashes on something that it's > > > > > missing and couldn't yet find out what is that. > > > > > > > > > > I'd like to get Linux working now as that would be enough to test this > > > > > and then if for MorphOS we still need a ROM it's not a problem if at > > > > > least we can boot Linux without the original firmware. But I can't make > > > > > Linux open a serial console and I don't know what it needs for that. Do > > > > > you happen to know? I've looked at the sources in > > > > > Linux/arch/powerpc but > > > > > not sure how it would find and open a serial port on pegasos2. It seems > > > > > to work with the board firmware and now I can get it to boot with VOF > > > > > but then it does not open serial so it probably needs something in the > > > > > device tree or expects the firmware to set something up that we should > > > > > add in pegasos2.c when using VOF. 
> > > > > > > > I've now found that Linux uses rtas methods read-pci-config and > > > > write-pci-config for PCI access on pegasos2 so this means that we'll > > > > probably need rtas too (I hoped we could get away without it if > > > > it were only > > > > used for shutdown/reboot or so but seems Linux needs it for PCI > > > > as well and > > > > does not scan the bus and won't find some devices without it). > > > > > > Yes, definitely sounds like you'll need an RTAS implementation. > > > > I plan to fix that after managed to get serial working as that seems to > > not need it. If I delete the rtas-size property from /rtas on the > > original firmware that makes Linux skip instantiating rtas, but I still > > get serial output just not accessing PCI devices. So I think it should > > work and keeps things simpler at first. Then I'll try rtas later. > > > > > > While VOF can do rtas, this causes a problem with the hypercall > > > > method using > > > > sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so > > > > cannot work after guest is past quiesce. > > > > > > > So the question is why is that > > > > assert there > > > > > > Ah.. right. So, vhyp was designed for the PAPR use case, where we > > > want to model the CPU when it's in supervisor and user mode, but not > > > when it's in hypervisor mode. We want qemu to mimic the behaviour of > > > the hypervisor, rather than attempting to actually execute hypervisor > > > code in the virtual CPU. > > > > > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > > > so it makes no sense for the guest to attempt to set it. That should > > > be caught by the general SPR code and turned into a 0x700, hence the > > > assert() if we somehow reach ppc_store_sdr1(). > > > > > > So, we are seeing a problem here because you want the 'sc 1' > > > interception of vhyp, but not the rest of the stuff that goes with it. 
> > > > > > > and would using sc 1 for hypercalls on pegasos2 cause other > > > > problems later even if the assert could be removed? > > > > > > At least in the short term, I think you probably can remove the > > > assert. In your case the 'sc 1' calls aren't truly to a hypervisor, > > > but a special case escape to qemu for the firmware emulation. I think > > > it's unlikely to cause problems later, because nothing on a 32-bit > > > system should be attempting an 'sc 1'. The only thing I can think of > > > that would fail is some test case which explicitly verified that 'sc > > > 1' triggered a 0x700 (SIGILL from userspace). > > > > OK so the assert should check if the CPU has an HV bit. I think there > > was a #detine for that somewhere that I can add to the assert then I can > > try that. What I wasn't sure about is that sc 1 would conflict with the > > guest's usage of normal sc calls or are these going through different > > paths and only sc 1 will trigger vhyp callback not affecting notmal sc > > calls? (Or if this causes an otherwise unnecessary VM exit on KVM even > > when it works then maybe looking for a different way in the future might > > be needed. But for now if this works with modifying the assert to allow > > this on ppc32 then I go for that as that's the simplest way for now.) > > > > > > Can somebody who knows > > > > more about it explain this please? If this cannot be resolved > > > > then we may > > > > need a different hypercall method on pegasos2 (I've considered > > > > MOL OSI or > > > > are there other options? I may use some advice from people who know it > > > > better, especially the possible interaction with KVM later as > > > > the long term > > > > goal with pegasos2 is to be able to run with KVM on PPC hardware > > > > eventually.) > > > > > > Right, you might need an alternative method eventually. Really any > > > illegal instruction for your cpu is a possible candidate. 
Bear in > > > mind that this is *not* truly a hypercall interface, instead it's > > > something we're special casing for the purposes of faking the > > > firmware. > > > > > > The "attn" instruction used on BookE might be a reasonable candidate > > > (assuming it doesn't conflict with something on 32-bit BookS) - that's > > > often used for things like signalling the attention of hardware > > > debuggers, and this is somewhat akin. > > > > > > Mostly it's just a matter of working out what would be least messy to > > > intercept in the TCG instruction decoding path. > > > > I'll wait for the current ongoing reorganisations to settle for that. If > > an alternative is needed I was considering the interface used by Mac on > > Linux: > > > > https://lists.nongnu.org/archive/html/qemu-ppc/2021-03/msg00047.html > > > > becuase there are some paravirtual drivers I think that use these on Mac > > OS X so this might also be useful for that use case for Mac emulation. > > But that seems very similar just checking for magic values at a normal > > syscall which means all syscalls will be intercepted anyway. In that > > case if sc 1 does not interfere with normal sc instructions then it may > > be better to keep that as the invalid instruction we trap on. > > > > > > But this also means that if that assert cannot be dropped or > > > > there may be other problems with sc 1 hypercalls then we maybe > > > > cannot have > > > > the same vof.bin and we'll need a separate version that I would like to > > > > avoid if possible so if there's a simple way to keep it working or make > > > > vof.bin use alternate hypercall method without needing a separate binary > > > > that would be the direction I'd tend to go. Even if we need a seoarate > > > > version I'd like to keep as much common as possible. 
> > > > > > > > I've tested that the missing rtas is not the reason for getting > > > > no output > > > > via serial though, as even when disabling rtas on pegasos2.rom > > > > it boots and > > > > I still get serial output just some PCI devices are not detected > > > > (such as > > > > USB, the video card and the not emulated ethernet port but these are not > > > > fatal so it might even work as a first try without rtas, just to boot a > > > > Linux kernel for testing it would be enough if I can fix the > > > > serial output). > > > > I still don't know why it's not finding serial but I think it > > > > may be some > > > > missing or wrong info in the device tree I generat. I'll try to focus on > > > > this for now and leave the above rtas question for later. > > > > > > Oh.. another thought on that. You have an ISA serial port on Pegasos, > > > I believe. I wonder if the PCI->ISA bridge needs some configuration / > > > initialization that the firmware is expected to do. If so you'll need > > > to mimic that setup in qemu for the VOF case. > > > > That's what I begin to think because I've added everything to the device > > tree that I thought could be needed and I still don't get it working so > > it may need some config from the firmware. But how do I access device > > registers from board code? I've tried adding a machine reset method and > > write to memory mapped device registers but all my attempts failed. I've > > tried cpu_stl_le_data and even memory_region_dispatch_write but these > > did not get to the device. What's the way to access guest mmio regs from > > QEMU? > > If we know that that serial is sitting behind PCI->ISA bridge (is it?), I Uh.. maybe. I think ISA bridges at least sometimes behave differently from regular PCI devices or bridges, because legacy. Also note that it's probably IO space you need to map in, not MMIO space. 
> think you need to assign a BAR to that bridge, do some ISA setup (no idea > which) and enable that bridge (write MEMORY to PCI_COMMAND), this should > enable its registers. > > In pseries we add "linux,pci-probe-only"=0 which makes Linux do all the > above instead of relying on the firmware doing BAR assignment. > >
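As a rough illustration of the "enable that bridge" step quoted above, the sketch below sets the decode-enable bits in the PCI command register of a modelled 256-byte config space. It is only a sketch under assumptions: the helper names are hypothetical, the buffer stands in for a real device's config space, and whether the Pegasos2 bridge needs IO decoding, memory decoding or both is not established in this thread. The register offset and bit values are the standard PCI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SIZE        256    /* size of the modelled config space */
#define PCI_COMMAND         0x04   /* offset of the 16-bit command register */
#define PCI_COMMAND_IO      0x1    /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY  0x2    /* respond to memory space accesses */

/* PCI config space is little-endian regardless of the CPU. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    assert(off + 1 < PCI_CFG_SIZE);
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, size_t off, uint16_t val)
{
    assert(off + 1 < PCI_CFG_SIZE);
    cfg[off] = val & 0xff;
    cfg[off + 1] = val >> 8;
}

/* Hypothetical helper: OR the requested enable bits into PCI_COMMAND. */
static void enable_decoding(uint8_t *cfg, uint16_t bits)
{
    cfg_write16(cfg, PCI_COMMAND, cfg_read16(cfg, PCI_COMMAND) | bits);
}
```

Since an ISA serial port is probably reached through IO space, PCI_COMMAND_IO is likely the relevant bit here in addition to (or instead of) PCI_COMMAND_MEMORY.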
On Mon, May 24, 2021 at 02:42:30PM +0200, BALATON Zoltan wrote: > On Mon, 24 May 2021, David Gibson wrote: > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > > > One thing to note about PCI is that normally I think the client > > > > > expects the firmware to do PCI probing and SLOF does it. But VOF > > > > > does not and Linux scans PCI bus(es) itself. Might be a problem for > > > > > you kernel. > > > > > > > > I'm not sure what info does MorphOS get from the device tree and what it > > > > probes itself but I think it may at least need device ids and info about > > > > the PCI bus to be able to access the config regs, after that it should > > > > set the devices up hopefully. I could add these from the board code to > > > > device tree so VOF does not need to do anything about it. However I'm > > > > not getting to that point yet because it crashes on something that it's > > > > missing and couldn't yet find out what is that. > > > > > > > > I'd like to get Linux working now as that would be enough to test this > > > > and then if for MorphOS we still need a ROM it's not a problem if at > > > > least we can boot Linux without the original firmware. But I can't make > > > > Linux open a serial console and I don't know what it needs for that. Do > > > > you happen to know? I've looked at the sources in Linux/arch/powerpc but > > > > not sure how it would find and open a serial port on pegasos2. It seems > > > > to work with the board firmware and now I can get it to boot with VOF > > > > but then it does not open serial so it probably needs something in the > > > > device tree or expects the firmware to set something up that we should > > > > add in pegasos2.c when using VOF. 
> > > > > > I've now found that Linux uses rtas methods read-pci-config and > > > write-pci-config for PCI access on pegasos2 so this means that we'll > > > probably need rtas too (I hoped we could get away without it if it were only > > > used for shutdown/reboot or so but seems Linux needs it for PCI as well and > > > does not scan the bus and won't find some devices without it). > > > > Yes, definitely sounds like you'll need an RTAS implementation. > > > > > While VOF can do rtas, this causes a problem with the hypercall method using > > > sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so > > > cannot work after guest is past quiesce. > > > > > So the question is why is that > > > assert there > > > > Ah.. right. So, vhyp was designed for the PAPR use case, where we > > want to model the CPU when it's in supervisor and user mode, but not > > when it's in hypervisor mode. We want qemu to mimic the behaviour of > > the hypervisor, rather than attempting to actually execute hypervisor > > code in the virtual CPU. > > > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > > so it makes no sense for the guest to attempt to set it. That should > > be caught by the general SPR code and turned into a 0x700, hence the > > assert() if we somehow reach ppc_store_sdr1(). 
>
> This seems to work to avoid my problem so I can leave vhyp enabled after quiesce for now:
>
> diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
> index d957d1a687..13b87b9b36 100644
> --- a/target/ppc/cpu.c
> +++ b/target/ppc/cpu.c
> @@ -70,7 +70,7 @@ void ppc_store_sdr1(CPUPPCState *env, target_ulong value)
>  {
>      PowerPCCPU *cpu = env_archcpu(env);
>      qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__, value);
> -    assert(!cpu->vhyp);
> +    assert(!cpu->env.has_hv_mode || !cpu->vhyp);
>  #if defined(TARGET_PPC64)
>      if (mmu_is_64bit(env->mmu_model)) {
>          target_ulong sdr_mask = SDR_64_HTABORG | SDR_64_HTABSIZE;
>
> But I wonder if the assert should also be moved within the TARGET_PPC64 block and if we may need to generate some exception here instead. Not sure what a real CPU would do in this case but if accessing sdr1 is privileged in HV mode then there should be an exception or if that's caught elsewhere

It should be caught elsewhere. Specifically, when the SDR1 SPR is registered, on CPUs with a hypervisor mode it should be registered as hypervisor privileged, so the general mtspr dispatch logic should generate the exception if it's called from !HV code. The assert here is just to sanity check that it has done so before we enter the actual softmmu code.

> then this assert may not be needed at all. I can make a patch if you tell me what it should do.
>
> Regards,
> BALATON Zoltan
>
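The effect of the relaxed condition in the diff quoted above can be checked with a small standalone model. This is only a sketch: the struct and function names are hypothetical stand-ins for the two fields the assert reads (has_hv_mode and vhyp), not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields the assert looks at. */
struct cpu_model {
    bool has_hv_mode;   /* CPU implements a hypervisor mode */
    bool vhyp;          /* a virtual hypervisor is attached */
};

/* Original check: any CPU with vhyp must never reach the SDR1 store. */
static bool sdr1_store_ok_old(const struct cpu_model *c)
{
    return !c->vhyp;
}

/* Relaxed check: only CPUs that have an HV mode are restricted. */
static bool sdr1_store_ok_new(const struct cpu_model *c)
{
    return !c->has_hv_mode || !c->vhyp;
}
```

A 32-bit CPU without HV mode but with vhyp attached fails the old check and passes the new one, while a PAPR-style CPU (HV mode plus vhyp) is rejected by both, which matches the intent described in the reply.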
On Tue, 25 May 2021, David Gibson wrote: > On Mon, May 24, 2021 at 02:42:30PM +0200, BALATON Zoltan wrote: >> On Mon, 24 May 2021, David Gibson wrote: >>> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>>>> One thing to note about PCI is that normally I think the client >>>>>> expects the firmware to do PCI probing and SLOF does it. But VOF >>>>>> does not and Linux scans PCI bus(es) itself. Might be a problem for >>>>>> you kernel. >>>>> >>>>> I'm not sure what info does MorphOS get from the device tree and what it >>>>> probes itself but I think it may at least need device ids and info about >>>>> the PCI bus to be able to access the config regs, after that it should >>>>> set the devices up hopefully. I could add these from the board code to >>>>> device tree so VOF does not need to do anything about it. However I'm >>>>> not getting to that point yet because it crashes on something that it's >>>>> missing and couldn't yet find out what is that. >>>>> >>>>> I'd like to get Linux working now as that would be enough to test this >>>>> and then if for MorphOS we still need a ROM it's not a problem if at >>>>> least we can boot Linux without the original firmware. But I can't make >>>>> Linux open a serial console and I don't know what it needs for that. Do >>>>> you happen to know? I've looked at the sources in Linux/arch/powerpc but >>>>> not sure how it would find and open a serial port on pegasos2. It seems >>>>> to work with the board firmware and now I can get it to boot with VOF >>>>> but then it does not open serial so it probably needs something in the >>>>> device tree or expects the firmware to set something up that we should >>>>> add in pegasos2.c when using VOF. 
>>>> >>>> I've now found that Linux uses rtas methods read-pci-config and >>>> write-pci-config for PCI access on pegasos2 so this means that we'll >>>> probably need rtas too (I hoped we could get away without it if it were only >>>> used for shutdown/reboot or so but seems Linux needs it for PCI as well and >>>> does not scan the bus and won't find some devices without it). >>> >>> Yes, definitely sounds like you'll need an RTAS implementation. >>> >>>> While VOF can do rtas, this causes a problem with the hypercall method using >>>> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so >>>> cannot work after guest is past quiesce. >>> >>>> So the question is why is that >>>> assert there >>> >>> Ah.. right. So, vhyp was designed for the PAPR use case, where we >>> want to model the CPU when it's in supervisor and user mode, but not >>> when it's in hypervisor mode. We want qemu to mimic the behaviour of >>> the hypervisor, rather than attempting to actually execute hypervisor >>> code in the virtual CPU. >>> >>> On systems that have a hypervisor mode, SDR1 is hypervisor privileged, >>> so it makes no sense for the guest to attempt to set it. That should >>> be caught by the general SPR code and turned into a 0x700, hence the >>> assert() if we somehow reach ppc_store_sdr1(). 
>>
>> This seems to work to avoid my problem so I can leave vhyp enabled after quiesce for now:
>>
>> diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
>> index d957d1a687..13b87b9b36 100644
>> --- a/target/ppc/cpu.c
>> +++ b/target/ppc/cpu.c
>> @@ -70,7 +70,7 @@ void ppc_store_sdr1(CPUPPCState *env, target_ulong value)
>>  {
>>      PowerPCCPU *cpu = env_archcpu(env);
>>      qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__, value);
>> -    assert(!cpu->vhyp);
>> +    assert(!cpu->env.has_hv_mode || !cpu->vhyp);
>>  #if defined(TARGET_PPC64)
>>      if (mmu_is_64bit(env->mmu_model)) {
>>          target_ulong sdr_mask = SDR_64_HTABORG | SDR_64_HTABSIZE;
>>
>> But I wonder if the assert should also be moved within the TARGET_PPC64 block and if we may need to generate some exception here instead. Not sure what a real CPU would do in this case but if accessing sdr1 is privileged in HV mode then there should be an exception or if that's caught elsewhere
>
> It should be caught elsewhere. Specifically, when the SDR1 SPR is registered, on CPUs with a hypervisor mode it should be registered as hypervisor privileged, so the general mtspr dispatch logic should generate the exception if it's called from !HV code. The assert here is just to sanity check that it has done so before we enter the actual softmmu code.

So what's the decision then? Remove this assert, or modify it as above and move it into the TARGET_PPC64 block (as no 32-bit CPU should have an HV bit anyway)?

Regards,
BALATON Zoltan
On Tue, 25 May 2021, David Gibson wrote: > On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: >> On Mon, 24 May 2021, David Gibson wrote: >>> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>>>> One thing to note about PCI is that normally I think the client >>>>>> expects the firmware to do PCI probing and SLOF does it. But VOF >>>>>> does not and Linux scans PCI bus(es) itself. Might be a problem for >>>>>> you kernel. >>>>> >>>>> I'm not sure what info does MorphOS get from the device tree and what it >>>>> probes itself but I think it may at least need device ids and info about >>>>> the PCI bus to be able to access the config regs, after that it should >>>>> set the devices up hopefully. I could add these from the board code to >>>>> device tree so VOF does not need to do anything about it. However I'm >>>>> not getting to that point yet because it crashes on something that it's >>>>> missing and couldn't yet find out what is that. >>>>> >>>>> I'd like to get Linux working now as that would be enough to test this >>>>> and then if for MorphOS we still need a ROM it's not a problem if at >>>>> least we can boot Linux without the original firmware. But I can't make >>>>> Linux open a serial console and I don't know what it needs for that. Do >>>>> you happen to know? I've looked at the sources in Linux/arch/powerpc but >>>>> not sure how it would find and open a serial port on pegasos2. It seems >>>>> to work with the board firmware and now I can get it to boot with VOF >>>>> but then it does not open serial so it probably needs something in the >>>>> device tree or expects the firmware to set something up that we should >>>>> add in pegasos2.c when using VOF. 
>>>> >>>> I've now found that Linux uses rtas methods read-pci-config and >>>> write-pci-config for PCI access on pegasos2 so this means that we'll >>>> probably need rtas too (I hoped we could get away without it if it were only >>>> used for shutdown/reboot or so but seems Linux needs it for PCI as well and >>>> does not scan the bus and won't find some devices without it). >>> >>> Yes, definitely sounds like you'll need an RTAS implementation. >> >> I plan to fix that after managed to get serial working as that seems to not >> need it. If I delete the rtas-size property from /rtas on the original >> firmware that makes Linux skip instantiating rtas, but I still get serial >> output just not accessing PCI devices. So I think it should work and keeps >> things simpler at first. Then I'll try rtas later. >> >>>> While VOF can do rtas, this causes a problem with the hypercall method using >>>> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so >>>> cannot work after guest is past quiesce. >>> >>>> So the question is why is that >>>> assert there >>> >>> Ah.. right. So, vhyp was designed for the PAPR use case, where we >>> want to model the CPU when it's in supervisor and user mode, but not >>> when it's in hypervisor mode. We want qemu to mimic the behaviour of >>> the hypervisor, rather than attempting to actually execute hypervisor >>> code in the virtual CPU. >>> >>> On systems that have a hypervisor mode, SDR1 is hypervisor privileged, >>> so it makes no sense for the guest to attempt to set it. That should >>> be caught by the general SPR code and turned into a 0x700, hence the >>> assert() if we somehow reach ppc_store_sdr1(). >>> >>> So, we are seeing a problem here because you want the 'sc 1' >>> interception of vhyp, but not the rest of the stuff that goes with it. >>> >>>> and would using sc 1 for hypercalls on pegasos2 cause other >>>> problems later even if the assert could be removed? 
>>> >>> At least in the short term, I think you probably can remove the >>> assert. In your case the 'sc 1' calls aren't truly to a hypervisor, >>> but a special case escape to qemu for the firmware emulation. I think >>> it's unlikely to cause problems later, because nothing on a 32-bit >>> system should be attempting an 'sc 1'. The only thing I can think of >>> that would fail is some test case which explicitly verified that 'sc >>> 1' triggered a 0x700 (SIGILL from userspace). >> >> OK so the assert should check if the CPU has an HV bit. I think there was a >> #detine for that somewhere that I can add to the assert then I can try that. >> What I wasn't sure about is that sc 1 would conflict with the guest's usage >> of normal sc calls or are these going through different paths and only sc 1 >> will trigger vhyp callback not affecting notmal sc calls? > > The vhyp shouldn't affect normal system calls, 'sc 1' is specifically > for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the > vhyp only intercepts the hypercall version (after all Linux on PAPR > certainly uses its own system calls, and hypercalls are active for the > lifetime of the guest there). > >> (Or if this causes >> an otherwise unnecessary VM exit on KVM even when it works then maybe >> looking for a different way in the future might be needed. > > What you're doing here won't work with KVM as it stands. There are > basically two paths into the vhyp hypercall path: 1) from TCG, if we > interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get > a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. > > The second path is specific to the PAPR (ppc64) implementation of KVM, > and will not work for a non-PAPR platform without substantial > modification of the KVM code. 
OK, so when we get to trying KVM we'll need to look at alternative ways. I think MOL OSI worked with KVM, at least in MOL, but it will probably make all syscalls exit KVM; since we'll probably need to use KVM PR, they will exit anyway. For now I'll keep vhyp: this does not run with KVM yet for other reasons, which is another area to clean up, so vhyp will do for a first proof-of-concept version using VOF.

[...]

>>>> I've tested that the missing rtas is not the reason for getting no output >>>> via serial though, as even when disabling rtas on pegasos2.rom it boots and >>>> I still get serial output just some PCI devices are not detected (such as >>>> USB, the video card and the not emulated ethernet port but these are not >>>> fatal so it might even work as a first try without rtas, just to boot a >>>> Linux kernel for testing it would be enough if I can fix the serial output). >>>> I still don't know why it's not finding serial but I think it may be some >>>> missing or wrong info in the device tree I generate. I'll try to focus on >>>> this for now and leave the above rtas question for later. >>> >>> Oh.. another thought on that. You have an ISA serial port on Pegasos, >>> I believe. I wonder if the PCI->ISA bridge needs some configuration / >>> initialization that the firmware is expected to do. If so you'll need >>> to mimic that setup in qemu for the VOF case. >> >> That's what I begin to think because I've added everything to the device >> tree that I thought could be needed and I still don't get it working so it >> may need some config from the firmware. But how do I access device registers >> from board code? I've tried adding a machine reset method and write to >> memory mapped device registers but all my attempts failed. I've tried >> cpu_stl_le_data and even memory_region_dispatch_write but these did not get >> to the device. What's the way to access guest mmio regs from QEMU?
> > That's odd, cpu_stl() and memory_region_dispatch_write() should work > from board code (after the relevant memory regions are configured, of > course). As an ISA serial port, it's probably accessed through IO > space, not memory space though, so you'd need &address_space_io. And > if there is some bridge configuration then it's the bridge control > registers you need to look at not the serial registers - you'd have to > look at the bridge documentation for that. Or, I guess the bridge > implementation in qemu, which you wrote part of. I've finally found that stl_le_phys() works. There are so many of these that I never know when to use which. I think the address_space_rw calls in vof_client_call() in vof.c could also use these for somewhat shorter code. I've ended up with stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset method, but I don't even need that now as it works without additional setup. Also, VOF's memory access is basically the same as the already existing rtas_st() and co., so maybe that could be reused to make the code smaller? Regards, BALATON Zoltan
On Tue, May 25, 2021 at 11:55:43AM +0200, BALATON Zoltan wrote: > On Tue, 25 May 2021, David Gibson wrote: > > On Mon, May 24, 2021 at 02:42:30PM +0200, BALATON Zoltan wrote: > > > On Mon, 24 May 2021, David Gibson wrote: > > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > > > > > One thing to note about PCI is that normally I think the client > > > > > > > expects the firmware to do PCI probing and SLOF does it. But VOF > > > > > > > does not and Linux scans PCI bus(es) itself. Might be a problem for > > > > > > > you kernel. > > > > > > > > > > > > I'm not sure what info does MorphOS get from the device tree and what it > > > > > > probes itself but I think it may at least need device ids and info about > > > > > > the PCI bus to be able to access the config regs, after that it should > > > > > > set the devices up hopefully. I could add these from the board code to > > > > > > device tree so VOF does not need to do anything about it. However I'm > > > > > > not getting to that point yet because it crashes on something that it's > > > > > > missing and couldn't yet find out what is that. > > > > > > > > > > > > I'd like to get Linux working now as that would be enough to test this > > > > > > and then if for MorphOS we still need a ROM it's not a problem if at > > > > > > least we can boot Linux without the original firmware. But I can't make > > > > > > Linux open a serial console and I don't know what it needs for that. Do > > > > > > you happen to know? I've looked at the sources in Linux/arch/powerpc but > > > > > > not sure how it would find and open a serial port on pegasos2. 
It seems > > > > > > to work with the board firmware and now I can get it to boot with VOF > > > > > > but then it does not open serial so it probably needs something in the > > > > > > device tree or expects the firmware to set something up that we should > > > > > > add in pegasos2.c when using VOF. > > > > > > > > > > I've now found that Linux uses rtas methods read-pci-config and > > > > > write-pci-config for PCI access on pegasos2 so this means that we'll > > > > > probably need rtas too (I hoped we could get away without it if it were only > > > > > used for shutdown/reboot or so but seems Linux needs it for PCI as well and > > > > > does not scan the bus and won't find some devices without it). > > > > > > > > Yes, definitely sounds like you'll need an RTAS implementation. > > > > > > > > > While VOF can do rtas, this causes a problem with the hypercall method using > > > > > sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so > > > > > cannot work after guest is past quiesce. > > > > > > > > > So the question is why is that > > > > > assert there > > > > > > > > Ah.. right. So, vhyp was designed for the PAPR use case, where we > > > > want to model the CPU when it's in supervisor and user mode, but not > > > > when it's in hypervisor mode. We want qemu to mimic the behaviour of > > > > the hypervisor, rather than attempting to actually execute hypervisor > > > > code in the virtual CPU. > > > > > > > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > > > > so it makes no sense for the guest to attempt to set it. That should > > > > be caught by the general SPR code and turned into a 0x700, hence the > > > > assert() if we somehow reach ppc_store_sdr1(). 
> > > > > > This seems to work to avoid my problem so I can leave vhyp enabled after > > > qiuesce for now: > > > > > > diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c > > > index d957d1a687..13b87b9b36 100644 > > > --- a/target/ppc/cpu.c > > > +++ b/target/ppc/cpu.c > > > @@ -70,7 +70,7 @@ void ppc_store_sdr1(CPUPPCState *env, target_ulong value) > > > { > > > PowerPCCPU *cpu = env_archcpu(env); > > > qemu_log_mask(CPU_LOG_MMU, "%s: " TARGET_FMT_lx "\n", __func__, value); > > > - assert(!cpu->vhyp); > > > + assert(!cpu->env.has_hv_mode || !cpu->vhyp); > > > #if defined(TARGET_PPC64) > > > if (mmu_is_64bit(env->mmu_model)) { > > > target_ulong sdr_mask = SDR_64_HTABORG | SDR_64_HTABSIZE; > > > > > > But I wonder if the assert should also be moved within the TARGET_PPC64 > > > block and if we may need to generate some exception here instead. Not sure > > > what a real CPU would do in this case but if accessing sdr1 is privileged in > > > HV mode then there should be an exception or if that's catched > > > elsewhere > > > > It should be caught elsehwere. Specifically, when the SDR1 SPR is > > registered, on CPUs with a hypervisor mode it should be registered as > > hypervisor privileged, so the general mtspr dispatch logic should > > generate the exception if it's called from !HV code. The assert here > > is just to sanity check that it has done so before we enter the actual > > softmmu code. > > So what's the decision then? Remove this assert or modify it like above and > move it to the TARGET_PPC64 block (as no 32 bit CPU should have an HV bit > anyway). Uh, I guess modify it with the if-hv-available thing. Don't move it under the ifdef, it still makes logical sense for 32-bit systems, even though the HV available side should never trip.
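To spell out what the modified assert accepts and rejects, here is a minimal standalone sketch. The struct is a hypothetical stand-in for the two fields the check reads (it is not QEMU's PowerPCCPU), and sdr1_store_allowed() is a made-up name; it only mirrors the condition from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the QEMU CPU state: only the two fields
 * that the check in ppc_store_sdr1() looks at are modelled here.
 */
struct cpu_model {
    bool has_hv_mode;   /* the CPU implements a hypervisor mode */
    void *vhyp;         /* non-NULL when a virtual hypervisor is attached */
};

/*
 * Mirrors assert(!cpu->env.has_hv_mode || !cpu->vhyp): the store is only
 * unexpected when the CPU has an HV mode *and* a vhyp is attached, because
 * then the mtspr dispatch should already have raised the exception.
 */
static bool sdr1_store_allowed(const struct cpu_model *cpu)
{
    assert(cpu != NULL);
    return !cpu->has_hv_mode || cpu->vhyp == NULL;
}
```

With this condition a 32-bit CPU without HV mode passes even with vhyp set (the pegasos2 VOF case), while a PAPR-style CPU with both HV mode and vhyp still trips the check.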
On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: > On Tue, 25 May 2021, David Gibson wrote: > > On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: > > > On Mon, 24 May 2021, David Gibson wrote: > > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > > > > > One thing to note about PCI is that normally I think the client > > > > > > > expects the firmware to do PCI probing and SLOF does it. But VOF > > > > > > > does not and Linux scans PCI bus(es) itself. Might be a problem for > > > > > > > you kernel. > > > > > > > > > > > > I'm not sure what info does MorphOS get from the device tree and what it > > > > > > probes itself but I think it may at least need device ids and info about > > > > > > the PCI bus to be able to access the config regs, after that it should > > > > > > set the devices up hopefully. I could add these from the board code to > > > > > > device tree so VOF does not need to do anything about it. However I'm > > > > > > not getting to that point yet because it crashes on something that it's > > > > > > missing and couldn't yet find out what is that. > > > > > > > > > > > > I'd like to get Linux working now as that would be enough to test this > > > > > > and then if for MorphOS we still need a ROM it's not a problem if at > > > > > > least we can boot Linux without the original firmware. But I can't make > > > > > > Linux open a serial console and I don't know what it needs for that. Do > > > > > > you happen to know? I've looked at the sources in Linux/arch/powerpc but > > > > > > not sure how it would find and open a serial port on pegasos2. 
It seems > > > > > > to work with the board firmware and now I can get it to boot with VOF > > > > > > but then it does not open serial so it probably needs something in the > > > > > > device tree or expects the firmware to set something up that we should > > > > > > add in pegasos2.c when using VOF. > > > > > > > > > > I've now found that Linux uses rtas methods read-pci-config and > > > > > write-pci-config for PCI access on pegasos2 so this means that we'll > > > > > probably need rtas too (I hoped we could get away without it if it were only > > > > > used for shutdown/reboot or so but seems Linux needs it for PCI as well and > > > > > does not scan the bus and won't find some devices without it). > > > > > > > > Yes, definitely sounds like you'll need an RTAS implementation. > > > > > > I plan to fix that after managed to get serial working as that seems to not > > > need it. If I delete the rtas-size property from /rtas on the original > > > firmware that makes Linux skip instantiating rtas, but I still get serial > > > output just not accessing PCI devices. So I think it should work and keeps > > > things simpler at first. Then I'll try rtas later. > > > > > > > > While VOF can do rtas, this causes a problem with the hypercall method using > > > > > sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so > > > > > cannot work after guest is past quiesce. > > > > > > > > > So the question is why is that > > > > > assert there > > > > > > > > Ah.. right. So, vhyp was designed for the PAPR use case, where we > > > > want to model the CPU when it's in supervisor and user mode, but not > > > > when it's in hypervisor mode. We want qemu to mimic the behaviour of > > > > the hypervisor, rather than attempting to actually execute hypervisor > > > > code in the virtual CPU. > > > > > > > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > > > > so it makes no sense for the guest to attempt to set it. 
That should > > > > be caught by the general SPR code and turned into a 0x700, hence the > > > > assert() if we somehow reach ppc_store_sdr1(). > > > > > > > > So, we are seeing a problem here because you want the 'sc 1' > > > > interception of vhyp, but not the rest of the stuff that goes with it. > > > > > > > > > and would using sc 1 for hypercalls on pegasos2 cause other > > > > > problems later even if the assert could be removed? > > > > > > > > At least in the short term, I think you probably can remove the > > > > assert. In your case the 'sc 1' calls aren't truly to a hypervisor, > > > > but a special case escape to qemu for the firmware emulation. I think > > > > it's unlikely to cause problems later, because nothing on a 32-bit > > > > system should be attempting an 'sc 1'. The only thing I can think of > > > > that would fail is some test case which explicitly verified that 'sc > > > > 1' triggered a 0x700 (SIGILL from userspace). > > > > > > OK so the assert should check if the CPU has an HV bit. I think there was a > > > #detine for that somewhere that I can add to the assert then I can try that. > > > What I wasn't sure about is that sc 1 would conflict with the guest's usage > > > of normal sc calls or are these going through different paths and only sc 1 > > > will trigger vhyp callback not affecting notmal sc calls? > > > > The vhyp shouldn't affect normal system calls, 'sc 1' is specifically > > for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the > > vhyp only intercepts the hypercall version (after all Linux on PAPR > > certainly uses its own system calls, and hypercalls are active for the > > lifetime of the guest there). > > > > > (Or if this causes > > > an otherwise unnecessary VM exit on KVM even when it works then maybe > > > looking for a different way in the future might be needed. > > > > What you're doing here won't work with KVM as it stands. 
There are > > basically two paths into the vhyp hypercall path: 1) from TCG, if we > > interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get > > a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. > > > > The second path is specific to the PAPR (ppc64) implementation of KVM, > > and will not work for a non-PAPR platform without substantial > > modification of the KVM code. > > OK so then at that point when we try KVM we'll need to look at alternative > ways, I think MOL OSI worked with KVM at least in MOL but will probably make > all syscalls exit KVM but since we'll probably need to use KVM PR it will > exit anyway. For now I keep this vhyp as it does not run with KVM for other > reasons yet so that's another area to clean up so as a proof of concept > first version of using VOF vhyp will do. Eh, since you'll need to modify KVM anyway, it probably makes just as much sense to modify it to catch the 'sc 1' as MoL's magic thingy. > [...] > > > > > I've tested that the missing rtas is not the reason for getting no output > > > > > via serial though, as even when disabling rtas on pegasos2.rom it boots and > > > > > I still get serial output just some PCI devices are not detected (such as > > > > > USB, the video card and the not emulated ethernet port but these are not > > > > > fatal so it might even work as a first try without rtas, just to boot a > > > > > Linux kernel for testing it would be enough if I can fix the serial output). > > > > > I still don't know why it's not finding serial but I think it may be some > > > > > missing or wrong info in the device tree I generat. I'll try to focus on > > > > > this for now and leave the above rtas question for later. > > > > > > > > Oh.. another thought on that. You have an ISA serial port on Pegasos, > > > > I believe. I wonder if the PCI->ISA bridge needs some configuration / > > > > initialization that the firmware is expected to do. 
If so you'll need > > > > to mimic that setup in qemu for the VOF case. > > > > > > That's what I begin to think because I've added everything to the device > > > tree that I thought could be needed and I still don't get it working so it > > > may need some config from the firmware. But how do I access device registers > > > from board code? I've tried adding a machine reset method and write to > > > memory mapped device registers but all my attempts failed. I've tried > > > cpu_stl_le_data and even memory_region_dispatch_write but these did not get > > > to the device. What's the way to access guest mmio regs from QEMU? > > > > That's odd, cpu_stl() and memory_region_dispatch_write() should work > > from board code (after the relevant memory regions are configured, of > > course). As an ISA serial port, it's probably accessed through IO > > space, not memory space though, so you'd need &address_space_io. And > > if there is some bridge configuration then it's the bridge control > > registers you need to look at not the serial registers - you'd have to > > look at the bridge documentation for that. Or, I guess the bridge > > implementation in qemu, which you wrote part of. > > I've found at last that stl_le_phys() works. There are so many of these that > I never know when to use which. > > I think the address_space_rw calls in vof_client_call() in vof.c could also > use these for somewhat shorter code. I've ended up with > stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset methodbut I don't > even need that now as it works without additional setup. Also VOF's memory > access is basically the same as the already existing rtas_st() and co. so > maybe that could be reused to make code smaller? rtas_ld() and rtas_st() should only be used for reading/writing RTAS parameters to and from memory. Accessing IO shouldn't be done with those. For IO you probably want the cpu_st*() variants in most cases, since you're trying to emulate an IO access from the virtual cpu.
On Thu, 27 May 2021, David Gibson wrote: > On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: >> On Tue, 25 May 2021, David Gibson wrote: >>> On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: >>>> On Mon, 24 May 2021, David Gibson wrote: >>>>> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>>>>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>>>>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>>>>>> One thing to note about PCI is that normally I think the client >>>>>>>> expects the firmware to do PCI probing and SLOF does it. But VOF >>>>>>>> does not and Linux scans PCI bus(es) itself. Might be a problem for >>>>>>>> you kernel. >>>>>>> >>>>>>> I'm not sure what info does MorphOS get from the device tree and what it >>>>>>> probes itself but I think it may at least need device ids and info about >>>>>>> the PCI bus to be able to access the config regs, after that it should >>>>>>> set the devices up hopefully. I could add these from the board code to >>>>>>> device tree so VOF does not need to do anything about it. However I'm >>>>>>> not getting to that point yet because it crashes on something that it's >>>>>>> missing and couldn't yet find out what is that. >>>>>>> >>>>>>> I'd like to get Linux working now as that would be enough to test this >>>>>>> and then if for MorphOS we still need a ROM it's not a problem if at >>>>>>> least we can boot Linux without the original firmware. But I can't make >>>>>>> Linux open a serial console and I don't know what it needs for that. Do >>>>>>> you happen to know? I've looked at the sources in Linux/arch/powerpc but >>>>>>> not sure how it would find and open a serial port on pegasos2. It seems >>>>>>> to work with the board firmware and now I can get it to boot with VOF >>>>>>> but then it does not open serial so it probably needs something in the >>>>>>> device tree or expects the firmware to set something up that we should >>>>>>> add in pegasos2.c when using VOF. 
>>>>>> >>>>>> I've now found that Linux uses rtas methods read-pci-config and >>>>>> write-pci-config for PCI access on pegasos2 so this means that we'll >>>>>> probably need rtas too (I hoped we could get away without it if it were only >>>>>> used for shutdown/reboot or so but seems Linux needs it for PCI as well and >>>>>> does not scan the bus and won't find some devices without it). >>>>> >>>>> Yes, definitely sounds like you'll need an RTAS implementation. >>>> >>>> I plan to fix that after managed to get serial working as that seems to not >>>> need it. If I delete the rtas-size property from /rtas on the original >>>> firmware that makes Linux skip instantiating rtas, but I still get serial >>>> output just not accessing PCI devices. So I think it should work and keeps >>>> things simpler at first. Then I'll try rtas later. >>>> >>>>>> While VOF can do rtas, this causes a problem with the hypercall method using >>>>>> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so >>>>>> cannot work after guest is past quiesce. >>>>> >>>>>> So the question is why is that >>>>>> assert there >>>>> >>>>> Ah.. right. So, vhyp was designed for the PAPR use case, where we >>>>> want to model the CPU when it's in supervisor and user mode, but not >>>>> when it's in hypervisor mode. We want qemu to mimic the behaviour of >>>>> the hypervisor, rather than attempting to actually execute hypervisor >>>>> code in the virtual CPU. >>>>> >>>>> On systems that have a hypervisor mode, SDR1 is hypervisor privileged, >>>>> so it makes no sense for the guest to attempt to set it. That should >>>>> be caught by the general SPR code and turned into a 0x700, hence the >>>>> assert() if we somehow reach ppc_store_sdr1(). >>>>> >>>>> So, we are seeing a problem here because you want the 'sc 1' >>>>> interception of vhyp, but not the rest of the stuff that goes with it. 
>>>>> >>>>>> and would using sc 1 for hypercalls on pegasos2 cause other >>>>>> problems later even if the assert could be removed? >>>>> >>>>> At least in the short term, I think you probably can remove the >>>>> assert. In your case the 'sc 1' calls aren't truly to a hypervisor, >>>>> but a special case escape to qemu for the firmware emulation. I think >>>>> it's unlikely to cause problems later, because nothing on a 32-bit >>>>> system should be attempting an 'sc 1'. The only thing I can think of >>>>> that would fail is some test case which explicitly verified that 'sc >>>>> 1' triggered a 0x700 (SIGILL from userspace). >>>> >>>> OK so the assert should check if the CPU has an HV bit. I think there was a >>>> #detine for that somewhere that I can add to the assert then I can try that. >>>> What I wasn't sure about is that sc 1 would conflict with the guest's usage >>>> of normal sc calls or are these going through different paths and only sc 1 >>>> will trigger vhyp callback not affecting notmal sc calls? >>> >>> The vhyp shouldn't affect normal system calls, 'sc 1' is specifically >>> for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the >>> vhyp only intercepts the hypercall version (after all Linux on PAPR >>> certainly uses its own system calls, and hypercalls are active for the >>> lifetime of the guest there). >>> >>>> (Or if this causes >>>> an otherwise unnecessary VM exit on KVM even when it works then maybe >>>> looking for a different way in the future might be needed. >>> >>> What you're doing here won't work with KVM as it stands. There are >>> basically two paths into the vhyp hypercall path: 1) from TCG, if we >>> interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get >>> a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. >>> >>> The second path is specific to the PAPR (ppc64) implementation of KVM, >>> and will not work for a non-PAPR platform without substantial >>> modification of the KVM code. 
>> >> OK so then at that point when we try KVM we'll need to look at alternative >> ways, I think MOL OSI worked with KVM at least in MOL but will probably make >> all syscalls exit KVM but since we'll probably need to use KVM PR it will >> exit anyway. For now I keep this vhyp as it does not run with KVM for other >> reasons yet so that's another area to clean up so as a proof of concept >> first version of using VOF vhyp will do. > > Eh, since you'll need to modify KVM anyway, it probably makes just as > much sense to modify it to catch the 'sc 1' as MoL's magic thingy. I'm not sure how KVM works for this case, so I also don't know why and what would need to be modified. I think we'll only have KVM PR working, as newer POWER CPUs with HV (besides being rare among potential users) are probably too different to run the OSes that expect at most a G4 on pegasos2, so it likely won't work with KVM HV. If we have KVM PR, doesn't sc already trap, so that we could add MOL OSI without further modification to KVM itself, only needing a change in QEMU? I also hope that MOL OSI could be useful for porting some paravirt drivers from MOL for running Mac OS X on Mac emulation, but I don't know that for sure, so I'm open to any other solution too. For now I'm going with vhyp, which is enough for testing with TCG, and if somebody wants KVM they could use the original firmware for now. This could be improved in a later version unless a simple solution is found before the freeze for 6.1. If we're in KVM PR, what happens for sc 1? Could that be used too, so that what we have now might work? >> [...] 
>>>>>> I've tested that the missing rtas is not the reason for getting no output >>>>>> via serial though, as even when disabling rtas on pegasos2.rom it boots and >>>>>> I still get serial output just some PCI devices are not detected (such as >>>>>> USB, the video card and the not emulated ethernet port but these are not >>>>>> fatal so it might even work as a first try without rtas, just to boot a >>>>>> Linux kernel for testing it would be enough if I can fix the serial output). >>>>>> I still don't know why it's not finding serial but I think it may be some >>>>>> missing or wrong info in the device tree I generat. I'll try to focus on >>>>>> this for now and leave the above rtas question for later. >>>>> >>>>> Oh.. another thought on that. You have an ISA serial port on Pegasos, >>>>> I believe. I wonder if the PCI->ISA bridge needs some configuration / >>>>> initialization that the firmware is expected to do. If so you'll need >>>>> to mimic that setup in qemu for the VOF case. >>>> >>>> That's what I begin to think because I've added everything to the device >>>> tree that I thought could be needed and I still don't get it working so it >>>> may need some config from the firmware. But how do I access device registers >>>> from board code? I've tried adding a machine reset method and write to >>>> memory mapped device registers but all my attempts failed. I've tried >>>> cpu_stl_le_data and even memory_region_dispatch_write but these did not get >>>> to the device. What's the way to access guest mmio regs from QEMU? >>> >>> That's odd, cpu_stl() and memory_region_dispatch_write() should work >>> from board code (after the relevant memory regions are configured, of >>> course). As an ISA serial port, it's probably accessed through IO >>> space, not memory space though, so you'd need &address_space_io. 
And >>> if there is some bridge configuration then it's the bridge control >>> registers you need to look at not the serial registers - you'd have to >>> look at the bridge documentation for that. Or, I guess the bridge >>> implementation in qemu, which you wrote part of. >> >> I've found at last that stl_le_phys() works. There are so many of these that >> I never know when to use which. >> >> I think the address_space_rw calls in vof_client_call() in vof.c could also >> use these for somewhat shorter code. I've ended up with >> stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset methodbut I don't >> even need that now as it works without additional setup. Also VOF's memory >> access is basically the same as the already existing rtas_st() and co. so >> maybe that could be reused to make code smaller? > > rtas_ld() and rtas_st() should only be used for reading/writing RTAS > parameters to and from memory. Accessing IO shouldn't be done with > those. > > For IO you probably want the cpu_st*() variants in most cases, since > you're trying to emulate an IO access from the virtual cpu. I think I've tried that, but what worked for accessing mmio device registers was stl_le_phys and similar, which are wrappers around address_space_stl_*. But I did not mean rtas_ld/_st for that; I meant the part where vof accesses the parameters passed by its hypercall, which is a memory access: https://github.com/patchew-project/qemu/blob/patchew/20210520090557.435689-1-aik%40ozlabs.ru/hw/ppc/vof.c line 893, and vof_client_call before that is very similar to what h_rtas does here: https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_hcall.c;h=f25014afda408002ee1ec1027a0dd7a6025eca61;hb=HEAD#l639 I also need to do the same for rtas in pegasos2, for which I'm just using ldl_be_phys for now, but I wonder if we really need 3 ways to do the same thing, or whether rtas_ld/_st could be made more generic and reused here? Regards, BALATON Zoltan
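As an illustration of what a shared helper for hypercall and RTAS arguments might look like, here is a rough standalone sketch. Everything in it is an assumption made for the example: guest RAM is modelled as a flat byte array rather than a QEMU AddressSpace, and the names guest_mem, cell_ld and cell_st are hypothetical, not existing QEMU functions. It only shows the common pattern of indexing an array of 32-bit big-endian cells with a bounds check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of guest RAM: a flat byte array, not an AddressSpace. */
struct guest_mem {
    uint8_t *bytes;
    size_t size;
};

/* Return a pointer to the n-th 4-byte cell at base, or NULL if out of range. */
static uint8_t *cell_ptr(const struct guest_mem *m, uint64_t base, unsigned n)
{
    uint64_t off = base + 4ULL * n;

    assert(m != NULL && m->bytes != NULL);
    if (off > m->size || m->size - off < 4) {
        return NULL;
    }
    return m->bytes + off;
}

/* Load the n-th 32-bit big-endian cell of the argument array at base. */
static bool cell_ld(const struct guest_mem *m, uint64_t base, unsigned n,
                    uint32_t *val)
{
    const uint8_t *p = cell_ptr(m, base, n);

    if (!p) {
        return false;
    }
    *val = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return true;
}

/* Store val as the n-th 32-bit big-endian cell of the array at base. */
static bool cell_st(struct guest_mem *m, uint64_t base, unsigned n,
                    uint32_t val)
{
    uint8_t *p = cell_ptr(m, base, n);

    if (!p) {
        return false;
    }
    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)val;
    return true;
}
```

In a real implementation the byte array would be replaced by whichever physical memory accessor ends up being the agreed one; the sketch says nothing about which of the existing accessors that should be.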
On Thu, 20 May 2021, Alexey Kardashevskiy wrote: > diff --git a/hw/ppc/spapr_vof.c b/hw/ppc/spapr_vof.c > new file mode 100644 > index 000000000000..5e34d5402abf > --- /dev/null > +++ b/hw/ppc/spapr_vof.c > @@ -0,0 +1,156 @@ > +/* > + * SPAPR machine hooks to Virtual Open Firmware, > + * > + * SPDX-License-Identifier: GPL-2.0-or-later > + */ > +#include "qemu/osdep.h" > +#include "qemu-common.h" > +#include <sys/ioctl.h> > +#include "qapi/error.h" > +#include "hw/ppc/spapr.h" > +#include "hw/ppc/spapr_vio.h" > +#include "hw/ppc/fdt.h" > +#include "sysemu/sysemu.h" > +#include "qom/qom-qobject.h" > +#include "trace.h" > + > +/* Copied from SLOF, and 4K is definitely not enough for GRUB */ > +#define OF_STACK_SIZE 0x8000 I found a reference explaining its value better than the comment above. Section 8.2.2 here: https://www.devicetree.org/open-firmware/bindings/ppc/release/ppc-2_1.html says it should be at least 32k. This define should be in vof.h so I don't have to duplicate it in pegasos2.c. Alternatively, vof_init could allocate and claim the stack so board code doesn't have to do that either. It could take a pointer argument with the preferred stack address as input and return the aligned address where the stack was allocated, or just store stack_base in struct vof where the board code could get it for setting up r1 when calling the guest code. Regards, BALATON Zoltan
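A rough sketch of the "preferred address in, aligned address out" idea follows. The helper name of_stack_place() and the 16-byte alignment are assumptions made purely for illustration; only OF_STACK_SIZE comes from the patch, and the real code would also have to claim the range through the VOF allocator, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define OF_STACK_SIZE  0x8000u  /* 32 KiB, the value used in the patch */
#define OF_STACK_ALIGN 0x10u    /* assumption: 16-byte alignment, for illustration */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: take the caller's preferred stack address, return the
 * aligned base that would actually be used, and report through *top the
 * address just past the end of the stack area.
 */
static uint64_t of_stack_place(uint64_t preferred, uint64_t *top)
{
    uint64_t base = align_up(preferred, OF_STACK_ALIGN);

    *top = base + OF_STACK_SIZE;
    return base;
}
```

Board code could then derive the initial r1 from the returned values instead of repeating the size define; how much headroom to leave below the top is left out on purpose.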
Hello, Two more problems I've found while testing with pegasos2 but I'm not sure how to fix them: On Thu, 20 May 2021, Alexey Kardashevskiy wrote: > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > new file mode 100644 > index 000000000000..a283b7d251a7 > --- /dev/null > +++ b/hw/ppc/vof.c > @@ -0,0 +1,1021 @@ > +/* > + * QEMU PowerPC Virtual Open Firmware. > + * > + * This implements client interface from OpenFirmware IEEE1275 on the QEMU > + * side to leave only a very basic firmware in the VM. > + * > + * Copyright (c) 2021 IBM Corporation. > + * > + * SPDX-License-Identifier: GPL-2.0-or-later > + */ > + > +#include "qemu/osdep.h" > +#include "qemu-common.h" > +#include "qemu/timer.h" > +#include "qemu/range.h" > +#include "qemu/units.h" > +#include "qapi/error.h" > +#include <sys/ioctl.h> > +#include "exec/ram_addr.h" > +#include "exec/address-spaces.h" > +#include "hw/ppc/vof.h" > +#include "hw/ppc/fdt.h" > +#include "sysemu/runstate.h" > +#include "qom/qom-qobject.h" > +#include "trace.h" > + > +#include <libfdt.h> > + > +/* > + * OF 1275 "nextprop" description suggests is it 32 bytes max but > + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars long. 
> + */ > +#define OF_PROPNAME_LEN_MAX 64 > + > +#define VOF_MAX_PATH 256 > +#define VOF_MAX_SETPROPLEN 2048 > +#define VOF_MAX_METHODLEN 256 > +#define VOF_MAX_FORTHCODE 256 > +#define VOF_VTY_BUF_SIZE 256 > + > +typedef struct { > + uint64_t start; > + uint64_t size; > +} OfClaimed; > + > +typedef struct { > + char *path; /* the path used to open the instance */ > + uint32_t phandle; > +} OfInstance; > + > +#define VOF_MEM_READ(pa, buf, size) \ > + address_space_read_full(&address_space_memory, \ > + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) > +#define VOF_MEM_WRITE(pa, buf, size) \ > + address_space_write(&address_space_memory, \ > + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) > + > +static int readstr(hwaddr pa, char *buf, int size) > +{ > + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { > + return -1; > + } > + if (strnlen(buf, size) == size) { > + buf[size - 1] = '\0'; > + trace_vof_error_str_truncated(buf, size); > + return -1; > + } > + return 0; > +} > + > +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, > + const char *s1, unsigned nargscheck, unsigned nretcheck) > +{ > + if (strcmp(s, s1)) { > + return false; > + } > + if ((nargscheck && (nargs != nargscheck)) || > + (nretcheck && (nret != nretcheck))) { > + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); > + return false; > + } > + > + return true; > +} > + > +static void prop_format(char *tval, int tlen, const void *prop, int len) > +{ > + int i; > + const unsigned char *c; > + char *t; > + const char bin[] = "..."; > + > + for (i = 0, c = prop; i < len; ++i, ++c) { > + if (*c == '\0' && i == len - 1) { > + strncpy(tval, prop, tlen - 1); > + return; > + } > + if (*c < 0x20 || *c >= 0x80) { > + break; > + } > + } > + > + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { > + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) { > + strcpy(t, bin); > + return; > + } > + if (i && i % 4 == 0 && i != len - 1) { > + strcat(t, " "); > + ++t; > + } > + t += sprintf(t, 
"%02X", *c & 0xFF); > + } > +} > + > +static int get_path(const void *fdt, int offset, char *buf, int len) > +{ > + int ret; > + > + ret = fdt_get_path(fdt, offset, buf, len - 1); > + if (ret < 0) { > + return ret; > + } > + > + buf[len - 1] = '\0'; > + > + return strlen(buf) + 1; > +} > + > +static int phandle_to_path(const void *fdt, uint32_t ph, char *buf, int len) > +{ > + int ret; > + > + ret = fdt_node_offset_by_phandle(fdt, ph); > + if (ret < 0) { > + return ret; > + } > + > + return get_path(fdt, ret, buf, len); > +} > + > +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) > +{ > + char fullnode[VOF_MAX_PATH]; > + uint32_t ret = -1; > + int offset; > + > + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { > + return (uint32_t) ret; > + } > + > + offset = fdt_path_offset(fdt, fullnode); > + if (offset >= 0) { > + ret = fdt_get_phandle(fdt, offset); > + } > + trace_vof_finddevice(fullnode, ret); > + return (uint32_t) ret; > +} The Linux init function that runs on pegasos2 here: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 calls finddevice once with isa@c and next with isa@C (small and capital C) both of which works with the board firmware but with vof the comparison is case sensitive and one of these fails so I can't make it work. I don't know if this is a problem in libfdt or the vof_finddevice above should do something else to get case insensitive comparison. > + > +static const void *getprop(const void *fdt, int nodeoff, const char *propname, > + int *proplen, bool *write0) > +{ > + const char *unit, *prop; > + > + /* > + * The "name" property is not actually stored as a property in the FDT, > + * we emulate it by returning a pointer to the node's name and adjust > + * proplen to include only the name but not the unit. 
> + */ > + if (strcmp(propname, "name") == 0) { > + prop = fdt_get_name(fdt, nodeoff, proplen); > + if (!prop) { > + *proplen = 0; > + return NULL; > + } > + > + unit = memchr(prop, '@', *proplen); > + if (unit) { > + *proplen = unit - prop; > + } > + *proplen += 1; > + > + /* > + * Since it might be cut at "@" and there will be no trailing zero > + * in the prop buffer, tell the caller to write zero at the end. > + */ > + if (write0) { > + *write0 = true; > + } > + return prop; > + } > + > + if (write0) { > + *write0 = false; > + } > + return fdt_getprop(fdt, nodeoff, propname, proplen); > +} MorphOS checks the name property of the root node ("/") to decide what platform it runs on, so we may need to be able to set this property on / where it should return "bplan,Pegasos2". Therefore the above should perhaps do getprop first and only generate the name property if it's not set (or at least check if we're on the root node and allow setting the name property there). (On Macs the root node is named "device-tree" and this was previously found to be needed for MorphOS.) Other than the above two problems, I've found that getting the device tree from vof returns it in reverse order compared to the board firmware if I add it in the expected order. This may or may not be a problem, but to avoid it I can build the tree in reverse order and then it comes out right. So unless there's an easy fix this should not cause a problem, but it may be worth a comment somewhere. Regards, BALATON Zoltan
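For illustration only, the suggestion above (prefer a real "name" property and fall back to the node name cut at "@") could look roughly like the sketch below. It is plain C with no libfdt dependency; pick_name_prop() and its parameters are hypothetical and are not part of the posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: choose the value reported
 * for the "name" property of a node.
 *
 * explicit_name: value of a real "name" property if the tree has one,
 *                or NULL if it does not.
 * node_name:     the node's own name as stored in the tree, e.g. "isa@c".
 *
 * Writes a NUL-terminated string to out and returns its length including
 * the trailing NUL, or -1 if out is too small.
 */
static int pick_name_prop(const char *explicit_name, const char *node_name,
                          char *out, size_t outlen)
{
    const char *src;
    size_t len;

    assert(node_name && out);

    if (explicit_name) {
        /* A property that was set explicitly wins. */
        src = explicit_name;
        len = strlen(src);
    } else {
        /* Otherwise emulate it: node name without the unit address. */
        const char *unit = strchr(node_name, '@');

        src = node_name;
        len = unit ? (size_t)(unit - node_name) : strlen(node_name);
    }

    if (len + 1 > outlen) {
        return -1;
    }
    memcpy(out, src, len);
    out[len] = '\0';
    return (int)(len + 1);
}
```

With this ordering, a root node carrying an explicit "bplan,Pegasos2" name would report that string, while a node such as "isa@c" without one would still report "isa".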
On Sun, 30 May 2021, BALATON Zoltan wrote: > On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c >> new file mode 100644 >> index 000000000000..a283b7d251a7 >> --- /dev/null >> +++ b/hw/ppc/vof.c >> @@ -0,0 +1,1021 @@ >> +/* >> + * QEMU PowerPC Virtual Open Firmware. >> + * >> + * This implements client interface from OpenFirmware IEEE1275 on the QEMU >> + * side to leave only a very basic firmware in the VM. >> + * >> + * Copyright (c) 2021 IBM Corporation. >> + * >> + * SPDX-License-Identifier: GPL-2.0-or-later >> + */ >> + >> +#include "qemu/osdep.h" >> +#include "qemu-common.h" >> +#include "qemu/timer.h" >> +#include "qemu/range.h" >> +#include "qemu/units.h" >> +#include "qapi/error.h" >> +#include <sys/ioctl.h> >> +#include "exec/ram_addr.h" >> +#include "exec/address-spaces.h" >> +#include "hw/ppc/vof.h" >> +#include "hw/ppc/fdt.h" >> +#include "sysemu/runstate.h" >> +#include "qom/qom-qobject.h" >> +#include "trace.h" >> + >> +#include <libfdt.h> >> + >> +/* >> + * OF 1275 "nextprop" description suggests is it 32 bytes max but >> + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars >> long. 
>> + */ >> +#define OF_PROPNAME_LEN_MAX 64 >> + >> +#define VOF_MAX_PATH 256 >> +#define VOF_MAX_SETPROPLEN 2048 >> +#define VOF_MAX_METHODLEN 256 >> +#define VOF_MAX_FORTHCODE 256 >> +#define VOF_VTY_BUF_SIZE 256 >> + >> +typedef struct { >> + uint64_t start; >> + uint64_t size; >> +} OfClaimed; >> + >> +typedef struct { >> + char *path; /* the path used to open the instance */ >> + uint32_t phandle; >> +} OfInstance; >> + >> +#define VOF_MEM_READ(pa, buf, size) \ >> + address_space_read_full(&address_space_memory, \ >> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >> +#define VOF_MEM_WRITE(pa, buf, size) \ >> + address_space_write(&address_space_memory, \ >> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >> + >> +static int readstr(hwaddr pa, char *buf, int size) >> +{ >> + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { >> + return -1; >> + } >> + if (strnlen(buf, size) == size) { >> + buf[size - 1] = '\0'; >> + trace_vof_error_str_truncated(buf, size); >> + return -1; >> + } >> + return 0; >> +} >> + >> +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, >> + const char *s1, unsigned nargscheck, unsigned >> nretcheck) >> +{ >> + if (strcmp(s, s1)) { >> + return false; >> + } >> + if ((nargscheck && (nargs != nargscheck)) || >> + (nretcheck && (nret != nretcheck))) { >> + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); >> + return false; >> + } >> + >> + return true; >> +} >> + >> +static void prop_format(char *tval, int tlen, const void *prop, int len) >> +{ >> + int i; >> + const unsigned char *c; >> + char *t; >> + const char bin[] = "..."; >> + >> + for (i = 0, c = prop; i < len; ++i, ++c) { >> + if (*c == '\0' && i == len - 1) { >> + strncpy(tval, prop, tlen - 1); >> + return; >> + } >> + if (*c < 0x20 || *c >= 0x80) { >> + break; >> + } >> + } >> + >> + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { >> + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) { >> + strcpy(t, bin); >> + return; >> + } >> + if (i && i % 4 == 0 
&& i != len - 1) { >> + strcat(t, " "); >> + ++t; >> + } >> + t += sprintf(t, "%02X", *c & 0xFF); >> + } >> +} >> + >> +static int get_path(const void *fdt, int offset, char *buf, int len) >> +{ >> + int ret; >> + >> + ret = fdt_get_path(fdt, offset, buf, len - 1); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + buf[len - 1] = '\0'; >> + >> + return strlen(buf) + 1; >> +} >> + >> +static int phandle_to_path(const void *fdt, uint32_t ph, char *buf, int >> len) >> +{ >> + int ret; >> + >> + ret = fdt_node_offset_by_phandle(fdt, ph); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + return get_path(fdt, ret, buf, len); >> +} >> + >> +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) >> +{ >> + char fullnode[VOF_MAX_PATH]; >> + uint32_t ret = -1; >> + int offset; >> + >> + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { >> + return (uint32_t) ret; >> + } >> + >> + offset = fdt_path_offset(fdt, fullnode); >> + if (offset >= 0) { >> + ret = fdt_get_phandle(fdt, offset); >> + } >> + trace_vof_finddevice(fullnode, ret); >> + return (uint32_t) ret; >> +} > > The Linux init function that runs on pegasos2 here: > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 > > calls finddevice once with isa@c and next with isa@C (small and capital C) > both of which works with the board firmware but with vof the comparison is > case sensitive and one of these fails so I can't make it work. I don't know > if this is a problem in libfdt or the vof_finddevice above should do > something else to get case insensitive comparison. This fixes the issue with Linux but I'm not sure if there's any better solution or would it break anything else. 
Regards,
BALATON Zoltan

diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
index a283b7d251..b47bbd509d 100644
--- a/hw/ppc/vof.c
+++ b/hw/ppc/vof.c
@@ -144,12 +144,15 @@ static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr)
     char fullnode[VOF_MAX_PATH];
     uint32_t ret = -1;
     int offset;
+    gchar *p;
 
     if (readstr(nodeaddr, fullnode, sizeof(fullnode))) {
         return (uint32_t) ret;
     }
 
-    offset = fdt_path_offset(fdt, fullnode);
+    p = g_ascii_strdown(fullnode, -1);
+    offset = fdt_path_offset(fdt, p);
+    g_free(p);
     if (offset >= 0) {
         ret = fdt_get_phandle(fdt, offset);
     }
On 31/05/2021 23:07, BALATON Zoltan wrote: > On Sun, 30 May 2021, BALATON Zoltan wrote: >> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c >>> new file mode 100644 >>> index 000000000000..a283b7d251a7 >>> --- /dev/null >>> +++ b/hw/ppc/vof.c >>> @@ -0,0 +1,1021 @@ >>> +/* >>> + * QEMU PowerPC Virtual Open Firmware. >>> + * >>> + * This implements client interface from OpenFirmware IEEE1275 on >>> the QEMU >>> + * side to leave only a very basic firmware in the VM. >>> + * >>> + * Copyright (c) 2021 IBM Corporation. >>> + * >>> + * SPDX-License-Identifier: GPL-2.0-or-later >>> + */ >>> + >>> +#include "qemu/osdep.h" >>> +#include "qemu-common.h" >>> +#include "qemu/timer.h" >>> +#include "qemu/range.h" >>> +#include "qemu/units.h" >>> +#include "qapi/error.h" >>> +#include <sys/ioctl.h> >>> +#include "exec/ram_addr.h" >>> +#include "exec/address-spaces.h" >>> +#include "hw/ppc/vof.h" >>> +#include "hw/ppc/fdt.h" >>> +#include "sysemu/runstate.h" >>> +#include "qom/qom-qobject.h" >>> +#include "trace.h" >>> + >>> +#include <libfdt.h> >>> + >>> +/* >>> + * OF 1275 "nextprop" description suggests is it 32 bytes max but >>> + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 >>> chars long. 
>>> + */ >>> +#define OF_PROPNAME_LEN_MAX 64 >>> + >>> +#define VOF_MAX_PATH 256 >>> +#define VOF_MAX_SETPROPLEN 2048 >>> +#define VOF_MAX_METHODLEN 256 >>> +#define VOF_MAX_FORTHCODE 256 >>> +#define VOF_VTY_BUF_SIZE 256 >>> + >>> +typedef struct { >>> + uint64_t start; >>> + uint64_t size; >>> +} OfClaimed; >>> + >>> +typedef struct { >>> + char *path; /* the path used to open the instance */ >>> + uint32_t phandle; >>> +} OfInstance; >>> + >>> +#define VOF_MEM_READ(pa, buf, size) \ >>> + address_space_read_full(&address_space_memory, \ >>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>> +#define VOF_MEM_WRITE(pa, buf, size) \ >>> + address_space_write(&address_space_memory, \ >>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>> + >>> +static int readstr(hwaddr pa, char *buf, int size) >>> +{ >>> + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { >>> + return -1; >>> + } >>> + if (strnlen(buf, size) == size) { >>> + buf[size - 1] = '\0'; >>> + trace_vof_error_str_truncated(buf, size); >>> + return -1; >>> + } >>> + return 0; >>> +} >>> + >>> +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, >>> + const char *s1, unsigned nargscheck, unsigned >>> nretcheck) >>> +{ >>> + if (strcmp(s, s1)) { >>> + return false; >>> + } >>> + if ((nargscheck && (nargs != nargscheck)) || >>> + (nretcheck && (nret != nretcheck))) { >>> + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); >>> + return false; >>> + } >>> + >>> + return true; >>> +} >>> + >>> +static void prop_format(char *tval, int tlen, const void *prop, int >>> len) >>> +{ >>> + int i; >>> + const unsigned char *c; >>> + char *t; >>> + const char bin[] = "..."; >>> + >>> + for (i = 0, c = prop; i < len; ++i, ++c) { >>> + if (*c == '\0' && i == len - 1) { >>> + strncpy(tval, prop, tlen - 1); >>> + return; >>> + } >>> + if (*c < 0x20 || *c >= 0x80) { >>> + break; >>> + } >>> + } >>> + >>> + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { >>> + if (t >= tval + tlen - sizeof(bin) - 1 
- 2 - 1) { >>> + strcpy(t, bin); >>> + return; >>> + } >>> + if (i && i % 4 == 0 && i != len - 1) { >>> + strcat(t, " "); >>> + ++t; >>> + } >>> + t += sprintf(t, "%02X", *c & 0xFF); >>> + } >>> +} >>> + >>> +static int get_path(const void *fdt, int offset, char *buf, int len) >>> +{ >>> + int ret; >>> + >>> + ret = fdt_get_path(fdt, offset, buf, len - 1); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + >>> + buf[len - 1] = '\0'; >>> + >>> + return strlen(buf) + 1; >>> +} >>> + >>> +static int phandle_to_path(const void *fdt, uint32_t ph, char *buf, >>> int len) >>> +{ >>> + int ret; >>> + >>> + ret = fdt_node_offset_by_phandle(fdt, ph); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + >>> + return get_path(fdt, ret, buf, len); >>> +} >>> + >>> +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) >>> +{ >>> + char fullnode[VOF_MAX_PATH]; >>> + uint32_t ret = -1; >>> + int offset; >>> + >>> + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { >>> + return (uint32_t) ret; >>> + } >>> + >>> + offset = fdt_path_offset(fdt, fullnode); >>> + if (offset >= 0) { >>> + ret = fdt_get_phandle(fdt, offset); >>> + } >>> + trace_vof_finddevice(fullnode, ret); >>> + return (uint32_t) ret; >>> +} >> >> The Linux init function that runs on pegasos2 here: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 >> >> >> calls finddevice once with isa@c and next with isa@C (small and >> capital C) both of which works with the board firmware but with vof >> the comparison is case sensitive and one of these fails so I can't >> make it work. I don't know if this is a problem in libfdt or the >> vof_finddevice above should do something else to get case insensitive >> comparison. > > This fixes the issue with Linux but I'm not sure if there's any better > solution or would it break anything else. The bit after "@" is an address and needs to be case insensitive and I'll fix this indeed. 
I'm not so sure about the part before "@", although I cannot imagine what could break if I made the search case insensitive. Hm :-/

>
> Regards,
> BALATON Zoltan
>
> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> index a283b7d251..b47bbd509d 100644
> --- a/hw/ppc/vof.c
> +++ b/hw/ppc/vof.c
> @@ -144,12 +144,15 @@ static uint32_t vof_finddevice(const void *fdt,
> uint32_t nodeaddr)
>      char fullnode[VOF_MAX_PATH];
>      uint32_t ret = -1;
>      int offset;
> +    gchar *p;
>
>      if (readstr(nodeaddr, fullnode, sizeof(fullnode))) {
>          return (uint32_t) ret;
>      }
>
> -    offset = fdt_path_offset(fdt, fullnode);
> +    p = g_ascii_strdown(fullnode, -1);
> +    offset = fdt_path_offset(fdt, p);
> +    g_free(p);
>      if (offset >= 0) {
>          ret = fdt_get_phandle(fdt, offset);
>      }
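A minimal sketch of the narrower variant discussed above: lower-case only the unit address (the part after "@") of each path component and leave node names alone. lower_unit_addresses() is a hypothetical helper, not the fix that was eventually applied, and the paths in the note below are made up for illustration. It assumes ':' ends a unit address and does not otherwise parse device arguments.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: lower-case, in place, only the unit-address part
 * of each path component, i.e. the text between '@' and the next '/'
 * (or ':'). Node names before '@' are left untouched.
 */
static void lower_unit_addresses(char *path)
{
    int in_unit = 0;
    char *p;

    assert(path);

    for (p = path; *p; ++p) {
        if (*p == '/' || *p == ':') {
            in_unit = 0;            /* new component or device arguments */
        } else if (*p == '@') {
            in_unit = 1;            /* unit address starts after '@' */
        } else if (in_unit) {
            *p = (char)tolower((unsigned char)*p);
        }
    }
}
```

The result could then be passed to fdt_path_offset() in place of the fully lower-cased string, so that a path such as "/pci@80000000/isa@C" is looked up as "/pci@80000000/isa@c". This only helps if the unit addresses stored in the tree are themselves lower case.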
On Tue, 1 Jun 2021, Alexey Kardashevskiy wrote: > On 31/05/2021 23:07, BALATON Zoltan wrote: >> On Sun, 30 May 2021, BALATON Zoltan wrote: >>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c >>>> new file mode 100644 >>>> index 000000000000..a283b7d251a7 >>>> --- /dev/null >>>> +++ b/hw/ppc/vof.c >>>> @@ -0,0 +1,1021 @@ >>>> +/* >>>> + * QEMU PowerPC Virtual Open Firmware. >>>> + * >>>> + * This implements client interface from OpenFirmware IEEE1275 on the >>>> QEMU >>>> + * side to leave only a very basic firmware in the VM. >>>> + * >>>> + * Copyright (c) 2021 IBM Corporation. >>>> + * >>>> + * SPDX-License-Identifier: GPL-2.0-or-later >>>> + */ >>>> + >>>> +#include "qemu/osdep.h" >>>> +#include "qemu-common.h" >>>> +#include "qemu/timer.h" >>>> +#include "qemu/range.h" >>>> +#include "qemu/units.h" >>>> +#include "qapi/error.h" >>>> +#include <sys/ioctl.h> >>>> +#include "exec/ram_addr.h" >>>> +#include "exec/address-spaces.h" >>>> +#include "hw/ppc/vof.h" >>>> +#include "hw/ppc/fdt.h" >>>> +#include "sysemu/runstate.h" >>>> +#include "qom/qom-qobject.h" >>>> +#include "trace.h" >>>> + >>>> +#include <libfdt.h> >>>> + >>>> +/* >>>> + * OF 1275 "nextprop" description suggests is it 32 bytes max but >>>> + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars >>>> long. 
>>>> + */ >>>> +#define OF_PROPNAME_LEN_MAX 64 >>>> + >>>> +#define VOF_MAX_PATH 256 >>>> +#define VOF_MAX_SETPROPLEN 2048 >>>> +#define VOF_MAX_METHODLEN 256 >>>> +#define VOF_MAX_FORTHCODE 256 >>>> +#define VOF_VTY_BUF_SIZE 256 >>>> + >>>> +typedef struct { >>>> + uint64_t start; >>>> + uint64_t size; >>>> +} OfClaimed; >>>> + >>>> +typedef struct { >>>> + char *path; /* the path used to open the instance */ >>>> + uint32_t phandle; >>>> +} OfInstance; >>>> + >>>> +#define VOF_MEM_READ(pa, buf, size) \ >>>> + address_space_read_full(&address_space_memory, \ >>>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>>> +#define VOF_MEM_WRITE(pa, buf, size) \ >>>> + address_space_write(&address_space_memory, \ >>>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>>> + >>>> +static int readstr(hwaddr pa, char *buf, int size) >>>> +{ >>>> + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { >>>> + return -1; >>>> + } >>>> + if (strnlen(buf, size) == size) { >>>> + buf[size - 1] = '\0'; >>>> + trace_vof_error_str_truncated(buf, size); >>>> + return -1; >>>> + } >>>> + return 0; >>>> +} >>>> + >>>> +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, >>>> + const char *s1, unsigned nargscheck, unsigned >>>> nretcheck) >>>> +{ >>>> + if (strcmp(s, s1)) { >>>> + return false; >>>> + } >>>> + if ((nargscheck && (nargs != nargscheck)) || >>>> + (nretcheck && (nret != nretcheck))) { >>>> + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); >>>> + return false; >>>> + } >>>> + >>>> + return true; >>>> +} >>>> + >>>> +static void prop_format(char *tval, int tlen, const void *prop, int len) >>>> +{ >>>> + int i; >>>> + const unsigned char *c; >>>> + char *t; >>>> + const char bin[] = "..."; >>>> + >>>> + for (i = 0, c = prop; i < len; ++i, ++c) { >>>> + if (*c == '\0' && i == len - 1) { >>>> + strncpy(tval, prop, tlen - 1); >>>> + return; >>>> + } >>>> + if (*c < 0x20 || *c >= 0x80) { >>>> + break; >>>> + } >>>> + } >>>> + >>>> + for (i = 0, c = prop, t = 
tval; i < len; ++i, ++c) { >>>> + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) { >>>> + strcpy(t, bin); >>>> + return; >>>> + } >>>> + if (i && i % 4 == 0 && i != len - 1) { >>>> + strcat(t, " "); >>>> + ++t; >>>> + } >>>> + t += sprintf(t, "%02X", *c & 0xFF); >>>> + } >>>> +} >>>> + >>>> +static int get_path(const void *fdt, int offset, char *buf, int len) >>>> +{ >>>> + int ret; >>>> + >>>> + ret = fdt_get_path(fdt, offset, buf, len - 1); >>>> + if (ret < 0) { >>>> + return ret; >>>> + } >>>> + >>>> + buf[len - 1] = '\0'; >>>> + >>>> + return strlen(buf) + 1; >>>> +} >>>> + >>>> +static int phandle_to_path(const void *fdt, uint32_t ph, char *buf, int >>>> len) >>>> +{ >>>> + int ret; >>>> + >>>> + ret = fdt_node_offset_by_phandle(fdt, ph); >>>> + if (ret < 0) { >>>> + return ret; >>>> + } >>>> + >>>> + return get_path(fdt, ret, buf, len); >>>> +} >>>> + >>>> +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) >>>> +{ >>>> + char fullnode[VOF_MAX_PATH]; >>>> + uint32_t ret = -1; >>>> + int offset; >>>> + >>>> + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { >>>> + return (uint32_t) ret; >>>> + } >>>> + >>>> + offset = fdt_path_offset(fdt, fullnode); >>>> + if (offset >= 0) { >>>> + ret = fdt_get_phandle(fdt, offset); >>>> + } >>>> + trace_vof_finddevice(fullnode, ret); >>>> + return (uint32_t) ret; >>>> +} >>> >>> The Linux init function that runs on pegasos2 here: >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 >>> >>> calls finddevice once with isa@c and next with isa@C (small and capital C) >>> both of which works with the board firmware but with vof the comparison is >>> case sensitive and one of these fails so I can't make it work. I don't >>> know if this is a problem in libfdt or the vof_finddevice above should do >>> something else to get case insensitive comparison. 
>>
>> This fixes the issue with Linux but I'm not sure if there's any better
>> solution or would it break anything else.
>
> The bit after "@" is an address and needs to be case insensitive and I'll fix
> this indeed. I'm not so sure about the part before "@", I cannot imagine what
> could break if I made search insensitive to case. Hm :-/

Fixing the match in the address part is probably enough as the name sent by guests is probably always lower case but the address could be formatted differently and that's what caused the problem. The patch below was only a quick fix to be able to test it further but your fix should work too.

With this and the ld replaced in entry.S I can now boot Linux which is enough to submit the pegasos2 vof patch after an updated patch from you fixes these in vof.

MorphOS still misses something but I'm not sure what as it uses the data gathered from the device tree later without printing diagnostics and fails due to a NULL dereference much after that so it seems to assume some value should exist but I'm not sure what value it needs and where that should come from. Maybe I'll try some more to find out just to make it simpler to boot but since it boots with the board firmware it's enough if Linux works with vof for now.

Regards,
BALATON Zoltan

>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
>> index a283b7d251..b47bbd509d 100644
>> --- a/hw/ppc/vof.c
>> +++ b/hw/ppc/vof.c
>> @@ -144,12 +144,15 @@ static uint32_t vof_finddevice(const void *fdt,
>> uint32_t nodeaddr)
>>      char fullnode[VOF_MAX_PATH];
>>      uint32_t ret = -1;
>>      int offset;
>> +    gchar *p;
>>
>>      if (readstr(nodeaddr, fullnode, sizeof(fullnode))) {
>>          return (uint32_t) ret;
>>      }
>>
>> -    offset = fdt_path_offset(fdt, fullnode);
>> +    p = g_ascii_strdown(fullnode, -1);
>> +    offset = fdt_path_offset(fdt, p);
>> +    g_free(p);
>>      if (offset >= 0) {
>>          ret = fdt_get_phandle(fdt, offset);
>>      }
>
>
On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: > On Thu, 27 May 2021, David Gibson wrote: > > On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: > > > On Tue, 25 May 2021, David Gibson wrote: > > > > On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: > > > > > On Mon, 24 May 2021, David Gibson wrote: > > > > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > > > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > > > > > > > One thing to note about PCI is that normally I think the client > > > > > > > > > expects the firmware to do PCI probing and SLOF does it. But VOF > > > > > > > > > does not and Linux scans PCI bus(es) itself. Might be a problem for > > > > > > > > > you kernel. > > > > > > > > > > > > > > > > I'm not sure what info does MorphOS get from the device tree and what it > > > > > > > > probes itself but I think it may at least need device ids and info about > > > > > > > > the PCI bus to be able to access the config regs, after that it should > > > > > > > > set the devices up hopefully. I could add these from the board code to > > > > > > > > device tree so VOF does not need to do anything about it. However I'm > > > > > > > > not getting to that point yet because it crashes on something that it's > > > > > > > > missing and couldn't yet find out what is that. > > > > > > > > > > > > > > > > I'd like to get Linux working now as that would be enough to test this > > > > > > > > and then if for MorphOS we still need a ROM it's not a problem if at > > > > > > > > least we can boot Linux without the original firmware. But I can't make > > > > > > > > Linux open a serial console and I don't know what it needs for that. Do > > > > > > > > you happen to know? I've looked at the sources in Linux/arch/powerpc but > > > > > > > > not sure how it would find and open a serial port on pegasos2. 
It seems > > > > > > > > to work with the board firmware and now I can get it to boot with VOF > > > > > > > > but then it does not open serial so it probably needs something in the > > > > > > > > device tree or expects the firmware to set something up that we should > > > > > > > > add in pegasos2.c when using VOF. > > > > > > > > > > > > > > I've now found that Linux uses rtas methods read-pci-config and > > > > > > > write-pci-config for PCI access on pegasos2 so this means that we'll > > > > > > > probably need rtas too (I hoped we could get away without it if it were only > > > > > > > used for shutdown/reboot or so but seems Linux needs it for PCI as well and > > > > > > > does not scan the bus and won't find some devices without it). > > > > > > > > > > > > Yes, definitely sounds like you'll need an RTAS implementation. > > > > > > > > > > I plan to fix that after managed to get serial working as that seems to not > > > > > need it. If I delete the rtas-size property from /rtas on the original > > > > > firmware that makes Linux skip instantiating rtas, but I still get serial > > > > > output just not accessing PCI devices. So I think it should work and keeps > > > > > things simpler at first. Then I'll try rtas later. > > > > > > > > > > > > While VOF can do rtas, this causes a problem with the hypercall method using > > > > > > > sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so > > > > > > > cannot work after guest is past quiesce. > > > > > > > > > > > > > So the question is why is that > > > > > > > assert there > > > > > > > > > > > > Ah.. right. So, vhyp was designed for the PAPR use case, where we > > > > > > want to model the CPU when it's in supervisor and user mode, but not > > > > > > when it's in hypervisor mode. We want qemu to mimic the behaviour of > > > > > > the hypervisor, rather than attempting to actually execute hypervisor > > > > > > code in the virtual CPU. 
> > > > > > > > > > > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > > > > > > so it makes no sense for the guest to attempt to set it. That should > > > > > > be caught by the general SPR code and turned into a 0x700, hence the > > > > > > assert() if we somehow reach ppc_store_sdr1(). > > > > > > > > > > > > So, we are seeing a problem here because you want the 'sc 1' > > > > > > interception of vhyp, but not the rest of the stuff that goes with it. > > > > > > > > > > > > > and would using sc 1 for hypercalls on pegasos2 cause other > > > > > > > problems later even if the assert could be removed? > > > > > > > > > > > > At least in the short term, I think you probably can remove the > > > > > > assert. In your case the 'sc 1' calls aren't truly to a hypervisor, > > > > > > but a special case escape to qemu for the firmware emulation. I think > > > > > > it's unlikely to cause problems later, because nothing on a 32-bit > > > > > > system should be attempting an 'sc 1'. The only thing I can think of > > > > > > that would fail is some test case which explicitly verified that 'sc > > > > > > 1' triggered a 0x700 (SIGILL from userspace). > > > > > > > > > > OK so the assert should check if the CPU has an HV bit. I think there was a > > > > > #define for that somewhere that I can add to the assert then I can try that. > > > > > What I wasn't sure about is that sc 1 would conflict with the guest's usage > > > > > of normal sc calls or are these going through different paths and only sc 1 > > > > > will trigger vhyp callback not affecting normal sc calls? > > > > > > > > The vhyp shouldn't affect normal system calls, 'sc 1' is specifically > > > > for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the > > > > vhyp only intercepts the hypercall version (after all Linux on PAPR > > > > certainly uses its own system calls, and hypercalls are active for the > > > > lifetime of the guest there).
> > > > > > > > > (Or if this causes > > > > > an otherwise unnecessary VM exit on KVM even when it works then maybe > > > > > looking for a different way in the future might be needed. > > > > > > > > What you're doing here won't work with KVM as it stands. There are > > > > basically two paths into the vhyp hypercall path: 1) from TCG, if we > > > > interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get > > > > a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. > > > > > > > > The second path is specific to the PAPR (ppc64) implementation of KVM, > > > > and will not work for a non-PAPR platform without substantial > > > > modification of the KVM code. > > > > > > OK so then at that point when we try KVM we'll need to look at alternative > > > ways, I think MOL OSI worked with KVM at least in MOL but will probably make > > > all syscalls exit KVM but since we'll probably need to use KVM PR it will > > > exit anyway. For now I keep this vhyp as it does not run with KVM for other > > > reasons yet so that's another area to clean up so as a proof of concept > > > first version of using VOF vhyp will do. > > > > Eh, since you'll need to modify KVM anyway, it probably makes just as > > much sense to modify it to catch the 'sc 1' as MoL's magic thingy. > > I'm not sure how KVM works for this case so I also don't know why and what > would need to be modified. I think we'll only have KVM PR working as newer > POWER CPUs having HV (besides being rare among potential users) are probably > too different to run the OSes that expect at most a G4 on pegasos2 so likely > it won't work with KVM HV. Oh, it definitely won't work with KVM HV. > If we have KVM PR doesn't sc already trap so we > could add MOL OSI without further modification to KVM itself only needing > change in QEMU? Uh... I guess so? 
> I also hope that MOL OSI could be useful for porting some > paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't > know about that for sure so I'm open to any other solution too. Maybe. I never knew much about MOL to begin with, and anything I did know was a decade or more ago so I've probably forgotten. > For now I'm > going with vhyp which is enough for testing with TCG and if somebody wants > KVM they could use the original firmware for now so this could be improved in > a later version unless a simple solution is found before the freeze for 6.1. > If we're in KVM PR what happens for sc 1 could that be used too so maybe > what we have now could work? Note that if you do go down the MOL path it wouldn't be that complex to make a "vMOL" interface so you can use the same mechanism for KVM and TCG. > > > [...] > > > > > > > I've tested that the missing rtas is not the reason for getting no output > > > > > > > via serial though, as even when disabling rtas on pegasos2.rom it boots and > > > > > > > I still get serial output just some PCI devices are not detected (such as > > > > > > > USB, the video card and the not emulated ethernet port but these are not > > > > > > > fatal so it might even work as a first try without rtas, just to boot a > > > > > > > Linux kernel for testing it would be enough if I can fix the serial output). > > > > > > > I still don't know why it's not finding serial but I think it may be some > > > > > > > missing or wrong info in the device tree I generate. I'll try to focus on > > > > > > > this for now and leave the above rtas question for later. > > > > > > > > > > > > Oh.. another thought on that. You have an ISA serial port on Pegasos, > > > > > > I believe. I wonder if the PCI->ISA bridge needs some configuration / > > > > > > initialization that the firmware is expected to do. If so you'll need > > > > > > to mimic that setup in qemu for the VOF case.
> > > > > > > > > > That's what I begin to think because I've added everything to the device > > > > > tree that I thought could be needed and I still don't get it working so it > > > > > may need some config from the firmware. But how do I access device registers > > > > > from board code? I've tried adding a machine reset method and write to > > > > > memory mapped device registers but all my attempts failed. I've tried > > > > > cpu_stl_le_data and even memory_region_dispatch_write but these did not get > > > > > to the device. What's the way to access guest mmio regs from QEMU? > > > > > > > > That's odd, cpu_stl() and memory_region_dispatch_write() should work > > > > from board code (after the relevant memory regions are configured, of > > > > course). As an ISA serial port, it's probably accessed through IO > > > > space, not memory space though, so you'd need &address_space_io. And > > > > if there is some bridge configuration then it's the bridge control > > > > registers you need to look at not the serial registers - you'd have to > > > > look at the bridge documentation for that. Or, I guess the bridge > > > > implementation in qemu, which you wrote part of. > > > > > > I've found at last that stl_le_phys() works. There are so many of these that > > > I never know when to use which. > > > > > > I think the address_space_rw calls in vof_client_call() in vof.c could also > > > use these for somewhat shorter code. I've ended up with > > > stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset method but I don't > > > even need that now as it works without additional setup. Also VOF's memory > > > access is basically the same as the already existing rtas_st() and co. so > > > maybe that could be reused to make code smaller? > > > > rtas_ld() and rtas_st() should only be used for reading/writing RTAS > > parameters to and from memory. Accessing IO shouldn't be done with > > those.
> > > > For IO you probably want the cpu_st*() variants in most cases, since > > you're trying to emulate an IO access from the virtual cpu. > > I think I've tried that but what worked to access mmio device registers are > stl_le_phys and similar that are wrappers around address_space_stl_*. But I > did not mean that for rtas_ld/_st but the part when vof accessing the > parameters passed by its hypercall which is memory access: > > https://github.com/patchew-project/qemu/blob/patchew/20210520090557.435689-1-aik%40ozlabs.ru/hw/ppc/vof.c > > line 893, and vof_client_call before that is very similar to what h_rtas > does here: > > https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_hcall.c;h=f25014afda408002ee1ec1027a0dd7a6025eca61;hb=HEAD#l639 > > and I also need to do the same for rtas in pegasos2 for which I'm just using > ldl_be_phys for now but I wonder if we really need 3 ways to do the same or > the rtas_ld/_st could be made more generic and reused here? For your rtas implementation you could definitely re-use them. For the client call I'm a bit less confident, but if the in-guest-memory structures are really the same, then it would make sense.
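To make the "same in-guest-memory structure" question above concrete, here is a rough sketch of generic cell accessors. It assumes, and this is only an assumption for illustration, that both the RTAS argument buffer and the client interface argument array start with three 32-bit big-endian cells (an identifier, the number of arguments, the number of return values) followed by the argument and return cells. cell_ld(), cell_st(), CallHeader and call_header_parse() are hypothetical names; they operate on a plain byte buffer that has already been copied out of guest memory and are not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load the n-th 32-bit big-endian cell from a byte buffer. */
static uint32_t cell_ld(const uint8_t *buf, size_t n)
{
    const uint8_t *p = buf + 4 * n;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store val as the n-th 32-bit big-endian cell of a byte buffer. */
static void cell_st(uint8_t *buf, size_t n, uint32_t val)
{
    uint8_t *p = buf + 4 * n;

    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)val;
}

/* Hypothetical decoded view of the three header cells. */
typedef struct {
    uint32_t id;    /* e.g. an RTAS token or the address of a service name */
    uint32_t nargs; /* number of argument cells after the header */
    uint32_t nret;  /* number of return cells after the arguments */
} CallHeader;

/*
 * Parse the header and check that the buffer is large enough to hold
 * nargs + nret cells after it. Returns 0 on success, -1 otherwise.
 */
static int call_header_parse(const uint8_t *buf, size_t len, CallHeader *h)
{
    assert(buf && h);

    if (len < 12) {
        return -1;
    }
    h->id = cell_ld(buf, 0);
    h->nargs = cell_ld(buf, 1);
    h->nret = cell_ld(buf, 2);
    if ((uint64_t)h->nargs + h->nret > (len - 12) / 4) {
        return -1;
    }
    return 0;
}
```

If the two layouts really do match, one pair of accessors like this could serve both callers; if they differ in anything beyond the meaning of the first cell, separate helpers are probably clearer.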
On Wed, 2 Jun 2021, David Gibson wrote: > On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: >> On Thu, 27 May 2021, David Gibson wrote: >>> On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: >>>> On Tue, 25 May 2021, David Gibson wrote: >>>>> On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: >>>>>> On Mon, 24 May 2021, David Gibson wrote: >>>>>>> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>>>>>>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>>>>>>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>>>>>>>> One thing to note about PCI is that normally I think the client >>>>>>>>>> expects the firmware to do PCI probing and SLOF does it. But VOF >>>>>>>>>> does not and Linux scans PCI bus(es) itself. Might be a problem for >>>>>>>>>> you kernel. >>>>>>>>> >>>>>>>>> I'm not sure what info does MorphOS get from the device tree and what it >>>>>>>>> probes itself but I think it may at least need device ids and info about >>>>>>>>> the PCI bus to be able to access the config regs, after that it should >>>>>>>>> set the devices up hopefully. I could add these from the board code to >>>>>>>>> device tree so VOF does not need to do anything about it. However I'm >>>>>>>>> not getting to that point yet because it crashes on something that it's >>>>>>>>> missing and couldn't yet find out what is that. >>>>>>>>> >>>>>>>>> I'd like to get Linux working now as that would be enough to test this >>>>>>>>> and then if for MorphOS we still need a ROM it's not a problem if at >>>>>>>>> least we can boot Linux without the original firmware. But I can't make >>>>>>>>> Linux open a serial console and I don't know what it needs for that. Do >>>>>>>>> you happen to know? I've looked at the sources in Linux/arch/powerpc but >>>>>>>>> not sure how it would find and open a serial port on pegasos2. 
It seems >>>>>>>>> to work with the board firmware and now I can get it to boot with VOF >>>>>>>>> but then it does not open serial so it probably needs something in the >>>>>>>>> device tree or expects the firmware to set something up that we should >>>>>>>>> add in pegasos2.c when using VOF. >>>>>>>> >>>>>>>> I've now found that Linux uses rtas methods read-pci-config and >>>>>>>> write-pci-config for PCI access on pegasos2 so this means that we'll >>>>>>>> probably need rtas too (I hoped we could get away without it if it were only >>>>>>>> used for shutdown/reboot or so but seems Linux needs it for PCI as well and >>>>>>>> does not scan the bus and won't find some devices without it). >>>>>>> >>>>>>> Yes, definitely sounds like you'll need an RTAS implementation. >>>>>> >>>>>> I plan to fix that after managed to get serial working as that seems to not >>>>>> need it. If I delete the rtas-size property from /rtas on the original >>>>>> firmware that makes Linux skip instantiating rtas, but I still get serial >>>>>> output just not accessing PCI devices. So I think it should work and keeps >>>>>> things simpler at first. Then I'll try rtas later. >>>>>> >>>>>>>> While VOF can do rtas, this causes a problem with the hypercall method using >>>>>>>> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so >>>>>>>> cannot work after guest is past quiesce. >>>>>>> >>>>>>>> So the question is why is that >>>>>>>> assert there >>>>>>> >>>>>>> Ah.. right. So, vhyp was designed for the PAPR use case, where we >>>>>>> want to model the CPU when it's in supervisor and user mode, but not >>>>>>> when it's in hypervisor mode. We want qemu to mimic the behaviour of >>>>>>> the hypervisor, rather than attempting to actually execute hypervisor >>>>>>> code in the virtual CPU. >>>>>>> >>>>>>> On systems that have a hypervisor mode, SDR1 is hypervisor privileged, >>>>>>> so it makes no sense for the guest to attempt to set it. 
That should >>>>>>> be caught by the general SPR code and turned into a 0x700, hence the >>>>>>> assert() if we somehow reach ppc_store_sdr1(). >>>>>>> >>>>>>> So, we are seeing a problem here because you want the 'sc 1' >>>>>>> interception of vhyp, but not the rest of the stuff that goes with it. >>>>>>> >>>>>>>> and would using sc 1 for hypercalls on pegasos2 cause other >>>>>>>> problems later even if the assert could be removed? >>>>>>> >>>>>>> At least in the short term, I think you probably can remove the >>>>>>> assert. In your case the 'sc 1' calls aren't truly to a hypervisor, >>>>>>> but a special case escape to qemu for the firmware emulation. I think >>>>>>> it's unlikely to cause problems later, because nothing on a 32-bit >>>>>>> system should be attempting an 'sc 1'. The only thing I can think of >>>>>>> that would fail is some test case which explicitly verified that 'sc >>>>>>> 1' triggered a 0x700 (SIGILL from userspace). >>>>>> >>>>>> OK so the assert should check if the CPU has an HV bit. I think there was a >>>>>> #define for that somewhere that I can add to the assert then I can try that. >>>>>> What I wasn't sure about is that sc 1 would conflict with the guest's usage >>>>>> of normal sc calls or are these going through different paths and only sc 1 >>>>>> will trigger vhyp callback not affecting normal sc calls? >>>>> >>>>> The vhyp shouldn't affect normal system calls, 'sc 1' is specifically >>>>> for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the >>>>> vhyp only intercepts the hypercall version (after all Linux on PAPR >>>>> certainly uses its own system calls, and hypercalls are active for the >>>>> lifetime of the guest there). >>>>> >>>>>> (Or if this causes >>>>>> an otherwise unnecessary VM exit on KVM even when it works then maybe >>>>>> looking for a different way in the future might be needed. >>>>> >>>>> What you're doing here won't work with KVM as it stands. 
There are >>>>> basically two paths into the vhyp hypercall path: 1) from TCG, if we >>>>> interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get >>>>> a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. >>>>> >>>>> The second path is specific to the PAPR (ppc64) implementation of KVM, >>>>> and will not work for a non-PAPR platform without substantial >>>>> modification of the KVM code. >>>> >>>> OK so then at that point when we try KVM we'll need to look at alternative >>>> ways, I think MOL OSI worked with KVM at least in MOL but will probably make >>>> all syscalls exit KVM but since we'll probably need to use KVM PR it will >>>> exit anyway. For now I keep this vhyp as it does not run with KVM for other >>>> reasons yet so that's another area to clean up so as a proof of concept >>>> first version of using VOF vhyp will do. >>> >>> Eh, since you'll need to modify KVM anyway, it probably makes just as >>> much sense to modify it to catch the 'sc 1' as MoL's magic thingy. >> >> I'm not sure how KVM works for this case so I also don't know why and what >> would need to be modified. I think we'll only have KVM PR working as newer >> POWER CPUs having HV (besides being rare among potential users) are probably >> too different to run the OSes that expect at most a G4 on pegasos2 so likely >> it won't work with KVM HV. > > Oh, it definitely won't work with KVM HV. > >> If we have KVM PR doesn't sc already trap so we >> could add MOL OSI without further modification to KVM itself only needing >> change in QEMU? > > Uh... I guess so? > >> I also hope that MOL OSI could be useful for porting some >> paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't >> know about that for sure so I'm open to any other solution too. > > Maybe. I never know much about MOL to begin with, and anything I did > know was a decade or more ago so I've probably forgotten. 
That may still be more than what I know about it since I never had any knowledge about PPC KVM and don't have any PPC hardware to test with so I'm mostly guessing. (I could test with KVM emulated in QEMU and I did set up an environment for that but that's a bit slow and inconvenient so I'd leave KVM support to those interested and have more knowledge and hardware for it.) >> For now I'm >> going with vhyp which is enough for testing with TCG and if somebody wants >> KVM they could use the original firmware for now so this could be improved in >> a later version unless a simple solution is found before the freeze for 6.1. >> If we're in KVM PR what happens for sc 1 could that be used too so maybe >> what we have now could work? > > Note that if you do go down the MOL path it wouldn't be that complex > to make a "vMOL" interface so you can use the same mechanism for KVM > and TCG. Not sure what you mean by vMOL. Is it modifying MOL to use sc 1 like VOF instead of its OSI way for hypercalls? That would lose the advantage of being able to reuse MOL guest drivers without modification (which might be useful for running OS X guest on Mac emulation) so if we can't use vhyp then maybe using OSI would be the next choice for this reason but for now vhyp seems to be working for what I could test so unless somebody here sees a problem with it and has a better idea I'm going with vhyp for now just because that's what VOF uses and I don't want to modify VOF to reuse it as it is so I don't need to maintain a separate version and also get any enhancements without further need to sync with spapr VOF. I've found this document about possible hypercall interfaces on KVM (see Hypercall ABIs at the end): https://www.kernel.org/doc/html/latest/virt/kvm/ppc-pv.html Having both ePAPR (1.) and PAPR (2.) hypercalls is a bit confusing. Does vhyp correspond to 2. PAPR? The ePAPR (1.) seems to be preferred by KVM and MOL OSI supported for compatibility. So if we need something else instead of 2. 
PAPR hypercalls there seem to be two options: ePAPR and MOL OSI which should work with KVM but then I'm not sure how to handle those on TCG. >>>> [...] >>>>>>>> I've tested that the missing rtas is not the reason for getting no output >>>>>>>> via serial though, as even when disabling rtas on pegasos2.rom it boots and >>>>>>>> I still get serial output just some PCI devices are not detected (such as >>>>>>>> USB, the video card and the not emulated ethernet port but these are not >>>>>>>> fatal so it might even work as a first try without rtas, just to boot a >>>>>>>> Linux kernel for testing it would be enough if I can fix the serial output). >>>>>>>> I still don't know why it's not finding serial but I think it may be some >>>>>>>> missing or wrong info in the device tree I generate. I'll try to focus on >>>>>>>> this for now and leave the above rtas question for later. >>>>>>> >>>>>>> Oh.. another thought on that. You have an ISA serial port on Pegasos, >>>>>>> I believe. I wonder if the PCI->ISA bridge needs some configuration / >>>>>>> initialization that the firmware is expected to do. If so you'll need >>>>>>> to mimic that setup in qemu for the VOF case. >>>>>> >>>>>> That's what I begin to think because I've added everything to the device >>>>>> tree that I thought could be needed and I still don't get it working so it >>>>>> may need some config from the firmware. But how do I access device registers >>>>>> from board code? I've tried adding a machine reset method and write to >>>>>> memory mapped device registers but all my attempts failed. I've tried >>>>>> cpu_stl_le_data and even memory_region_dispatch_write but these did not get >>>>>> to the device. What's the way to access guest mmio regs from QEMU? >>>>> >>>>> That's odd, cpu_stl() and memory_region_dispatch_write() should work >>>>> from board code (after the relevant memory regions are configured, of >>>>> course). 
As an ISA serial port, it's probably accessed through IO >>>>> space, not memory space though, so you'd need &address_space_io. And >>>>> if there is some bridge configuration then it's the bridge control >>>>> registers you need to look at not the serial registers - you'd have to >>>>> look at the bridge documentation for that. Or, I guess the bridge >>>>> implementation in qemu, which you wrote part of. >>>> >>>> I've found at last that stl_le_phys() works. There are so many of these that >>>> I never know when to use which. >>>> >>>> I think the address_space_rw calls in vof_client_call() in vof.c could also >>>> use these for somewhat shorter code. I've ended up with >>>> stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset method but I don't >>>> even need that now as it works without additional setup. Also VOF's memory >>>> access is basically the same as the already existing rtas_st() and co. so >>>> maybe that could be reused to make code smaller? >>> >>> rtas_ld() and rtas_st() should only be used for reading/writing RTAS >>> parameters to and from memory. Accessing IO shouldn't be done with >>> those. >>> >>> For IO you probably want the cpu_st*() variants in most cases, since >>> you're trying to emulate an IO access from the virtual cpu. >> >> I think I've tried that but what worked to access mmio device registers are >> stl_le_phys and similar that are wrappers around address_space_stl_*. 
But I >> did not mean that for rtas_ld/_st but the part when vof accessing the >> parameters passed by its hypercall which is memory access: >> >> https://github.com/patchew-project/qemu/blob/patchew/20210520090557.435689-1-aik%40ozlabs.ru/hw/ppc/vof.c >> >> line 893, and vof_client_call before that is very similar to what h_rtas >> does here: >> >> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_hcall.c;h=f25014afda408002ee1ec1027a0dd7a6025eca61;hb=HEAD#l639 >> >> and I also need to do the same for rtas in pegasos2 for which I'm just using >> ldl_be_phys for now but I wonder if we really need 3 ways to do the same or >> the rtas_ld/_st could be made more generic and reused here? > > For your rtas implementation you could definitely re-use them. For > the client call I'm a bit less confident, but if the in-guest-memory > structures are really the same, then it would make sense. The memory structure seems very similar to me, the only difference is calling the first field service in VOF instead of token in RTAS. Both are just an array of big endian uint32_t with token, nargs, nret at the front followed by args and rets. Since these rtas_ld/st are defined in spapr.h I did not bother to split them off, so for pegasos2 rtas I'm just using the ldl_be_* functions directly, for which these are a shorthand. If these were split off for sharing between spapr rtas and VOF I may be able to reuse them as well but it's not that important so just mentioned it as a possible later clean up. Regards, BALATON Zoltan
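The shared layout described above (an array of big-endian 32-bit cells holding token or service, nargs, nret, then the arguments and the return slots) can be illustrated with a small standalone sketch. This is not the QEMU code: `CallArgs`, `CALL_MAX_CELLS`, `call_args_parse()` and `call_ret_store()` are hypothetical names made up for the example, and it works on a plain byte buffer rather than on guest memory through `ldl_be_phys()` or `rtas_ld()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap for this sketch; not a limit taken from QEMU. */
#define CALL_MAX_CELLS 16

/* Decoded header and arguments of one call buffer. */
typedef struct {
    uint32_t first;                 /* "token" in RTAS, "service" in VOF */
    uint32_t nargs;                 /* number of argument cells */
    uint32_t nret;                  /* number of return cells */
    uint32_t args[CALL_MAX_CELLS];  /* argument cells in host byte order */
} CallArgs;

/* Read one big-endian 32-bit cell regardless of host endianness. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write one big-endian 32-bit cell regardless of host endianness. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Parse the three header cells and the arguments; 0 on success, -1 on error. */
static int call_args_parse(const uint8_t *buf, size_t len, CallArgs *out)
{
    uint32_t i;

    assert(buf != NULL && out != NULL);
    if (len < 12) {
        return -1;                  /* header does not fit */
    }
    out->first = load_be32(buf);
    out->nargs = load_be32(buf + 4);
    out->nret = load_be32(buf + 8);
    if (out->nargs > CALL_MAX_CELLS || out->nret > CALL_MAX_CELLS) {
        return -1;                  /* counts exceed the sketch's cap */
    }
    if (len < 4 * (3 + (size_t)out->nargs + out->nret)) {
        return -1;                  /* args and rets do not fit */
    }
    for (i = 0; i < out->nargs; i++) {
        out->args[i] = load_be32(buf + 12 + 4 * (size_t)i);
    }
    return 0;
}

/* Store return value number idx into its slot after the arguments. */
static int call_ret_store(uint8_t *buf, size_t len, const CallArgs *ca,
                          uint32_t idx, uint32_t val)
{
    size_t off;

    if (idx >= ca->nret) {
        return -1;                  /* no such return slot */
    }
    off = 4 * (3 + (size_t)ca->nargs + idx);
    if (len < off + 4) {
        return -1;
    }
    store_be32(buf + off, val);
    return 0;
}
```

If the two in-guest structures really are the same, as suggested in the thread, one pair of helpers like this could in principle serve both the RTAS and the client-interface paths; whether that holds for the actual QEMU code is for the patch authors to confirm.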
On Sun, May 30, 2021 at 07:33:01PM +0200, BALATON Zoltan wrote: > Hello, > > Two more problems I've found while testing with pegasos2 but I'm not sure > how to fix them: > > On Thu, 20 May 2021, Alexey Kardashevskiy wrote: > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > new file mode 100644 > > index 000000000000..a283b7d251a7 > > --- /dev/null > > +++ b/hw/ppc/vof.c > > @@ -0,0 +1,1021 @@ > > +/* > > + * QEMU PowerPC Virtual Open Firmware. > > + * > > + * This implements client interface from OpenFirmware IEEE1275 on the QEMU > > + * side to leave only a very basic firmware in the VM. > > + * > > + * Copyright (c) 2021 IBM Corporation. > > + * > > + * SPDX-License-Identifier: GPL-2.0-or-later > > + */ > > + > > +#include "qemu/osdep.h" > > +#include "qemu-common.h" > > +#include "qemu/timer.h" > > +#include "qemu/range.h" > > +#include "qemu/units.h" > > +#include "qapi/error.h" > > +#include <sys/ioctl.h> > > +#include "exec/ram_addr.h" > > +#include "exec/address-spaces.h" > > +#include "hw/ppc/vof.h" > > +#include "hw/ppc/fdt.h" > > +#include "sysemu/runstate.h" > > +#include "qom/qom-qobject.h" > > +#include "trace.h" > > + > > +#include <libfdt.h> > > + > > +/* > > + * OF 1275 "nextprop" description suggests is it 32 bytes max but > > + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars long. 
> > + */ > > +#define OF_PROPNAME_LEN_MAX 64 > > + > > +#define VOF_MAX_PATH 256 > > +#define VOF_MAX_SETPROPLEN 2048 > > +#define VOF_MAX_METHODLEN 256 > > +#define VOF_MAX_FORTHCODE 256 > > +#define VOF_VTY_BUF_SIZE 256 > > + > > +typedef struct { > > + uint64_t start; > > + uint64_t size; > > +} OfClaimed; > > + > > +typedef struct { > > + char *path; /* the path used to open the instance */ > > + uint32_t phandle; > > +} OfInstance; > > + > > +#define VOF_MEM_READ(pa, buf, size) \ > > + address_space_read_full(&address_space_memory, \ > > + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) > > +#define VOF_MEM_WRITE(pa, buf, size) \ > > + address_space_write(&address_space_memory, \ > > + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) > > + > > +static int readstr(hwaddr pa, char *buf, int size) > > +{ > > + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { > > + return -1; > > + } > > + if (strnlen(buf, size) == size) { > > + buf[size - 1] = '\0'; > > + trace_vof_error_str_truncated(buf, size); > > + return -1; > > + } > > + return 0; > > +} > > + > > +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, > > + const char *s1, unsigned nargscheck, unsigned nretcheck) > > +{ > > + if (strcmp(s, s1)) { > > + return false; > > + } > > + if ((nargscheck && (nargs != nargscheck)) || > > + (nretcheck && (nret != nretcheck))) { > > + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); > > + return false; > > + } > > + > > + return true; > > +} > > + > > +static void prop_format(char *tval, int tlen, const void *prop, int len) > > +{ > > + int i; > > + const unsigned char *c; > > + char *t; > > + const char bin[] = "..."; > > + > > + for (i = 0, c = prop; i < len; ++i, ++c) { > > + if (*c == '\0' && i == len - 1) { > > + strncpy(tval, prop, tlen - 1); > > + return; > > + } > > + if (*c < 0x20 || *c >= 0x80) { > > + break; > > + } > > + } > > + > > + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { > > + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 
1) { > > + strcpy(t, bin); > > + return; > > + } > > + if (i && i % 4 == 0 && i != len - 1) { > > + strcat(t, " "); > > + ++t; > > + } > > + t += sprintf(t, "%02X", *c & 0xFF); > > + } > > +} > > + > > +static int get_path(const void *fdt, int offset, char *buf, int len) > > +{ > > + int ret; > > + > > + ret = fdt_get_path(fdt, offset, buf, len - 1); > > + if (ret < 0) { > > + return ret; > > + } > > + > > + buf[len - 1] = '\0'; > > + > > + return strlen(buf) + 1; > > +} > > + > > +static int phandle_to_path(const void *fdt, uint32_t ph, char *buf, int len) > > +{ > > + int ret; > > + > > + ret = fdt_node_offset_by_phandle(fdt, ph); > > + if (ret < 0) { > > + return ret; > > + } > > + > > + return get_path(fdt, ret, buf, len); > > +} > > + > > +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) > > +{ > > + char fullnode[VOF_MAX_PATH]; > > + uint32_t ret = -1; > > + int offset; > > + > > + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { > > + return (uint32_t) ret; > > + } > > + > > + offset = fdt_path_offset(fdt, fullnode); > > + if (offset >= 0) { > > + ret = fdt_get_phandle(fdt, offset); > > + } > > + trace_vof_finddevice(fullnode, ret); > > + return (uint32_t) ret; > > +} > > The Linux init function that runs on pegasos2 here: > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 > > calls finddevice once with isa@c and next with isa@C (small and capital C) > both of which works with the board firmware but with vof the comparison is > case sensitive and one of these fails so I can't make it work. I don't know > if this is a problem in libfdt or the vof_finddevice above should do > something else to get case insensitive comparison. This is kind of a subtle incompatibility between the traditional OF world and the flat tree world. In traditional OF, the unit address (bit after the @) doesn't exist as a string. 
Instead when you do the finddevice it will parse that address and compare it against the 'reg' properties for each of the relevant nodes. Since that's an integer comparison, case doesn't enter into it. But, how to parse (and write) addresses depends on the bus, so the firmware has to understand each bus type and act accordingly. That doesn't really work in the world of minimal firmwares for the flat tree. So instead, we just incorporate a pre-formatted unit address in the flat tree directly. Most of the time that works fine, but there are some edge cases like the one you've hit. > > +static const void *getprop(const void *fdt, int nodeoff, const char *propname, > > + int *proplen, bool *write0) > > +{ > > + const char *unit, *prop; > > + > > + /* > > + * The "name" property is not actually stored as a property in the FDT, > > + * we emulate it by returning a pointer to the node's name and adjust > > + * proplen to include only the name but not the unit. > > + */ > > + if (strcmp(propname, "name") == 0) { > > + prop = fdt_get_name(fdt, nodeoff, proplen); > > + if (!prop) { > > + *proplen = 0; > > + return NULL; > > + } > > + > > + unit = memchr(prop, '@', *proplen); > > + if (unit) { > > + *proplen = unit - prop; > > + } > > + *proplen += 1; > > + > > + /* > > + * Since it might be cut at "@" and there will be no trailing zero > > + * in the prop buffer, tell the caller to write zero at the end. 
> > + */ > > + if (write0) { > > + *write0 = true; > > + } > > + return prop; > > + } > > + > > + if (write0) { > > + *write0 = false; > > + } > > + return fdt_getprop(fdt, nodeoff, propname, proplen); > > +} > > MorphOS checks the name property of the root node ("/") to decide what > platform it runs on so we may need to be able to set this property on / > where it should return "bplan,Pegasos2", therefore the above maybe should do > getprop first and only generate name property if it's not set (or at least > check if we're on the root node and allow setting name property there). (On > Macs the root node is named "device-tree" and this was before found to be > needed for MorphOS.) Ah. Hrm. Have to think about what to do about that. > Other than the above two problems, I've found that getting the device tree > from vof returns it in reverse order compared to the board firmware if I add > it the expected order. This may or may not be a problem but to avoid it I > can build the tree in reverse order then it comes out right so unless > there's an easy fix this should not cause a problem but may worth a comment > somewhere. The order of things in the device tree *should* never matter. If it does, that's definitely a client bug... but of course that doesn't necessarily mean we won't have to work around it in practice.
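One way to approximate the traditional behaviour for the isa@c versus isa@C case, without teaching the firmware about each bus, is to compare the unit address (the part after the '@') case-insensitively while keeping the node name exact. The sketch below is only an illustration of that idea: `node_name_eq()` is a hypothetical helper, it compares single path components rather than full paths, it is not how libfdt's `fdt_path_offset()` matches names, and it treats a name with a unit address as different from one without, which is an assumption of the sketch rather than Open Firmware behaviour.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare two node names of the form "name" or "name@unit".
 * The name part must match exactly; the unit address is compared
 * without regard to case, so "isa@c" and "isa@C" are equal.
 */
static bool node_name_eq(const char *a, const char *b)
{
    const char *ua = strchr(a, '@');
    const char *ub = strchr(b, '@');
    size_t na = ua ? (size_t)(ua - a) : strlen(a);
    size_t nb = ub ? (size_t)(ub - b) : strlen(b);

    /* Name part before '@': exact, case-sensitive comparison. */
    if (na != nb || memcmp(a, b, na) != 0) {
        return false;
    }

    /* Equal only if both lack a unit address or both have one. */
    if (!ua || !ub) {
        return ua == ub;
    }

    /* Unit address after '@': case-insensitive comparison. */
    for (ua++, ub++; *ua && *ub; ua++, ub++) {
        if (tolower((unsigned char)*ua) != tolower((unsigned char)*ub)) {
            return false;
        }
    }
    return *ua == *ub;
}
```

This does not normalise differently formatted addresses (for example leading zeros), which a real integer comparison against 'reg' would handle; it only covers the case difference seen with Linux on pegasos2.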
On Tue, Jun 01, 2021 at 04:12:44PM +0200, BALATON Zoltan wrote: > On Tue, 1 Jun 2021, Alexey Kardashevskiy wrote: > > On 31/05/2021 23:07, BALATON Zoltan wrote: > > > On Sun, 30 May 2021, BALATON Zoltan wrote: > > > > On Thu, 20 May 2021, Alexey Kardashevskiy wrote: > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > new file mode 100644 > > > > > index 000000000000..a283b7d251a7 > > > > > --- /dev/null > > > > > +++ b/hw/ppc/vof.c > > > > > @@ -0,0 +1,1021 @@ > > > > > +/* > > > > > + * QEMU PowerPC Virtual Open Firmware. > > > > > + * > > > > > + * This implements client interface from OpenFirmware > > > > > IEEE1275 on the QEMU > > > > > + * side to leave only a very basic firmware in the VM. > > > > > + * > > > > > + * Copyright (c) 2021 IBM Corporation. > > > > > + * > > > > > + * SPDX-License-Identifier: GPL-2.0-or-later > > > > > + */ > > > > > + > > > > > +#include "qemu/osdep.h" > > > > > +#include "qemu-common.h" > > > > > +#include "qemu/timer.h" > > > > > +#include "qemu/range.h" > > > > > +#include "qemu/units.h" > > > > > +#include "qapi/error.h" > > > > > +#include <sys/ioctl.h> > > > > > +#include "exec/ram_addr.h" > > > > > +#include "exec/address-spaces.h" > > > > > +#include "hw/ppc/vof.h" > > > > > +#include "hw/ppc/fdt.h" > > > > > +#include "sysemu/runstate.h" > > > > > +#include "qom/qom-qobject.h" > > > > > +#include "trace.h" > > > > > + > > > > > +#include <libfdt.h> > > > > > + > > > > > +/* > > > > > + * OF 1275 "nextprop" description suggests is it 32 bytes max but > > > > > + * LoPAPR defines "ibm,query-interrupt-source-number" which > > > > > is 33 chars long. 
> > > > > + */ > > > > > +#define OF_PROPNAME_LEN_MAX 64 > > > > > + > > > > > +#define VOF_MAX_PATH 256 > > > > > +#define VOF_MAX_SETPROPLEN 2048 > > > > > +#define VOF_MAX_METHODLEN 256 > > > > > +#define VOF_MAX_FORTHCODE 256 > > > > > +#define VOF_VTY_BUF_SIZE 256 > > > > > + > > > > > +typedef struct { > > > > > + uint64_t start; > > > > > + uint64_t size; > > > > > +} OfClaimed; > > > > > + > > > > > +typedef struct { > > > > > + char *path; /* the path used to open the instance */ > > > > > + uint32_t phandle; > > > > > +} OfInstance; > > > > > + > > > > > +#define VOF_MEM_READ(pa, buf, size) \ > > > > > + address_space_read_full(&address_space_memory, \ > > > > > + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) > > > > > +#define VOF_MEM_WRITE(pa, buf, size) \ > > > > > + address_space_write(&address_space_memory, \ > > > > > + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) > > > > > + > > > > > +static int readstr(hwaddr pa, char *buf, int size) > > > > > +{ > > > > > + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { > > > > > + return -1; > > > > > + } > > > > > + if (strnlen(buf, size) == size) { > > > > > + buf[size - 1] = '\0'; > > > > > + trace_vof_error_str_truncated(buf, size); > > > > > + return -1; > > > > > + } > > > > > + return 0; > > > > > +} > > > > > + > > > > > +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, > > > > > + const char *s1, unsigned nargscheck, > > > > > unsigned nretcheck) > > > > > +{ > > > > > + if (strcmp(s, s1)) { > > > > > + return false; > > > > > + } > > > > > + if ((nargscheck && (nargs != nargscheck)) || > > > > > + (nretcheck && (nret != nretcheck))) { > > > > > + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); > > > > > + return false; > > > > > + } > > > > > + > > > > > + return true; > > > > > +} > > > > > + > > > > > +static void prop_format(char *tval, int tlen, const void *prop, int len) > > > > > +{ > > > > > + int i; > > > > > + const unsigned char *c; > > > > > + char *t; > > > 
> > + const char bin[] = "..."; > > > > > + > > > > > + for (i = 0, c = prop; i < len; ++i, ++c) { > > > > > + if (*c == '\0' && i == len - 1) { > > > > > + strncpy(tval, prop, tlen - 1); > > > > > + return; > > > > > + } > > > > > + if (*c < 0x20 || *c >= 0x80) { > > > > > + break; > > > > > + } > > > > > + } > > > > > + > > > > > + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { > > > > > + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) { > > > > > + strcpy(t, bin); > > > > > + return; > > > > > + } > > > > > + if (i && i % 4 == 0 && i != len - 1) { > > > > > + strcat(t, " "); > > > > > + ++t; > > > > > + } > > > > > + t += sprintf(t, "%02X", *c & 0xFF); > > > > > + } > > > > > +} > > > > > + > > > > > +static int get_path(const void *fdt, int offset, char *buf, int len) > > > > > +{ > > > > > + int ret; > > > > > + > > > > > + ret = fdt_get_path(fdt, offset, buf, len - 1); > > > > > + if (ret < 0) { > > > > > + return ret; > > > > > + } > > > > > + > > > > > + buf[len - 1] = '\0'; > > > > > + > > > > > + return strlen(buf) + 1; > > > > > +} > > > > > + > > > > > +static int phandle_to_path(const void *fdt, uint32_t ph, > > > > > char *buf, int len) > > > > > +{ > > > > > + int ret; > > > > > + > > > > > + ret = fdt_node_offset_by_phandle(fdt, ph); > > > > > + if (ret < 0) { > > > > > + return ret; > > > > > + } > > > > > + > > > > > + return get_path(fdt, ret, buf, len); > > > > > +} > > > > > + > > > > > +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) > > > > > +{ > > > > > + char fullnode[VOF_MAX_PATH]; > > > > > + uint32_t ret = -1; > > > > > + int offset; > > > > > + > > > > > + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { > > > > > + return (uint32_t) ret; > > > > > + } > > > > > + > > > > > + offset = fdt_path_offset(fdt, fullnode); > > > > > + if (offset >= 0) { > > > > > + ret = fdt_get_phandle(fdt, offset); > > > > > + } > > > > > + trace_vof_finddevice(fullnode, ret); > > > > > + return (uint32_t) ret; > > > > > +} > > 
> > > > > > The Linux init function that runs on pegasos2 here: > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 > > > > > > > > calls finddevice once with isa@c and next with isa@C (small and > > > > capital C) both of which works with the board firmware but with > > > > vof the comparison is case sensitive and one of these fails so I > > > > can't make it work. I don't know if this is a problem in libfdt > > > > or the vof_finddevice above should do something else to get case > > > > insensitive comparison. > > > > > > This fixes the issue with Linux but I'm not sure if there's any > > > better solution or would it break anything else. > > > > The bit after "@" is an address and needs to be case insensitive and > > I'll fix this indeed. I'm not so sure about the part before "@", I > > cannot imagine what could break if I made search insensitive to case. Hm > > :-/ > > Fixing the match in the address part is probably enough as the name sent by > guests is probably always lower case I'm confused, I thought you just said that it looked for both isa@c and isa@C, which seems to contradict guests always using lower case. > but the address could be formatted > differently and that's what caused the problem. The patch below was only a > quick fix to be able to test it further but your fix should work too. > > With this and the ld replaced in entry.S I can now boot Linux which is > enough to submit the pegasos2 vof patch after an updated patch from you > fixes these in vof. > > MorphOS still misses something but I'm not sure what as it uses the data > gathered from the device tree later without printing diagnostics and fails > due to a NULL dereference much after that so it seems to assume some value > should exist but I'm not sure what value it needs and where that should come > from. 
Maybe I'll try some more to find out just to make it simpler to boot > but since it boots with the board firmware it's enough if Linux works with > vof for now.
On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote: > On Wed, 2 Jun 2021, David Gibson wrote: > > On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: > > > On Thu, 27 May 2021, David Gibson wrote: > > > > On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: > > > > > On Tue, 25 May 2021, David Gibson wrote: > > > > > > On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: > > > > > > > On Mon, 24 May 2021, David Gibson wrote: > > > > > > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > > > > > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > > > > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > > > > > > > > > One thing to note about PCI is that normally I think the client > > > > > > > > > > > expects the firmware to do PCI probing and SLOF does it. But VOF > > > > > > > > > > > does not and Linux scans PCI bus(es) itself. Might be a problem for > > > > > > > > > > > you kernel. > > > > > > > > > > > > > > > > > > > > I'm not sure what info does MorphOS get from the device tree and what it > > > > > > > > > > probes itself but I think it may at least need device ids and info about > > > > > > > > > > the PCI bus to be able to access the config regs, after that it should > > > > > > > > > > set the devices up hopefully. I could add these from the board code to > > > > > > > > > > device tree so VOF does not need to do anything about it. However I'm > > > > > > > > > > not getting to that point yet because it crashes on something that it's > > > > > > > > > > missing and couldn't yet find out what is that. > > > > > > > > > > > > > > > > > > > > I'd like to get Linux working now as that would be enough to test this > > > > > > > > > > and then if for MorphOS we still need a ROM it's not a problem if at > > > > > > > > > > least we can boot Linux without the original firmware. 
But I can't make > > > > > > > > > > Linux open a serial console and I don't know what it needs for that. Do > > > > > > > > > > you happen to know? I've looked at the sources in Linux/arch/powerpc but > > > > > > > > > > not sure how it would find and open a serial port on pegasos2. It seems > > > > > > > > > > to work with the board firmware and now I can get it to boot with VOF > > > > > > > > > > but then it does not open serial so it probably needs something in the > > > > > > > > > > device tree or expects the firmware to set something up that we should > > > > > > > > > > add in pegasos2.c when using VOF. > > > > > > > > > > > > > > > > > > I've now found that Linux uses rtas methods read-pci-config and > > > > > > > > > write-pci-config for PCI access on pegasos2 so this means that we'll > > > > > > > > > probably need rtas too (I hoped we could get away without it if it were only > > > > > > > > > used for shutdown/reboot or so but seems Linux needs it for PCI as well and > > > > > > > > > does not scan the bus and won't find some devices without it). > > > > > > > > > > > > > > > > Yes, definitely sounds like you'll need an RTAS implementation. > > > > > > > > > > > > > > I plan to fix that after managed to get serial working as that seems to not > > > > > > > need it. If I delete the rtas-size property from /rtas on the original > > > > > > > firmware that makes Linux skip instantiating rtas, but I still get serial > > > > > > > output just not accessing PCI devices. So I think it should work and keeps > > > > > > > things simpler at first. Then I'll try rtas later. > > > > > > > > > > > > > > > > While VOF can do rtas, this causes a problem with the hypercall method using > > > > > > > > > sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so > > > > > > > > > cannot work after guest is past quiesce. > > > > > > > > > > > > > > > > > So the question is why is that > > > > > > > > > assert there > > > > > > > > > > > > > > > > Ah.. 
right. So, vhyp was designed for the PAPR use case, where we > > > > > > > > want to model the CPU when it's in supervisor and user mode, but not > > > > > > > > when it's in hypervisor mode. We want qemu to mimic the behaviour of > > > > > > > > the hypervisor, rather than attempting to actually execute hypervisor > > > > > > > > code in the virtual CPU. > > > > > > > > > > > > > > > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > > > > > > > > so it makes no sense for the guest to attempt to set it. That should > > > > > > > > be caught by the general SPR code and turned into a 0x700, hence the > > > > > > > > assert() if we somehow reach ppc_store_sdr1(). > > > > > > > > > > > > > > > > So, we are seeing a problem here because you want the 'sc 1' > > > > > > > > interception of vhyp, but not the rest of the stuff that goes with it. > > > > > > > > > > > > > > > > > and would using sc 1 for hypercalls on pegasos2 cause other > > > > > > > > > problems later even if the assert could be removed? > > > > > > > > > > > > > > > > At least in the short term, I think you probably can remove the > > > > > > > > assert. In your case the 'sc 1' calls aren't truly to a hypervisor, > > > > > > > > but a special case escape to qemu for the firmware emulation. I think > > > > > > > > it's unlikely to cause problems later, because nothing on a 32-bit > > > > > > > > system should be attempting an 'sc 1'. The only thing I can think of > > > > > > > > that would fail is some test case which explicitly verified that 'sc > > > > > > > > 1' triggered a 0x700 (SIGILL from userspace). > > > > > > > > > > > > > > OK so the assert should check if the CPU has an HV bit. I think there was a > > > > > > > #detine for that somewhere that I can add to the assert then I can try that. 
> > > > > > > What I wasn't sure about is that sc 1 would conflict with the guest's usage > > > > > > > of normal sc calls or are these going through different paths and only sc 1 > > > > > > > will trigger vhyp callback not affecting notmal sc calls? > > > > > > > > > > > > The vhyp shouldn't affect normal system calls, 'sc 1' is specifically > > > > > > for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the > > > > > > vhyp only intercepts the hypercall version (after all Linux on PAPR > > > > > > certainly uses its own system calls, and hypercalls are active for the > > > > > > lifetime of the guest there). > > > > > > > > > > > > > (Or if this causes > > > > > > > an otherwise unnecessary VM exit on KVM even when it works then maybe > > > > > > > looking for a different way in the future might be needed. > > > > > > > > > > > > What you're doing here won't work with KVM as it stands. There are > > > > > > basically two paths into the vhyp hypercall path: 1) from TCG, if we > > > > > > interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get > > > > > > a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. > > > > > > > > > > > > The second path is specific to the PAPR (ppc64) implementation of KVM, > > > > > > and will not work for a non-PAPR platform without substantial > > > > > > modification of the KVM code. > > > > > > > > > > OK so then at that point when we try KVM we'll need to look at alternative > > > > > ways, I think MOL OSI worked with KVM at least in MOL but will probably make > > > > > all syscalls exit KVM but since we'll probably need to use KVM PR it will > > > > > exit anyway. For now I keep this vhyp as it does not run with KVM for other > > > > > reasons yet so that's another area to clean up so as a proof of concept > > > > > first version of using VOF vhyp will do. 
> > > > > > > > Eh, since you'll need to modify KVM anyway, it probably makes just as > > > > much sense to modify it to catch the 'sc 1' as MoL's magic thingy. > > > > > > I'm not sure how KVM works for this case so I also don't know why and what > > > would need to be modified. I think we'll only have KVM PR working as newer > > > POWER CPUs having HV (besides being rare among potential users) are probably > > > too different to run the OSes that expect at most a G4 on pegasos2 so likely > > > it won't work with KVM HV. > > > > Oh, it definitely won't work with KVM HV. > > > > > If we have KVM PR doesn't sc already trap so we > > > could add MOL OSI without further modification to KVM itself only needing > > > change in QEMU? > > > > Uh... I guess so? > > > > > I also hope that MOL OSI could be useful for porting some > > > paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't > > > know about that for sure so I'm open to any other solution too. > > > > Maybe. I never know much about MOL to begin with, and anything I did > > know was a decade or more ago so I've probably forgotten. > > That may still be more than what I know about it since I never had any > knowledge about PPC KVM and don't have any PPC hardware to test with so I'm > mostly guessing. (I could test with KVM emulated in QEMU and I did set up an > environment for that but that's a bit slow and inconvenient so I'd leave KVM > support to those interested and have more knowledge and hardware for it.) Sounds like a problem for someone else another time, then. > > > For now I'm > > > going with vhyp which is enough fot testing with TCG and if somebody wants > > > KVM they could use he original firmware for now so this could be improved in > > > a later version unless a simple solution is found before the freeze for 6.1. > > > If we're in KVM PR what happens for sc 1 could that be used too so maybe > > > what we have now could work? 
> > > > Note that if you do go down the MOL path it wouldn't be that complex > > to make a "vMOL" interface so you can use the same mechanism for KVM > > and TCG. > > Not sure what you mean by VMOL. Is it modifying MOL to use sc 1 like VOF > instead of its OSI way for hypercalls? No, I mean on the qemu side adding an optional hook which will intercept sc 0 instructions with the MOL magic register values and redirect them to a machine registered callback, rather than emulating the CPU's behaviour of jumping to the system call vector in guest space. Basically an equivalent of vhyp, but for MOL magic syscalls, instead of hypercalls. > That would lose the advantage of > being able to reuse MOL guest drivers without modification (which might be > useful for running OS X guest on Mac emulation) so if we can't use vhyp then > maybe using OSI would be the next choice for this reason but for now vhyp > seems to be working for what I could test so unless somebody here sees a > problem with it and has a better idea I'm going with vhyp for now just > because that's what VOF uses and I don't want to modify VOF to reuse it as > it is so I don't need to maintain a separate version and also get any > enhancements without further need to sync with spapr VOF. > > I've found this document about possible hypercall interfaces on KVM (see > Hypercall ABIs at the end): > > https://www.kernel.org/doc/html/latest/virt/kvm/ppc-pv.html > > Having both ePAPR (1.) and PAPR (2.) hypercalls is a bit confusing. Does > vhyp correspond to 2. PAPR? Yes. > The ePAPR (1.) seems to be preferred by KVM and > MOL OSI supported for compatibility. That document looks pretty out of date. Most of it is only discussing KVM PR, which is now barely maintained. KVM HV only works with PAPR hypercalls. > So if we need something else instead of > 2. PAPR hypercalls there seems to be two options: ePAPR and MOL OSI which > should work with KVM but then I'm not sure how to handle those on TCG. > > > > > > [...] 
> > > > > > > > > I've tested that the missing rtas is not the reason for getting no output > > > > > > > > > via serial though, as even when disabling rtas on pegasos2.rom it boots and > > > > > > > > > I still get serial output just some PCI devices are not detected (such as > > > > > > > > > USB, the video card and the not emulated ethernet port but these are not > > > > > > > > > fatal so it might even work as a first try without rtas, just to boot a > > > > > > > > > Linux kernel for testing it would be enough if I can fix the serial output). > > > > > > > > > I still don't know why it's not finding serial but I think it may be some > > > > > > > > > missing or wrong info in the device tree I generat. I'll try to focus on > > > > > > > > > this for now and leave the above rtas question for later. > > > > > > > > > > > > > > > > Oh.. another thought on that. You have an ISA serial port on Pegasos, > > > > > > > > I believe. I wonder if the PCI->ISA bridge needs some configuration / > > > > > > > > initialization that the firmware is expected to do. If so you'll need > > > > > > > > to mimic that setup in qemu for the VOF case. > > > > > > > > > > > > > > That's what I begin to think because I've added everything to the device > > > > > > > tree that I thought could be needed and I still don't get it working so it > > > > > > > may need some config from the firmware. But how do I access device registers > > > > > > > from board code? I've tried adding a machine reset method and write to > > > > > > > memory mapped device registers but all my attempts failed. I've tried > > > > > > > cpu_stl_le_data and even memory_region_dispatch_write but these did not get > > > > > > > to the device. What's the way to access guest mmio regs from QEMU? > > > > > > > > > > > > That's odd, cpu_stl() and memory_region_dispatch_write() should work > > > > > > from board code (after the relevant memory regions are configured, of > > > > > > course). 
As an ISA serial port, it's probably accessed through IO > > > > > > space, not memory space though, so you'd need &address_space_io. And > > > > > > if there is some bridge configuration then it's the bridge control > > > > > > registers you need to look at not the serial registers - you'd have to > > > > > > look at the bridge documentation for that. Or, I guess the bridge > > > > > > implementation in qemu, which you wrote part of. > > > > > > > > > > I've found at last that stl_le_phys() works. There are so many of these that > > > > > I never know when to use which. > > > > > > > > > > I think the address_space_rw calls in vof_client_call() in vof.c could also > > > > > use these for somewhat shorter code. I've ended up with > > > > > stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset methodbut I don't > > > > > even need that now as it works without additional setup. Also VOF's memory > > > > > access is basically the same as the already existing rtas_st() and co. so > > > > > maybe that could be reused to make code smaller? > > > > > > > > rtas_ld() and rtas_st() should only be used for reading/writing RTAS > > > > parameters to and from memory. Accessing IO shouldn't be done with > > > > those. > > > > > > > > For IO you probably want the cpu_st*() variants in most cases, since > > > > you're trying to emulate an IO access from the virtual cpu. > > > > > > I think I've tried that but what worked to access mmio device registers are > > > stl_le_phys and similar that are wrappers around address_space_stl_*. 
But I > > > did not mean that for rtas_ld/_st but the part when vof accessing the > > > parameters passed by its hypercall which is memory access: > > > > > > https://github.com/patchew-project/qemu/blob/patchew/20210520090557.435689-1-aik%40ozlabs.ru/hw/ppc/vof.c > > > > > > line 893, and vof_client_call before that is very similar to what h_rtas > > > does here: > > > > > > https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_hcall.c;h=f25014afda408002ee1ec1027a0dd7a6025eca61;hb=HEAD#l639 > > > > > > and I also need to do the same for rtas in pegasos2 for which I'm just using > > > ldl_be_phys for now but I wonder if we really need 3 ways to do the same or > > > the rtas_ld/_st could be made more generic and reused here? > > > > For your rtas implementation you could definitely re-use them. For > > the client call I'm a bit less confident, but if the in-guest-memory > > structures are really the same, then it would make sense. > > The memory structure seems very similar to me, the only difference is > calling the first field service in VOF instead of token in RTAS. Both are > just an array of big endian unit32_t with token, nargs, nret at the front > followed by args and rets. Since these rtas_ld/st are defined in spapr.h I > did not bother to split them off, so for pegasos2 rtas I'm just using the > ldl_be_* functions directly for which these are a shorthand for. If these > were split off for sharing between spapr rtas and VOF I may be able to reuse > them as well but it's not that important so just mentioned it as a possible > later clean up. Ok, sounds reasonable to re-use them then, though maybe add an aliased name for clarity ofci_{ld,st}(), maybe? (for "Open Firmware Client Interface")
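For illustration, the shared argument layout described above (an array of big-endian 32-bit cells: service or token, nargs, nret, then the arguments, then the return slots) can be sketched in plain C. This is a minimal standalone sketch, not the QEMU code: it works on a host byte buffer rather than guest physical memory, and the names ofci_ld(), ofci_st(), ofci_arg() and ofci_set_ret() are hypothetical, following the aliased naming suggested in the reply above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers: access the n-th big-endian 32-bit cell of an
 * argument buffer. The real rtas_ld()/rtas_st() work on guest physical
 * addresses; here the buffer is ordinary host memory.
 */
static uint32_t ofci_ld(const uint8_t *buf, size_t n)
{
    const uint8_t *p = buf + 4 * n;

    assert(buf != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void ofci_st(uint8_t *buf, size_t n, uint32_t val)
{
    uint8_t *p = buf + 4 * n;

    assert(buf != NULL);
    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)val;
}

/*
 * Assumed layout: cell 0 = service (VOF) or token (RTAS),
 * cell 1 = nargs, cell 2 = nret, then nargs arguments, then nret returns.
 */
static uint32_t ofci_arg(const uint8_t *buf, uint32_t i)
{
    return ofci_ld(buf, 3 + i);
}

static void ofci_set_ret(uint8_t *buf, uint32_t i, uint32_t val)
{
    uint32_t nargs = ofci_ld(buf, 1);

    ofci_st(buf, 3 + nargs + i, val);
}
```

Because only the meaning of cell 0 differs between the two callers, a single pair of accessors like this could in principle serve both, which is the clean-up being suggested.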
On Fri, 4 Jun 2021, David Gibson wrote: > On Tue, Jun 01, 2021 at 04:12:44PM +0200, BALATON Zoltan wrote: >> On Tue, 1 Jun 2021, Alexey Kardashevskiy wrote: >>> On 31/05/2021 23:07, BALATON Zoltan wrote: >>>> On Sun, 30 May 2021, BALATON Zoltan wrote: >>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>>>>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c >>>>>> new file mode 100644 >>>>>> index 000000000000..a283b7d251a7 >>>>>> --- /dev/null >>>>>> +++ b/hw/ppc/vof.c >>>>>> @@ -0,0 +1,1021 @@ >>>>>> +/* >>>>>> + * QEMU PowerPC Virtual Open Firmware. >>>>>> + * >>>>>> + * This implements client interface from OpenFirmware >>>>>> IEEE1275 on the QEMU >>>>>> + * side to leave only a very basic firmware in the VM. >>>>>> + * >>>>>> + * Copyright (c) 2021 IBM Corporation. >>>>>> + * >>>>>> + * SPDX-License-Identifier: GPL-2.0-or-later >>>>>> + */ >>>>>> + >>>>>> +#include "qemu/osdep.h" >>>>>> +#include "qemu-common.h" >>>>>> +#include "qemu/timer.h" >>>>>> +#include "qemu/range.h" >>>>>> +#include "qemu/units.h" >>>>>> +#include "qapi/error.h" >>>>>> +#include <sys/ioctl.h> >>>>>> +#include "exec/ram_addr.h" >>>>>> +#include "exec/address-spaces.h" >>>>>> +#include "hw/ppc/vof.h" >>>>>> +#include "hw/ppc/fdt.h" >>>>>> +#include "sysemu/runstate.h" >>>>>> +#include "qom/qom-qobject.h" >>>>>> +#include "trace.h" >>>>>> + >>>>>> +#include <libfdt.h> >>>>>> + >>>>>> +/* >>>>>> + * OF 1275 "nextprop" description suggests is it 32 bytes max but >>>>>> + * LoPAPR defines "ibm,query-interrupt-source-number" which >>>>>> is 33 chars long. 
>>>>>> + */ >>>>>> +#define OF_PROPNAME_LEN_MAX 64 >>>>>> + >>>>>> +#define VOF_MAX_PATH 256 >>>>>> +#define VOF_MAX_SETPROPLEN 2048 >>>>>> +#define VOF_MAX_METHODLEN 256 >>>>>> +#define VOF_MAX_FORTHCODE 256 >>>>>> +#define VOF_VTY_BUF_SIZE 256 >>>>>> + >>>>>> +typedef struct { >>>>>> + uint64_t start; >>>>>> + uint64_t size; >>>>>> +} OfClaimed; >>>>>> + >>>>>> +typedef struct { >>>>>> + char *path; /* the path used to open the instance */ >>>>>> + uint32_t phandle; >>>>>> +} OfInstance; >>>>>> + >>>>>> +#define VOF_MEM_READ(pa, buf, size) \ >>>>>> + address_space_read_full(&address_space_memory, \ >>>>>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>>>>> +#define VOF_MEM_WRITE(pa, buf, size) \ >>>>>> + address_space_write(&address_space_memory, \ >>>>>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>>>>> + >>>>>> +static int readstr(hwaddr pa, char *buf, int size) >>>>>> +{ >>>>>> + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { >>>>>> + return -1; >>>>>> + } >>>>>> + if (strnlen(buf, size) == size) { >>>>>> + buf[size - 1] = '\0'; >>>>>> + trace_vof_error_str_truncated(buf, size); >>>>>> + return -1; >>>>>> + } >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, >>>>>> + const char *s1, unsigned nargscheck, >>>>>> unsigned nretcheck) >>>>>> +{ >>>>>> + if (strcmp(s, s1)) { >>>>>> + return false; >>>>>> + } >>>>>> + if ((nargscheck && (nargs != nargscheck)) || >>>>>> + (nretcheck && (nret != nretcheck))) { >>>>>> + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); >>>>>> + return false; >>>>>> + } >>>>>> + >>>>>> + return true; >>>>>> +} >>>>>> + >>>>>> +static void prop_format(char *tval, int tlen, const void *prop, int len) >>>>>> +{ >>>>>> + int i; >>>>>> + const unsigned char *c; >>>>>> + char *t; >>>>>> + const char bin[] = "..."; >>>>>> + >>>>>> + for (i = 0, c = prop; i < len; ++i, ++c) { >>>>>> + if (*c == '\0' && i == len - 1) { >>>>>> + strncpy(tval, prop, tlen - 1); >>>>>> 
+ return; >>>>>> + } >>>>>> + if (*c < 0x20 || *c >= 0x80) { >>>>>> + break; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { >>>>>> + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) { >>>>>> + strcpy(t, bin); >>>>>> + return; >>>>>> + } >>>>>> + if (i && i % 4 == 0 && i != len - 1) { >>>>>> + strcat(t, " "); >>>>>> + ++t; >>>>>> + } >>>>>> + t += sprintf(t, "%02X", *c & 0xFF); >>>>>> + } >>>>>> +} >>>>>> + >>>>>> +static int get_path(const void *fdt, int offset, char *buf, int len) >>>>>> +{ >>>>>> + int ret; >>>>>> + >>>>>> + ret = fdt_get_path(fdt, offset, buf, len - 1); >>>>>> + if (ret < 0) { >>>>>> + return ret; >>>>>> + } >>>>>> + >>>>>> + buf[len - 1] = '\0'; >>>>>> + >>>>>> + return strlen(buf) + 1; >>>>>> +} >>>>>> + >>>>>> +static int phandle_to_path(const void *fdt, uint32_t ph, >>>>>> char *buf, int len) >>>>>> +{ >>>>>> + int ret; >>>>>> + >>>>>> + ret = fdt_node_offset_by_phandle(fdt, ph); >>>>>> + if (ret < 0) { >>>>>> + return ret; >>>>>> + } >>>>>> + >>>>>> + return get_path(fdt, ret, buf, len); >>>>>> +} >>>>>> + >>>>>> +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) >>>>>> +{ >>>>>> + char fullnode[VOF_MAX_PATH]; >>>>>> + uint32_t ret = -1; >>>>>> + int offset; >>>>>> + >>>>>> + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { >>>>>> + return (uint32_t) ret; >>>>>> + } >>>>>> + >>>>>> + offset = fdt_path_offset(fdt, fullnode); >>>>>> + if (offset >= 0) { >>>>>> + ret = fdt_get_phandle(fdt, offset); >>>>>> + } >>>>>> + trace_vof_finddevice(fullnode, ret); >>>>>> + return (uint32_t) ret; >>>>>> +} >>>>> >>>>> The Linux init function that runs on pegasos2 here: >>>>> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 >>>>> >>>>> calls finddevice once with isa@c and next with isa@C (small and >>>>> capital C) both of which works with the board firmware but with >>>>> vof the comparison is case sensitive and 
one of these fails so I >>>>> can't make it work. I don't know if this is a problem in libfdt >>>>> or the vof_finddevice above should do something else to get case >>>>> insensitive comparison. >>>> >>>> This fixes the issue with Linux but I'm not sure if there's any >>>> better solution or would it break anything else. >>> >>> The bit after "@" is an address and needs to be case insensitive and >>> I'll fix this indeed. I'm not so sure about the part before "@", I >>> cannot imagine what could break if I made search insensitive to case. Hm >>> :-/ >> >> Fixing the match in the address part is probably enough as the name sent by >> guests is probably always lower case > > I'm confused, I thought you just said that it looked for both isa@c > and isa@C, which seems to contradict guests always using lower case. I mean the part before the @ sign (that is the name part, "isa" above) is always lower case. I haven't seen guests trying to query that with other than lower case but the part after @ can be different even in the same guest code just a few lines apart as in the Linux kernel. So fixing the comparison to e.g. do toupper in the address part after @ should work I think even if we continue to do case sensitive comparison in the name part. Alexey said he'll fix that so there's no problem. Regards, BALATON Zoltan
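The comparison rule discussed above (name part case sensitive, unit address after "@" case insensitive) can be sketched as a small standalone C function. This is only an illustration under stated assumptions: of_path_eq() is a hypothetical name, it is not part of libfdt or VOF, and how such a comparison would be wired into the finddevice lookup is left open.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper: compare two device tree paths. Characters of a
 * node name are compared exactly; characters of a unit address (from
 * '@' up to the next '/') are compared ignoring case.
 */
static bool of_path_eq(const char *a, const char *b)
{
    bool in_unit = false;

    assert(a && b);
    for (; *a && *b; ++a, ++b) {
        unsigned char ca = (unsigned char)*a;
        unsigned char cb = (unsigned char)*b;

        if (in_unit) {
            ca = (unsigned char)tolower(ca);
            cb = (unsigned char)tolower(cb);
        }
        if (ca != cb) {
            return false;
        }
        if (ca == '@') {
            in_unit = true;     /* unit address starts after '@' */
        } else if (ca == '/') {
            in_unit = false;    /* next path component starts */
        }
    }
    /* Equal only if both strings ended at the same point. */
    return *a == *b;
}
```

With this rule, "/pci@80000000/isa@c" and "/pci@80000000/isa@C" match, while a path that differs in the case of the name part does not.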
On Fri, 4 Jun 2021, David Gibson wrote: > On Sun, May 30, 2021 at 07:33:01PM +0200, BALATON Zoltan wrote: >> Hello, >> >> Two more problems I've found while testing with pegasos2 but I'm not sure >> how to fix them: >> >> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c >>> new file mode 100644 >>> index 000000000000..a283b7d251a7 >>> --- /dev/null >>> +++ b/hw/ppc/vof.c >>> @@ -0,0 +1,1021 @@ >>> +/* >>> + * QEMU PowerPC Virtual Open Firmware. >>> + * >>> + * This implements client interface from OpenFirmware IEEE1275 on the QEMU >>> + * side to leave only a very basic firmware in the VM. >>> + * >>> + * Copyright (c) 2021 IBM Corporation. >>> + * >>> + * SPDX-License-Identifier: GPL-2.0-or-later >>> + */ >>> + >>> +#include "qemu/osdep.h" >>> +#include "qemu-common.h" >>> +#include "qemu/timer.h" >>> +#include "qemu/range.h" >>> +#include "qemu/units.h" >>> +#include "qapi/error.h" >>> +#include <sys/ioctl.h> >>> +#include "exec/ram_addr.h" >>> +#include "exec/address-spaces.h" >>> +#include "hw/ppc/vof.h" >>> +#include "hw/ppc/fdt.h" >>> +#include "sysemu/runstate.h" >>> +#include "qom/qom-qobject.h" >>> +#include "trace.h" >>> + >>> +#include <libfdt.h> >>> + >>> +/* >>> + * OF 1275 "nextprop" description suggests is it 32 bytes max but >>> + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars long. 
>>> + */ >>> +#define OF_PROPNAME_LEN_MAX 64 >>> + >>> +#define VOF_MAX_PATH 256 >>> +#define VOF_MAX_SETPROPLEN 2048 >>> +#define VOF_MAX_METHODLEN 256 >>> +#define VOF_MAX_FORTHCODE 256 >>> +#define VOF_VTY_BUF_SIZE 256 >>> + >>> +typedef struct { >>> + uint64_t start; >>> + uint64_t size; >>> +} OfClaimed; >>> + >>> +typedef struct { >>> + char *path; /* the path used to open the instance */ >>> + uint32_t phandle; >>> +} OfInstance; >>> + >>> +#define VOF_MEM_READ(pa, buf, size) \ >>> + address_space_read_full(&address_space_memory, \ >>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>> +#define VOF_MEM_WRITE(pa, buf, size) \ >>> + address_space_write(&address_space_memory, \ >>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>> + >>> +static int readstr(hwaddr pa, char *buf, int size) >>> +{ >>> + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { >>> + return -1; >>> + } >>> + if (strnlen(buf, size) == size) { >>> + buf[size - 1] = '\0'; >>> + trace_vof_error_str_truncated(buf, size); >>> + return -1; >>> + } >>> + return 0; >>> +} >>> + >>> +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, >>> + const char *s1, unsigned nargscheck, unsigned nretcheck) >>> +{ >>> + if (strcmp(s, s1)) { >>> + return false; >>> + } >>> + if ((nargscheck && (nargs != nargscheck)) || >>> + (nretcheck && (nret != nretcheck))) { >>> + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); >>> + return false; >>> + } >>> + >>> + return true; >>> +} >>> + >>> +static void prop_format(char *tval, int tlen, const void *prop, int len) >>> +{ >>> + int i; >>> + const unsigned char *c; >>> + char *t; >>> + const char bin[] = "..."; >>> + >>> + for (i = 0, c = prop; i < len; ++i, ++c) { >>> + if (*c == '\0' && i == len - 1) { >>> + strncpy(tval, prop, tlen - 1); >>> + return; >>> + } >>> + if (*c < 0x20 || *c >= 0x80) { >>> + break; >>> + } >>> + } >>> + >>> + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { >>> + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 
1) { >>> + strcpy(t, bin); >>> + return; >>> + } >>> + if (i && i % 4 == 0 && i != len - 1) { >>> + strcat(t, " "); >>> + ++t; >>> + } >>> + t += sprintf(t, "%02X", *c & 0xFF); >>> + } >>> +} >>> + >>> +static int get_path(const void *fdt, int offset, char *buf, int len) >>> +{ >>> + int ret; >>> + >>> + ret = fdt_get_path(fdt, offset, buf, len - 1); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + >>> + buf[len - 1] = '\0'; >>> + >>> + return strlen(buf) + 1; >>> +} >>> + >>> +static int phandle_to_path(const void *fdt, uint32_t ph, char *buf, int len) >>> +{ >>> + int ret; >>> + >>> + ret = fdt_node_offset_by_phandle(fdt, ph); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + >>> + return get_path(fdt, ret, buf, len); >>> +} >>> + >>> +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) >>> +{ >>> + char fullnode[VOF_MAX_PATH]; >>> + uint32_t ret = -1; >>> + int offset; >>> + >>> + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { >>> + return (uint32_t) ret; >>> + } >>> + >>> + offset = fdt_path_offset(fdt, fullnode); >>> + if (offset >= 0) { >>> + ret = fdt_get_phandle(fdt, offset); >>> + } >>> + trace_vof_finddevice(fullnode, ret); >>> + return (uint32_t) ret; >>> +} >> >> The Linux init function that runs on pegasos2 here: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 >> >> calls finddevice once with isa@c and next with isa@C (small and capital C) >> both of which works with the board firmware but with vof the comparison is >> case sensitive and one of these fails so I can't make it work. I don't know >> if this is a problem in libfdt or the vof_finddevice above should do >> something else to get case insensitive comparison. > > This is kind of a subtle incompatibility between the traditional OF > world and the flat tree world. In traditional OF, the unit address > (bit after the @) doesn't exist as a string. 
Instead when you do the > finddevice it will parse that address and compare it against the 'reg' > properties for each of the relevant nodes. Since that's an integer > comparison, case doesn't enter into it. > > But, how to parse (and write) addresses depends on the bus, so the > firmware has to understand each bus type and act accordingly. That > doesn't really work in the world of minimal firmwares for the flat > tree. So instead, we just incorporate a pre-formatted unit address in > the flat tree directly. Most of the time that works fine, but there > are some edge cases like the one you've hit. OK, thanks for the clarification, as said in previous message I think doing case insensitive comparison just in the address part should work then we don't have to implement reg parsing in VOF. >>> +static const void *getprop(const void *fdt, int nodeoff, const char *propname, >>> + int *proplen, bool *write0) >>> +{ >>> + const char *unit, *prop; >>> + >>> + /* >>> + * The "name" property is not actually stored as a property in the FDT, >>> + * we emulate it by returning a pointer to the node's name and adjust >>> + * proplen to include only the name but not the unit. >>> + */ >>> + if (strcmp(propname, "name") == 0) { >>> + prop = fdt_get_name(fdt, nodeoff, proplen); >>> + if (!prop) { >>> + *proplen = 0; >>> + return NULL; >>> + } >>> + >>> + unit = memchr(prop, '@', *proplen); >>> + if (unit) { >>> + *proplen = unit - prop; >>> + } >>> + *proplen += 1; >>> + >>> + /* >>> + * Since it might be cut at "@" and there will be no trailing zero >>> + * in the prop buffer, tell the caller to write zero at the end.
>>> + */ >>> + if (write0) { >>> + *write0 = true; >>> + } >>> + return prop; >>> + } >>> + >>> + if (write0) { >>> + *write0 = false; >>> + } >>> + return fdt_getprop(fdt, nodeoff, propname, proplen); >>> +} >> >> MorphOS checks the name property of the root node ("/") to decide what >> platform it runs on so we may need to be able to set this property on / >> where it should return "bplan,Pegasos2", therefore the above maybe should do >> getprop first and only generate name property if it's not set (or at least >> check if we're on the root node and allow setting name property there). (On >> Macs the root node is named "device-tree" and this was before found to be >> needed for MorphOS.) > > Ah. Hrm. Have to think about what to do about that. This is easy to fix, this seems to allow setting a name property or return a default: >diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c index b47bbd509d..746842593e 100644 --- a/hw/ppc/vof.c +++ b/hw/ppc/vof.c @@ -163,14 +163,14 @@ static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) static const void *getprop(const void *fdt, int nodeoff, const char *propname, int *proplen, bool *write0) { - const char *unit, *prop; + const char *unit, *prop = fdt_getprop(fdt, nodeoff, propname, proplen); /* * The "name" property is not actually stored as a property in the FDT, * we emulate it by returning a pointer to the node's name and adjust * proplen to include only the name but not the unit. */ - if (strcmp(propname, "name") == 0) { + if (!prop && strcmp(propname, "name") == 0) { prop = fdt_get_name(fdt, nodeoff, proplen); if (!prop) { *proplen = 0; @@ -196,7 +196,7 @@ static const void *getprop(const void *fdt, int nodeoff, const char *propname, if (write0) { *write0 = false; } - return fdt_getprop(fdt, nodeoff, propname, proplen); + return prop; } This allows adding a name property to "/" different from the default but this does not yet fix MorphOS booting with VOF on pegasos2. 
I think it tries to query name on / and check if it's called "device-tree" in which case it assumes Mac hardware otherwise it goes with pegasos2 so even if we return nothing for name it would not matter in this case as we don't use VOF on Mac. If we wanted that then this would become a problem so it could be fixed now in advance just in case other guests may need it. >> Other than the above two problems, I've found that getting the device tree >> from vof returns it in reverse order compared to the board firmware if I add >> it the expected order. This may or may not be a problem but to avoid it I >> can build the tree in reverse order then it comes out right so unless >> there's an easy fix this should not cause a problem but may worth a comment >> somewhere. > > The order of things in the device tree *should* never matter. If it > does, that's definitely a client bug... but of course that doesn't > necessarily mean we won't have to work around it in practice. I don't know if it matters or not but having the device tree in the same order as the firmware ROM helps with comparing it for debugging but I've found I can solve this by building the tree in reverse order so no changes to VOF are needed for this, just thought adding a comment somewhere may clarify it but it's not really a problem. I still don't know what MorphOS is missing, I've tried adding almost all missing properties, checked what hardware is init by the firmware and tried to do the same in board reset code and even after that MorphOS still takes a different route with VOF and crashes but boots with the board firmware. I'm now thinking it may be either different memory organisation or the missing name properties that are not returned by nextprop in VOF so they are only appearing when explicitly queried whereas with the board firmware they are present as properties. With the above patch I could explicitly set it on nodes and test if that makes a difference.
I got to this because adding more missing props or initialising more devices did not make a difference, so I'm guessing it may be something else. The only differences I can see compared to the board firmware are the different memory ranges in claimed (VOF puts itself at 0, for example) and the missing name and additional phandle props in the device tree. MorphOS copies the whole device tree on startup, then later uses this copy after shutting down OF with quiesce. I can imagine it may use some name props, like the one on the cpu node, without checking, assuming they are always there, and if we're missing one it may cause a NULL dereference. I have no better idea what else could be missing so I'll test this next. If it helps I can try to come up with a patch to VOF to return these name props or allow setting them as above. Regards, BALATON Zoltan
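
To make the name property behaviour discussed above concrete, here is a minimal standalone sketch of the fallback rule: an explicitly set "name" wins, otherwise the name is derived from the node name with the unit address (the part from "@" on) cut off. It does not use libfdt; `node_name_prop` and its parameters are hypothetical names made up for illustration, not functions from vof.c, and the real getprop() returns a pointer into the FDT and sets write0 instead of copying.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of vof.c): compute the value a client
 * would see for the "name" property of a node.
 *
 * node_name     - flat-tree node name, e.g. "isa@c"
 * explicit_name - value of an explicitly set "name" property, or NULL
 * buf, buflen   - output buffer
 * lenp          - receives the property length including the trailing NUL
 *
 * Returns buf on success, NULL if the result does not fit in buf.
 */
static const char *node_name_prop(const char *node_name,
                                  const char *explicit_name,
                                  char *buf, size_t buflen, size_t *lenp)
{
    const char *src = explicit_name ? explicit_name : node_name;
    size_t n = strlen(src);

    if (!explicit_name) {
        /* Emulated case: keep only the part before the unit address. */
        const char *unit = strchr(node_name, '@');

        if (unit) {
            n = (size_t)(unit - node_name);
        }
    }

    if (n + 1 > buflen) {
        return NULL;
    }

    /* The cut at '@' leaves no terminator, so write one explicitly. */
    memcpy(buf, src, n);
    buf[n] = '\0';
    *lenp = n + 1;
    return buf;
}
```

With this rule the root node can carry "bplan,Pegasos2" as an explicit name while every other node still gets a default derived from its node name.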
On Fri, 4 Jun 2021, David Gibson wrote: > On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote: >> On Wed, 2 Jun 2021, David Gibson wrote: >>> On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: >>>> On Thu, 27 May 2021, David Gibson wrote: >>>>> On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: >>>>>> On Tue, 25 May 2021, David Gibson wrote: >>>>>>> On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: >>>>>>>> On Mon, 24 May 2021, David Gibson wrote: >>>>>>>>> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>>>>>>>>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>>>>>>>>> On Sun, 23 May 2021, Alexey Kardashevskiy wrote: >>>>>>>>>>>> One thing to note about PCI is that normally I think the client >>>>>>>>>>>> expects the firmware to do PCI probing and SLOF does it. But VOF >>>>>>>>>>>> does not and Linux scans PCI bus(es) itself. Might be a problem for >>>>>>>>>>>> you kernel. >>>>>>>>>>> >>>>>>>>>>> I'm not sure what info does MorphOS get from the device tree and what it >>>>>>>>>>> probes itself but I think it may at least need device ids and info about >>>>>>>>>>> the PCI bus to be able to access the config regs, after that it should >>>>>>>>>>> set the devices up hopefully. I could add these from the board code to >>>>>>>>>>> device tree so VOF does not need to do anything about it. However I'm >>>>>>>>>>> not getting to that point yet because it crashes on something that it's >>>>>>>>>>> missing and couldn't yet find out what is that. >>>>>>>>>>> >>>>>>>>>>> I'd like to get Linux working now as that would be enough to test this >>>>>>>>>>> and then if for MorphOS we still need a ROM it's not a problem if at >>>>>>>>>>> least we can boot Linux without the original firmware. But I can't make >>>>>>>>>>> Linux open a serial console and I don't know what it needs for that. Do >>>>>>>>>>> you happen to know? 
I've looked at the sources in Linux/arch/powerpc but >>>>>>>>>>> not sure how it would find and open a serial port on pegasos2. It seems >>>>>>>>>>> to work with the board firmware and now I can get it to boot with VOF >>>>>>>>>>> but then it does not open serial so it probably needs something in the >>>>>>>>>>> device tree or expects the firmware to set something up that we should >>>>>>>>>>> add in pegasos2.c when using VOF. >>>>>>>>>> >>>>>>>>>> I've now found that Linux uses rtas methods read-pci-config and >>>>>>>>>> write-pci-config for PCI access on pegasos2 so this means that we'll >>>>>>>>>> probably need rtas too (I hoped we could get away without it if it were only >>>>>>>>>> used for shutdown/reboot or so but seems Linux needs it for PCI as well and >>>>>>>>>> does not scan the bus and won't find some devices without it). >>>>>>>>> >>>>>>>>> Yes, definitely sounds like you'll need an RTAS implementation. >>>>>>>> >>>>>>>> I plan to fix that after managed to get serial working as that seems to not >>>>>>>> need it. If I delete the rtas-size property from /rtas on the original >>>>>>>> firmware that makes Linux skip instantiating rtas, but I still get serial >>>>>>>> output just not accessing PCI devices. So I think it should work and keeps >>>>>>>> things simpler at first. Then I'll try rtas later. >>>>>>>> >>>>>>>>>> While VOF can do rtas, this causes a problem with the hypercall method using >>>>>>>>>> sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so >>>>>>>>>> cannot work after guest is past quiesce. >>>>>>>>> >>>>>>>>>> So the question is why is that >>>>>>>>>> assert there >>>>>>>>> >>>>>>>>> Ah.. right. So, vhyp was designed for the PAPR use case, where we >>>>>>>>> want to model the CPU when it's in supervisor and user mode, but not >>>>>>>>> when it's in hypervisor mode. We want qemu to mimic the behaviour of >>>>>>>>> the hypervisor, rather than attempting to actually execute hypervisor >>>>>>>>> code in the virtual CPU. 
>>>>>>>>> >>>>>>>>> On systems that have a hypervisor mode, SDR1 is hypervisor privileged, >>>>>>>>> so it makes no sense for the guest to attempt to set it. That should >>>>>>>>> be caught by the general SPR code and turned into a 0x700, hence the >>>>>>>>> assert() if we somehow reach ppc_store_sdr1(). >>>>>>>>> >>>>>>>>> So, we are seeing a problem here because you want the 'sc 1' >>>>>>>>> interception of vhyp, but not the rest of the stuff that goes with it. >>>>>>>>> >>>>>>>>>> and would using sc 1 for hypercalls on pegasos2 cause other >>>>>>>>>> problems later even if the assert could be removed? >>>>>>>>> >>>>>>>>> At least in the short term, I think you probably can remove the >>>>>>>>> assert. In your case the 'sc 1' calls aren't truly to a hypervisor, >>>>>>>>> but a special case escape to qemu for the firmware emulation. I think >>>>>>>>> it's unlikely to cause problems later, because nothing on a 32-bit >>>>>>>>> system should be attempting an 'sc 1'. The only thing I can think of >>>>>>>>> that would fail is some test case which explicitly verified that 'sc >>>>>>>>> 1' triggered a 0x700 (SIGILL from userspace). >>>>>>>> >>>>>>>> OK so the assert should check if the CPU has an HV bit. I think there was a >>>>>>>> #detine for that somewhere that I can add to the assert then I can try that. >>>>>>>> What I wasn't sure about is that sc 1 would conflict with the guest's usage >>>>>>>> of normal sc calls or are these going through different paths and only sc 1 >>>>>>>> will trigger vhyp callback not affecting notmal sc calls? >>>>>>> >>>>>>> The vhyp shouldn't affect normal system calls, 'sc 1' is specifically >>>>>>> for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the >>>>>>> vhyp only intercepts the hypercall version (after all Linux on PAPR >>>>>>> certainly uses its own system calls, and hypercalls are active for the >>>>>>> lifetime of the guest there). 
>>>>>>> >>>>>>>> (Or if this causes >>>>>>>> an otherwise unnecessary VM exit on KVM even when it works then maybe >>>>>>>> looking for a different way in the future might be needed. >>>>>>> >>>>>>> What you're doing here won't work with KVM as it stands. There are >>>>>>> basically two paths into the vhyp hypercall path: 1) from TCG, if we >>>>>>> interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get >>>>>>> a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. >>>>>>> >>>>>>> The second path is specific to the PAPR (ppc64) implementation of KVM, >>>>>>> and will not work for a non-PAPR platform without substantial >>>>>>> modification of the KVM code. >>>>>> >>>>>> OK so then at that point when we try KVM we'll need to look at alternative >>>>>> ways, I think MOL OSI worked with KVM at least in MOL but will probably make >>>>>> all syscalls exit KVM but since we'll probably need to use KVM PR it will >>>>>> exit anyway. For now I keep this vhyp as it does not run with KVM for other >>>>>> reasons yet so that's another area to clean up so as a proof of concept >>>>>> first version of using VOF vhyp will do. >>>>> >>>>> Eh, since you'll need to modify KVM anyway, it probably makes just as >>>>> much sense to modify it to catch the 'sc 1' as MoL's magic thingy. >>>> >>>> I'm not sure how KVM works for this case so I also don't know why and what >>>> would need to be modified. I think we'll only have KVM PR working as newer >>>> POWER CPUs having HV (besides being rare among potential users) are probably >>>> too different to run the OSes that expect at most a G4 on pegasos2 so likely >>>> it won't work with KVM HV. >>> >>> Oh, it definitely won't work with KVM HV. >>> >>>> If we have KVM PR doesn't sc already trap so we >>>> could add MOL OSI without further modification to KVM itself only needing >>>> change in QEMU? >>> >>> Uh... I guess so? 
>>> >>>> I also hope that MOL OSI could be useful for porting some >>>> paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't >>>> know about that for sure so I'm open to any other solution too. >>> >>> Maybe. I never know much about MOL to begin with, and anything I did >>> know was a decade or more ago so I've probably forgotten. >> >> That may still be more than what I know about it since I never had any >> knowledge about PPC KVM and don't have any PPC hardware to test with so I'm >> mostly guessing. (I could test with KVM emulated in QEMU and I did set up an >> environment for that but that's a bit slow and inconvenient so I'd leave KVM >> support to those interested and have more knowledge and hardware for it.) > > Sounds like a problem for someone else another time, then. > >>>> For now I'm >>>> going with vhyp which is enough fot testing with TCG and if somebody wants >>>> KVM they could use he original firmware for now so this could be improved in >>>> a later version unless a simple solution is found before the freeze for 6.1. >>>> If we're in KVM PR what happens for sc 1 could that be used too so maybe >>>> what we have now could work? >>> >>> Note that if you do go down the MOL path it wouldn't be that complex >>> to make a "vMOL" interface so you can use the same mechanism for KVM >>> and TCG. >> >> Not sure what you mean by VMOL. Is it modifying MOL to use sc 1 like VOF >> instead of its OSI way for hypercalls? > > No, I mean on the qemu side adding an optional hook which will > intercept sc 0 instructions with the MOL magic register values and > redirect them to a machine registered callback, rather than emulating > the CPU's behaviour of jumping to the system call vector in guest > space. > > Basically an equivalent of vhyp, but for MOL magic syscalls, instead > of hypercalls. 
OK, that's basically what BenH's OSI patch I've linked to before did, I think. It may just need updating for changes in target/ppc since that patch was created. However, that would also mean we'd need another version of VOF that uses this instead of sc 1, so unless we need that I'd keep a single VOF that works for both spapr and pegasos2.

>> That would lose the advantage of >> being able to reuse MOL guest drivers without modification (which might be >> useful for running OS X guest on Mac emulation) so if we can't use vhyp then >> maybe using OSI would be the next choice for this reason but for now vhyp >> seems to be working for what I could test so unless somebody here sees a >> problem with it and has a better idea I'm going with vhyp for now just >> because that's what VOF uses and I don't want to modify VOF to reuse it as >> it is so I don't need to maintain a separate version and also get any >> enhancements without further need to sync with spapr VOF. >> >> I've found this document about possible hypercall interfaces on KVM (see >> Hypercall ABIs at the end): >> >> https://www.kernel.org/doc/html/latest/virt/kvm/ppc-pv.html >> >> Having both ePAPR (1.) and PAPR (2.) hypercalls is a bit confusing. Does >> vhyp correspond to 2. PAPR? > > Yes.

What's ePAPR then and how is it different from PAPR? I mean the acronym, not the hypercall method. The latter is explained in that doc, but what ePAPR stands for and why the method is called that is not clear to me.

>> The ePAPR (1.) seems to be preferred by KVM and >> MOL OSI supported for compatibility. > > That document looks pretty out of date. Most of it is only discussing > KVM PR, which is now barely maintained. KVM HV only works with PAPR > hypercalls.

The link says it's the latest kernel docs, so maybe an update needs to be sent to KVM?

>> So if we need something else instead of >> 2.
PAPR hypercalls there seems to be two options: ePAPR and MOL OSI which >> should work with KVM but then I'm not sure how to handle those on TCG. >> >>>>>> [...] >>>>>>>>>> I've tested that the missing rtas is not the reason for getting no output >>>>>>>>>> via serial though, as even when disabling rtas on pegasos2.rom it boots and >>>>>>>>>> I still get serial output just some PCI devices are not detected (such as >>>>>>>>>> USB, the video card and the not emulated ethernet port but these are not >>>>>>>>>> fatal so it might even work as a first try without rtas, just to boot a >>>>>>>>>> Linux kernel for testing it would be enough if I can fix the serial output). >>>>>>>>>> I still don't know why it's not finding serial but I think it may be some >>>>>>>>>> missing or wrong info in the device tree I generat. I'll try to focus on >>>>>>>>>> this for now and leave the above rtas question for later. >>>>>>>>> >>>>>>>>> Oh.. another thought on that. You have an ISA serial port on Pegasos, >>>>>>>>> I believe. I wonder if the PCI->ISA bridge needs some configuration / >>>>>>>>> initialization that the firmware is expected to do. If so you'll need >>>>>>>>> to mimic that setup in qemu for the VOF case. >>>>>>>> >>>>>>>> That's what I begin to think because I've added everything to the device >>>>>>>> tree that I thought could be needed and I still don't get it working so it >>>>>>>> may need some config from the firmware. But how do I access device registers >>>>>>>> from board code? I've tried adding a machine reset method and write to >>>>>>>> memory mapped device registers but all my attempts failed. I've tried >>>>>>>> cpu_stl_le_data and even memory_region_dispatch_write but these did not get >>>>>>>> to the device. What's the way to access guest mmio regs from QEMU? >>>>>>> >>>>>>> That's odd, cpu_stl() and memory_region_dispatch_write() should work >>>>>>> from board code (after the relevant memory regions are configured, of >>>>>>> course). 
As an ISA serial port, it's probably accessed through IO >>>>>>> space, not memory space though, so you'd need &address_space_io. And >>>>>>> if there is some bridge configuration then it's the bridge control >>>>>>> registers you need to look at not the serial registers - you'd have to >>>>>>> look at the bridge documentation for that. Or, I guess the bridge >>>>>>> implementation in qemu, which you wrote part of. >>>>>> >>>>>> I've found at last that stl_le_phys() works. There are so many of these that >>>>>> I never know when to use which. >>>>>> >>>>>> I think the address_space_rw calls in vof_client_call() in vof.c could also >>>>>> use these for somewhat shorter code. I've ended up with >>>>>> stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset methodbut I don't >>>>>> even need that now as it works without additional setup. Also VOF's memory >>>>>> access is basically the same as the already existing rtas_st() and co. so >>>>>> maybe that could be reused to make code smaller? >>>>> >>>>> rtas_ld() and rtas_st() should only be used for reading/writing RTAS >>>>> parameters to and from memory. Accessing IO shouldn't be done with >>>>> those. >>>>> >>>>> For IO you probably want the cpu_st*() variants in most cases, since >>>>> you're trying to emulate an IO access from the virtual cpu. >>>> >>>> I think I've tried that but what worked to access mmio device registers are >>>> stl_le_phys and similar that are wrappers around address_space_stl_*. 
But I >>>> did not mean that for rtas_ld/_st but the part when vof accessing the >>>> parameters passed by its hypercall which is memory access: >>>> >>>> https://github.com/patchew-project/qemu/blob/patchew/20210520090557.435689-1-aik%40ozlabs.ru/hw/ppc/vof.c >>>> >>>> line 893, and vof_client_call before that is very similar to what h_rtas >>>> does here: >>>> >>>> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_hcall.c;h=f25014afda408002ee1ec1027a0dd7a6025eca61;hb=HEAD#l639 >>>> >>>> and I also need to do the same for rtas in pegasos2 for which I'm just using >>>> ldl_be_phys for now but I wonder if we really need 3 ways to do the same or >>>> the rtas_ld/_st could be made more generic and reused here? >>> >>> For your rtas implementation you could definitely re-use them. For >>> the client call I'm a bit less confident, but if the in-guest-memory >>> structures are really the same, then it would make sense. >> >> The memory structure seems very similar to me, the only difference is >> calling the first field service in VOF instead of token in RTAS. Both are >> just an array of big endian unit32_t with token, nargs, nret at the front >> followed by args and rets. Since these rtas_ld/st are defined in spapr.h I >> did not bother to split them off, so for pegasos2 rtas I'm just using the >> ldl_be_* functions directly for which these are a shorthand for. If these >> were split off for sharing between spapr rtas and VOF I may be able to reuse >> them as well but it's not that important so just mentioned it as a possible >> later clean up. > > Ok, sounds reasonable to re-use them then, though maybe add an aliased > name for clarity ofci_{ld,st}(), maybe? (for "Open Firmware Client > Interface") I'll wait for what Alexey decides to do in the next VOF patch version and if I can reuse that (I could if these were defined in vof.h). 
I don't want to come up with yet another abstraction over ldl_be_*, which does not seem any clearer than using the actual guest memory access functions, since that is what we're doing when fetching the hypercall args. So I think either using ldl_be_* directly or reusing the already existing rtas_ld/_st would make sense, but adding similar funcs under another name just makes it more confusing. Regards, BALATON Zoltan
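
For reference, the shared argument layout described above (an array of big-endian uint32_t with service or token, nargs, nret at the front, followed by args and rets) can be sketched as plain C over a byte buffer. This is only an illustration under that stated layout: `CallHeader`, `call_header_parse`, `call_arg` and `call_set_ret` are hypothetical names, and real QEMU code would read guest memory through ldl_be_phys() or rtas_ld()/rtas_st() rather than a local array.

```c
#include <stddef.h>
#include <stdint.h>

/* Load a big-endian 32-bit value from a byte buffer. */
static uint32_t be32_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit value into a byte buffer in big-endian order. */
static void be32_store(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical view of the three header cells shared by both call types. */
typedef struct {
    uint32_t first; /* "service" for the client interface, "token" for RTAS */
    uint32_t nargs; /* number of argument cells following the header */
    uint32_t nret;  /* number of return cells following the arguments */
} CallHeader;

/* Parse the header; returns 0 on success, -1 if the buffer is too short. */
static int call_header_parse(const uint8_t *buf, size_t len, CallHeader *h)
{
    if (len < 12) {
        return -1;
    }
    h->first = be32_load(buf);
    h->nargs = be32_load(buf + 4);
    h->nret = be32_load(buf + 8);
    /* 64-bit sum so that huge nargs/nret values cannot wrap around. */
    if ((len - 12) / 4 < (uint64_t)h->nargs + h->nret) {
        return -1;
    }
    return 0;
}

/* Read argument n (0-based); the caller must ensure n < nargs. */
static uint32_t call_arg(const uint8_t *buf, uint32_t n)
{
    return be32_load(buf + 12 + 4 * (size_t)n);
}

/* Write return value n (0-based); the caller must ensure n < nret. */
static void call_set_ret(uint8_t *buf, const CallHeader *h,
                         uint32_t n, uint32_t v)
{
    be32_store(buf + 12 + 4 * ((size_t)h->nargs + n), v);
}
```

The only difference between the two users would be how the first cell is interpreted, which is why a single pair of accessors could serve both.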
On Fri, 4 Jun 2021, BALATON Zoltan wrote: > On Fri, 4 Jun 2021, David Gibson wrote: >> On Sun, May 30, 2021 at 07:33:01PM +0200, BALATON Zoltan wrote: >>> Hello, >>> >>> Two more problems I've found while testing with pegasos2 but I'm not sure >>> how to fix them: >>> >>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote: >>>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c >>>> new file mode 100644 >>>> index 000000000000..a283b7d251a7 >>>> --- /dev/null >>>> +++ b/hw/ppc/vof.c >>>> @@ -0,0 +1,1021 @@ >>>> +/* >>>> + * QEMU PowerPC Virtual Open Firmware. >>>> + * >>>> + * This implements client interface from OpenFirmware IEEE1275 on the >>>> QEMU >>>> + * side to leave only a very basic firmware in the VM. >>>> + * >>>> + * Copyright (c) 2021 IBM Corporation. >>>> + * >>>> + * SPDX-License-Identifier: GPL-2.0-or-later >>>> + */ >>>> + >>>> +#include "qemu/osdep.h" >>>> +#include "qemu-common.h" >>>> +#include "qemu/timer.h" >>>> +#include "qemu/range.h" >>>> +#include "qemu/units.h" >>>> +#include "qapi/error.h" >>>> +#include <sys/ioctl.h> >>>> +#include "exec/ram_addr.h" >>>> +#include "exec/address-spaces.h" >>>> +#include "hw/ppc/vof.h" >>>> +#include "hw/ppc/fdt.h" >>>> +#include "sysemu/runstate.h" >>>> +#include "qom/qom-qobject.h" >>>> +#include "trace.h" >>>> + >>>> +#include <libfdt.h> >>>> + >>>> +/* >>>> + * OF 1275 "nextprop" description suggests is it 32 bytes max but >>>> + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars >>>> long. 
>>>> + */ >>>> +#define OF_PROPNAME_LEN_MAX 64 >>>> + >>>> +#define VOF_MAX_PATH 256 >>>> +#define VOF_MAX_SETPROPLEN 2048 >>>> +#define VOF_MAX_METHODLEN 256 >>>> +#define VOF_MAX_FORTHCODE 256 >>>> +#define VOF_VTY_BUF_SIZE 256 >>>> + >>>> +typedef struct { >>>> + uint64_t start; >>>> + uint64_t size; >>>> +} OfClaimed; >>>> + >>>> +typedef struct { >>>> + char *path; /* the path used to open the instance */ >>>> + uint32_t phandle; >>>> +} OfInstance; >>>> + >>>> +#define VOF_MEM_READ(pa, buf, size) \ >>>> + address_space_read_full(&address_space_memory, \ >>>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>>> +#define VOF_MEM_WRITE(pa, buf, size) \ >>>> + address_space_write(&address_space_memory, \ >>>> + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) >>>> + >>>> +static int readstr(hwaddr pa, char *buf, int size) >>>> +{ >>>> + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { >>>> + return -1; >>>> + } >>>> + if (strnlen(buf, size) == size) { >>>> + buf[size - 1] = '\0'; >>>> + trace_vof_error_str_truncated(buf, size); >>>> + return -1; >>>> + } >>>> + return 0; >>>> +} >>>> + >>>> +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, >>>> + const char *s1, unsigned nargscheck, unsigned >>>> nretcheck) >>>> +{ >>>> + if (strcmp(s, s1)) { >>>> + return false; >>>> + } >>>> + if ((nargscheck && (nargs != nargscheck)) || >>>> + (nretcheck && (nret != nretcheck))) { >>>> + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); >>>> + return false; >>>> + } >>>> + >>>> + return true; >>>> +} >>>> + >>>> +static void prop_format(char *tval, int tlen, const void *prop, int len) >>>> +{ >>>> + int i; >>>> + const unsigned char *c; >>>> + char *t; >>>> + const char bin[] = "..."; >>>> + >>>> + for (i = 0, c = prop; i < len; ++i, ++c) { >>>> + if (*c == '\0' && i == len - 1) { >>>> + strncpy(tval, prop, tlen - 1); >>>> + return; >>>> + } >>>> + if (*c < 0x20 || *c >= 0x80) { >>>> + break; >>>> + } >>>> + } >>>> + >>>> + for (i = 0, c = prop, t = 
tval; i < len; ++i, ++c) { >>>> + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) { >>>> + strcpy(t, bin); >>>> + return; >>>> + } >>>> + if (i && i % 4 == 0 && i != len - 1) { >>>> + strcat(t, " "); >>>> + ++t; >>>> + } >>>> + t += sprintf(t, "%02X", *c & 0xFF); >>>> + } >>>> +} >>>> + >>>> +static int get_path(const void *fdt, int offset, char *buf, int len) >>>> +{ >>>> + int ret; >>>> + >>>> + ret = fdt_get_path(fdt, offset, buf, len - 1); >>>> + if (ret < 0) { >>>> + return ret; >>>> + } >>>> + >>>> + buf[len - 1] = '\0'; >>>> + >>>> + return strlen(buf) + 1; >>>> +} >>>> + >>>> +static int phandle_to_path(const void *fdt, uint32_t ph, char *buf, int >>>> len) >>>> +{ >>>> + int ret; >>>> + >>>> + ret = fdt_node_offset_by_phandle(fdt, ph); >>>> + if (ret < 0) { >>>> + return ret; >>>> + } >>>> + >>>> + return get_path(fdt, ret, buf, len); >>>> +} >>>> + >>>> +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) >>>> +{ >>>> + char fullnode[VOF_MAX_PATH]; >>>> + uint32_t ret = -1; >>>> + int offset; >>>> + >>>> + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { >>>> + return (uint32_t) ret; >>>> + } >>>> + >>>> + offset = fdt_path_offset(fdt, fullnode); >>>> + if (offset >= 0) { >>>> + ret = fdt_get_phandle(fdt, offset); >>>> + } >>>> + trace_vof_finddevice(fullnode, ret); >>>> + return (uint32_t) ret; >>>> +} >>> >>> The Linux init function that runs on pegasos2 here: >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 >>> >>> calls finddevice once with isa@c and next with isa@C (small and capital C) >>> both of which works with the board firmware but with vof the comparison is >>> case sensitive and one of these fails so I can't make it work. I don't >>> know >>> if this is a problem in libfdt or the vof_finddevice above should do >>> something else to get case insensitive comparison. 
>> >> This is kind of a subtle incompatibility between the traditional OF >> world and the flat tree world. In traditional OF, the unit address >> (bit after the @) doesn't exist as a string. Instead when you do the >> finddevice it will parse that address and compare it against the 'reg' >> properties for each of the relevant nodes. Since that's an integer >> comparison, case doesn't enter into it. >> >> But, how to parse (and write) addresses depends on the bus, so the >> firmware has to understand each bus type and act accordingly. That >> doesn't really work in the world of minimal firmwares dor the flat >> tree. So instead, we just incorporate a pre-formatted unit address in >> the flat tree directly. Most of the time that works fine, but there >> are some edge cases like the one you've hit. > > OK, thanks for the clarification, as said in previous message I think doing > case insesitive comparison just in the address part should work then we don't > have to implement reg parsing in VOF. > >>>> +static const void *getprop(const void *fdt, int nodeoff, const char >>>> *propname, >>>> + int *proplen, bool *write0) >>>> +{ >>>> + const char *unit, *prop; >>>> + >>>> + /* >>>> + * The "name" property is not actually stored as a property in the >>>> FDT, >>>> + * we emulate it by returning a pointer to the node's name and >>>> adjust >>>> + * proplen to include only the name but not the unit. >>>> + */ >>>> + if (strcmp(propname, "name") == 0) { >>>> + prop = fdt_get_name(fdt, nodeoff, proplen); >>>> + if (!prop) { >>>> + *proplen = 0; >>>> + return NULL; >>>> + } >>>> + >>>> + unit = memchr(prop, '@', *proplen); >>>> + if (unit) { >>>> + *proplen = unit - prop; >>>> + } >>>> + *proplen += 1; >>>> + >>>> + /* >>>> + * Since it might be cut at "@" and there will be no trailing >>>> zero >>>> + * in the prop buffer, tell the caller to write zero at the end. 
>>>> + */ >>>> + if (write0) { >>>> + *write0 = true; >>>> + } >>>> + return prop; >>>> + } >>>> + >>>> + if (write0) { >>>> + *write0 = false; >>>> + } >>>> + return fdt_getprop(fdt, nodeoff, propname, proplen); >>>> +} >>> >>> MorphOS checks the name property of the root node ("/") to decide what >>> platform it runs on so we may need to be able to set this property on / >>> where it should return "bplan,Pegasos2", therefore the above maybe should >>> do >>> getprop first and only generate name property if it's not set (or at least >>> check if we're on the root node and allow setting name property there). >>> (On >>> Macs the root node is named "device-tree" and this was before found to be >>> needed for MorphOS.) >> >> Ah. Hrm. Have to think about what to do about that. > > This is easy to fix, this seems to allow setting a name property or return a > default: > >> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > index b47bbd509d..746842593e 100644 > --- a/hw/ppc/vof.c > +++ b/hw/ppc/vof.c > @@ -163,14 +163,14 @@ static uint32_t vof_finddevice(const void *fdt, > uint32_t nodeaddr) > static const void *getprop(const void *fdt, int nodeoff, const char > *propname, > int *proplen, bool *write0) > { > - const char *unit, *prop; > + const char *unit, *prop = fdt_getprop(fdt, nodeoff, propname, proplen); > > /* > * The "name" property is not actually stored as a property in the FDT, > * we emulate it by returning a pointer to the node's name and adjust > * proplen to include only the name but not the unit. 
> */ > - if (strcmp(propname, "name") == 0) { > + if (!prop && strcmp(propname, "name") == 0) { > prop = fdt_get_name(fdt, nodeoff, proplen); > if (!prop) { > *proplen = 0; > @@ -196,7 +196,7 @@ static const void *getprop(const void *fdt, int nodeoff, > const char *propname, > if (write0) { > *write0 = false; > } > - return fdt_getprop(fdt, nodeoff, propname, proplen); > + return prop; > } > > This allows adding a name property to "/" different from the default but this > does not yet fix MorphOS booting with VOF on pegasos2. I think it tries to > query name on / and check if it's called "device-tree" in which case it > assumes Mac hardware otherwise it goes with pegasos2 so even if we return > nothing for name it would not matter in this case as we don't use VOF on Mac. > If we wanted that then this would become a problem so it could be fixed now > in advance just in case other guests may need it. > >>> Other than the above two problems, I've found that getting the device tree >>> from vof returns it in reverse order compared to the board firmware if I >>> add >>> it the expected order. This may or may not be a problem but to avoid it I >>> can build the tree in reverse order then it comes out right so unless >>> there's an easy fix this should not cause a problem but may worth a >>> comment >>> somewhere. >> >> The order of things in the device tree *should* never matter. If it >> does, that's definitely a client bug... but of course that doesn't >> necessarily mean we won't have to work around it in practice. > > I don't know if it matters or not but having the device tree in the same > order as the firmware ROM helps with comparing it for debugging but I've > found I can solve this by building the tree in reverse order so no changes to > VOF is needed for this, just thought adding a comment somewhere may clarify > it but it's not really a problem. 
> > I still don't know what's MorphOS is missing, I've tried adding almost all > misssing properties, checked what hardware is init by the firmware and tried > to do the same in board reset code and even after that MorphOS still takes a > different route with VOF and crashes but boots with the board firmware. I'm > now thinking it may be either different memory organisation or the missing > name properties that are not returned by nextprop in VOF so they are only > appearing when explicitely queried whereas with the board firmware they are > present as properties. With the above patch I could explicitely set it on > nodes and test if that makes a difference. > > I got to this because adding more missing props or init more devices did not > make a difference so I'm guessing it may be something else then and the only > difference I can see compared to board firmware are the different memory > ranges in claimed (VOF puts itself to 0 for example); and the missing name > and additional phandle props in the device tree. MorphOS copies the whole > device tree on startup then later it uses this copy of the device tree after > shutting down OF with quiesce. I can imagine it may use some name props like > that on the cpu node without checking assuming it's always there and if we're > missing that it may cause a NULL dereference. I have no better idea what else > could be missing so I'll test this next. If it helps I can try to come up > with a patch to VOF to return these name props or allow setting them as > above. Looks like it's the missing name props after all. Adding it to /memory and cpu makes it go further but probably needs more as it then does not find the boot device. Comparing with the device tree created by board firmware not all nodes seem to have a name property so maybe the board firmware also adds this to some nodes explicitely overriding the default so we should do the same in VOF for which the above patch is enough. 
Feel free to squash it into the next vof patch version or I can submit it afterwards as a separate patch whichever you prefer. Then I'll need to find out what other name props I need to set in board code for MorphOS. Linux does not seem to need any of the name props and boots without them. What I have now is good enough for Linux but if I can also fix MorphOS that would make it simpler to use because then one does not need the non-distributable firmware ROM which is the point of trying to use VOF here. Regards, BALATON Zoltan
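
As a rough illustration of the case-insensitive comparison limited to the address part that was suggested for the isa@c / isa@C finddevice problem quoted above, the following standalone sketch compares two paths so that node names must match exactly while unit addresses (the text between "@" and the next "/") are compared ignoring case. `path_equal_unit_nocase` is a hypothetical name, and this only compares strings: wiring it into vof_finddevice() would presumably still need a walk over the tree, since fdt_path_offset() itself matches case sensitively.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper: compare two device tree paths. Node names are
 * compared case sensitively, unit addresses case insensitively.
 */
static bool path_equal_unit_nocase(const char *a, const char *b)
{
    bool in_unit = false;

    for (; *a && *b; a++, b++) {
        unsigned char ca = (unsigned char)*a;
        unsigned char cb = (unsigned char)*b;

        if (in_unit) {
            /* Inside a unit address: fold case before comparing. */
            ca = (unsigned char)tolower(ca);
            cb = (unsigned char)tolower(cb);
        }
        if (ca != cb) {
            return false;
        }
        if (ca == '@') {
            in_unit = true;   /* unit address starts after '@' */
        } else if (ca == '/') {
            in_unit = false;  /* next path component starts */
        }
    }

    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}
```

With this, "/pci@80000000/isa@c" and "/pci@80000000/isa@C" compare equal, while a difference in the node name itself (for example "ISA@c") still does not match.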
On Fri, 4 Jun 2021, David Gibson wrote: > On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote: >> On Wed, 2 Jun 2021, David Gibson wrote: >>> On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: >>>> On Thu, 27 May 2021, David Gibson wrote: >>>>> On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: >>>>>> On Tue, 25 May 2021, David Gibson wrote: >>>>>>> On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: >>>>>>>> On Mon, 24 May 2021, David Gibson wrote: >>>>>>>>> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>>>>>>>>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>>>>>>>> and would using sc 1 for hypercalls on pegasos2 cause other >>>>>>>>>> problems later even if the assert could be removed? >>>>>>>>> >>>>>>>>> At least in the short term, I think you probably can remove the >>>>>>>>> assert. In your case the 'sc 1' calls aren't truly to a hypervisor, >>>>>>>>> but a special case escape to qemu for the firmware emulation. I think >>>>>>>>> it's unlikely to cause problems later, because nothing on a 32-bit >>>>>>>>> system should be attempting an 'sc 1'. The only thing I can think of >>>>>>>>> that would fail is some test case which explicitly verified that 'sc >>>>>>>>> 1' triggered a 0x700 (SIGILL from userspace). >>>>>>>> >>>>>>>> OK so the assert should check if the CPU has an HV bit. I think there was a >>>>>>>> #detine for that somewhere that I can add to the assert then I can try that. >>>>>>>> What I wasn't sure about is that sc 1 would conflict with the guest's usage >>>>>>>> of normal sc calls or are these going through different paths and only sc 1 >>>>>>>> will trigger vhyp callback not affecting notmal sc calls? >>>>>>> >>>>>>> The vhyp shouldn't affect normal system calls, 'sc 1' is specifically >>>>>>> for hypercalls, as opposed to normal 'sc' (a.k.a. 
'sc 0'), and the >>>>>>> vhyp only intercepts the hypercall version (after all Linux on PAPR >>>>>>> certainly uses its own system calls, and hypercalls are active for the >>>>>>> lifetime of the guest there). >>>>>>> >>>>>>>> (Or if this causes >>>>>>>> an otherwise unnecessary VM exit on KVM even when it works then maybe >>>>>>>> looking for a different way in the future might be needed. >>>>>>> >>>>>>> What you're doing here won't work with KVM as it stands. There are >>>>>>> basically two paths into the vhyp hypercall path: 1) from TCG, if we >>>>>>> interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get >>>>>>> a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. >>>>>>> >>>>>>> The second path is specific to the PAPR (ppc64) implementation of KVM, >>>>>>> and will not work for a non-PAPR platform without substantial >>>>>>> modification of the KVM code. >>>>>> >>>>>> OK so then at that point when we try KVM we'll need to look at alternative >>>>>> ways, I think MOL OSI worked with KVM at least in MOL but will probably make >>>>>> all syscalls exit KVM but since we'll probably need to use KVM PR it will >>>>>> exit anyway. For now I keep this vhyp as it does not run with KVM for other >>>>>> reasons yet so that's another area to clean up so as a proof of concept >>>>>> first version of using VOF vhyp will do. >>>>> >>>>> Eh, since you'll need to modify KVM anyway, it probably makes just as >>>>> much sense to modify it to catch the 'sc 1' as MoL's magic thingy. >>>> >>>> I'm not sure how KVM works for this case so I also don't know why and what >>>> would need to be modified. I think we'll only have KVM PR working as newer >>>> POWER CPUs having HV (besides being rare among potential users) are probably >>>> too different to run the OSes that expect at most a G4 on pegasos2 so likely >>>> it won't work with KVM HV. >>> >>> Oh, it definitely won't work with KVM HV. 
>>> >>>> If we have KVM PR doesn't sc already trap so we >>>> could add MOL OSI without further modification to KVM itself only needing >>>> change in QEMU? >>> >>> Uh... I guess so? >>> >>>> I also hope that MOL OSI could be useful for porting some >>>> paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't >>>> know about that for sure so I'm open to any other solution too. >>> >>> Maybe. I never know much about MOL to begin with, and anything I did >>> know was a decade or more ago so I've probably forgotten. >> >> That may still be more than what I know about it since I never had any >> knowledge about PPC KVM and don't have any PPC hardware to test with so I'm >> mostly guessing. (I could test with KVM emulated in QEMU and I did set up an >> environment for that but that's a bit slow and inconvenient so I'd leave KVM >> support to those interested and have more knowledge and hardware for it.) > > Sounds like a problem for someone else another time, then. So now that it works on TCG with vhyp I tried what it would do on KVM PR with the sc 1 but I could only test that on QEMU itself running in a Linux guest. First I've hit missing this callback: https://git.qemu.org/?p=qemu.git;a=blob;f=target/ppc/kvm.c;h=104a308abb5700b2fe075397271f314d7f607543;hb=HEAD#l856 that I can fix by providing a callback in pegasos2.c that does what the else clause would do returning POWERPC_CPU(current_cpu)->env.spr[SPR_SDR1] (I guess that's the correct thing to do if it works without vhyp). 
After getting past this, the host QEMU crashed on the first sc 1 call with this error:

qemu: fatal: Trying to deliver HV exception (MSR) 8 with no HV support
NIP 0000000000000148 LR 0000000000000590 CTR 0000000000000000 XER 0000000000000000 CPU#0
MSR 000000000000d032 HID0 0000000060000000 HF 00004012 iidx 0 didx 0
TB 00000203 876006644638 DECR 422427
GPR00 0000000000000680 000000000000fe90 0000000000008e00 000000000000f005
GPR04 000000000000fe9c 0000000000000001 0000000000000e78 0000000000000000
GPR08 000000000000fe98 000000000000fe9c 0000000000000001 0000000000000000
GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000008e9c 000000000000fe90
CR 20000000 [ E - - - - - - - ] RES ffffffffffffffff
FPR00 bff0000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 3ff553f7ced91687 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 0000000082004000
SRR0 00000000000001d4 SRR1 300000000000d032 PVR 00000000003c0301 VRSAVE 00000000ffffffff
SPRG0 000000003fe00000 SPRG1 c00000000ff60000 SPRG2 c00000000ff60000 SPRG3 0000000000000000
SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000
SDR1 000000003f000006 DAR f00000000090abf0 DSISR 0000000042000000
Aborted (core dumped)

(vof.bin looks like this:
100: 3c 40 00 00  lis r2,0
104: 60 42 8e 00  ori r2,r2,36352
108: 48 00 00 cc  b 0x1d4
10c: 3c 40 00 00  lis r2,0
110: 60 42 8e 00  ori r2,r2,36352
114: 94 21 ff 90  stwu r1,-112(r1)
118: 93 e1 00 68  stw r31,104(r1)
11c: 7f e8 02 a6  mflr r31
120: 48 00 02 8d  bl 0x3ac
124: 60 00 00 00  nop
128: 7f e8 03 a6  mtlr r31
12c: 83 e1 00 68  lwz r31,104(r1)
130: 38 21 00 70  addi r1,r1,112
134: 4e 80 00 20  blr
138: 7c 64 1b 78  mr r4,r3
13c: 3c 60 00 00  lis r3,0
140: 60 63 f0 05  ori r3,r3,61445
144: 44 00 00 22  sc 1
148: 4e 80 00 20  blr

so I think it's the sc 1 at 0x144)

The error is coming from here:

https://git.qemu.org/?p=qemu.git;a=blob;f=target/ppc/excp_helper.c;h=fd147e2a37662456d30f7ab74b23bfb036260ced;hb=HEAD#l830

What does this mean? What would a real CPU do with this, and where could it be caught to use as a hypercall method on CPUs without HV? Or what else should we do if we wanted this to work with KVM PR too in the future?

Regards,
BALATON Zoltan

>>>> For now I'm
>>>> going with vhyp which is enough for testing with TCG and if somebody wants
>>>> KVM they could use the original firmware for now so this could be improved in
>>>> a later version unless a simple solution is found before the freeze for 6.1.
>>>> If we're in KVM PR what happens for sc 1 could that be used too so maybe
>>>> what we have now could work?
>>>
>>> Note that if you do go down the MOL path it wouldn't be that complex
>>> to make a "vMOL" interface so you can use the same mechanism for KVM
>>> and TCG.
>>
>> Not sure what you mean by VMOL. Is it modifying MOL to use sc 1 like VOF
>> instead of its OSI way for hypercalls?
>
> No, I mean on the qemu side adding an optional hook which will
> intercept sc 0 instructions with the MOL magic register values and
> redirect them to a machine registered callback, rather than emulating
> the CPU's behaviour of jumping to the system call vector in guest
> space.
>
> Basically an equivalent of vhyp, but for MOL magic syscalls, instead
> of hypercalls.
> >> That would lose the advantage of >> being able to reuse MOL guest drivers without modification (which might be >> useful for running OS X guest on Mac emulation) so if we can't use vhyp then >> maybe using OSI would be the next choice for this reason but for now vhyp >> seems to be working for what I could test so unless somebody here sees a >> problem with it and has a better idea I'm going with vhyp for now just >> because that's what VOF uses and I don't want to modify VOF to reuse it as >> it is so I don't need to maintain a separate version and also get any >> enhancements without further need to sync with spapr VOF. >> >> I've found this document about possible hypercall interfaces on KVM (see >> Hypercall ABIs at the end): >> >> https://www.kernel.org/doc/html/latest/virt/kvm/ppc-pv.html >> >> Having both ePAPR (1.) and PAPR (2.) hypercalls is a bit confusing. Does >> vhyp correspond to 2. PAPR? > > Yes. > >> The ePAPR (1.) seems to be preferred by KVM and >> MOL OSI supported for compatibility. > > That document looks pretty out of date. Most of it is only discussing > KVM PR, which is now barely maintained. KVM HV only works with PAPR > hypercalls. > >> So if we need something else instead of >> 2. PAPR hypercalls there seems to be two options: ePAPR and MOL OSI which >> should work with KVM but then I'm not sure how to handle those on TCG.
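For reference, the difference between the plain 'sc' (a.k.a. 'sc 0') and the 'sc 1' at 0x144 in the vof.bin listing is a single field in the instruction word. The sketch below decodes that field from a raw 32-bit instruction; the helper names insn_is_sc and insn_sc_lev are hypothetical and are not QEMU functions, and the decoder deliberately ignores the reserved bits of the encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the word is an 'sc' instruction,
 * i.e. primary opcode 17 with the 0x2 bit set.
 */
bool insn_is_sc(uint32_t insn)
{
    return (insn >> 26) == 17 && (insn & 0x2) != 0;
}

/*
 * Hypothetical helper: extract the LEV field of an 'sc' instruction.
 * LEV 0 is an ordinary system call, LEV 1 is the hypercall form.
 */
unsigned insn_sc_lev(uint32_t insn)
{
    return (insn >> 5) & 0x7f;
}
```

With these helpers the word 44 00 00 22 from the listing decodes as an sc with LEV 1, whereas 44 00 00 02 would be the ordinary system call that vhyp leaves alone.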
On Fri, Jun 04, 2021 at 03:27:12PM +0200, BALATON Zoltan wrote: > On Fri, 4 Jun 2021, David Gibson wrote: > > On Tue, Jun 01, 2021 at 04:12:44PM +0200, BALATON Zoltan wrote: > > > On Tue, 1 Jun 2021, Alexey Kardashevskiy wrote: > > > > On 31/05/2021 23:07, BALATON Zoltan wrote: > > > > > On Sun, 30 May 2021, BALATON Zoltan wrote: > > > > > > On Thu, 20 May 2021, Alexey Kardashevskiy wrote: > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c > > > > > > > new file mode 100644 > > > > > > > index 000000000000..a283b7d251a7 > > > > > > > --- /dev/null > > > > > > > +++ b/hw/ppc/vof.c > > > > > > > @@ -0,0 +1,1021 @@ > > > > > > > +/* > > > > > > > + * QEMU PowerPC Virtual Open Firmware. > > > > > > > + * > > > > > > > + * This implements client interface from OpenFirmware > > > > > > > IEEE1275 on the QEMU > > > > > > > + * side to leave only a very basic firmware in the VM. > > > > > > > + * > > > > > > > + * Copyright (c) 2021 IBM Corporation. > > > > > > > + * > > > > > > > + * SPDX-License-Identifier: GPL-2.0-or-later > > > > > > > + */ > > > > > > > + > > > > > > > +#include "qemu/osdep.h" > > > > > > > +#include "qemu-common.h" > > > > > > > +#include "qemu/timer.h" > > > > > > > +#include "qemu/range.h" > > > > > > > +#include "qemu/units.h" > > > > > > > +#include "qapi/error.h" > > > > > > > +#include <sys/ioctl.h> > > > > > > > +#include "exec/ram_addr.h" > > > > > > > +#include "exec/address-spaces.h" > > > > > > > +#include "hw/ppc/vof.h" > > > > > > > +#include "hw/ppc/fdt.h" > > > > > > > +#include "sysemu/runstate.h" > > > > > > > +#include "qom/qom-qobject.h" > > > > > > > +#include "trace.h" > > > > > > > + > > > > > > > +#include <libfdt.h> > > > > > > > + > > > > > > > +/* > > > > > > > + * OF 1275 "nextprop" description suggests is it 32 bytes max but > > > > > > > + * LoPAPR defines "ibm,query-interrupt-source-number" which > > > > > > > is 33 chars long. 
> > > > > > > + */ > > > > > > > +#define OF_PROPNAME_LEN_MAX 64 > > > > > > > + > > > > > > > +#define VOF_MAX_PATH 256 > > > > > > > +#define VOF_MAX_SETPROPLEN 2048 > > > > > > > +#define VOF_MAX_METHODLEN 256 > > > > > > > +#define VOF_MAX_FORTHCODE 256 > > > > > > > +#define VOF_VTY_BUF_SIZE 256 > > > > > > > + > > > > > > > +typedef struct { > > > > > > > + uint64_t start; > > > > > > > + uint64_t size; > > > > > > > +} OfClaimed; > > > > > > > + > > > > > > > +typedef struct { > > > > > > > + char *path; /* the path used to open the instance */ > > > > > > > + uint32_t phandle; > > > > > > > +} OfInstance; > > > > > > > + > > > > > > > +#define VOF_MEM_READ(pa, buf, size) \ > > > > > > > + address_space_read_full(&address_space_memory, \ > > > > > > > + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) > > > > > > > +#define VOF_MEM_WRITE(pa, buf, size) \ > > > > > > > + address_space_write(&address_space_memory, \ > > > > > > > + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) > > > > > > > + > > > > > > > +static int readstr(hwaddr pa, char *buf, int size) > > > > > > > +{ > > > > > > > + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { > > > > > > > + return -1; > > > > > > > + } > > > > > > > + if (strnlen(buf, size) == size) { > > > > > > > + buf[size - 1] = '\0'; > > > > > > > + trace_vof_error_str_truncated(buf, size); > > > > > > > + return -1; > > > > > > > + } > > > > > > > + return 0; > > > > > > > +} > > > > > > > + > > > > > > > +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, > > > > > > > + const char *s1, unsigned nargscheck, > > > > > > > unsigned nretcheck) > > > > > > > +{ > > > > > > > + if (strcmp(s, s1)) { > > > > > > > + return false; > > > > > > > + } > > > > > > > + if ((nargscheck && (nargs != nargscheck)) || > > > > > > > + (nretcheck && (nret != nretcheck))) { > > > > > > > + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); > > > > > > > + return false; > > > > > > > + } > > > > > > > + > > > > > > > + 
return true; > > > > > > > +} > > > > > > > + > > > > > > > +static void prop_format(char *tval, int tlen, const void *prop, int len) > > > > > > > +{ > > > > > > > + int i; > > > > > > > + const unsigned char *c; > > > > > > > + char *t; > > > > > > > + const char bin[] = "..."; > > > > > > > + > > > > > > > + for (i = 0, c = prop; i < len; ++i, ++c) { > > > > > > > + if (*c == '\0' && i == len - 1) { > > > > > > > + strncpy(tval, prop, tlen - 1); > > > > > > > + return; > > > > > > > + } > > > > > > > + if (*c < 0x20 || *c >= 0x80) { > > > > > > > + break; > > > > > > > + } > > > > > > > + } > > > > > > > + > > > > > > > + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { > > > > > > > + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) { > > > > > > > + strcpy(t, bin); > > > > > > > + return; > > > > > > > + } > > > > > > > + if (i && i % 4 == 0 && i != len - 1) { > > > > > > > + strcat(t, " "); > > > > > > > + ++t; > > > > > > > + } > > > > > > > + t += sprintf(t, "%02X", *c & 0xFF); > > > > > > > + } > > > > > > > +} > > > > > > > + > > > > > > > +static int get_path(const void *fdt, int offset, char *buf, int len) > > > > > > > +{ > > > > > > > + int ret; > > > > > > > + > > > > > > > + ret = fdt_get_path(fdt, offset, buf, len - 1); > > > > > > > + if (ret < 0) { > > > > > > > + return ret; > > > > > > > + } > > > > > > > + > > > > > > > + buf[len - 1] = '\0'; > > > > > > > + > > > > > > > + return strlen(buf) + 1; > > > > > > > +} > > > > > > > + > > > > > > > +static int phandle_to_path(const void *fdt, uint32_t ph, > > > > > > > char *buf, int len) > > > > > > > +{ > > > > > > > + int ret; > > > > > > > + > > > > > > > + ret = fdt_node_offset_by_phandle(fdt, ph); > > > > > > > + if (ret < 0) { > > > > > > > + return ret; > > > > > > > + } > > > > > > > + > > > > > > > + return get_path(fdt, ret, buf, len); > > > > > > > +} > > > > > > > + > > > > > > > +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) > > > > > > > +{ > > > > > > > + 
char fullnode[VOF_MAX_PATH]; > > > > > > > + uint32_t ret = -1; > > > > > > > + int offset; > > > > > > > + > > > > > > > + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { > > > > > > > + return (uint32_t) ret; > > > > > > > + } > > > > > > > + > > > > > > > + offset = fdt_path_offset(fdt, fullnode); > > > > > > > + if (offset >= 0) { > > > > > > > + ret = fdt_get_phandle(fdt, offset); > > > > > > > + } > > > > > > > + trace_vof_finddevice(fullnode, ret); > > > > > > > + return (uint32_t) ret; > > > > > > > +} > > > > > > > > > > > > The Linux init function that runs on pegasos2 here: > > > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/powerpc/kernel/prom_init.c?h=v4.14.234#n2658 > > > > > > > > > > > > calls finddevice once with isa@c and next with isa@C (small and > > > > > > capital C) both of which works with the board firmware but with > > > > > > vof the comparison is case sensitive and one of these fails so I > > > > > > can't make it work. I don't know if this is a problem in libfdt > > > > > > or the vof_finddevice above should do something else to get case > > > > > > insensitive comparison. > > > > > > > > > > This fixes the issue with Linux but I'm not sure if there's any > > > > > better solution or would it break anything else. > > > > > > > > The bit after "@" is an address and needs to be case insensitive and > > > > I'll fix this indeed. I'm not so sure about the part before "@", I > > > > cannot imagine what could break if I made search insensitive to case. Hm > > > > :-/ > > > > > > Fixing the match in the address part is probably enough as the name sent by > > > guests is probably always lower case > > > > I'm confused, I thought you just said that it looked for both isa@c > > and isa@C, which seems to contradict guests always using lower case. > > I mean the part before the @ sign (that is the name part, "isa" above) is > always lower case. 
I haven't seen guests trying to query that with other > than lower case Ah, I see. Yes, I think you can count on that, because I believe even in traditional OF the part before the @ *is* case-sensitive. At least there are certainly conventions about how the vendor is capitalized, so I assume it is. > but the part after @ can be different even in the same guest > code just a few lines apart as in the Linux kernel. So fixing the comparison > to e.g. do toupper in the address part after @ should work I think even if > we continue to do case sensitive comparison in the name part. Alexey said > he'll fix that so there's no problem. Yeah, that will probably work fine in practice. It's not technically correct in all cases, because how you're supposed to do the comparison depends on the bus type.
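As a rough illustration of the comparison discussed here, the following sketch matches a single path component, treating the name before the '@' as case sensitive and the unit address after it as case insensitive. The function name node_name_eq is hypothetical and is not part of VOF or libfdt. It ignores the bus-specific comparison rules mentioned above and simply requires that either both components carry a unit address or neither does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: compare two node names such as "isa@c" and "isa@C".
 * The part before '@' must match exactly, the part after '@' is compared
 * ignoring case.
 */
bool node_name_eq(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    const char *at_a = strchr(a, '@');
    const char *at_b = strchr(b, '@');
    size_t name_a = at_a ? (size_t)(at_a - a) : strlen(a);
    size_t name_b = at_b ? (size_t)(at_b - b) : strlen(b);

    /* Name part: case-sensitive comparison. */
    if (name_a != name_b || memcmp(a, b, name_a) != 0) {
        return false;
    }

    /* Either both have a unit address or neither does. */
    if (!at_a || !at_b) {
        return !at_a && !at_b;
    }

    /* Unit address part: case-insensitive comparison. */
    const char *ua = at_a + 1;
    const char *ub = at_b + 1;
    while (*ua && *ub) {
        if (tolower((unsigned char)*ua) != tolower((unsigned char)*ub)) {
            return false;
        }
        ua++;
        ub++;
    }
    return *ua == *ub;
}
```

A full path lookup would apply this per component while walking the tree. Whether that belongs in vof_finddevice or elsewhere is left open here.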
On Fri, Jun 04, 2021 at 03:50:28PM +0200, BALATON Zoltan wrote:
> On Fri, 4 Jun 2021, David Gibson wrote:
> > On Sun, May 30, 2021 at 07:33:01PM +0200, BALATON Zoltan wrote:

[snip]

> > > MorphOS checks the name property of the root node ("/") to decide what
> > > platform it runs on so we may need to be able to set this property on /
> > > where it should return "bplan,Pegasos2", therefore the above maybe should do
> > > getprop first and only generate name property if it's not set (or at least
> > > check if we're on the root node and allow setting name property there). (On
> > > Macs the root node is named "device-tree" and this was before found to be
> > > needed for MorphOS.)
> >
> > Ah. Hrm. Have to think about what to do about that.
>
> This is easy to fix, this seems to allow setting a name property or return a
> default:
>
>
> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> index b47bbd509d..746842593e 100644
> --- a/hw/ppc/vof.c
> +++ b/hw/ppc/vof.c
> @@ -163,14 +163,14 @@ static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr)
>  static const void *getprop(const void *fdt, int nodeoff, const char *propname,
>                             int *proplen, bool *write0)
>  {
> -    const char *unit, *prop;
> +    const char *unit, *prop = fdt_getprop(fdt, nodeoff, propname, proplen);
>
>      /*
>       * The "name" property is not actually stored as a property in the FDT,
>       * we emulate it by returning a pointer to the node's name and adjust
>       * proplen to include only the name but not the unit.
>       */
> -    if (strcmp(propname, "name") == 0) {
> +    if (!prop && strcmp(propname, "name") == 0) {
>          prop = fdt_get_name(fdt, nodeoff, proplen);
>          if (!prop) {
>              *proplen = 0;
> @@ -196,7 +196,7 @@ static const void *getprop(const void *fdt, int nodeoff, const char *propname,
>      if (write0) {
>          *write0 = false;
>      }
> -    return fdt_getprop(fdt, nodeoff, propname, proplen);
> +    return prop;
>  }

Kind of a hack, but it'll do for now.
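The lookup order in that patch (an explicitly set "name" property wins, otherwise the node name without its unit address is used) can be sketched on its own, without libfdt, roughly as follows. The function effective_name and its parameters are hypothetical and exist only for this illustration; in particular, passing NULL for "no explicit property" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: pick the value to report for "name".
 * explicit_name is the stored property value or NULL if none is set,
 * node_name is the node's own name, possibly with "@unit" appended.
 * *len receives the length without any terminating NUL. In the fallback
 * case the returned pointer is NOT terminated at *len, so the caller has
 * to add the terminator itself.
 */
const char *effective_name(const char *explicit_name, const char *node_name,
                           size_t *len)
{
    assert(node_name != NULL && len != NULL);

    if (explicit_name) {
        *len = strlen(explicit_name);
        return explicit_name;
    }

    /* Fall back to the node name and cut it off before the unit address. */
    const char *unit = strchr(node_name, '@');
    *len = unit ? (size_t)(unit - node_name) : strlen(node_name);
    return node_name;
}
```

For a node called "isa@c" without a stored property this reports the three bytes "isa", while a root node with "name" set to "bplan,Pegasos2" reports that string unchanged.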
> This allows adding a name property to "/" different from the default but
> this does not yet fix MorphOS booting with VOF on pegasos2. I think it tries
> to query name on / and check if it's called "device-tree" in which case it
> assumes Mac hardware otherwise it goes with pegasos2 so even if we return
> nothing for name it would not matter in this case as we don't use VOF on
> Mac. If we wanted that then this would become a problem so it could be fixed
> now in advance just in case other guests may need it.
>
> > > Other than the above two problems, I've found that getting the device tree
> > > from vof returns it in reverse order compared to the board firmware if I add
> > > it in the expected order. This may or may not be a problem but to avoid it I
> > > can build the tree in reverse order then it comes out right so unless
> > > there's an easy fix this should not cause a problem but may be worth a comment
> > > somewhere.
> >
> > The order of things in the device tree *should* never matter. If it
> > does, that's definitely a client bug... but of course that doesn't
> > necessarily mean we won't have to work around it in practice.
>
> I don't know if it matters or not but having the device tree in the same
> order as the firmware ROM helps with comparing it for debugging but I've
> found I can solve this by building the tree in reverse order so no changes
> to VOF are needed for this, just thought adding a comment somewhere may
> clarify it but it's not really a problem.
>
> I still don't know what MorphOS is missing, I've tried adding almost all
> missing properties, checked what hardware is initialized by the firmware and tried
> to do the same in board reset code and even after that MorphOS still takes a
> different route with VOF and crashes but boots with the board firmware. I'm
> now thinking it may be either different memory organisation or the missing
> name properties that are not returned by nextprop in VOF so they are only
> appearing when explicitly queried whereas with the board firmware they are
> present as properties. With the above patch I could explicitly set it on
> nodes and test if that makes a difference.
>
> I got to this because adding more missing props or initializing more devices did not
> make a difference so I'm guessing it may be something else then and the only
> difference I can see compared to board firmware are the different memory
> ranges in claimed (VOF puts itself to 0 for example); and the missing name
> and additional phandle props in the device tree. MorphOS copies the whole
> device tree on startup then later it uses this copy of the device tree after
> shutting down OF with quiesce. I can imagine it may use some name props like
> that on the cpu node without checking assuming it's always there and if
> we're missing that it may cause a NULL dereference. I have no better idea
> what else could be missing so I'll test this next. If it helps I can try to
> come up with a patch to VOF to return these name props or allow setting them
> as above.
>
> Regards,
> BALATON Zoltan
>
On Fri, Jun 04, 2021 at 03:59:22PM +0200, BALATON Zoltan wrote: > On Fri, 4 Jun 2021, David Gibson wrote: > > On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote: > > > On Wed, 2 Jun 2021, David Gibson wrote: > > > > On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: > > > > > On Thu, 27 May 2021, David Gibson wrote: > > > > > > On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: > > > > > > > On Tue, 25 May 2021, David Gibson wrote: > > > > > > > > On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: > > > > > > > > > On Mon, 24 May 2021, David Gibson wrote: > > > > > > > > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > > > > > > > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > > > > > > > > > On Sun, 23 May 2021, Alexey Kardashevskiy wrote: > > > > > > > > > > > > > One thing to note about PCI is that normally I think the client > > > > > > > > > > > > > expects the firmware to do PCI probing and SLOF does it. But VOF > > > > > > > > > > > > > does not and Linux scans PCI bus(es) itself. Might be a problem for > > > > > > > > > > > > > you kernel. > > > > > > > > > > > > > > > > > > > > > > > > I'm not sure what info does MorphOS get from the device tree and what it > > > > > > > > > > > > probes itself but I think it may at least need device ids and info about > > > > > > > > > > > > the PCI bus to be able to access the config regs, after that it should > > > > > > > > > > > > set the devices up hopefully. I could add these from the board code to > > > > > > > > > > > > device tree so VOF does not need to do anything about it. However I'm > > > > > > > > > > > > not getting to that point yet because it crashes on something that it's > > > > > > > > > > > > missing and couldn't yet find out what is that. 
> > > > > > > > > > > > > > > > > > > > > > > > I'd like to get Linux working now as that would be enough to test this > > > > > > > > > > > > and then if for MorphOS we still need a ROM it's not a problem if at > > > > > > > > > > > > least we can boot Linux without the original firmware. But I can't make > > > > > > > > > > > > Linux open a serial console and I don't know what it needs for that. Do > > > > > > > > > > > > you happen to know? I've looked at the sources in Linux/arch/powerpc but > > > > > > > > > > > > not sure how it would find and open a serial port on pegasos2. It seems > > > > > > > > > > > > to work with the board firmware and now I can get it to boot with VOF > > > > > > > > > > > > but then it does not open serial so it probably needs something in the > > > > > > > > > > > > device tree or expects the firmware to set something up that we should > > > > > > > > > > > > add in pegasos2.c when using VOF. > > > > > > > > > > > > > > > > > > > > > > I've now found that Linux uses rtas methods read-pci-config and > > > > > > > > > > > write-pci-config for PCI access on pegasos2 so this means that we'll > > > > > > > > > > > probably need rtas too (I hoped we could get away without it if it were only > > > > > > > > > > > used for shutdown/reboot or so but seems Linux needs it for PCI as well and > > > > > > > > > > > does not scan the bus and won't find some devices without it). > > > > > > > > > > > > > > > > > > > > Yes, definitely sounds like you'll need an RTAS implementation. > > > > > > > > > > > > > > > > > > I plan to fix that after managed to get serial working as that seems to not > > > > > > > > > need it. If I delete the rtas-size property from /rtas on the original > > > > > > > > > firmware that makes Linux skip instantiating rtas, but I still get serial > > > > > > > > > output just not accessing PCI devices. So I think it should work and keeps > > > > > > > > > things simpler at first. Then I'll try rtas later. 
> > > > > > > > > > > > > > > > > > > > While VOF can do rtas, this causes a problem with the hypercall method using > > > > > > > > > > > sc 1 that goes through vhyp but trips the assert in ppc_store_sdr1() so > > > > > > > > > > > cannot work after guest is past quiesce. > > > > > > > > > > > > > > > > > > > > > So the question is why is that > > > > > > > > > > > assert there > > > > > > > > > > > > > > > > > > > > Ah.. right. So, vhyp was designed for the PAPR use case, where we > > > > > > > > > > want to model the CPU when it's in supervisor and user mode, but not > > > > > > > > > > when it's in hypervisor mode. We want qemu to mimic the behaviour of > > > > > > > > > > the hypervisor, rather than attempting to actually execute hypervisor > > > > > > > > > > code in the virtual CPU. > > > > > > > > > > > > > > > > > > > > On systems that have a hypervisor mode, SDR1 is hypervisor privileged, > > > > > > > > > > so it makes no sense for the guest to attempt to set it. That should > > > > > > > > > > be caught by the general SPR code and turned into a 0x700, hence the > > > > > > > > > > assert() if we somehow reach ppc_store_sdr1(). > > > > > > > > > > > > > > > > > > > > So, we are seeing a problem here because you want the 'sc 1' > > > > > > > > > > interception of vhyp, but not the rest of the stuff that goes with it. > > > > > > > > > > > > > > > > > > > > > and would using sc 1 for hypercalls on pegasos2 cause other > > > > > > > > > > > problems later even if the assert could be removed? > > > > > > > > > > > > > > > > > > > > At least in the short term, I think you probably can remove the > > > > > > > > > > assert. In your case the 'sc 1' calls aren't truly to a hypervisor, > > > > > > > > > > but a special case escape to qemu for the firmware emulation. I think > > > > > > > > > > it's unlikely to cause problems later, because nothing on a 32-bit > > > > > > > > > > system should be attempting an 'sc 1'. 
The only thing I can think of > > > > > > > > > > that would fail is some test case which explicitly verified that 'sc > > > > > > > > > > 1' triggered a 0x700 (SIGILL from userspace). > > > > > > > > > > > > > > > > > > OK so the assert should check if the CPU has an HV bit. I think there was a > > > > > > > > > #detine for that somewhere that I can add to the assert then I can try that. > > > > > > > > > What I wasn't sure about is that sc 1 would conflict with the guest's usage > > > > > > > > > of normal sc calls or are these going through different paths and only sc 1 > > > > > > > > > will trigger vhyp callback not affecting notmal sc calls? > > > > > > > > > > > > > > > > The vhyp shouldn't affect normal system calls, 'sc 1' is specifically > > > > > > > > for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the > > > > > > > > vhyp only intercepts the hypercall version (after all Linux on PAPR > > > > > > > > certainly uses its own system calls, and hypercalls are active for the > > > > > > > > lifetime of the guest there). > > > > > > > > > > > > > > > > > (Or if this causes > > > > > > > > > an otherwise unnecessary VM exit on KVM even when it works then maybe > > > > > > > > > looking for a different way in the future might be needed. > > > > > > > > > > > > > > > > What you're doing here won't work with KVM as it stands. There are > > > > > > > > basically two paths into the vhyp hypercall path: 1) from TCG, if we > > > > > > > > interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get > > > > > > > > a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. > > > > > > > > > > > > > > > > The second path is specific to the PAPR (ppc64) implementation of KVM, > > > > > > > > and will not work for a non-PAPR platform without substantial > > > > > > > > modification of the KVM code. 
> > > > > > > > > > > > > > OK so then at that point when we try KVM we'll need to look at alternative > > > > > > > ways, I think MOL OSI worked with KVM at least in MOL but will probably make > > > > > > > all syscalls exit KVM but since we'll probably need to use KVM PR it will > > > > > > > exit anyway. For now I keep this vhyp as it does not run with KVM for other > > > > > > > reasons yet so that's another area to clean up so as a proof of concept > > > > > > > first version of using VOF vhyp will do. > > > > > > > > > > > > Eh, since you'll need to modify KVM anyway, it probably makes just as > > > > > > much sense to modify it to catch the 'sc 1' as MoL's magic thingy. > > > > > > > > > > I'm not sure how KVM works for this case so I also don't know why and what > > > > > would need to be modified. I think we'll only have KVM PR working as newer > > > > > POWER CPUs having HV (besides being rare among potential users) are probably > > > > > too different to run the OSes that expect at most a G4 on pegasos2 so likely > > > > > it won't work with KVM HV. > > > > > > > > Oh, it definitely won't work with KVM HV. > > > > > > > > > If we have KVM PR doesn't sc already trap so we > > > > > could add MOL OSI without further modification to KVM itself only needing > > > > > change in QEMU? > > > > > > > > Uh... I guess so? > > > > > > > > > I also hope that MOL OSI could be useful for porting some > > > > > paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't > > > > > know about that for sure so I'm open to any other solution too. > > > > > > > > Maybe. I never know much about MOL to begin with, and anything I did > > > > know was a decade or more ago so I've probably forgotten. > > > > > > That may still be more than what I know about it since I never had any > > > knowledge about PPC KVM and don't have any PPC hardware to test with so I'm > > > mostly guessing. 
> > > (I could test with KVM emulated in QEMU and I did set up an
> > > environment for that but that's a bit slow and inconvenient so I'd leave KVM
> > > support to those interested and have more knowledge and hardware for it.)
> >
> > Sounds like a problem for someone else another time, then.
> >
> > > > > For now I'm
> > > > > going with vhyp which is enough for testing with TCG and if somebody wants
> > > > > KVM they could use the original firmware for now so this could be improved in
> > > > > a later version unless a simple solution is found before the freeze for 6.1.
> > > > > If we're in KVM PR what happens for sc 1 could that be used too so maybe
> > > > > what we have now could work?
> > > >
> > > > Note that if you do go down the MOL path it wouldn't be that complex
> > > > to make a "vMOL" interface so you can use the same mechanism for KVM
> > > > and TCG.
> > >
> > > Not sure what you mean by VMOL. Is it modifying MOL to use sc 1 like VOF
> > > instead of its OSI way for hypercalls?
> >
> > No, I mean on the qemu side adding an optional hook which will
> > intercept sc 0 instructions with the MOL magic register values and
> > redirect them to a machine registered callback, rather than emulating
> > the CPU's behaviour of jumping to the system call vector in guest
> > space.
> >
> > Basically an equivalent of vhyp, but for MOL magic syscalls, instead
> > of hypercalls.
>
> OK, that's basically what BenH's OSI patch I've linked to before did I
> think,

Ok, but probably cleaned up to more modern qemu approaches.

> it may just need updating for changes in target/ppc since that patch
> was created. However that would also mean we'd need another version of VOF
> that uses this instead of sc 1 then so unless we need that I'd keep a single
> VOF that works for both spapr and pegasos2.

Yeah, fair enough.
> > > That would lose the advantage of > > > being able to reuse MOL guest drivers without modification (which might be > > > useful for running OS X guest on Mac emulation) so if we can't use vhyp then > > > maybe using OSI would be the next choice for this reason but for now vhyp > > > seems to be working for what I could test so unless somebody here sees a > > > problem with it and has a better idea I'm going with vhyp for now just > > > because that's what VOF uses and I don't want to modify VOF to reuse it as > > > it is so I don't need to maintain a separate version and also get any > > > enhancements without further need to sync with spapr VOF. > > > > > > I've found this document about possible hypercall interfaces on KVM (see > > > Hypercall ABIs at the end): > > > > > > https://www.kernel.org/doc/html/latest/virt/kvm/ppc-pv.html > > > > > > Having both ePAPR (1.) and PAPR (2.) hypercalls is a bit confusing. Does > > > vhyp correspond to 2. PAPR? > > > > Yes. > > What's ePAPR then and how is it different from PAPR? I mean the acronym not > the hypercall method, the latter is explained in that doc but what ePAPR > stands for and why is that method called like that is not clear to me. Ok, history lesson time. For a long time PAPR has been the document that described the OS environment for IBM POWER based server hardware. Before it was called PAPR (POWER Architecture Platform Requirements) it was called the "RPA" (Requirements for the POWER Architecture, I think?). You might see the old name in a few places. Requiring a full Open Firmware and a bunch of other fairly heavyweight stuff, PAPR really wasn't suitable for embedded ppc chips and boards. The situation with those used to be a complete mess with basically every board variant having it's own different firmware with its own different way of presenting some fragments of vital data to the OS. 
ePAPR - Embedded Power Architecture Platform Requirements - was
created as a standard to try to unify how this stuff was handled on
embedded ppc chips. I was one of the authors on early versions of
it. It's mostly based around giving the OS a flattened device tree,
with some deliberately minimal requirements on firmware initialization
and entry state. Here's a link to one of those early versions:

http://elinux.org/images/c/cf/Power_ePAPR_APPROVED_v1.1.pdf

I thought there were later versions, but I couldn't seem to find any.
It's possible the process of refining later versions just petered out
as the embedded ppc world mostly died and the flattened device tree
development mostly moved to ARM.

Since some of the embedded chips from Freescale had hypervisor
capabilities, a hypercall model was added to ePAPR - but that wasn't
something I was greatly involved in, so I don't know much about it.

ePAPR is the reason that the original PAPR is sometimes referred to as
"sPAPR" to disambiguate.

> > > The ePAPR (1.) seems to be preferred by KVM and
> > > MOL OSI supported for compatibility.
> >
> > That document looks pretty out of date. Most of it is only discussing
> > KVM PR, which is now barely maintained. KVM HV only works with PAPR
> > hypercalls.
>
> The link says it's the latest kernel docs, so maybe an update needs to be sent
> to KVM?

I guess, but the chances of me finding time to do it are approximately
zero.

> > > So if we need something else instead of
> > > 2. PAPR hypercalls there seem to be two options: ePAPR and MOL OSI which
> > > should work with KVM but then I'm not sure how to handle those on TCG.
> > >
> > > > > > > [...]
> > > > > > > > > > > I've tested that the missing rtas is not the reason for getting no output > > > > > > > > > > > via serial though, as even when disabling rtas on pegasos2.rom it boots and > > > > > > > > > > > I still get serial output just some PCI devices are not detected (such as > > > > > > > > > > > USB, the video card and the not emulated ethernet port but these are not > > > > > > > > > > > fatal so it might even work as a first try without rtas, just to boot a > > > > > > > > > > > Linux kernel for testing it would be enough if I can fix the serial output). > > > > > > > > > > > I still don't know why it's not finding serial but I think it may be some > > > > > > > > > > > missing or wrong info in the device tree I generat. I'll try to focus on > > > > > > > > > > > this for now and leave the above rtas question for later. > > > > > > > > > > > > > > > > > > > > Oh.. another thought on that. You have an ISA serial port on Pegasos, > > > > > > > > > > I believe. I wonder if the PCI->ISA bridge needs some configuration / > > > > > > > > > > initialization that the firmware is expected to do. If so you'll need > > > > > > > > > > to mimic that setup in qemu for the VOF case. > > > > > > > > > > > > > > > > > > That's what I begin to think because I've added everything to the device > > > > > > > > > tree that I thought could be needed and I still don't get it working so it > > > > > > > > > may need some config from the firmware. But how do I access device registers > > > > > > > > > from board code? I've tried adding a machine reset method and write to > > > > > > > > > memory mapped device registers but all my attempts failed. I've tried > > > > > > > > > cpu_stl_le_data and even memory_region_dispatch_write but these did not get > > > > > > > > > to the device. What's the way to access guest mmio regs from QEMU? 
> > > > > > > > > > > > > > > > That's odd, cpu_stl() and memory_region_dispatch_write() should work > > > > > > > > from board code (after the relevant memory regions are configured, of > > > > > > > > course). As an ISA serial port, it's probably accessed through IO > > > > > > > > space, not memory space though, so you'd need &address_space_io. And > > > > > > > > if there is some bridge configuration then it's the bridge control > > > > > > > > registers you need to look at not the serial registers - you'd have to > > > > > > > > look at the bridge documentation for that. Or, I guess the bridge > > > > > > > > implementation in qemu, which you wrote part of. > > > > > > > > > > > > > > I've found at last that stl_le_phys() works. There are so many of these that > > > > > > > I never know when to use which. > > > > > > > > > > > > > > I think the address_space_rw calls in vof_client_call() in vof.c could also > > > > > > > use these for somewhat shorter code. I've ended up with > > > > > > > stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset methodbut I don't > > > > > > > even need that now as it works without additional setup. Also VOF's memory > > > > > > > access is basically the same as the already existing rtas_st() and co. so > > > > > > > maybe that could be reused to make code smaller? > > > > > > > > > > > > rtas_ld() and rtas_st() should only be used for reading/writing RTAS > > > > > > parameters to and from memory. Accessing IO shouldn't be done with > > > > > > those. > > > > > > > > > > > > For IO you probably want the cpu_st*() variants in most cases, since > > > > > > you're trying to emulate an IO access from the virtual cpu. > > > > > > > > > > I think I've tried that but what worked to access mmio device registers are > > > > > stl_le_phys and similar that are wrappers around address_space_stl_*. 
But I > > > > > did not mean that for rtas_ld/_st but the part when vof accessing the > > > > > parameters passed by its hypercall which is memory access: > > > > > > > > > > https://github.com/patchew-project/qemu/blob/patchew/20210520090557.435689-1-aik%40ozlabs.ru/hw/ppc/vof.c > > > > > > > > > > line 893, and vof_client_call before that is very similar to what h_rtas > > > > > does here: > > > > > > > > > > https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_hcall.c;h=f25014afda408002ee1ec1027a0dd7a6025eca61;hb=HEAD#l639 > > > > > > > > > > and I also need to do the same for rtas in pegasos2 for which I'm just using > > > > > ldl_be_phys for now but I wonder if we really need 3 ways to do the same or > > > > > the rtas_ld/_st could be made more generic and reused here? > > > > > > > > For your rtas implementation you could definitely re-use them. For > > > > the client call I'm a bit less confident, but if the in-guest-memory > > > > structures are really the same, then it would make sense. > > > > > > The memory structure seems very similar to me, the only difference is > > > calling the first field service in VOF instead of token in RTAS. Both are > > > just an array of big endian unit32_t with token, nargs, nret at the front > > > followed by args and rets. Since these rtas_ld/st are defined in spapr.h I > > > did not bother to split them off, so for pegasos2 rtas I'm just using the > > > ldl_be_* functions directly for which these are a shorthand for. If these > > > were split off for sharing between spapr rtas and VOF I may be able to reuse > > > them as well but it's not that important so just mentioned it as a possible > > > later clean up. > > > > Ok, sounds reasonable to re-use them then, though maybe add an aliased > > name for clarity ofci_{ld,st}(), maybe? (for "Open Firmware Client > > Interface") > > I'll wait for what Alexey decides to do in the next VOF patch version and if > I can reuse that (I could if these were defined in vof.h). 
> I don't want to come up with yet another abstraction over ldl_be_*, which
> does not seem to make it any clearer than using the actual functions for
> guest memory access, which is what we're doing while getting the hypercall
> args. So I think either using ldl_be_* directly or reusing the already
> existing rtas_ld/_st would make sense, but adding similar funcs with another
> name just makes it more confusing.

Well, the point of the rtas_ld() functions isn't to be a different way
of accessing memory. It's just a convenience wrapper that takes an
RTAS args array and an argument index and does the right thing to
retrieve it for you. So, in your RTAS function implementation, when
you want to get argument 0, you just go rtas_ld(args, 0) - more
readable than having a bunch of offset calculations and a long-winded
call to the BE memory access function. You can look at the examples in
hw/ppc/spapr_rtas.c to see how it's used.

Actually, looking again at how it works, you should probably only use
rtas_ld() if your general dispatch code has pre-parsed the args
structure into separate args and rets arrays, again as we do in
spapr_rtas.c.
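
To make the "pre-parse, then index" idea concrete, here is a minimal standalone sketch of the layout discussed earlier in the thread (big-endian 32-bit cells: token, nargs, nret, then the args, then the rets). It is only an illustration under assumptions: guest memory is modelled as a plain byte buffer, and guest_ldl_be(), parse_call(), call_ld() and struct parsed_call are made-up names, not the rtas_ld() from spapr.h nor anything in vof.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a big-endian 32-bit guest memory load;
 * "guest memory" here is just a byte buffer. */
static uint32_t guest_ldl_be(const uint8_t *mem, size_t addr)
{
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | (uint32_t)mem[addr + 3];
}

/* Header fields plus the offsets of the args and rets arrays,
 * computed once by the dispatch code. */
struct parsed_call {
    uint32_t token;  /* called "service" in the OF client interface case */
    uint32_t nargs;
    uint32_t nret;
    size_t args;     /* offset of the first argument cell */
    size_t rets;     /* offset of the first return cell */
};

static struct parsed_call parse_call(const uint8_t *mem, size_t base)
{
    struct parsed_call c;

    c.token = guest_ldl_be(mem, base);
    c.nargs = guest_ldl_be(mem, base + 4);
    c.nret = guest_ldl_be(mem, base + 8);
    c.args = base + 12;                     /* cells follow the 3-cell header */
    c.rets = c.args + 4 * (size_t)c.nargs;  /* rets start right after the args */
    return c;
}

/* Convenience accessor in the spirit of rtas_ld(args, n): fetch cell n
 * of an already located array without repeating the offset arithmetic. */
static uint32_t call_ld(const uint8_t *mem, size_t args, uint32_t n)
{
    return guest_ldl_be(mem, args + 4 * (size_t)n);
}
```

With this shape a handler reads its first argument as call_ld(mem, c.args, 0), and only the dispatch code has to know about the header. A real version would also have to bounds-check nargs and nret, which this sketch leaves out.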
On Mon, Jun 07, 2021 at 12:21:21AM +0200, BALATON Zoltan wrote: > On Fri, 4 Jun 2021, David Gibson wrote: > > On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote: > > > On Wed, 2 Jun 2021, David Gibson wrote: > > > > On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: > > > > > On Thu, 27 May 2021, David Gibson wrote: > > > > > > On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: > > > > > > > On Tue, 25 May 2021, David Gibson wrote: > > > > > > > > On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: > > > > > > > > > On Mon, 24 May 2021, David Gibson wrote: > > > > > > > > > > On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: > > > > > > > > > > > On Sun, 23 May 2021, BALATON Zoltan wrote: > > > > > > > > > > > and would using sc 1 for hypercalls on pegasos2 cause other > > > > > > > > > > > problems later even if the assert could be removed? > > > > > > > > > > > > > > > > > > > > At least in the short term, I think you probably can remove the > > > > > > > > > > assert. In your case the 'sc 1' calls aren't truly to a hypervisor, > > > > > > > > > > but a special case escape to qemu for the firmware emulation. I think > > > > > > > > > > it's unlikely to cause problems later, because nothing on a 32-bit > > > > > > > > > > system should be attempting an 'sc 1'. The only thing I can think of > > > > > > > > > > that would fail is some test case which explicitly verified that 'sc > > > > > > > > > > 1' triggered a 0x700 (SIGILL from userspace). > > > > > > > > > > > > > > > > > > OK so the assert should check if the CPU has an HV bit. I think there was a > > > > > > > > > #detine for that somewhere that I can add to the assert then I can try that. 
> > > > > > > > > What I wasn't sure about is that sc 1 would conflict with the guest's usage > > > > > > > > > of normal sc calls or are these going through different paths and only sc 1 > > > > > > > > > will trigger vhyp callback not affecting notmal sc calls? > > > > > > > > > > > > > > > > The vhyp shouldn't affect normal system calls, 'sc 1' is specifically > > > > > > > > for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the > > > > > > > > vhyp only intercepts the hypercall version (after all Linux on PAPR > > > > > > > > certainly uses its own system calls, and hypercalls are active for the > > > > > > > > lifetime of the guest there). > > > > > > > > > > > > > > > > > (Or if this causes > > > > > > > > > an otherwise unnecessary VM exit on KVM even when it works then maybe > > > > > > > > > looking for a different way in the future might be needed. > > > > > > > > > > > > > > > > What you're doing here won't work with KVM as it stands. There are > > > > > > > > basically two paths into the vhyp hypercall path: 1) from TCG, if we > > > > > > > > interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get > > > > > > > > a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. > > > > > > > > > > > > > > > > The second path is specific to the PAPR (ppc64) implementation of KVM, > > > > > > > > and will not work for a non-PAPR platform without substantial > > > > > > > > modification of the KVM code. > > > > > > > > > > > > > > OK so then at that point when we try KVM we'll need to look at alternative > > > > > > > ways, I think MOL OSI worked with KVM at least in MOL but will probably make > > > > > > > all syscalls exit KVM but since we'll probably need to use KVM PR it will > > > > > > > exit anyway. For now I keep this vhyp as it does not run with KVM for other > > > > > > > reasons yet so that's another area to clean up so as a proof of concept > > > > > > > first version of using VOF vhyp will do. 
> > > > > > > > > > > > Eh, since you'll need to modify KVM anyway, it probably makes just as > > > > > > much sense to modify it to catch the 'sc 1' as MoL's magic thingy. > > > > > > > > > > I'm not sure how KVM works for this case so I also don't know why and what > > > > > would need to be modified. I think we'll only have KVM PR working as newer > > > > > POWER CPUs having HV (besides being rare among potential users) are probably > > > > > too different to run the OSes that expect at most a G4 on pegasos2 so likely > > > > > it won't work with KVM HV. > > > > > > > > Oh, it definitely won't work with KVM HV. > > > > > > > > > If we have KVM PR doesn't sc already trap so we > > > > > could add MOL OSI without further modification to KVM itself only needing > > > > > change in QEMU? > > > > > > > > Uh... I guess so? > > > > > > > > > I also hope that MOL OSI could be useful for porting some > > > > > paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't > > > > > know about that for sure so I'm open to any other solution too. > > > > > > > > Maybe. I never know much about MOL to begin with, and anything I did > > > > know was a decade or more ago so I've probably forgotten. > > > > > > That may still be more than what I know about it since I never had any > > > knowledge about PPC KVM and don't have any PPC hardware to test with so I'm > > > mostly guessing. (I could test with KVM emulated in QEMU and I did set up an > > > environment for that but that's a bit slow and inconvenient so I'd leave KVM > > > support to those interested and have more knowledge and hardware for it.) > > > > Sounds like a problem for someone else another time, then. > > So now that it works on TCG with vhyp I tried what it would do on KVM PR > with the sc 1 but I could only test that on QEMU itself running in a Linux > guest. 
First I've hit missing this callback: > > https://git.qemu.org/?p=qemu.git;a=blob;f=target/ppc/kvm.c;h=104a308abb5700b2fe075397271f314d7f607543;hb=HEAD#l856 > > that I can fix by providing a callback in pegasos2.c that does what the else > clause would do returning POWERPC_CPU(current_cpu)->env.spr[SPR_SDR1] (I > guess that's the correct thing to do if it works without vhyp). For your case, yes that's right. Again vhyp is designed for the case where the (hash) MMU is owned by the hypervisor. But due to a gross hack the way we communicate the userspace address of the hash table to KVM PR is via the SDR1 register, which is why we need that hook. > After getting past this, the host QEMU crashed on the first sc 1 call with > this error: > > qemu: fatal: Trying to deliver HV exception (MSR) 8 with no HV support > NIP 0000000000000148 LR 0000000000000590 CTR 0000000000000000 XER 0000000000000000 CPU#0 > MSR 000000000000d032 HID0 0000000060000000 HF 00004012 iidx 0 didx 0 > TB 00000203 876006644638 DECR 422427 > GPR00 0000000000000680 000000000000fe90 0000000000008e00 000000000000f005 > GPR04 000000000000fe9c 0000000000000001 0000000000000e78 0000000000000000 > GPR08 000000000000fe98 000000000000fe9c 0000000000000001 0000000000000000 > GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR28 0000000000000000 0000000000000000 0000000000008e9c 000000000000fe90 > CR 20000000 [ E - - - - - - - ] RES ffffffffffffffff > FPR00 bff0000000000000 0000000000000000 0000000000000000 0000000000000000 > FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > FPR12 3ff553f7ced91687 0000000000000000 0000000000000000 0000000000000000 > FPR16 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 > FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > FPSCR 0000000082004000 > SRR0 00000000000001d4 SRR1 300000000000d032 PVR 00000000003c0301 VRSAVE 00000000ffffffff > SPRG0 000000003fe00000 SPRG1 c00000000ff60000 SPRG2 c00000000ff60000 SPRG3 0000000000000000 > SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000 > SDR1 000000003f000006 DAR f00000000090abf0 DSISR 0000000042000000 > Aborted (core dumped) > > (vof.bin looks like this: > > 100: 3c 40 00 00 lis r2,0 > 104: 60 42 8e 00 ori r2,r2,36352 > 108: 48 00 00 cc b 0x1d4 > 10c: 3c 40 00 00 lis r2,0 > 110: 60 42 8e 00 ori r2,r2,36352 > 114: 94 21 ff 90 stwu r1,-112(r1) > 118: 93 e1 00 68 stw r31,104(r1) > 11c: 7f e8 02 a6 mflr r31 > 120: 48 00 02 8d bl 0x3ac > 124: 60 00 00 00 nop > 128: 7f e8 03 a6 mtlr r31 > 12c: 83 e1 00 68 lwz r31,104(r1) > 130: 38 21 00 70 addi r1,r1,112 > 134: 4e 80 00 20 blr > 138: 7c 64 1b 78 mr r4,r3 > 13c: 3c 60 00 00 lis r3,0 > 140: 60 63 f0 05 ori r3,r3,61445 > 144: 44 00 00 22 sc 1 > 148: 4e 80 00 20 blr > > so I think it's the sc 1 at 0x144) The error is coming from here: > > https://git.qemu.org/?p=qemu.git;a=blob;f=target/ppc/excp_helper.c;h=fd147e2a37662456d30f7ab74b23bfb036260ced;hb=HEAD#l830 > > What does this mean? What would a real CPU do with this and where it could > be catched to use as hypercall method on CPUs without HV or what else should > we do if we wanted this to work with KVM PR too in the future? The interesting bit is actually how we're getting to that part of powerpc_excp. I guess we must be getting a KVM exit for that 'sc 1', but I don't know what type. 
If we can figure that out, that would be where we'd need to intercept
it and send it to the vhyp handler instead of actually trying to enter
the hypercall vector on the emulated CPU, which doesn't have one.
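
For reference, the word 44 00 00 22 at 0x144 in the quoted vof.bin listing differs from a plain 'sc' only in the LEV field. A minimal sketch of telling the two apart, assuming the Power ISA SC-form layout (primary opcode 17, 7-bit LEV field starting at bit 5 counting from the least significant bit, and the fixed 1 in bit 1); the helper names are made up and this is not QEMU's decoder:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the instruction word is an 'sc' of any level. */
static bool is_sc(uint32_t insn)
{
    return (insn >> 26) == 17 && (insn & 0x2) != 0;
}

/* Extract the LEV field: 0 for a normal system call, 1 for a hypercall. */
static unsigned sc_lev(uint32_t insn)
{
    return (insn >> 5) & 0x7f;
}

/* True only for 'sc 1', the form that vhyp is meant to catch. */
static bool is_hypercall(uint32_t insn)
{
    return is_sc(insn) && sc_lev(insn) == 1;
}
```

So 0x44000002 is 'sc 0' and 0x44000022 is 'sc 1'. Whatever layer ends up seeing the trap only needs the LEV value to decide whether to hand the call to the vhyp handler or to deliver a normal system call.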
On Mon, 7 Jun 2021, David Gibson wrote: > On Mon, Jun 07, 2021 at 12:21:21AM +0200, BALATON Zoltan wrote: >> On Fri, 4 Jun 2021, David Gibson wrote: >>> On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote: >>>> On Wed, 2 Jun 2021, David Gibson wrote: >>>>> On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: >>>>>> On Thu, 27 May 2021, David Gibson wrote: >>>>>>> On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: >>>>>>>> On Tue, 25 May 2021, David Gibson wrote: >>>>>>>>> On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: >>>>>>>>>> On Mon, 24 May 2021, David Gibson wrote: >>>>>>>>>>> On Sun, May 23, 2021 at 07:09:26PM +0200, BALATON Zoltan wrote: >>>>>>>>>>>> On Sun, 23 May 2021, BALATON Zoltan wrote: >>>>>>>>>>>> and would using sc 1 for hypercalls on pegasos2 cause other >>>>>>>>>>>> problems later even if the assert could be removed? >>>>>>>>>>> >>>>>>>>>>> At least in the short term, I think you probably can remove the >>>>>>>>>>> assert. In your case the 'sc 1' calls aren't truly to a hypervisor, >>>>>>>>>>> but a special case escape to qemu for the firmware emulation. I think >>>>>>>>>>> it's unlikely to cause problems later, because nothing on a 32-bit >>>>>>>>>>> system should be attempting an 'sc 1'. The only thing I can think of >>>>>>>>>>> that would fail is some test case which explicitly verified that 'sc >>>>>>>>>>> 1' triggered a 0x700 (SIGILL from userspace). >>>>>>>>>> >>>>>>>>>> OK so the assert should check if the CPU has an HV bit. I think there was a >>>>>>>>>> #detine for that somewhere that I can add to the assert then I can try that. >>>>>>>>>> What I wasn't sure about is that sc 1 would conflict with the guest's usage >>>>>>>>>> of normal sc calls or are these going through different paths and only sc 1 >>>>>>>>>> will trigger vhyp callback not affecting notmal sc calls? 
>>>>>>>>> >>>>>>>>> The vhyp shouldn't affect normal system calls, 'sc 1' is specifically >>>>>>>>> for hypercalls, as opposed to normal 'sc' (a.k.a. 'sc 0'), and the >>>>>>>>> vhyp only intercepts the hypercall version (after all Linux on PAPR >>>>>>>>> certainly uses its own system calls, and hypercalls are active for the >>>>>>>>> lifetime of the guest there). >>>>>>>>> >>>>>>>>>> (Or if this causes >>>>>>>>>> an otherwise unnecessary VM exit on KVM even when it works then maybe >>>>>>>>>> looking for a different way in the future might be needed. >>>>>>>>> >>>>>>>>> What you're doing here won't work with KVM as it stands. There are >>>>>>>>> basically two paths into the vhyp hypercall path: 1) from TCG, if we >>>>>>>>> interpret an 'sc 1' instruction we enter vhyp, 2) from KVM, if we get >>>>>>>>> a KVM_EXIT_PAPR_HCALL KVM exit then we also go to the vhyp path. >>>>>>>>> >>>>>>>>> The second path is specific to the PAPR (ppc64) implementation of KVM, >>>>>>>>> and will not work for a non-PAPR platform without substantial >>>>>>>>> modification of the KVM code. >>>>>>>> >>>>>>>> OK so then at that point when we try KVM we'll need to look at alternative >>>>>>>> ways, I think MOL OSI worked with KVM at least in MOL but will probably make >>>>>>>> all syscalls exit KVM but since we'll probably need to use KVM PR it will >>>>>>>> exit anyway. For now I keep this vhyp as it does not run with KVM for other >>>>>>>> reasons yet so that's another area to clean up so as a proof of concept >>>>>>>> first version of using VOF vhyp will do. >>>>>>> >>>>>>> Eh, since you'll need to modify KVM anyway, it probably makes just as >>>>>>> much sense to modify it to catch the 'sc 1' as MoL's magic thingy. >>>>>> >>>>>> I'm not sure how KVM works for this case so I also don't know why and what >>>>>> would need to be modified. 
I think we'll only have KVM PR working as newer >>>>>> POWER CPUs having HV (besides being rare among potential users) are probably >>>>>> too different to run the OSes that expect at most a G4 on pegasos2 so likely >>>>>> it won't work with KVM HV. >>>>> >>>>> Oh, it definitely won't work with KVM HV. >>>>> >>>>>> If we have KVM PR doesn't sc already trap so we >>>>>> could add MOL OSI without further modification to KVM itself only needing >>>>>> change in QEMU? >>>>> >>>>> Uh... I guess so? >>>>> >>>>>> I also hope that MOL OSI could be useful for porting some >>>>>> paravirt drivers from MOL for running Mac OS X on Mac emulation but I don't >>>>>> know about that for sure so I'm open to any other solution too. >>>>> >>>>> Maybe. I never know much about MOL to begin with, and anything I did >>>>> know was a decade or more ago so I've probably forgotten. >>>> >>>> That may still be more than what I know about it since I never had any >>>> knowledge about PPC KVM and don't have any PPC hardware to test with so I'm >>>> mostly guessing. (I could test with KVM emulated in QEMU and I did set up an >>>> environment for that but that's a bit slow and inconvenient so I'd leave KVM >>>> support to those interested and have more knowledge and hardware for it.) >>> >>> Sounds like a problem for someone else another time, then. >> >> So now that it works on TCG with vhyp I tried what it would do on KVM PR >> with the sc 1 but I could only test that on QEMU itself running in a Linux >> guest. First I've hit missing this callback: >> >> https://git.qemu.org/?p=qemu.git;a=blob;f=target/ppc/kvm.c;h=104a308abb5700b2fe075397271f314d7f607543;hb=HEAD#l856 >> >> that I can fix by providing a callback in pegasos2.c that does what the else >> clause would do returning POWERPC_CPU(current_cpu)->env.spr[SPR_SDR1] (I >> guess that's the correct thing to do if it works without vhyp). > > For your case, yes that's right. 
Again vhyp is designed for the case > where the (hash) MMU is owned by the hypervisor. But due to a gross > hack the way we communicate the userspace address of the hash table to > KVM PR is via the SDR1 register, which is why we need that hook. > >> After getting past this, the host QEMU crashed on the first sc 1 call with >> this error: >> >> qemu: fatal: Trying to deliver HV exception (MSR) 8 with no HV support > >> NIP 0000000000000148 LR 0000000000000590 CTR 0000000000000000 XER 0000000000000000 CPU#0 >> MSR 000000000000d032 HID0 0000000060000000 HF 00004012 iidx 0 didx 0 >> TB 00000203 876006644638 DECR 422427 >> GPR00 0000000000000680 000000000000fe90 0000000000008e00 000000000000f005 >> GPR04 000000000000fe9c 0000000000000001 0000000000000e78 0000000000000000 >> GPR08 000000000000fe98 000000000000fe9c 0000000000000001 0000000000000000 >> GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> GPR28 0000000000000000 0000000000000000 0000000000008e9c 000000000000fe90 >> CR 20000000 [ E - - - - - - - ] RES ffffffffffffffff >> FPR00 bff0000000000000 0000000000000000 0000000000000000 0000000000000000 >> FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> FPR12 3ff553f7ced91687 0000000000000000 0000000000000000 0000000000000000 >> FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> FPSCR 0000000082004000 >> SRR0 00000000000001d4 SRR1 300000000000d032 PVR 
00000000003c0301 VRSAVE 00000000ffffffff >> SPRG0 000000003fe00000 SPRG1 c00000000ff60000 SPRG2 c00000000ff60000 SPRG3 0000000000000000 >> SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000 >> SDR1 000000003f000006 DAR f00000000090abf0 DSISR 0000000042000000 >> Aborted (core dumped) >> >> (vof.bin looks like this: >> >> 100: 3c 40 00 00 lis r2,0 >> 104: 60 42 8e 00 ori r2,r2,36352 >> 108: 48 00 00 cc b 0x1d4 >> 10c: 3c 40 00 00 lis r2,0 >> 110: 60 42 8e 00 ori r2,r2,36352 >> 114: 94 21 ff 90 stwu r1,-112(r1) >> 118: 93 e1 00 68 stw r31,104(r1) >> 11c: 7f e8 02 a6 mflr r31 >> 120: 48 00 02 8d bl 0x3ac >> 124: 60 00 00 00 nop >> 128: 7f e8 03 a6 mtlr r31 >> 12c: 83 e1 00 68 lwz r31,104(r1) >> 130: 38 21 00 70 addi r1,r1,112 >> 134: 4e 80 00 20 blr >> 138: 7c 64 1b 78 mr r4,r3 >> 13c: 3c 60 00 00 lis r3,0 >> 140: 60 63 f0 05 ori r3,r3,61445 >> 144: 44 00 00 22 sc 1 >> 148: 4e 80 00 20 blr >> >> so I think it's the sc 1 at 0x144) The error is coming from here: >> >> https://git.qemu.org/?p=qemu.git;a=blob;f=target/ppc/excp_helper.c;h=fd147e2a37662456d30f7ab74b23bfb036260ced;hb=HEAD#l830 >> >> What does this mean? What would a real CPU do with this and where it could >> be catched to use as hypercall method on CPUs without HV or what else should >> we do if we wanted this to work with KVM PR too in the future? > > The interesting bit is actually how we're getting to that part of > powerpc_excp. I guess we must be getting a KVM exit for that 'sc 1', > but I don't know what type. If we can figure out that would be where > we'd need to intercept it and send it to the vhyp handler instead of > actually trying to enter the hypercall vector on the emulated CPU, > which doesn't have one. Well, this is emulated KVM PR running in a TCG guest because as I mentioned before I don't have real hardware to test KVM on. 
So I've booted Linux on qemu-system-ppc64 -M mac99,via=pmu (that's using a 970 (G5) CPU) and ran qemu-system-ppc with KVM PR within it. So it's ultimately coming from somewhere in TCG:

#0 0x00007f5d1c0f09ba in raise () at /lib64/libc.so.6
#1 0x00007f5d1c0d9524 in abort () at /lib64/libc.so.6
#2 0x00005557807f4776 in cpu_abort (cpu=cpu@entry=0x5557826bff30, fmt=fmt@entry=0x555780a8ad88 "Trying to deliver HV exception (MSR) %d with no HV support\n") at ../cpu.c:376
#3 0x00005557806a09c6 in powerpc_excp (cpu=0x5557826bff30, excp_model=13, excp=<optimized out>) at ../target/ppc/excp_helper.c:833
#4 0x000055578073bf43 in cpu_handle_exception (ret=<synthetic pointer>, cpu=0x555782669640) at ../accel/tcg/cpu-exec.c:524
#5 0x000055578073bf43 in cpu_exec (cpu=cpu@entry=0x5557826bff30) at ../accel/tcg/cpu-exec.c:778
#6 0x0000555780750d82 in tcg_cpus_exec (cpu=cpu@entry=0x5557826bff30) at ../accel/tcg/tcg-accel-ops.c:67
#7 0x0000555780755103 in mttcg_cpu_thread_fn (arg=arg@entry=0x5557826bff30) at ../accel/tcg/tcg-accel-ops-mttcg.c:70
#8 0x0000555780974eea in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:521
#9 0x00007f5d1c28704c in start_thread () at /lib64/libpthread.so.0
#10 0x00007f5d1c1b82cf in clone () at /lib64/libc.so.6

This is a backtrace on the host because the outer TCG qemu is getting this abort when the guest qemu in it runs the sc 1 via KVM PR. So it's trapped on the host, not in the guest, but I don't know what a real CPU would do in this case and whether that's emulated correctly in nested KVM (apparently not, as it's crashing). This makes it a bit hard to test, as I can run into KVM emulation bugs in the host QEMU as well as problems with KVM PR that would also happen on real hardware, and it might not be obvious which I've hit.

Another unrelated problem I've found with KVM PR is that when trying to run it with the board firmware (that's not crashing as that's not using sc 1) it initially starts but gets stuck soon after starting.
When enabling kvm traces in QEMU I see endless kvm exits:

kvm_run_exit cpu_index 0, reason 6
kvm_run_exit cpu_index 0, reason 6
kvm_run_exit cpu_index 0, reason 6
kvm_run_exit cpu_index 0, reason 6

but NIP is not advancing in info registers:

(qemu) info registers
kvm_failed_spr_get Warning: Unable to retrieve SPR 1013 from KVM: Invalid argument
NIP fff05958 LR fff05524 CTR 00000000 XER 20000000 CPU#0
MSR 00000030 HID0 00000000 HF 6c000002 iidx 3 didx 3
TB 00000000 00000000 DECR 0
GPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000081
GPR04 00000000fe000d00 0000000000000069 00000000fff042a0 00000000fff054cc
GPR08 0000000000f5e7f8 00000000ffffff00 00000000fff04274 0000000000000000
GPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28 0000000000000000 0000000000000000 0000000000000002 0000000000000000
CR 40000000 [ G - - - - - - - ] RES ffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
SRR0 00000000 SRR1 00000000 PVR 000c0209 VRSAVE 00000000
SPRG0 00000000 SPRG1 00000000 SPRG2 00000000 SPRG3 00000000
SPRG4 00000000 SPRG5 00000000 SPRG6 00000000 SPRG7 00000000
SDR1 00000000 DAR 00000000 DSISR 00000000
(qemu) kvm_failed_spr_set Warning: Unable to set SPR 1013 to KVM: Invalid argument

It stays on fff05958, the instruction it seems to be stuck on is doing some io:

0xfff05938: 7c0006ac eieio
0xfff0593c: 98640001 stb r3, 1(r4)
0xfff05940: 7c0006ac eieio
0xfff05944: 4e800020 blr
0xfff05948: 7c641b78 mr r4, r3
0xfff0594c: 6484fe00 oris r4, r4, 0xfe00
0xfff05950: 7c0006ac eieio
0xfff05954: 88640000 lbz r3, 0(r4)
0xfff05958: 7c0006ac eieio
0xfff0595c: 7c0004ac sync
0xfff05960: 4e800020 blr

but likely it's trying to access a device which is not emulated so nothing's there. When I run the same with TCG I get some invalid access warnings with -d guest_errors enabled around the same point:

Invalid access at addr 0xFE000E43, size 1, region '(null)', reason: rejected
Invalid access at addr 0xE43, size 1, region '(null)', reason: rejected
Invalid access at addr 0xFE000E44, size 1, region '(null)', reason: rejected
Invalid access at addr 0xE44, size 1, region '(null)', reason: rejected
Invalid access at addr 0xFE000E41, size 1, region '(null)', reason: rejected
Invalid access at addr 0xE41, size 1, region '(null)', reason: rejected
Invalid access at addr 0xFE000E42, size 1, region '(null)', reason: rejected
Invalid access at addr 0xE42, size 1, region '(null)', reason: rejected
Invalid access at addr 0xFE000E40, size 1, region '(null)', reason: rejected
Invalid access at addr 0xE40, size 1, region '(null)', reason: rejected

but it's moving on with TCG and since the device that should be here is not really needed (it's setting up some clock generators on real hardware at this point) it's working anyway. Is this a problem with KVM, so that I really need to put unimplemented devices at every address the guest may access even when that's not needed on TCG, or is this a bug somewhere, such that after detecting this error it's not advancing the IP and tries to execute the same instruction again? I'm not sure how to debug this further or where to look for a bug or fix it.

Regards,
BALATON Zoltan
On Mon, 7 Jun 2021, David Gibson wrote:
> On Fri, Jun 04, 2021 at 03:59:22PM +0200, BALATON Zoltan wrote:
>> On Fri, 4 Jun 2021, David Gibson wrote:
>>> On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote:
>>>> On Wed, 2 Jun 2021, David Gibson wrote:
>>>>> On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote:
>>>>>> On Thu, 27 May 2021, David Gibson wrote:
>>>>>>> On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote:
>>>>>>>> On Tue, 25 May 2021, David Gibson wrote:
>>>>>>>>> On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote:
>>>>>>>>>> On Mon, 24 May 2021, David Gibson wrote:
>> What's ePAPR then and how is it different from PAPR? I mean the acronym not
>> the hypercall method, the latter is explained in that doc but what ePAPR
>> stands for and why is that method called like that is not clear to me.
>
> Ok, history lesson time.
>
> For a long time PAPR has been the document that described the OS
> environment for IBM POWER based server hardware. Before it was called
> PAPR (POWER Architecture Platform Requirements) it was called the
> "RPA" (Requirements for the POWER Architecture, I think?). You might
> see the old name in a few places.
>
> Requiring a full Open Firmware and a bunch of other fairly heavyweight
> stuff, PAPR really wasn't suitable for embedded ppc chips and boards.
> The situation with those used to be a complete mess with basically
> every board variant having its own different firmware with its own
> different way of presenting some fragments of vital data to the OS.
>
> ePAPR - Embedded Power Architecture Platform Requirements - was
> created as a standard to try to unify how this stuff was handled on
> embedded ppc chips. I was one of the authors on early versions of
> it. It's mostly based around giving the OS a flattened device tree,
> with some deliberately minimal requirements on firmware initialization
> and entry state. Here's a link to one of those early versions:
>
> http://elinux.org/images/c/cf/Power_ePAPR_APPROVED_v1.1.pdf
>
> I thought there were later versions, but I couldn't seem to find any.
> It's possible the process of refining later versions just petered out
> as the embedded ppc world mostly died and the flattened device tree
> development mostly moved to ARM.
>
> Since some of the embedded chips from Freescale had hypervisor
> capabilities, a hypercall model was added to ePAPR - but that wasn't
> something I was greatly involved in, so I don't know much about it.
>
> ePAPR is the reason that the original PAPR is sometimes referred to as
> "sPAPR" to disambiguate.

Ah, thanks that really puts it in context. I've heard about PReP and CHRP
in connection with the boards I've tried to emulate but don't know much
about PAPR and server POWER systems.

>>>> The ePAPR (1.) seems to be preferred by KVM and
>>>> MOL OSI supported for compatibility.
>>>
>>> That document looks pretty out of date. Most of it is only discussing
>>> KVM PR, which is now barely maintained. KVM HV only works with PAPR
>>> hypercalls.
>>
>> The link says it's latest kernel docs, so maybe an update needs to be sent
>> to KVM?
>
> I guess, but the chances of me finding time to do it are approximately
> zero.
>
>>>> So if we need something else instead of
>>>> 2. PAPR hypercalls there seems to be two options: ePAPR and MOL OSI which
>>>> should work with KVM but then I'm not sure how to handle those on TCG.
>>>>
>>>>>>>> [...]
>>>>>>>>>>>> I've tested that the missing rtas is not the reason for getting no output
>>>>>>>>>>>> via serial though, as even when disabling rtas on pegasos2.rom it boots and
>>>>>>>>>>>> I still get serial output just some PCI devices are not detected (such as
>>>>>>>>>>>> USB, the video card and the not emulated ethernet port but these are not
>>>>>>>>>>>> fatal so it might even work as a first try without rtas, just to boot a
>>>>>>>>>>>> Linux kernel for testing it would be enough if I can fix the serial output).
>>>>>>>>>>>> I still don't know why it's not finding serial but I think it may be some
>>>>>>>>>>>> missing or wrong info in the device tree I generate. I'll try to focus on
>>>>>>>>>>>> this for now and leave the above rtas question for later.
>>>>>>>>>>>
>>>>>>>>>>> Oh.. another thought on that. You have an ISA serial port on Pegasos,
>>>>>>>>>>> I believe. I wonder if the PCI->ISA bridge needs some configuration /
>>>>>>>>>>> initialization that the firmware is expected to do. If so you'll need
>>>>>>>>>>> to mimic that setup in qemu for the VOF case.
>>>>>>>>>>
>>>>>>>>>> That's what I begin to think because I've added everything to the device
>>>>>>>>>> tree that I thought could be needed and I still don't get it working so it
>>>>>>>>>> may need some config from the firmware. But how do I access device registers
>>>>>>>>>> from board code? I've tried adding a machine reset method and write to
>>>>>>>>>> memory mapped device registers but all my attempts failed. I've tried
>>>>>>>>>> cpu_stl_le_data and even memory_region_dispatch_write but these did not get
>>>>>>>>>> to the device. What's the way to access guest mmio regs from QEMU?
>>>>>>>>>
>>>>>>>>> That's odd, cpu_stl() and memory_region_dispatch_write() should work
>>>>>>>>> from board code (after the relevant memory regions are configured, of
>>>>>>>>> course). As an ISA serial port, it's probably accessed through IO
>>>>>>>>> space, not memory space though, so you'd need &address_space_io. And
>>>>>>>>> if there is some bridge configuration then it's the bridge control
>>>>>>>>> registers you need to look at not the serial registers - you'd have to
>>>>>>>>> look at the bridge documentation for that. Or, I guess the bridge
>>>>>>>>> implementation in qemu, which you wrote part of.
>>>>>>>>
>>>>>>>> I've found at last that stl_le_phys() works. There are so many of these that
>>>>>>>> I never know when to use which.
>>>>>>>>
>>>>>>>> I think the address_space_rw calls in vof_client_call() in vof.c could also
>>>>>>>> use these for somewhat shorter code. I've ended up with
>>>>>>>> stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset method but I don't
>>>>>>>> even need that now as it works without additional setup. Also VOF's memory
>>>>>>>> access is basically the same as the already existing rtas_st() and co. so
>>>>>>>> maybe that could be reused to make code smaller?
>>>>>>>
>>>>>>> rtas_ld() and rtas_st() should only be used for reading/writing RTAS
>>>>>>> parameters to and from memory. Accessing IO shouldn't be done with
>>>>>>> those.
>>>>>>>
>>>>>>> For IO you probably want the cpu_st*() variants in most cases, since
>>>>>>> you're trying to emulate an IO access from the virtual cpu.
>>>>>>
>>>>>> I think I've tried that but what worked to access mmio device registers are
>>>>>> stl_le_phys and similar that are wrappers around address_space_stl_*. But I
>>>>>> did not mean that for rtas_ld/_st but the part when vof accessing the
>>>>>> parameters passed by its hypercall which is memory access:
>>>>>>
>>>>>> https://github.com/patchew-project/qemu/blob/patchew/20210520090557.435689-1-aik%40ozlabs.ru/hw/ppc/vof.c
>>>>>>
>>>>>> line 893, and vof_client_call before that is very similar to what h_rtas
>>>>>> does here:
>>>>>>
>>>>>> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_hcall.c;h=f25014afda408002ee1ec1027a0dd7a6025eca61;hb=HEAD#l639
>>>>>>
>>>>>> and I also need to do the same for rtas in pegasos2 for which I'm just using
>>>>>> ldl_be_phys for now but I wonder if we really need 3 ways to do the same or
>>>>>> the rtas_ld/_st could be made more generic and reused here?
>>>>>
>>>>> For your rtas implementation you could definitely re-use them. For
>>>>> the client call I'm a bit less confident, but if the in-guest-memory
>>>>> structures are really the same, then it would make sense.
>>>>
>>>> The memory structure seems very similar to me, the only difference is
>>>> calling the first field service in VOF instead of token in RTAS. Both are
>>>> just an array of big endian uint32_t with token, nargs, nret at the front
>>>> followed by args and rets. Since these rtas_ld/st are defined in spapr.h I
>>>> did not bother to split them off, so for pegasos2 rtas I'm just using the
>>>> ldl_be_* functions directly for which these are a shorthand for. If these
>>>> were split off for sharing between spapr rtas and VOF I may be able to reuse
>>>> them as well but it's not that important so just mentioned it as a possible
>>>> later clean up.
>>>
>>> Ok, sounds reasonable to re-use them then, though maybe add an aliased
>>> name for clarity ofci_{ld,st}(), maybe? (for "Open Firmware Client
>>> Interface")
>>
>> I'll wait for what Alexey decides to do in the next VOF patch version and if
>> I can reuse that (I could if these were defined in vof.h). I don't want to
>> come up with yet another abstraction to ldl_be_* which does not seem to make
>> it more clear than using the actual functions for guest memory access which
>> is what we're doing while getting the hypercall args so I think either using
>> ldl_be_* directly or reusing already existing rtas_ld/_st would make sense
>> but adding similar funcs with another name just makes it more confusing.
>
> Well, the point of the rtas_ld() functions isn't to be a different way
> of accessing memory. It's just a convenience wrapper that takes an
> RTAS args array and an argument index and does the right thing to
> retrieve it for you.
>
> So, in your RTAS function implementation, when you want to get argument
> 0, you just go rtas_ld(args, 0) - more readable than having a bunch of
> offset calculations and a long winded call to the BE memory access
> function. You can look at the examples in hw/ppc/spapr_rtas.c to see
> how it's used.
>
> Actually, looking again at how it works, you should probably only use
> rtas_ld() if your general dispatch code has pre-parsed the args
> structure into separate args and rets arrays, again as we do in
> spapr_rtas.c

The problem with those rtas_* functions is that they are in spapr now so to
reuse it I'd need to split them off which I did not do because it's not too
bad without it and modifying spapr would mean another round of review which
could take long and delay my other patches. So if somebody splits these off
for reuse (like if Alexey wants to reuse them in VOF) then I may use them
but otherwise I've just noted these could be reused but don't intend to do
that now. This could also be done later for both VOF and pegasos2 as a
clean up so it does not seem to be too important at the moment.

Regards,
BALATON Zoltan
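
For reference, the buffer layout discussed above (an array of big-endian
uint32_t cells with token or service first, then nargs and nret, followed
by args and rets) can be sketched in plain C. This is only an illustration
under stated assumptions: cell_be32(), struct call_args, call_args_parse()
and call_arg() are hypothetical names, the code reads from a local byte
buffer rather than guest memory, and it is not how rtas_ld() or vof.c are
actually implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the index-th big-endian 32-bit cell from a byte buffer. */
static uint32_t cell_be32(const uint8_t *buf, size_t index)
{
    const uint8_t *p = buf + index * 4;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical parsed view of a call buffer; not a QEMU type. */
struct call_args {
    uint32_t token;        /* called "service" in the OF client interface */
    uint32_t nargs;
    uint32_t nret;
    const uint8_t *cells;  /* nargs argument cells, then nret return cells */
};

/* Returns 0 on success, -1 if the buffer cannot hold header plus cells. */
static int call_args_parse(const uint8_t *buf, size_t len,
                           struct call_args *out)
{
    size_t avail;

    if (len < 12) {
        return -1;
    }
    out->token = cell_be32(buf, 0);
    out->nargs = cell_be32(buf, 1);
    out->nret = cell_be32(buf, 2);

    /* Compare cell counts without risking overflow in nargs + nret. */
    avail = (len - 12) / 4;
    if (out->nargs > avail || out->nret > avail - out->nargs) {
        return -1;
    }
    out->cells = buf + 12;
    return 0;
}

/* Fetch argument n (caller must ensure n < nargs). */
static uint32_t call_arg(const struct call_args *c, uint32_t n)
{
    return cell_be32(c->cells, n);
}
```

With a helper like this, a handler would read its first argument as
call_arg(&c, 0) instead of computing the offset by hand, which is the
readability point made about rtas_ld() above.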
On 6/8/21 08:54, BALATON Zoltan wrote:
> On Mon, 7 Jun 2021, David Gibson wrote:
[snip]
>> Actually, looking again at how it works, you should probably only use
>> rtas_ld() if your general dispatch code has pre-parsed the args
>> structure into separate args and rets arrays, again as we do in
>> spapr_rtas.c
>
> The problem with those rtas_* functions is that they are in spapr now so
> to reuse it I'd need to split them off which I did not do because it's
> not too bad without it and modifying spapr would mean another round of
> review which could take long and delay my other patches. So if somebody
> splits these off for reuse (like if Alexey wants to reuse them in VOF)
> then I may use them but otherwise I've just noted these could be reused
> but don't intend to do that now. This could also be done later for both
> VOF and pegasos2 as a clean up so it does not seem to be too important
> at the moment.

I added VOF_MEM_READ/VOF_MEM_WRITE as (unlike others) they can return an
error code. I am not quite sure why we did not bother back then when we
added rtas_ld/st (were we just learning then?) but we do care now. I am
moving those to vof.h.

Here is v21:
https://github.com/aik/qemu/commits/killslof-cli-v21

changes:
v21:
* s/ld/ldz/ in entry.S
* moved CONFIG_VOF from default-configs/devices/ppc64-softmmu.mak to Kconfig
* made CONFIG_VOF optional
* s/l.lds/vof.lds/
* force 32 BE in spapr_machine_reset() instead of the firmware
* added checks for non-null methods of VofMachineIfClass
* moved OF_STACK_SIZE to vof.h, renamed to VOF_..., added a better comment
* added path_offset wrapper for handling mixed case for addresses
  after "@" in node names
* changed getprop() to check for actual "name" property in the fdt
* moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing as (unlike similar
  rtas_ld/ldl_be_*) they return error codes
* VOF_MEM_READ uses now address_space_read (it was address_space_read_full
  before, not sure why)

I'll post it .... maybe on Friday unless you find something else :)
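
To illustrate why an accessor that returns an error code differs from a
load helper that just returns the value, here is a minimal sketch in plain
C. Everything in it is an assumption made for illustration: struct
guest_mem, guest_mem_read() and guest_mem_read_be32() are hypothetical
names, guest RAM is modelled as a flat array, and this is not the
definition of VOF_MEM_READ in vof.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest RAM; QEMU uses an AddressSpace instead. */
struct guest_mem {
    uint8_t *base;
    size_t size;
};

/* Copy len bytes from guest address addr; 0 on success, -1 if out of range. */
static int guest_mem_read(const struct guest_mem *m, uint64_t addr,
                          void *dst, size_t len)
{
    if (addr > m->size || len > m->size - addr) {
        return -1;
    }
    memcpy(dst, m->base + addr, len);
    return 0;
}

/* Read one big-endian 32-bit value; *val is left untouched on failure. */
static int guest_mem_read_be32(const struct guest_mem *m, uint64_t addr,
                               uint32_t *val)
{
    uint8_t b[4];

    if (guest_mem_read(m, addr, b, sizeof(b)) < 0) {
        return -1;
    }
    *val = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 0;
}
```

A caller can then fail the whole client call when a parameter lies outside
guest memory, rather than continuing with whatever value a plain load
helper happened to return.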
On 6/7/21 13:05, David Gibson wrote:
> On Fri, Jun 04, 2021 at 03:50:28PM +0200, BALATON Zoltan wrote:
>> On Fri, 4 Jun 2021, David Gibson wrote:
>>> On Sun, May 30, 2021 at 07:33:01PM +0200, BALATON Zoltan wrote:
> [snip]
>>>> MorphOS checks the name property of the root node ("/") to decide what
>>>> platform it runs on so we may need to be able to set this property on /
>>>> where it should return "bplan,Pegasos2", therefore the above maybe should do
>>>> getprop first and only generate name property if it's not set (or at least
>>>> check if we're on the root node and allow setting name property there). (On
>>>> Macs the root node is named "device-tree" and this was before found to be
>>>> needed for MorphOS.)
>>>
>>> Ah. Hrm. Have to think about what to do about that.
>>
>> This is easy to fix, this seems to allow setting a name property or return a
>> default:
>>
>> diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
>> index b47bbd509d..746842593e 100644
>> --- a/hw/ppc/vof.c
>> +++ b/hw/ppc/vof.c
>> @@ -163,14 +163,14 @@ static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr)
>>  static const void *getprop(const void *fdt, int nodeoff, const char *propname,
>>                             int *proplen, bool *write0)
>>  {
>> -    const char *unit, *prop;
>> +    const char *unit, *prop = fdt_getprop(fdt, nodeoff, propname, proplen);
>>
>>      /*
>>       * The "name" property is not actually stored as a property in the FDT,
>>       * we emulate it by returning a pointer to the node's name and adjust
>>       * proplen to include only the name but not the unit.
>>       */
>> -    if (strcmp(propname, "name") == 0) {
>> +    if (!prop && strcmp(propname, "name") == 0) {
>>          prop = fdt_get_name(fdt, nodeoff, proplen);
>>          if (!prop) {
>>              *proplen = 0;
>> @@ -196,7 +196,7 @@ static const void *getprop(const void *fdt, int nodeoff, const char *propname,
>>      if (write0) {
>>          *write0 = false;
>>      }
>> -    return fdt_getprop(fdt, nodeoff, propname, proplen);
>> +    return prop;
>>  }
>
> Kind of a hack, but it'll do for now.
>

oops missed this subthread but I ended up doing it anyway, just tiny bit different.
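
The lookup rule in the diff above (an explicit "name" property wins,
otherwise the name is derived from the node name with the unit address
after "@" dropped) can be sketched without libfdt. This is a hedged
illustration only: node_name_len() and name_prop() are hypothetical
helpers, the length excludes any terminating NUL, and the real getprop()
in vof.c works on the FDT blob and may handle lengths differently.

```c
#include <stddef.h>
#include <string.h>

/* Length of a node name without its "@unit-address" suffix, if any. */
static size_t node_name_len(const char *node_name)
{
    const char *unit = strchr(node_name, '@');

    return unit ? (size_t)(unit - node_name) : strlen(node_name);
}

/*
 * Resolve the "name" property: use the explicit property when the tree
 * has one, otherwise fall back to the node name without the unit.
 * The result is not NUL-terminated at *len in the fallback case.
 */
static const char *name_prop(const char *explicit_name,
                             const char *node_name, size_t *len)
{
    if (explicit_name) {
        *len = strlen(explicit_name);
        return explicit_name;
    }
    *len = node_name_len(node_name);
    return node_name;
}
```

For a root node whose node name is empty, the fallback yields an empty
name, which is why an explicit property such as "bplan,Pegasos2" has to
take precedence for the MorphOS case described above.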
On Wed, 9 Jun 2021, Alexey Kardashevskiy wrote: > On 6/8/21 08:54, BALATON Zoltan wrote: >> On Mon, 7 Jun 2021, David Gibson wrote: >>> On Fri, Jun 04, 2021 at 03:59:22PM +0200, BALATON Zoltan wrote: >>>> On Fri, 4 Jun 2021, David Gibson wrote: >>>>> On Wed, Jun 02, 2021 at 02:29:29PM +0200, BALATON Zoltan wrote: >>>>>> On Wed, 2 Jun 2021, David Gibson wrote: >>>>>>> On Thu, May 27, 2021 at 02:42:39PM +0200, BALATON Zoltan wrote: >>>>>>>> On Thu, 27 May 2021, David Gibson wrote: >>>>>>>>> On Tue, May 25, 2021 at 12:08:45PM +0200, BALATON Zoltan wrote: >>>>>>>>>> On Tue, 25 May 2021, David Gibson wrote: >>>>>>>>>>> On Mon, May 24, 2021 at 12:55:07PM +0200, BALATON Zoltan wrote: >>>>>>>>>>>> On Mon, 24 May 2021, David Gibson wrote: >>>> What's ePAPR then and how is it different from PAPR? I mean the acronym >>>> not >>>> the hypercall method, the latter is explained in that doc but what ePAPR >>>> stands for and why is that method called like that is not clear to me. >>> >>> Ok, history lesson time. >>> >>> For a long time PAPR has been the document that described the OS >>> environment for IBM POWER based server hardware. Before it was called >>> PAPR (POWER Architecture Platform Requirements) it was called the >>> "RPA" (Requirements for the POWER Architecture, I think?). You might >>> see the old name in a few places. >>> >>> Requiring a full Open Firmware and a bunch of other fairly heavyweight >>> stuff, PAPR really wasn't suitable for embedded ppc chips and boards. >>> The situation with those used to be a complete mess with basically >>> every board variant having it's own different firmware with its own >>> different way of presenting some fragments of vital data to the OS. >>> >>> ePAPR - Embedded Power Architecture Platform Requirements - was >>> created as a standard to try to unify how this stuff was handled on >>> embedded ppc chips. I was one of the authors on early versions of >>> it. 
It's mostly based around giving the OS a flattened device tree, >>> with some deliberately minimal requirements on firmware initialization >>> and entry state. Here's a link to one of those early versions: >>> >>> http://elinux.org/images/c/cf/Power_ePAPR_APPROVED_v1.1.pdf >>> >>> I thought there were later versions, but I couldn't seem to find any. >>> It's possible the process of refining later versions just petered out >>> as the embedded ppc world mostly died and the flattened device tree >>> development mostly moved to ARM. >>> >>> Since some of the embedded chips from Freescale had hypervisor >>> capabilities, a hypercall model was added to ePAPR - but that wasn't >>> something I was greatly involved in, so I don't know much about it. >>> >>> ePAPR is the reason that the original PAPR is sometimes referred to as >>> "sPAPR" to disambiguate. >> >> Ah, thanks that really puts it in context. I've heard about PReP and CHRP >> in connection with the boards I've tried to emulate but don't know much >> about PAPR and server POWER systems. >> >>>>>> The ePAPR (1.) seems to be preferred by KVM and >>>>>> MOL OSI supported for compatibility. >>>>> >>>>> That document looks pretty out of date. Most of it is only discussing >>>>> KVM PR, which is now barely maintained. KVM HV only works with PAPR >>>>> hypercalls. >>>> >>>> The links says it's latest kernel docs, so maybe an update need to be >>>> sent >>>> to KVM? >>> >>> I guess, but the chances of me finding time to do it are approximately >>> zero. >>> >>>>>> So if we need something else instead of >>>>>> 2. PAPR hypercalls there seems to be two options: ePAPR and MOL OSI >>>>>> which >>>>>> should work with KVM but then I'm not sure how to handle those on TCG. >>>>>> >>>>>>>>>> [...] 
>>>>>>>>>>>>>> I've tested that the missing rtas is not the reason for getting >>>>>>>>>>>>>> no output >>>>>>>>>>>>>> via serial though, as even when disabling rtas on pegasos2.rom >>>>>>>>>>>>>> it boots and >>>>>>>>>>>>>> I still get serial output just some PCI devices are not >>>>>>>>>>>>>> detected (such as >>>>>>>>>>>>>> USB, the video card and the not emulated ethernet port but >>>>>>>>>>>>>> these are not >>>>>>>>>>>>>> fatal so it might even work as a first try without rtas, just >>>>>>>>>>>>>> to boot a >>>>>>>>>>>>>> Linux kernel for testing it would be enough if I can fix the >>>>>>>>>>>>>> serial output). >>>>>>>>>>>>>> I still don't know why it's not finding serial but I think it >>>>>>>>>>>>>> may be some >>>>>>>>>>>>>> missing or wrong info in the device tree I generat. I'll try to >>>>>>>>>>>>>> focus on >>>>>>>>>>>>>> this for now and leave the above rtas question for later. >>>>>>>>>>>>> >>>>>>>>>>>>> Oh.. another thought on that. You have an ISA serial port on >>>>>>>>>>>>> Pegasos, >>>>>>>>>>>>> I believe. I wonder if the PCI->ISA bridge needs some >>>>>>>>>>>>> configuration / >>>>>>>>>>>>> initialization that the firmware is expected to do. If so >>>>>>>>>>>>> you'll need >>>>>>>>>>>>> to mimic that setup in qemu for the VOF case. >>>>>>>>>>>> >>>>>>>>>>>> That's what I begin to think because I've added everything to the >>>>>>>>>>>> device >>>>>>>>>>>> tree that I thought could be needed and I still don't get it >>>>>>>>>>>> working so it >>>>>>>>>>>> may need some config from the firmware. But how do I access >>>>>>>>>>>> device registers >>>>>>>>>>>> from board code? I've tried adding a machine reset method and >>>>>>>>>>>> write to >>>>>>>>>>>> memory mapped device registers but all my attempts failed. I've >>>>>>>>>>>> tried >>>>>>>>>>>> cpu_stl_le_data and even memory_region_dispatch_write but these >>>>>>>>>>>> did not get >>>>>>>>>>>> to the device. What's the way to access guest mmio regs from >>>>>>>>>>>> QEMU? 
>>>>>>>>>>>
>>>>>>>>>>> That's odd, cpu_stl() and memory_region_dispatch_write() should work
>>>>>>>>>>> from board code (after the relevant memory regions are configured, of
>>>>>>>>>>> course). As an ISA serial port, it's probably accessed through IO
>>>>>>>>>>> space, not memory space though, so you'd need &address_space_io. And
>>>>>>>>>>> if there is some bridge configuration then it's the bridge control
>>>>>>>>>>> registers you need to look at not the serial registers - you'd have to
>>>>>>>>>>> look at the bridge documentation for that. Or, I guess the bridge
>>>>>>>>>>> implementation in qemu, which you wrote part of.
>>>>>>>>>>
>>>>>>>>>> I've found at last that stl_le_phys() works. There are so many of these that
>>>>>>>>>> I never know when to use which.
>>>>>>>>>>
>>>>>>>>>> I think the address_space_rw calls in vof_client_call() in vof.c could also
>>>>>>>>>> use these for somewhat shorter code. I've ended up with
>>>>>>>>>> stl_le_phys(CPU(cpu)->as, addr, val) in my machine reset method but I don't
>>>>>>>>>> even need that now as it works without additional setup. Also VOF's memory
>>>>>>>>>> access is basically the same as the already existing rtas_st() and co. so
>>>>>>>>>> maybe that could be reused to make code smaller?
>>>>>>>>>
>>>>>>>>> rtas_ld() and rtas_st() should only be used for reading/writing RTAS
>>>>>>>>> parameters to and from memory. Accessing IO shouldn't be done with
>>>>>>>>> those.
>>>>>>>>>
>>>>>>>>> For IO you probably want the cpu_st*() variants in most cases, since
>>>>>>>>> you're trying to emulate an IO access from the virtual cpu.
>>>>>>>>
>>>>>>>> I think I've tried that but what worked to access mmio device registers are
>>>>>>>> stl_le_phys and similar that are wrappers around address_space_stl_*.
>>>>>>>> But I did not mean that for rtas_ld/_st but the part where vof is accessing the
>>>>>>>> parameters passed by its hypercall which is memory access:
>>>>>>>>
>>>>>>>> https://github.com/patchew-project/qemu/blob/patchew/20210520090557.435689-1-aik%40ozlabs.ru/hw/ppc/vof.c
>>>>>>>>
>>>>>>>> line 893, and vof_client_call before that is very similar to what h_rtas
>>>>>>>> does here:
>>>>>>>>
>>>>>>>> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_hcall.c;h=f25014afda408002ee1ec1027a0dd7a6025eca61;hb=HEAD#l639
>>>>>>>>
>>>>>>>> and I also need to do the same for rtas in pegasos2 for which I'm just using
>>>>>>>> ldl_be_phys for now but I wonder if we really need 3 ways to do the same or
>>>>>>>> the rtas_ld/_st could be made more generic and reused here?
>>>>>>>
>>>>>>> For your rtas implementation you could definitely re-use them. For
>>>>>>> the client call I'm a bit less confident, but if the in-guest-memory
>>>>>>> structures are really the same, then it would make sense.
>>>>>>
>>>>>> The memory structure seems very similar to me, the only difference is
>>>>>> calling the first field service in VOF instead of token in RTAS. Both are
>>>>>> just an array of big endian uint32_t with token, nargs, nret at the front
>>>>>> followed by args and rets. Since these rtas_ld/st are defined in spapr.h I
>>>>>> did not bother to split them off, so for pegasos2 rtas I'm just using the
>>>>>> ldl_be_* functions directly for which these are a shorthand. If these
>>>>>> were split off for sharing between spapr rtas and VOF I may be able to reuse
>>>>>> them as well but it's not that important so just mentioned it as a possible
>>>>>> later clean up.
>>>>>
>>>>> Ok, sounds reasonable to re-use them then, though maybe add an aliased
>>>>> name for clarity ofci_{ld,st}(), maybe? (for "Open Firmware Client
>>>>> Interface")
>>>>
>>>> I'll wait for what Alexey decides to do in the next VOF patch version and if
>>>> I can reuse that (I could if these were defined in vof.h). I don't want to
>>>> come up with yet another abstraction to ldl_be_* which does not seem to make
>>>> it more clear than using the actual functions for guest memory access which
>>>> is what we're doing while getting the hypercall args so I think either using
>>>> ldl_be_* directly or reusing already existing rtas_ld/_st would make sense
>>>> but adding similar funcs with another name just makes it more confusing.
>>>
>>> Well, the point of the rtas_ld() functions isn't to be a different way
>>> of accessing memory. It's just a convenience wrapper that takes an
>>> RTAS args array and an argument index and does the right thing to
>>> retrieve it for you.
>>>
>>> So, in your RTAS function implementation, when you want to get argument
>>> 0, you just go rtas_ld(args, 0) - more readable than having a bunch of
>>> offset calculations and a long winded call to the BE memory access
>>> function. You can look at the examples in hw/ppc/spapr_rtas.c to see
>>> how it's used.
>>>
>>> Actually, looking again at how it works, you should probably only use
>>> rtas_ld() if your general dispatch code has pre-parsed the args
>>> structure into separate args and rets arrays, again as we do in
>>> spapr_rtas.c
>>
>> The problem with those rtas_* functions is that they are in spapr now so to
>> reuse it I'd need to split them off which I did not do because it's not too
>> bad without it and modifying spapr would mean another round of review which
>> could take long and delay my other patches. So if somebody splits these off
>> for reuse (like if Alexey wants to reuse them in VOF) then I may use them
>> but otherwise I've just noted these could be reused but don't intend to do
>> that now.
>> This could also be done later for both VOF and pegasos2 as a
>> clean up so it does not seem to be too important at the moment.
>
> I added VOF_MEM_READ/VOF_MEM_WRITE as (unlike others) they can return an
> error code. I am not quite sure why we did not bother when we added
> rtas_ld/st (were we just learning then?) but we do care now.
>
> I am moving those to vof.h.
>
> Here is v21:
> https://github.com/aik/qemu/commits/killslof-cli-v21
>
> changes:
> v21:
> * s/ld/ldz/ in entry.S
> * moved CONFIG_VOF from default-configs/devices/ppc64-softmmu.mak to Kconfig
> * made CONFIG_VOF optional
> * s/l.lds/vof.lds/
> * force 32 BE in spapr_machine_reset() instead of the firmware
> * added checks for non-null methods of VofMachineIfClass
> * moved OF_STACK_SIZE to vof.h, renamed to VOF_..., added a better comment
> * added path_offset wrapper for handling mixed case for addresses after "@"
>   in node names
> * changed getprop() to check for actual "name" property in the fdt
> * moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing as (unlike similar
>   rtas_ld/ldl_be_*) they return error codes
> * VOF_MEM_READ now uses address_space_read (it was address_space_read_full
>   before, not sure why)
>
> I'll post it .... maybe on Friday unless you find something else :)

I likely won't be finding more as I've run out of time for it now so I'm just
waiting for your patch to rebase on. It works with TCG and got the problems
I've posted with KVM so I can't move on with that without some help.

Regards,
BALATON Zoltan
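
For readers following the rtas_ld()/rtas_st() discussion above: the layout described in the thread (a flat array of big-endian 32-bit cells, with token or service first, then nargs and nret, followed by the argument and return cells) and the idea of index-based accessors can be sketched in standalone C. This is only an illustration under stated assumptions. CallBuf, callbuf_parse(), callbuf_ld(), callbuf_st(), be32_load() and be32_store() are hypothetical names, and a plain byte buffer stands in for guest memory. It is not the rtas_ld()/rtas_st() code from spapr.h, nor the VOF_MEM_READ/VOF_MEM_WRITE macros from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian 32-bit cell from a raw byte buffer. */
static uint32_t be32_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Encode one big-endian 32-bit cell into a raw byte buffer. */
static void be32_store(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical pre-parsed view of a call buffer (not a QEMU type). */
typedef struct {
    uint32_t service;    /* first cell: "token" in RTAS, "service" in OF CI */
    uint32_t nargs;      /* number of argument cells */
    uint32_t nret;       /* number of return cells */
    const uint8_t *args; /* points at nargs big-endian cells */
    uint8_t *rets;       /* points at nret big-endian cells */
} CallBuf;

/*
 * Split the buffer into header, args and rets.
 * Returns 0 on success, -1 if the buffer is too short for the
 * three header cells plus nargs + nret cells.
 */
static int callbuf_parse(uint8_t *buf, size_t len, CallBuf *cb)
{
    if (len < 12) {
        return -1;
    }
    cb->service = be32_load(buf);
    cb->nargs = be32_load(buf + 4);
    cb->nret = be32_load(buf + 8);
    /* Do the size check in 64 bits so hostile counts cannot overflow. */
    if (((uint64_t)cb->nargs + cb->nret) * 4 > len - 12) {
        return -1;
    }
    cb->args = buf + 12;
    cb->rets = buf + 12 + 4 * (size_t)cb->nargs;
    return 0;
}

/* Fetch argument n in host byte order. */
static uint32_t callbuf_ld(const CallBuf *cb, uint32_t n)
{
    assert(n < cb->nargs);
    return be32_load(cb->args + 4 * (size_t)n);
}

/* Store return value n as a big-endian cell. */
static void callbuf_st(CallBuf *cb, uint32_t n, uint32_t val)
{
    assert(n < cb->nret);
    be32_store(cb->rets + 4 * (size_t)n, val);
}
```

With a pre-parsed view like this, a handler reads argument 0 as callbuf_ld(&cb, 0) instead of repeating offset arithmetic at every call site, which is the readability argument made for rtas_ld() in the thread. Unlike this sketch, real guest memory accesses can fail, which is the reason given above for having VOF_MEM_READ/VOF_MEM_WRITE return an error code.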
diff --git a/pc-bios/vof/Makefile b/pc-bios/vof/Makefile new file mode 100644 index 000000000000..8d8c89e665ac --- /dev/null +++ b/pc-bios/vof/Makefile @@ -0,0 +1,23 @@ +all: build-all + +build-all: vof.bin + +CROSS ?= +CC = $(CROSS)gcc +LD = $(CROSS)ld +OBJCOPY = $(CROSS)objcopy + +%.o: %.S + $(CC) -m32 -mbig-endian -mcpu=power4 -c -o $@ $< + +%.o: %.c + $(CC) -m32 -mbig-endian -mcpu=power4 -c -fno-stack-protector -o $@ $< + +vof.elf: entry.o main.o ci.o bootmem.o libc.o + $(LD) -nostdlib -e_start -Tl.lds -EB -o $@ $^ + +%.bin: %.elf + $(OBJCOPY) -O binary -j .text -j .data -j .toc -j .got2 $^ $@ + +clean: + rm -f *.o vof.bin vof.elf *~ diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index bbf817af4647..cdbd9a03678a 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -12,6 +12,7 @@ #include "hw/ppc/spapr_xive.h" /* For SpaprXive */ #include "hw/ppc/xics.h" /* For ICSState */ #include "hw/ppc/spapr_tpm_proxy.h" +#include "hw/ppc/vof.h" struct SpaprVioBus; struct SpaprPhbState; @@ -180,6 +181,7 @@ struct SpaprMachineState { uint64_t kernel_addr; uint32_t initrd_base; long initrd_size; + Vof *vof; uint64_t rtc_offset; /* Now used only during incoming migration */ struct PPCTimebase tb; bool has_graphics; @@ -555,7 +557,9 @@ struct SpaprMachineState { /* Client Architecture support */ #define KVMPPC_H_CAS (KVMPPC_HCALL_BASE + 0x2) #define KVMPPC_H_UPDATE_DT (KVMPPC_HCALL_BASE + 0x3) -#define KVMPPC_HCALL_MAX KVMPPC_H_UPDATE_DT +/* 0x4 was used for KVMPPC_H_UPDATE_PHANDLE in SLOF */ +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) +#define KVMPPC_HCALL_MAX KVMPPC_H_VOF_CLIENT /* * The hcall range 0xEF00 to 0xEF80 is reserved for use in facilitating @@ -953,4 +957,17 @@ bool spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize, void spapr_set_all_lpcrs(target_ulong value, target_ulong mask); hwaddr spapr_get_rtas_addr(void); bool spapr_memory_hot_unplug_supported(SpaprMachineState *spapr); + +void 
spapr_vof_reset(SpaprMachineState *spapr, void *fdt, + target_ulong *stack_ptr, Error **errp); +void spapr_vof_quiesce(MachineState *ms); +bool spapr_vof_setprop(MachineState *ms, const char *path, const char *propname, + void *val, int vallen); +target_ulong spapr_h_vof_client(PowerPCCPU *cpu, SpaprMachineState *spapr, + target_ulong opcode, target_ulong *args); +target_ulong spapr_vof_client_architecture_support(MachineState *ms, + CPUState *cs, + target_ulong ovec_addr); +void spapr_vof_client_dt_finalize(SpaprMachineState *spapr, void *fdt); + #endif /* HW_SPAPR_H */ diff --git a/include/hw/ppc/vof.h b/include/hw/ppc/vof.h new file mode 100644 index 000000000000..e64096b77897 --- /dev/null +++ b/include/hw/ppc/vof.h @@ -0,0 +1,42 @@ +/* + * Virtual Open Firmware + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ +#ifndef HW_VOF_H +#define HW_VOF_H + +typedef struct Vof { + uint64_t top_addr; /* copied from rma_size */ + GArray *claimed; /* array of SpaprOfClaimed */ + uint64_t claimed_base; + GHashTable *of_instances; /* ihandle -> SpaprOfInstance */ + uint32_t of_instance_last; + char *bootargs; + long fw_size; +} Vof; + +int vof_client_call(MachineState *ms, Vof *vof, void *fdt, + target_ulong args_real); +uint64_t vof_claim(Vof *vof, uint64_t virt, uint64_t size, uint64_t align); +void vof_init(Vof *vof, uint64_t top_addr, Error **errp); +void vof_cleanup(Vof *vof); +void vof_build_dt(void *fdt, Vof *vof); +uint32_t vof_client_open_store(void *fdt, Vof *vof, const char *nodename, + const char *prop, const char *path); + +#define TYPE_VOF_MACHINE_IF "vof-machine-if" + +typedef struct VofMachineIfClass VofMachineIfClass; +DECLARE_CLASS_CHECKERS(VofMachineIfClass, VOF_MACHINE, TYPE_VOF_MACHINE_IF) + +struct VofMachineIfClass { + InterfaceClass parent; + target_ulong (*client_architecture_support)(MachineState *ms, CPUState *cs, + target_ulong vec); + void (*quiesce)(MachineState *ms); + bool (*setprop)(MachineState *ms, const char *path, const char 
*propname, + void *val, int vallen); +}; + +#endif /* HW_VOF_H */ diff --git a/pc-bios/vof/vof.h b/pc-bios/vof/vof.h new file mode 100644 index 000000000000..2d8958076907 --- /dev/null +++ b/pc-bios/vof/vof.h @@ -0,0 +1,43 @@ +/* + * Virtual Open Firmware + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ +#include <stdarg.h> + +typedef unsigned char uint8_t; +typedef unsigned short uint16_t; +typedef unsigned long uint32_t; +typedef unsigned long long uint64_t; +#define NULL (0) +#define PROM_ERROR (-1u) +typedef unsigned long ihandle; +typedef unsigned long phandle; +typedef int size_t; +typedef void client(void); + +/* globals */ +extern void _prom_entry(void); /* OF CI entry point (i.e. this firmware) */ + +void do_boot(unsigned long addr, unsigned long r3, unsigned long r4); + +/* libc */ +int strlen(const char *s); +int strcmp(const char *s1, const char *s2); +void *memcpy(void *dest, const void *src, size_t n); +int memcmp(const void *ptr1, const void *ptr2, size_t n); +void *memmove(void *dest, const void *src, size_t n); +void *memset(void *dest, int c, size_t size); + +/* CI wrappers */ +void ci_panic(const char *str); +phandle ci_finddevice(const char *path); +uint32_t ci_getprop(phandle ph, const char *propname, void *prop, int len); + +/* booting from -kernel */ +void boot_from_memory(uint64_t initrd, uint64_t initrdsize); + +/* Entry points for CI and RTAS */ +extern uint32_t ci_entry(uint32_t params); +extern unsigned long hv_rtas(unsigned long params); +extern unsigned int hv_rtas_size; diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index c23bcc449071..f5ae92dc441d 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -101,6 +101,7 @@ #define FDT_MAX_ADDR 0x80000000 /* FDT must stay below that */ #define FW_MAX_SIZE 0x400000 #define FW_FILE_NAME "slof.bin" +#define FW_FILE_NAME_VOF "vof.bin" #define FW_OVERHEAD 0x2800000 #define KERNEL_LOAD_ADDR FW_MAX_SIZE @@ -1640,22 +1641,37 @@ static void spapr_machine_reset(MachineState *machine) fdt = 
spapr_build_fdt(spapr, true, FDT_MAX_SIZE); - rc = fdt_pack(fdt); + if (spapr->vof) { + target_ulong stack_ptr = 0; - /* Should only fail if we've built a corrupted tree */ - assert(rc == 0); + spapr_vof_reset(spapr, fdt, &stack_ptr, &error_fatal); - /* Load the fdt */ + spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, + stack_ptr, spapr->initrd_base, + spapr->initrd_size); + /* We do not pack the FDT as the client may change properties. */ + } else { + rc = fdt_pack(fdt); + /* Should only fail if we've built a corrupted tree */ + assert(rc == 0); + + spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, + 0, fdt_addr, 0); + } qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt)); - cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt)); + g_free(spapr->fdt_blob); spapr->fdt_size = fdt_totalsize(fdt); spapr->fdt_initial_size = spapr->fdt_size; spapr->fdt_blob = fdt; /* Set up the entry state */ - spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, 0, fdt_addr, 0); first_ppc_cpu->env.gpr[5] = 0; + /* VOF client does not expect the FDT so we do not load it to the VM. */ + if (!spapr->vof) { + /* Load the fdt */ + cpu_physical_memory_write(fdt_addr, spapr->fdt_blob, spapr->fdt_size); + } spapr->fwnmi_system_reset_addr = -1; spapr->fwnmi_machine_check_addr = -1; @@ -2655,7 +2671,8 @@ static void spapr_machine_init(MachineState *machine) SpaprMachineState *spapr = SPAPR_MACHINE(machine); SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine); MachineClass *mc = MACHINE_GET_CLASS(machine); - const char *bios_name = machine->firmware ?: FW_FILE_NAME; + const char *bios_default = !!spapr->vof ? 
FW_FILE_NAME_VOF : FW_FILE_NAME; + const char *bios_name = machine->firmware ?: bios_default; const char *kernel_filename = machine->kernel_filename; const char *initrd_filename = machine->initrd_filename; PCIHostState *phb; @@ -3012,6 +3029,11 @@ static void spapr_machine_init(MachineState *machine) } qemu_cond_init(&spapr->fwnmi_machine_check_interlock_cond); + + if (spapr->vof) { + spapr->vof->fw_size = fw_size; + spapr_register_hypercall(KVMPPC_H_VOF_CLIENT, spapr_h_vof_client); + } } #define DEFAULT_KVM_TYPE "auto" @@ -3202,6 +3224,28 @@ static void spapr_set_resize_hpt(Object *obj, const char *value, Error **errp) } } +static bool spapr_get_vof(Object *obj, Error **errp) +{ + SpaprMachineState *spapr = SPAPR_MACHINE(obj); + + return spapr->vof != NULL; +} + +static void spapr_set_vof(Object *obj, bool value, Error **errp) +{ + SpaprMachineState *spapr = SPAPR_MACHINE(obj); + + if (spapr->vof) { + vof_cleanup(spapr->vof); + g_free(spapr->vof); + spapr->vof = NULL; + } + if (!value) { + return; + } + spapr->vof = g_malloc0(sizeof(*spapr->vof)); +} + static char *spapr_get_ic_mode(Object *obj, Error **errp) { SpaprMachineState *spapr = SPAPR_MACHINE(obj); @@ -3327,6 +3371,10 @@ static void spapr_instance_init(Object *obj) stringify(KERNEL_LOAD_ADDR) " for -kernel is the default"); spapr->kernel_addr = KERNEL_LOAD_ADDR; + object_property_add_bool(obj, "x-vof", spapr_get_vof, spapr_set_vof); + object_property_set_description(obj, "x-vof", + "Enable Virtual Open Firmware (experimental)"); + /* The machine class defines the default interrupt controller mode */ spapr->irq = smc->irq; object_property_add_str(obj, "ic-mode", spapr_get_ic_mode, @@ -4490,6 +4538,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) XICSFabricClass *xic = XICS_FABRIC_CLASS(oc); InterruptStatsProviderClass *ispc = INTERRUPT_STATS_PROVIDER_CLASS(oc); XiveFabricClass *xfc = XIVE_FABRIC_CLASS(oc); + VofMachineIfClass *vmc = VOF_MACHINE_CLASS(oc); mc->desc = "pSeries Logical 
Partition (PAPR compliant)"; mc->ignore_boot_device_suffixes = true; @@ -4578,6 +4627,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) smc->smp_threads_vsmt = true; smc->nr_xirqs = SPAPR_NR_XIRQS; xfc->match_nvt = spapr_match_nvt; + + vmc->client_architecture_support = spapr_vof_client_architecture_support; + vmc->quiesce = spapr_vof_quiesce; + vmc->setprop = spapr_vof_setprop; } static const TypeInfo spapr_machine_info = { @@ -4597,6 +4650,7 @@ static const TypeInfo spapr_machine_info = { { TYPE_XICS_FABRIC }, { TYPE_INTERRUPT_STATS_PROVIDER }, { TYPE_XIVE_FABRIC }, + { TYPE_VOF_MACHINE_IF }, { } }, }; diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index f25014afda40..61269cf21e3c 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -1233,7 +1233,7 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu, spapr_setup_hpt(spapr); } - fdt = spapr_build_fdt(spapr, false, fdt_bufsize); + fdt = spapr_build_fdt(spapr, spapr->vof != NULL, fdt_bufsize); g_free(spapr->fdt_blob); spapr->fdt_size = fdt_totalsize(fdt); @@ -1277,6 +1277,25 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu, return ret; } +target_ulong spapr_vof_client_architecture_support(MachineState *ms, + CPUState *cs, + target_ulong ovec_addr) +{ + SpaprMachineState *spapr = SPAPR_MACHINE(ms); + + target_ulong ret = do_client_architecture_support(POWERPC_CPU(cs), spapr, + ovec_addr, FDT_MAX_SIZE); + + /* + * This adds stdout and generates phandles for boottime and CAS FDTs. + * It is alright to update the FDT here as do_client_architecture_support() + * does not pack it. 
+ */ + spapr_vof_client_dt_finalize(spapr, spapr->fdt_blob); + + return ret; +} + static target_ulong h_get_cpu_characteristics(PowerPCCPU *cpu, SpaprMachineState *spapr, target_ulong opcode, diff --git a/hw/ppc/spapr_vof.c b/hw/ppc/spapr_vof.c new file mode 100644 index 000000000000..5e34d5402abf --- /dev/null +++ b/hw/ppc/spapr_vof.c @@ -0,0 +1,156 @@ +/* + * SPAPR machine hooks to Virtual Open Firmware, + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ +#include "qemu/osdep.h" +#include "qemu-common.h" +#include <sys/ioctl.h> +#include "qapi/error.h" +#include "hw/ppc/spapr.h" +#include "hw/ppc/spapr_vio.h" +#include "hw/ppc/fdt.h" +#include "sysemu/sysemu.h" +#include "qom/qom-qobject.h" +#include "trace.h" + +/* Copied from SLOF, and 4K is definitely not enough for GRUB */ +#define OF_STACK_SIZE 0x8000 + +target_ulong spapr_h_vof_client(PowerPCCPU *cpu, SpaprMachineState *spapr, + target_ulong opcode, target_ulong *_args) +{ + int ret = vof_client_call(MACHINE(spapr), spapr->vof, spapr->fdt_blob, + ppc64_phys_to_real(_args[0])); + + if (ret) { + return H_PARAMETER; + } + return H_SUCCESS; +} + +void spapr_vof_client_dt_finalize(SpaprMachineState *spapr, void *fdt) +{ + char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus); + int chosen; + + vof_build_dt(fdt, spapr->vof); + + _FDT(chosen = fdt_path_offset(fdt, "/chosen")); + _FDT(fdt_setprop_string(fdt, chosen, "bootargs", + spapr->vof->bootargs ? : "")); + + /* + * SLOF-less setup requires an open instance of stdout for early + * kernel printk. By now all phandles are settled so we can open + * the default serial console. 
+ */ + if (stdout_path) { + _FDT(vof_client_open_store(fdt, spapr->vof, "/chosen", "stdout", + stdout_path)); + } +} + +void spapr_vof_reset(SpaprMachineState *spapr, void *fdt, + target_ulong *stack_ptr, Error **errp) +{ + Vof *vof = spapr->vof; + + vof_init(vof, spapr->rma_size, errp); + + *stack_ptr = vof_claim(vof, 0, OF_STACK_SIZE, OF_STACK_SIZE); + if (*stack_ptr == -1) { + error_setg(errp, "Memory allocation for stack failed"); + return; + } + /* Stack grows downwards plus reserve space for the minimum stack frame */ + *stack_ptr += OF_STACK_SIZE - 0x20; + + if (spapr->kernel_size && + vof_claim(vof, spapr->kernel_addr, spapr->kernel_size, 0) == -1) { + error_setg(errp, "Memory for kernel is in use"); + return; + } + + if (spapr->initrd_size && + vof_claim(vof, spapr->initrd_base, spapr->initrd_size, 0) == -1) { + error_setg(errp, "Memory for initramdisk is in use"); + return; + } + + spapr_vof_client_dt_finalize(spapr, fdt); + + /* + * At this point the expected allocation map is: + * + * 0..c38 - the initial firmware + * 8000..10000 - stack + * 400000.. - kernel + * 3ea0000.. - initramdisk + * + * We skip writing FDT as nothing expects it; OF client interface is + * going to be used for reading the device tree. + */ +} + +void spapr_vof_quiesce(MachineState *ms) +{ + SpaprMachineState *spapr = SPAPR_MACHINE(ms); + + spapr->fdt_size = fdt_totalsize(spapr->fdt_blob); + spapr->fdt_initial_size = spapr->fdt_size; +} + +bool spapr_vof_setprop(MachineState *ms, const char *path, const char *propname, + void *val, int vallen) +{ + SpaprMachineState *spapr = SPAPR_MACHINE(ms); + + /* + * We only allow changing properties which we know how to update in QEMU + * OR + * the ones which we know that they need to survive during "quiesce". 
+ */ + + if (strcmp(path, "/rtas") == 0) { + if (strcmp(propname, "linux,rtas-base") == 0 || + strcmp(propname, "linux,rtas-entry") == 0) { + /* These need to survive quiesce so let them store in the FDT */ + return true; + } + } + + if (strcmp(path, "/chosen") == 0) { + if (strcmp(propname, "bootargs") == 0) { + Vof *vof = spapr->vof; + + g_free(vof->bootargs); + vof->bootargs = g_strndup(val, vallen); + return true; + } + if (strcmp(propname, "linux,initrd-start") == 0) { + if (vallen == sizeof(uint32_t)) { + spapr->initrd_base = ldl_be_p(val); + return true; + } + if (vallen == sizeof(uint64_t)) { + spapr->initrd_base = ldq_be_p(val); + return true; + } + return false; + } + if (strcmp(propname, "linux,initrd-end") == 0) { + if (vallen == sizeof(uint32_t)) { + spapr->initrd_size = ldl_be_p(val) - spapr->initrd_base; + return true; + } + if (vallen == sizeof(uint64_t)) { + spapr->initrd_size = ldq_be_p(val) - spapr->initrd_base; + return true; + } + return false; + } + } + + return true; +} diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c new file mode 100644 index 000000000000..a283b7d251a7 --- /dev/null +++ b/hw/ppc/vof.c @@ -0,0 +1,1021 @@ +/* + * QEMU PowerPC Virtual Open Firmware. + * + * This implements client interface from OpenFirmware IEEE1275 on the QEMU + * side to leave only a very basic firmware in the VM. + * + * Copyright (c) 2021 IBM Corporation. + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" +#include "qemu-common.h" +#include "qemu/timer.h" +#include "qemu/range.h" +#include "qemu/units.h" +#include "qapi/error.h" +#include <sys/ioctl.h> +#include "exec/ram_addr.h" +#include "exec/address-spaces.h" +#include "hw/ppc/vof.h" +#include "hw/ppc/fdt.h" +#include "sysemu/runstate.h" +#include "qom/qom-qobject.h" +#include "trace.h" + +#include <libfdt.h> + +/* + * OF 1275 "nextprop" description suggests is it 32 bytes max but + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars long. 
+ */ +#define OF_PROPNAME_LEN_MAX 64 + +#define VOF_MAX_PATH 256 +#define VOF_MAX_SETPROPLEN 2048 +#define VOF_MAX_METHODLEN 256 +#define VOF_MAX_FORTHCODE 256 +#define VOF_VTY_BUF_SIZE 256 + +typedef struct { + uint64_t start; + uint64_t size; +} OfClaimed; + +typedef struct { + char *path; /* the path used to open the instance */ + uint32_t phandle; +} OfInstance; + +#define VOF_MEM_READ(pa, buf, size) \ + address_space_read_full(&address_space_memory, \ + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) +#define VOF_MEM_WRITE(pa, buf, size) \ + address_space_write(&address_space_memory, \ + (pa), MEMTXATTRS_UNSPECIFIED, (buf), (size)) + +static int readstr(hwaddr pa, char *buf, int size) +{ + if (VOF_MEM_READ(pa, buf, size) != MEMTX_OK) { + return -1; + } + if (strnlen(buf, size) == size) { + buf[size - 1] = '\0'; + trace_vof_error_str_truncated(buf, size); + return -1; + } + return 0; +} + +static bool cmpservice(const char *s, unsigned nargs, unsigned nret, + const char *s1, unsigned nargscheck, unsigned nretcheck) +{ + if (strcmp(s, s1)) { + return false; + } + if ((nargscheck && (nargs != nargscheck)) || + (nretcheck && (nret != nretcheck))) { + trace_vof_error_param(s, nargscheck, nretcheck, nargs, nret); + return false; + } + + return true; +} + +static void prop_format(char *tval, int tlen, const void *prop, int len) +{ + int i; + const unsigned char *c; + char *t; + const char bin[] = "..."; + + for (i = 0, c = prop; i < len; ++i, ++c) { + if (*c == '\0' && i == len - 1) { + strncpy(tval, prop, tlen - 1); + return; + } + if (*c < 0x20 || *c >= 0x80) { + break; + } + } + + for (i = 0, c = prop, t = tval; i < len; ++i, ++c) { + if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) { + strcpy(t, bin); + return; + } + if (i && i % 4 == 0 && i != len - 1) { + strcat(t, " "); + ++t; + } + t += sprintf(t, "%02X", *c & 0xFF); + } +} + +static int get_path(const void *fdt, int offset, char *buf, int len) +{ + int ret; + + ret = fdt_get_path(fdt, offset, buf, len - 1); + if 
(ret < 0) { + return ret; + } + + buf[len - 1] = '\0'; + + return strlen(buf) + 1; +} + +static int phandle_to_path(const void *fdt, uint32_t ph, char *buf, int len) +{ + int ret; + + ret = fdt_node_offset_by_phandle(fdt, ph); + if (ret < 0) { + return ret; + } + + return get_path(fdt, ret, buf, len); +} + +static uint32_t vof_finddevice(const void *fdt, uint32_t nodeaddr) +{ + char fullnode[VOF_MAX_PATH]; + uint32_t ret = -1; + int offset; + + if (readstr(nodeaddr, fullnode, sizeof(fullnode))) { + return (uint32_t) ret; + } + + offset = fdt_path_offset(fdt, fullnode); + if (offset >= 0) { + ret = fdt_get_phandle(fdt, offset); + } + trace_vof_finddevice(fullnode, ret); + return (uint32_t) ret; +} + +static const void *getprop(const void *fdt, int nodeoff, const char *propname, + int *proplen, bool *write0) +{ + const char *unit, *prop; + + /* + * The "name" property is not actually stored as a property in the FDT, + * we emulate it by returning a pointer to the node's name and adjust + * proplen to include only the name but not the unit. + */ + if (strcmp(propname, "name") == 0) { + prop = fdt_get_name(fdt, nodeoff, proplen); + if (!prop) { + *proplen = 0; + return NULL; + } + + unit = memchr(prop, '@', *proplen); + if (unit) { + *proplen = unit - prop; + } + *proplen += 1; + + /* + * Since it might be cut at "@" and there will be no trailing zero + * in the prop buffer, tell the caller to write zero at the end. 
+ */ + if (write0) { + *write0 = true; + } + return prop; + } + + if (write0) { + *write0 = false; + } + return fdt_getprop(fdt, nodeoff, propname, proplen); +} + +static uint32_t vof_getprop(const void *fdt, uint32_t nodeph, uint32_t pname, + uint32_t valaddr, uint32_t vallen) +{ + char propname[OF_PROPNAME_LEN_MAX + 1]; + uint32_t ret = 0; + int proplen = 0; + const void *prop; + char trval[64] = ""; + int nodeoff = fdt_node_offset_by_phandle(fdt, nodeph); + bool write0; + + if (nodeoff < 0) { + return -1; + } + if (readstr(pname, propname, sizeof(propname))) { + return -1; + } + prop = getprop(fdt, nodeoff, propname, &proplen, &write0); + if (prop) { + const char zero = 0; + int cb = MIN(proplen, vallen); + + if (VOF_MEM_WRITE(valaddr, prop, cb) != MEMTX_OK || + /* if that was "name" with a unit address, overwrite '@' with '0' */ + (write0 && + cb == proplen && + VOF_MEM_WRITE(valaddr + cb - 1, &zero, 1) != MEMTX_OK)) { + ret = -1; + } else { + /* + * OF1275 says: + * "Size is either the actual size of the property, or -1 if name + * does not exist", hence returning proplen instead of cb. 
+ */ + ret = proplen; + /* Do not format a value if tracepoint is silent, for performance */ + if (trace_event_get_state(TRACE_VOF_GETPROP) && + qemu_loglevel_mask(LOG_TRACE)) { + prop_format(trval, sizeof(trval), prop, ret); + } + } + } else { + ret = -1; + } + trace_vof_getprop(nodeph, propname, ret, trval); + + return ret; +} + +static uint32_t vof_getproplen(const void *fdt, uint32_t nodeph, uint32_t pname) +{ + char propname[OF_PROPNAME_LEN_MAX + 1]; + uint32_t ret = 0; + int proplen = 0; + const void *prop; + int nodeoff = fdt_node_offset_by_phandle(fdt, nodeph); + + if (nodeoff < 0) { + return -1; + } + if (readstr(pname, propname, sizeof(propname))) { + return -1; + } + prop = getprop(fdt, nodeoff, propname, &proplen, NULL); + if (prop) { + ret = proplen; + } else { + ret = -1; + } + trace_vof_getproplen(nodeph, propname, ret); + + return ret; +} + +static uint32_t vof_setprop(MachineState *ms, void *fdt, Vof *vof, + uint32_t nodeph, uint32_t pname, + uint32_t valaddr, uint32_t vallen) +{ + char propname[OF_PROPNAME_LEN_MAX + 1]; + uint32_t ret = -1; + int offset; + char trval[64] = ""; + char nodepath[VOF_MAX_PATH] = ""; + Object *vmo = object_dynamic_cast(OBJECT(ms), TYPE_VOF_MACHINE_IF); + g_autofree char *val = NULL; + + if (vallen > VOF_MAX_SETPROPLEN) { + goto trace_exit; + } + if (readstr(pname, propname, sizeof(propname))) { + goto trace_exit; + } + offset = fdt_node_offset_by_phandle(fdt, nodeph); + if (offset < 0) { + goto trace_exit; + } + ret = get_path(fdt, offset, nodepath, sizeof(nodepath)); + if (ret <= 0) { + goto trace_exit; + } + + val = g_malloc0(vallen); + if (VOF_MEM_READ(valaddr, val, vallen) != MEMTX_OK) { + goto trace_exit; + } + + if (vmo) { + VofMachineIfClass *vmc = VOF_MACHINE_GET_CLASS(vmo); + + if (!vmc->setprop(ms, nodepath, propname, val, vallen)) { + goto trace_exit; + } + } + + ret = fdt_setprop(fdt, offset, propname, val, vallen); + if (ret) { + goto trace_exit; + } + + if (trace_event_get_state(TRACE_VOF_SETPROP) && + 
qemu_loglevel_mask(LOG_TRACE)) { + prop_format(trval, sizeof(trval), val, vallen); + } + ret = vallen; + +trace_exit: + trace_vof_setprop(nodeph, propname, trval, vallen, ret); + + return ret; +} + +static uint32_t vof_nextprop(const void *fdt, uint32_t phandle, + uint32_t prevaddr, uint32_t nameaddr) +{ + int offset, nodeoff = fdt_node_offset_by_phandle(fdt, phandle); + char prev[OF_PROPNAME_LEN_MAX + 1]; + const char *tmp; + + if (readstr(prevaddr, prev, sizeof(prev))) { + return -1; + } + + fdt_for_each_property_offset(offset, fdt, nodeoff) { + if (!fdt_getprop_by_offset(fdt, offset, &tmp, NULL)) { + return 0; + } + if (prev[0] == '\0' || strcmp(prev, tmp) == 0) { + if (prev[0] != '\0') { + offset = fdt_next_property_offset(fdt, offset); + if (offset < 0) { + return 0; + } + } + if (!fdt_getprop_by_offset(fdt, offset, &tmp, NULL)) { + return 0; + } + + if (VOF_MEM_WRITE(nameaddr, tmp, strlen(tmp) + 1) != MEMTX_OK) { + return -1; + } + return 1; + } + } + + return 0; +} + +static uint32_t vof_peer(const void *fdt, uint32_t phandle) +{ + int ret; + + if (phandle == 0) { + ret = fdt_path_offset(fdt, "/"); + } else { + ret = fdt_next_subnode(fdt, fdt_node_offset_by_phandle(fdt, phandle)); + } + + if (ret < 0) { + ret = 0; + } else { + ret = fdt_get_phandle(fdt, ret); + } + + return ret; +} + +static uint32_t vof_child(const void *fdt, uint32_t phandle) +{ + int ret = fdt_first_subnode(fdt, fdt_node_offset_by_phandle(fdt, phandle)); + + if (ret < 0) { + ret = 0; + } else { + ret = fdt_get_phandle(fdt, ret); + } + + return ret; +} + +static uint32_t vof_parent(const void *fdt, uint32_t phandle) +{ + int ret = fdt_parent_offset(fdt, fdt_node_offset_by_phandle(fdt, phandle)); + + if (ret < 0) { + ret = 0; + } else { + ret = fdt_get_phandle(fdt, ret); + } + + return ret; +} + +static uint32_t vof_do_open(void *fdt, Vof *vof, const char *path) +{ + int offset; + uint32_t ret = 0; + OfInstance *inst = NULL; + + if (vof->of_instance_last == 0xFFFFFFFF) { + /* We do not 
recycle ihandles yet */
+        goto trace_exit;
+    }
+
+    offset = fdt_path_offset(fdt, path);
+    if (offset < 0) {
+        trace_vof_error_unknown_path(path);
+        goto trace_exit;
+    }
+
+    inst = g_new0(OfInstance, 1);
+    inst->phandle = fdt_get_phandle(fdt, offset);
+    g_assert(inst->phandle);
+    ++vof->of_instance_last;
+
+    inst->path = g_strdup(path);
+    g_hash_table_insert(vof->of_instances,
+                        GINT_TO_POINTER(vof->of_instance_last),
+                        inst);
+    ret = vof->of_instance_last;
+
+trace_exit:
+    trace_vof_open(path, inst ? inst->phandle : 0, ret);
+
+    return ret;
+}
+
+uint32_t vof_client_open_store(void *fdt, Vof *vof, const char *nodename,
+                               const char *prop, const char *path)
+{
+    int node = fdt_path_offset(fdt, nodename);
+    uint32_t inst = vof_do_open(fdt, vof, path);
+
+    return fdt_setprop_cell(fdt, node, prop, inst);
+}
+
+static uint32_t vof_open(void *fdt, Vof *vof, uint32_t pathaddr)
+{
+    char path[VOF_MAX_PATH];
+
+    if (readstr(pathaddr, path, sizeof(path))) {
+        return -1;
+    }
+
+    return vof_do_open(fdt, vof, path);
+}
+
+static void vof_close(Vof *vof, uint32_t ihandle)
+{
+    if (!g_hash_table_remove(vof->of_instances, GINT_TO_POINTER(ihandle))) {
+        trace_vof_error_unknown_ihandle_close(ihandle);
+    }
+}
+
+static uint32_t vof_instance_to_package(Vof *vof, uint32_t ihandle)
+{
+    gpointer instp = g_hash_table_lookup(vof->of_instances,
+                                         GINT_TO_POINTER(ihandle));
+    uint32_t ret = -1;
+
+    if (instp) {
+        ret = ((OfInstance *)instp)->phandle;
+    }
+    trace_vof_instance_to_package(ihandle, ret);
+
+    return ret;
+}
+
+static uint32_t vof_package_to_path(const void *fdt, uint32_t phandle,
+                                    uint32_t buf, uint32_t len)
+{
+    uint32_t ret = -1;
+    char tmp[VOF_MAX_PATH] = "";
+
+    ret = phandle_to_path(fdt, phandle, tmp, sizeof(tmp));
+    if (ret > 0) {
+        if (VOF_MEM_WRITE(buf, tmp, ret) != MEMTX_OK) {
+            ret = -1;
+        }
+    }
+
+    trace_vof_package_to_path(phandle, tmp, ret);
+
+    return ret;
+}
+
+static uint32_t vof_instance_to_path(void *fdt, Vof *vof, uint32_t ihandle,
+                                     uint32_t buf, uint32_t
len)
+{
+    uint32_t ret = -1;
+    uint32_t phandle = vof_instance_to_package(vof, ihandle);
+    char tmp[VOF_MAX_PATH] = "";
+
+    if (phandle != -1) {
+        ret = phandle_to_path(fdt, phandle, tmp, sizeof(tmp));
+        if (ret > 0) {
+            if (VOF_MEM_WRITE(buf, tmp, ret) != MEMTX_OK) {
+                ret = -1;
+            }
+        }
+    }
+    trace_vof_instance_to_path(ihandle, phandle, tmp, ret);
+
+    return ret;
+}
+
+static uint32_t vof_write(Vof *vof, uint32_t ihandle, uint32_t buf,
+                          uint32_t len)
+{
+    char tmp[VOF_VTY_BUF_SIZE];
+    unsigned cb;
+    OfInstance *inst = (OfInstance *)
+        g_hash_table_lookup(vof->of_instances, GINT_TO_POINTER(ihandle));
+
+    if (!inst) {
+        trace_vof_error_write(ihandle);
+        return -1;
+    }
+
+    for ( ; len > 0; len -= cb) {
+        cb = MIN(len, sizeof(tmp) - 1);
+        if (VOF_MEM_READ(buf, tmp, cb) != MEMTX_OK) {
+            return -1;
+        }
+
+        /* FIXME: there is no backend(s) yet so just call a trace */
+        if (trace_event_get_state(TRACE_VOF_WRITE) &&
+            qemu_loglevel_mask(LOG_TRACE)) {
+            tmp[cb] = '\0';
+            trace_vof_write(ihandle, cb, tmp);
+        }
+    }
+
+    return len;
+}
+
+static void vof_claimed_dump(GArray *claimed)
+{
+    int i;
+    OfClaimed c;
+
+    if (trace_event_get_state(TRACE_VOF_CLAIMED) &&
+        qemu_loglevel_mask(LOG_TRACE)) {
+
+        for (i = 0; i < claimed->len; ++i) {
+            c = g_array_index(claimed, OfClaimed, i);
+            trace_vof_claimed(c.start, c.start + c.size, c.size);
+        }
+    }
+}
+
+static bool vof_claim_avail(GArray *claimed, uint64_t virt, uint64_t size)
+{
+    int i;
+    OfClaimed c;
+
+    for (i = 0; i < claimed->len; ++i) {
+        c = g_array_index(claimed, OfClaimed, i);
+        if (ranges_overlap(c.start, c.size, virt, size)) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static void vof_claim_add(GArray *claimed, uint64_t virt, uint64_t size)
+{
+    OfClaimed newclaim;
+
+    newclaim.start = virt;
+    newclaim.size = size;
+    g_array_append_val(claimed, newclaim);
+}
+
+static gint of_claimed_compare_func(gconstpointer a, gconstpointer b)
+{
+    return ((OfClaimed *)a)->start - ((OfClaimed *)b)->start;
+}
+
+static void
vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
+{
+    int i, n, offset, proplen = 0, sc, ac;
+    target_ulong mem0_end;
+    const uint8_t *mem0_reg;
+    g_autofree uint8_t *avail = NULL;
+    uint8_t *availcur;
+
+    if (!fdt || !claimed) {
+        return;
+    }
+
+    offset = fdt_path_offset(fdt, "/");
+    _FDT(offset);
+    ac = fdt_address_cells(fdt, offset);
+    g_assert(ac == 1 || ac == 2);
+    sc = fdt_size_cells(fdt, offset);
+    g_assert(sc == 1 || sc == 2);
+
+    offset = fdt_path_offset(fdt, "/memory@0");
+    _FDT(offset);
+
+    mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
+    g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
+    if (sc == 2) {
+        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
+    } else {
+        mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
+    }
+
+    g_array_sort(claimed, of_claimed_compare_func);
+    vof_claimed_dump(claimed);
+
+    /*
+     * VOF resides in the first page so we do not need to check if there is
+     * available memory before the first claimed block
+     */
+    g_assert(claimed->len && (g_array_index(claimed, OfClaimed, 0).start == 0));
+
+    avail = g_malloc0(sizeof(uint32_t) * (ac + sc) * claimed->len);
+    for (i = 0, n = 0, availcur = avail; i < claimed->len; ++i) {
+        OfClaimed c = g_array_index(claimed, OfClaimed, i);
+        uint64_t start, size;
+
+        start = c.start + c.size;
+        if (i < claimed->len - 1) {
+            OfClaimed cn = g_array_index(claimed, OfClaimed, i + 1);
+
+            size = cn.start - start;
+        } else {
+            size = mem0_end - start;
+        }
+
+        if (ac == 2) {
+            *(uint64_t *) availcur = cpu_to_be64(start);
+        } else {
+            *(uint32_t *) availcur = cpu_to_be32(start);
+        }
+        availcur += sizeof(uint32_t) * ac;
+        if (sc == 2) {
+            *(uint64_t *) availcur = cpu_to_be64(size);
+        } else {
+            *(uint32_t *) availcur = cpu_to_be32(size);
+        }
+        availcur += sizeof(uint32_t) * sc;
+
+        if (size) {
+            trace_vof_avail(c.start + c.size, c.start + c.size + size, size);
+            ++n;
+        }
+    }
+    _FDT((fdt_setprop(fdt, offset, "available",
avail, availcur - avail)));
+}
+
+/*
+ * OF1275:
+ * "Allocates size bytes of memory. If align is zero, the allocated range
+ * begins at the virtual address virt. Otherwise, an aligned address is
+ * automatically chosen and the input argument virt is ignored".
+ *
+ * In other words, exactly one of @virt and @align is non-zero.
+ */
+uint64_t vof_claim(Vof *vof, uint64_t virt, uint64_t size,
+                   uint64_t align)
+{
+    uint64_t ret;
+
+    if (size == 0) {
+        ret = -1;
+    } else if (align == 0) {
+        if (!vof_claim_avail(vof->claimed, virt, size)) {
+            ret = -1;
+        } else {
+            ret = virt;
+        }
+    } else {
+        vof->claimed_base = QEMU_ALIGN_UP(vof->claimed_base, align);
+        while (1) {
+            if (vof->claimed_base >= vof->top_addr) {
+                error_report("Out of RMA memory for the OF client");
+                return -1;
+            }
+            if (vof_claim_avail(vof->claimed, vof->claimed_base, size)) {
+                break;
+            }
+            vof->claimed_base += size;
+        }
+        ret = vof->claimed_base;
+    }
+
+    if (ret != -1) {
+        vof->claimed_base = MAX(vof->claimed_base, ret + size);
+        vof_claim_add(vof->claimed, ret, size);
+    }
+    trace_vof_claim(virt, size, align, ret);
+
+    return ret;
+}
+
+static uint32_t vof_release(Vof *vof, uint64_t virt, uint64_t size)
+{
+    uint32_t ret = -1;
+    int i;
+    GArray *claimed = vof->claimed;
+    OfClaimed c;
+
+    for (i = 0; i < claimed->len; ++i) {
+        c = g_array_index(claimed, OfClaimed, i);
+        if (c.start == virt && c.size == size) {
+            g_array_remove_index(claimed, i);
+            ret = 0;
+            break;
+        }
+    }
+
+    trace_vof_release(virt, size, ret);
+
+    return ret;
+}
+
+static void vof_instantiate_rtas(Error **errp)
+{
+    error_setg(errp, "The firmware should have instantiated RTAS");
+}
+
+static uint32_t vof_call_method(MachineState *ms, Vof *vof, uint32_t methodaddr,
+                                uint32_t ihandle, uint32_t param1,
+                                uint32_t param2, uint32_t param3,
+                                uint32_t param4, uint32_t *ret2)
+{
+    uint32_t ret = -1;
+    char method[VOF_MAX_METHODLEN] = "";
+    OfInstance *inst;
+
+    if (!ihandle) {
+        goto trace_exit;
+    }
+
+    inst = (OfInstance *)
g_hash_table_lookup(vof->of_instances,
+                                             GINT_TO_POINTER(ihandle));
+    if (!inst) {
+        goto trace_exit;
+    }
+
+    if (readstr(methodaddr, method, sizeof(method))) {
+        goto trace_exit;
+    }
+
+    if (strcmp(inst->path, "/") == 0) {
+        if (strcmp(method, "ibm,client-architecture-support") == 0) {
+            Object *vmo = object_dynamic_cast(OBJECT(ms), TYPE_VOF_MACHINE_IF);
+
+            if (vmo) {
+                VofMachineIfClass *vmc = VOF_MACHINE_GET_CLASS(vmo);
+
+                ret = vmc->client_architecture_support(ms, first_cpu, param1);
+            }
+
+            *ret2 = 0;
+        }
+    } else if (strcmp(inst->path, "/rtas") == 0) {
+        if (strcmp(method, "instantiate-rtas") == 0) {
+            vof_instantiate_rtas(&error_fatal);
+            ret = 0;
+            *ret2 = param1; /* rtas-base */
+        }
+    } else {
+        trace_vof_error_unknown_method(method);
+    }
+
+trace_exit:
+    trace_vof_method(ihandle, method, param1, ret, *ret2);
+
+    return ret;
+}
+
+static uint32_t vof_call_interpret(uint32_t cmdaddr, uint32_t param1,
+                                   uint32_t param2, uint32_t *ret2)
+{
+    uint32_t ret = -1;
+    char cmd[VOF_MAX_FORTHCODE] = "";
+
+    /* No interpret implemented so just call a trace */
+    readstr(cmdaddr, cmd, sizeof(cmd));
+    trace_vof_interpret(cmd, param1, param2, ret, *ret2);
+
+    return ret;
+}
+
+static void vof_quiesce(MachineState *ms, void *fdt, Vof *vof)
+{
+    Object *vmo = object_dynamic_cast(OBJECT(ms), TYPE_VOF_MACHINE_IF);
+    /* After "quiesce", no change is expected to the FDT, pack FDT to ensure */
+    int rc = fdt_pack(fdt);
+
+    assert(rc == 0);
+
+    if (vmo) {
+        VofMachineIfClass *vmc = VOF_MACHINE_GET_CLASS(vmo);
+
+        vmc->quiesce(ms);
+    }
+
+    vof_claimed_dump(vof->claimed);
+}
+
+static uint32_t vof_client_handle(MachineState *ms, void *fdt, Vof *vof,
+                                  const char *service,
+                                  uint32_t *args, unsigned nargs,
+                                  uint32_t *rets, unsigned nrets)
+{
+    uint32_t ret = 0;
+
+    /* @nrets includes the value which this function returns */
+#define cmpserv(s, a, r) \
+    cmpservice(service, nargs, nrets, (s), (a), (r))
+
+    if (cmpserv("finddevice", 1, 1)) {
+        ret = vof_finddevice(fdt, args[0]);
+
    } else if (cmpserv("getprop", 4, 1)) {
+        ret = vof_getprop(fdt, args[0], args[1], args[2], args[3]);
+    } else if (cmpserv("getproplen", 2, 1)) {
+        ret = vof_getproplen(fdt, args[0], args[1]);
+    } else if (cmpserv("setprop", 4, 1)) {
+        ret = vof_setprop(ms, fdt, vof, args[0], args[1], args[2], args[3]);
+    } else if (cmpserv("nextprop", 3, 1)) {
+        ret = vof_nextprop(fdt, args[0], args[1], args[2]);
+    } else if (cmpserv("peer", 1, 1)) {
+        ret = vof_peer(fdt, args[0]);
+    } else if (cmpserv("child", 1, 1)) {
+        ret = vof_child(fdt, args[0]);
+    } else if (cmpserv("parent", 1, 1)) {
+        ret = vof_parent(fdt, args[0]);
+    } else if (cmpserv("open", 1, 1)) {
+        ret = vof_open(fdt, vof, args[0]);
+    } else if (cmpserv("close", 1, 0)) {
+        vof_close(vof, args[0]);
+    } else if (cmpserv("instance-to-package", 1, 1)) {
+        ret = vof_instance_to_package(vof, args[0]);
+    } else if (cmpserv("package-to-path", 3, 1)) {
+        ret = vof_package_to_path(fdt, args[0], args[1], args[2]);
+    } else if (cmpserv("instance-to-path", 3, 1)) {
+        ret = vof_instance_to_path(fdt, vof, args[0], args[1], args[2]);
+    } else if (cmpserv("write", 3, 1)) {
+        ret = vof_write(vof, args[0], args[1], args[2]);
+    } else if (cmpserv("claim", 3, 1)) {
+        ret = vof_claim(vof, args[0], args[1], args[2]);
+        if (ret != -1) {
+            vof_dt_memory_available(fdt, vof->claimed, vof->claimed_base);
+        }
+    } else if (cmpserv("release", 2, 0)) {
+        ret = vof_release(vof, args[0], args[1]);
+        if (ret != -1) {
+            vof_dt_memory_available(fdt, vof->claimed, vof->claimed_base);
+        }
+    } else if (cmpserv("call-method", 0, 0)) {
+        ret = vof_call_method(ms, vof, args[0], args[1], args[2], args[3],
+                              args[4], args[5], rets);
+    } else if (cmpserv("interpret", 0, 0)) {
+        ret = vof_call_interpret(args[0], args[1], args[2], rets);
+    } else if (cmpserv("milliseconds", 0, 1)) {
+        ret = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+    } else if (cmpserv("quiesce", 0, 0)) {
+        vof_quiesce(ms, fdt, vof);
+    } else if (cmpserv("exit", 0, 0)) {
+        error_report("Stopped as
the VM requested \"exit\"");
+        vm_stop(RUN_STATE_PAUSED);
+    } else {
+        trace_vof_error_unknown_service(service, nargs, nrets);
+        ret = -1;
+    }
+
+    return ret;
+}
+
+/* Defined as Big Endian */
+struct prom_args {
+    uint32_t service;
+    uint32_t nargs;
+    uint32_t nret;
+    uint32_t args[10];
+} QEMU_PACKED;
+
+int vof_client_call(MachineState *ms, Vof *vof, void *fdt,
+                    target_ulong args_real)
+{
+    struct prom_args args_be;
+    uint32_t args[ARRAY_SIZE(args_be.args)];
+    uint32_t rets[ARRAY_SIZE(args_be.args)] = { 0 }, ret;
+    char service[64];
+    unsigned nargs, nret, i;
+
+    if (address_space_rw(&address_space_memory, args_real,
+                         MEMTXATTRS_UNSPECIFIED, &args_be, sizeof(args_be),
+                         false) != MEMTX_OK) {
+        return -EINVAL;
+    }
+    nargs = be32_to_cpu(args_be.nargs);
+    if (nargs >= ARRAY_SIZE(args_be.args)) {
+        return -EINVAL;
+    }
+
+    if (address_space_rw(&address_space_memory, be32_to_cpu(args_be.service),
+                         MEMTXATTRS_UNSPECIFIED, service, sizeof(service),
+                         false) != MEMTX_OK) {
+        return -EINVAL;
+    }
+    if (strnlen(service, sizeof(service)) == sizeof(service)) {
+        /* Too long service name */
+        return -EINVAL;
+    }
+
+    for (i = 0; i < nargs; ++i) {
+        args[i] = be32_to_cpu(args_be.args[i]);
+    }
+
+    nret = be32_to_cpu(args_be.nret);
+    ret = vof_client_handle(ms, fdt, vof, service, args, nargs, rets, nret);
+    if (!nret) {
+        return 0;
+    }
+
+    args_be.args[nargs] = cpu_to_be32(ret);
+    for (i = 1; i < nret; ++i) {
+        args_be.args[nargs + i] = cpu_to_be32(rets[i - 1]);
+    }
+
+    if (address_space_rw(&address_space_memory,
+                         args_real + offsetof(struct prom_args, args[nargs]),
+                         MEMTXATTRS_UNSPECIFIED, args_be.args + nargs,
+                         sizeof(args_be.args[0]) * nret, true) != MEMTX_OK) {
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static void vof_instance_free(gpointer data)
+{
+    OfInstance *inst = (OfInstance *) data;
+
+    g_free(inst->path);
+    g_free(inst);
+}
+
+void vof_init(Vof *vof, uint64_t top_addr, Error **errp)
+{
+    vof_cleanup(vof);
+
+    vof->of_instances =
g_hash_table_new_full(g_direct_hash, g_direct_equal,
+                                              NULL, vof_instance_free);
+    vof->claimed = g_array_new(false, false, sizeof(OfClaimed));
+
+    /* Keep allocations in 32bit as CLI ABI can only return cells==32bit */
+    vof->top_addr = MIN(top_addr, 4 * GiB);
+    if (vof_claim(vof, 0, vof->fw_size, 0) == -1) {
+        error_setg(errp, "Memory for firmware is in use");
+    }
+}
+
+void vof_cleanup(Vof *vof)
+{
+    if (vof->claimed) {
+        g_array_unref(vof->claimed);
+    }
+    if (vof->of_instances) {
+        g_hash_table_unref(vof->of_instances);
+    }
+    vof->claimed = NULL;
+    vof->of_instances = NULL;
+}
+
+void vof_build_dt(void *fdt, Vof *vof)
+{
+    uint32_t phandle = fdt_get_max_phandle(fdt);
+    int offset, proplen = 0;
+    const void *prop;
+
+    /* Assign phandles to nodes without predefined phandles (like XICS/XIVE) */
+    for (offset = fdt_next_node(fdt, -1, NULL);
+         offset >= 0;
+         offset = fdt_next_node(fdt, offset, NULL)) {
+        prop = fdt_getprop(fdt, offset, "phandle", &proplen);
+        if (prop) {
+            continue;
+        }
+        ++phandle;
+        _FDT(fdt_setprop_cell(fdt, offset, "phandle", phandle));
+    }
+
+    vof_dt_memory_available(fdt, vof->claimed, vof->claimed_base);
+}
+
+static const TypeInfo vof_machine_if_info = {
+    .name = TYPE_VOF_MACHINE_IF,
+    .parent = TYPE_INTERFACE,
+    .class_size = sizeof(VofMachineIfClass),
+};
+
+static void vof_machine_if_register_types(void)
+{
+    type_register_static(&vof_machine_if_info);
+}
+type_init(vof_machine_if_register_types)
diff --git a/pc-bios/vof/bootmem.c b/pc-bios/vof/bootmem.c
new file mode 100644
index 000000000000..771b9e95f95d
--- /dev/null
+++ b/pc-bios/vof/bootmem.c
@@ -0,0 +1,14 @@
+#include "vof.h"
+
+void boot_from_memory(uint64_t initrd, uint64_t initrdsize)
+{
+    uint64_t kern[2];
+    phandle chosen = ci_finddevice("/chosen");
+
+    if (ci_getprop(chosen, "qemu,boot-kernel", kern, sizeof(kern)) !=
+        sizeof(kern)) {
+        return;
+    }
+
+    do_boot(kern[0], initrd, initrdsize);
+}
diff --git a/pc-bios/vof/ci.c b/pc-bios/vof/ci.c
new file mode 100644
index
000000000000..a80806580dd0
--- /dev/null
+++ b/pc-bios/vof/ci.c
@@ -0,0 +1,91 @@
+#include "vof.h"
+
+struct prom_args {
+    uint32_t service;
+    uint32_t nargs;
+    uint32_t nret;
+    uint32_t args[10];
+};
+
+typedef unsigned long prom_arg_t;
+
+#define ADDR(x) (uint32_t)(x)
+
+static int prom_handle(struct prom_args *pargs)
+{
+    void *rtasbase;
+    uint32_t rtassize = 0;
+    phandle rtas;
+
+    if (strcmp("call-method", (void *)(unsigned long) pargs->service)) {
+        return -1;
+    }
+
+    if (strcmp("instantiate-rtas", (void *)(unsigned long) pargs->args[0])) {
+        return -1;
+    }
+
+    rtas = ci_finddevice("/rtas");
+    /* rtas-size is set by QEMU depending of FWNMI support */
+    ci_getprop(rtas, "rtas-size", &rtassize, sizeof(rtassize));
+    if (rtassize < hv_rtas_size) {
+        return -1;
+    }
+
+    rtasbase = (void *)(unsigned long) pargs->args[2];
+
+    memcpy(rtasbase, hv_rtas, hv_rtas_size);
+    pargs->args[pargs->nargs] = 0;
+    pargs->args[pargs->nargs + 1] = pargs->args[2];
+
+    return 0;
+}
+
+void prom_entry(uint32_t args)
+{
+    if (prom_handle((void *)(unsigned long) args)) {
+        ci_entry(args);
+    }
+}
+
+static int call_ci(const char *service, int nargs, int nret, ...)
+{
+    int i;
+    struct prom_args args;
+    va_list list;
+
+    args.service = ADDR(service);
+    args.nargs = nargs;
+    args.nret = nret;
+
+    va_start(list, nret);
+    for (i = 0; i < nargs; i++) {
+        args.args[i] = va_arg(list, prom_arg_t);
+    }
+    va_end(list);
+
+    for (i = 0; i < nret; i++) {
+        args.args[nargs + i] = 0;
+    }
+
+    if (ci_entry((uint32_t)(&args)) < 0) {
+        return PROM_ERROR;
+    }
+
+    return (nret > 0) ?
args.args[nargs] : 0;
+}
+
+void ci_panic(const char *str)
+{
+    call_ci("exit", 0, 0);
+}
+
+phandle ci_finddevice(const char *path)
+{
+    return call_ci("finddevice", 1, 1, path);
+}
+
+uint32_t ci_getprop(phandle ph, const char *propname, void *prop, int len)
+{
+    return call_ci("getprop", 4, 1, ph, propname, prop, len);
+}
diff --git a/pc-bios/vof/libc.c b/pc-bios/vof/libc.c
new file mode 100644
index 000000000000..00c10e6e7da1
--- /dev/null
+++ b/pc-bios/vof/libc.c
@@ -0,0 +1,92 @@
+#include "vof.h"
+
+int strlen(const char *s)
+{
+    int len = 0;
+
+    while (*s != 0) {
+        len += 1;
+        s += 1;
+    }
+
+    return len;
+}
+
+int strcmp(const char *s1, const char *s2)
+{
+    while (*s1 != 0 && *s2 != 0) {
+        if (*s1 != *s2) {
+            break;
+        }
+        s1 += 1;
+        s2 += 1;
+    }
+
+    return *s1 - *s2;
+}
+
+void *memcpy(void *dest, const void *src, size_t n)
+{
+    char *cdest;
+    const char *csrc = src;
+
+    cdest = dest;
+    while (n-- > 0) {
+        *cdest++ = *csrc++;
+    }
+
+    return dest;
+}
+
+int memcmp(const void *ptr1, const void *ptr2, size_t n)
+{
+    const unsigned char *p1 = ptr1;
+    const unsigned char *p2 = ptr2;
+
+    while (n-- > 0) {
+        if (*p1 != *p2) {
+            return *p1 - *p2;
+        }
+        p1 += 1;
+        p2 += 1;
+    }
+
+    return 0;
+}
+
+void *memmove(void *dest, const void *src, size_t n)
+{
+    char *cdest;
+    const char *csrc;
+    int i;
+
+    /* Do the buffers overlap in a bad way?
*/
+    if (src < dest && src + n >= dest) {
+        /* Copy from end to start */
+        cdest = dest + n - 1;
+        csrc = src + n - 1;
+        for (i = 0; i < n; i++) {
+            *cdest-- = *csrc--;
+        }
+    } else {
+        /* Normal copy is possible */
+        cdest = dest;
+        csrc = src;
+        for (i = 0; i < n; i++) {
+            *cdest++ = *csrc++;
+        }
+    }
+
+    return dest;
+}
+
+void *memset(void *dest, int c, size_t size)
+{
+    unsigned char *d = (unsigned char *)dest;
+
+    while (size-- > 0) {
+        *d++ = (unsigned char)c;
+    }
+
+    return dest;
+}
diff --git a/pc-bios/vof/main.c b/pc-bios/vof/main.c
new file mode 100644
index 000000000000..9fc30d2d0957
--- /dev/null
+++ b/pc-bios/vof/main.c
@@ -0,0 +1,21 @@
+#include "vof.h"
+
+void do_boot(unsigned long addr, unsigned long _r3, unsigned long _r4)
+{
+    register unsigned long r3 __asm__("r3") = _r3;
+    register unsigned long r4 __asm__("r4") = _r4;
+    register unsigned long r5 __asm__("r5") = (unsigned long) _prom_entry;
+
+    ((client *)(uint32_t)addr)();
+}
+
+void entry_c(void)
+{
+    register unsigned long r3 __asm__("r3");
+    register unsigned long r4 __asm__("r4");
+    register unsigned long r5 __asm__("r5");
+    uint64_t initrd = r3, initrdsize = r4;
+
+    boot_from_memory(initrd, initrdsize);
+    ci_panic("*** No boot target ***\n");
+}
diff --git a/tests/qtest/rtas-test.c b/tests/qtest/rtas-test.c
index 16751dbd2f55..5f1194a6eb53 100644
--- a/tests/qtest/rtas-test.c
+++ b/tests/qtest/rtas-test.c
@@ -5,7 +5,7 @@
 #include "libqos/libqos-spapr.h"
 #include "libqos/rtas.h"
 
-static void test_rtas_get_time_of_day(void)
+static void run_test_rtas_get_time_of_day(const char *machine)
 {
     QOSState *qs;
     struct tm tm;
@@ -13,7 +13,7 @@ static void test_rtas_get_time_of_day(void)
     uint64_t ret;
     time_t t1, t2;
 
-    qs = qtest_spapr_boot("-machine pseries");
+    qs = qtest_spapr_boot(machine);
 
     t1 = time(NULL);
     ret = qrtas_get_time_of_day(qs->qts, &qs->alloc, &tm, &ns);
@@ -24,6 +24,16 @@ static void test_rtas_get_time_of_day(void)
     qtest_shutdown(qs);
 }
 
+static void test_rtas_get_time_of_day(void)
+{
+
    run_test_rtas_get_time_of_day("-machine pseries");
+}
+
+static void test_rtas_get_time_of_day_vof(void)
+{
+    run_test_rtas_get_time_of_day("-machine pseries,x-vof=on");
+}
+
 int main(int argc, char *argv[])
 {
     const char *arch = qtest_get_arch();
@@ -35,6 +45,7 @@ int main(int argc, char *argv[])
         exit(EXIT_FAILURE);
     }
     qtest_add_func("rtas/get-time-of-day", test_rtas_get_time_of_day);
+    qtest_add_func("rtas/get-time-of-day-vof", test_rtas_get_time_of_day_vof);
 
     return g_test_run();
 }
diff --git a/MAINTAINERS b/MAINTAINERS
index eab178aeee5e..eaba3cf2a5d4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1346,6 +1346,18 @@ F: hw/pci-host/mv64361.c
 F: hw/pci-host/mv643xx.h
 F: include/hw/pci-host/mv64361.h
 
+Virtual Open Firmware (VOF)
+M: Alexey Kardashevskiy <aik@ozlabs.ru>
+M: David Gibson <david@gibson.dropbear.id.au>
+M: Greg Kurz <groug@kaod.org>
+L: qemu-ppc@nongnu.org
+S: Maintained
+F: hw/ppc/spapr_vof*
+F: hw/ppc/vof*
+F: include/hw/ppc/vof*
+F: pc-bios/vof/*
+F: pc-bios/vof*
+
 RISC-V Machines
 ---------------
 OpenTitan
diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak
index ae0841fa3a18..9fb201dfacfa 100644
--- a/default-configs/devices/ppc64-softmmu.mak
+++ b/default-configs/devices/ppc64-softmmu.mak
@@ -9,3 +9,4 @@ CONFIG_POWERNV=y
 # For pSeries
 CONFIG_PSERIES=y
 CONFIG_NVDIMM=y
+CONFIG_VOF=y
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index e51e0e5e5ac6..964510dfc73d 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -143,3 +143,6 @@ config FW_CFG_PPC
 
 config FDT_PPC
     bool
+
+config VOF
+    bool
diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index 597d974dd4ff..aa4c8e6a2eac 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -84,4 +84,7 @@ ppc_ss.add(when: 'CONFIG_VIRTEX', if_true: files('virtex_ml507.c'))
 # Pegasos2
 ppc_ss.add(when: 'CONFIG_PEGASOS2', if_true: files('pegasos2.c'))
 
+ppc_ss.add(when: 'CONFIG_VOF', if_true: files('vof.c'))
+ppc_ss.add(when: ['CONFIG_VOF', 'CONFIG_PSERIES'], if_true:
files('spapr_vof.c')) + hw_arch += {'ppc': ppc_ss} diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events index b4bbfbb01348..78fc2b5d39d1 100644 --- a/hw/ppc/trace-events +++ b/hw/ppc/trace-events @@ -71,6 +71,30 @@ spapr_rtas_ibm_configure_connector_invalid(uint32_t index) "DRC index: 0x%"PRIx3 spapr_vio_h_reg_crq(uint64_t reg, uint64_t queue_addr, uint64_t queue_len) "CRQ for dev 0x%" PRIx64 " registered at 0x%" PRIx64 "/0x%" PRIx64 spapr_vio_free_crq(uint32_t reg) "CRQ for dev 0x%" PRIx32 " freed" +# vof.c +vof_error_str_truncated(const char *s, int len) "%s truncated to %d" +vof_error_param(const char *method, int nargscheck, int nretcheck, int nargs, int nret) "%s takes/returns %d/%d, not %d/%d" +vof_error_unknown_service(const char *service, int nargs, int nret) "\"%s\" args=%d rets=%d" +vof_error_unknown_method(const char *method) "\"%s\"" +vof_error_unknown_ihandle_close(uint32_t ih) "ih=0x%x" +vof_error_unknown_path(const char *path) "\"%s\"" +vof_error_write(uint32_t ih) "ih=0x%x" +vof_finddevice(const char *path, uint32_t ph) "\"%s\" => ph=0x%x" +vof_claim(uint32_t virt, uint32_t size, uint32_t align, uint32_t ret) "virt=0x%x size=0x%x align=0x%x => 0x%x" +vof_release(uint32_t virt, uint32_t size, uint32_t ret) "virt=0x%x size=0x%x => 0x%x" +vof_method(uint32_t ihandle, const char *method, uint32_t param, uint32_t ret, uint32_t ret2) "ih=0x%x \"%s\"(0x%x) => 0x%x 0x%x" +vof_getprop(uint32_t ph, const char *prop, uint32_t ret, const char *val) "ph=0x%x \"%s\" => len=%d [%s]" +vof_getproplen(uint32_t ph, const char *prop, uint32_t ret) "ph=0x%x \"%s\" => len=%d" +vof_setprop(uint32_t ph, const char *prop, const char *val, uint32_t vallen, uint32_t ret) "ph=0x%x \"%s\" [%s] len=%d => ret=%d" +vof_open(const char *path, uint32_t ph, uint32_t ih) "%s ph=0x%x => ih=0x%x" +vof_interpret(const char *cmd, uint32_t param1, uint32_t param2, uint32_t ret, uint32_t ret2) "[%s] 0x%x 0x%x => 0x%x 0x%x" +vof_package_to_path(uint32_t ph, const char *tmp, uint32_t ret) 
"ph=0x%x => %s len=%d" +vof_instance_to_path(uint32_t ih, uint32_t ph, const char *tmp, uint32_t ret) "ih=0x%x ph=0x%x => %s len=%d" +vof_instance_to_package(uint32_t ih, uint32_t ph) "ih=0x%x => ph=0x%x" +vof_write(uint32_t ih, unsigned cb, const char *msg) "ih=0x%x [%u] \"%s\"" +vof_avail(uint64_t start, uint64_t end, uint64_t size) "0x%"PRIx64"..0x%"PRIx64" size=0x%"PRIx64 +vof_claimed(uint64_t start, uint64_t end, uint64_t size) "0x%"PRIx64"..0x%"PRIx64" size=0x%"PRIx64 + # ppc.c ppc_tb_adjust(uint64_t offs1, uint64_t offs2, int64_t diff, int64_t seconds) "adjusted from 0x%"PRIx64" to 0x%"PRIx64", diff %"PRId64" (%"PRId64"s)" diff --git a/pc-bios/README b/pc-bios/README index c101c9a04f8f..6e6556e91c92 100644 --- a/pc-bios/README +++ b/pc-bios/README @@ -16,6 +16,8 @@ https://github.com/aik/SLOF, and the image currently in qemu is built from git tag qemu-slof-20210217. +- vof is a minimalistic firmware to work with -machine pseries,x-vof=on. + - sgabios (the Serial Graphics Adapter option ROM) provides a means for legacy x86 software to communicate with an attached serial console as if a video card were attached. 
The master sources reside in a subversion diff --git a/pc-bios/vof-nvram.bin b/pc-bios/vof-nvram.bin new file mode 100644 index 0000000000000000000000000000000000000000..d183901cf980a91d81c4348bb20487c7bb62a2ec GIT binary patch literal 16384 zcmeI%Jx;?g6bEpZJ8*)oSZeqZi&Z2pKnD)sI4{AHlNb4;RW}a70XPHaW57uo=-#R7 zKSLBhJJ0sdixY3IuY@hzo0r$OmE%T;XE9uh@s1k=AOHafKmY;|fB*y_009U<00Izz z00bZa0SG_<0uX=z1Rwwb2tWV=XCbip6d#B4{{rX#XR%}$Bm^J;0SG|gWP$!?Aq=-I zcT+0Ix{{?1q>9J8r+eW^JK1tYYZZMWQCUwW%0S*~w^p@wfkX-<yRFx)H*+YEt0RRd zmn}6xtwbP`yp4O=>kxMAEA<~5@*g)@mb%KD5!;O~8c)>8rRQBx55=trhk#+1+T3J_ zaf*G4vZAduqy$qda{``6Gnc2DQg<Es<GLxL#9<Oj*zP!8ZSnwf@-j7l47!nFXQO$a z^Hes6YU^_M<KsM*k~zwOSa+2g3Sx{*Eyu^XrB0FM5IJ-*?8`VvpBc4}vS(+_UKJ;= xITAns0uX=z1Rwwb2tWV=5P-nt34DD||Nni|VfbXeJORuY0uX=z1R!vE0>7B^s4f5i literal 0 HcmV?d00001 diff --git a/pc-bios/vof.bin b/pc-bios/vof.bin new file mode 100755 index 0000000000000000000000000000000000000000..7e4c3742deae3c1904f4b2bf03ef72576b12d532 GIT binary patch literal 3784 zcmeHIL1-Lh6n>lC=%gLW9QLr#l}v1e-6f$B_K?xg-PA2?k`e+EC^WLWW>>SAWMeYQ zC_C;<3N|UNY-k00(DV=%JqR9p&{HcE3D`ppJ*-e0y%l=#Xwf=;Z|0vS*)$~>4}u-| znVJ88^S<}K_ue-||L<!cO_V?R31x|H`_EUpociToX{vU=t&yd+OL<UKyB};kz+QhB z&5c^5R*OfC0UYEfe{0VWz1Xrx{vJLJ!{0AesjX2DQ1|8UDFWZoYU;3Ya+TU^>urtH zJqo;f?^3v4BdNYha{UG=R*ht9l@$AgMdnK*hMgkGj0YQ|R;Y{P(Q2exhbdH*f{k-O zI=`QL;QVP8&KexY{_rlYRm(?>l@!UEN`$*qT|UO|Vezh59LFYw5sQRa<Sd0AS|3@V zECs*XM+_Sx=Ol8DA?KdmIbVz+=ZKGo93ys|$m||5fP<Q}f66iTbPx?La-5EXI^7j^ zvamxu?K4lXyAB@`Vts>o)l#JN-=q+8X?<bIoD=6^=Uk2z#;qR8rqkAA99`gqnIqJ} zwGrPlC*Y%fMc9PkCo&#aUNIBHVJ!|HX>dA3<ey_Nz&7?Uh3!sZ8}n@0jyW)ojToqF z-@)p7;ST2uT#5aSIDdVxog2h0jW~}^?W2BuqJDQ!zjf5__U`&!m;%-r^y~zBHecVQ z>AFL6E1jf+Gh-!3)%3^I-As|y%+XS_M)l?@eb&4|*P5I3Wz(T=e(I!e5$hS3;}lKc zuHZe7JNIGmYKqd#4eo=eVH5iq-1$5!`UZ8R;xg=F9rXO5P2J!*%kNqT1;(Z4PF(Gz zv|k(8vj2=WL?OhNvAr5ee6DqZGp{C!sGY=h1C|7z%;^Mrga$3=$~NpI&L(gsaqV3{ zuoZovZ0mzoUwPr{7V?q3TK(}g=g$3}wn=5YR)5^5$U$O!(JvbZ>zAc9?w9^+MEw>K zW+rm_hnE-8H-S9;8#=R;VgJw(#s~l-0F0=I@#u5zO_9d|_3Zpz#_{zNy%+|6JZp;O 
zKufpqL%>%B{tNrWzbR%RgIVauEDY?Ph4U8hW5DmhEc5|C2^<2CKXY8DG3$>a^dscW zp%Z6(^sSO!q#pE<V%I2aPMu}F=l7&}>UD#4&J4bWS$_B(*Iv{m{gk?I_<9R$Utf7n zqVT)bhbf$S{T;!6WEs8X_THky8tdu@>rv~!1-{+?m(Mr+CfR=3n<bh3xLUf1Hr|OR z3q0Ai7kJQij?4W{;9vLi^zF=q`ww(&%=p-t-!m<-?Z_Loh@uueM^Vf$V@7zE;hQx= z))MY)yRooVK)0mjVQ7t)hpG)Z&Pt9{_y1!%8~nY%ZRKPyTo=#B2ksc2ckzV$IYx<j zYt7T=_OT8jk8K>b#<;dk7xoup&gDKL{(W`Cp@N$z>lg0sJ~Wm4QPtv(!+U(@YU0+O zYtI}DT~rWvpT{vc^thL_4m|JVSP{7>uw`9Vg(j}!d)Cw){O$Jr)cWn|oBjTiIwB@- zeujXXbN>m^@nQU5p4>0Zy>K9_&#q?-{3jVmBogXqMSZhU(N(=L|8_}Nv2}<h3)9oR zGbMehQl#?if?k-_%LTpEJFgcOIFaMxfp5JF<qu2PD;Z^-&)zD}7K^3#%ahoI!@2p& z937vWsw|Xd>D|)Idp(TS`%Y<owlvN5L;;!Th=LWvN@8L3j*MYHk98W)6Iey8IV=Oq F`Ws_KRUH5T literal 0 HcmV?d00001 diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S new file mode 100644 index 000000000000..569688714c91 --- /dev/null +++ b/pc-bios/vof/entry.S @@ -0,0 +1,51 @@ +#define LOAD32(rn, name) \ + lis rn,name##@h; \ + ori rn,rn,name##@l + +#define ENTRY(func_name) \ + .text; \ + .align 2; \ + .globl .func_name; \ + .func_name: \ + .globl func_name; \ + func_name: + +#define KVMPPC_HCALL_BASE 0xf000 +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0) +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5) + + . = 0x100 /* Do exactly as SLOF does */ + +ENTRY(_start) +# LOAD32(%r31, 0) /* Go 32bit mode */ +# mtmsrd %r31,0 + LOAD32(2, __toc_start) + b entry_c + +ENTRY(_prom_entry) + LOAD32(2, __toc_start) + stwu %r1,-112(%r1) + stw %r31,104(%r1) + mflr %r31 + bl prom_entry + nop + mtlr %r31 + ld %r31,104(%r1) + addi %r1,%r1,112 + blr + +ENTRY(ci_entry) + mr 4,3 + LOAD32(3,KVMPPC_H_VOF_CLIENT) + sc 1 + blr + +/* This is the actual RTAS blob copied to the OS at instantiate-rtas */ +ENTRY(hv_rtas) + mr %r4,%r3 + LOAD32(3,KVMPPC_H_RTAS) + sc 1 + blr + .globl hv_rtas_size +hv_rtas_size: + .long . 
- hv_rtas;
diff --git a/pc-bios/vof/l.lds b/pc-bios/vof/l.lds
new file mode 100644
index 000000000000..10b557a81f78
--- /dev/null
+++ b/pc-bios/vof/l.lds
@@ -0,0 +1,48 @@
+OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc")
+OUTPUT_ARCH(powerpc:common)
+
+/* set the entry point */
+ENTRY ( __start )
+
+SECTIONS {
+    __executable_start = .;
+
+    .text : {
+        *(.text)
+    }
+
+    __etext = .;
+
+    . = ALIGN(8);
+
+    .data : {
+        *(.data)
+        *(.rodata .rodata.*)
+        *(.got1)
+        *(.sdata)
+        *(.opd)
+    }
+
+    /* FIXME bss at end ??? */
+
+    . = ALIGN(8);
+    __bss_start = .;
+    .bss : {
+        *(.sbss) *(.scommon)
+        *(.dynbss)
+        *(.bss)
+    }
+
+    . = ALIGN(8);
+    __bss_end = .;
+    __bss_size = (__bss_end - __bss_start);
+
+    . = ALIGN(256);
+    __toc_start = DEFINED (.TOC.) ? .TOC. : ADDR (.got) + 0x8000;
+    .got :
+    {
+        *(.toc .got)
+    }
+    . = ALIGN(8);
+    __toc_end = .;
+}
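The argument array that vof_client_call() reads and writes in the diff above can be illustrated outside of QEMU. The following is only a sketch under stated assumptions: the helper names (load_be32, store_be32, prom_args_decode, prom_args_store_rets) and PROM_MAX_CELLS are made up for this example and are not part of the patch, and the sketch bounds nargs + nret together, which is this example's choice rather than a description of what the patch checks.

```c
#include <stdint.h>

#define PROM_MAX_CELLS 10   /* hypothetical name; matches args[10] in the patch */

/* Client interface argument array; every cell is big-endian in guest memory */
struct prom_args_be {
    uint32_t service;
    uint32_t nargs;
    uint32_t nret;
    uint32_t args[PROM_MAX_CELLS];
};

/* Read one big-endian cell regardless of host endianness */
static uint32_t load_be32(const uint32_t *cell)
{
    const uint8_t *p = (const uint8_t *)cell;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write one big-endian cell regardless of host endianness */
static void store_be32(uint32_t *cell, uint32_t val)
{
    uint8_t *p = (uint8_t *)cell;

    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)val;
}

/*
 * Copy the input cells into host byte order.
 * Returns the number of inputs, or -1 if inputs and outputs together
 * would not fit into the array.
 */
static int prom_args_decode(const struct prom_args_be *be, uint32_t *args)
{
    uint32_t nargs = load_be32(&be->nargs);
    uint32_t nret = load_be32(&be->nret);
    uint32_t i;

    if (nargs > PROM_MAX_CELLS || nret > PROM_MAX_CELLS - nargs) {
        return -1;
    }
    for (i = 0; i < nargs; ++i) {
        args[i] = load_be32(&be->args[i]);
    }
    return (int)nargs;
}

/* Results are stored in the cells immediately following the inputs */
static int prom_args_store_rets(struct prom_args_be *be, const uint32_t *rets)
{
    uint32_t nargs = load_be32(&be->nargs);
    uint32_t nret = load_be32(&be->nret);
    uint32_t i;

    if (nargs > PROM_MAX_CELLS || nret > PROM_MAX_CELLS - nargs) {
        return -1;
    }
    for (i = 0; i < nret; ++i) {
        store_be32(&be->args[nargs + i], rets[i]);
    }
    return 0;
}
```

A caller would decode the inputs, run the requested service, then store the results; the client reads its first return value from args[nargs].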
The PAPR platform describes an OS environment that's presented by a combination of a hypervisor and firmware. The features it specifies require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has been implemented as a 20 byte shim which simply forwards each call to a hypercall implemented in qemu. The boot time firmware component is SLOF - but a build that's specific to qemu, and has always needed to be updated in sync with it. Even though we've managed to limit the amount of runtime communication we need between qemu and SLOF, there's some, and it has become increasingly awkward to handle as we've implemented new features.

This implements a boot time OF client interface (CI) which is enabled by a new "x-vof" pseries machine option (stands for "Virtual Open Firmware"). When enabled, QEMU implements the custom H_OF_CLIENT hcall which implements Open Firmware Client Interface (OF CI). This allows using a smaller stateless firmware which does not have to manage the device tree.

The new "vof.bin" firmware image is included with source code under pc-bios/. It also includes the RTAS blob.

This implements a handful of CI methods just to get -kernel/-initrd working. In particular, this implements device tree fetching and a simple memory allocator - "claim" (an OF CI memory allocator) - and updates "/memory@0/available" to tell the client which memory is available.

This implements changing some device tree properties which we know how to deal with; the rest is ignored. To allow changes, this skips fdt_pack() when x-vof=on as not packing the blob leaves some room for appending.

In the absence of SLOF, this assigns phandles to device tree nodes to make device tree traversal work.

When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.

This adds basic support for instances, which are managed by a hash map ihandle -> [phandle].
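To illustrate the "claim" semantics described above, here is a minimal standalone sketch. It is not the code from the patch: the struct and function names (struct claimer, claim_range, range_free) and the fixed-size MAX_CLAIMS table are assumptions made for this example. It also differs from vof_claim() in two labeled ways: it re-aligns the candidate address after every collision, and it checks fixed-address requests against the top of memory.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CLAIMS   32            /* hypothetical fixed table size */
#define CLAIM_FAILED UINT64_MAX    /* stands in for the -1 returned to the client */

struct claim {
    uint64_t start;
    uint64_t size;
};

struct claimer {
    struct claim claimed[MAX_CLAIMS];
    unsigned count;
    uint64_t base;   /* where the next aligned search starts */
    uint64_t top;    /* exclusive upper bound, assumed to be at most 4GiB */
};

static bool overlaps(uint64_t a, uint64_t asize, uint64_t b, uint64_t bsize)
{
    return a < b + bsize && b < a + asize;
}

static bool range_free(const struct claimer *c, uint64_t start, uint64_t size)
{
    unsigned i;

    for (i = 0; i < c->count; ++i) {
        if (overlaps(c->claimed[i].start, c->claimed[i].size, start, size)) {
            return false;
        }
    }
    return true;
}

static uint64_t align_up(uint64_t val, uint64_t align)
{
    return (val + align - 1) / align * align;
}

/*
 * align == 0: claim exactly [virt, virt + size).
 * align != 0: ignore virt and pick the first free aligned range at or
 * above c->base.
 */
static uint64_t claim_range(struct claimer *c, uint64_t virt, uint64_t size,
                            uint64_t align)
{
    uint64_t start;

    if (size == 0 || c->count == MAX_CLAIMS) {
        return CLAIM_FAILED;
    }

    if (align == 0) {
        /* The bound check against top is this sketch's own addition */
        if (virt > c->top || size > c->top - virt ||
            !range_free(c, virt, size)) {
            return CLAIM_FAILED;
        }
        start = virt;
    } else {
        for (start = align_up(c->base, align); ;
             start = align_up(start + size, align)) {
            if (start >= c->top || size > c->top - start) {
                return CLAIM_FAILED;   /* out of memory below top */
            }
            if (range_free(c, start, size)) {
                break;
            }
        }
    }

    c->claimed[c->count].start = start;
    c->claimed[c->count].size = size;
    c->count++;
    if (start + size > c->base) {
        c->base = start + size;
    }
    return start;
}
```

With an empty table, claiming the firmware at address 0 and the stack at 8000 (hex) by fixed address, then asking for a page-aligned block, would return the first aligned address past the highest claim so far.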
Before the guest starts, the used memory is:
0..e60 - the initial firmware
8000..10000 - stack
400000.. - kernel
3ea0000.. - initramdisk

This OF CI does not implement "interpret".

Unlike SLOF, this does not format uninitialized nvram. Instead, this includes a disk image with pre-formatted nvram.

With this basic support, this can only boot into the kernel directly. However this is just enough for the petitboot kernel and initramdisk to boot from any possible source. Note this requires a reasonably recent guest kernel with:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735

The immediate benefit is much faster booting time, which is especially crucial with fully emulated early CPU bring-up environments. Also this may come in handy when/if GRUB-in-the-userspace sees the light of day.

This separates VOF and sPAPR in the hope that VOF bits may be reused by other POWERPC boards which do not support pSeries.

This is coded on the assumption that later on we might add support for booting from QEMU backends (blockdev is the first candidate) without devices/drivers in between, as OF1275 does not require that and it is quite easy to do so.
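The memory layout above can be reproduced with a small "claim" style allocator. The following is a sketch only, not the code from hw/ppc/vof.c: it follows the IEEE 1275 convention that an alignment of 0 requests the exact address `virt`, while a non-zero alignment asks the allocator to pick the lowest suitably aligned free address. The names `struct allocator`, `claim`, `available`, `MAX_CLAIMS` and `CLAIM_FAIL` are hypothetical. The gaps reported by `available()` correspond to what an "available" property would describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLAIMS 16                /* assumption: small fixed table */
#define CLAIM_FAIL UINT64_MAX

struct range {
    uint64_t start, size;
};

struct allocator {
    uint64_t top;                      /* highest claimable address, exclusive */
    size_t n;                          /* number of claimed ranges */
    struct range claimed[MAX_CLAIMS];  /* sorted by start, non-overlapping */
};

static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) / align * align;
}

static int overlaps(const struct allocator *a, uint64_t start, uint64_t size)
{
    for (size_t i = 0; i < a->n; i++) {
        const struct range *r = &a->claimed[i];
        if (start < r->start + r->size && r->start < start + size) {
            return 1;
        }
    }
    return 0;
}

static void insert_sorted(struct allocator *a, uint64_t start, uint64_t size)
{
    assert(a->n < MAX_CLAIMS);
    size_t i = a->n++;
    while (i > 0 && a->claimed[i - 1].start > start) {
        a->claimed[i] = a->claimed[i - 1];
        i--;
    }
    a->claimed[i] = (struct range){ start, size };
}

/* align == 0: claim exactly at virt; otherwise first fit, virt is ignored. */
uint64_t claim(struct allocator *a, uint64_t virt, uint64_t size,
               uint64_t align)
{
    if (size == 0 || a->n == MAX_CLAIMS) {
        return CLAIM_FAIL;
    }
    if (align == 0) {
        if (virt > a->top || size > a->top - virt || overlaps(a, virt, size)) {
            return CLAIM_FAIL;
        }
        insert_sorted(a, virt, size);
        return virt;
    }
    uint64_t pos = 0;                /* end of the previous claimed range */
    for (size_t i = 0; i <= a->n; i++) {
        uint64_t limit = i < a->n ? a->claimed[i].start : a->top;
        uint64_t cand = align_up(pos, align);
        if (cand <= limit && size <= limit - cand) {
            insert_sorted(a, cand, size);
            return cand;
        }
        if (i < a->n) {
            pos = a->claimed[i].start + a->claimed[i].size;
        }
    }
    return CLAIM_FAIL;
}

/* Fill out[] with the free gaps below top; returns how many were written. */
size_t available(const struct allocator *a, struct range *out, size_t max)
{
    size_t n = 0;
    uint64_t pos = 0;
    for (size_t i = 0; i <= a->n; i++) {
        uint64_t limit = i < a->n ? a->claimed[i].start : a->top;
        if (limit > pos && n < max) {
            out[n++] = (struct range){ pos, limit - pos };
        }
        if (i < a->n) {
            pos = a->claimed[i].start + a->claimed[i].size;
        }
    }
    return n;
}
```

For example, claiming 0xe60 bytes at address 0 for the firmware and then asking for a 0x8000 byte stack aligned to its own size yields 0x8000, which matches the stack range listed above. The kernel and initramdisk sizes are not given in the list, so any sizes used to exercise the sketch are made up.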
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

The example command line is:

/home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-enable-kvm \
-m 8G \
-machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
-kernel pbuild/kernel-le-guest/vmlinux \
-initrd pb/rootfs.cpio.xz \
-drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
-global spapr-nvram.drive=DRIVE0 \
-snapshot \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
-mon chardev=SOCKET0,mode=control

---
Changes:
v20:
* compile vof.bin with -mcpu=power4 for better compatibility
* s/std/stw/ in entry.S to make it work on ppc32
* fixed dt_available property to support both 32 and 64bit
* shuffled prom_args handling code
* do not enforce 32bit in MSR (again, to support 32bit platforms)

v19:
* put bootargs in the FDT
* moved setting properties to a VOF machine hook
* moved fw_size and claim for it to vof_init()
* added CROSS to the VOF's makefile
* simplified phandles assigning
* pass MachineState to all machine hooks instead of calling qdev_get_machine (following QOM)
* bunch of smaller changes and added comments
* added simple test to attempt to start with x-vof=on

v18:
* fixed top addr (max address for "claim") on radix - it equals to ram_size and vof->top_addr was uint32_t
* fixed "available" property which got broken in v14 but it is only visible to clients which care (== grub)
* reshuffled vof_dt_memory_available() calls, added vof_init() to allow vof_claim() before rendering the FDT

v17:
* mv hw/ppc/vof.h include/hw/ppc/vof.h
* VofMachineIfClass -> VofMachineClass; it is not VofMachineInterface as nobody used this scheme, usually "Interface" is dropped, a couple of times it is "xxxInterfaceClass" or "xxxIfClass", I used the latter as it is used by include/hw/vmstate-if.h
* added SPDX
* other fixes from v16 review

v16:
* rebased on dwg/ppc-for-6.1
* s/SpaprVofInterface/VofMachineInterface/

v15:
* bugfix: claimed memory for the VOF itself
* ditched OF_STACK_ADDR and allocate one instead, now it starts from 0x8000 because it is aligned to its size (no particular reason though)
* coding style
* moved nvram.bin up one level
* ditched bool in the firmware
* made debugging code conditional using trace_event_get_state() + qemu_loglevel_mask()
* renamed the CAS interface to SpaprVofInterface
* added "write" which for now dumps the message and ihandle via trace point for early debug assistance
* commented on when we allocate of_instances in vof_build_dt()
* store fw_size in SpaprMachine to let spapr_vof_reset() claim it
* many small fixes from v14's review

v14:
* check for truncates in readstr()
* ditched a separate vof_reset()
* spapr->vof is a pointer now, dropped the "on" field
* removed rtas_base from vof and updated comment why we allow setting it
* added myself to maintainers
* updated commit log about blockdev and other possible platforms
* added a note why new hcall is 0x5
* no in place endianness conversion in spapr_h_vof_client
* converted all cpu_physical_memory_read/write to address_space_rw
* git mv hw/ppc/spapr_vof_client.c hw/ppc/spapr_vof.c

v13:
* rebase on latest ppc-for-6.0
* shuffled code around to touch spapr.c less

v12:
* split VOF and SPAPR

v11:
* added g_autofree
* fixed gcc warnings
* fixed few leaks
* added nvram image to make "nvram --print-config" not crash; Note that contrary to MIN_NVRAM_SIZE (8 * KiB), the actual minimum size is 16K, or it just does not work (empty output from "nvram")

v10:
* now rebased to compile with meson

v9:
* remove special handling of /rtas/rtas-size as now we always add it in QEMU
* removed leftovers from scsi/grub/stdout/stdin/...

v8:
* no read/write/seek
* no @dev in instances
* the machine flag is "x-vof" for now

v7:
* now we have a small firmware which loads at 0 as SLOF and starts from 0x100 as SLOF
* no MBR/ELF/GRUB business in QEMU anymore
* blockdev is a separate patch
* networking is a separate patch

v6:
* borrowed a big chunk of commit log introduction from David
* fixed initial stack pointer (points to the highest address of stack)
* traces for "interpret" and others
* disabled translate_kernel_address() hack so grub can load (work in progress)
* added "milliseconds" for grub
* fixed "claim" allocator again
* moved FDT_MAX_SIZE to spapr.h as spapr_of_client.c wants it too for CAS
* moved the most code possible from spapr.c to spapr_of_client.c, such as RTAS, prom entry and FDT build/finalize
* separated blobs
* GRUB now proceeds to its console prompt (there are still other issues)
* parse MBR/GPT to find PReP and load GRUB

v5:
* made instances keep device and chardev pointers
* removed VIO dependencies
* print error if RTAS memory is not claimed as it should have been
* pack FDT as "quiesce"

v4:
* fixed open
* validate ihandles in "call-method"

v3:
* fixed phandles allocation
* s/__be32/uint32_t/ as we do not normally have __be32 type in qemu
* fixed size of /chosen/stdout
* bunch of renames
* do not create rtas properties at all, let the client deal with it; instead setprop allows changing these in the FDT
* no more packing FDT when bios=off - nobody needs it and getprop does not work otherwise
* allow updating initramdisk device tree properties (for zImage)
* added instances
* fixed stdout on OF's "write"
* removed special handling for stdout in OF client, spapr-vty handles it instead

v2:
* fixed claim()
* added "setprop"
* cleaner client interface and RTAS blobs management
* boots to petitboot and further to the target system
* more trace points

v20 v20!
---
 pc-bios/vof/Makefile                      |   23 +
 include/hw/ppc/spapr.h                    |   19 +-
 include/hw/ppc/vof.h                      |   42 +
 pc-bios/vof/vof.h                         |   43 +
 hw/ppc/spapr.c                            |   68 +-
 hw/ppc/spapr_hcall.c                      |   21 +-
 hw/ppc/spapr_vof.c                        |  156 ++++
 hw/ppc/vof.c                              | 1021 +++++++++++++++++++++
 pc-bios/vof/bootmem.c                     |   14 +
 pc-bios/vof/ci.c                          |   91 ++
 pc-bios/vof/libc.c                        |   92 ++
 pc-bios/vof/main.c                        |   21 +
 tests/qtest/rtas-test.c                   |   15 +-
 MAINTAINERS                               |   12 +
 default-configs/devices/ppc64-softmmu.mak |    1 +
 hw/ppc/Kconfig                            |    3 +
 hw/ppc/meson.build                        |    3 +
 hw/ppc/trace-events                       |   24 +
 pc-bios/README                            |    2 +
 pc-bios/vof-nvram.bin                     |  Bin 0 -> 16384 bytes
 pc-bios/vof.bin                           |  Bin 0 -> 3784 bytes
 pc-bios/vof/entry.S                       |   51 +
 pc-bios/vof/l.lds                         |   48 +
 23 files changed, 1759 insertions(+), 11 deletions(-)
 create mode 100644 pc-bios/vof/Makefile
 create mode 100644 include/hw/ppc/vof.h
 create mode 100644 pc-bios/vof/vof.h
 create mode 100644 hw/ppc/spapr_vof.c
 create mode 100644 hw/ppc/vof.c
 create mode 100644 pc-bios/vof/bootmem.c
 create mode 100644 pc-bios/vof/ci.c
 create mode 100644 pc-bios/vof/libc.c
 create mode 100644 pc-bios/vof/main.c
 create mode 100644 pc-bios/vof-nvram.bin
 create mode 100755 pc-bios/vof.bin
 create mode 100644 pc-bios/vof/entry.S
 create mode 100644 pc-bios/vof/l.lds