Message ID | 1300251423-6715-14-git-send-email-david@gibson.dropbear.id.au |
---|---|
State | New |
On 03/16/2011 05:56 AM, David Gibson wrote:
> This patch adds a "pseries" machine to qemu. This aims to emulate a
> logical partition on an IBM pSeries machine, compliant to the
> "PowerPC Architecture Platform Requirements" (PAPR) document.
>
> This initial version is quite limited, it implements a basic machine
> and PAPR hypercall emulation. So far only one hypercall is present -
> H_PUT_TERM_CHAR - so that a (write-only) console is available.
>
> Multiple CPUs are permitted, with SMP entry handled kexec() style.
>
> The machine so far more resembles an old POWER4 style "full system
> partition" rather than a modern LPAR, in that the guest manages the
> page tables directly, rather than via hypercalls.
>
> The machine requires qemu to be configured with --enable-fdt. The
> machine can (so far) only be booted with -kernel - i.e. no partition
> firmware is provided.
>
> Signed-off-by: David Gibson <dwg@au1.ibm.com>
> ---
>  Makefile.target | 2 +
>  hw/spapr.c | 314 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/spapr.h | 246 ++++++++++++++++++++++++++++++++++++++++++
>  hw/spapr_hcall.c | 43 ++++++++
>  4 files changed, 605 insertions(+), 0 deletions(-)
>  create mode 100644 hw/spapr.c
>  create mode 100644 hw/spapr.h
>  create mode 100644 hw/spapr_hcall.c
>
> diff --git a/Makefile.target b/Makefile.target
> index f0df98e..e6a7557 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o
>  obj-ppc-y += ppc_oldworld.o
>  # NewWorld PowerMac
>  obj-ppc-y += ppc_newworld.o
> +# IBM pSeries (sPAPR)
> +obj-ppc-y += spapr.o spapr_hcall.o
>  # PowerPC 4xx boards
>  obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o
>  obj-ppc-y += ppc440.o ppc440_bamboo.o
> diff --git a/hw/spapr.c b/hw/spapr.c
> [snip]
> +#endif /* !defined (__HW_SPAPR_H__) */
> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
> new file mode 100644
> index 0000000..6ddac00
> --- /dev/null
> +++ b/hw/spapr_hcall.c

Missing license

> @@ -0,0 +1,43 @@
> +#include "sysemu.h"
> +#include "cpu.h"
> +#include "qemu-char.h"
> +#include "hw/spapr.h"
> +
> +struct hypercall {
> +    spapr_hcall_fn fn;
> +} hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
> +
> +void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn)
> +{
> +    struct hypercall *hc;
> +
> +    assert(opcode <= MAX_HCALL_OPCODE);
> +    assert((opcode & 0x3) == 0);
> +
> +    hc = hypercall_table + (opcode / 4);
> +
> +    assert(!hc->fn || (fn == hc->fn));
> +
> +    hc->fn = fn;
> +}
> +
> +target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
> +                             target_ulong opcode, target_ulong *args)
> +{
> +    if (msr_pr) {
> +        fprintf(stderr, "Hypercall made with MSR=0x" TARGET_FMT_lx "\n",
> +                env->msr);
> +        return H_PRIVILEGE;
> +    }
> +
> +    if ((opcode <= MAX_HCALL_OPCODE)
> +        && ((opcode & 0x3) == 0)) {
> +        struct hypercall *hc = hypercall_table + (opcode / 4);
> +
> +        if (hc->fn)

Braces

> +            return hc->fn(env, spapr, opcode, args);
> +    }
> +
> +    fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode);
> +    return H_FUNCTION;
> +}
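For reference, the "Braces" remark concerns the unbraced body of `if (hc->fn)`; QEMU's CODING_STYLE asks for braces around every conditional block. What follows is a minimal sketch of the same dispatch logic with braces throughout, reduced so that it compiles on its own. It is not the patch's code: `target_ulong` is a stand-in typedef, the handler signature drops the `CPUState` and `sPAPREnvironment` arguments, and `hcall_fn`, `register_hypercall`, `dispatch_hypercall` and `demo_handler` are illustrative names. The opcode and return-code values are the ones defined in the patch's hw/spapr.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in the patch's hw/spapr.h. */
#define H_FUNCTION       (-2)
#define H_PUT_TERM_CHAR  0x58
#define MAX_HCALL_OPCODE 0x2D4

/* Stand-in for QEMU's target_ulong (assumption: 64-bit target). */
typedef uint64_t target_ulong;

/* Simplified handler type; the real one also takes env and spapr. */
typedef target_ulong (*hcall_fn)(target_ulong opcode, target_ulong *args);

/* Opcodes are multiples of 4, so the table is indexed by opcode / 4. */
static hcall_fn hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];

static void register_hypercall(target_ulong opcode, hcall_fn fn)
{
    assert(opcode <= MAX_HCALL_OPCODE);
    assert((opcode & 0x3) == 0);
    /* Re-registering is only allowed with the same handler. */
    assert(!hypercall_table[opcode / 4] || hypercall_table[opcode / 4] == fn);

    hypercall_table[opcode / 4] = fn;
}

static target_ulong dispatch_hypercall(target_ulong opcode, target_ulong *args)
{
    if ((opcode <= MAX_HCALL_OPCODE) && ((opcode & 0x3) == 0)) {
        hcall_fn fn = hypercall_table[opcode / 4];

        if (fn) {
            return fn(opcode, args);
        }
    }

    /* Unknown, misaligned or unregistered opcode. */
    return (target_ulong)H_FUNCTION;
}

/* Hypothetical handler used only to exercise the table. */
static target_ulong demo_handler(target_ulong opcode, target_ulong *args)
{
    (void)opcode;
    return args[0] + 1;
}
```

Registering `demo_handler` for `H_PUT_TERM_CHAR` and dispatching that opcode calls the handler; any other opcode falls through to `H_FUNCTION`.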
On 03/15/2011 11:56 PM, David Gibson wrote: > This patch adds a "pseries" machine to qemu. This aims to emulate a > logical partition on an IBM pSeries machine, compliant to the > "PowerPC Architecture Platform Requirements" (PAPR) document. Can we call the machine 'papr' or at least 'lpar' Technically speaking, System P is the proper name these days, but I think papr or lpar would make a lot more sense to people. > This initial version is quite limited, it implements a basic machine > and PAPR hypercall emulation. So far only one hypercall is present - > H_PUT_TERM_CHAR - so that a (write-only) console is available. > > Multiple CPUs are permitted, with SMP entry handled kexec() style. > > The machine so far more resembles an old POWER4 style "full system > partition" rather than a modern LPAR, in that the guest manages the > page tables directly, rather than via hypercalls. > > The machine requires qemu to be configured with --enable-fdt. The > machine can (so far) only be booted with -kernel - i.e. no partition > firmware is provided. 
> > Signed-off-by: David Gibson<dwg@au1.ibm.com> > --- > Makefile.target | 2 + > hw/spapr.c | 314 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > hw/spapr.h | 246 ++++++++++++++++++++++++++++++++++++++++++ > hw/spapr_hcall.c | 43 ++++++++ > 4 files changed, 605 insertions(+), 0 deletions(-) > create mode 100644 hw/spapr.c > create mode 100644 hw/spapr.h > create mode 100644 hw/spapr_hcall.c > > diff --git a/Makefile.target b/Makefile.target > index f0df98e..e6a7557 100644 > --- a/Makefile.target > +++ b/Makefile.target > @@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o > obj-ppc-y += ppc_oldworld.o > # NewWorld PowerMac > obj-ppc-y += ppc_newworld.o > +# IBM pSeries (sPAPR) > +obj-ppc-y += spapr.o spapr_hcall.o > # PowerPC 4xx boards > obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o > obj-ppc-y += ppc440.o ppc440_bamboo.o > diff --git a/hw/spapr.c b/hw/spapr.c > new file mode 100644 > index 0000000..8b4e16e > --- /dev/null > +++ b/hw/spapr.c > @@ -0,0 +1,314 @@ > +/* > + * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator > + * > + * Copyright (c) 2004-2007 Fabrice Bellard > + * Copyright (c) 2007 Jocelyn Mayer > + * Copyright (c) 2010 David Gibson, IBM Corporation. > + * > + * Permission is hereby granted, free of charge, to any person obtaining a copy > + * of this software and associated documentation files (the "Software"), to deal > + * in the Software without restriction, including without limitation the rights > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > + * copies of the Software, and to permit persons to whom the Software is > + * furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. 
> + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > + * THE SOFTWARE. > + * > + */ > +#include "sysemu.h" > +#include "qemu-char.h" > +#include "hw.h" > +#include "elf.h" > + > +#include "hw/boards.h" > +#include "hw/ppc.h" > +#include "hw/loader.h" > + > +#include "hw/spapr.h" > + > +#include<libfdt.h> > + > +#define KERNEL_LOAD_ADDR 0x00000000 > +#define INITRD_LOAD_ADDR 0x02800000 > +#define FDT_MAX_SIZE 0x10000 > + > +#define TIMEBASE_FREQ 512000000ULL > + > +#define MAX_CPUS 32 > + > +static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize, > + const char *cpu_model, CPUState *envs[], > + sPAPREnvironment *spapr, > + target_phys_addr_t initrd_base, > + target_phys_addr_t initrd_size, > + const char *kernel_cmdline) > +{ > + void *fdt; > + uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) }; > + uint32_t start_prop = cpu_to_be32(initrd_base); > + uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size); > + int i; > + char *modelname; > + > +#define _FDT(exp) \ > + do { \ > + int ret = (exp); \ > + if (ret< 0) { \ > + hw_error("qemu: error creating device tree: %s: %s\n", \ > + #exp, fdt_strerror(ret)); \ > + return NULL; \ > + } \ > + } while (0) I'm not a huge fan of macros like this. It'd be much nicer to use a goto to have common error handling. 
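The goto suggestion above could look roughly like the following. This is a minimal, self-contained sketch and not the patch's code: `fake_step()` is a hypothetical stand-in for the libfdt calls (which return a negative value on error), and `build_tree()`, `BUF_SIZE` and the `fail_at` parameter are illustrative names introduced only to exercise the error path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 0x10000 /* same value as FDT_MAX_SIZE in the patch */

/* Hypothetical stand-in for a libfdt call: 0 on success, negative on error. */
static int fake_step(int should_fail)
{
    return should_fail ? -1 : 0;
}

/*
 * Build a tree in three steps. fail_at selects which step (1..3) fails;
 * 0 means every step succeeds. All failures share one cleanup path.
 */
static void *build_tree(int fail_at)
{
    void *buf;
    const char *what;
    int ret;

    assert(fail_at >= 0 && fail_at <= 3);

    buf = calloc(1, BUF_SIZE);
    if (!buf) {
        return NULL;
    }

    what = "begin_node";
    ret = fake_step(fail_at == 1);
    if (ret < 0) {
        goto fail;
    }

    what = "property";
    ret = fake_step(fail_at == 2);
    if (ret < 0) {
        goto fail;
    }

    what = "end_node";
    ret = fake_step(fail_at == 3);
    if (ret < 0) {
        goto fail;
    }

    return buf;

fail:
    /* Single exit path: report which step failed and release the buffer. */
    fprintf(stderr, "error creating device tree: %s: %d\n", what, ret);
    free(buf);
    return NULL;
}
```

Compared with a macro that returns from the middle of the function, the single `fail:` label gives one obvious place to release anything allocated earlier, at the cost of an explicit check after each call.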
> + > + fdt = qemu_mallocz(FDT_MAX_SIZE); > + _FDT((fdt_create(fdt, FDT_MAX_SIZE))); > + > + _FDT((fdt_finish_reservemap(fdt))); > + > + /* Root node */ > + _FDT((fdt_begin_node(fdt, ""))); > + _FDT((fdt_property_string(fdt, "device_type", "chrp"))); > + _FDT((fdt_property_string(fdt, "model", "qemu,emulated-pSeries-LPAR"))); > + > + _FDT((fdt_property_cell(fdt, "#address-cells", 0x2))); > + _FDT((fdt_property_cell(fdt, "#size-cells", 0x2))); > + > + /* /chosen */ > + _FDT((fdt_begin_node(fdt, "chosen"))); > + > + _FDT((fdt_property_string(fdt, "bootargs", kernel_cmdline))); > + _FDT((fdt_property(fdt, "linux,initrd-start",&start_prop, sizeof(start_prop)))); > + _FDT((fdt_property(fdt, "linux,initrd-end",&end_prop, sizeof(end_prop)))); > + > + _FDT((fdt_end_node(fdt))); > + > + /* memory node */ > + _FDT((fdt_begin_node(fdt, "memory@0"))); > + > + _FDT((fdt_property_string(fdt, "device_type", "memory"))); > + _FDT((fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property)))); > + > + _FDT((fdt_end_node(fdt))); > + > + /* cpus */ > + _FDT((fdt_begin_node(fdt, "cpus"))); > + > + _FDT((fdt_property_cell(fdt, "#address-cells", 0x1))); > + _FDT((fdt_property_cell(fdt, "#size-cells", 0x0))); > + > + modelname = qemu_strdup(cpu_model); > + > + for (i = 0; i< strlen(modelname); i++) { > + modelname[i] = toupper(modelname[i]); > + } > + > + for (i = 0; i< smp_cpus; i++) { > + CPUState *env = envs[i]; > + char *nodename; > + uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40), > + 0xffffffff, 0xffffffff}; > + > + if (asprintf(&nodename, "%s@%x", modelname, i)< 0) { > + fprintf(stderr, "Allocation failure\n"); > + exit(1); > + } asprintf isn't portable and we don't have a portable replacement (yet). I'd suggest using a static size buffer and snprintf(). 
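A sketch of the snprintf() alternative, assuming a fixed upper bound on the node name length. `NODENAME_MAX` and `format_cpu_nodename()` are hypothetical names chosen for illustration; the format string "%s@%x" is the one used in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NODENAME_MAX 64 /* assumed bound, not taken from the patch */

/*
 * Format "<modelname>@<index in hex>" into out.
 * Returns 0 on success, -1 on an encoding error or if the result
 * would not fit in outlen bytes (including the terminating NUL).
 */
static int format_cpu_nodename(char *out, size_t outlen,
                               const char *modelname, unsigned int index)
{
    int n;

    assert(out != NULL && outlen > 0);

    n = snprintf(out, outlen, "%s@%x", modelname, index);
    if (n < 0 || (size_t)n >= outlen) {
        return -1;
    }
    return 0;
}
```

The caller declares `char nodename[NODENAME_MAX];` on the stack, so there is nothing to free afterwards, and truncation is reported instead of silently producing a shortened node name.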
> + > + _FDT((fdt_begin_node(fdt, nodename))); > + > + free(nodename); > + > + _FDT((fdt_property_cell(fdt, "reg", i))); > + _FDT((fdt_property_string(fdt, "device_type", "cpu"))); > + > + _FDT((fdt_property_cell(fdt, "cpu-version", env->spr[SPR_PVR]))); > + _FDT((fdt_property_cell(fdt, "dcache-block-size", env->dcache_line_size))); > + _FDT((fdt_property_cell(fdt, "icache-block-size", env->icache_line_size))); > + _FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ))); > + /* Hardcode CPU frequency for now. It's kind of arbitrary on > + * full emu, for kvm we should copy it from the host */ > + _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000))); > + _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr))); > + _FDT((fdt_property_string(fdt, "status", "okay"))); > + _FDT((fdt_property(fdt, "64-bit", NULL, 0))); > + > + if (envs[i]->mmu_model& POWERPC_MMU_1TSEG) { > + _FDT((fdt_property(fdt, "ibm,processor-segment-sizes", > + segs, sizeof(segs)))); > + } > + > + _FDT((fdt_end_node(fdt))); > + } > + > + qemu_free(modelname); > + > + _FDT((fdt_end_node(fdt))); > + > + _FDT((fdt_end_node(fdt))); /* close root node */ > + _FDT((fdt_finish(fdt))); > + > + if (fdt_size) { > + *fdt_size = fdt_totalsize(fdt); > + } > + > + return fdt; > +} > + > +static uint64_t translate_kernel_address(void *opaque, uint64_t addr) > +{ > + return (addr& 0x0fffffff) + KERNEL_LOAD_ADDR; > +} > + > +static void emulate_spapr_hypercall(CPUState *env, void *opaque) > +{ > + env->gpr[3] = spapr_hypercall(env, (sPAPREnvironment *)opaque, > + env->gpr[3],&env->gpr[4]); > +} > + > +/* FIXME: hack until we implement the proper VIO console */ > +static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr, > + target_ulong opcode, target_ulong *args) > +{ > + uint8_t buf[16]; > + > + stq_p(buf, args[2]); > + stq_p(buf + 8, args[3]); > + > + qemu_chr_write(serial_hds[0], buf, args[1]); > + > + return 0; > +} > + > + > +/* pSeries LPAR / sPAPR hardware init */ > 
+static void ppc_spapr_init(ram_addr_t ram_size, > + const char *boot_device, > + const char *kernel_filename, > + const char *kernel_cmdline, > + const char *initrd_filename, > + const char *cpu_model) > +{ > + CPUState *envs[MAX_CPUS]; > + void *fdt; > + int i; > + ram_addr_t ram_offset; > + target_phys_addr_t fdt_addr; > + uint32_t kernel_base, initrd_base; > + long kernel_size, initrd_size; > + int fdt_size; > + sPAPREnvironment *spapr; > + > + spapr = qemu_malloc(sizeof(*spapr)); > + > + /* We place the device tree just below either the top of RAM, or > + * 2GB, so that it can be processed with 32-bit code if > + * necessary */ > + fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE; > + > + /* init CPUs */ > + if (cpu_model == NULL) { > + cpu_model = "POWER7"; > + } > + for (i = 0; i< smp_cpus; i++) { > + CPUState *env = cpu_init(cpu_model); > + > + if (!env) { > + fprintf(stderr, "Unable to find PowerPC CPU definition\n"); > + exit(1); > + } > + /* Set time-base frequency to 512 MHz */ > + cpu_ppc_tb_init(env, TIMEBASE_FREQ); > + qemu_register_reset((QEMUResetHandler*)&cpu_reset, env); > + > + env->emulate_hypercall = emulate_spapr_hypercall; > + env->hcall_opaque = spapr; > + > + env->hreset_vector = 0x60; > + env->hreset_excp_prefix = 0; > + env->gpr[3] = i; > + > + envs[i] = env; > + } > + > + /* allocate RAM */ > + ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size); > + cpu_register_physical_memory(0, ram_size, ram_offset); > + > + spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char); > + > + if (kernel_filename) { > + uint64_t lowaddr = 0; > + > + kernel_base = KERNEL_LOAD_ADDR; > + > + kernel_size = load_elf(kernel_filename, translate_kernel_address, NULL, > + NULL,&lowaddr, NULL, 1, ELF_MACHINE, 0); > + if (kernel_size< 0) { > + kernel_size = load_image_targphys(kernel_filename, kernel_base, > + ram_size - kernel_base); > + } > + if (kernel_size< 0) { > + hw_error("qemu: could not load kernel '%s'\n", kernel_filename); > + exit(1); > + 
} > + > + /* load initrd */ > + if (initrd_filename) { > + initrd_base = INITRD_LOAD_ADDR; > + initrd_size = load_image_targphys(initrd_filename, initrd_base, > + ram_size - initrd_base); > + if (initrd_size< 0) { > + hw_error("qemu: could not load initial ram disk '%s'\n", > + initrd_filename); > + exit(1); > + } > + } else { > + initrd_base = 0; > + initrd_size = 0; > + } > + > + } else { > + fprintf(stderr, "pSeries machine needs -kernel for now"); > + exit(1); > + } > + > + /* Prepare the device tree */ > + fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, envs, spapr, > + initrd_base, initrd_size, kernel_cmdline); > + if (!fdt) { > + hw_error("Couldn't create pSeries device tree\n"); > + exit(1); > + } > + > + cpu_physical_memory_write(fdt_addr, fdt, fdt_size); > + > + qemu_free(fdt); > + > + envs[0]->gpr[3] = fdt_addr; > + envs[0]->gpr[5] = 0; > + envs[0]->hreset_vector = kernel_base; > +} > + > +static QEMUMachine spapr_machine = { > + .name = "pseries", > + .desc = "pSeries Logical Partition (PAPR compliant)", > + .init = ppc_spapr_init, > + .max_cpus = MAX_CPUS, > + .no_vga = 1, > + .no_parallel = 1, > +}; > + > +static void spapr_machine_init(void) > +{ > + qemu_register_machine(&spapr_machine); > +} > + > +machine_init(spapr_machine_init); > diff --git a/hw/spapr.h b/hw/spapr.h > new file mode 100644 > index 0000000..9e63a19 > --- /dev/null > +++ b/hw/spapr.h > @@ -0,0 +1,246 @@ This needs a copyright of some sort. 
> +#if !defined (__HW_SPAPR_H__) > +#define __HW_SPAPR_H__ > + > +typedef struct sPAPREnvironment { > +} sPAPREnvironment; > + > +#define H_SUCCESS 0 > +#define H_BUSY 1 /* Hardware busy -- retry later */ > +#define H_CLOSED 2 /* Resource closed */ > +#define H_NOT_AVAILABLE 3 > +#define H_CONSTRAINED 4 /* Resource request constrained to max allowed */ > +#define H_PARTIAL 5 > +#define H_IN_PROGRESS 14 /* Kind of like busy */ > +#define H_PAGE_REGISTERED 15 > +#define H_PARTIAL_STORE 16 > +#define H_PENDING 17 /* returned from H_POLL_PENDING */ > +#define H_CONTINUE 18 /* Returned from H_Join on success */ > +#define H_LONG_BUSY_START_RANGE 9900 /* Start of long busy range */ > +#define H_LONG_BUSY_ORDER_1_MSEC 9900 /* Long busy, hint that 1msec \ > + is a good time to retry */ > +#define H_LONG_BUSY_ORDER_10_MSEC 9901 /* Long busy, hint that 10msec \ > + is a good time to retry */ > +#define H_LONG_BUSY_ORDER_100_MSEC 9902 /* Long busy, hint that 100msec \ > + is a good time to retry */ > +#define H_LONG_BUSY_ORDER_1_SEC 9903 /* Long busy, hint that 1sec \ > + is a good time to retry */ > +#define H_LONG_BUSY_ORDER_10_SEC 9904 /* Long busy, hint that 10sec \ > + is a good time to retry */ > +#define H_LONG_BUSY_ORDER_100_SEC 9905 /* Long busy, hint that 100sec \ > + is a good time to retry */ > +#define H_LONG_BUSY_END_RANGE 9905 /* End of long busy range */ > +#define H_HARDWARE -1 /* Hardware error */ > +#define H_FUNCTION -2 /* Function not supported */ > +#define H_PRIVILEGE -3 /* Caller not privileged */ > +#define H_PARAMETER -4 /* Parameter invalid, out-of-range or conflicting */ > +#define H_BAD_MODE -5 /* Illegal msr value */ > +#define H_PTEG_FULL -6 /* PTEG is full */ > +#define H_NOT_FOUND -7 /* PTE was not found" */ > +#define H_RESERVED_DABR -8 /* DABR address is reserved by the hypervisor on this processor" */ > +#define H_NO_MEM -9 > +#define H_AUTHORITY -10 > +#define H_PERMISSION -11 > +#define H_DROPPED -12 > +#define H_SOURCE_PARM -13 > 
+#define H_DEST_PARM -14 > +#define H_REMOTE_PARM -15 > +#define H_RESOURCE -16 > +#define H_ADAPTER_PARM -17 > +#define H_RH_PARM -18 > +#define H_RCQ_PARM -19 > +#define H_SCQ_PARM -20 > +#define H_EQ_PARM -21 > +#define H_RT_PARM -22 > +#define H_ST_PARM -23 > +#define H_SIGT_PARM -24 > +#define H_TOKEN_PARM -25 > +#define H_MLENGTH_PARM -27 > +#define H_MEM_PARM -28 > +#define H_MEM_ACCESS_PARM -29 > +#define H_ATTR_PARM -30 > +#define H_PORT_PARM -31 > +#define H_MCG_PARM -32 > +#define H_VL_PARM -33 > +#define H_TSIZE_PARM -34 > +#define H_TRACE_PARM -35 > + > +#define H_MASK_PARM -37 > +#define H_MCG_FULL -38 > +#define H_ALIAS_EXIST -39 > +#define H_P_COUNTER -40 > +#define H_TABLE_FULL -41 > +#define H_ALT_TABLE -42 > +#define H_MR_CONDITION -43 > +#define H_NOT_ENOUGH_RESOURCES -44 > +#define H_R_STATE -45 > +#define H_RESCINDEND -46 > +#define H_MULTI_THREADS_ACTIVE -9005 > + > + > +/* Long Busy is a condition that can be returned by the firmware > + * when a call cannot be completed now, but the identical call > + * should be retried later. This prevents calls blocking in the > + * firmware for long periods of time. Annoyingly the firmware can return > + * a range of return codes, hinting at how long we should wait before > + * retrying. 
If you don't care for the hint, the macro below is a good > + * way to check for the long_busy return codes > + */ > +#define H_IS_LONG_BUSY(x) ((x>= H_LONG_BUSY_START_RANGE) \ > +&& (x<= H_LONG_BUSY_END_RANGE)) > + > +/* Flags */ > +#define H_LARGE_PAGE (1ULL<<(63-16)) > +#define H_EXACT (1ULL<<(63-24)) /* Use exact PTE or return H_PTEG_FULL */ > +#define H_R_XLATE (1ULL<<(63-25)) /* include a valid logical page num in the pte if the valid bit is set */ > +#define H_READ_4 (1ULL<<(63-26)) /* Return 4 PTEs */ > +#define H_PAGE_STATE_CHANGE (1ULL<<(63-28)) > +#define H_PAGE_UNUSED ((1ULL<<(63-29)) | (1ULL<<(63-30))) > +#define H_PAGE_SET_UNUSED (H_PAGE_STATE_CHANGE | H_PAGE_UNUSED) > +#define H_PAGE_SET_LOANED (H_PAGE_SET_UNUSED | (1ULL<<(63-31))) > +#define H_PAGE_SET_ACTIVE H_PAGE_STATE_CHANGE > +#define H_AVPN (1ULL<<(63-32)) /* An avpn is provided as a sanity test */ > +#define H_ANDCOND (1ULL<<(63-33)) > +#define H_ICACHE_INVALIDATE (1ULL<<(63-40)) /* icbi, etc. (ignored for IO pages) */ > +#define H_ICACHE_SYNCHRONIZE (1ULL<<(63-41)) /* dcbst, icbi, etc (ignored for IO pages */ > +#define H_ZERO_PAGE (1ULL<<(63-48)) /* zero the page before mapping (ignored for IO pages) */ > +#define H_COPY_PAGE (1ULL<<(63-49)) > +#define H_N (1ULL<<(63-61)) > +#define H_PP1 (1ULL<<(63-62)) > +#define H_PP2 (1ULL<<(63-63)) > + > +/* VASI States */ > +#define H_VASI_INVALID 0 > +#define H_VASI_ENABLED 1 > +#define H_VASI_ABORTED 2 > +#define H_VASI_SUSPENDING 3 > +#define H_VASI_SUSPENDED 4 > +#define H_VASI_RESUMED 5 > +#define H_VASI_COMPLETED 6 > + > +/* DABRX flags */ > +#define H_DABRX_HYPERVISOR (1ULL<<(63-61)) > +#define H_DABRX_KERNEL (1ULL<<(63-62)) > +#define H_DABRX_USER (1ULL<<(63-63)) > + > +/* Each control block has to be on a 4K bondary */ > +#define H_CB_ALIGNMENT 4096 > + > +/* pSeries hypervisor opcodes */ > +#define H_REMOVE 0x04 > +#define H_ENTER 0x08 > +#define H_READ 0x0c > +#define H_CLEAR_MOD 0x10 > +#define H_CLEAR_REF 0x14 > +#define H_PROTECT 0x18 > 
+#define H_GET_TCE 0x1c > +#define H_PUT_TCE 0x20 > +#define H_SET_SPRG0 0x24 > +#define H_SET_DABR 0x28 > +#define H_PAGE_INIT 0x2c > +#define H_SET_ASR 0x30 > +#define H_ASR_ON 0x34 > +#define H_ASR_OFF 0x38 > +#define H_LOGICAL_CI_LOAD 0x3c > +#define H_LOGICAL_CI_STORE 0x40 > +#define H_LOGICAL_CACHE_LOAD 0x44 > +#define H_LOGICAL_CACHE_STORE 0x48 > +#define H_LOGICAL_ICBI 0x4c > +#define H_LOGICAL_DCBF 0x50 > +#define H_GET_TERM_CHAR 0x54 > +#define H_PUT_TERM_CHAR 0x58 > +#define H_REAL_TO_LOGICAL 0x5c > +#define H_HYPERVISOR_DATA 0x60 > +#define H_EOI 0x64 > +#define H_CPPR 0x68 > +#define H_IPI 0x6c > +#define H_IPOLL 0x70 > +#define H_XIRR 0x74 > +#define H_PERFMON 0x7c > +#define H_MIGRATE_DMA 0x78 > +#define H_REGISTER_VPA 0xDC > +#define H_CEDE 0xE0 > +#define H_CONFER 0xE4 > +#define H_PROD 0xE8 > +#define H_GET_PPP 0xEC > +#define H_SET_PPP 0xF0 > +#define H_PURR 0xF4 > +#define H_PIC 0xF8 > +#define H_REG_CRQ 0xFC > +#define H_FREE_CRQ 0x100 > +#define H_VIO_SIGNAL 0x104 > +#define H_SEND_CRQ 0x108 > +#define H_COPY_RDMA 0x110 > +#define H_REGISTER_LOGICAL_LAN 0x114 > +#define H_FREE_LOGICAL_LAN 0x118 > +#define H_ADD_LOGICAL_LAN_BUFFER 0x11C > +#define H_SEND_LOGICAL_LAN 0x120 > +#define H_BULK_REMOVE 0x124 > +#define H_MULTICAST_CTRL 0x130 > +#define H_SET_XDABR 0x134 > +#define H_STUFF_TCE 0x138 > +#define H_PUT_TCE_INDIRECT 0x13C > +#define H_CHANGE_LOGICAL_LAN_MAC 0x14C > +#define H_VTERM_PARTNER_INFO 0x150 > +#define H_REGISTER_VTERM 0x154 > +#define H_FREE_VTERM 0x158 > +#define H_RESET_EVENTS 0x15C > +#define H_ALLOC_RESOURCE 0x160 > +#define H_FREE_RESOURCE 0x164 > +#define H_MODIFY_QP 0x168 > +#define H_QUERY_QP 0x16C > +#define H_REREGISTER_PMR 0x170 > +#define H_REGISTER_SMR 0x174 > +#define H_QUERY_MR 0x178 > +#define H_QUERY_MW 0x17C > +#define H_QUERY_HCA 0x180 > +#define H_QUERY_PORT 0x184 > +#define H_MODIFY_PORT 0x188 > +#define H_DEFINE_AQP1 0x18C > +#define H_GET_TRACE_BUFFER 0x190 > +#define H_DEFINE_AQP0 0x194 > +#define 
H_RESIZE_MR 0x198 > +#define H_ATTACH_MCQP 0x19C > +#define H_DETACH_MCQP 0x1A0 > +#define H_CREATE_RPT 0x1A4 > +#define H_REMOVE_RPT 0x1A8 > +#define H_REGISTER_RPAGES 0x1AC > +#define H_DISABLE_AND_GETC 0x1B0 > +#define H_ERROR_DATA 0x1B4 > +#define H_GET_HCA_INFO 0x1B8 > +#define H_GET_PERF_COUNT 0x1BC > +#define H_MANAGE_TRACE 0x1C0 > +#define H_FREE_LOGICAL_LAN_BUFFER 0x1D4 > +#define H_QUERY_INT_STATE 0x1E4 > +#define H_POLL_PENDING 0x1D8 > +#define H_ILLAN_ATTRIBUTES 0x244 > +#define H_MODIFY_HEA_QP 0x250 > +#define H_QUERY_HEA_QP 0x254 > +#define H_QUERY_HEA 0x258 > +#define H_QUERY_HEA_PORT 0x25C > +#define H_MODIFY_HEA_PORT 0x260 > +#define H_REG_BCMC 0x264 > +#define H_DEREG_BCMC 0x268 > +#define H_REGISTER_HEA_RPAGES 0x26C > +#define H_DISABLE_AND_GET_HEA 0x270 > +#define H_GET_HEA_INFO 0x274 > +#define H_ALLOC_HEA_RESOURCE 0x278 > +#define H_ADD_CONN 0x284 > +#define H_DEL_CONN 0x288 > +#define H_JOIN 0x298 > +#define H_VASI_STATE 0x2A4 > +#define H_ENABLE_CRQ 0x2B0 > +#define H_GET_EM_PARMS 0x2B8 > +#define H_SET_MPP 0x2D0 > +#define H_GET_MPP 0x2D4 > +#define MAX_HCALL_OPCODE H_GET_MPP > + > +typedef target_ulong (*spapr_hcall_fn)(CPUState *env, sPAPREnvironment *spapr, > + target_ulong opcode, > + target_ulong *args); > + > +void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn); > +target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr, > + target_ulong opcode, target_ulong *args); > + > + > +#endif /* !defined (__HW_SPAPR_H__) */ > diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c > new file mode 100644 > index 0000000..6ddac00 > --- /dev/null > +++ b/hw/spapr_hcall.c > @@ -0,0 +1,43 @@ > +#include "sysemu.h" > +#include "cpu.h" > +#include "qemu-char.h" > +#include "hw/spapr.h" > + > +struct hypercall { > + spapr_hcall_fn fn; > +} hypercall_table[(MAX_HCALL_OPCODE / 4) + 1]; This isn't following CODING_STYLE. 
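The comment does not say which rule is meant. One plausible reading, offered here as an assumption, is that QEMU's CODING_STYLE wants structured types named in CamelCase and declared through a typedef, rather than a lowercase `struct hypercall` defined together with its table. A sketch of that shape, reduced to compile on its own: `HypercallEntry` and `hypercall_slot()` are hypothetical names, `target_ulong` is a stand-in typedef, and the handler type is simplified from the patch's `spapr_hcall_fn`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HCALL_OPCODE 0x2D4 /* H_GET_MPP in the patch's hw/spapr.h */

/* Stand-in for QEMU's target_ulong (assumption: 64-bit target). */
typedef uint64_t target_ulong;

/* Simplified handler type; the patch's version also takes env and spapr. */
typedef target_ulong (*hcall_fn)(target_ulong opcode, target_ulong *args);

/* CamelCase typedef'd struct, declared separately from the table. */
typedef struct HypercallEntry {
    hcall_fn fn;
} HypercallEntry;

static HypercallEntry hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];

/* Return the table slot for a valid, 4-byte-aligned opcode. */
static HypercallEntry *hypercall_slot(target_ulong opcode)
{
    assert(opcode <= MAX_HCALL_OPCODE);
    assert((opcode & 0x3) == 0);

    return &hypercall_table[opcode / 4];
}
```

The table layout is unchanged from the patch; only the type naming and the separation of the type definition from the variable definition differ.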
> +void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn) > +{ > + struct hypercall *hc; > + > + assert(opcode<= MAX_HCALL_OPCODE); > + assert((opcode& 0x3) == 0); > + > + hc = hypercall_table + (opcode / 4); > + > + assert(!hc->fn || (fn == hc->fn)); > + > + hc->fn = fn; > +} > + > +target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr, > + target_ulong opcode, target_ulong *args) > +{ > + if (msr_pr) { > + fprintf(stderr, "Hypercall made with MSR=0x" TARGET_FMT_lx "\n", > + env->msr); > + return H_PRIVILEGE; > + } > + > + if ((opcode<= MAX_HCALL_OPCODE) > +&& ((opcode& 0x3) == 0)) { > + struct hypercall *hc = hypercall_table + (opcode / 4); > + > + if (hc->fn) > + return hc->fn(env, spapr, opcode, args); > + } > + > + fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode); > + return H_FUNCTION; > +} Regards, Anthony Liguori
On 16.03.2011, at 22:59, Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/15/2011 11:56 PM, David Gibson wrote:
>> This patch adds a "pseries" machine to qemu. This aims to emulate a
>> logical partition on an IBM pSeries machine, compliant to the
>> "PowerPC Architecture Platform Requirements" (PAPR) document.
>
> Can we call the machine 'papr' or at least 'lpar'
>
> Technically speaking, System P is the proper name these days, but I think papr or lpar would make a lot more sense to people.

I actually find the name pretty nice. It gives you what you'd expect without knowing ibm acronyms. Lpar is just plain wrong semantically. It's a different dimension. Papr would work, but then I'd rather go for spapr as there also is an epapr.

Alex
On Wed, Mar 16, 2011 at 04:59:22PM -0500, Anthony Liguori wrote:
> On 03/15/2011 11:56 PM, David Gibson wrote:
> >This patch adds a "pseries" machine to qemu. This aims to emulate a
> >logical partition on an IBM pSeries machine, compliant to the
> >"PowerPC Architecture Platform Requirements" (PAPR) document.
>
> Can we call the machine 'papr' or at least 'lpar'
>
> Technically speaking, System P is the proper name these days, but I
> think papr or lpar would make a lot more sense to people.

Well, I thought about renaming it to "spapr", but we thought "pseries"
was a name far more likely to be familiar to most people.
diff --git a/Makefile.target b/Makefile.target index f0df98e..e6a7557 100644 --- a/Makefile.target +++ b/Makefile.target @@ -231,6 +231,8 @@ obj-ppc-y += ppc_prep.o obj-ppc-y += ppc_oldworld.o # NewWorld PowerMac obj-ppc-y += ppc_newworld.o +# IBM pSeries (sPAPR) +obj-ppc-y += spapr.o spapr_hcall.o # PowerPC 4xx boards obj-ppc-y += ppc4xx_devs.o ppc4xx_pci.o ppc405_uc.o ppc405_boards.o obj-ppc-y += ppc440.o ppc440_bamboo.o diff --git a/hw/spapr.c b/hw/spapr.c new file mode 100644 index 0000000..8b4e16e --- /dev/null +++ b/hw/spapr.c @@ -0,0 +1,314 @@ +/* + * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator + * + * Copyright (c) 2004-2007 Fabrice Bellard + * Copyright (c) 2007 Jocelyn Mayer + * Copyright (c) 2010 David Gibson, IBM Corporation. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ * + */ +#include "sysemu.h" +#include "qemu-char.h" +#include "hw.h" +#include "elf.h" + +#include "hw/boards.h" +#include "hw/ppc.h" +#include "hw/loader.h" + +#include "hw/spapr.h" + +#include <libfdt.h> + +#define KERNEL_LOAD_ADDR 0x00000000 +#define INITRD_LOAD_ADDR 0x02800000 +#define FDT_MAX_SIZE 0x10000 + +#define TIMEBASE_FREQ 512000000ULL + +#define MAX_CPUS 32 + +static void *spapr_create_fdt(int *fdt_size, ram_addr_t ramsize, + const char *cpu_model, CPUState *envs[], + sPAPREnvironment *spapr, + target_phys_addr_t initrd_base, + target_phys_addr_t initrd_size, + const char *kernel_cmdline) +{ + void *fdt; + uint64_t mem_reg_property[] = { 0, cpu_to_be64(ramsize) }; + uint32_t start_prop = cpu_to_be32(initrd_base); + uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size); + int i; + char *modelname; + +#define _FDT(exp) \ + do { \ + int ret = (exp); \ + if (ret < 0) { \ + hw_error("qemu: error creating device tree: %s: %s\n", \ + #exp, fdt_strerror(ret)); \ + return NULL; \ + } \ + } while (0) + + fdt = qemu_mallocz(FDT_MAX_SIZE); + _FDT((fdt_create(fdt, FDT_MAX_SIZE))); + + _FDT((fdt_finish_reservemap(fdt))); + + /* Root node */ + _FDT((fdt_begin_node(fdt, ""))); + _FDT((fdt_property_string(fdt, "device_type", "chrp"))); + _FDT((fdt_property_string(fdt, "model", "qemu,emulated-pSeries-LPAR"))); + + _FDT((fdt_property_cell(fdt, "#address-cells", 0x2))); + _FDT((fdt_property_cell(fdt, "#size-cells", 0x2))); + + /* /chosen */ + _FDT((fdt_begin_node(fdt, "chosen"))); + + _FDT((fdt_property_string(fdt, "bootargs", kernel_cmdline))); + _FDT((fdt_property(fdt, "linux,initrd-start", &start_prop, sizeof(start_prop)))); + _FDT((fdt_property(fdt, "linux,initrd-end", &end_prop, sizeof(end_prop)))); + + _FDT((fdt_end_node(fdt))); + + /* memory node */ + _FDT((fdt_begin_node(fdt, "memory@0"))); + + _FDT((fdt_property_string(fdt, "device_type", "memory"))); + _FDT((fdt_property(fdt, "reg", mem_reg_property, sizeof(mem_reg_property)))); + + 
    _FDT((fdt_end_node(fdt)));
+
+    /* cpus */
+    _FDT((fdt_begin_node(fdt, "cpus")));
+
+    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
+    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
+
+    modelname = qemu_strdup(cpu_model);
+
+    for (i = 0; i < strlen(modelname); i++) {
+        modelname[i] = toupper(modelname[i]);
+    }
+
+    for (i = 0; i < smp_cpus; i++) {
+        CPUState *env = envs[i];
+        char *nodename;
+        uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
+                           0xffffffff, 0xffffffff};
+
+        if (asprintf(&nodename, "%s@%x", modelname, i) < 0) {
+            fprintf(stderr, "Allocation failure\n");
+            exit(1);
+        }
+
+        _FDT((fdt_begin_node(fdt, nodename)));
+
+        free(nodename);
+
+        _FDT((fdt_property_cell(fdt, "reg", i)));
+        _FDT((fdt_property_string(fdt, "device_type", "cpu")));
+
+        _FDT((fdt_property_cell(fdt, "cpu-version", env->spr[SPR_PVR])));
+        _FDT((fdt_property_cell(fdt, "dcache-block-size", env->dcache_line_size)));
+        _FDT((fdt_property_cell(fdt, "icache-block-size", env->icache_line_size)));
+        _FDT((fdt_property_cell(fdt, "timebase-frequency", TIMEBASE_FREQ)));
+        /* Hardcode CPU frequency for now.  It's kind of arbitrary on
+         * full emu, for kvm we should copy it from the host */
+        _FDT((fdt_property_cell(fdt, "clock-frequency", 1000000000)));
+        _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
+        _FDT((fdt_property_string(fdt, "status", "okay")));
+        _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
+
+        if (envs[i]->mmu_model & POWERPC_MMU_1TSEG) {
+            _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
+                               segs, sizeof(segs))));
+        }
+
+        _FDT((fdt_end_node(fdt)));
+    }
+
+    qemu_free(modelname);
+
+    _FDT((fdt_end_node(fdt)));
+
+    _FDT((fdt_end_node(fdt))); /* close root node */
+    _FDT((fdt_finish(fdt)));
+
+    if (fdt_size) {
+        *fdt_size = fdt_totalsize(fdt);
+    }
+
+    return fdt;
+}
+
+static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
+{
+    return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR;
+}
+
+static void emulate_spapr_hypercall(CPUState *env, void *opaque)
+{
+    env->gpr[3] = spapr_hypercall(env, (sPAPREnvironment *)opaque,
+                                  env->gpr[3], &env->gpr[4]);
+}
+
+/* FIXME: hack until we implement the proper VIO console */
+static target_ulong h_put_term_char(CPUState *env, sPAPREnvironment *spapr,
+                                    target_ulong opcode, target_ulong *args)
+{
+    uint8_t buf[16];
+
+    stq_p(buf, args[2]);
+    stq_p(buf + 8, args[3]);
+
+    qemu_chr_write(serial_hds[0], buf, args[1]);
+
+    return 0;
+}
+
+
+/* pSeries LPAR / sPAPR hardware init */
+static void ppc_spapr_init(ram_addr_t ram_size,
+                           const char *boot_device,
+                           const char *kernel_filename,
+                           const char *kernel_cmdline,
+                           const char *initrd_filename,
+                           const char *cpu_model)
+{
+    CPUState *envs[MAX_CPUS];
+    void *fdt;
+    int i;
+    ram_addr_t ram_offset;
+    target_phys_addr_t fdt_addr;
+    uint32_t kernel_base, initrd_base;
+    long kernel_size, initrd_size;
+    int fdt_size;
+    sPAPREnvironment *spapr;
+
+    spapr = qemu_malloc(sizeof(*spapr));
+
+    /* We place the device tree just below either the top of RAM, or
+     * 2GB, so that it can be processed with 32-bit code if
+     * necessary */
+    fdt_addr = MIN(ram_size, 0x80000000) - FDT_MAX_SIZE;
+
+    /* init CPUs */
+    if (cpu_model == NULL) {
+        cpu_model = "POWER7";
+    }
+    for (i = 0; i < smp_cpus; i++) {
+        CPUState *env = cpu_init(cpu_model);
+
+        if (!env) {
+            fprintf(stderr, "Unable to find PowerPC CPU definition\n");
+            exit(1);
+        }
+        /* Set time-base frequency to 512 MHz */
+        cpu_ppc_tb_init(env, TIMEBASE_FREQ);
+        qemu_register_reset((QEMUResetHandler*)&cpu_reset, env);
+
+        env->emulate_hypercall = emulate_spapr_hypercall;
+        env->hcall_opaque = spapr;
+
+        env->hreset_vector = 0x60;
+        env->hreset_excp_prefix = 0;
+        env->gpr[3] = i;
+
+        envs[i] = env;
+    }
+
+    /* allocate RAM */
+    ram_offset = qemu_ram_alloc(NULL, "ppc_spapr.ram", ram_size);
+    cpu_register_physical_memory(0, ram_size, ram_offset);
+
+    spapr_register_hypercall(H_PUT_TERM_CHAR, h_put_term_char);
+
+    if (kernel_filename) {
+        uint64_t lowaddr = 0;
+
+        kernel_base = KERNEL_LOAD_ADDR;
+
+        kernel_size = load_elf(kernel_filename, translate_kernel_address, NULL,
+                               NULL, &lowaddr, NULL, 1, ELF_MACHINE, 0);
+        if (kernel_size < 0) {
+            kernel_size = load_image_targphys(kernel_filename, kernel_base,
+                                              ram_size - kernel_base);
+        }
+        if (kernel_size < 0) {
+            hw_error("qemu: could not load kernel '%s'\n", kernel_filename);
+            exit(1);
+        }
+
+        /* load initrd */
+        if (initrd_filename) {
+            initrd_base = INITRD_LOAD_ADDR;
+            initrd_size = load_image_targphys(initrd_filename, initrd_base,
+                                              ram_size - initrd_base);
+            if (initrd_size < 0) {
+                hw_error("qemu: could not load initial ram disk '%s'\n",
+                         initrd_filename);
+                exit(1);
+            }
+        } else {
+            initrd_base = 0;
+            initrd_size = 0;
+        }
+
+    } else {
+        fprintf(stderr, "pSeries machine needs -kernel for now");
+        exit(1);
+    }
+
+    /* Prepare the device tree */
+    fdt = spapr_create_fdt(&fdt_size, ram_size, cpu_model, envs, spapr,
+                           initrd_base, initrd_size, kernel_cmdline);
+    if (!fdt) {
+        hw_error("Couldn't create pSeries device tree\n");
+        exit(1);
+    }
+
+    cpu_physical_memory_write(fdt_addr, fdt, fdt_size);
+
+    qemu_free(fdt);
+
+    envs[0]->gpr[3] = fdt_addr;
+    envs[0]->gpr[5] = 0;
+    envs[0]->hreset_vector = kernel_base;
+}
+
+static QEMUMachine spapr_machine = {
+    .name = "pseries",
+    .desc = "pSeries Logical Partition (PAPR compliant)",
+    .init = ppc_spapr_init,
+    .max_cpus = MAX_CPUS,
+    .no_vga = 1,
+    .no_parallel = 1,
+};
+
+static void spapr_machine_init(void)
+{
+    qemu_register_machine(&spapr_machine);
+}
+
+machine_init(spapr_machine_init);
diff --git a/hw/spapr.h b/hw/spapr.h
new file mode 100644
index 0000000..9e63a19
--- /dev/null
+++ b/hw/spapr.h
@@ -0,0 +1,246 @@
+#if !defined (__HW_SPAPR_H__)
+#define __HW_SPAPR_H__
+
+typedef struct sPAPREnvironment {
+} sPAPREnvironment;
+
+#define H_SUCCESS                   0
+#define H_BUSY                      1     /* Hardware busy -- retry later */
+#define H_CLOSED                    2     /* Resource closed */
+#define H_NOT_AVAILABLE             3
+#define H_CONSTRAINED               4     /* Resource request constrained to max allowed */
+#define H_PARTIAL                   5
+#define H_IN_PROGRESS               14    /* Kind of like busy */
+#define H_PAGE_REGISTERED           15
+#define H_PARTIAL_STORE             16
+#define H_PENDING                   17    /* returned from H_POLL_PENDING */
+#define H_CONTINUE                  18    /* Returned from H_Join on success */
+#define H_LONG_BUSY_START_RANGE     9900  /* Start of long busy range */
+#define H_LONG_BUSY_ORDER_1_MSEC    9900  /* Long busy, hint that 1msec \
+                                             is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_MSEC   9901  /* Long busy, hint that 10msec \
+                                             is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_MSEC  9902  /* Long busy, hint that 100msec \
+                                             is a good time to retry */
+#define H_LONG_BUSY_ORDER_1_SEC     9903  /* Long busy, hint that 1sec \
+                                             is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_SEC    9904  /* Long busy, hint that 10sec \
+                                             is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_SEC   9905  /* Long busy, hint that 100sec \
+                                             is a good time to retry */
+#define H_LONG_BUSY_END_RANGE       9905  /* End of long busy range */
+#define H_HARDWARE                  -1    /* Hardware error */
+#define H_FUNCTION                  -2    /* Function not supported */
+#define H_PRIVILEGE                 -3    /* Caller not privileged */
+#define H_PARAMETER                 -4    /* Parameter invalid, out-of-range or conflicting */
+#define H_BAD_MODE                  -5    /* Illegal msr value */
+#define H_PTEG_FULL                 -6    /* PTEG is full */
+#define H_NOT_FOUND                 -7    /* PTE was not found" */
+#define H_RESERVED_DABR             -8    /* DABR address is reserved by the hypervisor on this processor" */
+#define H_NO_MEM                    -9
+#define H_AUTHORITY                 -10
+#define H_PERMISSION                -11
+#define H_DROPPED                   -12
+#define H_SOURCE_PARM               -13
+#define H_DEST_PARM                 -14
+#define H_REMOTE_PARM               -15
+#define H_RESOURCE                  -16
+#define H_ADAPTER_PARM              -17
+#define H_RH_PARM                   -18
+#define H_RCQ_PARM                  -19
+#define H_SCQ_PARM                  -20
+#define H_EQ_PARM                   -21
+#define H_RT_PARM                   -22
+#define H_ST_PARM                   -23
+#define H_SIGT_PARM                 -24
+#define H_TOKEN_PARM                -25
+#define H_MLENGTH_PARM              -27
+#define H_MEM_PARM                  -28
+#define H_MEM_ACCESS_PARM           -29
+#define H_ATTR_PARM                 -30
+#define H_PORT_PARM                 -31
+#define H_MCG_PARM                  -32
+#define H_VL_PARM                   -33
+#define H_TSIZE_PARM                -34
+#define H_TRACE_PARM                -35
+
+#define H_MASK_PARM                 -37
+#define H_MCG_FULL                  -38
+#define H_ALIAS_EXIST               -39
+#define H_P_COUNTER                 -40
+#define H_TABLE_FULL                -41
+#define H_ALT_TABLE                 -42
+#define H_MR_CONDITION              -43
+#define H_NOT_ENOUGH_RESOURCES      -44
+#define H_R_STATE                   -45
+#define H_RESCINDEND                -46
+#define H_MULTI_THREADS_ACTIVE      -9005
+
+
+/* Long Busy is a condition that can be returned by the firmware
+ * when a call cannot be completed now, but the identical call
+ * should be retried later.  This prevents calls blocking in the
+ * firmware for long periods of time.  Annoyingly the firmware can return
+ * a range of return codes, hinting at how long we should wait before
+ * retrying.  If you don't care for the hint, the macro below is a good
+ * way to check for the long_busy return codes
+ */
+#define H_IS_LONG_BUSY(x)  ((x >= H_LONG_BUSY_START_RANGE) \
+                            && (x <= H_LONG_BUSY_END_RANGE))
+
+/* Flags */
+#define H_LARGE_PAGE          (1ULL<<(63-16))
+#define H_EXACT               (1ULL<<(63-24)) /* Use exact PTE or return H_PTEG_FULL */
+#define H_R_XLATE             (1ULL<<(63-25)) /* include a valid logical page num in the pte if the valid bit is set */
+#define H_READ_4              (1ULL<<(63-26)) /* Return 4 PTEs */
+#define H_PAGE_STATE_CHANGE   (1ULL<<(63-28))
+#define H_PAGE_UNUSED         ((1ULL<<(63-29)) | (1ULL<<(63-30)))
+#define H_PAGE_SET_UNUSED     (H_PAGE_STATE_CHANGE | H_PAGE_UNUSED)
+#define H_PAGE_SET_LOANED     (H_PAGE_SET_UNUSED | (1ULL<<(63-31)))
+#define H_PAGE_SET_ACTIVE     H_PAGE_STATE_CHANGE
+#define H_AVPN                (1ULL<<(63-32)) /* An avpn is provided as a sanity test */
+#define H_ANDCOND             (1ULL<<(63-33))
+#define H_ICACHE_INVALIDATE   (1ULL<<(63-40)) /* icbi, etc. (ignored for IO pages) */
+#define H_ICACHE_SYNCHRONIZE  (1ULL<<(63-41)) /* dcbst, icbi, etc (ignored for IO pages */
+#define H_ZERO_PAGE           (1ULL<<(63-48)) /* zero the page before mapping (ignored for IO pages) */
+#define H_COPY_PAGE           (1ULL<<(63-49))
+#define H_N                   (1ULL<<(63-61))
+#define H_PP1                 (1ULL<<(63-62))
+#define H_PP2                 (1ULL<<(63-63))
+
+/* VASI States */
+#define H_VASI_INVALID     0
+#define H_VASI_ENABLED     1
+#define H_VASI_ABORTED     2
+#define H_VASI_SUSPENDING  3
+#define H_VASI_SUSPENDED   4
+#define H_VASI_RESUMED     5
+#define H_VASI_COMPLETED   6
+
+/* DABRX flags */
+#define H_DABRX_HYPERVISOR  (1ULL<<(63-61))
+#define H_DABRX_KERNEL      (1ULL<<(63-62))
+#define H_DABRX_USER        (1ULL<<(63-63))
+
+/* Each control block has to be on a 4K bondary */
+#define H_CB_ALIGNMENT  4096
+
+/* pSeries hypervisor opcodes */
+#define H_REMOVE                  0x04
+#define H_ENTER                   0x08
+#define H_READ                    0x0c
+#define H_CLEAR_MOD               0x10
+#define H_CLEAR_REF               0x14
+#define H_PROTECT                 0x18
+#define H_GET_TCE                 0x1c
+#define H_PUT_TCE                 0x20
+#define H_SET_SPRG0               0x24
+#define H_SET_DABR                0x28
+#define H_PAGE_INIT               0x2c
+#define H_SET_ASR                 0x30
+#define H_ASR_ON                  0x34
+#define H_ASR_OFF                 0x38
+#define H_LOGICAL_CI_LOAD         0x3c
+#define H_LOGICAL_CI_STORE        0x40
+#define H_LOGICAL_CACHE_LOAD      0x44
+#define H_LOGICAL_CACHE_STORE     0x48
+#define H_LOGICAL_ICBI            0x4c
+#define H_LOGICAL_DCBF            0x50
+#define H_GET_TERM_CHAR           0x54
+#define H_PUT_TERM_CHAR           0x58
+#define H_REAL_TO_LOGICAL         0x5c
+#define H_HYPERVISOR_DATA         0x60
+#define H_EOI                     0x64
+#define H_CPPR                    0x68
+#define H_IPI                     0x6c
+#define H_IPOLL                   0x70
+#define H_XIRR                    0x74
+#define H_PERFMON                 0x7c
+#define H_MIGRATE_DMA             0x78
+#define H_REGISTER_VPA            0xDC
+#define H_CEDE                    0xE0
+#define H_CONFER                  0xE4
+#define H_PROD                    0xE8
+#define H_GET_PPP                 0xEC
+#define H_SET_PPP                 0xF0
+#define H_PURR                    0xF4
+#define H_PIC                     0xF8
+#define H_REG_CRQ                 0xFC
+#define H_FREE_CRQ                0x100
+#define H_VIO_SIGNAL              0x104
+#define H_SEND_CRQ                0x108
+#define H_COPY_RDMA               0x110
+#define H_REGISTER_LOGICAL_LAN    0x114
+#define H_FREE_LOGICAL_LAN        0x118
+#define H_ADD_LOGICAL_LAN_BUFFER  0x11C
+#define H_SEND_LOGICAL_LAN        0x120
+#define H_BULK_REMOVE             0x124
+#define H_MULTICAST_CTRL          0x130
+#define H_SET_XDABR               0x134
+#define H_STUFF_TCE               0x138
+#define H_PUT_TCE_INDIRECT        0x13C
+#define H_CHANGE_LOGICAL_LAN_MAC  0x14C
+#define H_VTERM_PARTNER_INFO      0x150
+#define H_REGISTER_VTERM          0x154
+#define H_FREE_VTERM              0x158
+#define H_RESET_EVENTS            0x15C
+#define H_ALLOC_RESOURCE          0x160
+#define H_FREE_RESOURCE           0x164
+#define H_MODIFY_QP               0x168
+#define H_QUERY_QP                0x16C
+#define H_REREGISTER_PMR          0x170
+#define H_REGISTER_SMR            0x174
+#define H_QUERY_MR                0x178
+#define H_QUERY_MW                0x17C
+#define H_QUERY_HCA               0x180
+#define H_QUERY_PORT              0x184
+#define H_MODIFY_PORT             0x188
+#define H_DEFINE_AQP1             0x18C
+#define H_GET_TRACE_BUFFER        0x190
+#define H_DEFINE_AQP0             0x194
+#define H_RESIZE_MR               0x198
+#define H_ATTACH_MCQP             0x19C
+#define H_DETACH_MCQP             0x1A0
+#define H_CREATE_RPT              0x1A4
+#define H_REMOVE_RPT              0x1A8
+#define H_REGISTER_RPAGES         0x1AC
+#define H_DISABLE_AND_GETC        0x1B0
+#define H_ERROR_DATA              0x1B4
+#define H_GET_HCA_INFO            0x1B8
+#define H_GET_PERF_COUNT          0x1BC
+#define H_MANAGE_TRACE            0x1C0
+#define H_FREE_LOGICAL_LAN_BUFFER 0x1D4
+#define H_QUERY_INT_STATE         0x1E4
+#define H_POLL_PENDING            0x1D8
+#define H_ILLAN_ATTRIBUTES        0x244
+#define H_MODIFY_HEA_QP           0x250
+#define H_QUERY_HEA_QP            0x254
+#define H_QUERY_HEA               0x258
+#define H_QUERY_HEA_PORT          0x25C
+#define H_MODIFY_HEA_PORT         0x260
+#define H_REG_BCMC                0x264
+#define H_DEREG_BCMC              0x268
+#define H_REGISTER_HEA_RPAGES     0x26C
+#define H_DISABLE_AND_GET_HEA     0x270
+#define H_GET_HEA_INFO            0x274
+#define H_ALLOC_HEA_RESOURCE      0x278
+#define H_ADD_CONN                0x284
+#define H_DEL_CONN                0x288
+#define H_JOIN                    0x298
+#define H_VASI_STATE              0x2A4
+#define H_ENABLE_CRQ              0x2B0
+#define H_GET_EM_PARMS            0x2B8
+#define H_SET_MPP                 0x2D0
+#define H_GET_MPP                 0x2D4
+#define MAX_HCALL_OPCODE          H_GET_MPP
+
+typedef target_ulong (*spapr_hcall_fn)(CPUState *env, sPAPREnvironment *spapr,
+                                       target_ulong opcode,
+                                       target_ulong *args);
+
+void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
+target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
+                             target_ulong opcode, target_ulong *args);
+
+
+#endif /* !defined (__HW_SPAPR_H__) */
diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
new file mode 100644
index 0000000..6ddac00
--- /dev/null
+++ b/hw/spapr_hcall.c
@@ -0,0 +1,43 @@
+#include "sysemu.h"
+#include "cpu.h"
+#include "qemu-char.h"
+#include "hw/spapr.h"
+
+struct hypercall {
+    spapr_hcall_fn fn;
+} hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
+
+void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn)
+{
+    struct hypercall *hc;
+
+    assert(opcode <= MAX_HCALL_OPCODE);
+    assert((opcode & 0x3) == 0);
+
+    hc = hypercall_table + (opcode / 4);
+
+    assert(!hc->fn || (fn == hc->fn));
+
+    hc->fn = fn;
+}
+
+target_ulong spapr_hypercall(CPUState *env, sPAPREnvironment *spapr,
+                             target_ulong opcode, target_ulong *args)
+{
+    if (msr_pr) {
+        fprintf(stderr, "Hypercall made with MSR=0x" TARGET_FMT_lx "\n",
+                env->msr);
+        return H_PRIVILEGE;
+    }
+
+    if ((opcode <= MAX_HCALL_OPCODE)
+        && ((opcode & 0x3) == 0)) {
+        struct hypercall *hc = hypercall_table + (opcode / 4);
+
+        if (hc->fn)
+            return hc->fn(env, spapr, opcode, args);
+    }
+
+    fprintf(stderr, "Unimplemented hcall 0x" TARGET_FMT_lx "\n", opcode);
+    return H_FUNCTION;
+}
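For readers who want to try the dispatch scheme from hw/spapr_hcall.c outside of qemu, here is a minimal standalone sketch. It is not the patch's code: `target_ulong` is assumed to be 64-bit, the CPU and environment arguments are dropped, and `hcall_fn`, `register_hcall`, `dispatch_hcall` and `demo_put_term_char` are hypothetical names introduced only for illustration. The opcode values and the `H_FUNCTION` return code are taken from hw/spapr.h above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a 64-bit target. */
typedef uint64_t target_ulong;

/* Values copied from hw/spapr.h in the patch. */
#define H_FUNCTION        ((target_ulong)-2)  /* Function not supported */
#define H_PUT_TERM_CHAR   0x58
#define MAX_HCALL_OPCODE  0x2D4               /* H_GET_MPP */

/* Hypothetical, simplified handler signature (no CPUState, no spapr). */
typedef target_ulong (*hcall_fn)(target_ulong opcode, target_ulong *args);

/* Opcodes are multiples of 4, so opcode / 4 indexes a dense table. */
static hcall_fn hcall_table[(MAX_HCALL_OPCODE / 4) + 1];

static void register_hcall(target_ulong opcode, hcall_fn fn)
{
    hcall_fn *slot;

    assert(opcode <= MAX_HCALL_OPCODE);
    assert((opcode & 0x3) == 0);

    slot = &hcall_table[opcode / 4];
    /* Re-registering the same handler is allowed, replacing one is not. */
    assert(!*slot || *slot == fn);
    *slot = fn;
}

static target_ulong dispatch_hcall(target_ulong opcode, target_ulong *args)
{
    /* Out-of-range or misaligned opcodes fall through to H_FUNCTION. */
    if (opcode <= MAX_HCALL_OPCODE && (opcode & 0x3) == 0) {
        hcall_fn fn = hcall_table[opcode / 4];

        if (fn) {
            return fn(opcode, args);
        }
    }
    return H_FUNCTION;
}

/* Hypothetical handler used only to exercise the table. */
static target_ulong demo_put_term_char(target_ulong opcode, target_ulong *args)
{
    (void)opcode;
    (void)args;
    return 0; /* H_SUCCESS */
}
```

Registering `demo_put_term_char` for `H_PUT_TERM_CHAR` and then calling `dispatch_hcall(H_PUT_TERM_CHAR, args)` should reach the handler, while an unregistered opcode such as 0x5c, or a misaligned one such as 0x59, should come back as `H_FUNCTION`. The sketch omits the `msr_pr` privilege check that the patch performs first.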
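One observation on `h_put_term_char` in hw/spapr.c: it packs two registers into a 16-byte buffer and then passes `args[1]` to `qemu_chr_write()` as the length without checking it. The sketch below shows the same packing with a bounds check added. It is an illustration under assumptions, not a proposed replacement: `store_be64` stands in for `stq_p` on the assumption that the target is big-endian, and `pack_term_chars` and its -1 error return are hypothetical (a real handler would presumably return something like `H_PARAMETER`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for stq_p(), assuming a big-endian target: most significant
 * byte goes first. */
static void store_be64(uint8_t *p, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Hypothetical helper: pack the two data registers into buf and return
 * the number of valid bytes, or -1 if the guest asked for more than the
 * 16 bytes the buffer can hold. */
static int pack_term_chars(uint8_t buf[16], uint64_t len,
                           uint64_t hi, uint64_t lo)
{
    if (len > 16) {
        return -1;
    }
    store_be64(buf, hi);
    store_be64(buf + 8, lo);
    return (int)len;
}
```

With `hi = 0x48656c6c6f000000ULL` and `len = 5`, the first five bytes of `buf` spell "Hello"; a length of 17 is rejected instead of reading past the end of the buffer.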