From patchwork Thu Oct 31 09:28:12 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Alexey Kardashevskiy X-Patchwork-Id: 287459 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 17D442C03BB for ; Thu, 31 Oct 2013 22:11:55 +1100 (EST) Received: from localhost ([::1]:56355 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VbqAJ-0004of-Lf for incoming@patchwork.ozlabs.org; Thu, 31 Oct 2013 07:11:51 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35916) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Vbq9t-0004l8-I9 for qemu-devel@nongnu.org; Thu, 31 Oct 2013 07:11:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Vbq9l-00027o-Bc for qemu-devel@nongnu.org; Thu, 31 Oct 2013 07:11:25 -0400 Received: from e23smtp01.au.ibm.com ([202.81.31.143]:49239) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VboYT-0006ML-SQ for qemu-devel@nongnu.org; Thu, 31 Oct 2013 05:28:42 -0400 Received: from /spool/local by e23smtp01.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 31 Oct 2013 19:28:35 +1000 Received: from d23dlp02.au.ibm.com (202.81.31.213) by e23smtp01.au.ibm.com (202.81.31.207) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Thu, 31 Oct 2013 19:28:33 +1000 Received: from d23relay03.au.ibm.com (d23relay03.au.ibm.com [9.190.235.21]) by d23dlp02.au.ibm.com (Postfix) with ESMTP id D4ED62BB0052; Thu, 31 Oct 2013 20:28:18 +1100 (EST) Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.235.138]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id r9V9S6kr10355024; Thu, 31 Oct 2013 20:28:07 +1100 Received: from d23av02.au.ibm.com (localhost [127.0.0.1]) by d23av02.au.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id r9V9SHOv008036; Thu, 31 Oct 2013 20:28:17 +1100 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av02.au.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id r9V9SH3Q008031; Thu, 31 Oct 2013 20:28:17 +1100 Received: from bran.ozlabs.ibm.com (haven.au.ibm.com [9.190.164.82]) by ozlabs.au.ibm.com (Postfix) with ESMTP id D7890A00A6; Thu, 31 Oct 2013 20:28:16 +1100 (EST) Received: from ka1.ozlabs.ibm.com (ka1.ozlabs.ibm.com [10.61.145.11]) by bran.ozlabs.ibm.com (Postfix) with ESMTP id 6F2EA16AAE6; Thu, 31 Oct 2013 20:28:15 +1100 (EST) From: Alexey Kardashevskiy To: qemu-devel@nongnu.org Date: Thu, 31 Oct 2013 20:28:12 +1100 Message-Id: <1383211692-25020-1-git-send-email-aik@ozlabs.ru> X-Mailer: git-send-email 1.8.4.rc4 MIME-Version: 1.0 X-TM-AS-MML: No X-Content-Scanned: Fidelis XPS MAILER x-cbid: 13103109-1618-0000-0000-000004E661C1 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic] X-Received-From: 202.81.31.143 Cc: Alexey Kardashevskiy , Badari Pulavarty , Alexander Graf , qemu-ppc@nongnu.org, Paolo Bonzini , Paul Mackerras , =?UTF-8?q?Andreas=20F=C3=A4rber?= , David Gibson Subject: [Qemu-devel] [PATCH] RFC: spapr: introduce smt_cpu_index X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org This is not really a patch to accept or review, this is a conversation starter. Cc: Badari Pulavarty Cc: Paul Mackerras Cc: David Gibson Cc: Benjamin Herrenschmidt Cc: Andreas Färber Signed-off-by: Alexey Kardashevskiy --- Normall CPUState::cpu_index is used to pick the right CPU for various operations. However default consecutive numbering does not always work for PPC64. For example, on POWER7 (which supports 4 threads per core), "-smp 8,threads=4" should create CPUs with indexes 0,1,2,3,4,5,6,7 and "-smp 8,threads=1" should create CPUs with indexes 0,4,8,12,16,20,24,28. These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX and used to call KVM VCPU's ioctl. In order to achieve this, kvmppc_fixup_cpu() was introduced. Roughly, it multiplies cpu_index by the number of threads per core. The approach used to work till we hit NUMA bug. For example, if to run QEMU with: -smp 16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-7,mem=500 -numa node,nodeid=1,cpus=8-15,mem=500 then CPUs 0 1 4 5 6 7 8 9 10 11 12 13 14 15 get assigned to node#0 and only CPUs 2 and 3 are assigned to node#1. This happens because kvmppc_fixup_cpu() does change cpu_index but does not change NUMA binding. We could fix kvmppc_fixup_cpu() to update NUMA binding but the whole idea of hacking cpu_index does not seem right as (for example) with "-smp 8,threads=1" the user would expect that QEMU Monitor's "cpu" command would accept indexes 0, 1, 2,.. but this is not true as it accepts 0, 4, 8,... And there is an existing mechanism to solve my issue which allows having architecture specific CPU indexes via kvm_arch_vcpu_id(). However it is: 1) KVM only and does not have empty stub for TCG; 2) Reverse converter is needed to get CPUState from KVM's CPU index (which does not exist) as sPAPR will use the KVM CPU index in various hypercalls. So I could: 1. fix kvmppc_fixup_cpu() to update NUMA binding; 2. Apply the patch below and make sure it does not break x86/ARM/s390 with and without KVM (does not it?); 3. Move kvm_arch_vcpu_id() + new kvm_arch_get_vcpu_by_id() out of KVM part and make it platform specific with or without KVM. What is the correct approach here? Thanks. --- hw/intc/xics_kvm.c | 14 +++++++------- hw/ppc/spapr.c | 9 +++++---- hw/ppc/spapr_hcall.c | 2 +- hw/ppc/spapr_rtas.c | 4 ++-- include/sysemu/kvm.h | 1 + kvm-stub.c | 10 ++++++++++ target-ppc/cpu.h | 3 +++ target-ppc/kvm.c | 24 +++++++++++++++++++++--- target-ppc/translate_init.c | 1 + 9 files changed, 51 insertions(+), 17 deletions(-) diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c index c203646..c96ca4d 100644 --- a/hw/intc/xics_kvm.c +++ b/hw/intc/xics_kvm.c @@ -65,7 +65,7 @@ static void icp_get_kvm_state(ICPState *ss) ret = kvm_vcpu_ioctl(ss->cs, KVM_GET_ONE_REG, ®); if (ret != 0) { error_report("Unable to retrieve KVM interrupt controller state" - " for CPU %d: %s", ss->cs->cpu_index, strerror(errno)); + " for CPU %ld: %s", kvm_arch_vcpu_id(ss->cs), strerror(errno)); exit(1); } @@ -97,7 +97,7 @@ static int icp_set_kvm_state(ICPState *ss, int version_id) ret = kvm_vcpu_ioctl(ss->cs, KVM_SET_ONE_REG, ®); if (ret != 0) { error_report("Unable to restore KVM interrupt controller state (0x%" - PRIx64 ") for CPU %d: %s", state, ss->cs->cpu_index, + PRIx64 ") for CPU %ld: %s", state, kvm_arch_vcpu_id(ss->cs), strerror(errno)); return ret; } @@ -313,9 +313,9 @@ static void xics_kvm_cpu_setup(XICSState *icp, PowerPCCPU *cpu) KVMXICSState *icpkvm = KVM_XICS(icp); cs = CPU(cpu); - ss = &icp->ss[cs->cpu_index]; + ss = &icp->ss[kvm_arch_vcpu_id(cs)]; - assert(cs->cpu_index < icp->nr_servers); + assert(kvm_arch_vcpu_id(cs) < icp->nr_servers); if (icpkvm->kernel_xics_fd == -1) { abort(); } @@ -325,15 +325,15 @@ static void xics_kvm_cpu_setup(XICSState *icp, PowerPCCPU *cpu) struct kvm_enable_cap xics_enable_cap = { .cap = KVM_CAP_IRQ_XICS, .flags = 0, - .args = {icpkvm->kernel_xics_fd, cs->cpu_index, 0, 0}, + .args = {icpkvm->kernel_xics_fd, kvm_arch_vcpu_id(cs), 0, 0}, }; ss->cs = cs; ret = kvm_vcpu_ioctl(ss->cs, KVM_ENABLE_CAP, &xics_enable_cap); if (ret < 0) { - error_report("Unable to connect CPU%d to kernel XICS: %s", - cs->cpu_index, strerror(errno)); + error_report("Unable to connect CPU%ld to kernel XICS: %s", + kvm_arch_vcpu_id(cs), strerror(errno)); exit(1); } } diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index f76b355..1e00982 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -206,19 +206,20 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr) CPU_FOREACH(cpu) { DeviceClass *dc = DEVICE_GET_CLASS(cpu); + int smt_cpu_index = kvm_arch_vcpu_id(cpu); uint32_t associativity[] = {cpu_to_be32(0x5), cpu_to_be32(0x0), cpu_to_be32(0x0), cpu_to_be32(0x0), cpu_to_be32(cpu->numa_node), - cpu_to_be32(cpu->cpu_index)}; + cpu_to_be32(smt_cpu_index)}; - if ((cpu->cpu_index % smt) != 0) { + if ((smt_cpu_index % smt) != 0) { continue; } snprintf(cpu_model, 32, "/cpus/%s@%x", dc->fw_name, - cpu->cpu_index); + smt_cpu_index); offset = fdt_path_offset(fdt, cpu_model); if (offset < 0) { @@ -367,7 +368,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, CPUPPCState *env = &cpu->env; DeviceClass *dc = DEVICE_GET_CLASS(cs); PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs); - int index = cs->cpu_index; + int index = kvm_arch_vcpu_id(cs); uint32_t servers_prop[smp_threads]; uint32_t gservers_prop[smp_threads * 2]; char *nodename; diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index f755a53..1742c17 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -466,7 +466,7 @@ static target_ulong h_register_vpa(PowerPCCPU *cpu, sPAPREnvironment *spapr, CPUPPCState *tenv; CPUState *tcpu; - tcpu = qemu_get_cpu(procno); + tcpu = kvm_arch_get_vcpu_by_id(procno); if (!tcpu) { return H_PARAMETER; } diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c index eb542f2..90a16f1 100644 --- a/hw/ppc/spapr_rtas.c +++ b/hw/ppc/spapr_rtas.c @@ -139,7 +139,7 @@ static void rtas_query_cpu_stopped_state(PowerPCCPU *cpu_, } id = rtas_ld(args, 0); - cpu = qemu_get_cpu(id); + cpu = kvm_arch_get_vcpu_by_id(id); if (cpu != NULL) { if (cpu->halted) { rtas_st(rets, 1, 0); @@ -172,7 +172,7 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPREnvironment *spapr, start = rtas_ld(args, 1); r3 = rtas_ld(args, 2); - cs = qemu_get_cpu(id); + cs = kvm_arch_get_vcpu_by_id(id); if (cs != NULL) { PowerPCCPU *cpu = POWERPC_CPU(cs); CPUPPCState *env = &cpu->env; diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 3b25f27..6358ada 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -222,6 +222,7 @@ int kvm_arch_init_vcpu(CPUState *cpu); /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */ unsigned long kvm_arch_vcpu_id(CPUState *cpu); +CPUState *kvm_arch_get_vcpu_by_id(int index); void kvm_arch_reset_vcpu(CPUState *cpu); diff --git a/kvm-stub.c b/kvm-stub.c index e979f76..9d2acab 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -147,3 +147,13 @@ int kvm_irqchip_remove_irqfd_notifier(KVMState *s, EventNotifier *n, int virq) return -ENOSYS; } #endif + +unsigned long kvm_arch_vcpu_id(CPUState *cpu) +{ + return cpu->cpu_index; +} + +CPUState *kvm_arch_get_vcpu_by_id(int index) +{ + return qemu_get_cpu(index); +} diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index 26acdba..1eb417e 100644 --- a/target-ppc/cpu.h +++ b/target-ppc/cpu.h @@ -1068,6 +1068,9 @@ struct CPUPPCState { */ uint8_t fit_period[4]; uint8_t wdt_period[4]; + + /* The CPU index used for KVM */ + int smt_cpu_index; }; #define SET_FIT_PERIOD(a_, b_, c_, d_) \ diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 10d0cd9..8a8948a 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -399,9 +399,27 @@ static inline void kvm_fixup_page_sizes(PowerPCCPU *cpu) #endif /* !defined (TARGET_PPC64) */ -unsigned long kvm_arch_vcpu_id(CPUState *cpu) +unsigned long kvm_arch_vcpu_id(CPUState *cs) { - return cpu->cpu_index; + PowerPCCPU *cpu = POWERPC_CPU(cs); + CPUPPCState *env = &cpu->env; + + return env->smt_cpu_index; +} + +CPUState *kvm_arch_get_vcpu_by_id(int index) +{ + CPUState *cs; + + CPU_FOREACH(cs) { + PowerPCCPU *cpu = POWERPC_CPU(cs); + CPUPPCState *env = &cpu->env; + if (env->smt_cpu_index == index) { + return cs; + } + } + + return NULL; } int kvm_arch_init_vcpu(CPUState *cs) @@ -1773,7 +1791,7 @@ int kvmppc_fixup_cpu(PowerPCCPU *cpu) /* Adjust cpu index for SMT */ smt = kvmppc_smt_threads(); - cs->cpu_index = (cs->cpu_index / smp_threads) * smt + cpu->env.smt_cpu_index = (cs->cpu_index / smp_threads) * smt + (cs->cpu_index % smp_threads); return 0; diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index 3d3952c..6a7ebbd 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -8557,6 +8557,7 @@ static void ppc_cpu_initfn(Object *obj) cs->env_ptr = env; cpu_exec_init(env); + env->smt_cpu_index = cs->cpu_index; env->msr_mask = pcc->msr_mask; env->mmu_model = pcc->mmu_model;