From: "Michael S. Tsirkin"
Date: Sun, 6 Oct 2013 23:44:57 +0300
To: Igor Mammedov
Cc: qemu-devel@nongnu.org, kraxel@redhat.com, Anthony Liguori, pbonzini@redhat.com, Laszlo Ersek, afaerber@suse.de
Subject: Re: [Qemu-devel] [PATCH v8 18/26] i386: define pc guest info
Message-ID: <20131006204457.GA20192@redhat.com>
In-Reply-To: <20131004181842.0696600c@nial.usersys.redhat.com>
References: <1380812639-3868-1-git-send-email-mst@redhat.com> <1380812639-3868-19-git-send-email-mst@redhat.com> <20131004181842.0696600c@nial.usersys.redhat.com>
X-Patchwork-Submitter: "Michael S. Tsirkin"
X-Patchwork-Id: 280911

On Fri, Oct 04, 2013 at 06:18:42PM +0200, Igor Mammedov wrote:
> On Thu, 3 Oct 2013 18:05:35 +0300
> "Michael S. Tsirkin" wrote:
>
> > This defines a structure that will be used to fill in acpi tables
> > where relevant properties are not yet available using QOM.
> >
> > Reviewed-by: Laszlo Ersek
> > Reviewed-by: Gerd Hoffmann
> > Tested-by: Gerd Hoffmann
> > Signed-off-by: Michael S. Tsirkin
> > ---
> >  include/hw/i386/pc.h |  9 +++++++++
> >  hw/i386/pc.c         | 31 +++++++++++++++++++++++++++++++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index 9b2ddc4..085a621 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -9,6 +9,9 @@
> >  #include "hw/i386/ioapic.h"
> >
> >  #include "qemu/range.h"
> > +#include "qemu/bitmap.h"
> > +#include "sysemu/sysemu.h"
> > +#include "hw/pci/pci.h"
> >
> >  /* PC-style peripherals (also used by other machines). */
> >
> > @@ -20,6 +23,12 @@ typedef struct PcPciInfo {
> >  struct PcGuestInfo {
> >      bool has_pci_info;
> >      bool isapc_ram_fw;
> > +    hwaddr ram_size;
> > +    unsigned apic_id_limit;
> > +    bool apic_xrupt_override;
> > +    uint64_t numa_nodes;
> > +    uint64_t *node_mem;
> > +    uint64_t *node_cpu;
> >      FWCfgState *fw_cfg;
> >  };
> >
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 0c313fe..dbae9da 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1028,6 +1028,23 @@ static void pc_fw_cfg_guest_info(PcGuestInfo *guest_info)
> >      fw_cfg_add_file(guest_info->fw_cfg, "etc/pci-info", info, sizeof *info);
> >  }
> >
> > +static void pc_set_cpu_guest_info(CPUState *cpu, PcGuestInfo *guest_info)
> > +{
> > +    CPUClass *klass = CPU_GET_CLASS(cpu);
> > +    uint64_t apic_id = klass->get_arch_id(cpu);
> > +    int j;
> > +
> > +    assert(apic_id < guest_info->apic_id_limit);
> > +
> > +    for (j = 0; j < guest_info->numa_nodes; j++) {
> > +        assert(cpu->cpu_index < max_cpus);
> > +        if (test_bit(cpu->cpu_index, node_cpumask[j])) {
> > +            guest_info->node_cpu[apic_id] = cpu_to_le64(j);
> > +            break;
> > +        }
> > +    }
> > +}
> > +
> >  typedef struct PcGuestInfoState {
> >      PcGuestInfo info;
> >      Notifier machine_done;
> > @@ -1047,6 +1064,20 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
> >  {
> >      PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state);
> >      PcGuestInfo *guest_info = &guest_info_state->info;
> > +    CPUState *cpu;
> > +
> > +    guest_info->ram_size = below_4g_mem_size + above_4g_mem_size;
> > +    guest_info->apic_id_limit = pc_apic_id_limit(max_cpus);
> > +    guest_info->apic_xrupt_override = kvm_allows_irq0_override();
> > +    guest_info->numa_nodes = nb_numa_nodes;
> > +    guest_info->node_mem = g_memdup(node_mem, guest_info->numa_nodes *
> > +                                    sizeof *guest_info->node_mem);
> > +    guest_info->node_cpu = g_malloc0(guest_info->apic_id_limit *
> > +                                     sizeof *guest_info->node_cpu);
> > +
> > +    CPU_FOREACH(cpu) {
> > +        pc_set_cpu_guest_info(cpu, guest_info);
> > +    }
>
> pc_guest_info_init() is called only once. Now let's suppose we hotplug CPUs
> and then reboot the guest. Hot-added CPUs won't be accounted for in
> guest_info.node_cpu, since it's initialized only once and is never updated.
> As a result, the guest will get a stale SRAT table.
>
> Using a callback in acpi_setup/update would allow getting an updated guest_info.

Actually, we can fix this more simply by filling in all the NUMA info
ahead of time, for every possible CPU rather than only the ones present
at init.
Something like the following should fix this, right?

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index bbf11ed..a7fcbf9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1029,23 +1029,6 @@ static void pc_fw_cfg_guest_info(PcGuestInfo *guest_info)
     fw_cfg_add_file(guest_info->fw_cfg, "etc/pci-info", info, sizeof *info);
 }
 
-static void pc_set_cpu_guest_info(CPUState *cpu, PcGuestInfo *guest_info)
-{
-    CPUClass *klass = CPU_GET_CLASS(cpu);
-    uint64_t apic_id = klass->get_arch_id(cpu);
-    int j;
-
-    assert(apic_id < guest_info->apic_id_limit);
-
-    for (j = 0; j < guest_info->numa_nodes; j++) {
-        assert(cpu->cpu_index < max_cpus);
-        if (test_bit(cpu->cpu_index, node_cpumask[j])) {
-            guest_info->node_cpu[apic_id] = cpu_to_le64(j);
-            break;
-        }
-    }
-}
-
 typedef struct PcGuestInfoState {
     PcGuestInfo info;
     Notifier machine_done;
@@ -1066,7 +1049,7 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
 {
     PcGuestInfoState *guest_info_state = g_malloc0(sizeof *guest_info_state);
     PcGuestInfo *guest_info = &guest_info_state->info;
-    CPUState *cpu;
+    int i, j;
 
     guest_info->ram_size = below_4g_mem_size + above_4g_mem_size;
     guest_info->apic_id_limit = pc_apic_id_limit(max_cpus);
@@ -1077,8 +1060,15 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
     guest_info->node_cpu = g_malloc0(guest_info->apic_id_limit *
                                      sizeof *guest_info->node_cpu);
 
-    CPU_FOREACH(cpu) {
-        pc_set_cpu_guest_info(cpu, guest_info);
+    for (i = 0; i < max_cpus; i++) {
+        unsigned int apic_id = x86_cpu_apic_id_from_index(i);
+        assert(apic_id < guest_info->apic_id_limit);
+        for (j = 0; j < nb_numa_nodes; j++) {
+            if (test_bit(i, node_cpumask[j])) {
+                guest_info->node_cpu[apic_id] = j;
+                break;
+            }
+        }
     }
 
     guest_info_state->machine_done.notify = pc_guest_info_machine_done;
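
To make the idea easier to poke at outside the tree, here is a minimal standalone sketch of the same precomputation. It is not QEMU code: build_node_cpu() and apic_id_from_index() are hypothetical names, the identity APIC ID mapping is an assumption standing in for x86_cpu_apic_id_from_index(), and each node's CPU set is reduced to a single 64-bit mask instead of the real node_cpumask bitmaps.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for x86_cpu_apic_id_from_index().
 * Assumption: identity mapping. A real topology may leave gaps
 * in the APIC ID space, which is why the table below is sized by
 * apic_id_limit rather than by max_cpus.
 */
static unsigned apic_id_from_index(unsigned cpu_index)
{
    return cpu_index;
}

/*
 * Build an APIC-ID-indexed table of NUMA node numbers covering every
 * possible CPU index in [0, max_cpus), not only CPUs present at boot.
 * Bit i of node_cpumask[j] is set when CPU index i belongs to node j.
 * CPUs that appear in no mask keep the zero-initialized entry (node 0).
 * Returns a calloc()ed array the caller must free, or NULL on
 * allocation failure.
 */
static uint64_t *build_node_cpu(const uint64_t *node_cpumask,
                                unsigned nb_nodes,
                                unsigned max_cpus,
                                unsigned apic_id_limit)
{
    uint64_t *node_cpu;
    unsigned i, j;

    /* One 64-bit mask per node limits this sketch to 64 CPUs. */
    assert(max_cpus <= 64);

    node_cpu = calloc(apic_id_limit, sizeof *node_cpu);
    if (!node_cpu) {
        return NULL;
    }

    for (i = 0; i < max_cpus; i++) {
        unsigned apic_id = apic_id_from_index(i);

        assert(apic_id < apic_id_limit);
        for (j = 0; j < nb_nodes; j++) {
            if (node_cpumask[j] & (UINT64_C(1) << i)) {
                node_cpu[apic_id] = j;
                break;
            }
        }
    }
    return node_cpu;
}
```

With two nodes whose masks are 0x3 and 0xC, max_cpus = 4 and apic_id_limit = 4, the table maps APIC IDs 0 and 1 to node 0 and APIC IDs 2 and 3 to node 1, whether or not those CPUs have been plugged in yet. That is the property the loop in the diff relies on to keep the table valid across hotplug and reboot.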