From patchwork Tue Sep 29 13:38:13 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 1373421
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Cc: Daniel Henrique Barboza, qemu-ppc@nongnu.org, groug@kaod.org,
 david@gibson.dropbear.id.au
Subject: [PATCH v3 1/5] spapr: add spapr_machine_using_legacy_numa() helper
Date: Tue, 29 Sep 2020 10:38:13 -0300
Message-Id: <20200929133817.560278-2-danielhb413@gmail.com>
In-Reply-To: <20200929133817.560278-1-danielhb413@gmail.com>
References: <20200929133817.560278-1-danielhb413@gmail.com>

The changes to come to NUMA support are all guest visible. In theory we
could just create a new 5_1 class option flag to keep the changes from
cascading to 5.1 and older. The reality is that these changes are only
relevant if the machine has more than one NUMA node. There is no need
to needlessly change guest behavior that has been around for years.

This new helper will be used by the next patches to determine whether
we should retain the (soon to be) legacy NUMA behavior in the pSeries
machine. The new behavior will only be exposed if:

- machine is pseries-5.2 and newer;
- more than one NUMA node is declared in NUMA state.

Reviewed-by: Greg Kurz
Reviewed-by: David Gibson
Signed-off-by: Daniel Henrique Barboza
---
 hw/ppc/spapr.c         | 12 ++++++++++++
 include/hw/ppc/spapr.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e813c7cfb9..c5d8910a74 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -294,6 +294,15 @@ static hwaddr spapr_node0_size(MachineState *machine)
     return machine->ram_size;
 }

+bool spapr_machine_using_legacy_numa(SpaprMachineState *spapr)
+{
+    MachineState *machine = MACHINE(spapr);
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
+
+    return smc->pre_5_2_numa_associativity ||
+           machine->numa_state->num_nodes <= 1;
+}
+
 static void add_str(GString *s, const gchar *s1)
 {
     g_string_append_len(s, s1, strlen(s1) + 1);
@@ -4522,8 +4531,11 @@ DEFINE_SPAPR_MACHINE(5_2, "5.2", true);
  */
 static void spapr_machine_5_1_class_options(MachineClass *mc)
 {
+    SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
     spapr_machine_5_2_class_options(mc);
     compat_props_add(mc->compat_props, hw_compat_5_1, hw_compat_5_1_len);
+    smc->pre_5_2_numa_associativity = true;
 }

 DEFINE_SPAPR_MACHINE(5_1, "5.1", false);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 114e819969..d1aae03b97 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -143,6 +143,7 @@ struct SpaprMachineClass {
     bool smp_threads_vsmt; /* set VSMT to smp_threads by default */
     hwaddr rma_limit; /* clamp the RMA to this size */
     bool pre_5_1_assoc_refpoints;
+    bool pre_5_2_numa_associativity;

     void (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
                           uint64_t *buid, hwaddr *pio,
@@ -860,6 +861,7 @@ int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
+bool spapr_machine_using_legacy_numa(SpaprMachineState *spapr);

 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);

From patchwork Tue Sep 29 13:38:14 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 1373422
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Cc: Daniel Henrique Barboza, qemu-ppc@nongnu.org, groug@kaod.org,
 david@gibson.dropbear.id.au
Subject: [PATCH v3 2/5] spapr_numa: forbid asymmetrical NUMA setups
Date: Tue, 29 Sep 2020 10:38:14 -0300
Message-Id: <20200929133817.560278-3-danielhb413@gmail.com>
In-Reply-To: <20200929133817.560278-1-danielhb413@gmail.com>
References: <20200929133817.560278-1-danielhb413@gmail.com>

The pSeries machine does not support asymmetrical NUMA
configurations. This doesn't make much of a difference since we're not
using user input for pSeries NUMA setup, but this will change in the
next patches.

To avoid breaking existing setups, gate this change by checking for
legacy NUMA support.

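
As an illustration of the constraint described above, here is a minimal
standalone sketch of a symmetry check over a NUMA distance table. The
DEMO_ names, the fixed 4x4 table type and the sample tables are
assumptions made for this example only; in QEMU the distances are read
from ms->numa_state->nodes[].distance[].

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_NODES 4 /* hypothetical cap, used only by this sketch */

/* Returns true when distance[a][b] == distance[b][a] for every pair. */
bool demo_distances_are_symmetrical(
    const uint8_t distance[DEMO_MAX_NODES][DEMO_MAX_NODES], int nb_nodes)
{
    for (int src = 0; src < nb_nodes; src++) {
        /* Pairs below the diagonal are covered when visited as (dst, src). */
        for (int dst = src + 1; dst < nb_nodes; dst++) {
            if (distance[src][dst] != distance[dst][src]) {
                return false;
            }
        }
    }

    return true;
}

/* Sample table where every pair agrees in both directions. */
const uint8_t demo_symmetric[DEMO_MAX_NODES][DEMO_MAX_NODES] = {
    { 10, 20, 40 },
    { 20, 10, 40 },
    { 40, 40, 10 },
};

/* Sample table where 0 -> 1 is 20 but 1 -> 0 is 40. */
const uint8_t demo_asymmetric[DEMO_MAX_NODES][DEMO_MAX_NODES] = {
    { 10, 20, 40 },
    { 40, 10, 40 },
    { 40, 40, 10 },
};
```

With three nodes, demo_symmetric passes the check while demo_asymmetric
fails it, which is the kind of topology this patch rejects on
non-legacy machines.
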
Reviewed-by: Greg Kurz
Reviewed-by: David Gibson
Signed-off-by: Daniel Henrique Barboza
---
 hw/ppc/spapr_numa.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 64fe567f5d..fe395e80a3 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -19,6 +19,24 @@
 /* Moved from hw/ppc/spapr_pci_nvlink2.c */
 #define SPAPR_GPU_NUMA_ID (cpu_to_be32(1))

+static bool spapr_numa_is_symmetrical(MachineState *ms)
+{
+    int src, dst;
+    int nb_numa_nodes = ms->numa_state->num_nodes;
+    NodeInfo *numa_info = ms->numa_state->nodes;
+
+    for (src = 0; src < nb_numa_nodes; src++) {
+        for (dst = src; dst < nb_numa_nodes; dst++) {
+            if (numa_info[src].distance[dst] !=
+                numa_info[dst].distance[src]) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
+
 void spapr_numa_associativity_init(SpaprMachineState *spapr,
                                    MachineState *machine)
 {
@@ -61,6 +79,22 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,

         spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
     }
+
+    /*
+     * Legacy NUMA guests (pseries-5.1 and older, or guests with only
+     * 1 NUMA node) will not benefit from anything we're going to do
+     * after this point.
+     */
+    if (spapr_machine_using_legacy_numa(spapr)) {
+        return;
+    }
+
+    if (!spapr_numa_is_symmetrical(machine)) {
+        error_report("Asymmetrical NUMA topologies aren't supported "
+                     "in the pSeries machine");
+        exit(EXIT_FAILURE);
+    }
+
 }

 void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,

From patchwork Tue Sep 29 13:38:15 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 1373423
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Cc: Daniel Henrique Barboza, qemu-ppc@nongnu.org, groug@kaod.org,
 david@gibson.dropbear.id.au
Subject: [PATCH v3 3/5] spapr_numa: change reference-points and maxdomain
 settings
Date: Tue, 29 Sep 2020 10:38:15 -0300
Message-Id: <20200929133817.560278-4-danielhb413@gmail.com>
In-Reply-To: <20200929133817.560278-1-danielhb413@gmail.com>
References: <20200929133817.560278-1-danielhb413@gmail.com>

This is the first guest visible change introduced in
spapr_numa.c. The previous settings of both reference-points
and maxdomains were too restrictive, but enough for the
existing associativity we're setting in the resources.

We'll change that in the following patches, populating the
associativity arrays based on user input. For those changes
to be effective, reference-points and maxdomains must be
more flexible. After this patch, we'll have 4 distinct
levels of NUMA (0x4, 0x3, 0x2, 0x1) and maxdomains will
allow for any type of configuration the user intends to
do - under the scope and limitations of PAPR itself, of
course.

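
To illustrate why four distinct reference points matter: a code comment
in patch 4/5 of this series notes that the Linux kernel (as of v5.9,
__node_distance() in arch/powerpc/mm/numa.c) starts from the local
distance of 10 and doubles it for each NUMA level that does not match.
The sketch below is a simplified standalone rendition of that rule, not
the kernel's code. demo_node_distance() and the host-endian arrays are
assumptions for illustration; QEMU stores these values with
cpu_to_be32().

```c
#include <stdint.h>

#define DEMO_REF_POINTS     4  /* four levels: 0x4, 0x3, 0x2, 0x1 */
#define DEMO_LOCAL_DISTANCE 10

/*
 * Each associativity array holds its size in slot 0 and one domain per
 * level in slots 1..4. Levels are compared in reference-points order,
 * and the distance doubles for every level where the domains differ.
 */
int demo_node_distance(const uint32_t assoc_a[DEMO_REF_POINTS + 1],
                       const uint32_t assoc_b[DEMO_REF_POINTS + 1])
{
    /* reference-points as set by this patch for non-legacy machines */
    static const uint32_t refpoints[DEMO_REF_POINTS] = { 4, 3, 2, 1 };
    int distance = DEMO_LOCAL_DISTANCE;

    for (int i = 0; i < DEMO_REF_POINTS; i++) {
        if (assoc_a[refpoints[i]] == assoc_b[refpoints[i]]) {
            break;
        }
        distance *= 2;
    }

    return distance;
}

/* Sample arrays (hypothetical): node 0, and nodes sharing fewer domains. */
const uint32_t demo_node0[DEMO_REF_POINTS + 1] = { 4, 0, 0, 0, 0 };
const uint32_t demo_near[DEMO_REF_POINTS + 1]  = { 4, 1, 1, 0, 1 };
const uint32_t demo_mid[DEMO_REF_POINTS + 1]   = { 4, 2, 0, 2, 2 };
const uint32_t demo_far[DEMO_REF_POINTS + 1]   = { 4, 3, 3, 3, 3 };
```

Under these assumptions, a node that shares the 0x3 domain with node 0
ends up at distance 20, one that only shares the 0x2 domain at 40, and
one that shares nothing at 160. With the legacy {0x4, 0x4, 0x2}
reference points, fewer distinct distances can be expressed.
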
Reviewed-by: Greg Kurz
Reviewed-by: David Gibson
Signed-off-by: Daniel Henrique Barboza
---
 hw/ppc/spapr_numa.c | 43 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index fe395e80a3..16badb1f4b 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -178,24 +178,51 @@ int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
  */
 void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
 {
+    MachineState *ms = MACHINE(spapr);
     SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
     uint32_t refpoints[] = {
         cpu_to_be32(0x4),
-        cpu_to_be32(0x4),
+        cpu_to_be32(0x3),
         cpu_to_be32(0x2),
+        cpu_to_be32(0x1),
     };
     uint32_t nr_refpoints = ARRAY_SIZE(refpoints);
-    uint32_t maxdomain = cpu_to_be32(spapr->gpu_numa_id > 1 ? 1 : 0);
+    uint32_t maxdomain = ms->numa_state->num_nodes + spapr->gpu_numa_id;
     uint32_t maxdomains[] = {
         cpu_to_be32(4),
-        maxdomain,
-        maxdomain,
-        maxdomain,
-        cpu_to_be32(spapr->gpu_numa_id),
+        cpu_to_be32(maxdomain),
+        cpu_to_be32(maxdomain),
+        cpu_to_be32(maxdomain),
+        cpu_to_be32(maxdomain)
     };

-    if (smc->pre_5_1_assoc_refpoints) {
-        nr_refpoints = 2;
+    if (spapr_machine_using_legacy_numa(spapr)) {
+        uint32_t legacy_refpoints[] = {
+            cpu_to_be32(0x4),
+            cpu_to_be32(0x4),
+            cpu_to_be32(0x2),
+        };
+        uint32_t legacy_maxdomain = spapr->gpu_numa_id > 1 ? 1 : 0;
+        uint32_t legacy_maxdomains[] = {
+            cpu_to_be32(4),
+            cpu_to_be32(legacy_maxdomain),
+            cpu_to_be32(legacy_maxdomain),
+            cpu_to_be32(legacy_maxdomain),
+            cpu_to_be32(spapr->gpu_numa_id),
+        };
+
+        G_STATIC_ASSERT(sizeof(legacy_refpoints) <= sizeof(refpoints));
+        G_STATIC_ASSERT(sizeof(legacy_maxdomains) <= sizeof(maxdomains));
+
+        nr_refpoints = 3;
+
+        memcpy(refpoints, legacy_refpoints, sizeof(legacy_refpoints));
+        memcpy(maxdomains, legacy_maxdomains, sizeof(legacy_maxdomains));
+
+        /* pseries-5.0 and older reference-points array is {0x4, 0x4} */
+        if (smc->pre_5_1_assoc_refpoints) {
+            nr_refpoints = 2;
+        }
     }

     _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",

From patchwork Tue Sep 29 13:38:16 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 1373425
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Cc: Daniel Henrique Barboza, qemu-ppc@nongnu.org, groug@kaod.org,
 david@gibson.dropbear.id.au
Subject: [PATCH v3 4/5] spapr_numa: consider user input when defining
 associativity
Date: Tue, 29 Sep 2020 10:38:16 -0300
Message-Id: <20200929133817.560278-5-danielhb413@gmail.com>
In-Reply-To: <20200929133817.560278-1-danielhb413@gmail.com>
References: <20200929133817.560278-1-danielhb413@gmail.com>

This patch puts all the pieces together to finally allow user
input when defining the NUMA topology of the spapr guest.

For each NUMA node A, starting at node id 0, the new
spapr_numa_define_associativity_domains() will:

- get the distance between node A and B = A + 1
- get the corresponding NUMA level for this distance
- assign the associativity domain for A and B for the given
NUMA level, using the lowest associativity domain value between
them
- if there are more NUMA nodes, increment B and repeat

Since we always start at the first node (id = 0) and go in
ascending order, we are prioritizing any previous associativity
already calculated. This is necessary because neither QEMU nor
the pSeries kernel supports multiple associativity domains for
each resource, meaning that we have to decide which associativity
relation is relevant.

Another side effect is that the first NUMA node, node 0, will
always have an associativity array full of zeroes. This is intended -
in fact, the Linux kernel expects it (see [1] for more info).

Ultimately, all of this results in a best effort approximation of
the actual NUMA distances the user input in the command line. Given
the nature of how PAPR itself interprets NUMA distances versus the
expectations raised by how ACPI SLIT works, there might be better
algorithms but, in the end, they will also result in another way to
approximate what the user really wanted.

To keep this commit message no longer than it already is, the next
patch will update the existing documentation in ppc-spapr-numa.rst
with more in depth details and design considerations/drawbacks.

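
The steps above can be sketched as a small standalone program. This is
an illustration under stated assumptions, not the code of this patch:
the demo_ names and the fixed-size host-endian arrays are hypothetical,
the distance thresholds use the inclusive ranges documented in this
patch's code comment, and the assignment copies the domain of the lower
numbered node (src) to dst at the matched level, as the patch does with
assoc_src.

```c
#include <stdint.h>

#define DEMO_MAX_NODES  4 /* hypothetical cap for this sketch */
#define DEMO_REF_POINTS 4 /* mirrors MAX_DISTANCE_REF_POINTS (4) */

/* Map a user distance to a NUMA level; 0 means no level matches. */
uint8_t demo_numa_level(uint8_t distance)
{
    if (distance == 10) {
        return 4;
    } else if (distance >= 11 && distance <= 30) {
        return 3; /* rounded to 20 */
    } else if (distance >= 31 && distance <= 60) {
        return 2; /* rounded to 40 */
    } else if (distance >= 61 && distance <= 120) {
        return 1; /* rounded to 80 */
    }

    return 0;
}

void demo_define_domains(
    const uint8_t distance[DEMO_MAX_NODES][DEMO_MAX_NODES], int nb_nodes,
    uint32_t assoc[DEMO_MAX_NODES][DEMO_REF_POINTS + 1])
{
    /* Slot 0 is the array size; every domain of node i starts as i. */
    for (int i = 0; i < nb_nodes; i++) {
        assoc[i][0] = DEMO_REF_POINTS;
        for (int j = 1; j <= DEMO_REF_POINTS; j++) {
            assoc[i][j] = (uint32_t)i;
        }
    }

    /* Ascending order, so domains set by lower node ids carry over. */
    for (int src = 0; src < nb_nodes; src++) {
        for (int dst = src + 1; dst < nb_nodes; dst++) {
            uint8_t level = demo_numa_level(distance[src][dst]);

            if (level == 0) {
                continue; /* too far apart: share no domain */
            }
            assoc[dst][level] = assoc[src][level];
        }
    }
}

/* Sample input: nodes 0 and 1 are close (20), node 2 is far (200). */
const uint8_t demo_distances[DEMO_MAX_NODES][DEMO_MAX_NODES] = {
    { 10,  20,  200 },
    { 20,  10,  200 },
    { 200, 200, 10 },
};
```

With this sample input, node 0 keeps all of its domains at zero, node 1
takes node 0's domain at level 0x3 only, and node 2 keeps its own id in
every domain because no level matches a distance of 200.
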
[1] https://lore.kernel.org/linuxppc-dev/5e8fbea3-8faf-0951-172a-b41a2138fbcf@gmail.com/ Signed-off-by: Daniel Henrique Barboza --- hw/ppc/spapr_numa.c | 120 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 119 insertions(+), 1 deletion(-) diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c index 16badb1f4b..f3d43ceb1e 100644 --- a/hw/ppc/spapr_numa.c +++ b/hw/ppc/spapr_numa.c @@ -37,12 +37,118 @@ static bool spapr_numa_is_symmetrical(MachineState *ms) return true; } +/* + * This function will translate the user distances into + * what the kernel understand as possible values: 10 + * (local distance), 20, 40, 80 and 160, and return the equivalent + * NUMA level for each. Current heuristic is: + * - local distance (10) returns numa_level = 0x4 + * - distances between 11 and 30 inclusive -> rounded to 20, + * numa_level = 0x3 + * - distances between 31 and 60 inclusive -> rounded to 40, + * numa_level = 0x2 + * - distances between 61 and 120 inclusive -> rounded to 80, + * numa_level = 0x1 + * - everything above 120 returns numa_level = 0 to indicate that + * there is no match. 
This will be calculated as disntace = 160 + * by the kernel (as of v5.9) + */ +static uint8_t spapr_numa_get_numa_level(uint8_t distance) +{ + uint8_t rounded_distance = 160; + uint8_t numa_level; + + if (distance > 11 && distance <= 30) { + rounded_distance = 20; + } else if (distance > 31 && distance <= 60) { + rounded_distance = 40; + } else if (distance > 61 && distance <= 120) { + rounded_distance = 80; + } + + switch (rounded_distance) { + case 10: + numa_level = 0x4; + break; + case 20: + numa_level = 0x3; + break; + case 40: + numa_level = 0x2; + break; + case 80: + numa_level = 0x1; + break; + default: + numa_level = 0; + } + + return numa_level; +} + +static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr) +{ + MachineState *ms = MACHINE(spapr); + NodeInfo *numa_info = ms->numa_state->nodes; + int nb_numa_nodes = ms->numa_state->num_nodes; + int src, dst; + + for (src = 0; src < nb_numa_nodes; src++) { + for (dst = src; dst < nb_numa_nodes; dst++) { + /* + * This is how the associativity domain between A and B + * is calculated: + * + * - get the distance between them + * - get the correspondent NUMA level for this distance + * - the arrays were initialized with their own numa_ids, + * and we're calculating the distance in node_id ascending order, + * starting from node 0. This will have a cascade effect in the + * algorithm because the associativity domains that node 0 defines + * will be carried over to the other nodes, and node 1 + * associativities will be carried over unless there's already a + * node 0 associativity assigned, and so on. This happens because + * we'll assign assoc_src as the associativity domain of dst + * as well, for the given NUMA level. + * + * The PPC kernel expects the associativity domains of node 0 to + * be always 0, and this algorithm will grant that by default. 
+ */ + uint8_t distance = numa_info[src].distance[dst]; + uint8_t n_level = spapr_numa_get_numa_level(distance); + uint32_t assoc_src; + + /* + * n_level = 0 means that the distance is greater than our last + * rounded value (120). In this case there is no NUMA level match + * between src and dst and we can skip the remaining of the loop. + * + * The Linux kernel will assume that the distance between src and + * dst, in this case of no match, is 10 (local distance) doubled + * for each NUMA it didn't match. We have MAX_DISTANCE_REF_POINTS + * levels (4), so this gives us 10*2*2*2*2 = 160. + * + * This logic can be seen in the Linux kernel source code, as of + * v5.9, in arch/powerpc/mm/numa.c, function __node_distance(). + */ + if (n_level == 0) { + continue; + } + + assoc_src = spapr->numa_assoc_array[src][n_level]; + spapr->numa_assoc_array[dst][n_level] = assoc_src; + } + } + +} + void spapr_numa_associativity_init(SpaprMachineState *spapr, MachineState *machine) { SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); int nb_numa_nodes = machine->numa_state->num_nodes; int i, j, max_nodes_with_gpus; + bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr); /* * For all associativity arrays: first position is the size, @@ -56,6 +162,17 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr, for (i = 0; i < nb_numa_nodes; i++) { spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS); spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i); + + /* + * Fill all associativity domains of non-zero NUMA nodes with + * node_id. This is required because the default value (0) is + * considered a match with associativity domains of node 0. 
+         */
+        if (!using_legacy_numa && i != 0) {
+            for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
+                spapr->numa_assoc_array[i][j] = cpu_to_be32(i);
+            }
+        }
     }

     /*
@@ -85,7 +202,7 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
      * 1 NUMA node) will not benefit from anything we're going to do
      * after this point.
      */
-    if (spapr_machine_using_legacy_numa(spapr)) {
+    if (using_legacy_numa) {
         return;
     }

@@ -95,6 +212,7 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
         exit(EXIT_FAILURE);
     }

+    spapr_numa_define_associativity_domains(spapr);
 }

 void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,

From patchwork Tue Sep 29 13:38:17 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 1373427
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com
Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=dPDTaKwZ; dkim-atps=neutral
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4C10z74S1yz9sTW for ; Tue, 29 Sep 2020 23:46:11 +1000 (AEST)
Received: from localhost ([::1]:55610 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNFxJ-0005hF-KM for incoming@patchwork.ozlabs.org; Tue, 29 Sep 2020 09:46:09 -0400
Received: from eggs.gnu.org
([2001:470:142:3::10]:40764) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNFqK-0007ZN-Lo; Tue, 29 Sep 2020 09:38:57 -0400 Received: from mail-qk1-x744.google.com ([2607:f8b0:4864:20::744]:34614) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1kNFqI-0001Db-Dy; Tue, 29 Sep 2020 09:38:56 -0400 Received: by mail-qk1-x744.google.com with SMTP id c62so4303193qke.1; Tue, 29 Sep 2020 06:38:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=jijcRPvylkUY2TozcNpYKw9C8qqxbdoidV8N19tjEm4=; b=dPDTaKwZtQTJfQYvuPYpUl+Irb8UZ3NOao1kHBopyQx/+Q9bJFc7IIwSTPfBShfbqa wgrxPnFHZ80dr610BbiS4i3IurQymTfur3Oojmmr4Myy7VvATNlLQdEa+BhDvkf/Pd3T bOB/80p4v5KXBqjTQxlGvktuWAIXU6fB5+Kel1ZV69OQYTLjrga7zGOcnyL+fwv78pnX EOjdQ//5f515kZcGpCpQRUaxKjAzxpjFHkYFrrFTGLCV0L/3quX33WT43X8pd+dRDqVA qjhBVNqNRWYcN+7z84+jXceGbU5t3ESEWpIUUxCm0B2bvky3QAaeixOq663WbTJE3GcS kDkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jijcRPvylkUY2TozcNpYKw9C8qqxbdoidV8N19tjEm4=; b=kf4dgA9d02iT3G+lbhyhWH1d3xyAJeoly+Yv6ti62DslkDe66IHEjt+mzhJLmYRj3b l3/k6ZH2F8NseAxOWOlmJBi7cpIUlaHADBsmMcT1EL47TEj/slwtts/A+fRNOgLSTqXH wAVdltFHO0nNuvRqPC8Xi6FW7aKIxiG5NzFYA8pQhOfZirt7CI6+9AMjq/6ck3xMM3cC q21lo7umd/sEryAiREUDG2QLF6+Hnjmi7Xxg5jtW7rUueUb/4c6KQ2zD3KhTYw1xuSVf 9MI2uwxRIVjB4TddTdFFfgC1qNyLOjMiZzULiv+qe+qwqnkVIpZOlnwuR9iG458JKRSs aW2g== X-Gm-Message-State: AOAM533uA7Ld/goZxZ7Q/WJ1IwWx/sHRKYLr1N0r+/hGAGL4asgohzud aFCbdk6lUovEZvtADfOTT7KKwFTHQ9c= X-Google-Smtp-Source: ABdhPJyvaH1zhujglLKoZrXU3wefcyWi8IBxw+5RJ6ecGEWvlZkrfzElu2+O7jgT2s9mjax2WPeYGA== X-Received: by 2002:a05:620a:231:: with SMTP id 
u17mr4553983qkm.166.1601386732262; Tue, 29 Sep 2020 06:38:52 -0700 (PDT)
Received: from rekt.ibmuc.com ([2804:431:c7c7:c625:6c0e:4720:8228:5f68]) by smtp.gmail.com with ESMTPSA id j88sm5239938qte.96.2020.09.29.06.38.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Sep 2020 06:38:51 -0700 (PDT)
From: Daniel Henrique Barboza
To: qemu-devel@nongnu.org
Subject: [PATCH v3 5/5] specs/ppc-spapr-numa: update with new NUMA support
Date: Tue, 29 Sep 2020 10:38:17 -0300
Message-Id: <20200929133817.560278-6-danielhb413@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200929133817.560278-1-danielhb413@gmail.com>
References: <20200929133817.560278-1-danielhb413@gmail.com>
MIME-Version: 1.0
Received-SPF: pass client-ip=2607:f8b0:4864:20::744; envelope-from=danielhb413@gmail.com; helo=mail-qk1-x744.google.com
X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know.
X-Spam_score_int: -17
X-Spam_score: -1.8
X-Spam_bar: -
X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Daniel Henrique Barboza , qemu-ppc@nongnu.org, groug@kaod.org, david@gibson.dropbear.id.au
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"

This update provides more in-depth information about the choices and
drawbacks of the new NUMA support for the spapr machine.

Signed-off-by: Daniel Henrique Barboza
---
 docs/specs/ppc-spapr-numa.rst | 206 +++++++++++++++++++++++++++++++++-
 1 file changed, 205 insertions(+), 1 deletion(-)

diff --git a/docs/specs/ppc-spapr-numa.rst b/docs/specs/ppc-spapr-numa.rst
index e762038022..6dd13bf97b 100644
--- a/docs/specs/ppc-spapr-numa.rst
+++ b/docs/specs/ppc-spapr-numa.rst
@@ -158,9 +158,213 @@ kernel tree). This results in the following distances:
 * resources four NUMA levels apart: 160


-Consequences for QEMU NUMA tuning
+pseries NUMA mechanics
+======================
+
+Starting in QEMU 5.2, the pseries machine considers user input when setting the
+NUMA topology of the guest. The following changes were made:
+
+* ibm,associativity-reference-points was changed to {0x4, 0x3, 0x2, 0x1}, allowing
+  for 4 distinct NUMA distance values based on the NUMA levels
+
+* ibm,max-associativity-domains was changed to support multiple associativity
+  domains in all NUMA levels. This is needed to ensure user flexibility
+
+* ibm,associativity for all resources now varies with user input
+
+These changes are only effective for pseries-5.2 and newer machines that are
+created with more than one NUMA node (not counting NUMA nodes created by
+the machine itself, e.g. NVLink2 GPUs). The now legacy support has been
+around for a long time, with users seeing NUMA distances 10 and 40
+(and 80 if using NVLink2 GPUs), and there is no need to disrupt the
+existing experience of those guests.
+
+To give pseries users the experience x86 users have when tuning NUMA, we had
+to operate under the current pseries Linux kernel logic described in
+`How the pseries Linux guest calculates NUMA distances`_. The result
+is that we needed to translate NUMA distance user input to pseries
+Linux kernel input.
+
+Translating user distance to kernel distance
+--------------------------------------------
+
+User input for NUMA distance can vary from 10 to 254.
We need to translate
+that to the values that the Linux kernel operates on (10, 20, 40, 80, 160).
+This is how it is being done:
+
+* user distance 11 to 30 will be interpreted as 20
+* user distance 31 to 60 will be interpreted as 40
+* user distance 61 to 120 will be interpreted as 80
+* user distance 121 and beyond will be interpreted as 160
+* user distance 10 stays 10
+
+The reasoning behind this approximation is to avoid rounding any value to the
+local distance (10), keeping it exclusive to the 4th NUMA level (which is still
+exclusive to the node_id). All other ranges were chosen at the developer's
+discretion as (somewhat) sensible given the user input.
+Any other strategy can be used here, but in the end the reality is that we'll
+have to accept that a large array of values will be translated to the same
+NUMA topology in the guest, e.g. this user input:
+
+::
+
+      0   1   2
+  0  10  31 120
+  1  31  10  30
+  2 120  30  10
+
+And this other user input:
+
+::
+
+      0   1   2
+  0  10  60  61
+  1  60  10  11
+  2  61  11  10
+
+Will both be translated to the same values internally:
+
+::
+
+      0   1   2
+  0  10  40  80
+  1  40  10  20
+  2  80  20  10
+
+Users are encouraged to use only the kernel values in the NUMA definition to
+avoid being surprised by what the guest actually sees in the
+topology. There are enough potential surprises that are inherent to the
+associativity domain assignment process, discussed below.
+
+
+How associativity domains are assigned
+--------------------------------------
+
+LOPAPR allows more than one associativity array (or 'string') per allocated
+resource. This would be used to represent that the resource has multiple
+connections with the board, and then the operating system, when deciding
+NUMA distancing, should consider the associativity information that provides
+the shortest distance.
+
+The spapr implementation does not support multiple associativity arrays per
+resource, and neither does the pseries Linux kernel.
We'll have to represent the
+NUMA topology using one associativity per resource, which means that choices
+and compromises are going to be made.
+
+Consider the following NUMA topology entered by user input:
+
+::
+
+      0   1   2   3
+  0  10  40  20  40
+  1  40  10  80  40
+  2  20  80  10  20
+  3  40  40  20  10
+
+Honoring just the relative distances of node 0 to every other node, one possible
+value for all associativity arrays would be:
+
+* node 0: 0 0 0 0
+* node 1: 1 0 1 1
+* node 2: 2 2 0 2
+* node 3: 3 0 3 3
+
+With the reference points {0x4, 0x3, 0x2, 0x1}, for node 0:
+
+* distance from 0 to 1 is 40 (no match at 0x4 and 0x3, will match
+  at 0x2)
+* distance from 0 to 2 is 20 (no match at 0x4, will match at 0x3)
+* distance from 0 to 3 is 40 (no match at 0x4 and 0x3, will match
+  at 0x2)
+
+The distances related to node 0 are accounted for. For node 1, and keeping
+in mind that we don't need to revisit node 0 again, the distance from
+node 1 to 2 is 80, matching at 0x1, and distance from 1 to 3 is 40,
+match in 0x2:
+
+* node 0: 0 0 0 0
+* node 1: 1 0 1 1
+* node 2: 1 2 0 2
+* node 3: 3 0 3 3
+
+In the last step we will analyze just nodes 2 and 3. The desired distance
+between 2 and 3 is 20, i.e. a match in 0x3. Node 2 already has a
+domain assigned in 0x3, 0. We'll preserve it to avoid dissolving the
+association between node 0 and node 2, and use it as the 0x3 domain of
+node 3 as well:
+
+* node 0: 0 0 0 0
+* node 1: 1 0 1 1
+* node 2: 1 2 0 2
+* node 3: 3 0 0 3
+
+
+The kernel will read these arrays and will calculate the following NUMA topology for
+the guest:
+
+::
+
+      0   1   2   3
+  0  10  40  20  20
+  1  40  10  80  40
+  2  20  80  10  20
+  3  20  40  20  10
+
+Note that this is not what the user wanted: the desired distance between
+0 and 3 is 40, but the guest calculates it as 20. This is what the current logic and
+implementation constraints of the kernel and QEMU will provide inside the
+LOPAPR specification.
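
The guest-side calculation for the final arrays above can be checked with a
small standalone sketch. It is modelled on the doubling logic this patch series
attributes to __node_distance() in the v5.9 kernel, and is not kernel or QEMU
code: the names `assoc`, `ref_points` and `guest_distance` are made up for the
illustration, and the arrays are stored in host byte order for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_NODES  4
#define NUM_LEVELS 4

/* Reference points {0x4, 0x3, 0x2, 0x1}: array positions, closest level first. */
static const int ref_points[NUM_LEVELS] = { 4, 3, 2, 1 };

/*
 * Illustrative copy of the final associativity arrays from the text.
 * Position 0 holds the size; positions 1..4 hold the domains in the
 * same left-to-right order used in the bullet lists.
 */
static const uint32_t assoc[NUM_NODES][NUM_LEVELS + 1] = {
    { 4, 0, 0, 0, 0 },  /* node 0 */
    { 4, 1, 0, 1, 1 },  /* node 1 */
    { 4, 1, 2, 0, 2 },  /* node 2 */
    { 4, 3, 0, 0, 3 },  /* node 3 */
};

/*
 * Start at the local distance (10) and double it for every reference
 * point at which the two nodes do not share an associativity domain.
 */
static int guest_distance(int a, int b)
{
    int distance = 10;
    int i;

    assert(a >= 0 && a < NUM_NODES && b >= 0 && b < NUM_NODES);

    for (i = 0; i < NUM_LEVELS; i++) {
        if (assoc[a][ref_points[i]] == assoc[b][ref_points[i]]) {
            break;
        }
        distance *= 2;
    }

    return distance;
}
```

Evaluating guest_distance() for every pair of nodes gives the guest topology
table shown above, including the 20 between nodes 0 and 3 that the user asked
to be 40: both nodes end up sharing domain 0 at reference point 0x3.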
+
+
+Users are welcome to use this knowledge and experiment with the input to get
+the NUMA topology they want, or as close to it as possible. The important thing
+is to keep expectations in line with what we are capable of providing at this
+moment: an approximation.
+
+Limitations of the implementation
 ---------------------------------

+As mentioned above, the pSeries NUMA distance logic is, in fact, a way to approximate
+user choice. The Linux kernel, and PAPR itself, do not provide QEMU with the means
+to fully map user input to the actual NUMA distance the guest will use. This
+creates two notable limitations in our support:
+
+* Asymmetrical topologies aren't supported. We only support NUMA topologies where
+  the distance from node A to B is always the same as B to A. We do not support
+  any A-B pair where the distance back and forth is asymmetric. For example, the
+  following topology isn't supported and the pSeries guest will not boot with this
+  user input:
+
+::
+
+      0   1
+  0  10  40
+  1  20  10
+
+
+* 'non-transitive' topologies will be poorly translated to the guest. This is the
+  kind of topology where the distance from a node A to B is X, B to C is X, but
+  the distance A to C is not X. E.g.:
+
+::
+
+      0   1   2   3
+  0  10  20  20  40
+  1  20  10  80  40
+  2  20  80  10  20
+  3  40  40  20  10
+
+  In the example above, distance 0 to 2 is 20, 2 to 3 is 20, but 0 to 3 is 40.
+  The kernel will always match with the shortest associativity domain possible,
+  and we're attempting to retain the previously established relations between the
+  nodes. This means that a distance equal to 20 between nodes 0 and 2 and the
+  same distance 20 between nodes 2 and 3 will cause the distance between 0 and 3
+  to also be 20.
+
+
+Legacy (5.1 and older) pseries NUMA mechanics
+=============================================
+

 The way the pseries Linux guest calculates NUMA distances has a direct effect
 on what QEMU users can expect when doing NUMA tuning.
As of QEMU 5.1, this is the default ibm,associativity-reference-points being used in the pseries