From patchwork Tue Oct 28 03:35:37 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg KH X-Patchwork-Id: 404065 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id F1F6214007D for ; Tue, 28 Oct 2014 18:01:08 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753291AbaJ1HBH (ORCPT ); Tue, 28 Oct 2014 03:01:07 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:45536 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932645AbaJ1Dqn (ORCPT ); Mon, 27 Oct 2014 23:46:43 -0400 Received: from localhost (60-250-232-251.HINET-IP.hinet.net [60.250.232.251]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 9BA208B4; Tue, 28 Oct 2014 03:46:42 +0000 (UTC) From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, sparclinux@vger.kernel.org, Bob Picco , "David S. Miller" Subject: [PATCH 3.16 102/127] sparc64: find_node adjustment Date: Tue, 28 Oct 2014 11:35:37 +0800 Message-Id: <20141028033425.301155371@linuxfoundation.org> X-Mailer: git-send-email 2.1.2 In-Reply-To: <20141028033420.925922046@linuxfoundation.org> References: <20141028033420.925922046@linuxfoundation.org> User-Agent: quilt/0.63-1 MIME-Version: 1.0 Sender: sparclinux-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: sparclinux@vger.kernel.org 3.16-stable review patch. If anyone has any objections, please let me know. ------------------ From: bob picco [ Upstream commit 3dee9df54836d5f844f3d58281d3f3e6331b467f ] We have seen an issue with guest boot into LDOM that causes early boot failures because of no matching rules for node identitity of the memory. I analyzed this on my T4 and concluded there might not be a solution. I saw the issue in mainline too when booting into the control/primary domain - with guests configured. Note, this could be a firmware bug on some older machines. I'll provide a full explanation of the issues below. Should we not find a matching BEST latency group for a real address (RA) then we will assume node 0. On the T4-2 here with the information provided I can't see an alternative. Technically the LDOM shown below should match the MBLOCK to the favorable latency group. However other factors must be considered too. Were the memory controllers configured "fine" grained interleave or "coarse" grain interleaved - T4. Also should a "group" MD node be considered a NUMA node? There has to be at least one Machine Description (MD) "group" and hence one NUMA node. The group can have one or more latency groups (lg) - more than one memory controller. The current code chooses the smallest latency as the most favorable per group. The latency and lg information is in MLGROUP below. MBLOCK is the base and size of the RAs for the machine as fetched from OBP /memory "available" property. My machine has one MBLOCK but more would be possible - with holes? For a T4-2 the following information has been gathered: with LDOM guest MEMBLOCK configuration: memory size = 0x27f870000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes memory[0x1] [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes memory[0x2] [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes reserved[0x1] [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes MBLOCK[0]: base[20000000] size[280000000] offset[0] (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values) (note: (RA + offset) & mask = val is the formula to detect a match for the memory controller. should there be no match for find_node node, a return value of -1 resulted for the node - BAD) There is one group. It has these forward links MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000] MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000] NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8]) (note: "val" is the best lg's (smallest latency) "match") no LDOM guest - bare metal MEMBLOCK configuration: memory size = 0xfdf2d0000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes memory[0x1] [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes memory[0x2] [0x00000fff766000-0x00000fff771fff], 0xc000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x00000021a04580], 0x1204581 bytes reserved[0x1] [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes MBLOCK[0]: base[20000000] size[fe0000000] offset[0] there are two groups group node[16d5] MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000] MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000] NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8]) group node[171d] MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000] MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000] NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8]) (note: for this two "group" bare metal machine, 1/2 memory is in group one's lg and 1/2 memory is in group two's lg). Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/mm/init_64.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -839,7 +839,10 @@ static int find_node(unsigned long addr) if ((addr & p->mask) == p->val) return i; } - return -1; + /* The following condition has been observed on LDOM guests.*/ + WARN_ONCE(1, "find_node: A physical address doesn't match a NUMA node" + " rule. Some physical memory will be owned by node 0."); + return 0; } static u64 memblock_nid_range(u64 start, u64 end, int *nid)