From patchwork Thu Dec 20 19:00:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hari Bathini X-Patchwork-Id: 1016966 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLlZ3p0kz9sCh for ; Fri, 21 Dec 2018 06:04:10 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 43LLlZ2QYPzDr7m for ; Fri, 21 Dec 2018 06:04:10 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 43LLgL0JDzzDr3F for ; Fri, 21 Dec 2018 06:00:30 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) by bilbo.ozlabs.org (Postfix) with ESMTP id 43LLgK4v1Qz8vnb for ; Fri, 21 Dec 2018 06:00:29 +1100 (AEDT) Received: by ozlabs.org (Postfix) id 43LLgK401Jz9sCV; Fri, 21 Dec 2018 06:00:29 +1100 (AEDT) Delivered-To: linuxppc-dev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.158.5; helo=mx0a-001b2d01.pphosted.com; envelope-from=hbathini@linux.ibm.com; receiver=) 
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLgJ6kZxz9sCQ for ; Fri, 21 Dec 2018 06:00:28 +1100 (AEDT) Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id wBKIwk9w115468 for ; Thu, 20 Dec 2018 14:00:26 -0500 Received: from e06smtp01.uk.ibm.com (e06smtp01.uk.ibm.com [195.75.94.97]) by mx0b-001b2d01.pphosted.com with ESMTP id 2pggdx8xj5-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 20 Dec 2018 14:00:26 -0500 Received: from localhost by e06smtp01.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 20 Dec 2018 19:00:24 -0000 Received: from b06cxnps3074.portsmouth.uk.ibm.com (9.149.109.194) by e06smtp01.uk.ibm.com (192.168.101.131) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 20 Dec 2018 19:00:22 -0000 Received: from d06av24.portsmouth.uk.ibm.com (d06av24.portsmouth.uk.ibm.com [9.149.105.60]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id wBKJ0K2X52691094 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 20 Dec 2018 19:00:20 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 65B3A42049; Thu, 20 Dec 2018 19:00:20 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9A8E44203F; Thu, 20 Dec 2018 19:00:18 +0000 (GMT) Received: from hbathini.in.ibm.com (unknown [9.199.47.3]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 20 Dec 2018 19:00:18 +0000 (GMT) Subject: [PATCH 1/9] powerpc/fadump: move internal fadump code to a new file From: Hari Bathini To: Ananth N Mavinakayanahalli , Michael Ellerman , Mahesh J Salgaonkar , Vasant Hegde , linuxppc-dev , Stewart Smith Date: Fri, 21 Dec 2018 00:30:17 +0530 In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 18122019-4275-0000-0000-000002F37551 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18122019-4276-0000-0000-00003801811F Message-Id: <154533241749.28973.10621662581635723261.stgit@hbathini.in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-12-20_09:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 
engine=8.0.1-1810050000 definitions=main-1812200154 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Refactoring fadump code means internal fadump code is referenced from different places. For ease, move internal code to a new file. Signed-off-by: Hari Bathini --- arch/powerpc/include/asm/fadump.h | 112 ------------------- arch/powerpc/kernel/Makefile | 2 arch/powerpc/kernel/fadump.c | 190 ++------------------------------- arch/powerpc/kernel/fadump_internal.c | 184 ++++++++++++++++++++++++++++++++ arch/powerpc/kernel/fadump_internal.h | 126 ++++++++++++++++++++++ 5 files changed, 322 insertions(+), 292 deletions(-) create mode 100644 arch/powerpc/kernel/fadump_internal.c create mode 100644 arch/powerpc/kernel/fadump_internal.h diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h index 188776b..028a8ef 100644 --- a/arch/powerpc/include/asm/fadump.h +++ b/arch/powerpc/include/asm/fadump.h @@ -24,34 +24,6 @@ #ifdef CONFIG_FA_DUMP -/* - * The RMA region will be saved for later dumping when kernel crashes. - * RMA is Real Mode Area, the first block of logical memory address owned - * by logical partition, containing the storage that may be accessed with - * translate off. - */ -#define RMA_START 0x0 -#define RMA_END (ppc64_rma_size) - -/* - * On some Power systems where RMO is 128MB, it still requires minimum of - * 256MB for kernel to boot successfully. When kdump infrastructure is - * configured to save vmcore over network, we run into OOM issue while - * loading modules related to network setup. Hence we need aditional 64M - * of memory to avoid OOM issue. - */ -#define MIN_BOOT_MEM (((RMA_END < (0x1UL << 28)) ? 
(0x1UL << 28) : RMA_END) \ - + (0x1UL << 26)) - -/* The upper limit percentage for user specified boot memory size (25%) */ -#define MAX_BOOT_MEM_RATIO 4 - -#define memblock_num_regions(memblock_type) (memblock.memblock_type.cnt) - -/* Alignement per CMA requirement. */ -#define FADUMP_CMA_ALIGNMENT (PAGE_SIZE << \ - max_t(unsigned long, MAX_ORDER - 1, pageblock_order)) - /* Firmware provided dump sections */ #define FADUMP_CPU_STATE_DATA 0x0001 #define FADUMP_HPTE_REGION 0x0002 @@ -60,18 +32,9 @@ /* Dump request flag */ #define FADUMP_REQUEST_FLAG 0x00000001 -/* FAD commands */ -#define FADUMP_REGISTER 1 -#define FADUMP_UNREGISTER 2 -#define FADUMP_INVALIDATE 3 - /* Dump status flag */ #define FADUMP_ERROR_FLAG 0x2000 -#define FADUMP_CPU_ID_MASK ((1UL << 32) - 1) - -#define CPU_UNKNOWN (~((u32)0)) - /* Utility macros */ #define SKIP_TO_NEXT_CPU(reg_entry) \ ({ \ @@ -125,59 +88,8 @@ struct fadump_mem_struct { struct fadump_section rmr_region; }; -/* Firmware-assisted dump configuration details. */ -struct fw_dump { - unsigned long cpu_state_data_size; - unsigned long hpte_region_size; - unsigned long boot_memory_size; - unsigned long reserve_dump_area_start; - unsigned long reserve_dump_area_size; - /* cmd line option during boot */ - unsigned long reserve_bootvar; - - unsigned long fadumphdr_addr; - unsigned long cpu_notes_buf; - unsigned long cpu_notes_buf_size; - - int ibm_configure_kernel_dump; - - unsigned long fadump_enabled:1; - unsigned long fadump_supported:1; - unsigned long dump_active:1; - unsigned long dump_registered:1; - unsigned long nocma:1; -}; - -/* - * Copy the ascii values for first 8 characters from a string into u64 - * variable at their respective indexes. - * e.g. - * The string "FADMPINF" will be converted into 0x4641444d50494e46 - */ -static inline u64 str_to_u64(const char *str) -{ - u64 val = 0; - int i; - - for (i = 0; i < sizeof(val); i++) - val = (*str) ? 
(val << 8) | *str++ : val << 8; - return val; -} -#define STR_TO_HEX(x) str_to_u64(x) -#define REG_ID(x) str_to_u64(x) - -#define FADUMP_CRASH_INFO_MAGIC STR_TO_HEX("FADMPINF") #define REGSAVE_AREA_MAGIC STR_TO_HEX("REGSAVE") -/* The firmware-assisted dump format. - * - * The register save area is an area in the partition's memory used to preserve - * the register contents (CPU state data) for the active CPUs during a firmware - * assisted dump. The dump format contains register save area header followed - * by register entries. Each list of registers for a CPU starts with - * "CPUSTRT" and ends with "CPUEND". - */ - /* Register save area header. */ struct fadump_reg_save_area_header { __be64 magic_number; @@ -185,29 +97,9 @@ struct fadump_reg_save_area_header { __be32 num_cpu_offset; }; -/* Register entry. */ -struct fadump_reg_entry { - __be64 reg_id; - __be64 reg_value; -}; - -/* fadump crash info structure */ -struct fadump_crash_info_header { - u64 magic_number; - u64 elfcorehdr_addr; - u32 crashing_cpu; - struct pt_regs regs; - struct cpumask online_mask; -}; - -struct fad_crash_memory_ranges { - unsigned long long base; - unsigned long long size; -}; - extern int is_fadump_memory_area(u64 addr, ulong size); -extern int early_init_dt_scan_fw_dump(unsigned long node, - const char *uname, int depth, void *data); +extern int early_init_dt_scan_fw_dump(unsigned long node, const char *uname, + int depth, void *data); extern int fadump_reserve_mem(void); extern int setup_fadump(void); extern int is_fadump_active(void); diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 53d4b8d..8e4bade 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -65,7 +65,7 @@ obj-$(CONFIG_EEH) += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \ eeh_driver.o eeh_event.o eeh_sysfs.o obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o -obj-$(CONFIG_FA_DUMP) += fadump.o +obj-$(CONFIG_FA_DUMP) += fadump.o 
fadump_internal.o ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o endif diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index c8ba561..e3f989f 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -44,6 +44,8 @@ #include #include +#include "fadump_internal.h" + static struct fw_dump fw_dump; static struct fadump_mem_struct fdm; static const struct fadump_mem_struct *fdm_active; @@ -119,8 +121,8 @@ static int __init fadump_cma_init(void) { return 1; } #endif /* CONFIG_CMA */ /* Scan the Firmware Assisted dump configuration details. */ -int __init early_init_dt_scan_fw_dump(unsigned long node, - const char *uname, int depth, void *data) +int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, + int depth, void *data) { const __be32 *sections; int i, num_sections; @@ -211,68 +213,6 @@ int is_fadump_active(void) return fw_dump.dump_active; } -/* - * Returns 1, if there are no holes in boot memory area, - * 0 otherwise. - */ -static int is_boot_memory_area_contiguous(void) -{ - struct memblock_region *reg; - unsigned long tstart, tend; - unsigned long start_pfn = PHYS_PFN(RMA_START); - unsigned long end_pfn = PHYS_PFN(RMA_START + fw_dump.boot_memory_size); - unsigned int ret = 0; - - for_each_memblock(memory, reg) { - tstart = max(start_pfn, memblock_region_memory_base_pfn(reg)); - tend = min(end_pfn, memblock_region_memory_end_pfn(reg)); - if (tstart < tend) { - /* Memory hole from start_pfn to tstart */ - if (tstart > start_pfn) - break; - - if (tend == end_pfn) { - ret = 1; - break; - } - - start_pfn = tend + 1; - } - } - - return ret; -} - -/* - * Returns 1, if there are no holes in reserved memory area, - * 0 otherwise. 
- */ -static int is_reserved_memory_area_contiguous(void) -{ - struct memblock_region *reg; - unsigned long start, end; - unsigned long d_start = fw_dump.reserve_dump_area_start; - unsigned long d_end = d_start + fw_dump.reserve_dump_area_size; - int ret = 0; - - for_each_memblock(memory, reg) { - start = max(d_start, (unsigned long)reg->base); - end = min(d_end, (unsigned long)(reg->base + reg->size)); - if (d_start < end) { - /* Memory hole from d_start to start */ - if (start > d_start) - break; - - if (end == d_end) { - ret = 1; - break; - } - d_start = end + 1; - } - } - return ret; -} - /* Print firmware assisted dump configurations for debugging purpose. */ static void fadump_show_config(void) { @@ -638,10 +578,10 @@ static int register_fw_dump(struct fadump_mem_struct *fdm) " dump. Hardware Error(%d).\n", rc); break; case -3: - if (!is_boot_memory_area_contiguous()) + if (!is_boot_memory_area_contiguous(&fw_dump)) pr_err("Can't have holes in boot memory area while " "registering fadump\n"); - else if (!is_reserved_memory_area_contiguous()) + else if (!is_reserved_memory_area_contiguous(&fw_dump)) pr_err("Can't have holes in reserved memory area while" " registering fadump\n"); @@ -711,52 +651,6 @@ void crash_fadump(struct pt_regs *regs, const char *str) rtas_os_term((char *)str); } -#define GPR_MASK 0xffffff0000000000 -static inline int fadump_gpr_index(u64 id) -{ - int i = -1; - char str[3]; - - if ((id & GPR_MASK) == REG_ID("GPR")) { - /* get the digits at the end */ - id &= ~GPR_MASK; - id >>= 24; - str[2] = '\0'; - str[1] = id & 0xff; - str[0] = (id >> 8) & 0xff; - sscanf(str, "%d", &i); - if (i > 31) - i = -1; - } - return i; -} - -static inline void fadump_set_regval(struct pt_regs *regs, u64 reg_id, - u64 reg_val) -{ - int i; - - i = fadump_gpr_index(reg_id); - if (i >= 0) - regs->gpr[i] = (unsigned long)reg_val; - else if (reg_id == REG_ID("NIA")) - regs->nip = (unsigned long)reg_val; - else if (reg_id == REG_ID("MSR")) - regs->msr = (unsigned 
long)reg_val; - else if (reg_id == REG_ID("CTR")) - regs->ctr = (unsigned long)reg_val; - else if (reg_id == REG_ID("LR")) - regs->link = (unsigned long)reg_val; - else if (reg_id == REG_ID("XER")) - regs->xer = (unsigned long)reg_val; - else if (reg_id == REG_ID("CR")) - regs->ccr = (unsigned long)reg_val; - else if (reg_id == REG_ID("DAR")) - regs->dar = (unsigned long)reg_val; - else if (reg_id == REG_ID("DSISR")) - regs->dsisr = (unsigned long)reg_val; -} - static struct fadump_reg_entry* fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs) { @@ -771,72 +665,6 @@ fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs) return reg_entry; } -static u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) -{ - struct elf_prstatus prstatus; - - memset(&prstatus, 0, sizeof(prstatus)); - /* - * FIXME: How do i get PID? Do I really need it? - * prstatus.pr_pid = ???? - */ - elf_core_copy_kernel_regs(&prstatus.pr_reg, regs); - buf = append_elf_note(buf, CRASH_CORE_NOTE_NAME, NT_PRSTATUS, - &prstatus, sizeof(prstatus)); - return buf; -} - -static void fadump_update_elfcore_header(char *bufp) -{ - struct elfhdr *elf; - struct elf_phdr *phdr; - - elf = (struct elfhdr *)bufp; - bufp += sizeof(struct elfhdr); - - /* First note is a place holder for cpu notes info. 
*/ - phdr = (struct elf_phdr *)bufp; - - if (phdr->p_type == PT_NOTE) { - phdr->p_paddr = fw_dump.cpu_notes_buf; - phdr->p_offset = phdr->p_paddr; - phdr->p_filesz = fw_dump.cpu_notes_buf_size; - phdr->p_memsz = fw_dump.cpu_notes_buf_size; - } - return; -} - -static void *fadump_cpu_notes_buf_alloc(unsigned long size) -{ - void *vaddr; - struct page *page; - unsigned long order, count, i; - - order = get_order(size); - vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order); - if (!vaddr) - return NULL; - - count = 1 << order; - page = virt_to_page(vaddr); - for (i = 0; i < count; i++) - SetPageReserved(page + i); - return vaddr; -} - -static void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size) -{ - struct page *page; - unsigned long order, count, i; - - order = get_order(size); - count = 1 << order; - page = virt_to_page(vaddr); - for (i = 0; i < count; i++) - ClearPageReserved(page + i); - __free_pages(page, order); -} - /* * Read CPU state dump data and convert it into ELF notes. * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be @@ -926,9 +754,9 @@ static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) final_note(note_buf); if (fdh) { - pr_debug("Updating elfcore header (%llx) with cpu notes\n", - fdh->elfcorehdr_addr); - fadump_update_elfcore_header((char *)__va(fdh->elfcorehdr_addr)); + addr = fdh->elfcorehdr_addr; + pr_debug("Updating elfcore header(%lx) with cpu notes\n", addr); + fadump_update_elfcore_header(&fw_dump, (char *)__va(addr)); } return 0; diff --git a/arch/powerpc/kernel/fadump_internal.c b/arch/powerpc/kernel/fadump_internal.c new file mode 100644 index 0000000..570c357 --- /dev/null +++ b/arch/powerpc/kernel/fadump_internal.c @@ -0,0 +1,184 @@ +/* + * Firmware-Assisted Dump internal code. + * + * Copyright 2018-2019, IBM Corp. 
+ * Author: Hari Bathini + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include + +#include "fadump_internal.h" + +void *fadump_cpu_notes_buf_alloc(unsigned long size) +{ + void *vaddr; + struct page *page; + unsigned long order, count, i; + + order = get_order(size); + vaddr = (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order); + if (!vaddr) + return NULL; + + count = 1 << order; + page = virt_to_page(vaddr); + for (i = 0; i < count; i++) + SetPageReserved(page + i); + return vaddr; +} + +void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size) +{ + struct page *page; + unsigned long order, count, i; + + order = get_order(size); + count = 1 << order; + page = virt_to_page(vaddr); + for (i = 0; i < count; i++) + ClearPageReserved(page + i); + __free_pages(page, order); +} + +#define GPR_MASK 0xffffff0000000000 +static inline int fadump_gpr_index(u64 id) +{ + int i = -1; + char str[3]; + + if ((id & GPR_MASK) == REG_ID("GPR")) { + /* get the digits at the end */ + id &= ~GPR_MASK; + id >>= 24; + str[2] = '\0'; + str[1] = id & 0xff; + str[0] = (id >> 8) & 0xff; + if (kstrtoint(str, 10, &i)) + i = -EINVAL; + if (i > 31) + i = -1; + } + return i; +} + +void fadump_set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val) +{ + int i; + + i = fadump_gpr_index(reg_id); + if (i >= 0) + regs->gpr[i] = (unsigned long)reg_val; + else if (reg_id == REG_ID("NIA")) + regs->nip = (unsigned long)reg_val; + else if (reg_id == REG_ID("MSR")) + regs->msr = (unsigned long)reg_val; + else if (reg_id == REG_ID("CTR")) + regs->ctr = (unsigned long)reg_val; + else if (reg_id == REG_ID("LR")) + regs->link = (unsigned long)reg_val; + else if (reg_id == REG_ID("XER")) + regs->xer = (unsigned long)reg_val; + else if (reg_id == 
REG_ID("CR")) + regs->ccr = (unsigned long)reg_val; + else if (reg_id == REG_ID("DAR")) + regs->dar = (unsigned long)reg_val; + else if (reg_id == REG_ID("DSISR")) + regs->dsisr = (unsigned long)reg_val; +} + +u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) +{ + struct elf_prstatus prstatus; + + memset(&prstatus, 0, sizeof(prstatus)); + /* + * FIXME: How do i get PID? Do I really need it? + * prstatus.pr_pid = ???? + */ + elf_core_copy_kernel_regs(&prstatus.pr_reg, regs); + buf = append_elf_note(buf, CRASH_CORE_NOTE_NAME, NT_PRSTATUS, + &prstatus, sizeof(prstatus)); + return buf; +} + +void fadump_update_elfcore_header(struct fw_dump *fadump_conf, char *bufp) +{ + struct elfhdr *elf; + struct elf_phdr *phdr; + + elf = (struct elfhdr *)bufp; + bufp += sizeof(struct elfhdr); + + /* First note is a place holder for cpu notes info. */ + phdr = (struct elf_phdr *)bufp; + + if (phdr->p_type == PT_NOTE) { + phdr->p_paddr = fadump_conf->cpu_notes_buf; + phdr->p_offset = phdr->p_paddr; + phdr->p_memsz = fadump_conf->cpu_notes_buf_size; + phdr->p_filesz = phdr->p_memsz; + } +} + +/* + * Returns 1, if there are no holes in memory area between d_start to d_end, + * 0 otherwise. + */ +static int is_memory_area_contiguous(unsigned long d_start, + unsigned long d_end) +{ + struct memblock_region *reg; + unsigned long start, end; + int ret = 0; + + for_each_memblock(memory, reg) { + start = max_t(unsigned long, d_start, reg->base); + end = min_t(unsigned long, d_end, (reg->base + reg->size)); + if (d_start < end) { + /* Memory hole from d_start to start */ + if (start > d_start) + break; + + if (end == d_end) { + ret = 1; + break; + } + + d_start = end + 1; + } + } + + return ret; +} + +/* + * Returns 1, if there are no holes in boot memory area, + * 0 otherwise. 
+ */ +int is_boot_memory_area_contiguous(struct fw_dump *fadump_conf) +{ + unsigned long d_start = RMA_START; + unsigned long d_end = RMA_START + fadump_conf->boot_memory_size; + + return is_memory_area_contiguous(d_start, d_end); +} + +/* + * Returns 1, if there are no holes in reserved memory area, + * 0 otherwise. + */ +int is_reserved_memory_area_contiguous(struct fw_dump *fadump_conf) +{ + unsigned long d_start = fadump_conf->reserve_dump_area_start; + unsigned long d_end = d_start + fadump_conf->reserve_dump_area_size; + + return is_memory_area_contiguous(d_start, d_end); +} diff --git a/arch/powerpc/kernel/fadump_internal.h b/arch/powerpc/kernel/fadump_internal.h new file mode 100644 index 0000000..13223f8 --- /dev/null +++ b/arch/powerpc/kernel/fadump_internal.h @@ -0,0 +1,126 @@ +/* + * Firmware-Assisted Dump internal code. + * + * Copyright 2018-2019, IBM Corp. + * Author: Hari Bathini + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef __PPC64_FA_DUMP_INTERNAL_H__ +#define __PPC64_FA_DUMP_INTERNAL_H__ + +/* + * The RMA region will be saved for later dumping when kernel crashes. + * RMA is Real Mode Area, the first block of logical memory address owned + * by logical partition, containing the storage that may be accessed with + * translate off. + */ +#define RMA_START 0x0 +#define RMA_END (ppc64_rma_size) + +/* + * On some Power systems where RMO is 128MB, it still requires minimum of + * 256MB for kernel to boot successfully. When kdump infrastructure is + * configured to save vmcore over network, we run into OOM issue while + * loading modules related to network setup. Hence we need additional 64M + * of memory to avoid OOM issue. + */ +#define MIN_BOOT_MEM (((RMA_END < (0x1UL << 28)) ? 
(0x1UL << 28) : RMA_END) \ + + (0x1UL << 26)) + +/* The upper limit percentage for user specified boot memory size (25%) */ +#define MAX_BOOT_MEM_RATIO 4 + +#define memblock_num_regions(memblock_type) (memblock.memblock_type.cnt) + +/* Alignment per CMA requirement. */ +#define FADUMP_CMA_ALIGNMENT (PAGE_SIZE << \ + max_t(unsigned long, MAX_ORDER - 1, \ + pageblock_order)) + +/* FAD commands */ +#define FADUMP_REGISTER 1 +#define FADUMP_UNREGISTER 2 +#define FADUMP_INVALIDATE 3 + +#define FADUMP_CPU_ID_MASK ((1UL << 32) - 1) + +#define CPU_UNKNOWN (~((u32)0)) + +/* + * Copy the ascii values for first 8 characters from a string into u64 + * variable at their respective indexes. + * e.g. + * The string "FADMPINF" will be converted into 0x4641444d50494e46 + */ +static inline u64 str_to_u64(const char *str) +{ + u64 val = 0; + int i; + + for (i = 0; i < sizeof(val); i++) + val = (*str) ? (val << 8) | *str++ : val << 8; + return val; +} +#define STR_TO_HEX(x) str_to_u64(x) +#define REG_ID(x) str_to_u64(x) + +#define FADUMP_CRASH_INFO_MAGIC STR_TO_HEX("FADMPINF") + +/* Register entry. */ +struct fadump_reg_entry { + __be64 reg_id; + __be64 reg_value; +}; + +/* fadump crash info structure */ +struct fadump_crash_info_header { + u64 magic_number; + u64 elfcorehdr_addr; + u32 crashing_cpu; + struct pt_regs regs; + struct cpumask online_mask; +}; + +struct fad_crash_memory_ranges { + unsigned long long base; + unsigned long long size; +}; + +/* Firmware-assisted dump configuration details. 
*/ +struct fw_dump { + unsigned long cpu_state_data_size; + unsigned long hpte_region_size; + unsigned long boot_memory_size; + unsigned long reserve_dump_area_start; + unsigned long reserve_dump_area_size; + /* cmd line option during boot */ + unsigned long reserve_bootvar; + + unsigned long fadumphdr_addr; + unsigned long cpu_notes_buf; + unsigned long cpu_notes_buf_size; + + int ibm_configure_kernel_dump; + + unsigned long fadump_enabled:1; + unsigned long fadump_supported:1; + unsigned long dump_active:1; + unsigned long dump_registered:1; + unsigned long nocma:1; +}; + +/* Helper functions */ +void *fadump_cpu_notes_buf_alloc(unsigned long size); +void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size); +void fadump_set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val); +u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs); +void fadump_update_elfcore_header(struct fw_dump *fadump_config, char *bufp); +int is_boot_memory_area_contiguous(struct fw_dump *fadump_conf); +int is_reserved_memory_area_contiguous(struct fw_dump *fadump_conf); + +#endif /* __PPC64_FA_DUMP_INTERNAL_H__ */

From patchwork Thu Dec 20 19:00:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hari Bathini
X-Patchwork-Id: 1016969
Subject: [PATCH 2/9] powerpc/fadump: Improve fadump documentation
From: Hari Bathini
To: Ananth N Mavinakayanahalli, Michael Ellerman, Mahesh J Salgaonkar, Vasant Hegde, linuxppc-dev, Stewart Smith
Date: Fri, 21 Dec 2018 00:30:25 +0530
In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com>
References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com>
User-Agent: StGit/0.17.1-dirty
Message-Id: <154533242556.28973.13958422857496711413.stgit@hbathini.in.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

The figures depicting FADump's (Firmware-Assisted Dump) memory layout
are missing some finer details like different memory regions and what
they represent. Improve the documentation by updating those details.

Signed-off-by: Hari Bathini
---
 Documentation/powerpc/firmware-assisted-dump.txt | 56 ++++++++++++----------
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index 18c5fee..4897665 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -125,42 +125,46 @@ space memory except the user pages that were present in CMA region.
o Memory Reservation during first kernel - Low memory Top of memory - 0 boot memory size | - | | |<--Reserved dump area -->| | - V V | Permanent Reservation | V - +-----------+----------/ /---+---+----+-----------+----+------+ - | | |CPU|HPTE| DUMP |ELF | | - +-----------+----------/ /---+---+----+-----------+----+------+ - | ^ - | | - \ / - ------------------------------------------- - Boot memory content gets transferred to - reserved area by firmware at the time of - crash + Low memory Top of memory + 0 boot memory size |<--Reserved dump area --->| | + | | | Permanent Reservation | | + V V | (Preserve area) | V + +-----------+----------/ /---+---+----+--------+---+----+------+ + | | |CPU|HPTE| DUMP |HDR|ELF | | + +-----------+----------/ /---+---+----+--------+---+----+------+ + | ^ ^ + | | | + \ / | + ----------------------------------- FADump Header + Boot memory content gets transferred (meta area) + to reserved area by firmware at the + time of crash + Fig. 1 + o Memory Reservation during second kernel after crash - Low memory Top of memory - 0 boot memory size | - | |<------------- Reserved dump area ----------- -->| - V V V - +-----------+----------/ /---+---+----+-----------+----+------+ - | | |CPU|HPTE| DUMP |ELF | | - +-----------+----------/ /---+---+----+-----------+----+------+ + Low memory Top of memory + 0 boot memory size | + | |<------------- Reserved dump area --------------->| + V V |<---- Preserve area ----->| V + +-----------+----------/ /---+---+----+--------+---+----+------+ + | | |CPU|HPTE| DUMP |HDR|ELF | | + +-----------+----------/ /---+---+----+--------+---+----+------+ | | V V Used by second /proc/vmcore kernel to boot Fig. 2 -Currently the dump will be copied from /proc/vmcore to a -a new file upon user intervention. The dump data available through -/proc/vmcore will be in ELF format. Hence the existing kdump -infrastructure (kdump scripts) to save the dump works fine with -minor modifications. 
+Currently the dump will be copied from /proc/vmcore to a new file upon +user intervention. The dump data available through /proc/vmcore will be +in ELF format. Hence the existing kdump infrastructure (kdump scripts) +to save the dump works fine with minor modifications. KDump scripts on +major Distro releases have already been modified to work seamlessly (no +user intervention in saving the dump) when FADump is used, instead of +KDump, as dump mechanism. The tools to examine the dump will be same as the ones used for kdump. From patchwork Thu Dec 20 19:00:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hari Bathini X-Patchwork-Id: 1016973 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLrf6fyfz9sC7 for ; Fri, 21 Dec 2018 06:08:34 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 43LLrf5B3vzDrBs for ; Fri, 21 Dec 2018 06:08:34 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 43LLgf3M6qzDr4F for ; Fri, 21 Dec 2018 06:00:46 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none 
dis=none) header.from=linux.ibm.com Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) by bilbo.ozlabs.org (Postfix) with ESMTP id 43LLgf1YsXz8tRf for ; Fri, 21 Dec 2018 06:00:46 +1100 (AEDT) Received: by ozlabs.org (Postfix) id 43LLgf172wz9sCh; Fri, 21 Dec 2018 06:00:46 +1100 (AEDT) Delivered-To: linuxppc-dev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=hbathini@linux.ibm.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLgd3MyJz9sCQ for ; Fri, 21 Dec 2018 06:00:45 +1100 (AEDT) Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id wBKIwoND047113 for ; Thu, 20 Dec 2018 14:00:43 -0500 Received: from e06smtp01.uk.ibm.com (e06smtp01.uk.ibm.com [195.75.94.97]) by mx0a-001b2d01.pphosted.com with ESMTP id 2pgdrfr81x-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 20 Dec 2018 14:00:42 -0500 Received: from localhost by e06smtp01.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 20 Dec 2018 19:00:40 -0000 Received: from b06cxnps3074.portsmouth.uk.ibm.com (9.149.109.194) by e06smtp01.uk.ibm.com (192.168.101.131) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 20 Dec 2018 19:00:38 -0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id wBKJ0acg50987090 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 20 Dec 2018 19:00:36 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3CBD5AE05F; Thu, 20 Dec 2018 19:00:36 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 48EF2AE059; Thu, 20 Dec 2018 19:00:34 +0000 (GMT) Received: from hbathini.in.ibm.com (unknown [9.199.47.3]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 20 Dec 2018 19:00:34 +0000 (GMT) Subject: [PATCH 3/9] pseries/fadump: move out platform specific support from generic code From: Hari Bathini To: Ananth N Mavinakayanahalli , Michael Ellerman , Mahesh J Salgaonkar , Vasant Hegde , linuxppc-dev , Stewart Smith Date: Fri, 21 Dec 2018 00:30:33 +0530 In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 18122019-4275-0000-0000-000002F37560 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18122019-4276-0000-0000-00003801812D Message-Id: <154533243323.28973.2666959974924271089.stgit@hbathini.in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-12-20_09:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 
engine=8.0.1-1810050000 definitions=main-1812200154 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Introduce callbacks for platform specific operations like register, unregister, invalidate & such, and move pseries specific code into platform code. Signed-off-by: Hari Bathini --- arch/powerpc/include/asm/fadump.h | 71 --- arch/powerpc/kernel/fadump.c | 502 +--------------------- arch/powerpc/kernel/fadump_internal.h | 38 ++ arch/powerpc/platforms/pseries/Makefile | 1 arch/powerpc/platforms/pseries/pseries_fadump.c | 537 +++++++++++++++++++++++ arch/powerpc/platforms/pseries/pseries_fadump.h | 96 ++++ 6 files changed, 707 insertions(+), 538 deletions(-) create mode 100644 arch/powerpc/platforms/pseries/pseries_fadump.c create mode 100644 arch/powerpc/platforms/pseries/pseries_fadump.h diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h index 028a8ef..db9465f 100644 --- a/arch/powerpc/include/asm/fadump.h +++ b/arch/powerpc/include/asm/fadump.h @@ -24,79 +24,8 @@ #ifdef CONFIG_FA_DUMP -/* Firmware provided dump sections */ -#define FADUMP_CPU_STATE_DATA 0x0001 -#define FADUMP_HPTE_REGION 0x0002 -#define FADUMP_REAL_MODE_REGION 0x0011 - -/* Dump request flag */ -#define FADUMP_REQUEST_FLAG 0x00000001 - -/* Dump status flag */ -#define FADUMP_ERROR_FLAG 0x2000 - -/* Utility macros */ -#define SKIP_TO_NEXT_CPU(reg_entry) \ -({ \ - while (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUEND")) \ - reg_entry++; \ - reg_entry++; \ -}) - extern int crashing_cpu; -/* Kernel Dump section info */ -struct fadump_section { - __be32 request_flag; - __be16 source_data_type; - __be16 error_flags; - __be64 source_address; - __be64 source_len; - __be64 bytes_dumped; - __be64 destination_address; -}; 
- -/* ibm,configure-kernel-dump header. */ -struct fadump_section_header { - __be32 dump_format_version; - __be16 dump_num_sections; - __be16 dump_status_flag; - __be32 offset_first_dump_section; - - /* Fields for disk dump option. */ - __be32 dd_block_size; - __be64 dd_block_offset; - __be64 dd_num_blocks; - __be32 dd_offset_disk_path; - - /* Maximum time allowed to prevent an automatic dump-reboot. */ - __be32 max_time_auto; -}; - -/* - * Firmware Assisted dump memory structure. This structure is required for - * registering future kernel dump with power firmware through rtas call. - * - * No disk dump option. Hence disk dump path string section is not included. - */ -struct fadump_mem_struct { - struct fadump_section_header header; - - /* Kernel dump sections */ - struct fadump_section cpu_state_data; - struct fadump_section hpte_region; - struct fadump_section rmr_region; -}; - -#define REGSAVE_AREA_MAGIC STR_TO_HEX("REGSAVE") - -/* Register save area header. */ -struct fadump_reg_save_area_header { - __be64 magic_number; - __be32 version; - __be32 num_cpu_offset; -}; - extern int is_fadump_memory_area(u64 addr, ulong size); extern int early_init_dt_scan_fw_dump(unsigned long node, const char *uname, int depth, void *data); diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index e3f989f..36d9d48 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -40,15 +40,12 @@ #include #include #include -#include #include #include #include "fadump_internal.h" static struct fw_dump fw_dump; -static struct fadump_mem_struct fdm; -static const struct fadump_mem_struct *fdm_active; #ifdef CONFIG_CMA static struct cma *fadump_cma; #endif @@ -124,63 +121,13 @@ static int __init fadump_cma_init(void) { return 1; } int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, int depth, void *data) { - const __be32 *sections; - int i, num_sections; - int size; - const __be32 *token; - - if (depth != 1 || strcmp(uname, 
"rtas") != 0) + if (depth != 1) return 0; - /* - * Check if Firmware Assisted dump is supported. if yes, check - * if dump has been initiated on last reboot. - */ - token = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL); - if (!token) - return 1; - - fw_dump.fadump_supported = 1; - fw_dump.ibm_configure_kernel_dump = be32_to_cpu(*token); - - /* - * The 'ibm,kernel-dump' rtas node is present only if there is - * dump data waiting for us. - */ - fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL); - if (fdm_active) - fw_dump.dump_active = 1; - - /* Get the sizes required to store dump data for the firmware provided - * dump sections. - * For each dump section type supported, a 32bit cell which defines - * the ID of a supported section followed by two 32 bit cells which - * gives teh size of the section in bytes. - */ - sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes", - &size); - - if (!sections) - return 1; - - num_sections = size / (3 * sizeof(u32)); + if (strcmp(uname, "rtas") == 0) + return pseries_dt_scan_fadump(&fw_dump, node); - for (i = 0; i < num_sections; i++, sections += 3) { - u32 type = (u32)of_read_number(sections, 1); - - switch (type) { - case FADUMP_CPU_STATE_DATA: - fw_dump.cpu_state_data_size = - of_read_ulong(&sections[1], 2); - break; - case FADUMP_HPTE_REGION: - fw_dump.hpte_region_size = - of_read_ulong(&sections[1], 2); - break; - } - } - - return 1; + return 0; } /* @@ -232,61 +179,6 @@ static void fadump_show_config(void) pr_debug("Boot memory size : %lx\n", fw_dump.boot_memory_size); } -static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm, - unsigned long addr) -{ - if (!fdm) - return 0; - - memset(fdm, 0, sizeof(struct fadump_mem_struct)); - addr = addr & PAGE_MASK; - - fdm->header.dump_format_version = cpu_to_be32(0x00000001); - fdm->header.dump_num_sections = cpu_to_be16(3); - fdm->header.dump_status_flag = 0; - fdm->header.offset_first_dump_section = - 
cpu_to_be32((u32)offsetof(struct fadump_mem_struct, cpu_state_data)); - - /* - * Fields for disk dump option. - * We are not using disk dump option, hence set these fields to 0. - */ - fdm->header.dd_block_size = 0; - fdm->header.dd_block_offset = 0; - fdm->header.dd_num_blocks = 0; - fdm->header.dd_offset_disk_path = 0; - - /* set 0 to disable an automatic dump-reboot. */ - fdm->header.max_time_auto = 0; - - /* Kernel dump sections */ - /* cpu state data section. */ - fdm->cpu_state_data.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG); - fdm->cpu_state_data.source_data_type = cpu_to_be16(FADUMP_CPU_STATE_DATA); - fdm->cpu_state_data.source_address = 0; - fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size); - fdm->cpu_state_data.destination_address = cpu_to_be64(addr); - addr += fw_dump.cpu_state_data_size; - - /* hpte region section */ - fdm->hpte_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG); - fdm->hpte_region.source_data_type = cpu_to_be16(FADUMP_HPTE_REGION); - fdm->hpte_region.source_address = 0; - fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size); - fdm->hpte_region.destination_address = cpu_to_be64(addr); - addr += fw_dump.hpte_region_size; - - /* RMA region section */ - fdm->rmr_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG); - fdm->rmr_region.source_data_type = cpu_to_be16(FADUMP_REAL_MODE_REGION); - fdm->rmr_region.source_address = cpu_to_be64(RMA_START); - fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size); - fdm->rmr_region.destination_address = cpu_to_be64(addr); - addr += fw_dump.boot_memory_size; - - return addr; -} - /** * fadump_calculate_reserve_size(): reserve variable boot area 5% of System RAM * @@ -417,8 +309,8 @@ int __init fadump_reserve_mem(void) * If dump is active then we have already calculated the size during * first kernel. 
*/ - if (fdm_active) - fw_dump.boot_memory_size = be64_to_cpu(fdm_active->rmr_region.source_len); + if (fw_dump.dump_active) + fw_dump.boot_memory_size = fw_dump.rmr_source_len; else { fw_dump.boot_memory_size = fadump_calculate_reserve_size(); #ifdef CONFIG_CMA @@ -427,8 +319,11 @@ int __init fadump_reserve_mem(void) ALIGN(fw_dump.boot_memory_size, FADUMP_CMA_ALIGNMENT); #endif + fw_dump.rmr_source_len = fw_dump.boot_memory_size; } + size = get_fadump_area_size(); + /* * Calculate the memory boundary. * If memory_limit is less than actual memory boundary then reserve @@ -437,13 +332,11 @@ int __init fadump_reserve_mem(void) * specified memory_limit. */ if (memory_limit && memory_limit < memblock_end_of_DRAM()) { - size = get_fadump_area_size(); if ((memory_limit + size) < memblock_end_of_DRAM()) memory_limit += size; else memory_limit = memblock_end_of_DRAM(); - printk(KERN_INFO "Adjusted memory_limit for firmware-assisted" - " dump, now %#016llx\n", memory_limit); + pr_info("memory_limit adjusted to %#016llx\n", memory_limit); } if (memory_limit) memory_boundary = memory_limit; @@ -451,8 +344,6 @@ int __init fadump_reserve_mem(void) memory_boundary = memblock_end_of_DRAM(); if (fw_dump.dump_active) { - pr_info("Firmware-assisted dump is active.\n"); - #ifdef CONFIG_HUGETLB_PAGE /* * FADump capture kernel doesn't care much about hugepages. 
@@ -471,15 +362,11 @@ int __init fadump_reserve_mem(void) size = memory_boundary - base; fadump_reserve_crash_area(base, size); - fw_dump.fadumphdr_addr = - be64_to_cpu(fdm_active->rmr_region.destination_address) + - be64_to_cpu(fdm_active->rmr_region.source_len); - pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr); + fw_dump.fadumphdr_addr = fw_dump.meta_area_start; + pr_debug("fadumphdr_addr = %#016lx\n", fw_dump.fadumphdr_addr); fw_dump.reserve_dump_area_start = base; fw_dump.reserve_dump_area_size = size; } else { - size = get_fadump_area_size(); - /* * Reserve memory at an offset closer to bottom of the RAM to * minimize the impact of memory hot-remove operation. We can't @@ -549,62 +436,6 @@ static int __init early_fadump_reserve_mem(char *p) } early_param("fadump_reserve_mem", early_fadump_reserve_mem); -static int register_fw_dump(struct fadump_mem_struct *fdm) -{ - int rc, err; - unsigned int wait_time; - - pr_debug("Registering for firmware-assisted kernel dump...\n"); - - /* TODO: Add upper time limit for the delay */ - do { - rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, - FADUMP_REGISTER, fdm, - sizeof(struct fadump_mem_struct)); - - wait_time = rtas_busy_delay_time(rc); - if (wait_time) - mdelay(wait_time); - - } while (wait_time); - - err = -EIO; - switch (rc) { - default: - pr_err("Failed to register. Unknown Error(%d).\n", rc); - break; - case -1: - printk(KERN_ERR "Failed to register firmware-assisted kernel" - " dump. Hardware Error(%d).\n", rc); - break; - case -3: - if (!is_boot_memory_area_contiguous(&fw_dump)) - pr_err("Can't have holes in boot memory area while " - "registering fadump\n"); - else if (!is_reserved_memory_area_contiguous(&fw_dump)) - pr_err("Can't have holes in reserved memory area while" - " registering fadump\n"); - - printk(KERN_ERR "Failed to register firmware-assisted kernel" - " dump. 
Parameter Error(%d).\n", rc); - err = -EINVAL; - break; - case -9: - printk(KERN_ERR "firmware-assisted kernel dump is already " - " registered."); - fw_dump.dump_registered = 1; - err = -EEXIST; - break; - case 0: - printk(KERN_INFO "firmware-assisted kernel dump registration" - " is successful\n"); - fw_dump.dump_registered = 1; - err = 0; - break; - } - return err; -} - void crash_fadump(struct pt_regs *regs, const char *str) { struct fadump_crash_info_header *fdh = NULL; @@ -647,173 +478,7 @@ void crash_fadump(struct pt_regs *regs, const char *str) fdh->online_mask = *cpu_online_mask; - /* Call ibm,os-term rtas call to trigger firmware assisted dump */ - rtas_os_term((char *)str); -} - -static struct fadump_reg_entry* -fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs) -{ - memset(regs, 0, sizeof(struct pt_regs)); - - while (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUEND")) { - fadump_set_regval(regs, be64_to_cpu(reg_entry->reg_id), - be64_to_cpu(reg_entry->reg_value)); - reg_entry++; - } - reg_entry++; - return reg_entry; -} - -/* - * Read CPU state dump data and convert it into ELF notes. - * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be - * used to access the data to allow for additional fields to be added without - * affecting compatibility. Each list of registers for a CPU starts with - * "CPUSTRT" and ends with "CPUEND". Each register entry is of 16 bytes, - * 8 Byte ASCII identifier and 8 Byte register value. The register entry - * with identifier "CPUSTRT" and "CPUEND" contains 4 byte cpu id as part - * of register value. For more details refer to PAPR document. - * - * Only for the crashing cpu we ignore the CPU dump data and get exact - * state from fadump crash info structure populated by first kernel at the - * time of crash. 
- */ -static int __init fadump_build_cpu_notes(const struct fadump_mem_struct *fdm) -{ - struct fadump_reg_save_area_header *reg_header; - struct fadump_reg_entry *reg_entry; - struct fadump_crash_info_header *fdh = NULL; - void *vaddr; - unsigned long addr; - u32 num_cpus, *note_buf; - struct pt_regs regs; - int i, rc = 0, cpu = 0; - - if (!fdm->cpu_state_data.bytes_dumped) - return -EINVAL; - - addr = be64_to_cpu(fdm->cpu_state_data.destination_address); - vaddr = __va(addr); - - reg_header = vaddr; - if (be64_to_cpu(reg_header->magic_number) != REGSAVE_AREA_MAGIC) { - printk(KERN_ERR "Unable to read register save area.\n"); - return -ENOENT; - } - pr_debug("--------CPU State Data------------\n"); - pr_debug("Magic Number: %llx\n", be64_to_cpu(reg_header->magic_number)); - pr_debug("NumCpuOffset: %x\n", be32_to_cpu(reg_header->num_cpu_offset)); - - vaddr += be32_to_cpu(reg_header->num_cpu_offset); - num_cpus = be32_to_cpu(*((__be32 *)(vaddr))); - pr_debug("NumCpus : %u\n", num_cpus); - vaddr += sizeof(u32); - reg_entry = (struct fadump_reg_entry *)vaddr; - - /* Allocate buffer to hold cpu crash notes. 
*/ - fw_dump.cpu_notes_buf_size = num_cpus * sizeof(note_buf_t); - fw_dump.cpu_notes_buf_size = PAGE_ALIGN(fw_dump.cpu_notes_buf_size); - note_buf = fadump_cpu_notes_buf_alloc(fw_dump.cpu_notes_buf_size); - if (!note_buf) { - printk(KERN_ERR "Failed to allocate 0x%lx bytes for " - "cpu notes buffer\n", fw_dump.cpu_notes_buf_size); - return -ENOMEM; - } - fw_dump.cpu_notes_buf = __pa(note_buf); - - pr_debug("Allocated buffer for cpu notes of size %ld at %p\n", - (num_cpus * sizeof(note_buf_t)), note_buf); - - if (fw_dump.fadumphdr_addr) - fdh = __va(fw_dump.fadumphdr_addr); - - for (i = 0; i < num_cpus; i++) { - if (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUSTRT")) { - printk(KERN_ERR "Unable to read CPU state data\n"); - rc = -ENOENT; - goto error_out; - } - /* Lower 4 bytes of reg_value contains logical cpu id */ - cpu = be64_to_cpu(reg_entry->reg_value) & FADUMP_CPU_ID_MASK; - if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) { - SKIP_TO_NEXT_CPU(reg_entry); - continue; - } - pr_debug("Reading register data for cpu %d...\n", cpu); - if (fdh && fdh->crashing_cpu == cpu) { - regs = fdh->regs; - note_buf = fadump_regs_to_elf_notes(note_buf, &regs); - SKIP_TO_NEXT_CPU(reg_entry); - } else { - reg_entry++; - reg_entry = fadump_read_registers(reg_entry, &regs); - note_buf = fadump_regs_to_elf_notes(note_buf, &regs); - } - } - final_note(note_buf); - - if (fdh) { - addr = fdh->elfcorehdr_addr; - pr_debug("Updating elfcore header(%lx) with cpu notes\n", addr); - fadump_update_elfcore_header(&fw_dump, (char *)__va(addr)); - } - return 0; - -error_out: - fadump_cpu_notes_buf_free((unsigned long)__va(fw_dump.cpu_notes_buf), - fw_dump.cpu_notes_buf_size); - fw_dump.cpu_notes_buf = 0; - fw_dump.cpu_notes_buf_size = 0; - return rc; - -} - -/* - * Validate and process the dump data stored by firmware before exporting - * it through '/proc/vmcore'. 
- */ -static int __init process_fadump(const struct fadump_mem_struct *fdm_active) -{ - struct fadump_crash_info_header *fdh; - int rc = 0; - - if (!fdm_active || !fw_dump.fadumphdr_addr) - return -EINVAL; - - /* Check if the dump data is valid. */ - if ((be16_to_cpu(fdm_active->header.dump_status_flag) == FADUMP_ERROR_FLAG) || - (fdm_active->cpu_state_data.error_flags != 0) || - (fdm_active->rmr_region.error_flags != 0)) { - printk(KERN_ERR "Dump taken by platform is not valid\n"); - return -EINVAL; - } - if ((fdm_active->rmr_region.bytes_dumped != - fdm_active->rmr_region.source_len) || - !fdm_active->cpu_state_data.bytes_dumped) { - printk(KERN_ERR "Dump taken by platform is incomplete\n"); - return -EINVAL; - } - - /* Validate the fadump crash info header */ - fdh = __va(fw_dump.fadumphdr_addr); - if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { - printk(KERN_ERR "Crash info header is not valid.\n"); - return -EINVAL; - } - - rc = fadump_build_cpu_notes(fdm_active); - if (rc) - return rc; - - /* - * We are done validating dump info and elfcore header is now ready - * to be exported. set elfcorehdr_addr so that vmcore module will - * export the elfcore header through '/proc/vmcore'. - */ - elfcorehdr_addr = fdh->elfcorehdr_addr; - - return 0; + fw_dump.ops->crash_fadump(str); } static void free_crash_memory_ranges(void) @@ -1010,7 +675,7 @@ static int fadump_setup_crash_memory_ranges(void) static inline unsigned long fadump_relocate(unsigned long paddr) { if (paddr > RMA_START && paddr < fw_dump.boot_memory_size) - return be64_to_cpu(fdm.rmr_region.destination_address) + paddr; + return fw_dump.rmr_destination_addr + paddr; else return paddr; } @@ -1083,7 +748,7 @@ static int fadump_create_elfcore_headers(char *bufp) * to the specified destination_address. Hence set * the correct offset. 
*/ - phdr->p_offset = be64_to_cpu(fdm.rmr_region.destination_address); + phdr->p_offset = fw_dump.rmr_destination_addr; } phdr->p_paddr = mbase; @@ -1135,7 +800,8 @@ static int register_fadump(void) if (ret) return ret; - addr = be64_to_cpu(fdm.rmr_region.destination_address) + be64_to_cpu(fdm.rmr_region.source_len); + addr = fw_dump.meta_area_start; + /* Initialize fadump crash info header. */ addr = init_fadump_header(addr); vaddr = __va(addr); @@ -1144,72 +810,19 @@ static int register_fadump(void) fadump_create_elfcore_headers(vaddr); /* register the future kernel dump with firmware. */ - return register_fw_dump(&fdm); -} - -static int fadump_unregister_dump(struct fadump_mem_struct *fdm) -{ - int rc = 0; - unsigned int wait_time; - - pr_debug("Un-register firmware-assisted dump\n"); - - /* TODO: Add upper time limit for the delay */ - do { - rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, - FADUMP_UNREGISTER, fdm, - sizeof(struct fadump_mem_struct)); - - wait_time = rtas_busy_delay_time(rc); - if (wait_time) - mdelay(wait_time); - } while (wait_time); - - if (rc) { - printk(KERN_ERR "Failed to un-register firmware-assisted dump." - " unexpected error(%d).\n", rc); - return rc; - } - fw_dump.dump_registered = 0; - return 0; -} - -static int fadump_invalidate_dump(const struct fadump_mem_struct *fdm) -{ - int rc = 0; - unsigned int wait_time; - - pr_debug("Invalidating firmware-assisted dump registration\n"); - - /* TODO: Add upper time limit for the delay */ - do { - rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL, - FADUMP_INVALIDATE, fdm, - sizeof(struct fadump_mem_struct)); - - wait_time = rtas_busy_delay_time(rc); - if (wait_time) - mdelay(wait_time); - } while (wait_time); - - if (rc) { - pr_err("Failed to invalidate firmware-assisted dump registration. 
Unexpected error (%d).\n", rc); - return rc; - } - fw_dump.dump_active = 0; - fdm_active = NULL; - return 0; + pr_debug("Registering for firmware-assisted kernel dump...\n"); + return fw_dump.ops->register_fadump(&fw_dump); } void fadump_cleanup(void) { /* Invalidate the registration only if dump is active. */ if (fw_dump.dump_active) { - /* pass the same memory dump structure provided by platform */ - fadump_invalidate_dump(fdm_active); + pr_debug("Invalidating firmware-assisted dump registration\n"); + fw_dump.ops->invalidate_fadump(&fw_dump); } else if (fw_dump.dump_registered) { /* Un-register Firmware-assisted dump if it was registered. */ - fadump_unregister_dump(&fdm); + fw_dump.ops->unregister_fadump(&fw_dump); free_crash_memory_ranges(); } } @@ -1292,7 +905,7 @@ static void fadump_invalidate_release_mem(void) return; } - destination_address = be64_to_cpu(fdm_active->cpu_state_data.destination_address); + destination_address = fw_dump.preserv_area_start; fadump_cleanup(); mutex_unlock(&fadump_mutex); @@ -1318,8 +931,9 @@ static void fadump_invalidate_release_mem(void) fw_dump.cpu_notes_buf = 0; fw_dump.cpu_notes_buf_size = 0; } + /* Initialize the kernel dump memory structure for FAD registration. 
*/ - init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start); + fw_dump.ops->init_fadump_mem_struct(&fw_dump); } static ssize_t fadump_release_memory_store(struct kobject *kobj, @@ -1370,7 +984,7 @@ static ssize_t fadump_register_store(struct kobject *kobj, int ret = 0; int input = -1; - if (!fw_dump.fadump_enabled || fdm_active) + if (!fw_dump.fadump_enabled || fw_dump.dump_active) return -EPERM; if (kstrtoint(buf, 0, &input)) @@ -1383,13 +997,15 @@ static ssize_t fadump_register_store(struct kobject *kobj, if (fw_dump.dump_registered == 0) { goto unlock_out; } + /* Un-register Firmware-assisted dump */ - fadump_unregister_dump(&fdm); + pr_debug("Un-register firmware-assisted dump\n"); + fw_dump.ops->unregister_fadump(&fw_dump); break; case 1: if (fw_dump.dump_registered == 1) { /* Un-register Firmware-assisted dump */ - fadump_unregister_dump(&fdm); + fw_dump.ops->unregister_fadump(&fw_dump); } /* Register Firmware-assisted dump */ ret = register_fadump(); @@ -1406,62 +1022,13 @@ static ssize_t fadump_register_store(struct kobject *kobj, static int fadump_region_show(struct seq_file *m, void *private) { - const struct fadump_mem_struct *fdm_ptr; - if (!fw_dump.fadump_enabled) return 0; mutex_lock(&fadump_mutex); - if (fdm_active) - fdm_ptr = fdm_active; - else { - mutex_unlock(&fadump_mutex); - fdm_ptr = &fdm; - } + fw_dump.ops->fadump_region_show(&fw_dump, m); + mutex_unlock(&fadump_mutex); - seq_printf(m, - "CPU : [%#016llx-%#016llx] %#llx bytes, " - "Dumped: %#llx\n", - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address), - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) + - be64_to_cpu(fdm_ptr->cpu_state_data.source_len) - 1, - be64_to_cpu(fdm_ptr->cpu_state_data.source_len), - be64_to_cpu(fdm_ptr->cpu_state_data.bytes_dumped)); - seq_printf(m, - "HPTE: [%#016llx-%#016llx] %#llx bytes, " - "Dumped: %#llx\n", - be64_to_cpu(fdm_ptr->hpte_region.destination_address), - be64_to_cpu(fdm_ptr->hpte_region.destination_address) + - 
be64_to_cpu(fdm_ptr->hpte_region.source_len) - 1, - be64_to_cpu(fdm_ptr->hpte_region.source_len), - be64_to_cpu(fdm_ptr->hpte_region.bytes_dumped)); - seq_printf(m, - "DUMP: [%#016llx-%#016llx] %#llx bytes, " - "Dumped: %#llx\n", - be64_to_cpu(fdm_ptr->rmr_region.destination_address), - be64_to_cpu(fdm_ptr->rmr_region.destination_address) + - be64_to_cpu(fdm_ptr->rmr_region.source_len) - 1, - be64_to_cpu(fdm_ptr->rmr_region.source_len), - be64_to_cpu(fdm_ptr->rmr_region.bytes_dumped)); - - if (!fdm_active || - (fw_dump.reserve_dump_area_start == - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address))) - goto out; - - /* Dump is active. Show reserved memory region. */ - seq_printf(m, - " : [%#016llx-%#016llx] %#llx bytes, " - "Dumped: %#llx\n", - (unsigned long long)fw_dump.reserve_dump_area_start, - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) - 1, - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) - - fw_dump.reserve_dump_area_start, - be64_to_cpu(fdm_ptr->cpu_state_data.destination_address) - - fw_dump.reserve_dump_area_start); -out: - if (fdm_active) - mutex_unlock(&fadump_mutex); return 0; } @@ -1542,12 +1109,13 @@ int __init setup_fadump(void) * if dump process fails then invalidate the registration * and release memory before proceeding for re-registration. */ - if (process_fadump(fdm_active) < 0) + if (fw_dump.ops->process_fadump(&fw_dump) < 0) fadump_invalidate_release_mem(); } /* Initialize the kernel dump memory structure for FAD registration. 
*/ else if (fw_dump.reserve_dump_area_size) - init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start); + fw_dump.ops->init_fadump_mem_struct(&fw_dump); + fadump_init_files(); return 1; diff --git a/arch/powerpc/kernel/fadump_internal.h b/arch/powerpc/kernel/fadump_internal.h index 13223f8..d89127f5 100644 --- a/arch/powerpc/kernel/fadump_internal.h +++ b/arch/powerpc/kernel/fadump_internal.h @@ -47,6 +47,12 @@ #define FADUMP_UNREGISTER 2 #define FADUMP_INVALIDATE 3 +/* Firmware-Assisted Dump platforms */ +enum fadump_platform_type { + FADUMP_PLATFORM_UNKNOWN = 0, + FADUMP_PLATFORM_PSERIES, +}; + #define FADUMP_CPU_ID_MASK ((1UL << 32) - 1) #define CPU_UNKNOWN (~((u32)0)) @@ -91,6 +97,9 @@ struct fad_crash_memory_ranges { unsigned long long size; }; +/* Platform specific callback functions */ +struct fadump_ops; + /* Firmware-assisted dump configuration details. */ struct fw_dump { unsigned long cpu_state_data_size; @@ -98,6 +107,8 @@ struct fw_dump { unsigned long boot_memory_size; unsigned long reserve_dump_area_start; unsigned long reserve_dump_area_size; + unsigned long meta_area_start; + unsigned long preserv_area_start; /* cmd line option during boot */ unsigned long reserve_bootvar; @@ -105,6 +116,9 @@ struct fw_dump { unsigned long cpu_notes_buf; unsigned long cpu_notes_buf_size; + unsigned long rmr_source_len; + unsigned long rmr_destination_addr; + int ibm_configure_kernel_dump; unsigned long fadump_enabled:1; @@ -112,6 +126,20 @@ struct fw_dump { unsigned long dump_active:1; unsigned long dump_registered:1; unsigned long nocma:1; + + enum fadump_platform_type fadump_platform; + struct fadump_ops *ops; +}; + +struct fadump_ops { + ulong (*init_fadump_mem_struct)(struct fw_dump *fadump_config); + int (*register_fadump)(struct fw_dump *fadump_config); + int (*unregister_fadump)(struct fw_dump *fadump_config); + int (*invalidate_fadump)(struct fw_dump *fadump_config); + int (*process_fadump)(struct fw_dump *fadump_config); + void 
(*fadump_region_show)(struct fw_dump *fadump_config, + struct seq_file *m); + void (*crash_fadump)(const char *msg); }; /* Helper functions */ @@ -123,4 +151,14 @@ void fadump_update_elfcore_header(struct fw_dump *fadump_config, char *bufp); int is_boot_memory_area_contiguous(struct fw_dump *fadump_conf); int is_reserved_memory_area_contiguous(struct fw_dump *fadump_conf); +#ifdef CONFIG_PPC_PSERIES +extern int pseries_dt_scan_fadump(struct fw_dump *fadump_config, ulong node); +#else +static inline int +pseries_dt_scan_fadump(struct fw_dump *fadump_config, ulong node) +{ + return 1; +} +#endif + #endif /* __PPC64_FA_DUMP_INTERNAL_H__ */ diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile index a43ec84..dace1a4 100644 --- a/arch/powerpc/platforms/pseries/Makefile +++ b/arch/powerpc/platforms/pseries/Makefile @@ -25,6 +25,7 @@ obj-$(CONFIG_LPARCFG) += lparcfg.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_IBMEBUS) += ibmebus.o obj-$(CONFIG_PAPR_SCM) += papr_scm.o +obj-$(CONFIG_FA_DUMP) += pseries_fadump.o ifdef CONFIG_PPC_PSERIES obj-$(CONFIG_SUSPEND) += suspend.o diff --git a/arch/powerpc/platforms/pseries/pseries_fadump.c b/arch/powerpc/platforms/pseries/pseries_fadump.c new file mode 100644 index 0000000..5450d2b --- /dev/null +++ b/arch/powerpc/platforms/pseries/pseries_fadump.c @@ -0,0 +1,537 @@ +/* + * Firmware-Assisted Dump support on POWERVM platform. + * + * Copyright 2011, IBM Corporation + * Author: Mahesh Salgaonkar + * + * Copyright 2018-2019, IBM Corp. + * Author: Hari Bathini + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#undef DEBUG +#define pr_fmt(fmt) "pseries fadump: " fmt + +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "../../kernel/fadump_internal.h" +#include "pseries_fadump.h" + +static struct pseries_fadump_mem_struct fdm; +static const struct pseries_fadump_mem_struct *fdm_active; + +static void pseries_set_preserv_area_start(struct fw_dump *fadump_conf) +{ + const struct pseries_fadump_mem_struct *fdm_ptr; + + if (fdm_active) + fdm_ptr = fdm_active; + else + fdm_ptr = &fdm; + + fadump_conf->preserv_area_start = + be64_to_cpu(fdm_ptr->cpu_state_data.destination_address); + + pr_debug("Preserve area start address: 0x%lx\n", + fadump_conf->preserv_area_start); +} + +static void pseries_set_meta_area_start(struct fw_dump *fadump_conf) +{ + fadump_conf->meta_area_start = (fadump_conf->rmr_destination_addr + + fadump_conf->rmr_source_len); + + pr_debug("Meta area start address: 0x%lx\n", + fadump_conf->meta_area_start); +} + +static void update_fadump_config(struct fw_dump *fadump_conf, + const struct pseries_fadump_mem_struct *fdm) +{ + fadump_conf->rmr_destination_addr = + be64_to_cpu(fdm->rmr_region.destination_address); + + if (fadump_conf->dump_active) { + fadump_conf->rmr_source_len = + be64_to_cpu(fdm->rmr_region.source_len); + } + + pseries_set_meta_area_start(fadump_conf); + pseries_set_preserv_area_start(fadump_conf); +} + +static ulong pseries_init_fadump_mem_struct(struct fw_dump *fadump_conf) +{ + ulong addr = fadump_conf->reserve_dump_area_start; + + memset(&fdm, 0, sizeof(struct pseries_fadump_mem_struct)); + addr = addr & PAGE_MASK; + + fdm.header.dump_format_version = cpu_to_be32(0x00000001); + fdm.header.dump_num_sections = cpu_to_be16(3); + fdm.header.dump_status_flag = 0; + fdm.header.offset_first_dump_section = + cpu_to_be32((u32)offsetof(struct pseries_fadump_mem_struct, + cpu_state_data)); + + /* + * Fields for disk dump option. 
+ * We are not using disk dump option, hence set these fields to 0. + */ + fdm.header.dd_block_size = 0; + fdm.header.dd_block_offset = 0; + fdm.header.dd_num_blocks = 0; + fdm.header.dd_offset_disk_path = 0; + + /* set 0 to disable an automatic dump-reboot. */ + fdm.header.max_time_auto = 0; + + /* Kernel dump sections */ + /* cpu state data section. */ + fdm.cpu_state_data.request_flag = + cpu_to_be32(PSERIES_FADUMP_REQUEST_FLAG); + fdm.cpu_state_data.source_data_type = + cpu_to_be16(PSERIES_FADUMP_CPU_STATE_DATA); + fdm.cpu_state_data.source_address = 0; + fdm.cpu_state_data.source_len = + cpu_to_be64(fadump_conf->cpu_state_data_size); + fdm.cpu_state_data.destination_address = cpu_to_be64(addr); + addr += fadump_conf->cpu_state_data_size; + + /* hpte region section */ + fdm.hpte_region.request_flag = cpu_to_be32(PSERIES_FADUMP_REQUEST_FLAG); + fdm.hpte_region.source_data_type = + cpu_to_be16(PSERIES_FADUMP_HPTE_REGION); + fdm.hpte_region.source_address = 0; + fdm.hpte_region.source_len = + cpu_to_be64(fadump_conf->hpte_region_size); + fdm.hpte_region.destination_address = cpu_to_be64(addr); + addr += fadump_conf->hpte_region_size; + + /* RMA region section */ + fdm.rmr_region.request_flag = cpu_to_be32(PSERIES_FADUMP_REQUEST_FLAG); + fdm.rmr_region.source_data_type = + cpu_to_be16(PSERIES_FADUMP_REAL_MODE_REGION); + fdm.rmr_region.source_address = cpu_to_be64(RMA_START); + fdm.rmr_region.source_len = + cpu_to_be64(fadump_conf->boot_memory_size); + fdm.rmr_region.destination_address = cpu_to_be64(addr); + addr += fadump_conf->boot_memory_size; + + update_fadump_config(fadump_conf, &fdm); + + return addr; +} + +static int pseries_register_fadump(struct fw_dump *fadump_conf) +{ + int rc, err; + unsigned int wait_time; + + /* TODO: Add upper time limit for the delay */ + do { + rc = rtas_call(fadump_conf->ibm_configure_kernel_dump, 3, 1, + NULL, FADUMP_REGISTER, &fdm, + sizeof(struct pseries_fadump_mem_struct)); + + wait_time = rtas_busy_delay_time(rc); + if 
(wait_time) + mdelay(wait_time); + + } while (wait_time); + + err = -EIO; + switch (rc) { + default: + pr_err("Failed to register. Unknown Error(%d).\n", rc); + break; + case -1: + pr_err("Failed to register. Hardware Error(%d).\n", rc); + break; + case -3: + if (!is_boot_memory_area_contiguous(fadump_conf)) + pr_err("Can't hot-remove boot memory area.\n"); + else if (!is_reserved_memory_area_contiguous(fadump_conf)) + pr_err("Can't hot-remove reserved memory area.\n"); + + pr_err("Failed to register. Parameter Error(%d).\n", rc); + err = -EINVAL; + break; + case -9: + pr_err("Already registered!\n"); + fadump_conf->dump_registered = 1; + err = -EEXIST; + break; + case 0: + pr_err("Registration is successful!\n"); + fadump_conf->dump_registered = 1; + err = 0; + break; + } + + return err; +} + +static int pseries_unregister_fadump(struct fw_dump *fadump_conf) +{ + int rc = 0; + unsigned int wait_time; + + /* TODO: Add upper time limit for the delay */ + do { + rc = rtas_call(fadump_conf->ibm_configure_kernel_dump, 3, 1, + NULL, FADUMP_UNREGISTER, &fdm, + sizeof(struct pseries_fadump_mem_struct)); + + wait_time = rtas_busy_delay_time(rc); + if (wait_time) + mdelay(wait_time); + } while (wait_time); + + if (rc) { + pr_err("Failed to un-register - unexpected error(%d).\n", rc); + return rc; + } + + fadump_conf->dump_registered = 0; + return 0; +} + +static int pseries_invalidate_fadump(struct fw_dump *fadump_conf) +{ + int rc = 0; + unsigned int wait_time; + + /* TODO: Add upper time limit for the delay */ + do { + rc = rtas_call(fadump_conf->ibm_configure_kernel_dump, 3, 1, + NULL, FADUMP_INVALIDATE, fdm_active, + sizeof(struct pseries_fadump_mem_struct)); + + wait_time = rtas_busy_delay_time(rc); + if (wait_time) + mdelay(wait_time); + } while (wait_time); + + if (rc) { + pr_err("Failed to invalidate - unexpected error (%d).\n", rc); + return rc; + } + + fadump_conf->dump_active = 0; + fdm_active = NULL; + return 0; +} + +static struct fadump_reg_entry* 
+fadump_read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs) +{ + memset(regs, 0, sizeof(struct pt_regs)); + + while (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUEND")) { + fadump_set_regval(regs, be64_to_cpu(reg_entry->reg_id), + be64_to_cpu(reg_entry->reg_value)); + reg_entry++; + } + reg_entry++; + return reg_entry; +} + +/* + * Read CPU state dump data and convert it into ELF notes. + * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be + * used to access the data to allow for additional fields to be added without + * affecting compatibility. Each list of registers for a CPU starts with + * "CPUSTRT" and ends with "CPUEND". Each register entry is of 16 bytes, + * 8 Byte ASCII identifier and 8 Byte register value. The register entry + * with identifier "CPUSTRT" and "CPUEND" contains 4 byte cpu id as part + * of register value. For more details refer to PAPR document. + * + * Only for the crashing cpu we ignore the CPU dump data and get exact + * state from fadump crash info structure populated by first kernel at the + * time of crash. 
+ */ +static int __init fadump_build_cpu_notes(struct fw_dump *fadump_conf) +{ + struct fadump_reg_save_area_header *reg_header; + struct fadump_reg_entry *reg_entry; + struct fadump_crash_info_header *fdh = NULL; + void *vaddr; + unsigned long addr; + u32 num_cpus, *note_buf; + struct pt_regs regs; + int i, rc = 0, cpu = 0; + + addr = be64_to_cpu(fdm_active->cpu_state_data.destination_address); + vaddr = __va(addr); + + reg_header = vaddr; + if (be64_to_cpu(reg_header->magic_number) != REGSAVE_AREA_MAGIC) { + pr_err("Unable to read register save area.\n"); + return -ENOENT; + } + + pr_debug("--------CPU State Data------------\n"); + pr_debug("Magic Number: %llx\n", be64_to_cpu(reg_header->magic_number)); + pr_debug("NumCpuOffset: %x\n", be32_to_cpu(reg_header->num_cpu_offset)); + + vaddr += be32_to_cpu(reg_header->num_cpu_offset); + num_cpus = be32_to_cpu(*((__be32 *)(vaddr))); + pr_debug("NumCpus : %u\n", num_cpus); + vaddr += sizeof(u32); + reg_entry = (struct fadump_reg_entry *)vaddr; + + /* Allocate buffer to hold cpu crash notes. 
*/ + fadump_conf->cpu_notes_buf_size = num_cpus * sizeof(note_buf_t); + fadump_conf->cpu_notes_buf_size = + PAGE_ALIGN(fadump_conf->cpu_notes_buf_size); + note_buf = fadump_cpu_notes_buf_alloc(fadump_conf->cpu_notes_buf_size); + if (!note_buf) { + pr_err("Failed to allocate 0x%lx bytes for cpu notes buffer\n", + fadump_conf->cpu_notes_buf_size); + return -ENOMEM; + } + fadump_conf->cpu_notes_buf = __pa(note_buf); + + pr_debug("Allocated buffer for cpu notes of size %ld at %p\n", + (num_cpus * sizeof(note_buf_t)), note_buf); + + if (fadump_conf->fadumphdr_addr) + fdh = __va(fadump_conf->fadumphdr_addr); + + for (i = 0; i < num_cpus; i++) { + if (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUSTRT")) { + pr_err("Unable to read CPU state data\n"); + rc = -ENOENT; + goto error_out; + } + /* Lower 4 bytes of reg_value contains logical cpu id */ + cpu = be64_to_cpu(reg_entry->reg_value) & FADUMP_CPU_ID_MASK; + if (fdh && !cpumask_test_cpu(cpu, &fdh->online_mask)) { + SKIP_TO_NEXT_CPU(reg_entry); + continue; + } + pr_debug("Reading register data for cpu %d...\n", cpu); + if (fdh && fdh->crashing_cpu == cpu) { + regs = fdh->regs; + note_buf = fadump_regs_to_elf_notes(note_buf, &regs); + SKIP_TO_NEXT_CPU(reg_entry); + } else { + reg_entry++; + reg_entry = fadump_read_registers(reg_entry, &regs); + note_buf = fadump_regs_to_elf_notes(note_buf, &regs); + } + } + final_note(note_buf); + + if (fdh) { + pr_debug("Updating elfcore header (%llx) with cpu notes\n", + fdh->elfcorehdr_addr); + fadump_update_elfcore_header(fadump_conf, + __va(fdh->elfcorehdr_addr)); + } + return 0; + +error_out: + fadump_cpu_notes_buf_free((ulong)__va(fadump_conf->cpu_notes_buf), + fadump_conf->cpu_notes_buf_size); + fadump_conf->cpu_notes_buf = 0; + fadump_conf->cpu_notes_buf_size = 0; + return rc; + +} + +/* + * Validate and process the dump data stored by firmware before exporting + * it through '/proc/vmcore'. 
+ */ +static int __init pseries_process_fadump(struct fw_dump *fadump_conf) +{ + struct fadump_crash_info_header *fdh; + int rc = 0; + + if (!fdm_active || !fadump_conf->fadumphdr_addr) + return -EINVAL; + + /* Check if the dump data is valid. */ + if ((be16_to_cpu(fdm_active->header.dump_status_flag) == + PSERIES_FADUMP_ERROR_FLAG) || + (fdm_active->cpu_state_data.error_flags != 0) || + (fdm_active->rmr_region.error_flags != 0)) { + pr_err("Dump taken by platform is not valid\n"); + return -EINVAL; + } + if ((fdm_active->rmr_region.bytes_dumped != + fdm_active->rmr_region.source_len) || + !fdm_active->cpu_state_data.bytes_dumped) { + pr_err("Dump taken by platform is incomplete\n"); + return -EINVAL; + } + + /* Validate the fadump crash info header */ + fdh = __va(fadump_conf->fadumphdr_addr); + if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { + pr_err("Crash info header is not valid.\n"); + return -EINVAL; + } + + if (!fdm_active->cpu_state_data.bytes_dumped) + return -EINVAL; + + rc = fadump_build_cpu_notes(fadump_conf); + if (rc) + return rc; + + /* + * We are done validating dump info and elfcore header is now ready + * to be exported. set elfcorehdr_addr so that vmcore module will + * export the elfcore header through '/proc/vmcore'. 
+ */ + elfcorehdr_addr = fdh->elfcorehdr_addr; + + return 0; +} + +static void pseries_fadump_region_show(struct fw_dump *fadump_conf, + struct seq_file *m) +{ + const struct pseries_fadump_mem_struct *fdm_ptr; + const struct pseries_fadump_section *cpu_data_section; + + if (fdm_active) + fdm_ptr = fdm_active; + else + fdm_ptr = &fdm; + + cpu_data_section = &(fdm_ptr->cpu_state_data); + seq_printf(m, + "CPU :[%#016llx-%#016llx] %#llx bytes, Dumped: %#llx\n", + be64_to_cpu(cpu_data_section->destination_address), + be64_to_cpu(cpu_data_section->destination_address) + + be64_to_cpu(cpu_data_section->source_len) - 1, + be64_to_cpu(cpu_data_section->source_len), + be64_to_cpu(cpu_data_section->bytes_dumped)); + seq_printf(m, + "HPTE:[%#016llx-%#016llx] %#llx bytes, Dumped: %#llx\n", + be64_to_cpu(fdm_ptr->hpte_region.destination_address), + be64_to_cpu(fdm_ptr->hpte_region.destination_address) + + be64_to_cpu(fdm_ptr->hpte_region.source_len) - 1, + be64_to_cpu(fdm_ptr->hpte_region.source_len), + be64_to_cpu(fdm_ptr->hpte_region.bytes_dumped)); + seq_printf(m, + "DUMP:[%#016llx-%#016llx] %#llx bytes, Dumped: %#llx\n", + be64_to_cpu(fdm_ptr->rmr_region.destination_address), + be64_to_cpu(fdm_ptr->rmr_region.destination_address) + + be64_to_cpu(fdm_ptr->rmr_region.source_len) - 1, + be64_to_cpu(fdm_ptr->rmr_region.source_len), + be64_to_cpu(fdm_ptr->rmr_region.bytes_dumped)); + + if (!fdm_active || + (fadump_conf->reserve_dump_area_start == + be64_to_cpu(cpu_data_section->destination_address))) + return; + + /* Dump is active. Show reserved memory region. 
*/ + seq_printf(m, + " :[%#016lx-%#016llx] %#llx bytes, Dumped: %#llx\n", + fadump_conf->reserve_dump_area_start, + be64_to_cpu(cpu_data_section->destination_address) - 1, + be64_to_cpu(cpu_data_section->destination_address) - + fadump_conf->reserve_dump_area_start, + be64_to_cpu(cpu_data_section->destination_address) - + fadump_conf->reserve_dump_area_start); +} + +static void pseries_crash_fadump(const char *msg) +{ + /* Call ibm,os-term rtas call to trigger firmware assisted dump */ + rtas_os_term((char *)msg); +} + + +static struct fadump_ops pseries_fadump_ops = { + .init_fadump_mem_struct = pseries_init_fadump_mem_struct, + .register_fadump = pseries_register_fadump, + .unregister_fadump = pseries_unregister_fadump, + .invalidate_fadump = pseries_invalidate_fadump, + .process_fadump = pseries_process_fadump, + .fadump_region_show = pseries_fadump_region_show, + .crash_fadump = pseries_crash_fadump, +}; + +int __init pseries_dt_scan_fadump(struct fw_dump *fadump_conf, ulong node) +{ + const __be32 *sections; + int i, num_sections; + int size; + const __be32 *token; + + /* + * Check if Firmware Assisted dump is supported. if yes, check + * if dump has been initiated on last reboot. + */ + token = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL); + if (!token) + return 1; + + fadump_conf->ibm_configure_kernel_dump = be32_to_cpu(*token); + + /* + * The 'ibm,kernel-dump' rtas node is present only if there is + * dump data waiting for us. + */ + fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL); + if (fdm_active) { + pr_info("Firmware-assisted dump is active.\n"); + fadump_conf->dump_active = 1; + update_fadump_config(fadump_conf, fdm_active); + } + + /* Get the sizes required to store dump data for the firmware provided + * dump sections. + * For each dump section type supported, a 32bit cell which defines + * the ID of a supported section followed by two 32 bit cells which + * gives the size of the section in bytes. 
+ */ + sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes", + &size); + + if (!sections) + return 1; + + num_sections = size / (3 * sizeof(u32)); + + for (i = 0; i < num_sections; i++, sections += 3) { + u32 type = (u32)of_read_number(sections, 1); + + switch (type) { + case PSERIES_FADUMP_CPU_STATE_DATA: + fadump_conf->cpu_state_data_size = + of_read_ulong(&sections[1], 2); + break; + case PSERIES_FADUMP_HPTE_REGION: + fadump_conf->hpte_region_size = + of_read_ulong(&sections[1], 2); + break; + } + } + + fadump_conf->ops = &pseries_fadump_ops; + fadump_conf->fadump_platform = FADUMP_PLATFORM_PSERIES; + fadump_conf->fadump_supported = 1; + + return 1; +} diff --git a/arch/powerpc/platforms/pseries/pseries_fadump.h b/arch/powerpc/platforms/pseries/pseries_fadump.h new file mode 100644 index 0000000..d61e5d9 --- /dev/null +++ b/arch/powerpc/platforms/pseries/pseries_fadump.h @@ -0,0 +1,96 @@ +/* + * Firmware-Assisted Dump support on POWERVM platform. + * + * Copyright 2018-2019, IBM Corp. + * Author: Hari Bathini + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#ifndef __PPC64_PSERIES_FA_DUMP_H__ +#define __PPC64_PSERIES_FA_DUMP_H__ + +/* Firmware provided dump sections */ +#define PSERIES_FADUMP_CPU_STATE_DATA 0x0001 +#define PSERIES_FADUMP_HPTE_REGION 0x0002 +#define PSERIES_FADUMP_REAL_MODE_REGION 0x0011 + +/* Dump request flag */ +#define PSERIES_FADUMP_REQUEST_FLAG 0x00000001 + +/* Dump status flag */ +#define PSERIES_FADUMP_ERROR_FLAG 0x2000 + +/* Utility macros */ +#define SKIP_TO_NEXT_CPU(reg_entry) \ +({ \ + while (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUEND")) \ + reg_entry++; \ + reg_entry++; \ +}) + +/* Kernel Dump section info */ +struct pseries_fadump_section { + __be32 request_flag; + __be16 source_data_type; + __be16 error_flags; + __be64 source_address; + __be64 source_len; + __be64 bytes_dumped; + __be64 destination_address; +}; + +/* ibm,configure-kernel-dump header. */ +struct pseries_fadump_section_header { + __be32 dump_format_version; + __be16 dump_num_sections; + __be16 dump_status_flag; + __be32 offset_first_dump_section; + + /* Fields for disk dump option. */ + __be32 dd_block_size; + __be64 dd_block_offset; + __be64 dd_num_blocks; + __be32 dd_offset_disk_path; + + /* Maximum time allowed to prevent an automatic dump-reboot. */ + __be32 max_time_auto; +}; + +/* + * Firmware Assisted dump memory structure. This structure is required for + * registering future kernel dump with power firmware through rtas call. + * + * No disk dump option. Hence disk dump path string section is not included. + */ +struct pseries_fadump_mem_struct { + struct pseries_fadump_section_header header; + + /* Kernel dump sections */ + struct pseries_fadump_section cpu_state_data; + struct pseries_fadump_section hpte_region; + struct pseries_fadump_section rmr_region; +}; + +#define REGSAVE_AREA_MAGIC STR_TO_HEX("REGSAVE") + +/* The firmware-assisted dump format. 
+ * + * The register save area is an area in the partition's memory used to preserve + * the register contents (CPU state data) for the active CPUs during a firmware + * assisted dump. The dump format contains register save area header followed + * by register entries. On pseries, each list of registers for a CPU starts with + * "CPUSTRT" and ends with "CPUEND". + */ + +/* Register save area header. */ +struct fadump_reg_save_area_header { + __be64 magic_number; + __be32 version; + __be32 num_cpu_offset; +}; + +#endif /* __PPC64_PSERIES_FA_DUMP_H__ */ From patchwork Thu Dec 20 19:00:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Hari Bathini X-Patchwork-Id: 1016976 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLvS4g6Kz9sC7 for ; Fri, 21 Dec 2018 06:11:00 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 43LLvR1qQGzDqXb for ; Fri, 21 Dec 2018 06:10:59 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 43LLgn438czDr5k for ; Fri, 21 Dec 2018 06:00:53 +1100 (AEDT) Authentication-Results: 
lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) by bilbo.ozlabs.org (Postfix) with ESMTP id 43LLgn2LH9z8tRf for ; Fri, 21 Dec 2018 06:00:53 +1100 (AEDT) Received: by ozlabs.org (Postfix) id 43LLgn1yJMz9sCh; Fri, 21 Dec 2018 06:00:53 +1100 (AEDT) Delivered-To: linuxppc-dev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.158.5; helo=mx0a-001b2d01.pphosted.com; envelope-from=hbathini@linux.ibm.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLgm3t94z9sCQ for ; Fri, 21 Dec 2018 06:00:52 +1100 (AEDT) Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id wBKIwkjf045000 for ; Thu, 20 Dec 2018 14:00:50 -0500 Received: from e06smtp05.uk.ibm.com (e06smtp05.uk.ibm.com [195.75.94.101]) by mx0a-001b2d01.pphosted.com with ESMTP id 2pgg5phhbr-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 20 Dec 2018 14:00:49 -0500 Received: from localhost by e06smtp05.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 20 Dec 2018 19:00:48 -0000 Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp05.uk.ibm.com (192.168.101.135) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 20 Dec 2018 19:00:46 -0000 Received: from d06av24.portsmouth.uk.ibm.com (mk.ibm.com [9.149.105.60]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id wBKJ0ipO36831248 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 20 Dec 2018 19:00:44 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8CC794203F; Thu, 20 Dec 2018 19:00:44 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 18F0A42045; Thu, 20 Dec 2018 19:00:43 +0000 (GMT) Received: from hbathini.in.ibm.com (unknown [9.199.47.3]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 20 Dec 2018 19:00:42 +0000 (GMT) Subject: [PATCH 4/9] powerpc/fadump: enable fadump support on OPAL based POWER platform From: Hari Bathini To: Ananth N Mavinakayanahalli , Michael Ellerman , Mahesh J Salgaonkar , Vasant Hegde , linuxppc-dev , Stewart Smith Date: Fri, 21 Dec 2018 00:30:41 +0530 In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 18122019-0020-0000-0000-000002FA715D X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18122019-0021-0000-0000-0000214A84EB Message-Id: <154533244145.28973.4771785304842251623.stgit@hbathini.in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-12-20_09:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 
definitions=main-1812200154 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Hari Bathini Firmware-assisted dump support is enabled for OPAL based POWER platforms in P9 firmware. Make the corresponding updates in kernel to enable fadump support for such platforms. Signed-off-by: Hari Bathini --- arch/powerpc/Kconfig | 5 arch/powerpc/include/asm/opal-api.h | 35 ++ arch/powerpc/include/asm/opal.h | 1 arch/powerpc/kernel/fadump.c | 243 +++++++++++---- arch/powerpc/kernel/fadump_internal.c | 27 +- arch/powerpc/kernel/fadump_internal.h | 44 ++- arch/powerpc/platforms/powernv/Makefile | 1 arch/powerpc/platforms/powernv/opal-fadump.c | 373 +++++++++++++++++++++++ arch/powerpc/platforms/powernv/opal-fadump.h | 40 ++ arch/powerpc/platforms/powernv/opal-wrappers.S | 1 arch/powerpc/platforms/pseries/pseries_fadump.c | 18 - 11 files changed, 705 insertions(+), 83 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/opal-fadump.c create mode 100644 arch/powerpc/platforms/powernv/opal-fadump.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8be3126..08add7a 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -565,7 +565,7 @@ config CRASH_DUMP config FA_DUMP bool "Firmware-assisted dump" - depends on PPC64 && PPC_RTAS + depends on PPC64 && (PPC_RTAS || PPC_POWERNV) select CRASH_CORE select CRASH_DUMP help @@ -576,7 +576,8 @@ config FA_DUMP is meant to be a kdump replacement offering robustness and speed not possible without system firmware assistance. - If unsure, say "N" + If unsure, say "y". Only special kernels like petitboot may + need to say "N" here. 
config IRQ_ALL_CPUS bool "Distribute interrupts on all CPUs by default" diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 870fb7b..6076e51 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -210,7 +210,8 @@ #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165 #define OPAL_NX_COPROC_INIT 167 -#define OPAL_LAST 167 +#define OPAL_CONFIGURE_FADUMP 170 +#define OPAL_LAST 170 #define QUIESCE_HOLD 1 /* Spin all calls at entry */ #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */ @@ -972,6 +973,37 @@ struct opal_sg_list { }; /* + * Firmware-Assisted Dump (FADump) + */ + +/* The maximum number of dump sections supported by OPAL */ +#define OPAL_FADUMP_NR_SECTIONS 64 + +/* Kernel Dump section info */ +struct opal_fadump_section { + u8 src_type; + u8 reserved[7]; + __be64 src_addr; + __be64 src_size; + __be64 dest_addr; + __be64 dest_size; +}; + +/* + * FADump memory structure for registering dump support with + * POWER f/w through opal call. 
+ */ +struct opal_fadump_mem_struct { + + __be16 section_size; /*sizeof(struct fadump_section) */ + __be16 section_count; /* number of sections */ + __be32 crashing_cpu; /* Thread on which OPAL crashed */ + __be64 reserved; + + struct opal_fadump_section section[OPAL_FADUMP_NR_SECTIONS]; +}; + +/* * Dump region ID range usable by the OS */ #define OPAL_DUMP_REGION_HOST_START 0x80 @@ -1051,6 +1083,7 @@ enum { OPAL_REBOOT_NORMAL = 0, OPAL_REBOOT_PLATFORM_ERROR = 1, OPAL_REBOOT_FULL_IPL = 2, + OPAL_REBOOT_OS_ERROR = 3, }; /* Argument to OPAL_PCI_TCE_KILL */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index ff38664..08cc09f 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -43,6 +43,7 @@ int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn, uint64_t PE_handle); int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap, uint64_t rate_phys, uint32_t size); +int64_t opal_configure_fadump(uint64_t command, void *data, uint64_t data_size); int64_t opal_console_write(int64_t term_number, __be64 *length, const uint8_t *buffer); int64_t opal_console_read(int64_t term_number, __be64 *length, diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 36d9d48..190f7ed 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -46,12 +46,13 @@ #include "fadump_internal.h" static struct fw_dump fw_dump; + #ifdef CONFIG_CMA static struct cma *fadump_cma; #endif static DEFINE_MUTEX(fadump_mutex); -struct fad_crash_memory_ranges *crash_memory_ranges; +struct fadump_memory_range *crash_memory_ranges; int crash_memory_ranges_size; int crash_mem_ranges; int max_crash_mem_ranges; @@ -127,6 +128,9 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, if (strcmp(uname, "rtas") == 0) return pseries_dt_scan_fadump(&fw_dump, node); + if (strcmp(uname, "ibm,opal") == 0) + return opal_dt_scan_fadump(&fw_dump, node); + return 0; } @@ -138,6 
+142,7 @@ int is_fadump_memory_area(u64 addr, ulong size) { u64 d_start = fw_dump.reserve_dump_area_start; u64 d_end = d_start + fw_dump.reserve_dump_area_size; + u64 b_end = fw_dump.boot_memory_size + fw_dump.boot_memory_hole_size; if (!fw_dump.dump_registered) return 0; @@ -145,7 +150,7 @@ int is_fadump_memory_area(u64 addr, ulong size) if (((addr + size) > d_start) && (addr <= d_end)) return 1; - return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size; + return (((addr + size) > RMA_START) && (addr <= b_end)); } int should_fadump_crash(void) @@ -163,6 +168,8 @@ int is_fadump_active(void) /* Print firmware assisted dump configurations for debugging purpose. */ static void fadump_show_config(void) { + int i; + pr_debug("Support for firmware-assisted dump (fadump): %s\n", (fw_dump.fadump_supported ? "present" : "no support")); @@ -177,6 +184,13 @@ static void fadump_show_config(void) pr_debug(" CPU state data size: %lx\n", fw_dump.cpu_state_data_size); pr_debug(" HPTE region size : %lx\n", fw_dump.hpte_region_size); pr_debug("Boot memory size : %lx\n", fw_dump.boot_memory_size); + pr_debug("Real memory region hole size : %lx\n", + fw_dump.boot_memory_hole_size); + pr_debug("Real memory regions count : %lx\n", fw_dump.rmr_regions_cnt); + for (i = 0; i < fw_dump.rmr_regions_cnt; i++) { + pr_debug("%d. 
RMR base = %lx, size = %lx\n", (i+1), + fw_dump.rmr_src_addr[i], fw_dump.rmr_src_size[i]); + } } /** @@ -271,23 +285,108 @@ static unsigned long get_fadump_area_size(void) return size; } -static void __init fadump_reserve_crash_area(unsigned long base, - unsigned long size) +static int __init add_rmr_region(unsigned long rmr_start, + unsigned long rmr_size) { + int i = fw_dump.rmr_regions_cnt++; + + if (fw_dump.rmr_regions_cnt > MAX_REAL_MEM_REGIONS) + return 0; + + pr_debug("Added real memory range[%d] [%#016lx-%#016lx)\n", + i, rmr_start, (rmr_start + rmr_size)); + fw_dump.rmr_src_addr[i] = rmr_start; + fw_dump.rmr_src_size[i] = rmr_size; + return 1; +} + +/* + * Platforms like PowerNV have an upper limit on the size. + * If 'rmr_size' is bigger than that limit, split this memory range + * into multiple entries. + */ +static int __init add_rmr_regions(unsigned long rmr_start, + unsigned long rmr_size) +{ + unsigned long rstart, rsize, max_size; + int ret = 1; + + rstart = rmr_start; + max_size = fw_dump.max_copy_size ? 
fw_dump.max_copy_size : rmr_size; + while (rmr_size) { + if (rmr_size > max_size) + rsize = max_size; + else + rsize = rmr_size; + + ret = add_rmr_region(rstart, rsize); + if (!ret) + break; + + rmr_size -= rsize; + rstart += rsize; + } + + return ret; +} + +static int __init fadump_get_rmr_regions(void) +{ + int ret = 1; struct memblock_region *reg; - unsigned long mstart, mend, msize; + unsigned long base, size, cur_size, last_end; + unsigned long mem_size = fw_dump.boot_memory_size; + fw_dump.rmr_regions_cnt = 0; + fw_dump.boot_memory_hole_size = 0; + + last_end = 0; + cur_size = 0; for_each_memblock(memory, reg) { - mstart = max_t(unsigned long, base, reg->base); - mend = reg->base + reg->size; - mend = min(base + size, mend); - - if (mstart < mend) { - msize = mend - mstart; - memblock_reserve(mstart, msize); - pr_info("Reserved %ldMB of memory at %#016lx for saving crash dump\n", - (msize >> 20), mstart); + base = reg->base; + size = reg->size; + + if (base > last_end) + fw_dump.boot_memory_hole_size += (base - last_end); + + if ((cur_size + size) >= mem_size) { + size = (mem_size - cur_size); + ret = add_rmr_regions(base, size); + break; } + + mem_size -= size; + cur_size += size; + ret = add_rmr_regions(base, size); + if (!ret) + break; + + last_end = base + size; + } + + return ret; +} + +/* Preserve everything above the base address */ +static void __init fadump_reserve_crash_area(unsigned long base) +{ + struct memblock_region *reg; + unsigned long mstart, msize; + + for_each_memblock(memory, reg) { + mstart = reg->base; + msize = reg->size; + + if ((mstart + msize) < base) + continue; + + if (mstart < base) { + msize -= (base - mstart); + mstart = base; + } + pr_info("Reserving %luMB of memory at %#016lx for saving crash dump", + (msize >> 20), mstart); + memblock_reserve(mstart, msize); } } @@ -304,6 +403,7 @@ int __init fadump_reserve_mem(void) fw_dump.fadump_enabled = 0; return 0; } + /* * Initialize boot memory size * If dump is active then we have 
already calculated the size during @@ -320,9 +420,15 @@ int __init fadump_reserve_mem(void) FADUMP_CMA_ALIGNMENT); #endif fw_dump.rmr_source_len = fw_dump.boot_memory_size; + if (!fadump_get_rmr_regions()) { + fw_dump.fadump_enabled = 0; + pr_err("Too many holes in boot memory area to enable fadump\n"); + return 0; + } } size = get_fadump_area_size(); + fw_dump.reserve_dump_area_size = size; /* * Calculate the memory boundary. @@ -343,6 +449,8 @@ int __init fadump_reserve_mem(void) else memory_boundary = memblock_end_of_DRAM(); + base = fw_dump.boot_memory_size + fw_dump.boot_memory_hole_size; + base = PAGE_ALIGN(base); if (fw_dump.dump_active) { #ifdef CONFIG_HUGETLB_PAGE /* @@ -354,18 +462,15 @@ int __init fadump_reserve_mem(void) #endif /* * If last boot has crashed then reserve all the memory - * above boot_memory_size so that we don't touch it until + * above boot memory size so that we don't touch it until * dump is written to disk by userspace tool. This memory - * will be released for general use once the dump is saved. + * can be released for general use by invalidating fadump. */ - base = fw_dump.boot_memory_size; - size = memory_boundary - base; - fadump_reserve_crash_area(base, size); + fadump_reserve_crash_area(base); fw_dump.fadumphdr_addr = fw_dump.meta_area_start; pr_debug("fadumphdr_addr = %#016lx\n", fw_dump.fadumphdr_addr); fw_dump.reserve_dump_area_start = base; - fw_dump.reserve_dump_area_size = size; } else { /* * Reserve memory at an offset closer to bottom of the RAM to @@ -373,27 +478,25 @@ int __init fadump_reserve_mem(void) * use memblock_find_in_range() here since it doesn't allocate * from bottom to top. 
*/ - for (base = fw_dump.boot_memory_size; - base <= (memory_boundary - size); - base += size) { + while (base <= (memory_boundary - size)) { if (memblock_is_region_memory(base, size) && !memblock_is_region_reserved(base, size)) break; + + base += size; } + if ((base > (memory_boundary - size)) || memblock_reserve(base, size)) { pr_err("Failed to reserve memory\n"); return 0; } - pr_info("Reserved %ldMB of memory at %ldMB for firmware-" - "assisted dump (System RAM: %ldMB)\n", - (unsigned long)(size >> 20), - (unsigned long)(base >> 20), + pr_info("Reserved %ldMB of memory at %#016lx (System RAM: %ldMB)\n", + (unsigned long)(size >> 20), base, (unsigned long)(memblock_phys_mem_size() >> 20)); fw_dump.reserve_dump_area_start = base; - fw_dump.reserve_dump_area_size = size; return fadump_cma_init(); } return 1; @@ -495,7 +598,7 @@ static void free_crash_memory_ranges(void) */ static int allocate_crash_memory_ranges(void) { - struct fad_crash_memory_ranges *new_array; + struct fadump_memory_range *new_array; u64 new_size; new_size = crash_memory_ranges_size + PAGE_SIZE; @@ -512,7 +615,7 @@ static int allocate_crash_memory_ranges(void) crash_memory_ranges = new_array; crash_memory_ranges_size = new_size; max_crash_mem_ranges = (new_size / - sizeof(struct fad_crash_memory_ranges)); + sizeof(struct fadump_memory_range)); return 0; } @@ -624,36 +727,40 @@ static int fadump_init_elfcore_header(char *bufp) static int fadump_setup_crash_memory_ranges(void) { struct memblock_region *reg; - unsigned long long start, end; - int ret; + unsigned long long start, end, offset; + int i, ret; pr_debug("Setup crash memory ranges.\n"); crash_mem_ranges = 0; + offset = fw_dump.boot_memory_size + fw_dump.boot_memory_hole_size; /* - * add the first memory chunk (RMA_START through boot_memory_size) as - * a separate memory chunk. The reason is, at the time crash firmware - * will move the content of this memory chunk to different location - * specified during fadump registration. 
We need to create a separate - * program header for this chunk with the correct offset. + * Add real memory region(s) whose content is going to be moved to + * a different location, specified during fadump registration, by + * firmware at the time of crash. We need to create separate program + * header(s) for this memory chunk with the correct offset. */ - ret = fadump_add_crash_memory(RMA_START, fw_dump.boot_memory_size); - if (ret) - return ret; + for (i = 0; i < fw_dump.rmr_regions_cnt; i++) { + start = fw_dump.rmr_src_addr[i]; + end = start + fw_dump.rmr_src_size[i]; + ret = fadump_add_crash_memory(start, end); + if (ret) + return ret; + } for_each_memblock(memory, reg) { start = (unsigned long long)reg->base; end = start + (unsigned long long)reg->size; /* - * skip the first memory chunk that is already added (RMA_START + * Skip the first memory chunk that is already added (RMA_START * through boot_memory_size). This logic needs a relook if and * when RMA_START changes to a non-zero value. 
*/ BUILD_BUG_ON(RMA_START != 0); - if (start < fw_dump.boot_memory_size) { - if (end > fw_dump.boot_memory_size) - start = fw_dump.boot_memory_size; + if (start < offset) { + if (end > offset) + start = offset; else continue; } @@ -674,17 +781,32 @@ static int fadump_setup_crash_memory_ranges(void) */ static inline unsigned long fadump_relocate(unsigned long paddr) { - if (paddr > RMA_START && paddr < fw_dump.boot_memory_size) - return fw_dump.rmr_destination_addr + paddr; - else - return paddr; + unsigned long raddr, rstart, rend, offset; + int i; + + offset = 0; + raddr = paddr; + for (i = 0; i < fw_dump.rmr_regions_cnt; i++) { + rstart = fw_dump.rmr_src_addr[i]; + rend = rstart + fw_dump.rmr_src_size[i]; + + if (paddr > rstart && paddr < rend) { + raddr += fw_dump.rmr_destination_addr + offset; + break; + } + + offset += fw_dump.rmr_src_size[i]; + } + + return raddr; } static int fadump_create_elfcore_headers(char *bufp) { struct elfhdr *elf; struct elf_phdr *phdr; - int i; + unsigned long long raddr, offset; + int i, j; fadump_init_elfcore_header(bufp); elf = (struct elfhdr *)bufp; @@ -727,9 +849,12 @@ static int fadump_create_elfcore_headers(char *bufp) (elf->e_phnum)++; /* setup PT_LOAD sections. */ - + j = 0; + offset = 0; + raddr = fw_dump.rmr_src_addr[0]; for (i = 0; i < crash_mem_ranges; i++) { unsigned long long mbase, msize; + mbase = crash_memory_ranges[i].base; msize = crash_memory_ranges[i].size; @@ -742,13 +867,17 @@ static int fadump_create_elfcore_headers(char *bufp) phdr->p_flags = PF_R|PF_W|PF_X; phdr->p_offset = mbase; - if (mbase == RMA_START) { + if (mbase == raddr) { /* * The entire RMA region will be moved by firmware * to the specified destination_address. Hence set * the correct offset. 
*/ - phdr->p_offset = fw_dump.rmr_destination_addr; + phdr->p_offset = fw_dump.rmr_destination_addr + offset; + if (j < (fw_dump.rmr_regions_cnt - 1)) { + offset += fw_dump.rmr_src_size[j]; + raddr = fw_dump.rmr_src_addr[++j]; + } } phdr->p_paddr = mbase; @@ -914,14 +1043,14 @@ static void fadump_invalidate_release_mem(void) * later for releasing the memory for general use. */ reserved_area_start = fw_dump.reserve_dump_area_start; - reserved_area_end = reserved_area_start + - fw_dump.reserve_dump_area_size; + reserved_area_end = + memory_limit ? memory_limit : memblock_end_of_DRAM(); + /* - * Setup reserve_dump_area_start and its size so that we can - * reuse this reserved memory for Re-registration. + * Setup reserve_dump_area_start so that we can reuse this + * reserved memory for Re-registration. */ fw_dump.reserve_dump_area_start = destination_address; - fw_dump.reserve_dump_area_size = get_fadump_area_size(); fadump_release_memory(reserved_area_start, reserved_area_end); if (fw_dump.cpu_notes_buf) { diff --git a/arch/powerpc/kernel/fadump_internal.c b/arch/powerpc/kernel/fadump_internal.c index 570c357..b46c7da 100644 --- a/arch/powerpc/kernel/fadump_internal.c +++ b/arch/powerpc/kernel/fadump_internal.c @@ -10,6 +10,9 @@ * 2 of the License, or (at your option) any later version. 
*/ +#undef DEBUG +#define pr_fmt(fmt) "fadump: " fmt + #include #include #include @@ -48,6 +51,15 @@ void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size) __free_pages(page, order); } +void fadump_set_meta_area_start(struct fw_dump *fadump_conf) +{ + fadump_conf->meta_area_start = (fadump_conf->rmr_destination_addr + + fadump_conf->rmr_source_len); + + pr_debug("Meta area start address: 0x%lx\n", + fadump_conf->meta_area_start); +} + #define GPR_MASK 0xffffff0000000000 static inline int fadump_gpr_index(u64 id) { @@ -165,10 +177,19 @@ static int is_memory_area_contiguous(unsigned long d_start, */ int is_boot_memory_area_contiguous(struct fw_dump *fadump_conf) { - unsigned long d_start = RMA_START; - unsigned long d_end = RMA_START + fadump_conf->boot_memory_size; + int i, ret = 0; + unsigned long d_start, d_end; - return is_memory_area_contiguous(d_start, d_end); + for (i = 0; i < fadump_conf->rmr_regions_cnt; i++) { + d_start = fadump_conf->rmr_src_addr[i]; + d_end = d_start + fadump_conf->rmr_src_size[i]; + + ret = is_memory_area_contiguous(d_start, d_end); + if (!ret) + break; + } + + return ret; } /* diff --git a/arch/powerpc/kernel/fadump_internal.h b/arch/powerpc/kernel/fadump_internal.h index d89127f5..61c6335 100644 --- a/arch/powerpc/kernel/fadump_internal.h +++ b/arch/powerpc/kernel/fadump_internal.h @@ -47,12 +47,6 @@ #define FADUMP_UNREGISTER 2 #define FADUMP_INVALIDATE 3 -/* Firmware-Assited Dump platforms */ -enum fadump_platform_type { - FADUMP_PLATFORM_UNKNOWN = 0, - FADUMP_PLATFORM_PSERIES, -}; - #define FADUMP_CPU_ID_MASK ((1UL << 32) - 1) #define CPU_UNKNOWN (~((u32)0)) @@ -92,13 +86,23 @@ struct fadump_crash_info_header { struct cpumask online_mask; }; -struct fad_crash_memory_ranges { +/* Platform specific callback functions */ +struct fadump_ops; + +/* Firmware-Assisted Dump platforms */ +enum fadump_platform_type { + FADUMP_PLATFORM_UNKNOWN = 0, + FADUMP_PLATFORM_PSERIES, + FADUMP_PLATFORM_POWERNV, +}; + +struct 
fadump_memory_range { unsigned long long base; unsigned long long size; }; -/* Platform specific callback functions */ -struct fadump_ops; +/* Maximum no. of real memory regions supported by the kernel */ +#define MAX_REAL_MEM_REGIONS 8 /* Firmware-assisted dump configuration details. */ struct fw_dump { @@ -119,6 +123,17 @@ struct fw_dump { unsigned long rmr_source_len; unsigned long rmr_destination_addr; + unsigned long boot_memory_hole_size; + unsigned long rmr_regions_cnt; + unsigned long rmr_src_addr[MAX_REAL_MEM_REGIONS]; + unsigned long rmr_src_size[MAX_REAL_MEM_REGIONS]; + + /* + * Maximum size supported by firmware to copy from source to + * destination address per entry. + */ + unsigned long max_copy_size; + int ibm_configure_kernel_dump; unsigned long fadump_enabled:1; @@ -145,6 +160,7 @@ struct fadump_ops { /* Helper functions */ void *fadump_cpu_notes_buf_alloc(unsigned long size); void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size); +void fadump_set_meta_area_start(struct fw_dump *fadump_conf); void fadump_set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val); u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs); void fadump_update_elfcore_header(struct fw_dump *fadump_config, char *bufp); @@ -161,4 +177,14 @@ pseries_dt_scan_fadump(struct fw_dump *fadump_config, ulong node) } #endif +#ifdef CONFIG_PPC_POWERNV +extern int opal_dt_scan_fadump(struct fw_dump *fadump_config, ulong node); +#else +static inline int +opal_dt_scan_fadump(struct fw_dump *fadump_config, ulong node) +{ + return 1; +} +#endif + #endif /* __PPC64_FA_DUMP_INTERNAL_H__ */ diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index b540ce8e..adc0de6 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -6,6 +6,7 @@ obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o obj-y += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o obj-$(CONFIG_SMP) 
+= smp.o subcore.o subcore-asm.o +obj-$(CONFIG_FA_DUMP) += opal-fadump.o obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o obj-$(CONFIG_CXL_BASE) += pci-cxl.o obj-$(CONFIG_EEH) += eeh-powernv.o diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c new file mode 100644 index 0000000..0679d98 --- /dev/null +++ b/arch/powerpc/platforms/powernv/opal-fadump.c @@ -0,0 +1,373 @@ +/* + * Firmware-Assisted Dump support on POWER platform (OPAL). + * + * Copyright 2018-2019, IBM Corp. + * Author: Hari Bathini + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#undef DEBUG +#define pr_fmt(fmt) "opal fadump: " fmt + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "../../kernel/fadump_internal.h" +#include "opal-fadump.h" + +static struct opal_fadump_mem_struct fdm; +static const struct opal_fadump_mem_struct *fdm_active; +unsigned long fdm_actual_size; + +static void opal_set_preserv_area_start(struct fw_dump *fadump_conf) +{ + fadump_conf->preserv_area_start = fadump_conf->rmr_destination_addr; + + pr_debug("Preserve area start address: 0x%lx\n", + fadump_conf->preserv_area_start); +} + +static void update_fadump_config(struct fw_dump *fadump_conf, + const struct opal_fadump_mem_struct *fdm) +{ + unsigned long base, size, last_end; + int section_cnt = be16_to_cpu(fdm->section_count); + int unused_sections = (OPAL_MAX_SECTIONS - section_cnt); + int i, j; + + pr_debug("section_cnt: %d\n", section_cnt); + BUILD_BUG_ON(OPAL_MAX_SECTIONS > OPAL_FADUMP_NR_SECTIONS); + WARN_ON(unused_sections < 0); + fdm_actual_size = sizeof(*fdm) - + (unused_sections * sizeof(struct opal_fadump_section)); + + /* + * The first real memory region entry is the 
real memory + * regions destination address. + */ + fadump_conf->rmr_destination_addr = 0; + for (i = 0; i < section_cnt; i++) { + if (fdm->section[i].src_type == + OPAL_FADUMP_REAL_MODE_REGION) { + fadump_conf->rmr_destination_addr = + be64_to_cpu(fdm->section[i].dest_addr); + break; + } + } + pr_debug("Destination address of real memory regions: %#016lx\n", + fadump_conf->rmr_destination_addr); + + if (fadump_conf->dump_active) { + j = 0; + last_end = 0; + fadump_conf->rmr_source_len = 0; + fadump_conf->boot_memory_hole_size = 0; + for (i = 0; i < section_cnt; i++) { + if (fdm->section[i].src_type == + OPAL_FADUMP_REAL_MODE_REGION) { + base = be64_to_cpu(fdm->section[i].src_addr); + size = be64_to_cpu(fdm->section[i].src_size); + pr_debug("%d. RMR base: 0x%lx, size: 0x%lx\n", + (i + 1), base, size); + + fadump_conf->rmr_src_addr[j] = base; + fadump_conf->rmr_src_size[j] = size; + fadump_conf->rmr_source_len += size; + + if (base > last_end) { + fadump_conf->boot_memory_hole_size += + (base - last_end); + } + + last_end = base + size; + j++; + } + } + fadump_conf->rmr_regions_cnt = j; + pr_debug("Real memory regions count: %lu\n", + fadump_conf->rmr_regions_cnt); + } + + fadump_set_meta_area_start(fadump_conf); + opal_set_preserv_area_start(fadump_conf); +} + +static ulong opal_init_fadump_mem_struct(struct fw_dump *fadump_conf) +{ + ulong addr = fadump_conf->reserve_dump_area_start; + int i, section_cnt = 0; + + fdm.section_size = cpu_to_be16(sizeof(struct opal_fadump_section)); + + /* RMA region sections */ + for (i = 0; i < fadump_conf->rmr_regions_cnt; i++) { + fdm.section[RMR_REGION_INPUT_IDX + i].src_type = + OPAL_FADUMP_REAL_MODE_REGION; + fdm.section[RMR_REGION_INPUT_IDX + i].src_addr = + cpu_to_be64(fadump_conf->rmr_src_addr[i]); + fdm.section[RMR_REGION_INPUT_IDX + i].dest_addr = + cpu_to_be64(addr); + fdm.section[RMR_REGION_INPUT_IDX + i].src_size = + fdm.section[RMR_REGION_INPUT_IDX + i].dest_size = + cpu_to_be64(fadump_conf->rmr_src_size[i]); + + 
section_cnt++; + addr += fadump_conf->rmr_src_size[i]; + } + + fdm.section_count = cpu_to_be16(section_cnt); + update_fadump_config(fadump_conf, &fdm); + + return addr; +} + +static int opal_register_fadump(struct fw_dump *fadump_conf) +{ + int rc, err = -EIO; + + rc = opal_configure_fadump(FADUMP_REGISTER, &fdm, fdm_actual_size); + switch (rc) { + default: + pr_err("Failed to register. Unknown Error(%d).\n", rc); + break; + case OPAL_UNSUPPORTED: + pr_err("Support not available.\n"); + fadump_conf->fadump_supported = 0; + fadump_conf->fadump_enabled = 0; + break; + case OPAL_INTERNAL_ERROR: + pr_err("Failed to register. Hardware Error(%d).\n", rc); + break; + case OPAL_PARAMETER: + pr_err("Failed to register. Parameter Error(%d).\n", rc); + break; + case OPAL_PERMISSION: + pr_err("Already registered!\n"); + fadump_conf->dump_registered = 1; + err = -EEXIST; + break; + case OPAL_SUCCESS: + pr_err("Registration is successful!\n"); + fadump_conf->dump_registered = 1; + err = 0; + break; + } + + return err; +} + +static int opal_unregister_fadump(struct fw_dump *fadump_conf) +{ + int rc; + + rc = opal_configure_fadump(FADUMP_UNREGISTER, &fdm, fdm_actual_size); + if (rc) { + pr_err("Failed to un-register - unexpected Error(%d).\n", rc); + return -EIO; + } + + fadump_conf->dump_registered = 0; + return 0; +} + +static int opal_invalidate_fadump(struct fw_dump *fadump_conf) +{ + int rc; + + rc = opal_configure_fadump(FADUMP_INVALIDATE, (void *)fdm_active, + fdm_actual_size); + if (rc) { + pr_err("Failed to invalidate - unexpected Error(%d).\n", rc); + return -EIO; + } + + fadump_conf->dump_active = 0; + fdm_active = NULL; + return 0; +} + +/* + * Read CPU state dump data and convert it into ELF notes. + * + * Each register entry is of 16 bytes, A numerical identifier along with + * a GPR/SPR flag in the first 8 bytes and the register value in the next + * 8 bytes. For more details refer to F/W documentation. 
+ */ +static int __init fadump_build_cpu_notes(struct fw_dump *fadump_conf) +{ + u32 num_cpus = 1, *note_buf; + struct fadump_crash_info_header *fdh = NULL; + + /* Allocate buffer to hold cpu crash notes. */ + fadump_conf->cpu_notes_buf_size = num_cpus * sizeof(note_buf_t); + fadump_conf->cpu_notes_buf_size = + PAGE_ALIGN(fadump_conf->cpu_notes_buf_size); + note_buf = fadump_cpu_notes_buf_alloc(fadump_conf->cpu_notes_buf_size); + if (!note_buf) { + pr_err("Failed to allocate 0x%lx bytes for cpu notes buffer\n", + fadump_conf->cpu_notes_buf_size); + return -ENOMEM; + } + fadump_conf->cpu_notes_buf = __pa(note_buf); + + pr_debug("Allocated buffer for cpu notes of size %ld at %p\n", + (num_cpus * sizeof(note_buf_t)), note_buf); + + if (fadump_conf->fadumphdr_addr) + fdh = __va(fadump_conf->fadumphdr_addr); + + if (fdh && (fdh->crashing_cpu != CPU_UNKNOWN)) { + note_buf = fadump_regs_to_elf_notes(note_buf, &(fdh->regs)); + final_note(note_buf); + + pr_debug("Updating elfcore header (%llx) with cpu notes\n", + fdh->elfcorehdr_addr); + fadump_update_elfcore_header(fadump_conf, + __va(fdh->elfcorehdr_addr)); + } + + return 0; +} + +static int __init opal_process_fadump(struct fw_dump *fadump_conf) +{ + struct fadump_crash_info_header *fdh; + int rc = 0; + + if (!fdm_active || !fadump_conf->fadumphdr_addr) + return -EINVAL; + + /* Validate the fadump crash info header */ + fdh = __va(fadump_conf->fadumphdr_addr); + if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { + pr_err("Crash info header is not valid.\n"); + return -EINVAL; + } + + /* + * TODO: To build cpu notes, find a way to map PIR to logical id. + * Also, we may need different method for pseries and powernv. + * The currently booted kernel could have a different PIR to + * logical id mapping. So, try saving info of previous kernel's + * paca to get the right PIR to logical id mapping. 
+ */ + rc = fadump_build_cpu_notes(fadump_conf); + if (rc) + return rc; + + /* + * We are done validating dump info and elfcore header is now ready + * to be exported. set elfcorehdr_addr so that vmcore module will + * export the elfcore header through '/proc/vmcore'. + */ + elfcorehdr_addr = fdh->elfcorehdr_addr; + + return rc; +} + +static void opal_fadump_region_show(struct fw_dump *fadump_conf, + struct seq_file *m) +{ + int i; + const struct opal_fadump_mem_struct *fdm_ptr; + + if (fdm_active) + fdm_ptr = fdm_active; + else + fdm_ptr = &fdm; + + seq_puts(m, "-----------------------------------------------------"); + seq_puts(m, "-----------------------------\n"); + seq_puts(m, "| | Source | "); + seq_puts(m, " Destination |\n"); + seq_puts(m, "- ------------------------------------------------"); + seq_puts(m, "-----------------------------\n"); + seq_puts(m, "|Type| Address | Size | "); + seq_puts(m, "Address | Size |\n"); + seq_puts(m, "-----------------------------------------------------"); + seq_puts(m, "-----------------------------\n"); + + for (i = 0; i < be16_to_cpu(fdm_ptr->section_count); i++) { + seq_printf(m, "|%3u |0x%016llx|0x%016llx|0x%016llx|0x%016llx|\n", + fdm_ptr->section[i].src_type, + be64_to_cpu(fdm_ptr->section[i].src_addr), + be64_to_cpu(fdm_ptr->section[i].src_size), + be64_to_cpu(fdm_ptr->section[i].dest_addr), + be64_to_cpu(fdm_ptr->section[i].dest_size)); + seq_puts(m, "---------------------------------------------"); + seq_puts(m, "-------------------------------------\n"); + } + +} + +static void opal_crash_fadump(const char *msg) +{ + int rc; + + rc = opal_cec_reboot2(OPAL_REBOOT_OS_ERROR, msg); + if (rc == OPAL_UNSUPPORTED) { + pr_emerg("Reboot type %d not supported.\n", + OPAL_REBOOT_OS_ERROR); + } else if (rc == OPAL_HARDWARE) + pr_emerg("No backend support for MPIPL!\n"); +} + +static struct fadump_ops opal_fadump_ops = { + .init_fadump_mem_struct = opal_init_fadump_mem_struct, + .register_fadump = opal_register_fadump, + 
.unregister_fadump = opal_unregister_fadump, + .invalidate_fadump = opal_invalidate_fadump, + .process_fadump = opal_process_fadump, + .fadump_region_show = opal_fadump_region_show, + .crash_fadump = opal_crash_fadump, +}; + +int __init opal_dt_scan_fadump(struct fw_dump *fadump_conf, ulong node) +{ + unsigned long dn; + + /* + * Check if Firmware Assisted dump is supported. if yes, check + * if dump has been initiated on last reboot. + */ + dn = of_get_flat_dt_subnode_by_name(node, "dump"); + if (dn == -FDT_ERR_NOTFOUND) { + pr_debug("OPAL support missing!\n"); + return 1; + } + + /* + * Firmware currently supports only 32-bit value for size, + * align it to 1MB size. + */ + fadump_conf->max_copy_size = _ALIGN_DOWN(0xFFFFFFFF, (1 << 20)); + + /* + * Check if dump has been initiated on last reboot. + */ + fdm_active = of_get_flat_dt_prop(dn, "result-table", NULL); + if (fdm_active) { + pr_info("Firmware-assisted dump is active.\n"); + fadump_conf->dump_active = 1; + update_fadump_config(fadump_conf, (void *)__pa(fdm_active)); + } + + fadump_conf->ops = &opal_fadump_ops; + fadump_conf->fadump_platform = FADUMP_PLATFORM_POWERNV; + fadump_conf->fadump_supported = 1; + + return 1; +} diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h new file mode 100644 index 0000000..a5eeb2c --- /dev/null +++ b/arch/powerpc/platforms/powernv/opal-fadump.h @@ -0,0 +1,40 @@ +/* + * Firmware-Assisted Dump support on POWER platform (OPAL). + * + * Copyright 2018-2019, IBM Corp. + * Author: Hari Bathini + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#ifndef __PPC64_OPAL_FA_DUMP_H__ +#define __PPC64_OPAL_FA_DUMP_H__ + +#define OPAL_FADUMP_CPU_STATE_DATA 0x0000 +/* OPAL : 0x01 – 0x39 */ +#define OPAL_FADUMP_OPAL_REGION 0x0001 +/* Firmware/SMF : 0x40 – 0x79 */ +#define OPAL_FADUMP_FW_REGION 0x0040 +/* Kernel memory region : 0x80 – 0xb9 */ +#define OPAL_FADUMP_REAL_MODE_REGION 0x0080 +/* Reserved for future use : 0xc0 – 0xff */ +#define OPAL_FADUMP_RESERVED_REGION 0x00c0 + +enum opal_fadump_section_types { + CPU_STATE_TYPE = 0, + OPAL_REGION_TYPE, + FW_REGION_TYPE, + RMR_REGION_TYPE, + OPAL_SECTIONS +}; + +/* Starting index of RMR region in dump sections while registering */ +#define RMR_REGION_INPUT_IDX 0 + +#define OPAL_MAX_SECTIONS (OPAL_SECTIONS + \ + MAX_REAL_MEM_REGIONS - 1) + +#endif /* __PPC64_OPAL_FA_DUMP_H__ */ diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S index 2515282..178ef3b 100644 --- a/arch/powerpc/platforms/powernv/opal-wrappers.S +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S @@ -331,3 +331,4 @@ OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, OPAL_PCI_SET_PBCQ_TUNNEL_BAR); OPAL_CALL(opal_sensor_read_u64, OPAL_SENSOR_READ_U64); OPAL_CALL(opal_sensor_group_enable, OPAL_SENSOR_GROUP_ENABLE); OPAL_CALL(opal_nx_coproc_init, OPAL_NX_COPROC_INIT); +OPAL_CALL(opal_configure_fadump, OPAL_CONFIGURE_FADUMP); diff --git a/arch/powerpc/platforms/pseries/pseries_fadump.c b/arch/powerpc/platforms/pseries/pseries_fadump.c index 5450d2b..f380f3f 100644 --- a/arch/powerpc/platforms/pseries/pseries_fadump.c +++ b/arch/powerpc/platforms/pseries/pseries_fadump.c @@ -49,15 +49,6 @@ static void pseries_set_preserv_area_start(struct fw_dump *fadump_conf) fadump_conf->preserv_area_start); } -static void pseries_set_meta_area_start(struct fw_dump *fadump_conf) -{ - fadump_conf->meta_area_start = (fadump_conf->rmr_destination_addr + - fadump_conf->rmr_source_len); - - pr_debug("Meta area start address: 0x%lx\n", - fadump_conf->meta_area_start); -} - 
static void update_fadump_config(struct fw_dump *fadump_conf, const struct pseries_fadump_mem_struct *fdm) { @@ -65,11 +56,16 @@ static void update_fadump_config(struct fw_dump *fadump_conf, be64_to_cpu(fdm->rmr_region.destination_address); if (fadump_conf->dump_active) { - fadump_conf->rmr_source_len = + fadump_conf->rmr_src_addr[0] = + be64_to_cpu(fdm->rmr_region.source_address); + fadump_conf->rmr_src_size[0] = be64_to_cpu(fdm->rmr_region.source_len); + fadump_conf->rmr_regions_cnt = 1; + fadump_conf->rmr_source_len = fadump_conf->rmr_src_size[0]; + fadump_conf->boot_memory_hole_size = 0; } - pseries_set_meta_area_start(fadump_conf); + fadump_set_meta_area_start(fadump_conf); pseries_set_preserv_area_start(fadump_conf); } From patchwork Thu Dec 20 19:00:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Hari Bathini X-Patchwork-Id: 1016979 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLyJ1jvCz9sC7 for ; Fri, 21 Dec 2018 06:13:28 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 43LLyJ0Lm1zDrDy for ; Fri, 21 Dec 2018 06:13:28 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from ozlabs.org (bilbo.ozlabs.org [203.11.71.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No 
client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 43LLgz3bV4zDr3Z for ; Fri, 21 Dec 2018 06:01:03 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from ozlabs.org (bilbo.ozlabs.org [203.11.71.1]) by bilbo.ozlabs.org (Postfix) with ESMTP id 43LLgz2nRYz8tRf for ; Fri, 21 Dec 2018 06:01:03 +1100 (AEDT) Received: by ozlabs.org (Postfix) id 43LLgz1hC7z9sCQ; Fri, 21 Dec 2018 06:01:03 +1100 (AEDT) Delivered-To: linuxppc-dev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.158.5; helo=mx0a-001b2d01.pphosted.com; envelope-from=hbathini@linux.ibm.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLgy4VtPz9sCV for ; Fri, 21 Dec 2018 06:01:02 +1100 (AEDT) Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id wBKIwklk044960 for ; Thu, 20 Dec 2018 14:01:00 -0500 Received: from e06smtp04.uk.ibm.com (e06smtp04.uk.ibm.com [195.75.94.100]) by mx0a-001b2d01.pphosted.com with ESMTP id 2pgg5phhjk-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 20 Dec 2018 14:00:59 -0500 Received: from localhost by e06smtp04.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 20 Dec 2018 19:00:57 -0000 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp04.uk.ibm.com (192.168.101.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 20 Dec 2018 19:00:54 -0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id wBKJ0qg459637950 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 20 Dec 2018 19:00:52 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 831C611C06E; Thu, 20 Dec 2018 19:00:52 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CFFD511C058; Thu, 20 Dec 2018 19:00:50 +0000 (GMT) Received: from hbathini.in.ibm.com (unknown [9.199.47.3]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 20 Dec 2018 19:00:50 +0000 (GMT) Subject: [PATCH 5/9] powerpc/fadump: process architected register state data provided by firmware From: Hari Bathini To: Ananth N Mavinakayanahalli , Michael Ellerman , Mahesh J Salgaonkar , Vasant Hegde , linuxppc-dev , Stewart Smith Date: Fri, 21 Dec 2018 00:30:49 +0530 In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 18122019-0016-0000-0000-0000023956F9 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18122019-0017-0000-0000-00003291B458 Message-Id: <154533244975.28973.16083421864653823983.stgit@hbathini.in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-12-20_09:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx 
scancount=1 engine=8.0.1-1810050000 definitions=main-1812200154 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Hari Bathini Firmware provides architected register state data at the time of a crash. This data contains the PIR value. Store the PIR value of each logical CPU so that the data provided by firmware can be matched with the corresponding logical CPU. Signed-off-by: Hari Bathini Signed-off-by: Vasant Hegde --- arch/powerpc/kernel/fadump.c | 40 +--- arch/powerpc/kernel/fadump_internal.c | 129 ++++++++++++++ arch/powerpc/kernel/fadump_internal.h | 32 +++ arch/powerpc/platforms/powernv/opal-fadump.c | 216 +++++++++++++++++++++-- arch/powerpc/platforms/powernv/opal-fadump.h | 9 + arch/powerpc/platforms/pseries/pseries_fadump.c | 1 6 files changed, 384 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 190f7ed..d9cf809 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -276,10 +276,10 @@ static unsigned long get_fadump_area_size(void) size += fw_dump.hpte_region_size; size += fw_dump.boot_memory_size; size += sizeof(struct fadump_crash_info_header); - size += sizeof(struct elfhdr); /* ELF core header.*/ - size += sizeof(struct elf_phdr); /* place holder for cpu notes */ - /* Program headers for crash memory regions.
*/ - size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2); + /* To store the start address of backup area */ + size += sizeof(unsigned long *); + size += get_fadump_elfcore_hdr_size(); + size += fw_dump.backup_area_size; size = PAGE_ALIGN(size); return size; @@ -892,26 +892,6 @@ static int fadump_create_elfcore_headers(char *bufp) return 0; } -static unsigned long init_fadump_header(unsigned long addr) -{ - struct fadump_crash_info_header *fdh; - - if (!addr) - return 0; - - fw_dump.fadumphdr_addr = addr; - fdh = __va(addr); - addr += sizeof(struct fadump_crash_info_header); - - memset(fdh, 0, sizeof(struct fadump_crash_info_header)); - fdh->magic_number = FADUMP_CRASH_INFO_MAGIC; - fdh->elfcorehdr_addr = addr; - /* We will set the crashing cpu id in crash_fadump() during crash. */ - fdh->crashing_cpu = CPU_UNKNOWN; - - return addr; -} - static int register_fadump(void) { unsigned long addr; @@ -929,15 +909,15 @@ static int register_fadump(void) if (ret) return ret; - addr = fw_dump.meta_area_start; - /* Initialize fadump crash info header. */ - addr = init_fadump_header(addr); + addr = fw_dump.ops->init_fadump_header(&fw_dump); vaddr = __va(addr); pr_debug("Creating ELF core headers at %#016lx\n", addr); fadump_create_elfcore_headers(vaddr); + fadump_populate_backup_area(&fw_dump); + /* register the future kernel dump with firmware. */ pr_debug("Registering for firmware-assisted kernel dump...\n"); return fw_dump.ops->register_fadump(&fw_dump); @@ -1242,8 +1222,12 @@ int __init setup_fadump(void) fadump_invalidate_release_mem(); } /* Initialize the kernel dump memory structure for FAD registration. 
*/ - else if (fw_dump.reserve_dump_area_size) + else if (fw_dump.reserve_dump_area_size) { fw_dump.ops->init_fadump_mem_struct(&fw_dump); + fw_dump.ops->init_fadump_header(&fw_dump); + init_fadump_backup_area(&fw_dump); + fadump_populate_backup_area(&fw_dump); + } fadump_init_files(); diff --git a/arch/powerpc/kernel/fadump_internal.c b/arch/powerpc/kernel/fadump_internal.c index b46c7da..ea6f8ba 100644 --- a/arch/powerpc/kernel/fadump_internal.c +++ b/arch/powerpc/kernel/fadump_internal.c @@ -20,6 +20,34 @@ #include "fadump_internal.h" +/* + * Initializes the legacy fadump header format. + * Platform specific code can reuse/overwrite this format. + * OPAL platform overrides this data to add backup area support. + * + * TODO: Extend backup area support to pseries to make it robust? + */ +unsigned long generic_init_fadump_header(struct fw_dump *fadump_conf) +{ + unsigned long addr = fadump_conf->meta_area_start; + struct fadump_crash_info_header *fdh; + + if (!addr) + return 0; + + fadump_conf->fadumphdr_addr = addr; + fdh = __va(addr); + addr += sizeof(struct fadump_crash_info_header); + + memset(fdh, 0, sizeof(struct fadump_crash_info_header)); + fdh->magic_number = FADUMP_CRASH_INFO_MAGIC; + fdh->elfcorehdr_addr = addr; + /* We will set the crashing cpu id in crash_fadump() during crash. 
*/ + fdh->crashing_cpu = CPU_UNKNOWN; + + return addr; +} + void *fadump_cpu_notes_buf_alloc(unsigned long size) { void *vaddr; @@ -106,6 +134,43 @@ void fadump_set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val) regs->dsisr = (unsigned long)reg_val; } +void fadump_set_regval_regnum(struct pt_regs *regs, u64 reg_id, + u64 reg_val, int reg_cnt) +{ + if (reg_cnt >= 63) { + if (reg_id < 32) { + regs->gpr[reg_id] = reg_val; + return; + } + } + switch (reg_id) { + case 2000: + regs->nip = reg_val; + break; + case 2001: + regs->msr = reg_val; + break; + case 9: + regs->ctr = reg_val; + break; + case 8: + regs->link = reg_val; + break; + case 1: + regs->xer = reg_val; + break; + case 2002: + regs->ccr = reg_val; + break; + case 19: + regs->dar = reg_val; + break; + case 18: + regs->dsisr = reg_val; + break; + } +} + u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) { struct elf_prstatus prstatus; @@ -121,6 +186,19 @@ u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs) return buf; } +unsigned long get_fadump_elfcore_hdr_size(void) +{ + unsigned long size = 0; + + size = sizeof(struct elfhdr); /* ELF core header.*/ + size += sizeof(struct elf_phdr); /* place holder for cpu notes */ + size += sizeof(struct elf_phdr); /* vmcoreinfo notes program header */ + /* Program headers for crash memory regions. 
*/ + size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 1); + + return size; +} + void fadump_update_elfcore_header(struct fw_dump *fadump_conf, char *bufp) { struct elfhdr *elf; @@ -203,3 +281,54 @@ int is_reserved_memory_area_contiguous(struct fw_dump *fadump_conf) return is_memory_area_contiguous(d_start, d_end); } + +void init_fadump_backup_area(struct fw_dump *fadump_conf) +{ + unsigned long addr = fadump_conf->backup_area_start; + struct fadump_backup_area *backup_info; + + if (!addr) + return; + + backup_info = __va(addr); + memset(backup_info, 0xFF, fadump_conf->backup_area_size); + backup_info->version = BACKUP_AREA_VERSION_V1; + backup_info->size = fadump_conf->backup_area_size; + backup_info->nr_threads = 0; +} + +static inline void read_pir(void *val) +{ + *(unsigned long *)val = mfspr(SPRN_PIR); +} + +unsigned long fadump_populate_backup_area(struct fw_dump *fadump_conf) +{ + unsigned long pir, addr; + struct fadump_backup_area *backup_info; + unsigned int i; + + if (!fadump_conf->backup_area_start) + return 0; + + addr = fadump_conf->backup_area_start; + backup_info = __va(addr); + addr += fadump_conf->backup_area_size; + + backup_info->present_mask = *cpu_present_mask; + for_each_present_cpu(i) { + /* + * Skip if PIR is already read to avoid complex scenarios + * where the CPUs are offline'd after initial read. 
+ */ + if (backup_info->thread_pir[i] != 0xFFFFFFFFU) + continue; + + smp_call_function_single(i, read_pir, &pir, 1); + pr_debug("Logical CPU: %d, PIR: 0x%lx\n", i, pir); + backup_info->thread_pir[i] = pir; + backup_info->nr_threads++; + } + + return addr; +} diff --git a/arch/powerpc/kernel/fadump_internal.h b/arch/powerpc/kernel/fadump_internal.h index 61c6335..a117f60 100644 --- a/arch/powerpc/kernel/fadump_internal.h +++ b/arch/powerpc/kernel/fadump_internal.h @@ -71,6 +71,9 @@ static inline u64 str_to_u64(const char *str) #define FADUMP_CRASH_INFO_MAGIC STR_TO_HEX("FADMPINF") +/* Backup area support in this version */ +#define FADUMP_CRASH_INFO_MAGIC_V2 STR_TO_HEX("FADINFV2") + /* Register entry. */ struct fadump_reg_entry { __be64 reg_id; @@ -89,6 +92,21 @@ struct fadump_crash_info_header { /* Platform specific callback functions */ struct fadump_ops; +#define BACKUP_AREA_VERSION_V1 1 + +/* Backup area populated with data for processing in capture kernel */ +struct fadump_backup_area { + u32 size; + u32 version:4; + u32 nr_threads:28; + u32 thread_pir[NR_CPUS]; + struct cpumask present_mask; + /* + * New backup data entries can be added here by bumping up + * the version field. + */ +}; + /* Firmware-Assited Dump platforms */ enum fadump_platform_type { FADUMP_PLATFORM_UNKNOWN = 0, @@ -106,6 +124,9 @@ struct fadump_memory_range { /* Firmware-assisted dump configuration details. 
*/ struct fw_dump { + unsigned long cpu_state_destination_addr; + unsigned long cpu_state_data_version; + unsigned long cpu_state_entry_size; unsigned long cpu_state_data_size; unsigned long hpte_region_size; unsigned long boot_memory_size; @@ -113,6 +134,8 @@ struct fw_dump { unsigned long reserve_dump_area_size; unsigned long meta_area_start; unsigned long preserv_area_start; + unsigned long backup_area_start; + unsigned long backup_area_size; /* cmd line option during boot */ unsigned long reserve_bootvar; @@ -148,6 +171,7 @@ struct fw_dump { struct fadump_ops { ulong (*init_fadump_mem_struct)(struct fw_dump *fadump_config); + ulong (*init_fadump_header)(struct fw_dump *fadump_config); int (*register_fadump)(struct fw_dump *fadump_config); int (*unregister_fadump)(struct fw_dump *fadump_config); int (*invalidate_fadump)(struct fw_dump *fadump_config); @@ -157,15 +181,23 @@ struct fadump_ops { void (*crash_fadump)(const char *msg); }; +/* Generic version of fadump operations */ +unsigned long generic_init_fadump_header(struct fw_dump *fadump_conf); + /* Helper functions */ void *fadump_cpu_notes_buf_alloc(unsigned long size); void fadump_cpu_notes_buf_free(unsigned long vaddr, unsigned long size); void fadump_set_meta_area_start(struct fw_dump *fadump_conf); void fadump_set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val); +void fadump_set_regval_regnum(struct pt_regs *regs, u64 reg_id, + u64 reg_val, int reg_cnt); u32 *fadump_regs_to_elf_notes(u32 *buf, struct pt_regs *regs); +unsigned long get_fadump_elfcore_hdr_size(void); void fadump_update_elfcore_header(struct fw_dump *fadump_config, char *bufp); int is_boot_memory_area_contiguous(struct fw_dump *fadump_conf); int is_reserved_memory_area_contiguous(struct fw_dump *fadump_conf); +void init_fadump_backup_area(struct fw_dump *fadump_conf); +unsigned long fadump_populate_backup_area(struct fw_dump *fadump_conf); #ifdef CONFIG_PPC_PSERIES extern int pseries_dt_scan_fadump(struct fw_dump *fadump_config, 
ulong node); diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c index 0679d98..9e677de 100644 --- a/arch/powerpc/platforms/powernv/opal-fadump.c +++ b/arch/powerpc/platforms/powernv/opal-fadump.c @@ -32,6 +32,39 @@ static struct opal_fadump_mem_struct fdm; static const struct opal_fadump_mem_struct *fdm_active; unsigned long fdm_actual_size; +/* + * Backup area is not available in older format. In the newer fadump + * header format (v2), backup info is stored at the end of elfcorehdrs + * and pointer to this address is stored at the tail end of FADump + * crash info header. + */ +static void opal_set_backup_area_start(struct fw_dump *fadump_conf) +{ + unsigned long addr = fadump_conf->meta_area_start; + + /* + * The start of meta area holds fadump_crash_info_header followed + * by a pointer to backup area start address, elfcore headers & + * backup info. + */ + addr += sizeof(struct fadump_crash_info_header); + + if (fadump_conf->dump_active) { + /* Pointer to backup area start address */ + unsigned long *ptr = __va(addr); + + addr = *ptr; + } else { + addr += sizeof(unsigned long *); + addr += get_fadump_elfcore_hdr_size(); + } + + fadump_conf->backup_area_start = addr; + + pr_debug("Backup area start address: 0x%lx\n", + fadump_conf->backup_area_start); +} + static void opal_set_preserv_area_start(struct fw_dump *fadump_conf) { fadump_conf->preserv_area_start = fadump_conf->rmr_destination_addr; @@ -94,6 +127,12 @@ static void update_fadump_config(struct fw_dump *fadump_conf, last_end = base + size; j++; + } else if (fdm->section[i].src_type == + OPAL_FADUMP_CPU_STATE_DATA) { + fadump_conf->cpu_state_destination_addr = + be64_to_cpu(fdm->section[i].dest_addr); + fadump_conf->cpu_state_data_size = + be64_to_cpu(fdm->section[i].dest_size); } } fadump_conf->rmr_regions_cnt = j; @@ -103,6 +142,7 @@ static void update_fadump_config(struct fw_dump *fadump_conf, fadump_set_meta_area_start(fadump_conf); 
opal_set_preserv_area_start(fadump_conf); + opal_set_backup_area_start(fadump_conf); } static ulong opal_init_fadump_mem_struct(struct fw_dump *fadump_conf) @@ -134,6 +174,41 @@ static ulong opal_init_fadump_mem_struct(struct fw_dump *fadump_conf) return addr; } +/* + * Newer fadump header version (v2) is used for process'ing OPAL FADump. + * In this version, PIR to Logical CPU map is backed up by crashing kernel + * for the capture kernel to make sense of the register state data provided + * by F/W. The start address of the area where this info is backed up is + * stored at the tail end of fadump crash info header. + */ +static ulong opal_init_fadump_header(struct fw_dump *fadump_conf) +{ + unsigned long addr = fadump_conf->meta_area_start; + struct fadump_crash_info_header *fdh; + unsigned long *backup_area_ptr; + + if (!addr) + return 0; + + fdh = __va(addr); + addr = generic_init_fadump_header(fadump_conf); + fdh->magic_number = FADUMP_CRASH_INFO_MAGIC_V2; + + /* + * This function returns the start address of elfcore headers. + * Earlier, elfcore headers sit right below crash info header but + * with V2, pointer to backup area start address (8 bytes) sits + * in-between. So, update the return value and elfcorehdr_addr + * in fadump crash info structure accordingly. 
+ */ + backup_area_ptr = __va(addr); + addr += sizeof(unsigned long *); + fdh->elfcorehdr_addr = addr; + *backup_area_ptr = fadump_conf->backup_area_start; + + return addr; +} + static int opal_register_fadump(struct fw_dump *fadump_conf) { int rc, err = -EIO; @@ -199,6 +274,39 @@ static int opal_invalidate_fadump(struct fw_dump *fadump_conf) return 0; } +static inline int fadump_get_logical_cpu(struct fadump_backup_area *ba, u32 pir) +{ + int i = 0, cpu = CPU_UNKNOWN; + + for_each_cpu(i, &(ba->present_mask)) { + if (ba->thread_pir[i] == pir) { + cpu = i; + break; + } + } + + return cpu; +} + +static struct fadump_reg_entry* +fadump_read_registers(unsigned int regs_per_thread, + struct fadump_reg_entry *reg_entry, + struct pt_regs *regs) +{ + int i; + int reg_cnt = 0; + + memset(regs, 0, sizeof(struct pt_regs)); + + for (i = 0; i < regs_per_thread; i++) { + fadump_set_regval_regnum(regs, be64_to_cpu(reg_entry->reg_id), + be64_to_cpu(reg_entry->reg_value), + reg_cnt++); + reg_entry++; + } + return reg_entry; +} + /* * Read CPU state dump data and convert it into ELF notes. * @@ -206,10 +314,27 @@ static int opal_invalidate_fadump(struct fw_dump *fadump_conf) * a GPR/SPR flag in the first 8 bytes and the register value in the next * 8 bytes. For more details refer to F/W documentation. 
*/ -static int __init fadump_build_cpu_notes(struct fw_dump *fadump_conf) +static int __init fadump_build_cpu_notes(struct fw_dump *fadump_conf, + struct fadump_backup_area *backup_info) { - u32 num_cpus = 1, *note_buf; + struct opal_thread_hdr *thdr; + struct fadump_reg_entry *reg_entry; struct fadump_crash_info_header *fdh = NULL; + unsigned long addr; + u32 num_cpus, *note_buf; + u32 thread_pir; + char *bufp; + struct pt_regs regs; + int i, rc = 0, cpu = 0; + unsigned int size_of_each_thread, regs_per_thread; + + size_of_each_thread = fadump_conf->cpu_state_entry_size; + num_cpus = (fadump_conf->cpu_state_data_size / size_of_each_thread); + regs_per_thread = ((size_of_each_thread - CPU_REG_ENTRY_OFFSET) / + sizeof(struct fadump_reg_entry)); + + addr = fadump_conf->cpu_state_destination_addr; + bufp = __va(addr); /* Allocate buffer to hold cpu crash notes. */ fadump_conf->cpu_notes_buf_size = num_cpus * sizeof(note_buf_t); @@ -229,22 +354,58 @@ static int __init fadump_build_cpu_notes(struct fw_dump *fadump_conf) if (fadump_conf->fadumphdr_addr) fdh = __va(fadump_conf->fadumphdr_addr); - if (fdh && (fdh->crashing_cpu != CPU_UNKNOWN)) { - note_buf = fadump_regs_to_elf_notes(note_buf, &(fdh->regs)); - final_note(note_buf); + if (backup_info->nr_threads != num_cpus) { + pr_warn("Calculated numcpus (%d) not same as populated value (%d)!\n", + num_cpus, backup_info->nr_threads); + } + pr_debug("--------CPU State Data------------\n"); + pr_debug("NumCpus : %u\n", num_cpus); + + for (i = 0; i < num_cpus; i++, bufp += size_of_each_thread) { + thdr = (struct opal_thread_hdr *)bufp; + thread_pir = be32_to_cpu(thdr->pir); + cpu = fadump_get_logical_cpu(backup_info, thread_pir); + if (cpu == CPU_UNKNOWN) { + pr_warn("Unable to get the logical CPU of PIR %d\n", + thread_pir); + continue; + } + + reg_entry = (struct fadump_reg_entry *)(bufp + + CPU_REG_ENTRY_OFFSET); + + if (fdh) { + if (!cpumask_test_cpu(cpu, &fdh->online_mask)) + continue; + + if (fdh->crashing_cpu == cpu) { 
+ regs = fdh->regs; + note_buf = fadump_regs_to_elf_notes(note_buf, + &regs); + continue; + } + } + + fadump_read_registers(regs_per_thread, reg_entry, &regs); + note_buf = fadump_regs_to_elf_notes(note_buf, &regs); + } + final_note(note_buf); + if (fdh) { pr_debug("Updating elfcore header (%llx) with cpu notes\n", fdh->elfcorehdr_addr); fadump_update_elfcore_header(fadump_conf, __va(fdh->elfcorehdr_addr)); } - return 0; + return rc; } static int __init opal_process_fadump(struct fw_dump *fadump_conf) { struct fadump_crash_info_header *fdh; + struct fadump_backup_area *backup_info = NULL; + unsigned long addr; int rc = 0; if (!fdm_active || !fadump_conf->fadumphdr_addr) @@ -252,19 +413,19 @@ static int __init opal_process_fadump(struct fw_dump *fadump_conf) /* Validate the fadump crash info header */ fdh = __va(fadump_conf->fadumphdr_addr); - if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) { + if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC_V2) { pr_err("Crash info header is not valid.\n"); return -EINVAL; } - /* - * TODO: To build cpu notes, find a way to map PIR to logical id. - * Also, we may need different method for pseries and powernv. - * The currently booted kernel could have a different PIR to - * logical id mapping. So, try saving info of previous kernel's - * paca to get the right PIR to logical id mapping.
- */ - rc = fadump_build_cpu_notes(fadump_conf); + addr = fadump_conf->backup_area_start; + backup_info = __va(addr); + if (!addr || (backup_info->version != BACKUP_AREA_VERSION_V1)) { + pr_err("Backup data missing or unsupported!\n"); + return -EINVAL; + } + + rc = fadump_build_cpu_notes(fadump_conf, backup_info); if (rc) return rc; @@ -327,6 +488,7 @@ static void opal_crash_fadump(const char *msg) static struct fadump_ops opal_fadump_ops = { .init_fadump_mem_struct = opal_init_fadump_mem_struct, + .init_fadump_header = opal_init_fadump_header, .register_fadump = opal_register_fadump, .unregister_fadump = opal_unregister_fadump, .invalidate_fadump = opal_invalidate_fadump, @@ -338,6 +500,7 @@ static struct fadump_ops opal_fadump_ops = { int __init opal_dt_scan_fadump(struct fw_dump *fadump_conf, ulong node) { unsigned long dn; + const __be32 *prop; /* * Check if Firmware Assisted dump is supported. if yes, check @@ -349,6 +512,19 @@ int __init opal_dt_scan_fadump(struct fw_dump *fadump_conf, ulong node) return 1; } + fadump_conf->backup_area_size = sizeof(struct fadump_backup_area); + + prop = of_get_flat_dt_prop(dn, "cpu-data-version", NULL); + if (prop) + fadump_conf->cpu_state_data_version = of_read_number(prop, 1); + + if (fadump_conf->cpu_state_data_version != CPU_STATE_DATA_VERSION) { + pr_err("CPU state data format version mismatch!\n"); + pr_err("Kernel: %u, OPAL: %lu\n", CPU_STATE_DATA_VERSION, + fadump_conf->cpu_state_data_version); + return 1; + } + /* * Firmware currently supports only 32-bit value for size, * align it to 1MB size. @@ -363,6 +539,16 @@ int __init opal_dt_scan_fadump(struct fw_dump *fadump_conf, ulong node) pr_info("Firmware-assisted dump is active.\n"); fadump_conf->dump_active = 1; update_fadump_config(fadump_conf, (void *)__pa(fdm_active)); + + /* + * Doesn't need to populate these fields while registering dump + * as destination address and size are provided by F/W. 
+ */ + prop = of_get_flat_dt_prop(dn, "cpu-data-size", NULL); + if (prop) { + fadump_conf->cpu_state_entry_size = + of_read_number(prop, 1); + } } fadump_conf->ops = &opal_fadump_ops; diff --git a/arch/powerpc/platforms/powernv/opal-fadump.h b/arch/powerpc/platforms/powernv/opal-fadump.h index a5eeb2c..392e4ce 100644 --- a/arch/powerpc/platforms/powernv/opal-fadump.h +++ b/arch/powerpc/platforms/powernv/opal-fadump.h @@ -13,6 +13,9 @@ #ifndef __PPC64_OPAL_FA_DUMP_H__ #define __PPC64_OPAL_FA_DUMP_H__ +#define CPU_STATE_DATA_VERSION 1 +#define CPU_REG_ENTRY_OFFSET 16 + #define OPAL_FADUMP_CPU_STATE_DATA 0x0000 /* OPAL : 0x01 – 0x39 */ #define OPAL_FADUMP_OPAL_REGION 0x0001 @@ -37,4 +40,10 @@ enum opal_fadump_section_types { #define OPAL_MAX_SECTIONS (OPAL_SECTIONS + \ MAX_REAL_MEM_REGIONS - 1) +struct opal_thread_hdr { + __be32 pir; + u8 core_state; + u8 reserved[11]; +} __packed; + #endif /* __PPC64_OPAL_FA_DUMP_H__ */ diff --git a/arch/powerpc/platforms/pseries/pseries_fadump.c b/arch/powerpc/platforms/pseries/pseries_fadump.c index f380f3f..f1d7b66 100644 --- a/arch/powerpc/platforms/pseries/pseries_fadump.c +++ b/arch/powerpc/platforms/pseries/pseries_fadump.c @@ -460,6 +460,7 @@ static void pseries_crash_fadump(const char *msg) static struct fadump_ops pseries_fadump_ops = { .init_fadump_mem_struct = pseries_init_fadump_mem_struct, + .init_fadump_header = generic_init_fadump_header, .register_fadump = pseries_register_fadump, .unregister_fadump = pseries_unregister_fadump, .invalidate_fadump = pseries_invalidate_fadump, From patchwork Thu Dec 20 19:00:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hari Bathini X-Patchwork-Id: 1016981 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange 
X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LM0c2zt7z9sCQ for ; Fri, 21 Dec 2018 06:15:28 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 43LM0c0sfGzDqR7 for ; Fri, 21 Dec 2018 06:15:28 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from ozlabs.org (bilbo.ozlabs.org [203.11.71.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 43LLh60YydzDr4t for ; Fri, 21 Dec 2018 06:01:10 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from ozlabs.org (bilbo.ozlabs.org [203.11.71.1]) by bilbo.ozlabs.org (Postfix) with ESMTP id 43LLh56Nxfz8tRf for ; Fri, 21 Dec 2018 06:01:09 +1100 (AEDT) Received: by ozlabs.org (Postfix) id 43LLh55nbcz9sCQ; Fri, 21 Dec 2018 06:01:09 +1100 (AEDT) Delivered-To: linuxppc-dev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=hbathini@linux.ibm.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLh52fNpz9sCV for ; Fri, 21 Dec 2018 06:01:09 +1100 (AEDT) Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by 
mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id wBKIwlbs028012 for ; Thu, 20 Dec 2018 14:01:07 -0500 Received: from e06smtp05.uk.ibm.com (e06smtp05.uk.ibm.com [195.75.94.101]) by mx0a-001b2d01.pphosted.com with ESMTP id 2pgffrke89-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 20 Dec 2018 14:01:07 -0500 Received: from localhost by e06smtp05.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 20 Dec 2018 19:01:04 -0000 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp05.uk.ibm.com (192.168.101.135) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 20 Dec 2018 19:01:02 -0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id wBKJ10Kw41156716 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 20 Dec 2018 19:01:00 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 96F1DA405F; Thu, 20 Dec 2018 19:01:00 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D37E6A4060; Thu, 20 Dec 2018 19:00:58 +0000 (GMT) Received: from hbathini.in.ibm.com (unknown [9.199.47.3]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 20 Dec 2018 19:00:58 +0000 (GMT) Subject: [PATCH 6/9] powerpc/powernv: export /proc/opalcore for analysing opal crashes From: Hari Bathini To: Ananth N Mavinakayanahalli , Michael Ellerman , Mahesh J Salgaonkar , Vasant Hegde , linuxppc-dev , Stewart Smith Date: Fri, 21 Dec 2018 00:30:57 +0530 In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> 
User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 18122019-0020-0000-0000-000002FA7168 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18122019-0021-0000-0000-0000214A84F6 Message-Id: <154533245772.28973.16648357509786201016.stgit@hbathini.in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-12-20_09:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1812200154 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Hari Bathini Export /proc/opalcore file to analyze opal crashes Signed-off-by: Hari Bathini --- arch/powerpc/platforms/powernv/Makefile | 2 arch/powerpc/platforms/powernv/opal-core.c | 385 ++++++++++++++++++++++++++ arch/powerpc/platforms/powernv/opal-core.h | 35 ++ arch/powerpc/platforms/powernv/opal-fadump.c | 73 +++++ 4 files changed, 488 insertions(+), 7 deletions(-) create mode 100644 arch/powerpc/platforms/powernv/opal-core.c create mode 100644 arch/powerpc/platforms/powernv/opal-core.h diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index adc0de6..9420631 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -6,7 +6,7 @@ obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o obj-y += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o 
-obj-$(CONFIG_FA_DUMP) += opal-fadump.o +obj-$(CONFIG_FA_DUMP) += opal-fadump.o opal-core.o obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o obj-$(CONFIG_CXL_BASE) += pci-cxl.o obj-$(CONFIG_EEH) += eeh-powernv.o diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c new file mode 100644 index 0000000..1d75526 --- /dev/null +++ b/arch/powerpc/platforms/powernv/opal-core.c @@ -0,0 +1,385 @@ +/* + * Interface for exporting the OPAL ELF core. + * Heavily inspired from fs/proc/vmcore.c + * + * Copyright 2018-2019, IBM Corp. + * Author: Hari Bathini + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "opal-core.h" + +struct opalcore { + struct list_head list; + unsigned long long paddr; + unsigned long long size; + loff_t offset; +}; + +static LIST_HEAD(opalcore_list); + +/* Total size of opalcore file. */ +static size_t opalcore_size; + +/* This buffer includes all the ELF core headers and the PT_NOTE */ +static char *opalcorebuf; +static size_t opalcorebuf_sz; + +/* NT_AUXV buffer */ +static char auxv_buf[AUXV_DESC_SZ]; + +/* Pointer to the first PT_LOAD in the ELF file */ +Elf64_Phdr *ptload_phdr; +unsigned int ptload_cnt; + +static struct proc_dir_entry *proc_opalcore; + +static struct opalcore * __init get_new_element(void) +{ + return kzalloc(sizeof(struct opalcore), GFP_KERNEL); +} + +static inline int is_opalcore_usable(void) +{ + return (opalcorebuf != NULL) ? 
1 : 0; +} + +static Elf64_Word *append_elf64_note(Elf64_Word *buf, char *name, + unsigned int type, void *data, + size_t data_len) +{ + Elf64_Nhdr *note = (Elf64_Nhdr *)buf; + Elf64_Word namesz = strlen(name) + 1; + + note->n_namesz = cpu_to_be32(namesz); + note->n_descsz = cpu_to_be32(data_len); + note->n_type = cpu_to_be32(type); + buf += DIV_ROUND_UP(sizeof(*note), sizeof(Elf64_Word)); + memcpy(buf, name, namesz); + buf += DIV_ROUND_UP(namesz, sizeof(Elf64_Word)); + memcpy(buf, data, data_len); + buf += DIV_ROUND_UP(data_len, sizeof(Elf64_Word)); + + return buf; +} + +static void fill_prstatus(struct elf_prstatus *prstatus, int cpu, + struct opalcore_config *oc_conf) +{ + memset(prstatus, 0, sizeof(struct elf_prstatus)); + elf_core_copy_kernel_regs(&(prstatus->pr_reg), &(oc_conf->regs[cpu])); + + /* + * Overload PID with PIR value. + * As a PIR value could also be '0', add an offset of '100' + * to every PIR to avoid misinterpretations in GDB. + */ + prstatus->pr_pid = cpu_to_be32(100 + oc_conf->thread_pir[cpu]); + prstatus->pr_ppid = cpu_to_be32(1); + + /* + * Indicate SIGTERM for crash initiated from OPAL. + * SIGUSR1 otherwise. + */ + if (cpu == oc_conf->crashing_cpu) { + short sig; + + sig = oc_conf->is_opal_initiated ? SIGTERM : SIGUSR1; + prstatus->pr_cursig = cpu_to_be16(sig); + } +} + +static Elf64_Word *regs_to_elf64_notes(Elf64_Word *buf, + struct opalcore_config *oc_conf) +{ + int i; + struct elf_prstatus prstatus; + + /* + * First NT_PRSTATUS note should be crashing cpu info + * for GDB to interpret it appropriately. + */ + fill_prstatus(&prstatus, oc_conf->crashing_cpu, oc_conf); + buf = append_elf64_note(buf, CRASH_CORE_NOTE_NAME, NT_PRSTATUS, + &prstatus, sizeof(prstatus)); + + for_each_cpu(i, &(oc_conf->online_mask)) { + /* + * Skip crashing CPU as it's already added as the first + * NT_PRSTATUS note. 
+ */ + if (i == oc_conf->crashing_cpu) + continue; + + fill_prstatus(&prstatus, i, oc_conf); + buf = append_elf64_note(buf, CRASH_CORE_NOTE_NAME, NT_PRSTATUS, + &prstatus, sizeof(prstatus)); + } + + return buf; +} + +static Elf64_Word *auxv_to_elf64_notes(Elf64_Word *buf, + uint64_t opal_boot_entry) +{ + int idx = 0; + Elf64_Off *bufp = (Elf64_Off *)auxv_buf; + + memset(bufp, 0, AUXV_DESC_SZ); + + /* Entry point of OPAL */ + bufp[idx++] = cpu_to_be64(AT_ENTRY); + bufp[idx++] = cpu_to_be64(opal_boot_entry); + + /* end of vector */ + bufp[idx++] = cpu_to_be64(AT_NULL); + + buf = append_elf64_note(buf, CRASH_CORE_NOTE_NAME, NT_AUXV, + auxv_buf, AUXV_DESC_SZ); + return buf; +} + +/* + * Read from the ELF header and then the crash dump. + * Returns number of bytes read on success, -errno on failure. + */ +static ssize_t read_opalcore(struct file *file, char __user *buffer, + size_t buflen, loff_t *fpos) +{ + struct opalcore *m; + ssize_t tsz, acc = 0; + + if (buflen == 0 || *fpos >= opalcore_size) + return 0; + + /* Read ELF core header and/or PT_NOTE segment */ + if (*fpos < opalcorebuf_sz) { + tsz = min(opalcorebuf_sz - (size_t)*fpos, buflen); + if (copy_to_user(buffer, opalcorebuf + *fpos, tsz)) + return -EFAULT; + buflen -= tsz; + *fpos += tsz; + buffer += tsz; + acc += tsz; + + /* leave now if filled buffer already */ + if (buflen == 0) + return acc; + } + + list_for_each_entry(m, &opalcore_list, list) { + if (*fpos < m->offset + m->size) { + void *addr; + + tsz = (size_t)min_t(unsigned long long, + m->offset + m->size - *fpos, + buflen); + addr = (void *)(m->paddr + *fpos - m->offset); + if (copy_to_user(buffer, __va(addr), tsz)) + return -EFAULT; + buflen -= tsz; + *fpos += tsz; + buffer += tsz; + acc += tsz; + + /* leave now if filled buffer already */ + if (buflen == 0) + return acc; + } + } + + return acc; +} + +static const struct file_operations proc_opalcore_operations = { + .read = read_opalcore, +}; + +int __init create_opalcore(struct opalcore_config 
*oc_conf) +{ + int hdr_size, cpu_notes_size, order, count; + int i, ret; + unsigned int numcpus; + unsigned long paddr; + Elf64_Ehdr *elf; + Elf64_Phdr *phdr; + loff_t opalcore_off; + struct opalcore *new; + struct page *page; + char *bufp; + struct device_node *dn; + uint64_t opal_base_addr; + uint64_t opal_boot_entry; + + + if (opalcorebuf || (oc_conf->ptload_cnt == 0) || + (oc_conf->ptload_cnt > MAX_PT_LOAD_CNT)) + return -EINVAL; + + numcpus = cpumask_weight(&(oc_conf->online_mask)); + hdr_size = (sizeof(Elf64_Ehdr) + + ((oc_conf->ptload_cnt + 1) * sizeof(Elf64_Phdr))); + cpu_notes_size = ((numcpus * (CRASH_CORE_NOTE_HEAD_BYTES + + CRASH_CORE_NOTE_NAME_BYTES + + CRASH_CORE_NOTE_DESC_BYTES)) + + (CRASH_CORE_NOTE_HEAD_BYTES + + CRASH_CORE_NOTE_NAME_BYTES + AUXV_DESC_SZ)); + opalcorebuf_sz = (hdr_size + cpu_notes_size); + order = get_order(opalcorebuf_sz); + opalcorebuf = (char *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, order); + if (!opalcorebuf) { + pr_err("Not enough memory to setup opalcore\n"); + return -ENOMEM; + } + + pr_debug("opalcorebuf = 0x%lx\n", (unsigned long)opalcorebuf); + + count = 1 << order; + page = virt_to_page(opalcorebuf); + for (i = 0; i < count; i++) + SetPageReserved(page + i); + + /* Read OPAL related device-tree entries */ + dn = of_find_node_by_name(NULL, "ibm,opal"); + if (dn) { + ret = of_property_read_u64(dn, "opal-base-address", + &opal_base_addr); + ret |= of_property_read_u64(dn, "opal-boot-address", + &opal_boot_entry); + } + if (!dn || ret) + pr_warn("WARNING: Failed to read OPAL base & entry values\n"); + + /* Use count to keep track of the program headers */ + count = 0; + + bufp = opalcorebuf; + elf = (Elf64_Ehdr *)bufp; + bufp += sizeof(Elf64_Ehdr); + memcpy(elf->e_ident, ELFMAG, SELFMAG); + elf->e_ident[EI_CLASS] = ELF_CLASS; + elf->e_ident[EI_DATA] = ELFDATA2MSB; + elf->e_ident[EI_VERSION] = EV_CURRENT; + elf->e_ident[EI_OSABI] = ELF_OSABI; + memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD); + elf->e_type = 
cpu_to_be16(ET_CORE); + elf->e_machine = cpu_to_be16(ELF_ARCH); + elf->e_version = cpu_to_be32(EV_CURRENT); + elf->e_entry = 0; + elf->e_phoff = cpu_to_be64(sizeof(Elf64_Ehdr)); + elf->e_shoff = 0; + elf->e_flags = 0; + + elf->e_ehsize = cpu_to_be16(sizeof(Elf64_Ehdr)); + elf->e_phentsize = cpu_to_be16(sizeof(Elf64_Phdr)); + elf->e_phnum = 0; + elf->e_shentsize = 0; + elf->e_shnum = 0; + elf->e_shstrndx = 0; + + phdr = (Elf64_Phdr *)bufp; + bufp += sizeof(Elf64_Phdr); + phdr->p_type = cpu_to_be32(PT_NOTE); + phdr->p_flags = 0; + phdr->p_align = 0; + phdr->p_paddr = phdr->p_vaddr = 0; + phdr->p_offset = cpu_to_be64(hdr_size); + phdr->p_filesz = phdr->p_memsz = cpu_to_be64(cpu_notes_size); + count++; + + opalcore_off = opalcorebuf_sz; + ptload_phdr = (Elf64_Phdr *)bufp; + ptload_cnt = oc_conf->ptload_cnt; + paddr = 0; + for (i = 0; i < ptload_cnt; i++) { + phdr = (Elf64_Phdr *)bufp; + bufp += sizeof(Elf64_Phdr); + phdr->p_type = cpu_to_be32(PT_LOAD); + phdr->p_flags = cpu_to_be32(PF_R|PF_W|PF_X); + phdr->p_align = 0; + + new = get_new_element(); + if (!new) + return -ENOMEM; + new->paddr = oc_conf->ptload_addr[i]; + new->size = oc_conf->ptload_size[i]; + new->offset = opalcore_off; + list_add_tail(&new->list, &opalcore_list); + + phdr->p_paddr = cpu_to_be64(paddr); + phdr->p_vaddr = cpu_to_be64(opal_base_addr + paddr); + phdr->p_filesz = phdr->p_memsz = + cpu_to_be64(oc_conf->ptload_size[i]); + phdr->p_offset = cpu_to_be64(opalcore_off); + + count++; + opalcore_off += oc_conf->ptload_size[i]; + paddr += oc_conf->ptload_size[i]; + } + + elf->e_phnum = cpu_to_be16(count); + + bufp = (char *)regs_to_elf64_notes((Elf64_Word *)bufp, oc_conf); + bufp = (char *)auxv_to_elf64_notes((Elf64_Word *)bufp, opal_boot_entry); + + opalcore_size = opalcore_off; + return 0; +} + +/* Init function for opalcore module. */ +static int __init opalcore_init(void) +{ + int rc = 0; + + /* + * If opalcorebuf= is set in the 2nd kernel, + * then capture the dump. 
+ */ + if (!(is_opalcore_usable())) + return rc; + + proc_opalcore = proc_create("opalcore", 0400, NULL, + &proc_opalcore_operations); + if (proc_opalcore) + proc_set_size(proc_opalcore, opalcore_size); + return 0; +} +fs_initcall(opalcore_init); + +/* Cleanup function for opalcore module. */ +void opalcore_cleanup(void) +{ + unsigned long order, count, i; + struct page *page; + + if (proc_opalcore) { + proc_remove(proc_opalcore); + proc_opalcore = NULL; + } + + ptload_phdr = NULL; + ptload_cnt = 0; + + /* free core buffer */ + order = get_order(opalcorebuf_sz); + count = 1 << order; + page = virt_to_page(opalcorebuf); + for (i = 0; i < count; i++) + ClearPageReserved(page + i); + __free_pages(page, order); +} diff --git a/arch/powerpc/platforms/powernv/opal-core.h b/arch/powerpc/platforms/powernv/opal-core.h new file mode 100644 index 0000000..bb7a89a --- /dev/null +++ b/arch/powerpc/platforms/powernv/opal-core.h @@ -0,0 +1,35 @@ +/* + * Copyright 2018-2019, IBM Corp. + * Author: Hari Bathini + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. 
+ */ + +#ifndef _OPALCORE_H +#define _OPALCORE_H + +#define MAX_PT_LOAD_CNT 16 + +/* NT_AUXV note related info */ +#define AUXV_CNT 1 +#define AUXV_DESC_SZ (((2 * AUXV_CNT) + 1) * sizeof(Elf64_Off)) + +struct opalcore_config { + unsigned int crashing_cpu; + unsigned int is_opal_initiated:1; + unsigned int ptload_cnt:15; + unsigned int reserved:16; + unsigned long ptload_addr[MAX_PT_LOAD_CNT]; + unsigned long ptload_size[MAX_PT_LOAD_CNT]; + struct pt_regs regs[NR_CPUS]; + uint32_t thread_pir[NR_CPUS]; + struct cpumask online_mask; +}; + +extern int create_opalcore(struct opalcore_config *opalcore_config); +extern void opalcore_cleanup(void); + +#endif /* _OPALCORE_H */ diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c index 9e677de..5bd0a0f 100644 --- a/arch/powerpc/platforms/powernv/opal-fadump.c +++ b/arch/powerpc/platforms/powernv/opal-fadump.c @@ -27,8 +27,10 @@ #include "../../kernel/fadump_internal.h" #include "opal-fadump.h" +#include "opal-core.h" static struct opal_fadump_mem_struct fdm; +static struct opalcore_config oc_config; static const struct opal_fadump_mem_struct *fdm_active; unsigned long fdm_actual_size; @@ -262,6 +264,8 @@ static int opal_invalidate_fadump(struct fw_dump *fadump_conf) { int rc; + opalcore_cleanup(); + rc = opal_configure_fadump(FADUMP_INVALIDATE, (void *)fdm_active, fdm_actual_size); if (rc) { @@ -291,17 +295,19 @@ static inline int fadump_get_logical_cpu(struct fadump_backup_area *ba, u32 pir) static struct fadump_reg_entry* fadump_read_registers(unsigned int regs_per_thread, struct fadump_reg_entry *reg_entry, - struct pt_regs *regs) + struct pt_regs *regs, bool opal_data) { int i; + u64 reg_value; int reg_cnt = 0; memset(regs, 0, sizeof(struct pt_regs)); for (i = 0; i < regs_per_thread; i++) { + reg_value = (opal_data ? 
reg_entry->reg_value : + be64_to_cpu(reg_entry->reg_value)); fadump_set_regval_regnum(regs, be64_to_cpu(reg_entry->reg_id), - be64_to_cpu(reg_entry->reg_value), - reg_cnt++); + reg_value, reg_cnt++); reg_entry++; } return reg_entry; @@ -382,12 +388,26 @@ static int __init fadump_build_cpu_notes(struct fw_dump *fadump_conf, regs = fdh->regs; note_buf = fadump_regs_to_elf_notes(note_buf, &regs); + fadump_read_registers(regs_per_thread, + reg_entry, + &oc_config.regs[cpu], + true); + + pr_debug("crashing cpu%d - R1 : 0x%lx, NIP : 0x%lx\n", + cpu, regs.gpr[1], regs.nip); + pr_debug("cpu%d - R1 : 0x%lx, NIP : 0x%lx\n", + cpu, oc_config.regs[cpu].gpr[1], + oc_config.regs[cpu].nip); continue; } } - fadump_read_registers(regs_per_thread, reg_entry, &regs); + fadump_read_registers(regs_per_thread, reg_entry, &regs, false); note_buf = fadump_regs_to_elf_notes(note_buf, &regs); + fadump_read_registers(regs_per_thread, reg_entry, + &oc_config.regs[cpu], true); + pr_debug("cpu%d - R1 : 0x%lx, NIP : 0x%lx\n", cpu, + oc_config.regs[cpu].gpr[1], oc_config.regs[cpu].nip); } final_note(note_buf); @@ -406,7 +426,7 @@ static int __init opal_process_fadump(struct fw_dump *fadump_conf) struct fadump_crash_info_header *fdh; struct fadump_backup_area *backup_info = NULL; unsigned long addr; - int rc = 0; + int i, rc = 0; if (!fdm_active || !fadump_conf->fadumphdr_addr) return -EINVAL; @@ -436,7 +456,48 @@ static int __init opal_process_fadump(struct fw_dump *fadump_conf) */ elfcorehdr_addr = fdh->elfcorehdr_addr; - return rc; + /* + * pt_regs & PIR info for opalcore are populated while building + * cpu notes for vmcore. Populate remaining info to facilitate + * exporting /proc/opalcore file. 
+ */ + oc_config.ptload_cnt = 0; + for (i = 0; i < be16_to_cpu(fdm_active->section_count); i++) { + u8 src_type = fdm_active->section[i].src_type; + + if ((src_type < OPAL_FADUMP_OPAL_REGION) || + (src_type >= OPAL_FADUMP_FW_REGION)) + continue; + + if (oc_config.ptload_cnt >= MAX_PT_LOAD_CNT) + break; + + oc_config.ptload_addr[oc_config.ptload_cnt] = + be64_to_cpu(fdm_active->section[i].dest_addr); + oc_config.ptload_size[oc_config.ptload_cnt++] = + be64_to_cpu(fdm_active->section[i].dest_size); + } + + if (fdh->crashing_cpu == CPU_UNKNOWN) { + u32 pir = be32_to_cpu(fdm_active->crashing_cpu); + + oc_config.is_opal_initiated = 1; + oc_config.crashing_cpu = fadump_get_logical_cpu(backup_info, + pir); + } else { + oc_config.is_opal_initiated = 0; + oc_config.crashing_cpu = fdh->crashing_cpu; + } + + oc_config.online_mask = fdh->online_mask; + memcpy(&(oc_config.thread_pir), &(backup_info->thread_pir), + sizeof(backup_info->thread_pir)); + + rc = create_opalcore(&oc_config); + if (rc) + pr_warn("Could not create opalcore ELF file\n"); + + return 0; } static void opal_fadump_region_show(struct fw_dump *fadump_conf,
From patchwork Thu Dec 20 19:01:05 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hari Bathini
X-Patchwork-Id: 1016984
Subject: [PATCH 7/9] powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
From: Hari Bathini
To: Ananth N Mavinakayanahalli, Michael Ellerman, Mahesh J Salgaonkar, Vasant Hegde, linuxppc-dev, Stewart Smith
Date: Fri, 21 Dec 2018 00:31:05 +0530
In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com>
References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com>
Message-Id: <154533246582.28973.5733341614417266722.stgit@hbathini.in.ibm.com>

Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP, that ensures that crash data from a previously crashed kernel is preserved. This helps in cases where FADUMP is not enabled but the subsequent memory-preserving kernel boot is likely to process this crash data. One typical use case for this config option is the petitboot kernel.

Signed-off-by: Hari Bathini
---
 arch/powerpc/Kconfig                         |  9 ++++++
 arch/powerpc/include/asm/fadump.h            | 12 ++++----
 arch/powerpc/kernel/Makefile                 |  6 +++-
 arch/powerpc/kernel/fadump.c                 | 41 ++++++++++++++++++++++----
 arch/powerpc/kernel/fadump_internal.h        |  6 ++++
 arch/powerpc/kernel/prom.c                   |  4 +--
 arch/powerpc/platforms/powernv/Makefile      |  6 +++-
 arch/powerpc/platforms/powernv/opal-fadump.c | 35 ++++++++++++++++++++++
 8 files changed, 104 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 08add7a..afa4e79 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -579,6 +579,15 @@ config FA_DUMP If unsure, say "y". Only special kernels like petitboot may need to say "N" here. +config PRESERVE_FA_DUMP + bool "Preserve Firmware-assisted dump" + depends on PPC64 && PPC_POWERNV && !FA_DUMP + help + On a kernel with FA_DUMP disabled, this option helps to preserve + crash data from a previously crash'ed kernel. 
Useful when the next + memory preserving kernel boot would process this crash data. + Petitboot kernel is the typical usecase for this option. + config IRQ_ALL_CPUS bool "Distribute interrupts on all CPUs by default" depends on SMP diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h index db9465f..92a9ddf 100644 --- a/arch/powerpc/include/asm/fadump.h +++ b/arch/powerpc/include/asm/fadump.h @@ -22,14 +22,16 @@ #ifndef __PPC64_FA_DUMP_H__ #define __PPC64_FA_DUMP_H__ -#ifdef CONFIG_FA_DUMP +#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP) +extern int early_init_dt_scan_fw_dump(unsigned long node, const char *uname, + int depth, void *data); +extern int fadump_reserve_mem(void); +#endif +#ifdef CONFIG_FA_DUMP extern int crashing_cpu; extern int is_fadump_memory_area(u64 addr, ulong size); -extern int early_init_dt_scan_fw_dump(unsigned long node, const char *uname, - int depth, void *data); -extern int fadump_reserve_mem(void); extern int setup_fadump(void); extern int is_fadump_active(void); extern int should_fadump_crash(void); @@ -41,4 +43,4 @@ static inline int is_fadump_active(void) { return 0; } static inline int should_fadump_crash(void) { return 0; } static inline void crash_fadump(struct pt_regs *regs, const char *str) { } #endif -#endif +#endif /* __PPC64_FA_DUMP_H__ */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 8e4bade..8ed84d2 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -65,7 +65,11 @@ obj-$(CONFIG_EEH) += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \ eeh_driver.o eeh_event.o eeh_sysfs.o obj-$(CONFIG_GENERIC_TBSYNC) += smp-tbsync.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o -obj-$(CONFIG_FA_DUMP) += fadump.o fadump_internal.o +ifeq ($(CONFIG_FA_DUMP),y) +obj-y += fadump.o fadump_internal.o +else +obj-$(CONFIG_PRESERVE_FA_DUMP) += fadump.o +endif ifdef CONFIG_PPC32 obj-$(CONFIG_E500) += idle_e500.o endif diff --git a/arch/powerpc/kernel/fadump.c 
b/arch/powerpc/kernel/fadump.c index d9cf809..2c9457b 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -47,6 +47,7 @@ static struct fw_dump fw_dump; +#ifdef CONFIG_FA_DUMP #ifdef CONFIG_CMA static struct cma *fadump_cma; #endif @@ -117,6 +118,7 @@ int __init fadump_cma_init(void) #else static int __init fadump_cma_init(void) { return 1; } #endif /* CONFIG_CMA */ +#endif /* CONFIG_FA_DUMP */ /* Scan the Firmware Assisted dump configuration details. */ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, @@ -125,8 +127,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, if (depth != 1) return 0; +#ifdef CONFIG_FA_DUMP if (strcmp(uname, "rtas") == 0) return pseries_dt_scan_fadump(&fw_dump, node); +#endif if (strcmp(uname, "ibm,opal") == 0) return opal_dt_scan_fadump(&fw_dump, node); @@ -134,6 +138,7 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, const char *uname, return 0; } +#ifdef CONFIG_FA_DUMP /* * If fadump is registered, check if the memory provided * falls within boot memory area and reserved memory area. 
@@ -366,6 +371,7 @@ static int __init fadump_get_rmr_regions(void) return ret; } +#endif /* CONFIG_FA_DUMP */ /* Preserve everything above the base address */ static void __init fadump_reserve_crash_area(unsigned long base) @@ -384,7 +390,7 @@ static void __init fadump_reserve_crash_area(unsigned long base) msize -= (base - mstart); mstart = base; } - pr_info("Reserving %luMB of memory at %#016lx for saving crash dump", + pr_info("Reserving %luMB of memory at %#016lx for preserving crash data", (msize >> 20), mstart); memblock_reserve(mstart, msize); } @@ -392,26 +398,46 @@ static void __init fadump_reserve_crash_area(unsigned long base) int __init fadump_reserve_mem(void) { - unsigned long base, size, memory_boundary; + unsigned long base; +#ifndef CONFIG_PRESERVE_FA_DUMP + unsigned long size, memory_boundary; if (!fw_dump.fadump_enabled) return 0; if (!fw_dump.fadump_supported) { - printk(KERN_INFO "Firmware-assisted dump is not supported on" - " this hardware\n"); + pr_info("Firmware-assisted dump is not supported on this hardware\n"); fw_dump.fadump_enabled = 0; return 0; } +#endif /* * Initialize boot memory size * If dump is active then we have already calculated the size during * first kernel. */ - if (fw_dump.dump_active) + if (fw_dump.dump_active) { fw_dump.boot_memory_size = fw_dump.rmr_source_len; - else { + + /* + * When dump is active but PRESERVE_FA_DUMP is enabled on the + * kernel, preserve crash data. The subsequent memory preserving + * kernel boot is likely to process this crash data. + */ +#ifdef CONFIG_PRESERVE_FA_DUMP + pr_info("Preserving crash data for processing in next boot.\n"); + base = fw_dump.boot_memory_size + fw_dump.boot_memory_hole_size; + base = PAGE_ALIGN(base); + + /* + * If last boot has crashed then reserve all the memory + * above boot memory size to preserve crash data. 
+ */ + fadump_reserve_crash_area(base); + } +#else /* CONFIG_PRESERVE_FA_DUMP */ + } else { fw_dump.boot_memory_size = fadump_calculate_reserve_size(); #ifdef CONFIG_CMA if (!fw_dump.nocma) @@ -499,6 +525,7 @@ int __init fadump_reserve_mem(void) fw_dump.reserve_dump_area_start = base; return fadump_cma_init(); } +#endif /* !CONFIG_PRESERVE_FA_DUMP */ return 1; } @@ -507,6 +534,7 @@ unsigned long __init arch_reserved_kernel_pages(void) return memblock_reserved_size() / PAGE_SIZE; } +#ifdef CONFIG_FA_DUMP /* Look for fadump= cmdline option. */ static int __init early_fadump_param(char *p) { @@ -1234,3 +1262,4 @@ int __init setup_fadump(void) return 1; } subsys_initcall(setup_fadump); +#endif /* CONFIG_FA_DUMP */ diff --git a/arch/powerpc/kernel/fadump_internal.h b/arch/powerpc/kernel/fadump_internal.h index a117f60..ce4c0f9 100644 --- a/arch/powerpc/kernel/fadump_internal.h +++ b/arch/powerpc/kernel/fadump_internal.h @@ -13,6 +13,7 @@ #ifndef __PPC64_FA_DUMP_INTERNAL_H__ #define __PPC64_FA_DUMP_INTERNAL_H__ +#ifdef CONFIG_FA_DUMP /* * The RMA region will be saved for later dumping when kernel crashes. * RMA is Real Mode Area, the first block of logical memory address owned @@ -106,6 +107,7 @@ struct fadump_backup_area { * the version field. 
*/ }; +#endif /* CONFIG_FA_DUMP */ /* Firmware-Assited Dump platforms */ enum fadump_platform_type { @@ -166,9 +168,12 @@ struct fw_dump { unsigned long nocma:1; enum fadump_platform_type fadump_platform; +#ifdef CONFIG_FA_DUMP struct fadump_ops *ops; +#endif }; +#ifdef CONFIG_FA_DUMP struct fadump_ops { ulong (*init_fadump_mem_struct)(struct fw_dump *fadump_config); ulong (*init_fadump_header)(struct fw_dump *fadump_config); @@ -198,6 +203,7 @@ int is_boot_memory_area_contiguous(struct fw_dump *fadump_conf); int is_reserved_memory_area_contiguous(struct fw_dump *fadump_conf); void init_fadump_backup_area(struct fw_dump *fadump_conf); unsigned long fadump_populate_backup_area(struct fw_dump *fadump_conf); +#endif /* CONFIG_FA_DUMP */ #ifdef CONFIG_PPC_PSERIES extern int pseries_dt_scan_fadump(struct fw_dump *fadump_config, ulong node); diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index fe758ce..aa425ad 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -705,7 +705,7 @@ void __init early_init_devtree(void *params) of_scan_flat_dt(early_init_dt_scan_opal, NULL); #endif -#ifdef CONFIG_FA_DUMP +#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP) /* scan tree to see if dump is active during last boot */ of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL); #endif @@ -732,7 +732,7 @@ void __init early_init_devtree(void *params) if (PHYSICAL_START > MEMORY_START) memblock_reserve(MEMORY_START, 0x8000); reserve_kdump_trampoline(); -#ifdef CONFIG_FA_DUMP +#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP) /* * If we fail to reserve memory for firmware-assisted dump then * fallback to kexec based kdump. 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 9420631..750675e 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -6,7 +6,11 @@ obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o obj-y += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o -obj-$(CONFIG_FA_DUMP) += opal-fadump.o opal-core.o +ifeq ($(CONFIG_FA_DUMP),y) +obj-y += opal-fadump.o opal-core.o +else +obj-$(CONFIG_PRESERVE_FA_DUMP) += opal-fadump.o +endif obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o obj-$(CONFIG_CXL_BASE) += pci-cxl.o obj-$(CONFIG_EEH) += eeh-powernv.o diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c index 5bd0a0f..b3f3903 100644 --- a/arch/powerpc/platforms/powernv/opal-fadump.c +++ b/arch/powerpc/platforms/powernv/opal-fadump.c @@ -27,13 +27,16 @@ #include "../../kernel/fadump_internal.h" #include "opal-fadump.h" +#ifdef CONFIG_FA_DUMP #include "opal-core.h" static struct opal_fadump_mem_struct fdm; static struct opalcore_config oc_config; +#endif static const struct opal_fadump_mem_struct *fdm_active; unsigned long fdm_actual_size; +#ifndef CONFIG_PRESERVE_FA_DUMP /* * Backup area is not available in older format. 
In the newer fadump * header format (v2), backup info is stored at the end of elfcorehdrs @@ -74,6 +77,7 @@ static void opal_set_preserv_area_start(struct fw_dump *fadump_conf) pr_debug("Preserve area start address: 0x%lx\n", fadump_conf->preserv_area_start); } +#endif /* !CONFIG_PRESERVE_FA_DUMP */ static void update_fadump_config(struct fw_dump *fadump_conf, const struct opal_fadump_mem_struct *fdm) @@ -142,11 +146,41 @@ static void update_fadump_config(struct fw_dump *fadump_conf, fadump_conf->rmr_regions_cnt); } +#ifndef CONFIG_PRESERVE_FA_DUMP fadump_set_meta_area_start(fadump_conf); opal_set_preserv_area_start(fadump_conf); opal_set_backup_area_start(fadump_conf); +#endif } +/* + * When dump is active but PRESERVE_FA_DUMP is enabled on the kernel, + * ensure crash data is preserved in hope that the subsequent memory + * preserving kernel boot is going to process this crash data. + */ +#ifdef CONFIG_PRESERVE_FA_DUMP +int __init opal_dt_scan_fadump(struct fw_dump *fadump_conf, ulong node) +{ + unsigned long dn; + + dn = of_get_flat_dt_subnode_by_name(node, "dump"); + if (dn == -FDT_ERR_NOTFOUND) + return 1; + + /* + * Check if dump has been initiated on last reboot. 
+ */ + fdm_active = of_get_flat_dt_prop(dn, "result-table", NULL); + if (fdm_active) { + pr_info("Firmware-assisted dump is active.\n"); + fadump_conf->dump_active = 1; + update_fadump_config(fadump_conf, (void *)__pa(fdm_active)); + } + + return 1; +} + +#else /* CONFIG_PRESERVE_FA_DUMP */ static ulong opal_init_fadump_mem_struct(struct fw_dump *fadump_conf) { ulong addr = fadump_conf->reserve_dump_area_start; @@ -618,3 +652,4 @@ int __init opal_dt_scan_fadump(struct fw_dump *fadump_conf, ulong node) return 1; } +#endif /* !CONFIG_PRESERVE_FA_DUMP */
From patchwork Thu Dec 20 19:01:13 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hari Bathini
X-Patchwork-Id: 1016986
Subject: [PATCH 8/9] powerpc/fadump: use FADump instead of fadump for how it is pronounced
From: Hari Bathini
To: Ananth N Mavinakayanahalli, Michael Ellerman, Mahesh J Salgaonkar, Vasant Hegde, linuxppc-dev, Stewart Smith
Date: Fri, 21 Dec 2018 00:31:13 +0530
In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com>
References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com>
Message-Id: <154533247381.28973.1251273540929942556.stgit@hbathini.in.ibm.com>
List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Signed-off-by: Hari Bathini --- Documentation/powerpc/firmware-assisted-dump.txt | 56 +++++++++++----------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index 4897665..326f89c 100644 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -8,18 +8,18 @@ a crashed system, and to do so from a fully-reset system, and to minimize the total elapsed time until the system is back in production use. -- Firmware assisted dump (fadump) infrastructure is intended to replace +- Firmware-Assisted Dump (FADump) infrastructure is intended to replace the existing phyp assisted dump. - Fadump uses the same firmware interfaces and memory reservation model as phyp assisted dump. -- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore +- Unlike phyp dump, FADump exports the memory dump through /proc/vmcore in the ELF format in the same way as kdump. This helps us reuse the kdump infrastructure for dump capture and filtering. - Unlike phyp dump, userspace tool does not need to refer any sysfs interface while reading /proc/vmcore. -- Unlike phyp dump, fadump allows user to release all the memory reserved +- Unlike phyp dump, FADump allows user to release all the memory reserved for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem. -- Once enabled through kernel boot parameter, fadump can be +- Once enabled through kernel boot parameter, FADump can be started/stopped through /sys/kernel/fadump_registered interface (see sysfs files section below) and can be easily integrated with kdump service start/stop init scripts. 
@@ -33,7 +33,7 @@ dump offers several strong, practical advantages: in a clean, consistent state. -- Once the dump is copied out, the memory that held the dump is immediately available to the running kernel. And therefore, - unlike kdump, fadump doesn't need a 2nd reboot to get back + unlike kdump, FADump doesn't need a 2nd reboot to get back the system to the production configuration. The above can only be accomplished by coordination with, @@ -61,7 +61,7 @@ as follows: boot successfully. For syntax of crashkernel= parameter, refer to Documentation/kdump/kdump.txt. If any offset is provided in crashkernel= parameter, it will be ignored - as fadump uses a predefined offset to reserve memory + as FADump uses a predefined offset to reserve memory for boot memory dump preservation in case of a crash. -- After the low memory (boot memory) area has been saved, the @@ -119,7 +119,7 @@ blocking this significant chunk of memory from production kernel. Hence, the implementation uses the Linux kernel's Contiguous Memory Allocator (CMA) for memory reservation if CMA is configured for kernel. With CMA reservation this memory will be available for applications to -use it, while kernel is prevented from using it. With this fadump will +use it, while kernel is prevented from using it. With this FADump will still be able to capture all of the kernel memory and most of the user space memory except the user pages that were present in CMA region. @@ -169,14 +169,14 @@ KDump, as dump mechanism. The tools to examine the dump will be same as the ones used for kdump. -How to enable firmware-assisted dump (fadump): +How to enable firmware-assisted dump (FADump): ------------------------------------- 1. Set config option CONFIG_FA_DUMP=y and build kernel. -2. Boot into linux kernel with 'fadump=on' kernel cmdline option. - By default, fadump reserved memory will be initialized as CMA area. - Alternatively, user can boot linux kernel with 'fadump=nocma' to - prevent fadump to use CMA. +2. 
Boot into linux kernel with 'FADump=on' kernel cmdline option. + By default, FADump reserved memory will be initialized as CMA area. + Alternatively, user can boot linux kernel with 'FADump=nocma' to + prevent FADump to use CMA. 3. Optionally, user can also set 'crashkernel=' kernel cmdline to specify size of the memory to reserve for boot memory dump preservation. @@ -189,7 +189,7 @@ NOTE: 1. 'fadump_reserve_mem=' parameter has been deprecated. Instead option is set at kernel cmdline. 3. if user wants to capture all of user space memory and ok with reserved memory not available to production system, then - 'fadump=nocma' kernel parameter can be used to fallback to + 'FADump=nocma' kernel parameter can be used to fallback to old behaviour. Sysfs/debugfs files: @@ -202,29 +202,29 @@ Here is the list of files under kernel sysfs: /sys/kernel/fadump_enabled - This is used to display the fadump status. - 0 = fadump is disabled - 1 = fadump is enabled + This is used to display the FADump status. + 0 = FADump is disabled + 1 = FADump is enabled This interface can be used by kdump init scripts to identify if - fadump is enabled in the kernel and act accordingly. + FADump is enabled in the kernel and act accordingly. /sys/kernel/fadump_registered - This is used to display the fadump registration status as well - as to control (start/stop) the fadump registration. - 0 = fadump is not registered. - 1 = fadump is registered and ready to handle system crash. + This is used to display the FADump registration status as well + as to control (start/stop) the FADump registration. + 0 = FADump is not registered. + 1 = FADump is registered and ready to handle system crash. - To register fadump echo 1 > /sys/kernel/fadump_registered and + To register FADump echo 1 > /sys/kernel/fadump_registered and echo 0 > /sys/kernel/fadump_registered for un-register and stop the - fadump. Once the fadump is un-registered, the system crash will not + FADump. 
Once the FADump is un-registered, the system crash will not be handled and vmcore will not be captured. This interface can be easily integrated with kdump service start/stop. /sys/kernel/fadump_release_mem - This file is available only when fadump is active during + This file is available only when FADump is active during second kernel. This is used to release the reserved memory region that are held for saving crash dump. To release the reserved memory echo 1 to it: @@ -243,20 +243,20 @@ Here is the list of files under powerpc debugfs: /sys/kernel/debug/powerpc/fadump_region - This file shows the reserved memory regions if fadump is + This file shows the reserved memory regions if FADump is enabled otherwise this file is empty. The output format is: : [-] bytes, Dumped: e.g. - Contents when fadump is registered during first kernel + Contents when FADump is registered during first kernel # cat /sys/kernel/debug/powerpc/fadump_region CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0 HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0 DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0 - Contents when fadump is active during second kernel + Contents when FADump is active during second kernel # cat /sys/kernel/debug/powerpc/fadump_region CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020 @@ -273,7 +273,7 @@ TODO: o Need to come up with the better approach to find out more accurate boot memory size that is required for a kernel to boot successfully when booted with restricted memory. - o The fadump implementation introduces a fadump crash info structure + o The FADump implementation introduces a FADump crash info structure in the scratch area before the ELF core header. 
The idea of introducing this structure is to pass some important crash info data to the second kernel which will help second kernel to populate ELF core header with From patchwork Thu Dec 20 19:01:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hari Bathini X-Patchwork-Id: 1016988 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LM8T10mVz9sDB for ; Fri, 21 Dec 2018 06:22:17 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 43LM8S5yRmzDrMD for ; Fri, 21 Dec 2018 06:22:16 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Received: from ozlabs.org (bilbo.ozlabs.org [203.11.71.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 43LLhX086czDr6x for ; Fri, 21 Dec 2018 06:01:32 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) by bilbo.ozlabs.org (Postfix) with ESMTP id 43LLhW6T3Zz8tRf for ; Fri, 21 Dec 2018 06:01:31 +1100 (AEDT) Received: by ozlabs.org (Postfix) id 43LLhW65S7z9sCr; Fri, 21 Dec 2018 06:01:31 +1100 (AEDT) Delivered-To: linuxppc-dev@ozlabs.org Authentication-Results: 
ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=linux.ibm.com (client-ip=148.163.158.5; helo=mx0a-001b2d01.pphosted.com; envelope-from=hbathini@linux.ibm.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43LLhW1xCfz9sCh for ; Fri, 21 Dec 2018 06:01:31 +1100 (AEDT) Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id wBKIwlGP115583 for ; Thu, 20 Dec 2018 14:01:29 -0500 Received: from e06smtp07.uk.ibm.com (e06smtp07.uk.ibm.com [195.75.94.103]) by mx0b-001b2d01.pphosted.com with ESMTP id 2pggdx9088-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 20 Dec 2018 14:01:29 -0500 Received: from localhost by e06smtp07.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 20 Dec 2018 19:01:27 -0000 Received: from b06cxnps4074.portsmouth.uk.ibm.com (9.149.109.196) by e06smtp07.uk.ibm.com (192.168.101.137) with IBM ESMTP SMTP Gateway: Authorized Use Only! 
Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 20 Dec 2018 19:01:26 -0000 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id wBKJ1OIH8192390 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 20 Dec 2018 19:01:24 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 64F144C044; Thu, 20 Dec 2018 19:01:24 +0000 (GMT) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B0EC64C050; Thu, 20 Dec 2018 19:01:22 +0000 (GMT) Received: from hbathini.in.ibm.com (unknown [9.199.47.3]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 20 Dec 2018 19:01:22 +0000 (GMT) Subject: [PATCH 9/9] powerpc/fadump: Update documentation about OPAL platform support From: Hari Bathini To: Ananth N Mavinakayanahalli , Michael Ellerman , Mahesh J Salgaonkar , Vasant Hegde , linuxppc-dev , Stewart Smith Date: Fri, 21 Dec 2018 00:31:21 +0530 In-Reply-To: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> References: <154533238217.28973.10173741387253773210.stgit@hbathini.in.ibm.com> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 18122019-0028-0000-0000-0000032D5DE6 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18122019-0029-0000-0000-000023E9C562 Message-Id: <154533248159.28973.754639553919897320.stgit@hbathini.in.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-12-20_09:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 
engine=8.0.1-1810050000 definitions=main-1812200154 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" With FADump support now available on both pseries and OPAL platforms, update FADump documentation with these details. Also, update about backup area and why it is used. Signed-off-by: Hari Bathini --- Documentation/powerpc/firmware-assisted-dump.txt | 102 ++++++++++++++-------- 1 file changed, 64 insertions(+), 38 deletions(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index 326f89c..eff9f38 100644 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -70,7 +70,8 @@ as follows: normal. -- The freshly booted kernel will notice that there is a new - node (ibm,dump-kernel) in the device tree, indicating that + node (ibm,dump-kernel on PSeries or ibm,opal/dump/result-table + on OPAL platform) in the device tree, indicating that there is crash data available from a previous boot. During the early boot OS will reserve rest of the memory above boot memory size effectively booting with restricted memory @@ -92,7 +93,20 @@ as follows: Please note that the firmware-assisted dump feature is only available on Power6 and above systems with recent -firmware versions. +firmware versions on PSeries (PowerVM) platform and Power9 +and above systems with recent firmware versions on PowerNV +(OPAL) platform. + +To process dump on OPAL platform, additional meta data (PIR to +Logical CPU map) from the crashing kernel is required. This info +has to be backed up by the crashing kernel for capture kernel to +use it in making sense of the register state data provided by the +F/W. 
The start address of the area where this info is backed up +is stored at the tail end of FADump crash info header. To indicate +the presence of this additional meta data (backup info), the magic +number field in FADump crash info header is overloaded as version +identifier. + Implementation details: ---------------------- @@ -108,56 +122,65 @@ that are run. If there is dump data, then the memory is held. If there is no waiting dump data, then only the memory required -to hold CPU state, HPTE region, boot memory dump and elfcore -header, is usually reserved at an offset greater than boot memory -size (see Fig. 1). This area is *not* released: this region will -be kept permanently reserved, so that it can act as a receptacle -for a copy of the boot memory content in addition to CPU state -and HPTE region, in the case a crash does occur. Since this reserved -memory area is used only after the system crash, there is no point in -blocking this significant chunk of memory from production kernel. -Hence, the implementation uses the Linux kernel's Contiguous Memory -Allocator (CMA) for memory reservation if CMA is configured for kernel. -With CMA reservation this memory will be available for applications to -use it, while kernel is prevented from using it. With this FADump will -still be able to capture all of the kernel memory and most of the user -space memory except the user pages that were present in CMA region. +to hold CPU state, HPTE region, boot memory dump, FADump header, +elfcore header and backup area, is usually reserved at an offset +greater than boot memory size (see Fig. 1). This area is *not* +released: this region will be kept permanently reserved, so that +it can act as a receptacle for a copy of the boot memory content in +addition to CPU state and HPTE region, in the case a crash does occur. +Since this reserved memory area is used only after the system crash, +there is no point in blocking this significant chunk of memory from +production kernel. 
Hence, the implementation uses the Linux kernel's +Contiguous Memory Allocator (CMA) for memory reservation if CMA is +configured for kernel. With CMA reservation this memory will be +available for applications to use it, while kernel is prevented from +using it. With this FADump will still be able to capture all of the +kernel memory and most of the user space memory except the user pages +that were present in CMA region. o Memory Reservation during first kernel - Low memory Top of memory - 0 boot memory size |<--Reserved dump area --->| | - | | | Permanent Reservation | | - V V | (Preserve area) | V - +-----------+----------/ /---+---+----+--------+---+----+------+ - | | |CPU|HPTE| DUMP |HDR|ELF | | - +-----------+----------/ /---+---+----+--------+---+----+------+ - | ^ ^ - | | | - \ / | - ----------------------------------- FADump Header - Boot memory content gets transferred (meta area) - to reserved area by firmware at the - time of crash - + Low memory Top of memory + 0 boot memory size |<---- Reserved dump area ---->| | + | | | Permanent Reservation | | + V V | (Preserve area) | V + +-----------+--------/ /---+---+----+-------+-----+----+--+-------+ + | | |///|////| DUMP |HDR|/|ELF |//| | + +-----------+--------/ /---+---+----+-------+-----+----+--+-------+ + | ^ ^ ^ ^ | ^^ + | | | | | | || + \ CPU HPTE / | \ / Backup Info + --------------------------------- | ---- + Boot memory content gets transferred | Start address of + to reserved area by firmware at the | Backup Info. + time of crash. | + FADump Header + (meta area) Fig. 
1 o Memory Reservation during second kernel after crash - Low memory Top of memory - 0 boot memory size | - | |<------------- Reserved dump area --------------->| - V V |<---- Preserve area ----->| V - +-----------+----------/ /---+---+----+--------+---+----+------+ - | | |CPU|HPTE| DUMP |HDR|ELF | | - +-----------+----------/ /---+---+----+--------+---+----+------+ + Low memory Top of memory + 0 boot memory size | + | |<--------------- Reserved dump area ---------------->| + V V |<----- Preserve area -------->| | + +-----------+--------/ /---+---+----+-------+-----+----+--+-------+ + | | |///|////| DUMP |HDR|/|ELF |//| | + +-----------+--------/ /---+---+----+-------+-----+----+--+-------+ | | V V Used by second /proc/vmcore kernel to boot Fig. 2 + +---+ + |///| -> Regions (CPU, HPTE, HDR extension & Backup area) marked + +---+ like this in the above figures are not always present + For example, OPAL platform does not have CPU & HPTE regions + while PSeries platform doesn't use Backup area currently. + + Currently the dump will be copied from /proc/vmcore to a new file upon user intervention. The dump data available through /proc/vmcore will be in ELF format. Hence the existing kdump infrastructure (kdump scripts) @@ -289,7 +312,10 @@ TODO: 2. Reserve the area of predefined size (say PAGE_SIZE) for this structure and have unused area as reserved (initialized to zero) for future field additions. + The advantage of approach 1 over 2 is we don't need to reserve extra space. + Using approach 1 to provide additional meta data on OPAL platform while + overloading magic number field as version identifier for version tracking. --- Author: Mahesh Salgaonkar This document is based on the original documentation written for phyp
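
The sample /sys/kernel/debug/powerpc/fadump_region output quoted in the documentation above lists, per region, an address range, a reserved size and a "Dumped" size. As a reader's aid, here is a minimal userspace sketch in C that parses one such line and checks that the size matches the range. It is not part of the series: the struct and function names (fadump_region_line, parse_fadump_region_line, region_is_consistent) are hypothetical, and the line layout is assumed only from the sample output shown in the document.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed line of fadump_region output (hypothetical helper type). */
struct fadump_region_line {
	char name[8];		/* region label, e.g. "CPU", "HPTE", "DUMP" */
	uint64_t start;		/* first byte of the region */
	uint64_t end;		/* last byte of the region (inclusive) */
	uint64_t size;		/* reserved size in bytes */
	uint64_t dumped;	/* bytes reported as dumped */
};

/*
 * Parse a line such as
 *   "CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0"
 * The layout is assumed from the sample output in the documentation.
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_fadump_region_line(const char *line,
				    struct fadump_region_line *r)
{
	int n = sscanf(line,
		       " %7[A-Za-z] : [%" SCNx64 "-%" SCNx64 "] %" SCNx64
		       " bytes, Dumped: %" SCNx64,
		       r->name, &r->start, &r->end, &r->size, &r->dumped);

	return n == 5 ? 0 : -1;
}

/*
 * Sanity check: the inclusive range must span exactly 'size' bytes and
 * the dumped byte count must not exceed the reserved size.
 */
static int region_is_consistent(const struct fadump_region_line *r)
{
	return r->end >= r->start &&
	       r->end - r->start + 1 == r->size &&
	       r->dumped <= r->size;
}
```

A caller would read the debugfs file line by line (for example with fgets()) and pass each line to parse_fadump_region_line(). In the sample output, the CPU region reports Dumped: 0x0 in the first kernel and Dumped: 0x40020, equal to its size, in the second kernel.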